dinoocch commented on a change in pull request #3975: ReadTheDocs documentation for Table Configs, Monitoring, and Deployment
URL: https://github.com/apache/incubator-pinot/pull/3975#discussion_r266704832
##########
File path: docs/tableconfig_schema.rst
##########
@@ -0,0 +1,172 @@
+..
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+..
+
+Table Config
+=======================
+
+Table Config
+-------------
+
+Introduction to table configs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Tables are how Pinot organizes and serves data. Many settings in the table config influence how Pinot operates. The first and most significant distinction is whether a table is offline or realtime.
+
+An offline table hosts data that is uploaded periodically (daily, weekly, etc.). A realtime table consumes data from incoming data streams and serves it in near-realtime; this is sometimes called nearline, or just plain 'realtime'.
+
+This section shows a sample table config, explains each of its sub-sections, and, where applicable, links to the pages that describe the corresponding Pinot features in more detail.
+
+Sample table config and descriptions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A sample table config is shown below with its sub-sections collapsed. The sub-sections are described individually in the following sections.
+
+The ``tableName`` may contain only alphanumeric characters, hyphens ('-'), and underscores ('_'). A double underscore ('__') is not allowed because it is reserved for other features within Pinot.
+
+The ``tableType`` indicates the type of the table, ``OFFLINE`` or ``REALTIME``. Some settings are specific to one type; these are called out below as the options are explained.
+
+.. code-block:: none
+
+    {
+      "tableName": "myPinotTable",
+      "tableType": "REALTIME",
+      "segmentsConfig": {},
+      "tableIndexConfig": {},
+      "tenants": {},
+      "routing": {},
+      "task": {},
+      "metadata": {}
+    }
+
+Segments Config Section
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``segmentsConfig`` section has information about configuring:
+
+* Segment Retention - with the ``retentionTimeUnit`` and ``retentionTimeValue`` options.
+* Segment Push - using ``segmentPushFrequency`` to indicate how frequently segments are uploaded.
+* Replication - using ``replication`` for offline tables and ``replicasPerPartition`` for realtime tables to indicate how many replicas of the data will be present.
+* Schema - using ``schemaName``, the name of the schema that has been uploaded to the controller.
+* Time column - using ``timeColumnName`` and ``timeType``; these must match what is configured in that schema.
+* Segment assignment strategy - described in more detail on the `Customizing Pinot <customizations.html#segment-assignment-strategies>`_ page.
+
+
+.. code-block:: none
+
+    "segmentsConfig": {
+      "retentionTimeUnit": "DAYS",
+      "retentionTimeValue": "5",
+      "segmentPushFrequency": "daily",
+      "segmentPushType": "APPEND",
+      "replication": "3",
+      "replicasPerPartition": "3",
+      "schemaName": "ugcGestureEvents",
+      "timeColumnName": "daysSinceEpoch",
+      "timeType": "DAYS",
+      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
+    },
+
+Table Index Config Section
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``tableIndexConfig`` section has information about how to configure:
+
+* Inverted Indexes - using ``invertedIndexColumns`` to specify a list of real column names as specified in the schema.
+* No Dictionary Columns - using ``noDictionaryColumns`` to specify a list of real column names as specified in the schema. Columns listed here will NOT have a dictionary created. More information on indexes can be found on the `Index Techniques <index_techniques.html>`_ page.
+* Sorted Column - using ``sortedColumn`` to specify a list of real column names as specified in the schema.
+* Aggregate Metrics - setting ``aggregateMetrics`` to ``"true"`` enables the feature and ``"false"`` disables it. This feature is only available on REALTIME tables.
+* Data Partitioning Strategy - using ``segmentPartitionConfig``, configured as described in the `Data Partitioning Strategies <customizations.html#data-partitioning-strategies>`_ section.
+* Load Mode - using ``loadMode``, either ``"MMAP"`` or ``"HEAP"`` can be configured.
+* Lazy Loading of Data - using ``lazyLoad``, this feature is enabled by setting it to ``"true"`` and disabled by setting it to ``"false"``.
+* Segment Format Version - using the ``segmentFormatVersion`` field; this should always be set to ``"v3"``.
+* Stream Configs - this section holds the bulk of the settings that apply only to REALTIME tables. These options are explained in detail on the `Pluggable Streams <pluggable_streams.html#pluggable-streams>`_ page.
+
+.. code-block:: none
+
+    "tableIndexConfig": {
+      "invertedIndexColumns": [],
+      "sortedColumn": [
+        "nameOfSortedColumn"
+      ],
+      "noDictionaryColumns": [
+        "nameOfNoDictionaryColumn"
+      ],
+      "aggregateMetrics": "true",
+      "segmentPartitionConfig": {
+        "columnPartitionMap": {
+          "contentId": {
+            "functionName": "murmur",
+            "numPartitions": 32
+          }
+        }
+      },
+      "loadMode": "MMAP",
+      "lazyLoad": "false",
+      "segmentFormatVersion": "v3",
+      "streamConfigs": {}
+    },
+
+Tenants Section
+~~~~~~~~~~~~~~~
+
+The ``tenants`` section has two config fields in it. These fields are used to configure which tenants are used within Helix.

Review comment:
two -> two main

tagOverrideConfig is here as well
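The ``tableName`` rule in the diff (alphanumeric characters, hyphens and underscores only, with '__' reserved) is easy to check before a config is submitted. A minimal Python sketch, assuming "alphanumeric" means ASCII letters and digits; `is_valid_table_name` is a hypothetical helper, not part of Pinot.

```python
import re

# ASCII letters, digits, hyphens and underscores only (assumption: "alphanumeric" means ASCII).
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_table_name(name: str) -> bool:
    """Hypothetical pre-submit check for the documented tableName rules."""
    if not _ALLOWED_CHARS.fullmatch(name):
        return False
    # '__' is reserved for other Pinot features, so reject it anywhere in the name.
    return "__" not in name


if __name__ == "__main__":
    for candidate in ("myPinotTable", "my-table_1", "my__table", "my.table"):
        print(candidate, is_valid_table_name(candidate))
```

Checking the character class first and the reserved '__' second keeps each rule visible on its own line, so either can be adjusted if Pinot's actual validation turns out to differ.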
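Segment retention is expressed as a unit plus a value (`retentionTimeUnit`, `retentionTimeValue`). A rough sketch of the arithmetic, assuming a segment counts as expired once its end time is older than "now minus the retention period". The helper names and the set of supported unit names are hypothetical, and this is not Pinot's retention manager, only an illustration of how the two fields combine.

```python
from datetime import datetime, timedelta, timezone

# Assumption: unit names follow Java TimeUnit spelling; only these are handled here.
_UNIT_KWARG = {
    "DAYS": "days",
    "HOURS": "hours",
    "MINUTES": "minutes",
    "SECONDS": "seconds",
    "MILLISECONDS": "milliseconds",
}


def retention_cutoff(segments_config: dict, now: datetime) -> datetime:
    """Earliest segment end time still inside the retention window (hypothetical helper)."""
    unit = segments_config["retentionTimeUnit"]
    if unit not in _UNIT_KWARG:
        raise ValueError(f"unsupported retentionTimeUnit: {unit}")
    # The sample config stores the value as a string, e.g. "5".
    value = int(segments_config["retentionTimeValue"])
    return now - timedelta(**{_UNIT_KWARG[unit]: value})


def is_expired(segment_end: datetime, segments_config: dict, now: datetime) -> bool:
    """True if a segment ending at `segment_end` is older than the retention window."""
    return segment_end < retention_cutoff(segments_config, now)


if __name__ == "__main__":
    config = {"retentionTimeUnit": "DAYS", "retentionTimeValue": "5"}
    now = datetime.now(timezone.utc)
    print(retention_cutoff(config, now))
    print(is_expired(now - timedelta(days=6), config, now))
```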
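The Segments Config section in the diff says offline tables use `replication` while realtime tables use `replicasPerPartition`. A hypothetical helper that reads whichever field matches `tableType`, sketching only that rule (it assumes both fields hold integer counts stored as strings, as in the sample).

```python
# Field names follow the segmentsConfig sample in the diff; the helper itself is hypothetical.
_REPLICA_FIELD = {
    "OFFLINE": "replication",
    "REALTIME": "replicasPerPartition",
}


def replica_count(table_config: dict) -> int:
    """Return the replica count from the field that applies to this table's tableType."""
    table_type = table_config["tableType"]
    if table_type not in _REPLICA_FIELD:
        raise ValueError(f"unknown tableType: {table_type}")
    field = _REPLICA_FIELD[table_type]
    # The sample stores counts as strings, e.g. "3", so convert to int.
    return int(table_config["segmentsConfig"][field])


if __name__ == "__main__":
    config = {
        "tableName": "myPinotTable",
        "tableType": "REALTIME",
        "segmentsConfig": {"replication": "3", "replicasPerPartition": "3"},
    }
    print(replica_count(config))
```

Keeping the type-to-field mapping in one dictionary makes the offline/realtime split explicit and fails loudly on an unexpected `tableType`.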