Github user eminency commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/848#discussion_r44102039
  
    --- Diff: tajo-docs/src/main/sphinx/tsql/variables.rst ---
    @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
     Also, ``\unset key`` will unset the session variable named *key*.
     
     
    -Now, tajo provides the following session variables.
    -
    -* ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
    -* ``DIST_QUERY_JOIN_TASK_VOLUME``
    -* ``DIST_QUERY_SORT_TASK_VOLUME``
    -* ``DIST_QUERY_GROUPBY_TASK_VOLUME``
    -* ``DIST_QUERY_JOIN_PARTITION_VOLUME``
    -* ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
    -* ``DIST_QUERY_TABLE_PARTITION_VOLUME``
    -* ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
    -* ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
    -* ``MAX_OUTPUT_FILE_SIZE``
    -* ``CODEGEN``
    -* ``CLIENT_SESSION_EXPIRY_TIME``
    -* ``CLI_MAX_COLUMN``
    -* ``CLI_NULL_CHAR``
    -* ``CLI_PRINT_PAUSE_NUM_RECORDS``
    -* ``CLI_PRINT_PAUSE``
    -* ``CLI_PRINT_ERROR_TRACE``
    -* ``CLI_OUTPUT_FORMATTER_CLASS``
    -* ``CLI_ERROR_STOP``
    -* ``TIMEZONE``
    -* ``DATE_ORDER``
    -* ``TEXT_NULL``
    -* ``DEBUG_ENABLED``
    -* ``BEHAVIOR_ARITHMETIC_ABORT``
    -* ``RESULT_SET_FETCH_ROWNUM``
    +Currently, Tajo provides the following session variables.
    +
    +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
    +
    +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the total size of the broadcasted tables will not exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 5120
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
    +
    +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
    +
    +A threshold for cross joins. When a cross join query is executed, the total size of the broadcasted tables will not exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 1024
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
    +
    +.. warning::
    +  In Tajo, the broadcast join is the only way to perform cross joins. Since the cross join is a very expensive operation, this value needs to be tuned carefully.
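    +
    +For example, on a memory-constrained cluster you might lower the cross-join threshold for a session, and later restore the default with ``\unset`` (the value 512 below is purely illustrative):
    +
    +.. code-block:: sh
    +
    +  \set BROADCAST_CROSS_JOIN_THRESHOLD 512
    +  \unset BROADCAST_CROSS_JOIN_THRESHOLD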
    +
    +.. describe:: JOIN_TASK_INPUT_SIZE
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_TASK_INPUT_SIZE 64
    +
    +.. describe:: JOIN_PER_SHUFFLE_SIZE
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_PER_SHUFFLE_SIZE 128
    +
    +.. describe:: HASH_JOIN_SIZE_LIMIT
    +
    +This value provides the criterion for choosing the join algorithm within a task.
    +If the input data is smaller than this value, the join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
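    +
    +For instance, if tasks hit OutOfMemory errors during hash joins, one possible mitigation is to lower the limit so that more joins fall back to the sort-merge join (32 is only an example value):
    +
    +.. code-block:: sh
    +
    +  \set HASH_JOIN_SIZE_LIMIT 32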
    +
    +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
    +
    +This value provides the criterion for choosing the inner join algorithm within a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set INNER_HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
    +
    +This value provides the criterion for choosing the outer join algorithm within a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set OUTER_HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: JOIN_HASH_TABLE_SIZE
    +
    +The initial size of the hash table for the in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_HASH_TABLE_SIZE 100000
    +
    +.. describe:: SORT_TASK_INPUT_SIZE
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set SORT_TASK_INPUT_SIZE 64
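    +
    +As a rough illustration, if the second stage of a sort query receives about 6,400 MB of input, an input size of 64 MB yields roughly 100 second-stage tasks, while halving the value to 32 MB roughly doubles the parallelism to about 200 tasks:
    +
    +.. code-block:: sh
    +
    +  \set SORT_TASK_INPUT_SIZE 32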
    +
    +.. describe:: EXTSORT_BUFFER_SIZE
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set EXTSORT_BUFFER_SIZE 200
    +
    +.. describe:: SORT_LIST_SIZE
    +
    +The initial size of the list for the in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set SORT_LIST_SIZE 100000
    +
    +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is set to true, the 3-phase aggregation algorithm is used.
    +Otherwise, the 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_MULTI_LEVEL_ENABLED true
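    +
    +Conversely, the multi-level algorithm can be disabled for a session so that the 2-phase algorithm is used instead:
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_MULTI_LEVEL_ENABLED false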
    +
    +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_PER_SHUFFLE_SIZE 256
    +
    +.. describe:: GROUPBY_TASK_INPUT_SIZE
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_TASK_INPUT_SIZE 64
    +
    +.. describe:: HASH_GROUPBY_SIZE_LIMIT
    +
    +This value provides the criterion for choosing the aggregation algorithm within a task.
    +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
    +Otherwise, the sort-based aggregation is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set HASH_GROUPBY_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: AGG_HASH_TABLE_SIZE
    +
    +The initial size of list for in-memory sort.
    --- End diff ---
    
    The description says 'list', but the variable name is 'hash table'.

