[
https://issues.apache.org/jira/browse/TAJO-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993142#comment-14993142
]
ASF GitHub Bot commented on TAJO-1962:
--------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/848#discussion_r44104757
--- Diff: tajo-docs/src/main/sphinx/tsql/variables.rst ---
@@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique
session, and the client an
Also, ``\unset key`` will unset the session variable named *key*.
-Now, tajo provides the following session variables.
-
-* ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
-* ``DIST_QUERY_JOIN_TASK_VOLUME``
-* ``DIST_QUERY_SORT_TASK_VOLUME``
-* ``DIST_QUERY_GROUPBY_TASK_VOLUME``
-* ``DIST_QUERY_JOIN_PARTITION_VOLUME``
-* ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
-* ``DIST_QUERY_TABLE_PARTITION_VOLUME``
-* ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
-* ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
-* ``MAX_OUTPUT_FILE_SIZE``
-* ``CODEGEN``
-* ``CLIENT_SESSION_EXPIRY_TIME``
-* ``CLI_MAX_COLUMN``
-* ``CLI_NULL_CHAR``
-* ``CLI_PRINT_PAUSE_NUM_RECORDS``
-* ``CLI_PRINT_PAUSE``
-* ``CLI_PRINT_ERROR_TRACE``
-* ``CLI_OUTPUT_FORMATTER_CLASS``
-* ``CLI_ERROR_STOP``
-* ``TIMEZONE``
-* ``DATE_ORDER``
-* ``TEXT_NULL``
-* ``DEBUG_ENABLED``
-* ``BEHAVIOR_ARITHMETIC_ABORT``
-* ``RESULT_SET_FETCH_ROWNUM``
+Currently, Tajo provides the following session variables.
+
+.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
+
+A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the total size of the broadcast tables won't exceed this threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 5120
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
+
+.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
+
+A threshold for cross joins. When a cross join query is executed, the total size of the broadcast tables won't exceed this threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 1024
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
+
+.. warning::
+ In Tajo, the broadcast join is the only way to perform cross joins. Since the cross join is a very expensive operation, this value needs to be tuned carefully.
+
+.. describe:: JOIN_TASK_INPUT_SIZE
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of parallelism of the join query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_TASK_INPUT_SIZE 64
+
+.. describe:: JOIN_PER_SHUFFLE_SIZE
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join,
+this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 128
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_PER_SHUFFLE_SIZE 128
+
+.. describe:: HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion for choosing the algorithm used to perform a join in a task.
+If the input data is smaller than this value, the join is performed with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion for choosing the algorithm used to perform an inner join in a task.
+If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set INNER_HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion for choosing the algorithm used to perform an outer join in a task.
+If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set OUTER_HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: JOIN_HASH_TABLE_SIZE
+
+The initial size of the hash table for the in-memory hash join.
+
+ * Property value: Integer
+ * Default value: 100000
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_HASH_TABLE_SIZE 100000
+
+.. describe:: SORT_TASK_INPUT_SIZE
+
+The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of parallelism of the sort query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set SORT_TASK_INPUT_SIZE 64
+
+.. describe:: EXTSORT_BUFFER_SIZE
+
+A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 200
+ * Example
+
+.. code-block:: sh
+
+ \set EXTSORT_BUFFER_SIZE 200
+
+.. describe:: SORT_LIST_SIZE
+
+The initial size of the list for the in-memory sort.
+
+ * Property value: Integer
+ * Default value: 100000
+ * Example
+
+.. code-block:: sh
+
+ \set SORT_LIST_SIZE 100000
+
+.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
+
+A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, the 3-phase aggregation algorithm is used.
+Otherwise, the 2-phase aggregation algorithm is used.
+
+ * Property value: Boolean
+ * Default value: true
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_MULTI_LEVEL_ENABLED true
+
+.. describe:: GROUPBY_PER_SHUFFLE_SIZE
+
+The aggregation is executed in two stages. When an aggregation query is executed,
+this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 256
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_PER_SHUFFLE_SIZE 256
+
+.. describe:: GROUPBY_TASK_INPUT_SIZE
+
+The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of parallelism of the aggregation query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_TASK_INPUT_SIZE 64
+
+.. describe:: HASH_GROUPBY_SIZE_LIMIT
+
+This value provides the criterion for choosing the algorithm used to perform an aggregation in a task.
+If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
+Otherwise, the sort-based aggregation is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set HASH_GROUPBY_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: AGG_HASH_TABLE_SIZE
+
+The initial size of list for in-memory sort.
--- End diff --
My mistake. Thanks.
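
For readers of the diff above, here is a rough Python sketch of how two of the documented variables could be interpreted. It is only an illustration, not Tajo's actual planner code: the function names are hypothetical, and the ceiling-division rule for the task count is an assumption drawn from the statement that the task input size "determines the degree of the parallel processing". The hash-join rule follows the documented text (input smaller than the limit uses the in-memory hash join, otherwise the sort-merge join).

```python
import math

MB = 1024 * 1024


def num_second_stage_tasks(total_input_bytes: int, task_input_size_mb: int = 64) -> int:
    """Estimate the second-stage task count (hypothetical helper).

    Assumption: each task processes at most task_input_size_mb of input,
    so the count is the ceiling of total / per-task size, with a minimum of 1.
    """
    per_task_bytes = task_input_size_mb * MB
    return max(1, math.ceil(total_input_bytes / per_task_bytes))


def choose_join_algorithm(input_bytes: int, hash_join_size_limit_mb: int = 64) -> str:
    """Pick the per-task join algorithm as the documentation describes it.

    Input smaller than the limit -> in-memory hash join;
    otherwise -> sort-merge join.
    """
    if input_bytes < hash_join_size_limit_mb * MB:
        return "hash"
    return "sort-merge"


if __name__ == "__main__":
    # 1 GB of input with the default JOIN_TASK_INPUT_SIZE of 64 MB.
    print(num_second_stage_tasks(1024 * MB))
    # 10 MB of input with the default HASH_JOIN_SIZE_LIMIT of 64 MB.
    print(choose_join_algorithm(10 * MB))
```

Under this reading, lowering JOIN_TASK_INPUT_SIZE would raise the number of second-stage tasks, and raising HASH_JOIN_SIZE_LIMIT would send more tasks to the hash join, at the heap cost the warnings describe.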
> Add description for session variables
> -------------------------------------
>
> Key: TAJO-1962
> URL: https://issues.apache.org/jira/browse/TAJO-1962
> Project: Tajo
> Issue Type: Task
> Components: Documentation
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.12.0, 0.11.1
>
>
> Our document (http://tajo.apache.org/docs/devel/tsql/variables.html) only
> shows the list of session variables. It would be very helpful if we added some
> descriptions.