[
https://issues.apache.org/jira/browse/TAJO-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993143#comment-14993143
]
ASF GitHub Bot commented on TAJO-1962:
--------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/848#discussion_r44104763
--- Diff: tajo-docs/src/main/sphinx/tsql/variables.rst ---
@@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique
session, and the client an
Also, ``\unset key`` will unset the session variable named *key*.
-Now, tajo provides the following session variables.
-
-* ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
-* ``DIST_QUERY_JOIN_TASK_VOLUME``
-* ``DIST_QUERY_SORT_TASK_VOLUME``
-* ``DIST_QUERY_GROUPBY_TASK_VOLUME``
-* ``DIST_QUERY_JOIN_PARTITION_VOLUME``
-* ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
-* ``DIST_QUERY_TABLE_PARTITION_VOLUME``
-* ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
-* ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
-* ``MAX_OUTPUT_FILE_SIZE``
-* ``CODEGEN``
-* ``CLIENT_SESSION_EXPIRY_TIME``
-* ``CLI_MAX_COLUMN``
-* ``CLI_NULL_CHAR``
-* ``CLI_PRINT_PAUSE_NUM_RECORDS``
-* ``CLI_PRINT_PAUSE``
-* ``CLI_PRINT_ERROR_TRACE``
-* ``CLI_OUTPUT_FORMATTER_CLASS``
-* ``CLI_ERROR_STOP``
-* ``TIMEZONE``
-* ``DATE_ORDER``
-* ``TEXT_NULL``
-* ``DEBUG_ENABLED``
-* ``BEHAVIOR_ARITHMETIC_ABORT``
-* ``RESULT_SET_FETCH_ROWNUM``
+Currently, Tajo provides the following session variables.
+
+.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
+
+A threshold for non-cross joins. When a non-cross join query is executed
with the broadcast join, the total size of the broadcast tables must not exceed this
threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 5120
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
+
+.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
+
+A threshold for cross joins. When a cross join query is executed, the
total size of the broadcast tables must not exceed this threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 1024
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
+
+.. warning::
+ In Tajo, the broadcast join is the only way to perform cross joins.
Since the cross join is a very expensive operation, this value needs to be tuned
carefully.
+
+.. describe:: JOIN_TASK_INPUT_SIZE
+
+The repartition join is executed in two stages. When a join query is
executed with the repartition join, this value indicates the amount of input
data processed by each task at the second stage.
+As a result, it determines the degree of the parallel processing of the
join query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_TASK_INPUT_SIZE 64
+
+.. describe:: JOIN_PER_SHUFFLE_SIZE
+
+The repartition join is executed in two stages. When a join query is
executed with the repartition join,
+this value indicates the output size of each task at the first stage,
which determines the number of partitions to be shuffled between two stages.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 128
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_PER_SHUFFLE_SIZE 128
+
+.. describe:: HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion to decide the algorithm to perform a
join in a task.
+If the input data is smaller than this value, join is performed with the
in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the
input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which
means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion to decide the algorithm to perform an
inner join in a task.
+If the input data is smaller than this value, the inner join is performed
with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set INNER_HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the
input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which
means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
+
+This value provides the criterion to decide the algorithm to perform an
outer join in a task.
+If the input data is smaller than this value, the outer join is performed
with the in-memory hash join.
+Otherwise, the sort-merge join is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set OUTER_HASH_JOIN_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the
input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which
means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: JOIN_HASH_TABLE_SIZE
+
+The initial size of the hash table for the in-memory hash join.
+
+ * Property value: Integer
+ * Default value: 100000
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_HASH_TABLE_SIZE 100000
+
+.. describe:: SORT_TASK_INPUT_SIZE
+
+The sort operation is executed in two stages. When a sort query is
executed, this value indicates the amount of input data processed by each task
at the second stage.
+As a result, it determines the degree of the parallel processing of the
sort query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set SORT_TASK_INPUT_SIZE 64
+
+.. describe:: EXTSORT_BUFFER_SIZE
+
+A threshold to choose the sort algorithm. If the input data is larger than
this threshold, the external sort algorithm is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 200
+ * Example
+
+.. code-block:: sh
+
+ \set EXTSORT_BUFFER_SIZE 200
+
+.. describe:: SORT_LIST_SIZE
+
+The initial size of the list for the in-memory sort.
+
+ * Property value: Integer
+ * Default value: 100000
+ * Example
+
+.. code-block:: sh
+
+ \set SORT_LIST_SIZE 100000
+
+.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
+
+A flag to enable the multi-level algorithm for distinct aggregation. If
this value is true, the 3-phase aggregation algorithm is used.
+Otherwise, the 2-phase aggregation algorithm is used.
+
+ * Property value: Boolean
+ * Default value: true
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_MULTI_LEVEL_ENABLED true
+
+.. describe:: GROUPBY_PER_SHUFFLE_SIZE
+
+The aggregation is executed in two stages. When an aggregation query is
executed,
+this value indicates the output size of each task at the first stage,
which determines the number of partitions to be shuffled between two stages.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 256
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_PER_SHUFFLE_SIZE 256
+
+.. describe:: GROUPBY_TASK_INPUT_SIZE
+
+The aggregation operation is executed in two stages. When an aggregation
query is executed, this value indicates the amount of input data processed by
each task at the second stage.
+As a result, it determines the degree of the parallel processing of the
aggregation query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set GROUPBY_TASK_INPUT_SIZE 64
+
+.. describe:: HASH_GROUPBY_SIZE_LIMIT
+
+This value provides the criterion to decide the algorithm to perform an
aggregation in a task.
+If the input data is smaller than this value, the aggregation is performed
with the in-memory hash aggregation.
+Otherwise, the sort-based aggregation is used.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set HASH_GROUPBY_SIZE_LIMIT 64
+
+.. warning::
+ This value is the size of the input stored on file systems. So, when the
input data is loaded into the JVM heap,
+ its actual size is usually much larger than the configured value, which
means that a threshold that is too large can cause unexpected OutOfMemory errors.
+ This value should be tuned carefully.
+
+.. describe:: AGG_HASH_TABLE_SIZE
+
+The initial size of the hash table for in-memory aggregation.
+
+ * Property value: Integer
+ * Default value: 10000
+ * Example
+
+.. code-block:: sh
+
+ \set AGG_HASH_TABLE_SIZE 10000
+
+.. describe:: TIMEZONE
+
+Refer to :doc:`/time_zone`.
+
+ * Property value: Time zone id
+ * Default value: Default time zone of JVM
+ * Example
+
+.. code-block:: sh
+
+ \set TIMEZONE GMT+9
+
+.. describe:: DATE_ORDER
+
+Date order specification.
+
+ * Property value: One of YMD, DMY, MDY.
+ * Default value: YMD
+ * Example
+
+.. code-block:: sh
+
+ \set DATE_ORDER YMD
+
+.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
+
+If this value is true, a partitioned table is overwritten even if a
subquery leads to no result. Otherwise, the table data will be kept if there is
no result.
+
+ * Property value: Boolean
+ * Default value: false
+ * Example
+
+.. code-block:: sh
+
+ \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
+
+.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
+
+In Tajo, storing a partitioned table is executed in two stages.
+This value indicates the output size of each task at the first stage, which
determines the number of partitions to be shuffled between two stages.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 256
+ * Example
+
+.. code-block:: sh
+
+ \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
+
+.. describe:: ARITHABORT
+
+A flag to indicate how to handle the errors caused by invalid arithmetic
operations. If true, a running query will be terminated when an overflow or a
divide-by-zero occurs.
+
+ * Property value: Boolean
+ * Default value: false
+ * Example
+
+.. code-block:: sh
+
+ \set ARITHABORT false
+
+.. describe:: MAX_OUTPUT_FILE_SIZE
+
+Maximum per-output file size. 0 means infinite.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 0
+ * Example
+
+.. code-block:: sh
+
+ \set MAX_OUTPUT_FILE_SIZE 0
+
+.. describe:: SESSION_EXPIRY_TIME
+
+Session expiry time.
+
+ * Property value: Integer
+ * Unit: seconds
+ * Default value: 3600
+ * Example
+
+.. code-block:: sh
+
+ \set SESSION_EXPIRY_TIME 3600
+
+.. describe:: CLI_COLUMNS
+
+Sets the width for the wrapped format.
+
+ * Property value: Integer
+ * Default value: 120
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_COLUMNS 120
+
+.. describe:: CLI_NULL_CHAR
+
+Sets the string to be printed in place of a null value.
+
+ * Property value: String
+ * Default value: ''
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_NULL_CHAR ''
+
+.. describe:: CLI_PAGE_ROWS
+
+Sets the number of rows for paging.
+
+ * Property value: Integer
+ * Default value: 100
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_PAGE_ROWS 100
+
+.. describe:: CLI_PAGING_ENABLED
+
+Enable paging of result display.
+
+ * Property value: Boolean
+ * Default value: true
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_PAGING_ENABLED true
+
+.. describe:: CLI_DISPLAY_ERROR_TRACE
+
+Enable display of error trace.
+
+ * Property value: Boolean
+ * Default value: true
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_DISPLAY_ERROR_TRACE true
+
+.. describe:: CLI_FORMATTER_CLASS
+
+Sets the output format class to display results.
+
+ * Property value: Class name
+ * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
+ * Example
+
+.. code-block:: sh
+
+ \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
+
+.. describe:: ON_ERROR_STOP
+
+tsql will exit if an error occurs.
+
+ * Property value: Boolean
+ * Default value: false
+ * Example
+
+.. code-block:: sh
+
+ \set ON_ERROR_STOP false
+
+.. describe:: NULL_CHAR
+
+Null char of text file output.
+
+ * Property value: String
+ * Default value: '\\N'
+ * Example
+
+.. code-block:: sh
+
+ \set NULL_CHAR '\\N'
+
+.. describe:: DEBUG_ENABLED
+
+Debug mode enabled.
--- End diff --
Thanks for the comment. I fixed it.
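
As a rough illustration of how the ``*_TASK_INPUT_SIZE`` variables in the diff relate to parallelism, the sketch below estimates the number of second-stage tasks from the total input size. It is only a sketch under stated assumptions: the function name and the ceiling-division formula are hypothetical and are not taken from Tajo's planner; only the 64 MB default comes from the documented text.

```python
import math


def estimated_task_count(total_input_mb: float, task_input_size_mb: int = 64) -> int:
    """Estimate the number of second-stage tasks.

    Assumption (not Tajo's actual planner logic): each task processes at
    most `task_input_size_mb` of input, so the task count is the ceiling
    of the total input divided by that size, with a minimum of one task.
    """
    if task_input_size_mb <= 0:
        raise ValueError("task_input_size_mb must be positive")
    return max(1, math.ceil(total_input_mb / task_input_size_mb))


# With the documented default of 64 MB, 1024 MB of input gives 16 tasks.
print(estimated_task_count(1024))
# Raising the value to 128 MB halves the estimated parallelism.
print(estimated_task_count(1024, 128))
```

Under this assumption, a smaller ``JOIN_TASK_INPUT_SIZE``, ``SORT_TASK_INPUT_SIZE``, or ``GROUPBY_TASK_INPUT_SIZE`` yields more tasks, each handling less data.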
> Add description for session variables
> -------------------------------------
>
> Key: TAJO-1962
> URL: https://issues.apache.org/jira/browse/TAJO-1962
> Project: Tajo
> Issue Type: Task
> Components: Documentation
> Reporter: Jihoon Son
> Assignee: Jihoon Son
> Fix For: 0.12.0, 0.11.1
>
>
> Our document (http://tajo.apache.org/docs/devel/tsql/variables.html) only
> shows the list of session variables. It would be much more helpful if we added some
> descriptions.