[GitHub] tajo pull request: TAJO-1962: Add description for session variable...

jihoonson Thu, 05 Nov 2015 21:12:06 -0800

Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/848#discussion_r44104763
  
    --- Diff: tajo-docs/src/main/sphinx/tsql/variables.rst ---
    @@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique session, and the client an
     Also, ``\unset key`` will unset the session variable named *key*.
     
     
    -Now, tajo provides the following session variables.
    -
    -* ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
    -* ``DIST_QUERY_JOIN_TASK_VOLUME``
    -* ``DIST_QUERY_SORT_TASK_VOLUME``
    -* ``DIST_QUERY_GROUPBY_TASK_VOLUME``
    -* ``DIST_QUERY_JOIN_PARTITION_VOLUME``
    -* ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
    -* ``DIST_QUERY_TABLE_PARTITION_VOLUME``
    -* ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
    -* ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
    -* ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
    -* ``MAX_OUTPUT_FILE_SIZE``
    -* ``CODEGEN``
    -* ``CLIENT_SESSION_EXPIRY_TIME``
    -* ``CLI_MAX_COLUMN``
    -* ``CLI_NULL_CHAR``
    -* ``CLI_PRINT_PAUSE_NUM_RECORDS``
    -* ``CLI_PRINT_PAUSE``
    -* ``CLI_PRINT_ERROR_TRACE``
    -* ``CLI_OUTPUT_FORMATTER_CLASS``
    -* ``CLI_ERROR_STOP``
    -* ``TIMEZONE``
    -* ``DATE_ORDER``
    -* ``TEXT_NULL``
    -* ``DEBUG_ENABLED``
    -* ``BEHAVIOR_ARITHMETIC_ABORT``
    -* ``RESULT_SET_FETCH_ROWNUM``
    +Currently, Tajo provides the following session variables.
    +
    +.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
    +
    +A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the total size of the broadcast tables will not exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 5120
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
    +
    +.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
    +
    +A threshold for cross joins. When a cross join query is executed, the total size of the broadcast tables will not exceed this threshold.
    +
    +  * Property value: Integer
    +  * Unit: KB
    +  * Default value: 1024
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
    +
    +.. warning::
    +  In Tajo, the broadcast join is the only way to perform cross joins. Since the cross join is a very expensive operation, this value needs to be tuned carefully.
    +
    +.. describe:: JOIN_TASK_INPUT_SIZE
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the join query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_TASK_INPUT_SIZE 64
    +
    +.. describe:: JOIN_PER_SHUFFLE_SIZE
    +
    +The repartition join is executed in two stages. When a join query is executed with the repartition join,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 128
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_PER_SHUFFLE_SIZE 128
    +
    +.. describe:: HASH_JOIN_SIZE_LIMIT
    +
    +This value is the criterion for choosing the algorithm used to perform a join in a task.
    +If the input data is smaller than this value, the join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large a threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: INNER_HASH_JOIN_SIZE_LIMIT
    +
    +This value is the criterion for choosing the algorithm used to perform an inner join in a task.
    +If the input data is smaller than this value, the inner join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set INNER_HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large a threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: OUTER_HASH_JOIN_SIZE_LIMIT
    +
    +This value is the criterion for choosing the algorithm used to perform an outer join in a task.
    +If the input data is smaller than this value, the outer join is performed with the in-memory hash join.
    +Otherwise, the sort-merge join is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set OUTER_HASH_JOIN_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large a threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: JOIN_HASH_TABLE_SIZE
    +
    +The initial size of the hash table for the in-memory hash join.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set JOIN_HASH_TABLE_SIZE 100000
    +
    +.. describe:: SORT_TASK_INPUT_SIZE
    +
    +The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the sort query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set SORT_TASK_INPUT_SIZE 64
    +
    +.. describe:: EXTSORT_BUFFER_SIZE
    +
    +A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 200
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set EXTSORT_BUFFER_SIZE 200
    +
    +.. describe:: SORT_LIST_SIZE
    +
    +The initial size of the list for the in-memory sort.
    +
    +  * Property value: Integer
    +  * Default value: 100000
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set SORT_LIST_SIZE 100000
    +
    +.. describe:: GROUPBY_MULTI_LEVEL_ENABLED
    +
    +A flag to enable the multi-level algorithm for distinct aggregation. If this value is true, the 3-phase aggregation algorithm is used.
    +Otherwise, the 2-phase aggregation algorithm is used.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_MULTI_LEVEL_ENABLED true
    +
    +.. describe:: GROUPBY_PER_SHUFFLE_SIZE
    +
    +The aggregation is executed in two stages. When an aggregation query is executed,
    +this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_PER_SHUFFLE_SIZE 256
    +
    +.. describe:: GROUPBY_TASK_INPUT_SIZE
    +
    +The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage.
    +As a result, it determines the degree of parallelism of the aggregation query.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set GROUPBY_TASK_INPUT_SIZE 64
    +
    +.. describe:: HASH_GROUPBY_SIZE_LIMIT
    +
    +This value is the criterion for choosing the algorithm used to perform an aggregation in a task.
    +If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation.
    +Otherwise, the sort-based aggregation is used.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 64
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set HASH_GROUPBY_SIZE_LIMIT 64
    +
    +.. warning::
    +  This value is the size of the input stored on file systems. So, when the input data is loaded into the JVM heap,
    +  its actual size is usually much larger than the configured value, which means that too large a threshold can cause unexpected OutOfMemory errors.
    +  This value should be tuned carefully.
    +
    +.. describe:: AGG_HASH_TABLE_SIZE
    +
    +The initial size of the hash table for in-memory aggregation.
    +
    +  * Property value: Integer
    +  * Default value: 10000
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set AGG_HASH_TABLE_SIZE 10000
    +
    +.. describe:: TIMEZONE
    +
    +Refer to :doc:`/time_zone`.
    +
    +  * Property value: Time zone id
    +  * Default value: Default time zone of JVM
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set TIMEZONE GMT+9
    +
    +.. describe:: DATE_ORDER
    +
    +Date order specification.
    +
    +  * Property value: One of YMD, DMY, MDY.
    +  * Default value: YMD
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set DATE_ORDER YMD
    +
    +.. describe:: PARTITION_NO_RESULT_OVERWRITE_ENABLED
    +
    +If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result.
    +
    +  * Property value: Boolean
    +  * Default value: false
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set PARTITION_NO_RESULT_OVERWRITE_ENABLED false
    +
    +.. describe:: TABLE_PARTITION_PER_SHUFFLE_SIZE
    +
    +In Tajo, storing a partitioned table is executed in two stages.
    +This value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 256
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set TABLE_PARTITION_PER_SHUFFLE_SIZE 256
    +
    +.. describe:: ARITHABORT
    +
    +A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated when an overflow or a divide-by-zero occurs.
    +
    +  * Property value: Boolean
    +  * Default value: false
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set ARITHABORT false
    +
    +.. describe:: MAX_OUTPUT_FILE_SIZE
    +
    +Maximum per-output file size. 0 means infinite.
    +
    +  * Property value: Integer
    +  * Unit: MB
    +  * Default value: 0
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set MAX_OUTPUT_FILE_SIZE 0
    +
    +.. describe:: SESSION_EXPIRY_TIME
    +
    +Session expiry time.
    +
    +  * Property value: Integer
    +  * Unit: seconds
    +  * Default value: 3600
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set SESSION_EXPIRY_TIME 3600
    +
    +.. describe:: CLI_COLUMNS
    +
    +Sets the width for the wrapped format.
    +
    +  * Property value: Integer
    +  * Default value: 120
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_COLUMNS 120
    +
    +.. describe:: CLI_NULL_CHAR
    +
    +Sets the string to be printed in place of a null value.
    +
    +  * Property value: String
    +  * Default value: ''
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_NULL_CHAR ''
    +
    +.. describe:: CLI_PAGE_ROWS
    +
    +Sets the number of rows for paging.
    +
    +  * Property value: Integer
    +  * Default value: 100
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_PAGE_ROWS 100
    +
    +.. describe:: CLI_PAGING_ENABLED
    +
    +Enable paging of result display.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_PAGING_ENABLED true
    +
    +.. describe:: CLI_DISPLAY_ERROR_TRACE
    +
    +Enable display of error trace.
    +
    +  * Property value: Boolean
    +  * Default value: true
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_DISPLAY_ERROR_TRACE true
    +
    +.. describe:: CLI_FORMATTER_CLASS
    +
    +Sets the output format class to display results.
    +
    +  * Property value: Class name
    +  * Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
    +
    +.. describe:: ON_ERROR_STOP
    +
    +tsql will exit if an error occurs.
    +
    +  * Property value: Boolean
    +  * Default value: false
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set ON_ERROR_STOP false
    +
    +.. describe:: NULL_CHAR
    +
    +The string that represents a null value in text file output.
    +
    +  * Property value: String
    +  * Default value: '\\N'
    +  * Example
    +
    +.. code-block:: sh
    +
    +  \set NULL_CHAR '\\N'
    +
    +.. describe:: DEBUG_ENABLED
    +
    +Debug mode enabled.
    --- End diff --
    
    Thanks for the comment. I fixed it.
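
For readers of the diff above, here is a rough sketch of how the size-based variables might be interpreted. It is not taken from the Tajo code base: the function names are hypothetical, and the "one task per configured chunk of input" model is an assumption based only on the wording of the ``JOIN_TASK_INPUT_SIZE`` and ``HASH_JOIN_SIZE_LIMIT`` descriptions.

```python
import math


def estimate_task_count(input_mb: float, task_input_size_mb: int = 64) -> int:
    """Estimate how many second-stage tasks a query would be split into.

    Assumption (not confirmed by the source): each task processes at most
    task_input_size_mb of input, as the JOIN_TASK_INPUT_SIZE text suggests.
    """
    if task_input_size_mb <= 0:
        raise ValueError("task_input_size_mb must be positive")
    # At least one task is needed even for very small inputs.
    return max(1, math.ceil(input_mb / task_input_size_mb))


def choose_join_algorithm(input_mb: float, hash_join_size_limit_mb: int = 64) -> str:
    """Pick a join algorithm the way the HASH_JOIN_SIZE_LIMIT text describes.

    Input smaller than the limit uses the in-memory hash join; anything
    else falls back to the sort-merge join.
    """
    if input_mb < hash_join_size_limit_mb:
        return "in-memory hash join"
    return "sort-merge join"


if __name__ == "__main__":
    # 1 GB of second-stage input with the default 64 MB per task.
    print(estimate_task_count(1024))
    # 32 MB of input compared against the default 64 MB limit.
    print(choose_join_algorithm(32))
```

Under this model, lowering the per-task input size raises the number of tasks, which is why the descriptions say the value determines the degree of parallelism. The on-disk size compared against the limit can be much smaller than the in-heap size, as the warnings in the diff point out.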



