GitHub user hyunsik commented on a diff in the pull request:
https://github.com/apache/tajo/pull/848#discussion_r44102768
--- Diff: tajo-docs/src/main/sphinx/tsql/variables.rst ---
@@ -28,35 +30,456 @@ Each client connection to TajoMaster creates a unique
session, and the client an
Also, ``\unset key`` will unset the session variable named *key*.
-Now, tajo provides the following session variables.
-
-* ``DIST_QUERY_BROADCAST_JOIN_THRESHOLD``
-* ``DIST_QUERY_JOIN_TASK_VOLUME``
-* ``DIST_QUERY_SORT_TASK_VOLUME``
-* ``DIST_QUERY_GROUPBY_TASK_VOLUME``
-* ``DIST_QUERY_JOIN_PARTITION_VOLUME``
-* ``DIST_QUERY_GROUPBY_PARTITION_VOLUME``
-* ``DIST_QUERY_TABLE_PARTITION_VOLUME``
-* ``EXECUTOR_EXTERNAL_SORT_BUFFER_SIZE``
-* ``EXECUTOR_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_INNER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_OUTER_HASH_JOIN_SIZE_THRESHOLD``
-* ``EXECUTOR_GROUPBY_INMEMORY_HASH_THRESHOLD``
-* ``MAX_OUTPUT_FILE_SIZE``
-* ``CODEGEN``
-* ``CLIENT_SESSION_EXPIRY_TIME``
-* ``CLI_MAX_COLUMN``
-* ``CLI_NULL_CHAR``
-* ``CLI_PRINT_PAUSE_NUM_RECORDS``
-* ``CLI_PRINT_PAUSE``
-* ``CLI_PRINT_ERROR_TRACE``
-* ``CLI_OUTPUT_FORMATTER_CLASS``
-* ``CLI_ERROR_STOP``
-* ``TIMEZONE``
-* ``DATE_ORDER``
-* ``TEXT_NULL``
-* ``DEBUG_ENABLED``
-* ``BEHAVIOR_ARITHMETIC_ABORT``
-* ``RESULT_SET_FETCH_ROWNUM``
+Currently, tajo provides the following session variables.
+
+.. describe:: BROADCAST_NON_CROSS_JOIN_THRESHOLD
+
+A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 5120
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120
+
+.. describe:: BROADCAST_CROSS_JOIN_THRESHOLD
+
+A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold.
+
+ * Property value: Integer
+ * Unit: KB
+ * Default value: 1024
+ * Example
+
+.. code-block:: sh
+
+ \set BROADCAST_CROSS_JOIN_THRESHOLD 1024
+
+.. warning::
+ In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully.
+
+.. describe:: JOIN_TASK_INPUT_SIZE
+
+The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage.
+As a result, it determines the degree of the parallel processing of the join query.
+
+ * Property value: Integer
+ * Unit: MB
+ * Default value: 64
+ * Example
+
+.. code-block:: sh
+
+ \set JOIN_TASK_INPUT_SIZE 64
+
+.. describe:: JOIN_PER_SHUFFLE_SIZE
--- End diff --
Actually, the join is processed in the second stage; the first stage is
for the shuffle. This parameter only determines the input size of the
second stage, so I think this name is appropriate.
In addition, this parameter plays a role similar to the map (reduce) task
size, so it should be familiar to those who have Hadoop experience.
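
To illustrate why this parameter drives the degree of parallelism, here is a
rough sketch. It assumes the second stage simply splits the shuffled data into
tasks of at most JOIN_TASK_INPUT_SIZE MB each; that is a simplification for
illustration, not a description of Tajo's actual scheduler. The function name
and the 10 GB input figure are hypothetical.

```python
import math


def estimate_join_tasks(shuffled_input_mb: float,
                        join_task_input_size_mb: int = 64) -> int:
    """Estimate the number of second-stage join tasks.

    Assumption (not taken from Tajo's source): each task reads at most
    join_task_input_size_mb of shuffled data, and at least one task runs.
    """
    if join_task_input_size_mb <= 0:
        raise ValueError("join_task_input_size_mb must be positive")
    return max(1, math.ceil(shuffled_input_mb / join_task_input_size_mb))


# Hypothetical 10 GB of shuffled input with the default of 64 MB per task.
print(estimate_join_tasks(10240))       # 160
# Doubling the per-task input size halves the estimated task count.
print(estimate_join_tasks(10240, 128))  # 80
```

Under this assumption, a smaller value yields more second-stage tasks, in
the same way that a smaller split size yields more map tasks in Hadoop.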