[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 

  was:
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name
REWRITE DATA [USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, the OPTIMIZE command is not limited to compaction; it also supports other 
> table maintenance operations.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA

  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 

  was:
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future options support --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, the OPTIMIZE command is not limited to compaction; it also supports other 
> table maintenance operations.
>  





[jira] [Updated] (HIVE-27980) Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax

2024-05-24 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27980:
--
Description: 
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future options support --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 

  was:
Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
{code:java}
ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
Add support for OPTIMIZE TABLE syntax. Example:
{code:java}
OPTIMIZE TABLE name REWRITE DATA
  future --- 
[USING BIN_PACK]
[ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
WHERE category = 'c1' {code}
This syntax will be in line with Impala.

Also, the OPTIMIZE command is not limited to compaction; it also supports other 
table maintenance operations.

 


> Hive Iceberg Compaction: add support for OPTIMIZE TABLE syntax
> --
>
> Key: HIVE-27980
> URL: https://issues.apache.org/jira/browse/HIVE-27980
> Project: Hive
>  Issue Type: New Feature
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Presently Hive Iceberg supports Major compaction using HIVE ACID syntax below.
> {code:java}
> ALTER TABLE name COMPACT MAJOR [AND WAIT] {code}
> Add support for OPTIMIZE TABLE syntax. Example:
> {code:java}
> OPTIMIZE TABLE name REWRITE DATA
>   future options support --- 
> [USING BIN_PACK]
> [ ( { FILE_SIZE_THRESHOLD | MIN_INPUT_FILES } =  [, ... ] ) ]
> WHERE category = 'c1' {code}
> This syntax will be in line with Impala.
> Also, the OPTIMIZE command is not limited to compaction; it also supports other 
> table maintenance operations.
>  





[jira] [Commented] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-05-24 Thread Sungwoo Park (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849421#comment-17849421
 ] 

Sungwoo Park commented on HIVE-26018:
-

Currently, uniquejoin.q passes because it uses the MapReduce execution engine. 
If the Tez execution engine is used, uniquejoin.q fails for the same reason 
described in this JIRA.

The difference in the outcome is due to the different representations of empty 
rows in MapReduce and Tez. If there is no row for the given key:

1. MapReduce's JoinOperator: the storage is empty.
2. Tez's MapJoinOperator/CommonMergeJoinOperator: the storage contains a dummy 
row.
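
To illustrate how these two representations could produce different results, here is a toy Python model. It is not Hive code: the function names, the `use_dummy_row` flag, and the idea that the join skips a key when a storage is empty are all assumptions made for illustration. It models only the UNIQUEJOIN without PRESERVE from the issue description.

```python
# Toy model only. Nothing here is taken from Hive's operator code;
# all names and the emptiness check are hypothetical.
DUMMY_ROW = None  # stand-in for the placeholder row described for Tez


def storage_for(rows, key, use_dummy_row):
    """Return the per-key row storage for one join input."""
    matches = [r for r in rows if r == key]
    if not matches and use_dummy_row:
        return [DUMMY_ROW]  # "Tez-like": storage holds one dummy row
    return matches          # "MapReduce-like": storage stays empty


def unique_join(left, right, use_dummy_row):
    """Join two key lists; a key is emitted only if no storage is empty."""
    out = []
    for key in sorted(set(left) | set(right)):
        l_store = storage_for(left, key, use_dummy_row)
        r_store = storage_for(right, key, use_dummy_row)
        if not l_store or not r_store:  # emptiness check on the storage
            continue
        out.append((l_store[0], r_store[0]))
    return out


t1 = ["aaa", "bbb", "ccc"]
t2 = ["aaa", "ddd", "ccc"]
print(unique_join(t1, t2, use_dummy_row=False))
print(unique_join(t1, t2, use_dummy_row=True))
```

In this sketch, the empty-storage variant drops the keys that are missing on one side, while the dummy-row variant never sees an empty storage and emits those keys padded with `None`.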

Does anyone still use UNIQUEJOIN in production? This is a correctness issue, so 
we would like to investigate further if UNIQUEJOIN is still used.

cc. [~seonggon]

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Priority: Major
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct. For example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE  
> T2_n1x b (b.key);
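> Hive on Tez result: wrong
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> +-------+-------+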
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
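> Hive on Tez result: wrong
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> +-------+-------+
> Hive on MR result: right
> +-------+-------+
> | a.key | b.key |
> +-------+-------+
> | aaa   | aaa   |
> | ccc   | ccc   |
> +-------+-------+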
>  
>  


