[jira] [Resolved] (HIVE-28353) Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP type

2024-06-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-28353.
-
Fix Version/s: 4.1.0
 Assignee: Ayush Saxena
   Resolution: Fixed

> Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP 
> type
> ---
>
> Key: HIVE-28353
> URL: https://issues.apache.org/jira/browse/HIVE-28353
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.1.0-must
> Fix For: 4.1.0
>
>
> If the main table has a column of type TIMESTAMP, reading the *FILES Metadata 
> table fails
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to 
> java.time.LocalDateTime
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:98)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:537)
> at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
> ... 55 more
> {noformat}
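
The stack trace shows a java.time.OffsetDateTime value being handed to code that 
expects java.time.LocalDateTime. The patch itself is not included in this thread; 
the snippet below is only a minimal, standalone illustration of the failing cast 
and of the kind of type-aware conversion that avoids it (the names here are 
illustrative, not the actual Hive fix).

{code:java}
import java.time.LocalDateTime;
import java.time.OffsetDateTime;

public class TimestampCastDemo {
  public static void main(String[] args) {
    // Stands in for the value produced by the metadata reader in the failing case.
    Object value = OffsetDateTime.now();

    // The pattern behind the reported error:
    // LocalDateTime broken = (LocalDateTime) value;   // ClassCastException

    // A type-aware conversion avoids the blind cast.
    LocalDateTime local = (value instanceof OffsetDateTime)
        ? ((OffsetDateTime) value).toLocalDateTime()
        : (LocalDateTime) value;
    System.out.println(local);
  }
}
{code}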



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28353) Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP type

2024-06-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860927#comment-17860927
 ] 

Ayush Saxena commented on HIVE-28353:
-

Committed to master.
Thanx [~simhadri-g] for the review!!!

> Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP 
> type
> ---
>
> Key: HIVE-28353
> URL: https://issues.apache.org/jira/browse/HIVE-28353
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>  Labels: hive-4.1.0-must
>
> If the main table has a column of type TIMESTAMP, reading the *FILES Metadata 
> table fails
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to 
> java.time.LocalDateTime
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:98)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:537)
> at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
> ... 55 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28353) Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP type

2024-06-28 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-28353:

Labels: hive-4.1.0-must  (was: )

> Iceberg: Reading *Files Metadata table files if the column is of TIMESTAMP 
> type
> ---
>
> Key: HIVE-28353
> URL: https://issues.apache.org/jira/browse/HIVE-28353
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>  Labels: hive-4.1.0-must
>
> If the main table has a column of type TIMESTAMP, reading the *FILES Metadata 
> table fails
> {noformat}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.ClassCastException: java.time.OffsetDateTime cannot be cast to 
> java.time.LocalDateTime
> at 
> org.apache.hadoop.hive.ql.exec.ListSinkOperator.process(ListSinkOperator.java:98)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:173)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:537)
> at 
> org.apache.hadoop.hive.ql.exec.FetchTask.executeInner(FetchTask.java:194)
> ... 55 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-06-28 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860919#comment-17860919
 ] 

Zhihua Deng commented on HIVE-28338:


Fix has been merged. Thank you for the contribution, [~wechar]!

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for the embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only used for the remote metastore, so we do not need 
> to decrease it when the transport is null.
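
A minimal sketch of the pattern described in the root cause above, using a 
simplified stand-in class (this is not the actual HiveMetaStoreClient code, and 
the field and method names are illustrative): the shared counter is only touched 
when a remote transport actually exists.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the client; not the real HiveMetaStoreClient.
class ConnectionCountSketch {
  private static final AtomicInteger connCount = new AtomicInteger();
  private final Object transport;   // stands in for the Thrift transport; null when embedded

  ConnectionCountSketch(Object transport) {
    this.transport = transport;
    if (transport != null) {
      connCount.incrementAndGet();   // only remote connections are counted
    }
  }

  void close() {
    // Decrementing unconditionally is what drives the counter negative for the
    // embedded metastore; guarding on the transport keeps the count consistent.
    if (transport != null) {
      connCount.decrementAndGet();
    }
  }
}
{code}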



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28338) Client connection count is not correct in HiveMetaStore#close

2024-06-28 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-28338.

Fix Version/s: 4.1.0
   Resolution: Fixed

> Client connection count is not correct in HiveMetaStore#close
> -
>
> Key: HIVE-28338
> URL: https://issues.apache.org/jira/browse/HIVE-28338
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-24349 introduced a bug in {{HiveMetaStoreClient}} for the embedded 
> metastore, where the log would print negative connection counts.
> *Root Cause*
> The connection count is only used for the remote metastore, so we do not need 
> to decrease it when the transport is null.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2

2024-06-28 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860827#comment-17860827
 ] 

Stamatis Zampetakis commented on HIVE-28339:


The bulk of stateful content that is maintained by Jenkins is located under the 
"/var/jenkins_home" directory.
 
{noformat}
kubectl exec jenkins-6858ddb664-sg6nl df
Filesystem 1K-blocks  Used Available Use% Mounted on
overlay 98831908   4612016  94203508   5% /
tmpfs  65536 0 65536   0% /dev
tmpfs6645236 0   6645236   0% /sys/fs/cgroup
/dev/sdb   308521792 279898320  28607088  91% /var/jenkins_home
/dev/sda1   98831908   4612016  94203508   5% /etc/hosts
shm65536 0 65536   0% /dev/shm
tmpfs   1080112812  10801116   1% 
/run/secrets/kubernetes.io/serviceaccount
tmpfs6645236 0   6645236   0% /proc/acpi
tmpfs6645236 0   6645236   0% /proc/scsi
tmpfs6645236 0   6645236   0% /sys/firmware
{noformat}

As expected, the persistent volume used by the Jenkins pod is mounted at the 
"/var/jenkins_home" directory (see kubectl describe 
pod/jenkins-6858ddb664-sg6nl).

For testing purposes we need to obtain a backup of the jenkins_home directory 
and try to mount it to the new (upgraded) Jenkins image to ensure that 
everything will work smoothly.

Currently, the jenkins_home directory is 280GB, which makes a complete local 
backup and testing impractical. The majority of the disk space is occupied by the 
"jobs" directory, in particular by the archives, test results, and log files that 
are kept for each build. These files are kept for archiving and diagnosability 
purposes, for when users want to consult the results of a build. However, they are 
not indispensable for the correct functioning of the Jenkins instance, so for the 
sake of our experiments we can exclude them from the backup. The command that was 
used to create the backup is given below.

{code:bash}
kubectl exec jenkins-6858ddb664-sg6nl -- tar cf - --exclude=junitResult.xml 
--exclude=*log* --exclude=archive --exclude=workflow --exclude=*git/objects* 
/var/jenkins_home > jenkins_home_backup.tar
{code}
The command took ~5 minutes to run and created an archive of 1.2GB. The 
exclusions refer to voluminous files that are nonessential for testing 
the upgrade.

I am now in the process of testing the new Jenkins image locally by mounting 
the directory unpacked from jenkins_home_backup.tar to the /var/jenkins_home 
directory of the container.

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
>  Issue Type: Task
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is 
> currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] 
> which was released in 2022.
> The latest stable version at the moment is 
> [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many 
> improvements, bug fixes, and CVE fixes.
> The Dockerfile that is used to build the Jenkins image can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here:
> [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28357) TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO is flaky

2024-06-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28357:

Description: 
https://ci.hive.apache.org/job/hive-flaky-check/843/

{code}
13:24:19  [INFO] ---
13:24:19  [INFO]  T E S T S
13:24:19  [INFO] ---
13:24:19  [INFO] Running 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 43.212 s <<< FAILURE! - in 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
  Time elapsed: 1.45 s  <<< FAILURE!
13:25:06  org.junit.ComparisonFailure: Location does not match 
expected:<...table/state=CA/city=[SanFrancisc]o> but 
was:<...table/state=CA/city=[PaloAlt]o>
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.verifyLocations(TestGetPartitionsUsingProjectionAndFilterSpecs.java:870)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.getPartitionsWithVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:781)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.runGetPartitionsUsingVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:792)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO(TestGetPartitionsUsingProjectionAndFilterSpecs.java:656)
13:25:06  
{code}

https://github.com/apache/hive/blob/7f6367e0c6e21b11ef62da1ea6681a54d547de07/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestGetPartitionsUsingProjectionAndFilterSpecs.java#L870

  was:
https://ci.hive.apache.org/job/hive-flaky-check/843/

{code}
13:24:19  [INFO] ---
13:24:19  [INFO]  T E S T S
13:24:19  [INFO] ---
13:24:19  [INFO] Running 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 43.212 s <<< FAILURE! - in 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
  Time elapsed: 1.45 s  <<< FAILURE!
13:25:06  org.junit.ComparisonFailure: Location does not match 
expected:<...table/state=CA/city=[SanFrancisc]o> but 
was:<...table/state=CA/city=[PaloAlt]o>
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.verifyLocations(TestGetPartitionsUsingProjectionAndFilterSpecs.java:870)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.getPartitionsWithVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:781)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.runGetPartitionsUsingVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:792)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO(TestGetPartitionsUsingProjectionAndFilterSpecs.java:656)
13:25:06  
{code}


> TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
>  is flaky
> ---
>
> Key: HIVE-28357
> URL: https://issues.apache.org/jira/browse/HIVE-28357
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
>
> https://ci.hive.apache.org/job/hive-flaky-check/843/
> {code}
> 13:24:19  [INFO] ---
> 13:24:19  [INFO]  T E S T S
> 13:24:19  [INFO] ---
> 13:24:19  [INFO] Running 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
> 13:25:06  [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 43.212 s <<< FAILURE! - in 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
> 13:25:06  [ERROR] 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
>   Time elapsed: 1.45 s  <<< FAILURE!
> 13:25:06  org.junit.ComparisonFailure: Location does not match 
> expected:<...table/state=CA/city=[SanFrancisc]o> but 
> was:<...table/state=CA/city=[PaloAlt]o>
> 13:25:06  at 
> o

[jira] [Created] (HIVE-28357) TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO is flaky

2024-06-28 Thread Jira
László Bodor created HIVE-28357:
---

 Summary: 
TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
 is flaky
 Key: HIVE-28357
 URL: https://issues.apache.org/jira/browse/HIVE-28357
 Project: Hive
  Issue Type: Bug
Reporter: László Bodor






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28357) TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO is flaky

2024-06-28 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28357:

Description: 
https://ci.hive.apache.org/job/hive-flaky-check/843/

{code}
13:24:19  [INFO] ---
13:24:19  [INFO]  T E S T S
13:24:19  [INFO] ---
13:24:19  [INFO] Running 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 43.212 s <<< FAILURE! - in 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
13:25:06  [ERROR] 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
  Time elapsed: 1.45 s  <<< FAILURE!
13:25:06  org.junit.ComparisonFailure: Location does not match 
expected:<...table/state=CA/city=[SanFrancisc]o> but 
was:<...table/state=CA/city=[PaloAlt]o>
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.verifyLocations(TestGetPartitionsUsingProjectionAndFilterSpecs.java:870)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.getPartitionsWithVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:781)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.runGetPartitionsUsingVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:792)
13:25:06at 
org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO(TestGetPartitionsUsingProjectionAndFilterSpecs.java:656)
13:25:06  
{code}

> TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
>  is flaky
> ---
>
> Key: HIVE-28357
> URL: https://issues.apache.org/jira/browse/HIVE-28357
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Priority: Major
>
> https://ci.hive.apache.org/job/hive-flaky-check/843/
> {code}
> 13:24:19  [INFO] ---
> 13:24:19  [INFO]  T E S T S
> 13:24:19  [INFO] ---
> 13:24:19  [INFO] Running 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
> 13:25:06  [ERROR] Tests run: 23, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 43.212 s <<< FAILURE! - in 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs
> 13:25:06  [ERROR] 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO
>   Time elapsed: 1.45 s  <<< FAILURE!
> 13:25:06  org.junit.ComparisonFailure: Location does not match 
> expected:<...table/state=CA/city=[SanFrancisc]o> but 
> was:<...table/state=CA/city=[PaloAlt]o>
> 13:25:06  at 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.verifyLocations(TestGetPartitionsUsingProjectionAndFilterSpecs.java:870)
> 13:25:06  at 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.getPartitionsWithVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:781)
> 13:25:06  at 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.runGetPartitionsUsingVals(TestGetPartitionsUsingProjectionAndFilterSpecs.java:792)
> 13:25:06  at 
> org.apache.hadoop.hive.metastore.TestGetPartitionsUsingProjectionAndFilterSpecs.testGetPartitionsUsingValuesWithJDO(TestGetPartitionsUsingProjectionAndFilterSpecs.java:656)
> 13:25:06  
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28352:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
> Fix For: 4.1.0
>
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  
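
The error above shows the tool looking for 
upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql, i.e. a file name with a doubled 
".hive" suffix, while the earlier scripts follow the upgrade-X-to-Y.hive.sql 
convention. The actual schematool code and the fix are not shown in this thread; 
the snippet below is only a hypothetical sketch of how such a doubled suffix can 
be produced and guarded against (all names are made up for illustration).

{code:java}
// Hypothetical sketch; not the real MetastoreSchemaTool logic.
class UpgradeScriptNameSketch {
  static String scriptFor(String fromVersion, String toVersion, String dbType) {
    String base = "upgrade-" + fromVersion + "-to-" + toVersion;
    // If the caller has already appended "." + dbType to the name, blindly adding
    // "." + dbType + ".sql" again yields the doubled suffix seen in the error above.
    if (base.endsWith("." + dbType)) {
      return base + ".sql";
    }
    return base + "." + dbType + ".sql";
  }

  public static void main(String[] args) {
    // Prints upgrade-3.1.0-to-4.0.0-alpha-1.hive.sql, matching the naming
    // convention of the scripts that do exist.
    System.out.println(scriptFor("3.1.0", "4.0.0-alpha-1", "hive"));
  }
}
{code}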



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28352) Schematool fails to upgradeSchema on dbType=hive

2024-06-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860789#comment-17860789
 ] 

Denys Kuzmenko commented on HIVE-28352:
---

Merged to master.
Thanks for the fix [~okumin], and [~dengzh] for the review. We'll have to 
cherry-pick this into 4.0.1.

> Schematool fails to upgradeSchema on dbType=hive
> 
>
> Key: HIVE-28352
> URL: https://issues.apache.org/jira/browse/HIVE-28352
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: hive-4.0.1-must
>
> Schematool tries to refer to incorrect file names.
> {code:java}
> $ schematool -metaDbType derby -dbType hive -initSchemaTo 3.0.0 -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> $ schematool -metaDbType derby -dbType hive -upgradeSchema -url 
> jdbc:hive2://hive-hiveserver2:1/default -driver 
> org.apache.hive.jdbc.HiveDriver
> ...
> Completed upgrade-3.0.0-to-3.1.0.hive.sql
> Upgrade script upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Upgrade 
> FAILED! Metastore state would be inconsistent !!
> Upgrade FAILED! Metastore state would be inconsistent !!
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: 
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> Underlying cause: java.io.FileNotFoundException : 
> /opt/hive/scripts/metastore/upgrade/hive/upgrade-3.1.0-to-4.0.0-alpha-1.hive.hive.sql
>  (No such file or directory)
> 2024-06-27T01:41:46,572 ERROR [main] schematool.MetastoreSchemaTool: Use 
> --verbose for detailed stacktrace.
> Use --verbose for detailed stacktrace.
> 2024-06-27T01:41:46,573 ERROR [main] schematool.MetastoreSchemaTool: *** 
> schemaTool failed ***
> *** schemaTool failed *** {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-26018:
--
Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE 
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | ccc   | ccc   |
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860755#comment-17860755
 ] 

Denys Kuzmenko commented on HIVE-26018:
---

Merged to master.
Thanks for the fix [~seonggon] and [~kkasa] for the review!

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE 
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | ccc   | ccc   |
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26018) The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR

2024-06-28 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-26018.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR
> ---
>
> Key: HIVE-26018
> URL: https://issues.apache.org/jira/browse/HIVE-26018
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0, 4.0.0
>Reporter: GuangMing Lu
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The result of UNIQUEJOIN on Hive on Tez is inconsistent with that of MR, and 
> the result is not correct, for example:
> CREATE TABLE T1_n1x(key STRING, val STRING) STORED AS orc;
> CREATE TABLE T2_n1x(key STRING, val STRING) STORED AS orc;
> insert into T1_n1x values('aaa', '111'),('bbb', '222'),('ccc', '333');
> insert into T2_n1x values('aaa', '111'),('ddd', '444'),('ccc', '333');
> SELECT a.key, b.key FROM UNIQUEJOIN PRESERVE T1_n1x a (a.key), PRESERVE 
> T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> SELECT a.key, b.key FROM UNIQUEJOIN T1_n1x a (a.key), T2_n1x b (b.key);
> Hive on Tez result: wrong
> | a.key | b.key |
> | aaa   | aaa   |
> | bbb   | NULL  |
> | ccc   | ccc   |
> | NULL  | ddd   |
> Hive on MR result: right
> | a.key | b.key |
> | aaa   | aaa   |
> | ccc   | ccc   |
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28346) Make ALTER CHANGE COLUMN more efficient with many partitions

2024-06-28 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860745#comment-17860745
 ] 

Butao Zhang edited comment on HIVE-28346 at 6/28/24 8:17 AM:
-

Oh, sorry. I just realized your ticket is aimed at a specific column change, not 
a table rename. The optimizations I mentioned above are mainly for table-level 
renames.


was (Author: zhangbutao):
Oh, sorry. I just realized your ticket is aimed at a partition rename, not a table 
rename. The optimizations I mentioned above are mainly for table-level renames.

> Make ALTER CHANGE COLUMN more efficient with many partitions
> 
>
> Key: HIVE-28346
> URL: https://issues.apache.org/jira/browse/HIVE-28346
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: John Sherman
>Priority: Major
>
> Currently, by default, when a column is renamed its column stats are renamed 
> and maintained too, via updateOrGetPartitionColumnStats().
> However, in the case of a partitioned table this gets updated per partition, 
> rather than via a bulk operation -
> [https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L452]
> So a table with N partitions will end up making at least N HMS calls (one 
> per partition) for a CHANGE COLUMN. This can take many minutes or hours for 
> large partitioned tables, up to even hitting various timeouts.
> Ideally it should be able to make a single HMS call, or a direct SQL update, 
> to update all the partitions at once.
> We do have a workaround for this:
> {code:java}
>  
> COLSTATS_RETAIN_ON_COLUMN_REMOVAL("metastore.colstats.retain.on.column.removal",
>         "hive.metastore.colstats.retain.on.column.removal", true,
>         "Whether to retain column statistics during column removals in 
> partitioned tables - disabling this purges all column statistics data for all 
> partition to retain working consistency"),{code}
> However, this has some downsides:
> 1) It is set to retain stats by default
> 2) It affects all tables if enabled
> 3) It drops ALL column stats and not just the column being renamed.
> 4) It is not clear to users that this configuration will solve their issue 
> (which typically presents as an ALTER CHANGE COLUMN operation timing out or 
> taking a very long time).
> Ideally we could add an API for bulk updates to partition objects that is 
> much more efficient. Another approach could be to add a threshold 
> configuration: if the number of partitions is greater than the configured 
> value, ALTER would drop the column stats, and below it the stats would be 
> retained.
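
A hypothetical sketch of the threshold idea from the last paragraph of the 
description above (the config key and method names below are illustrative only, 
not existing Hive settings or APIs):

{code:java}
// Hypothetical sketch, not existing Hive metastore code.
class ColumnRenameStatsPolicy {
  // Illustrative config key; no such setting exists today.
  static final String THRESHOLD_KEY = "metastore.colstats.rename.partition.threshold";

  /**
   * Above the threshold, updating stats would mean at least one HMS call per
   * partition, so dropping the renamed column's stats is the cheaper choice;
   * at or below it, the per-partition update is affordable and the stats are kept.
   */
  static boolean shouldDropRenamedColumnStats(int partitionCount, int threshold) {
    return partitionCount > threshold;
  }
}
{code}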



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28346) Make ALTER CHANGE COLUMN more efficient with many partitions

2024-06-28 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860745#comment-17860745
 ] 

Butao Zhang commented on HIVE-28346:


Oh, sorry. I just realized your ticket is aimed at a partition rename, not a table 
rename. The optimizations I mentioned above are mainly for table-level renames.

> Make ALTER CHANGE COLUMN more efficient with many partitions
> 
>
> Key: HIVE-28346
> URL: https://issues.apache.org/jira/browse/HIVE-28346
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, Metastore
>Reporter: John Sherman
>Priority: Major
>
> Currently, by default, when a column is renamed its column stats are renamed 
> and maintained too, via updateOrGetPartitionColumnStats().
> However, in the case of a partitioned table this gets updated per partition, 
> rather than via a bulk operation -
> [https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L452]
> So a table with N partitions will end up making at least N HMS calls (one 
> per partition) for a CHANGE COLUMN. This can take many minutes or hours for 
> large partitioned tables, up to even hitting various timeouts.
> Ideally it should be able to make a single HMS call, or a direct SQL update, 
> to update all the partitions at once.
> We do have a workaround for this:
> {code:java}
>  
> COLSTATS_RETAIN_ON_COLUMN_REMOVAL("metastore.colstats.retain.on.column.removal",
>         "hive.metastore.colstats.retain.on.column.removal", true,
>         "Whether to retain column statistics during column removals in 
> partitioned tables - disabling this purges all column statistics data for all 
> partition to retain working consistency"),{code}
> However, this has some downsides:
> 1) It is set to retain stats by default
> 2) It affects all tables if enabled
> 3) It drops ALL column stats and not just the column being renamed.
> 4) It is not clear to users that this configuration will solve their issue 
> (which typically presents as an ALTER CHANGE COLUMN operation timing out or 
> taking a very long time).
> Ideally we could add an API for bulk updates to partition objects that is 
> much more efficient. Another approach could be to add a threshold 
> configuration: if the number of partitions is greater than the configured 
> value, ALTER would drop the column stats, and below it the stats would be 
> retained.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27985) Avoid duplicate files.

2024-06-28 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860727#comment-17860727
 ] 

Chenyu Zheng commented on HIVE-27985:
-

[~glapark] 

Thanks for your reply, and sorry for missing this comment.
I don't think non-deterministic results are specific to speculative execution; 
the same thing happens with task attempt reruns.
In our production we have encountered statements like "distribute by rand()". 
In that case, when a task attempt fails and another attempt of the same task 
reruns, we may get a different result.
I think that since randomness is introduced, and it may produce different 
results each time the task runs, the result should be correct as long as the 
task completes successfully.
In my experience, the problem with duplicate files comes from task attempt 
retries; speculative execution just increases the probability of this problem.
 

> Avoid duplicate files.
> --
>
> Key: HIVE-27985
> URL: https://issues.apache.org/jira/browse/HIVE-27985
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 4.0.0
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: how tez examples commit.png
>
>
> *1 Introduction*
> Hive on Tez occasionally produces duplicated files, especially when speculative 
> execution is enabled. Hive identifies and removes duplicate files through 
> removeTempOrDuplicateFiles. However, this logic often does not take effect: 
> for example, a killed task attempt may commit files while this method is 
> executing, or the files under HIVE_UNION_SUBDIR_X are not recognized during a 
> union all. There are many issues that try to solve these problems, mainly by 
> focusing on how to identify duplicate files. *This issue instead solves the 
> problem by avoiding the generation of duplicate files.*
> *2 How does Tez avoid duplicate files?*
> After testing, I found that the Hadoop MapReduce examples and Tez examples do 
> not have this problem. With a properly designed OutputCommitter, duplicate 
> files can be avoided. Let's analyze how Tez avoids duplicate files.
> {color:#172b4d} _Note: Compared with Tez, Hadoop MapReduce has an additional 
> commitPending step, which is not critical, so only Tez is analyzed here._{color}
> !how tez examples commit.png|width=778,height=483!
>  
> Let's analyze these steps:
>  * (1) {*}process records{*}: Process the records.
>  * (2) {*}send canCommit request{*}: After all records are processed, the task 
> calls canCommit on the AM remotely.
>  * (3) {*}update commitAttempt{*}: When the AM receives the canCommit request, 
> it checks whether another task attempt of the current task has already executed 
> canCommit. If no other attempt got there first, it returns true; otherwise it 
> returns false. This ensures that only one task attempt commits for each task.
>  * (4) {*}return canCommit response{*}: The task receives the AM's response. If 
> it is true, the attempt may commit. If it is false, another task attempt has 
> already been granted the commit and this one cannot commit; it loops back to 
> (2) and keeps calling canCommit until it is killed or the other attempt fails.
>  * (5) {*}output.commit{*}: Execute the commit, i.e. rename the generated 
> temporary file to the final file.
>  * (6) {*}notify succeeded{*}: Although the task has produced the final file, 
> the AM still needs to be notified, via the heartbeat, that the current task 
> attempt has completed.
> There is a problem in the above steps: if an exception occurs in the task 
> after (5) and before (6), the AM does not know that the task attempt has 
> completed, so it will still start a new task attempt, which generates a new 
> file and thus causes duplication. I added code that randomly throws exceptions 
> between (5) and (6) and found that the Tez example still did not produce 
> duplicate data. Why? Mainly because every task attempt generates the same final 
> file name. When a new task attempt commits and finds that the final file 
> already exists (generated by a previous attempt), the old file is deleted first 
> and the new one is renamed into place. Regardless of whether the previous task 
> attempt committed normally, the last successful attempt clears the previous, 
> possibly broken, result.
> To summarize, tez-examples uses two methods to avoid duplicate files:
>  * (1) Avoid repeated commit through canCommit. This is particularly 
> effective for tasks with speculative execution turned on.
>  * (2) The final file names generated by different task attempts are the 
> same. Combined with canCommit, it can be guaranteed that only one file 
> generated 
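
The protocol quoted above reduces to two mechanisms: ask the AM for permission so 
that only one attempt per task commits, and make the commit idempotent by always 
using the same final file name. A simplified, self-contained sketch of that idea 
follows (canCommit() here is a stand-in for the AM-side check, not the real Tez 
API):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Simplified sketch of the commit protocol described in steps (1)-(6) above.
class CommitSketch {
  /** Stand-in for the AM: grants the commit to only one attempt per task. */
  interface CommitArbiter {
    boolean canCommit(String taskId, int attemptId);
  }

  static void commit(CommitArbiter am, String taskId, int attemptId,
                     Path tempFile, Path finalFile) throws Exception {
    // Steps (2)-(4): keep asking until this attempt is granted the commit.
    while (!am.canCommit(taskId, attemptId)) {
      Thread.sleep(100);   // in reality the attempt is eventually killed or fails
    }
    // Step (5): the final name is the same for every attempt, so a leftover file
    // from a partially committed attempt is replaced rather than duplicated.
    Files.move(tempFile, finalFile, StandardCopyOption.REPLACE_EXISTING);
    // Step (6): the attempt still reports success back to the AM (not shown).
  }
}
{code}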

[jira] [Comment Edited] (HIVE-27985) Avoid duplicate files.

2024-06-28 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860727#comment-17860727
 ] 

Chenyu Zheng edited comment on HIVE-27985 at 6/28/24 7:21 AM:
--

[~glapark] 

Thanks for your reply, and sorry for missing this comment.
I don't think non-deterministic results are specific to speculative execution; 
the same thing happens with task attempt reruns.
In our production we have encountered statements like "distribute by rand()". 
In that case, when a task attempt fails and another attempt of the same task 
reruns, we may get a different result.
I think that since randomness is introduced, and it may produce different 
results each time the task runs, the result should be regarded as "correct" as 
long as the task completes successfully.
In my experience, the problem with duplicate files comes from task attempt 
retries; speculative execution just increases the probability of this problem.
 


was (Author: zhengchenyu):
[~glapark] 

Thanks for your reply, and sorry for missing this comment.
I don't think non-deterministic results are specific to speculative execution; 
the same thing happens with task attempt reruns.
In our production we have encountered statements like "distribute by rand()". 
In that case, when a task attempt fails and another attempt of the same task 
reruns, we may get a different result.
I think that since randomness is introduced, and it may produce different 
results each time the task runs, the result should be correct as long as the 
task completes successfully.
In my experience, the problem with duplicate files comes from task attempt 
retries; speculative execution just increases the probability of this problem.
 

> Avoid duplicate files.
> --
>
> Key: HIVE-27985
> URL: https://issues.apache.org/jira/browse/HIVE-27985
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 4.0.0
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
> Attachments: how tez examples commit.png
>
>
> *1 Introduction*
> Hive on Tez occasionally produces duplicated files, especially when speculative 
> execution is enabled. Hive identifies and removes duplicate files through 
> removeTempOrDuplicateFiles. However, this logic often does not take effect: 
> for example, a killed task attempt may commit files while this method is 
> executing, or the files under HIVE_UNION_SUBDIR_X are not recognized during a 
> union all. There are many issues that try to solve these problems, mainly by 
> focusing on how to identify duplicate files. *This issue instead solves the 
> problem by avoiding the generation of duplicate files.*
> *2 How does Tez avoid duplicate files?*
> After testing, I found that the Hadoop MapReduce examples and Tez examples do 
> not have this problem. With a properly designed OutputCommitter, duplicate 
> files can be avoided. Let's analyze how Tez avoids duplicate files.
> {color:#172b4d} _Note: Compared with Tez, Hadoop MapReduce has an additional 
> commitPending step, which is not critical, so only Tez is analyzed here._{color}
> !how tez examples commit.png|width=778,height=483!
>  
> Let's analyze these steps:
>  * (1) {*}process records{*}: Process the records.
>  * (2) {*}send canCommit request{*}: After all records are processed, the task 
> calls canCommit on the AM remotely.
>  * (3) {*}update commitAttempt{*}: When the AM receives the canCommit request, 
> it checks whether another task attempt of the current task has already executed 
> canCommit. If no other attempt got there first, it returns true; otherwise it 
> returns false. This ensures that only one task attempt commits for each task.
>  * (4) {*}return canCommit response{*}: The task receives the AM's response. If 
> it is true, the attempt may commit. If it is false, another task attempt has 
> already been granted the commit and this one cannot commit; it loops back to 
> (2) and keeps calling canCommit until it is killed or the other attempt fails.
>  * (5) {*}output.commit{*}: Execute the commit, i.e. rename the generated 
> temporary file to the final file.
>  * (6) {*}notify succeeded{*}: Although the task has produced the final file, 
> the AM still needs to be notified, via the heartbeat, that the current task 
> attempt has completed.
> There is a problem in the above steps: if an exception occurs in the task 
> after (5) and before (6), the AM does not know that the task attempt has 
> completed, so it will still start a new task attempt, which generates a new 
> file and thus causes duplication. I added code that randomly throws exceptions 
> between (5) and (6), and found