> On Feb. 4, 2020, 10:16 p.m., Rajesh Balamohan wrote: > > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java > > Line 4382 (original), 4397 (patched) > > <https://reviews.apache.org/r/71904/diff/3/?file=2210218#file2210218line4397> > > > > Is this needed for direct insert?. In objectstores, we could have calls > > getting throttled.
That's a really good question, I was thinking about it a lot. I think it is not needed. This method does two things: removes the temporarily and duplicated files and returns the emptyBuckets list. This list contains elements if the number of buckets are bigger than the number of files. In this case, for MM tables, empty files will be created. But this is not the case for ACID tables, there won't be any empty files created for ACID tables. I want to revisit this topic whether or not we need these empty files, but for now, I would go with the same behaviour as for ACID tables. About the temp file removal, when the direct insert is finished all files which are not committed (meaning not in the manifest files) will be deleted prior to this call. So there shouldn't be any unnecessary files left at this point. I remove this call, and upload a patch to see the result of the pre-commit tests. If everything passes, I think it is safe to remove this call in case of direct insert. - Marta ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/71904/#review219494 ----------------------------------------------------------- On Feb. 18, 2020, 12:21 p.m., Marta Kuczora wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/71904/ > ----------------------------------------------------------- > > (Updated Feb. 18, 2020, 12:21 p.m.) > > > Review request for hive, Gopal V and Peter Vary. > > > Bugs: HIVE-21164 > https://issues.apache.org/jira/browse/HIVE-21164 > > > Repository: hive-git > > > Description > ------- > > Extended the original patch with saving the task attempt ids in the file > names and also fixed some bugs in the original patch. > With this fix, inserting into an ACID table would not use move task to place > the generated files into the final directory. It will inserts every files to > the final directory and then clean up the files which are not needed (like > written by failed task attempts). > Also fixed the replication tests which failed for the original patch as well. > > > Diffs > ----- > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java d3cb60b790 > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > da677c7977 > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 056cd27496 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java > 31d15fdef9 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorTestUtil.java > c2aa73b5f1 > itests/src/test/resources/testconfiguration.properties 1b1bf1147a > ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java > 9a3258115b > ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java 9ad4e71482 > ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 06e4ebee82 > ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 6c67bc7dd8 > ql/src/java/org/apache/hadoop/hive/ql/io/AcidInputFormat.java bba3960102 > ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java 1e8bb223f2 > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 2f5ec5270c > ql/src/java/org/apache/hadoop/hive/ql/io/HiveFileFormatUtils.java > 8980a6292a > ql/src/java/org/apache/hadoop/hive/ql/io/RecordUpdater.java 737e6774b7 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 76984abd0a > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java > c4c56f8477 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > b8a0f0465c > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java > 398698ec06 > > ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java > 2543dc6fc4 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1eb9c12cc8 > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java > 73ca658d9c > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java > 33d3beba46 > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java > c102a69f8f > ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java ecc7bdee4d > ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java bed05819b5 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 739f2b654b > ql/src/java/org/apache/hadoop/hive/ql/util/UpgradeTool.java 58e6289583 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnAddPartition.java c9cb6692df > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands.java 842140815d > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java e56d83158f > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java 908ceb43fc > ql/src/test/org/apache/hadoop/hive/ql/TestTxnConcatenate.java 8676e0db11 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnExIm.java 66b2b2768b > ql/src/test/org/apache/hadoop/hive/ql/TestTxnLoadData.java bb55d9fd79 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnNoBuckets.java ea6b1d9bec > ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java > af14e628b3 > ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 83db48e758 > ql/src/test/org/apache/hadoop/hive/ql/exec/TestFileSinkOperator.java > 2c4b69b2fe > ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java > 48e9afc496 > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java > cfd7290762 > ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestWorker.java > 70ae85c458 > ql/src/test/queries/clientpositive/tez_acid_union_dynamic_partition.q > PRE-CREATION > ql/src/test/queries/clientpositive/tez_acid_union_dynamic_partition_2.q > PRE-CREATION > ql/src/test/queries/clientpositive/tez_acid_union_multiinsert.q > PRE-CREATION > ql/src/test/results/clientpositive/acid_subquery.q.out 1dc1775557 > ql/src/test/results/clientpositive/create_transactional_full_acid.q.out > e324d5ec43 > > ql/src/test/results/clientpositive/encrypted/encryption_insert_partition_dynamic.q.out > 61b0057adb > ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out fbf4e481f1 > ql/src/test/results/clientpositive/llap/insert_overwrite.q.out fbc3326b39 > ql/src/test/results/clientpositive/llap/mm_all.q.out 226f2a9374 > > ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition.q.out > PRE-CREATION > > ql/src/test/results/clientpositive/llap/tez_acid_union_dynamic_partition_2.q.out > PRE-CREATION > ql/src/test/results/clientpositive/llap/tez_acid_union_multiinsert.q.out > PRE-CREATION > ql/src/test/results/clientpositive/mm_all.q.out 143ebd69f9 > streaming/src/test/org/apache/hive/streaming/TestStreaming.java 35a220facd > > > Diff: https://reviews.apache.org/r/71904/diff/5/ > > > Testing > ------- > > Had to modify some tests because of the file name changes. Also added some > specific tests. > In the pre-commit run all tests passed successfully. > > > Thanks, > > Marta Kuczora > >