anishek commented on a change in pull request #2579:
URL: https://github.com/apache/hive/pull/2579#discussion_r690255149
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
##########
@@ -234,12 +234,24 @@ void run(HiveConf conf, String jobName, Table t, Partition p, StorageDescriptor
           "especially if this message repeats. Check that compaction is running properly. Check for any " +
           "runaway/mis-configured process writing to ACID tables, especially using Streaming Ingest API.");
       int numMinorCompactions = parsedDeltas.size() / maxDeltasToHandle;
+      parsedDeltas.sort(AcidUtils.ParsedDeltaLight::compareTo);
+
+      int start = 0;
+      int end = maxDeltasToHandle;
+
       for (int jobSubId = 0; jobSubId < numMinorCompactions; jobSubId++) {
+        while (parsedDeltas.get(end).getMinWriteId() == parsedDeltas.get(end - 1).getMinWriteId() &&
+            parsedDeltas.get(end).getMaxWriteId() == parsedDeltas.get(end - 1).getMaxWriteId()) {
Review comment:
Actually, thinking about this some more: if there are multiple deltas with the same min/max write ids but different statement ids, then we have to make sure that the value of hive.compactor.delta.pct.threshold is greater than the length of any such run in the sorted list, or we have to artificially increase the above value at that point, with a warning in the log.
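
For illustration, a minimal sketch of what I mean (hypothetical: `Delta` and `extendEnd` are made-up stand-ins for `AcidUtils.ParsedDeltaLight` and the loop above, not the actual CompactorMR code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DeltaBatchSketch {

  // Hypothetical stand-in for AcidUtils.ParsedDeltaLight.
  record Delta(long minWriteId, long maxWriteId, int stmtId) {}

  /**
   * Moves the batch boundary forward while the delta at 'end' shares its
   * (minWriteId, maxWriteId) with the previous delta, so a run of
   * same-write-id deltas stays in one batch. Warns when the batch grows
   * past the configured maximum, as suggested above.
   */
  static int extendEnd(List<Delta> deltas, int end, int maxDeltasToHandle, int start) {
    while (end < deltas.size()
        && deltas.get(end).minWriteId() == deltas.get(end - 1).minWriteId()
        && deltas.get(end).maxWriteId() == deltas.get(end - 1).maxWriteId()) {
      end++;
    }
    if (end - start > maxDeltasToHandle) {
      System.err.println("WARN: batch of " + (end - start) + " deltas exceeds "
          + maxDeltasToHandle + " because of same-write-id deltas; keeping them in one batch");
    }
    return end;
  }

  public static void main(String[] args) {
    List<Delta> deltas = new ArrayList<>(List.of(
        new Delta(1, 1, 0),
        new Delta(2, 2, 0),
        new Delta(3, 3, 0), // run of three deltas with the same (min, max) ...
        new Delta(3, 3, 1),
        new Delta(3, 3, 2), // ... must land in the same batch
        new Delta(4, 4, 0)));
    deltas.sort(Comparator.comparingLong(Delta::minWriteId)
        .thenComparingLong(Delta::maxWriteId)
        .thenComparingInt(Delta::stmtId));

    int maxDeltasToHandle = 3; // pretend hive.compactor.max.num.delta = 3
    int start = 0;
    while (start < deltas.size()) {
      int end = extendEnd(deltas, Math.min(start + maxDeltasToHandle, deltas.size()),
          maxDeltasToHandle, start);
      System.out.println("batch: " + deltas.subList(start, end));
      start = end;
    }
  }
}
```

With maxDeltasToHandle = 3, the run of three (3, 3) deltas forces a batch of five, which is exactly the point where the warning (or a threshold check) would have to kick in.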