Re: [PR] HIVE-28165 HiveSplitGenerator: send splits through filesystem instead of RPC in case of big payload [hive]

via GitHub Tue, 02 Apr 2024 00:51:18 -0700


InvisibleProgrammer commented on code in PR #5174:
URL: https://github.com/apache/hive/pull/5174#discussion_r1547328886



##########
ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java:
##########
@@ -154,6 +165,90 @@ private void prepare(InputInitializerContext initializerContext) throws IOExcept
     LOG.info("SplitLocationProvider: " + splitLocationProvider);
   }
 
+  /**
+   * SplitSerializer is a helper class for taking care of serializing splits to the tez scratch dir
+   * when a size criteria defined by "hive.tez.split.fs.serialization.threshold" is met.
+   * It utilizes an ExecutorService for parallel writes to prevent a single split write operation
+   * becoming the bottleneck (as write() is called from a loop currently).
+   */
+  class SplitSerializer {
+    // fields needed for filepath
+    private String queryId;
+    private String inputName;
+    private int vertexId;
+    private Path appStagingPath;
+    // metrics
+    private AtomicInteger timeSpentWithSplitWriteMs = new AtomicInteger(0);

Review Comment:
   Why are all the other private fields initialized in `lazyInit`, while the AtomicInteger fields are initialized inline at their declaration?
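
To make the question concrete, here is a minimal sketch of the two initialization styles side by side. It is not the code from the PR: the class name `SplitSerializerSketch`, the `lazyInit` signature, and the helper methods are hypothetical, and it only assumes that the lazily set fields depend on context that is unavailable at construction time while the counter does not.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the code from the PR: names and behavior are assumptions.
class SplitSerializerSketch {
  // These depend on context that is only available later, so lazyInit() sets them.
  private String queryId;
  private int vertexId;
  private boolean initialized;

  // Needs no external context, so it can be created inline and marked final.
  private final AtomicInteger timeSpentWithSplitWriteMs = new AtomicInteger(0);

  // Idempotent: only the first call assigns the context-dependent fields.
  synchronized void lazyInit(String queryId, int vertexId) {
    if (initialized) {
      return;
    }
    this.queryId = queryId;
    this.vertexId = vertexId;
    this.initialized = true;
  }

  // Safe to call before lazyInit(), because the counter already exists.
  int recordWriteTime(int elapsedMs) {
    return timeSpentWithSplitWriteMs.addAndGet(elapsedMs);
  }

  String describe() {
    return queryId + "/" + vertexId + " writeMs=" + timeSpentWithSplitWriteMs.get();
  }
}
```

If that assumption holds for the real class, inline initialization of the counters would be a reasonable choice, and marking them `final` would make the intent explicit.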



