Lucas61000 commented on code in PR #5983:
URL: https://github.com/apache/hive/pull/5983#discussion_r2215120041


##########
shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
##########
@@ -249,26 +270,49 @@ protected boolean initNextRecordReader(K key) throws IOException {
         return false;
       }
 
-      // get a record reader for the idx-th chunk
-      try {
-        curReader = rrConstructor.newInstance(new Object[]
-            {split, jc, reporter, Integer.valueOf(idx), preReader});
-
-        // change the key if need be
-        if (key != null) {
-          K newKey = curReader.createKey();
-          ((CombineHiveKey)key).setKey(newKey);
+      if (skipCorruptfile) {
+        // get a record reader for the idx-th chunk
+        try {
+          curReader = rrConstructor.newInstance(new Object[]
+                  {split, jc, reporter, Integer.valueOf(idx), preReader});
+
+          // change the key if need be
+          if (key != null) {
+            K newKey = curReader.createKey();
+            ((CombineHiveKey) key).setKey(newKey);
+          }
+
+          // setup some helper config variables.
+          jc.set("map.input.file", split.getPath(idx).toString());
+          jc.setLong("map.input.start", split.getOffset(idx));
+          jc.setLong("map.input.length", split.getLength(idx));
+        } catch (InvocationTargetException ITe) {
+          return false;
+        } catch (Exception e) {
+          curReader = HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(e, jc);

Review Comment:
   Thanks for the review. The issue I'm facing is that some users don't care
about data integrity; they just want their jobs to run to completion and would
rather skip files that were corrupted in transit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

