Lucas61000 commented on code in PR #5983:
URL: https://github.com/apache/hive/pull/5983#discussion_r2215120041
##########
shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java:
##########
@@ -249,26 +270,49 @@ protected boolean initNextRecordReader(K key) throws IOException {
       return false;
     }
-    // get a record reader for the idx-th chunk
-    try {
-      curReader = rrConstructor.newInstance(new Object[]
-          {split, jc, reporter, Integer.valueOf(idx), preReader});
-
-      // change the key if need be
-      if (key != null) {
-        K newKey = curReader.createKey();
-        ((CombineHiveKey)key).setKey(newKey);
+    if (skipCorruptfile) {
+      // get a record reader for the idx-th chunk
+      try {
+        curReader = rrConstructor.newInstance(new Object[]
+            {split, jc, reporter, Integer.valueOf(idx), preReader});
+
+        // change the key if need be
+        if (key != null) {
+          K newKey = curReader.createKey();
+          ((CombineHiveKey) key).setKey(newKey);
+        }
+
+        // setup some helper config variables.
+        jc.set("map.input.file", split.getPath(idx).toString());
+        jc.setLong("map.input.start", split.getOffset(idx));
+        jc.setLong("map.input.length", split.getLength(idx));
+      } catch (InvocationTargetException ITe) {
+        return false;
+      } catch (Exception e) {
+        curReader = HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(e, jc);

Review Comment:
Thanks for the review. The problem I’m trying to solve is that some users don’t care about data integrity: they just want their jobs to finish, and would rather skip files that were corrupted during transmission than have the job fail.

-- 
This is an automated message from the Apache Git Service.
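The skip-on-failure behaviour described in the comment can be illustrated with a small standalone sketch. It is not Hive code and makes no claim about how the PR behaves: ChunkOpener, openNextReadable and the skipCorruptFiles flag are hypothetical names, and the sketch catches an IOException from a plain factory interface instead of the InvocationTargetException raised by the reflective rrConstructor.newInstance call in the diff. In lenient mode a chunk that cannot be opened is recorded and the next chunk is tried; in strict mode the first failure is rethrown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hive code: open the next readable chunk of a split,
 * optionally skipping chunks whose reader cannot be constructed.
 */
public class SkipCorruptChunks {

    /** Hypothetical factory that builds a reader for chunk idx or throws. */
    interface ChunkOpener<R> {
        R open(int idx) throws IOException;
    }

    /**
     * Returns a reader for the first chunk at or after start that opens
     * successfully, or null if none does. In lenient mode
     * (skipCorruptFiles == true) the index of each chunk that fails to open is
     * appended to skipped; in strict mode the first failure is rethrown.
     */
    static <R> R openNextReadable(int start, int numChunks, ChunkOpener<R> opener,
                                  boolean skipCorruptFiles, List<Integer> skipped)
            throws IOException {
        for (int idx = start; idx < numChunks; idx++) {
            try {
                return opener.open(idx);
            } catch (IOException e) {
                if (!skipCorruptFiles) {
                    throw e; // strict: let the job fail on corrupt input
                }
                skipped.add(idx); // lenient: remember the chunk, try the next one
            }
        }
        return null; // no remaining chunk could be opened
    }

    public static void main(String[] args) throws IOException {
        // Chunk 1 stands in for a file corrupted during transmission.
        ChunkOpener<String> opener = idx -> {
            if (idx == 1) {
                throw new IOException("corrupt chunk " + idx);
            }
            return "reader-" + idx;
        };
        List<Integer> skipped = new ArrayList<>();
        String reader = openNextReadable(1, 3, opener, true, skipped);
        System.out.println(reader + " skipped=" + skipped); // prints "reader-2 skipped=[1]"
    }
}
```

Returning the list of skipped chunk indices is one way to keep the trade-off visible: users who accept losing corrupt files can still log or count what was dropped instead of losing that information silently.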