[jira] [Updated] (SPARK-18600) BZ2 CRC read error needs better reporting
[ https://issues.apache.org/jira/browse/SPARK-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hovell updated SPARK-18600:
--
    Labels: spree  (was: )

> BZ2 CRC read error needs better reporting
> -
>
>         Key: SPARK-18600
>         URL: https://issues.apache.org/jira/browse/SPARK-18600
>     Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>    Reporter: Charles R Allen
>    Priority: Minor
>      Labels: spree
>
> {code}
> 16/11/25 20:05:03 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 148 in stage 5.0 failed 1 times, most recent failure: Lost task 148.0 in stage 5.0 (TID 5945, localhost): org.apache.spark.SparkException: Task failed while writing rows
>     at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:86)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: com.univocity.parsers.common.TextParsingException: java.lang.IllegalStateException - Error reading from input
> Parser Configuration: CsvParserSettings:
>     Auto configuration enabled=true
>     Autodetect column delimiter=false
>     Autodetect quotes=false
>     Column reordering enabled=true
>     Empty value=null
>     Escape unquoted values=false
>     Header extraction enabled=null
>     Headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
>     Ignore leading whitespaces=false
>     Ignore trailing whitespaces=false
>     Input buffer size=128
>     Input reading on separate thread=false
>     Keep escape sequences=false
>     Line separator detection enabled=false
>     Maximum number of characters per column=100
>     Maximum number of columns=20480
>     Normalize escaped line separators=true
>     Null value=
>     Number of records to read=all
>     Row processor=none
>     RowProcessor error handler=null
>     Selected fields=none
>     Skip empty lines=true
>     Unescaped quote handling=STOP_AT_DELIMITERFormat configuration:
>     CsvFormat:
>         Comment character=\0
>         Field delimiter=,
>         Line separator (normalized)=\n
>         Line separator sequence=\n
>         Quote character="
>         Quote escape character=\
>         Quote escape escape character=null
> Internal state when error was thrown: line=27089, column=13, record=27089, charIndex=4451456, headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
>     at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:302)
>     at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:431)
>     at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:148)
>     at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:131)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at org.apache.spark.sql.execu
[jira] [Updated] (SPARK-18600) BZ2 CRC read error needs better reporting
[ https://issues.apache.org/jira/browse/SPARK-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-18600:
--
    Priority: Minor  (was: Major)
    Description:
{code}
16/11/25 20:05:03 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 148 in stage 5.0 failed 1 times, most recent failure: Lost task 148.0 in stage 5.0 (TID 5945, localhost): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.univocity.parsers.common.TextParsingException: java.lang.IllegalStateException - Error reading from input
Parser Configuration: CsvParserSettings:
    Auto configuration enabled=true
    Autodetect column delimiter=false
    Autodetect quotes=false
    Column reordering enabled=true
    Empty value=null
    Escape unquoted values=false
    Header extraction enabled=null
    Headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
    Ignore leading whitespaces=false
    Ignore trailing whitespaces=false
    Input buffer size=128
    Input reading on separate thread=false
    Keep escape sequences=false
    Line separator detection enabled=false
    Maximum number of characters per column=100
    Maximum number of columns=20480
    Normalize escaped line separators=true
    Null value=
    Number of records to read=all
    Row processor=none
    RowProcessor error handler=null
    Selected fields=none
    Skip empty lines=true
    Unescaped quote handling=STOP_AT_DELIMITERFormat configuration:
    CsvFormat:
        Comment character=\0
        Field delimiter=,
        Line separator (normalized)=\n
        Line separator sequence=\n
        Quote character="
        Quote escape character=\
        Quote escape escape character=null
Internal state when error was thrown: line=27089, column=13, record=27089, charIndex=4451456, headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:302)
    at com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:431)
    at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:148)
    at org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:131)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:253)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1345)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeR