[ https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037186#comment-16037186 ]
Dongjoon Hyun commented on SPARK-20984:
---------------------------------------

Thank you for confirming!

> Reading back from ORC format gives an error on big endian systems.
> -------------------------------------------------------------------
>
>                 Key: SPARK-20984
>                 URL: https://issues.apache.org/jira/browse/SPARK-20984
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.0.0
>         Environment: Red Hat 7 on a POWER7 big endian platform.
>
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [testuser@soe10-vm12 spark]$ lscpu
> Architecture:          ppc64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Big Endian
> CPU(s):                8
> On-line CPU(s) list:   0-7
> Thread(s) per core:    1
> Core(s) per socket:    1
> Socket(s):             8
> NUMA node(s):          1
> Model:                 IBM pSeries (emulated by qemu)
> L1d cache:             32K
> L1i cache:             32K
> NUMA node0 CPU(s):     0-7
>
>            Reporter: Mahesh
>
> All ORC test cases seem to be failing here: it looks like Spark cannot read back what it has just written. Below is a way to check it in the spark shell, followed by one of the failing test cases, which presumably passes on x86.
>
> All test cases in OrcHadoopFsRelationSuite.scala are failing, for example:
>
> test("SPARK-12218: 'Not' is included in ORC filter pushdown") {
>   import testImplicits._
>
>   withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
>     withTempPath { dir =>
>       val path = s"${dir.getCanonicalPath}/table1"
>       (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b").write.orc(path)
>
>       checkAnswer(
>         spark.read.orc(path).where("not (a = 2) or not(b in ('1'))"),
>         (1 to 5).map(i => Row(i, (i % 2).toString)))
>
>       checkAnswer(
>         spark.read.orc(path).where("not (a = 2 and b in ('1'))"),
>         (1 to 5).map(i => Row(i, (i % 2).toString)))
>     }
>   }
> }
>
> The same can be reproduced in the spark shell.
>
> Create a DataFrame and write it out as ORC:
>
> scala> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b").write.orc("test")
>
> Now try to read it back:
>
> scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show
> 17/06/05 04:20:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 13
> 	at org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165)
> 	at org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76)
> 	at org.iq80.snappy.Snappy.uncompress(Snappy.java:43)
> 	at org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214)
> 	at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
> 	at java.io.InputStream.read(InputStream.java:101)
> 	at org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
> 	at org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
> 	at org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10661)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.<init>(OrcProto.java:10625)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
> 	at org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> 	at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> 	at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> 	at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10937)
> 	at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:113)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:805)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:776)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:986)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1019)
> 	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:205)
> 	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:539)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:230)
> 	at org.apache.hadoop.hive.ql.io.orc.SparkOrcNewRecordReader.<init>(SparkOrcNewRecordReader.java:48)
> 	at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:151)
> 	at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:125)
> 	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:279)
> 	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:263)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:124)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:98)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
> 	at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
> 	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:86)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:314)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 17/06/05 04:20:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 13
> 	(stack trace identical to the executor stack trace above)
> 17/06/05 04:20:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 13
> 	(stack trace identical to the executor stack trace above)
> Driver stacktrace:
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
> 	at scala.Option.foreach(Option.scala:257)
> 	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
> 	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1893)
> 	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1906)
> 	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
> 	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
> 	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
> 	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2201)
> 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> 	at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2554)
> 	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2200)
> 	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2207)
> 	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1943)
> 	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1942)
> 	at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2584)
> 	at org.apache.spark.sql.Dataset.head(Dataset.scala:1942)
> 	at org.apache.spark.sql.Dataset.take(Dataset.scala:2157)
> 	at org.apache.spark.sql.Dataset.showString(Dataset.scala:239)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:526)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:486)
> 	at org.apache.spark.sql.Dataset.show(Dataset.scala:495)
> 	... 48 elided
> Caused by: org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 13
> 	(stack trace identical to the executor stack trace above)
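
The failing frames sit inside org.iq80.snappy, the pure-Java Snappy port that Hive's ORC SnappyCodec delegates to, so one plausible suspect is an endianness assumption in that decompressor rather than in the ORC writer itself. As a minimal sketch for anyone triaging on this machine, the JVM's view of the hardware byte order can be confirmed in the spark shell with a plain JDK API (nothing Spark-specific assumed; BIG_ENDIAN is what one would expect on this ppc64 box):

scala> import java.nio.ByteOrder
import java.nio.ByteOrder

scala> ByteOrder.nativeOrder   // the byte order the JVM sees on this hardware
res0: java.nio.ByteOrder = BIG_ENDIAN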
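
Two quick isolation experiments may help narrow this down; the sketch below is untested here and assumes the ORC "compression" write option and the spark.sql.orc.filterPushdown conf available in Spark 2.x (the test_zlib path is just illustrative). The trace shows SnappyCodec, so the file written by the repro is snappy-compressed: if a zlib round trip reads back cleanly while that file fails, the corruption is specific to the snappy decompression path; and if the original file still fails with filter pushdown disabled, pushdown (the subject of the quoted test) is not the trigger.

scala> val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")   // spark-shell auto-imports spark.implicits._ for toDF

scala> df.write.option("compression", "zlib").orc("test_zlib")            // same data, zlib instead of the default snappy

scala> spark.read.orc("test_zlib").where("not (a = 2) or not(b in ('1'))").show

scala> spark.conf.set("spark.sql.orc.filterPushdown", "false")            // re-read the snappy-compressed "test" dir without pushdown

scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show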