That usually happens when the same column has different types across some of the Parquet files. In this case, I think you have a column declared as `Long` that picked up a file written with `Integer` type; I had to deal with a similar problem once. You would have to cast that column to `Long` yourself.
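Something along these lines is what I'd try (just a rough sketch run from spark-shell, so `spark` is the usual SparkSession; the table path is the one from your stack trace, and "someColumn" is only a placeholder for whichever column turns out to be the mismatched one):

    import org.apache.spark.sql.functions.col

    // Read the table and check what Spark actually reports for each column.
    val df = spark.read.parquet("maprfs:///user/hive/warehouse/analytics.db/myTable")
    df.printSchema()

    // Cast the suspect column to Long so both sides of the join agree on the type.
    val fixed = df.withColumn("someColumn", col("someColumn").cast("long"))

Comparing the printSchema() output of the two dataframes before the join is usually the quickest way to spot which column carries the mismatch.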
On Mon, Jul 9, 2018 at 2:53 PM Nirav Patel <npa...@xactlycorp.com> wrote:
> I am getting the following error after performing a join between 2 dataframes.
> It happens on the call to the .show() method. I assume it's an issue with an
> incompatible type, but it's been really hard to identify which column of
> which dataframe has that incompatibility.
> Any pointers?
>
>
> 11:06:10.304 13700 [Executor task launch worker for task 16] WARN
> o.a.s.s.e.datasources.FileScanRDD - Skipped the rest of the content in the
> corrupted file: path:
> maprfs:///user/hive/warehouse/analytics.db/myTable/BUSINESS_ID=123/part-00000-b01dbc82-9bc3-43c5-89c6-4c9b2d407106.c000.snappy.parquet,
> range: 0-14248, partition values: [1085]
> java.lang.UnsupportedOperationException: Unimplemented type: IntegerType
> at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBinaryBatch(VectorizedColumnReader.java:431)
> at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:203)
> at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)
> at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
> at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:154)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
> at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)