Hi Jeetendra, Cheng,

I am using the following code for the join:
val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails = sqlContext.load("/home/administrator/stageddata/Customerdetails")

val CD = Customerdetails.
  where($"CreatedOn" > "2015-04-01 00:00:00.0").
  where($"CreatedOn" < "2015-05-01 00:00:00.0")

// Bookings by CD
val r1 = Bookings.
  withColumnRenamed("ID", "ID2")
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")
r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")

@Cheng I am not appending the joined table to an existing Parquet file; it is a new file.
@Jeetendra I have a rather large Parquet file and it also contains some confidential data. Can you tell me what you need to check in it?

Thanks

On 8 June 2015 at 16:47, Jeetendra Gangele <gangele...@gmail.com> wrote:
> When are you loading these Parquet files?
> Can you please share the code where you pass the Parquet file to Spark?
>
> On 8 June 2015 at 16:39, Cheng Lian <lian.cs....@gmail.com> wrote:
>
>> Are you appending the joined DataFrame, whose PolicyType is a string, to an
>> existing Parquet file whose PolicyType is an int? The exception indicates
>> that Parquet found a column with conflicting data types.
>>
>> Cheng
>>
>> On 6/8/15 5:29 PM, bipin wrote:
>>
>>> Hi, I get this error message when saving a table:
>>>
>>> parquet.io.ParquetDecodingException: The requested schema is not compatible
>>> with the file schema. incompatible types: optional binary PolicyType (UTF8)
>>> != optional int32 PolicyType
>>>     at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
>>>     at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
>>>     at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
>>>     at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
>>>     at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
>>>     at parquet.schema.MessageType.accept(MessageType.java:55)
>>>     at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
>>>     at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
>>>     at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
>>>     at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
>>>     at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
>>>     at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
>>>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
>>>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
>>>     at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
>>>     at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>     at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
>>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> I joined two tables, both loaded from Parquet files; the joined table
>>> throws this error when saved. I could not find anything about this error.
>>> Could this be a bug?
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Error-in-using-saveAsParquetFile-tp23204.html
>>
>
> Regards
> Jeetendra
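
For reference, a minimal way to check Cheng's diagnosis is to compare how each input stores PolicyType before the join and the save. The sketch below reuses the paths and column names from the thread, but the val names, the cast to string, and the column list are assumptions, not code from the thread:

// Load both inputs and print their schemas; look for PolicyType
// appearing as "integer" on one side and "string" on the other.
val bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val customers = sqlContext.load("/home/administrator/stageddata/Customerdetails")
bookings.printSchema()
customers.printSchema()

// If the types disagree, cast to a common type before joining and saving.
// Casting to string here is an assumption; pick the type the data needs.
import org.apache.spark.sql.functions.col
val bookingsFixed = bookings.select(
  col("ID"),
  col("PolicyType").cast("string").as("PolicyType")
  // ... plus the remaining Bookings columns, unchanged ...
)

If both schemas already agree, the "file schema" wording in the exception suggests the writer found existing Parquet data with an int32 PolicyType at the output path, so writing to a fresh, empty directory is a quick way to rule out stale part files left by an earlier run.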