Re: Error in using saveAsParquetFile

2015-06-09 Thread Bipin Nag
Cheng, you were right. It works when I remove the field from either one. I
should have checked the types beforehand. What confused me is that Spark
attempted the join and threw the error midway; it isn't quite there yet.
Thanks for the help.
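
For anyone who finds this thread later: instead of dropping the field, you
can also cast one side so the types agree before the join. A minimal sketch,
assuming the conflicting column is PolicyType (as suspected below) and that
Bookings is the side holding the int:

  import org.apache.spark.sql.functions.col
  // Sketch only: cast the int PolicyType to string so both sides match,
  // then run the same join as before.
  val bookingsFixed = Bookings.withColumn(
    "PolicyType", col("PolicyType").cast("string"))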

On Mon, Jun 8, 2015 at 8:29 PM Cheng Lian lian.cs@gmail.com wrote:

  I suspect that Bookings and Customerdetails both have a PolicyType field;
 one is a string and the other is an int.


 Cheng



Re: Error in using saveAsParquetFile

2015-06-09 Thread Cheng Lian
Yeah, this does look confusing. We are trying to improve the error
reporting by catching similar issues at the end of the analysis phase
and giving more descriptive error messages. Part of that work can be found
here:
https://github.com/apache/spark/blob/0902a11940e550e85a53e110b490fe90e16ddaf4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
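
Until that lands, here is a user-level sketch of the same idea: compare a
DataFrame's schema against an expected schema before writing, so a mismatch
fails fast with a readable message. The helper name and signature are my
own, not a Spark API:

  import org.apache.spark.sql.DataFrame
  import org.apache.spark.sql.types.StructType

  // Sketch: fail early if any shared column has conflicting types.
  def checkCompatible(df: DataFrame, expected: StructType): Unit = {
    val mismatches = for {
      f <- df.schema.fields
      e <- expected.fields.find(_.name == f.name)
      if e.dataType != f.dataType
    } yield s"${f.name}: ${f.dataType} != ${e.dataType}"
    require(mismatches.isEmpty,
      "incompatible types: " + mismatches.mkString(", "))
  }

Calling something like checkCompatible(r2, existing.schema) before the save
would surface the PolicyType conflict before any tasks run.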


Cheng

On 6/9/15 2:51 PM, Bipin Nag wrote:
Cheng, you were right. It works when I remove the field from either
one. I should have checked the types beforehand. What confused me is
that Spark attempted the join and threw the error midway; it isn't
quite there yet. Thanks for the help.


Re: Error in using saveAsParquetFile

2015-06-08 Thread Jeetendra Gangele
When are you loading these Parquet files?
Can you please share the code where you are passing the Parquet file to Spark?

On 8 June 2015 at 16:39, Cheng Lian lian.cs@gmail.com wrote:

 Are you appending the joined DataFrame, whose PolicyType is a string, to an
 existing Parquet file whose PolicyType is an int? The exception indicates
 that Parquet found a column with conflicting data types.

 Cheng



-- 
Hi,

Please find my resume attached. I have around 7 years of work experience.
I worked for Amazon and Expedia in my previous assignments, and currently I
am working with a start-up technology company called InsideView in Hyderabad.

Regards
Jeetendra


Re: Error in using saveAsParquetFile

2015-06-08 Thread Cheng Lian
Are you appending the joined DataFrame, whose PolicyType is a string, to an
existing Parquet file whose PolicyType is an int? The exception indicates
that Parquet found a column with conflicting data types.
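
A minimal way to reproduce the kind of conflict the exception describes, in
case it helps anyone compare against their own job (the path and data here
are made up for illustration, using the 1.3/1.4-era API seen in this thread):

  import org.apache.spark.sql.{Row, SaveMode}
  import org.apache.spark.sql.types._

  val path = "/tmp/policytype-repro"  // hypothetical path
  // Write PolicyType as a string first...
  sqlContext.createDataFrame(
    sc.parallelize(Seq(Row("A"), Row("B"))),
    StructType(Seq(StructField("PolicyType", StringType)))
  ).saveAsParquetFile(path)
  // ...then append PolicyType as an int to the same location. This second
  // write should hit the same "incompatible types" check quoted below.
  sqlContext.createDataFrame(
    sc.parallelize(Seq(Row(1), Row(2))),
    StructType(Seq(StructField("PolicyType", IntegerType)))
  ).save(path, "parquet", SaveMode.Append)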


Cheng

On 6/8/15 5:29 PM, bipin wrote:

Hi, I get this error message when saving a table:

parquet.io.ParquetDecodingException: The requested schema is not compatible
with the file schema. incompatible types: optional binary PolicyType (UTF8)
!= optional int32 PolicyType
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:105)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:97)
    at parquet.schema.PrimitiveType.accept(PrimitiveType.java:386)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:87)
    at parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:61)
    at parquet.schema.MessageType.accept(MessageType.java:55)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:148)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:137)
    at parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:157)
    at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:107)
    at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:94)
    at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:64)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:282)
    at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:252)
    at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:667)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

I joined two tables, both loaded from Parquet files; the joined table throws
this error when saved. I could not find anything about this error. Could
this be a bug?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Error-in-using-saveAsParquetFile-tp23204.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Error in using saveAsParquetFile

2015-06-08 Thread Bipin Nag
Hi Jeetendra, Cheng,

I am using the following code for the join:

val Bookings = sqlContext.load("/home/administrator/stageddata/Bookings")
val Customerdetails =
  sqlContext.load("/home/administrator/stageddata/Customerdetails")

val CD = Customerdetails.
  where($"CreatedOn" > "2015-04-01 00:00:00.0").
  where($"CreatedOn" < "2015-05-01 00:00:00.0")

// Bookings by CD
val r1 = Bookings.
  withColumnRenamed("ID", "ID2")
val r2 = CD.
  join(r1, CD.col("CustomerID") === r1.col("ID2"), "left")

r2.saveAsParquetFile("/home/administrator/stageddata/BOOKING_FULL")

@Cheng I am not appending the joined table to an existing Parquet file; it
is a new file.
@Jitender I have a rather large Parquet file and it also contains some
confidential data. Can you tell me what you need to check in it?
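
If it is only the schemas that matter, one option that avoids sharing any
data is to post the printSchema() output of both tables. A small sketch,
reusing the paths from the code above:

  // Print both schemas and compare the join/conflict columns by eye;
  // no rows leave the cluster.
  sqlContext.load("/home/administrator/stageddata/Bookings").printSchema()
  sqlContext.load("/home/administrator/stageddata/Customerdetails").printSchema()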

Thanks


On 8 June 2015 at 16:47, Jeetendra Gangele gangele...@gmail.com wrote:

 When are you loading these Parquet files?
 Can you please share the code where you are passing the Parquet file to Spark?




Re: Error in using saveAsParquetFile

2015-06-08 Thread Cheng Lian
I suspect that Bookings and Customerdetails both have a PolicyType
field; one is a string and the other is an int.
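
A quick way to confirm which shared columns disagree (a sketch using the
table names from your code; dtypes returns (columnName, typeName) pairs):

  // Diff the types of the columns both tables have in common.
  val bTypes = Bookings.dtypes.toMap
  val cTypes = Customerdetails.dtypes.toMap
  val conflicts = bTypes.keySet.intersect(cTypes.keySet)
    .filter(k => bTypes(k) != cTypes(k))
  conflicts.foreach(k => println(s"$k: ${bTypes(k)} vs ${cTypes(k)}"))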


Cheng
