Re: How to create Spark DataFrame using custom Hadoop InputFormat?

2015-07-31 Thread Umesh Kacha
Hi, thanks. Void works: I use the same custom format in Hive and it works
with Void as the key. Please share an example, if you have one, of creating
a DataFrame using a custom Hadoop format.
On Aug 1, 2015 2:07 AM, Ted Yu yuzhih...@gmail.com wrote:

 I don't think using Void class is the right choice - it is not even a
 Writable.

 BTW in the future, capture text output instead of an image.

 Thanks

 On Fri, Jul 31, 2015 at 12:35 PM, Umesh Kacha umesh.ka...@gmail.com
 wrote:

 Hi Ted, thanks. My key is always Void because my custom format file is
 non-splittable, so the key is Void and the value is MyRecordWritable, which
 extends Hadoop Writable. I am sharing my log as a screenshot; please don't
 mind, as I can't paste code outside.

 Regards,
 Umesh

 On Sat, Aug 1, 2015 at 12:59 AM, Ted Yu yuzhih...@gmail.com wrote:

 Looking closer at the code you posted, the error likely was caused by
 the 3rd parameter: Void.class

 It is supposed to be the class of the key.

 FYI

 On Fri, Jul 31, 2015 at 11:24 AM, unk1102 umesh.ka...@gmail.com wrote:

 Hi, I have my own custom Hadoop InputFormat which I need to use to
 create a DataFrame. I tried the following:

 JavaPairRDD<Void, MyRecordWritable> myFormatAsPairRdd =
     jsc.hadoopFile("hdfs://tmp/data/myformat.xyz", MyInputFormat.class,
                    Void.class, MyRecordWritable.class);
 JavaRDD<MyRecordWritable> myformatRdd = myFormatAsPairRdd.values();
 DataFrame myFormatAsDataframe =
     sqlContext.createDataFrame(myformatRdd, MyFormatSchema.class);
 myFormatAsDataframe.show();

 The above code does not work and throws an exception saying
 java.lang.IllegalArgumentException: object is not an instance of declaring
 class

 My custom Hadoop InputFormat works very well with Hive, MapReduce, etc. How
 do I make it work with Spark? Please guide me; I am new to Spark. Thanks in
 advance.
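
The thread does not confirm a root cause, so the following is only one plausible reading, offered as an assumption. The bean-based createDataFrame call is given MyFormatSchema.class, while the RDD elements are MyRecordWritable objects. If rows are read by reflectively calling the bean class's getters on each element, Java's Method.invoke throws IllegalArgumentException whenever the element is not an instance of the class that declares the getter. The plain-Java sketch below needs no Spark; MyRecord, MySchemaBean, readViaGetters and toBean are hypothetical stand-ins, and the getter scan is a simplified imitation rather than Spark's actual code.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class BeanGetterDemo {

    // Hypothetical stand-in for the value type produced by the custom InputFormat.
    public static class MyRecord {
        final String name;
        final int count;

        public MyRecord(String name, int count) {
            this.name = name;
            this.count = count;
        }
    }

    // Hypothetical JavaBean playing the role of MyFormatSchema (public getters).
    public static class MySchemaBean {
        private final String name;
        private final int count;

        public MySchemaBean(String name, int count) {
            this.name = name;
            this.count = count;
        }

        public String getName() { return name; }
        public int getCount() { return count; }
    }

    // Calls every public zero-argument getter of beanClass on element and
    // collects the results keyed by getter name (assumed, simplified behavior).
    static Map<String, Object> readViaGetters(Object element, Class<?> beanClass) {
        Map<String, Object> values = new TreeMap<>();
        for (Method m : beanClass.getMethods()) {
            boolean isGetter = m.getName().startsWith("get")
                    && m.getParameterCount() == 0
                    && !m.getName().equals("getClass");
            if (!isGetter) {
                continue;
            }
            try {
                // Throws IllegalArgumentException if element is not a beanClass instance.
                values.put(m.getName(), m.invoke(element));
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }

    // Copies a record into the bean type so the getters apply to it.
    static MySchemaBean toBean(MyRecord record) {
        return new MySchemaBean(record.name, record.count);
    }

    // Returns true if reading element through beanClass's getters is rejected.
    static boolean failsWithIllegalArgument(Object element, Class<?> beanClass) {
        try {
            readViaGetters(element, beanClass);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MyRecord record = new MyRecord("a", 1);
        // Element type and bean class differ: prints true
        System.out.println(failsWithIllegalArgument(record, MySchemaBean.class));
        // Element converted to the bean type first: prints {getCount=1, getName=a}
        System.out.println(readViaGetters(toBean(record), MySchemaBean.class));
    }
}
```

If this reading applies, the Spark-side equivalent would be to map the values RDD to MyFormatSchema beans before calling createDataFrame. Whether the Void key class also needs to change is a separate question raised in Ted's replies.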




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-Spark-DataFrame-using-custom-Hadoop-InputFormat-tp24101.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org







Re: How to create Spark DataFrame using custom Hadoop InputFormat?

2015-07-31 Thread Ted Yu
Can you pastebin the complete stack trace?

If you can show a skeleton of MyInputFormat and MyRecordWritable, that would
provide additional information as well.

Cheers


Re: How to create Spark DataFrame using custom Hadoop InputFormat?

2015-07-31 Thread Umesh Kacha
Hi Ted, thanks much for the reply. I can't share code on a public forum. I
have created a custom format by extending the Hadoop mapred InputFormat class
and, in the same way, the RecordReader class. If you can help me use the same
in a DataFrame, it would be very helpful.


Re: How to create Spark DataFrame using custom Hadoop InputFormat?

2015-07-31 Thread Ted Yu
I don't think using Void class is the right choice - it is not even a
Writable.

BTW in the future, capture text output instead of an image.

Thanks


Re: How to create Spark DataFrame using custom Hadoop InputFormat?

2015-07-31 Thread Ted Yu
Looking closer at the code you posted, the error likely was caused by the
3rd parameter: Void.class

It is supposed to be the class of the key.

FYI
