Re: How to create Spark DataFrame using custom Hadoop InputFormat?
Hi, thanks. Void works: I use the same custom format in Hive, and it works there with Void as the key. Please share an example, if you have one, of creating a DataFrame from a custom Hadoop format.

On Aug 1, 2015 2:07 AM, Ted Yu yuzhih...@gmail.com wrote:
I don't think using the Void class is the right choice; it is not even a Writable. BTW, in the future, capture the text output instead of an image. Thanks

On Fri, Jul 31, 2015 at 12:35 PM, Umesh Kacha umesh.ka...@gmail.com wrote:
Hi Ted, thanks. My key is always Void because my custom format file is non-splittable, so the key is Void and the value is MyRecordWritable, which extends Hadoop Writable. I am sharing my log as a screenshot; please don't mind, as I can't paste code outside.
Regards,
Umesh

On Sat, Aug 1, 2015 at 12:59 AM, Ted Yu yuzhih...@gmail.com wrote:
Looking closer at the code you posted, the error was likely caused by the 3rd parameter, Void.class. It is supposed to be the class of the key. FYI

On Fri, Jul 31, 2015 at 11:24 AM, unk1102 umesh.ka...@gmail.com wrote:
Hi, I have my own custom Hadoop InputFormat, which I need to use to create a DataFrame. I tried the following:

    JavaPairRDD<Void, MyRecordWritable> myFormatAsPairRdd = jsc.hadoopFile("hdfs://tmp/data/myformat.xyz", MyInputFormat.class, Void.class, MyRecordWritable.class);
    JavaRDD<MyRecordWritable> myformatRdd = myFormatAsPairRdd.values();
    DataFrame myFormatAsDataframe = sqlContext.createDataFrame(myformatRdd, MyFormatSchema.class);
    myFormatAsDataframe.show();

The above code does not work; it throws an exception saying:

    java.lang.IllegalArgumentException: object is not an instance of declaring class

My custom Hadoop InputFormat works very well with Hive, MapReduce, etc. How do I make it work with Spark? Please guide me; I am new to Spark. Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-Spark-DataFrame-using-custom-Hadoop-InputFormat-tp24101.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
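Reading the thread, the Void key may not be the culprit. "object is not an instance of declaring class" is the message Java 8's Method.invoke uses when a method is called on an object of the wrong class. In Spark 1.x, createDataFrame(JavaRDD<?>, Class<?>) introspects the bean class and, as far as I can tell from its source, calls that class's getters on every RDD element, so the elements must already be MyFormatSchema instances; in the posted code they are MyRecordWritable objects. The stdlib-only sketch below reproduces that mechanism without Spark. RecordWritable, SchemaBean, their fields, toBean, and readProperties are hypothetical names invented for illustration (stand-ins for MyRecordWritable, MyFormatSchema, and Spark's internal getter calls), not Spark or Hadoop APIs.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class BeanReflectionDemo {

    // Hypothetical stand-in for MyRecordWritable (the real one would be a Hadoop Writable).
    public static class RecordWritable {
        private final String name;
        private final int count;
        public RecordWritable(String name, int count) { this.name = name; this.count = count; }
        public String getName() { return name; }
        public int getCount() { return count; }
    }

    // Hypothetical stand-in for MyFormatSchema: a JavaBean with a no-arg constructor, getters and setters.
    public static class SchemaBean {
        private String name;
        private int count;
        public SchemaBean() {}
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
    }

    // Hypothetical converter: builds a new bean from one record (what a map step before createDataFrame would do).
    public static SchemaBean toBean(RecordWritable record) {
        SchemaBean bean = new SchemaBean();
        bean.setName(record.getName());
        bean.setCount(record.getCount());
        return bean;
    }

    // Mimics bean-based DataFrame creation: invoke every getter of beanClass on the given element.
    public static List<Object> readProperties(Object element, Class<?> beanClass)
            throws IntrospectionException, IllegalAccessException, InvocationTargetException {
        BeanInfo info = Introspector.getBeanInfo(beanClass);
        List<Object> values = new ArrayList<>();
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            if (pd.getName().equals("class")) continue; // skip Object.getClass()
            Method getter = pd.getReadMethod();
            if (getter == null) continue;               // skip write-only properties
            values.add(getter.invoke(element));         // throws IllegalArgumentException on a wrong element type
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        RecordWritable record = new RecordWritable("a", 1);
        try {
            readProperties(record, SchemaBean.class); // raw record: wrong class for SchemaBean's getters
        } catch (IllegalArgumentException e) {
            System.out.println("raw record fails: " + e.getMessage());
        }
        System.out.println("converted bean: " + readProperties(toBean(record), SchemaBean.class));
    }
}
```

If this diagnosis is right, the Spark-side change would be to convert the values before building the DataFrame, for example JavaRDD<MyFormatSchema> beanRdd = myFormatAsPairRdd.values().map(w -> toBean(w)); followed by sqlContext.createDataFrame(beanRdd, MyFormatSchema.class), where toBean is your own conversion from MyRecordWritable (a Java 8 lambda is used here) and MyFormatSchema follows the JavaBean conventions described in the Spark SQL programming guide (Serializable, with getters and setters). This is an untested sketch; mapping each record to a Row and calling createDataFrame(JavaRDD<Row>, StructType) is an alternative that avoids the bean class altogether.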
Re: How to create Spark DataFrame using custom Hadoop InputFormat?
Can you pastebin the complete stack trace? If you can show a skeleton of MyInputFormat and MyRecordWritable, that would provide additional information as well.

Cheers
Re: How to create Spark DataFrame using custom Hadoop InputFormat?
Hi Ted, thanks very much for the reply. I can't share the code on a public forum. I created the custom format by extending the Hadoop mapred InputFormat class, and the RecordReader class in the same way. If you can help me with how to use it to create a DataFrame, that would be very helpful.
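One caveat with a custom mapred RecordReader: the Spark API documentation for hadoopFile notes that Hadoop's RecordReader reuses the same Writable object for each record, so directly caching the returned RDD, or passing it to an aggregation or shuffle, creates many references to the same object; it recommends copying the values with a map function first. Mapping each value into a new object before building a DataFrame performs that copy. The stdlib-only sketch below imitates the reuse pattern (one value object allocated up front and refilled for every record, loosely modeled on the mapred createValue()/next(key, value) contract). MutableRecord, RecordCopy, ReusingReader, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {

    // Hypothetical mutable record, standing in for a Writable that the reader refills in place.
    public static class MutableRecord {
        String name;
        void set(String name) { this.name = name; }
    }

    // Hypothetical immutable copy, standing in for a bean built in a map step.
    public static final class RecordCopy {
        final String name;
        RecordCopy(MutableRecord r) { this.name = r.name; }
    }

    // Imitates a reader that allocates one value object and overwrites it for every record.
    public static class ReusingReader {
        private final Iterator<String> source;
        private final MutableRecord value = new MutableRecord(); // allocated once

        public ReusingReader(List<String> data) { this.source = data.iterator(); }

        // Returns the same object each time, refilled with the next record, or null at end of input.
        public MutableRecord next() {
            if (!source.hasNext()) return null;
            value.set(source.next());
            return value;
        }
    }

    // Keeps the references the reader hands out, as caching the raw values would.
    public static List<String> collectWithoutCopy(List<String> data) {
        ReusingReader reader = new ReusingReader(data);
        List<MutableRecord> kept = new ArrayList<>();
        for (MutableRecord r = reader.next(); r != null; r = reader.next()) kept.add(r);
        List<String> names = new ArrayList<>();
        for (MutableRecord r : kept) names.add(r.name);
        return names;
    }

    // Copies each record into a fresh object before keeping it.
    public static List<String> collectWithCopy(List<String> data) {
        ReusingReader reader = new ReusingReader(data);
        List<RecordCopy> kept = new ArrayList<>();
        for (MutableRecord r = reader.next(); r != null; r = reader.next()) kept.add(new RecordCopy(r));
        List<String> names = new ArrayList<>();
        for (RecordCopy c : kept) names.add(c.name);
        return names;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        System.out.println("without copy: " + collectWithoutCopy(data)); // [c, c, c]
        System.out.println("with copy: " + collectWithCopy(data));       // [a, b, c]
    }
}
```

Without the copy, every kept reference points at the single reused object, so all entries show the last record read.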
Re: How to create Spark DataFrame using custom Hadoop InputFormat?
I don't think using the Void class is the right choice; it is not even a Writable.

BTW, in the future, capture the text output instead of an image.

Thanks
Re: How to create Spark DataFrame using custom Hadoop InputFormat?
Looking closer at the code you posted, the error was likely caused by the 3rd parameter, Void.class. It is supposed to be the class of the key.

FYI