Re: How to create DataFrame from a binary file?
Umesh:

Please take a look at the classes under:

    sql/core/src/main/scala/org/apache/spark/sql/parquet

FYI

On Mon, Aug 10, 2015 at 10:35 AM, Umesh Kacha umesh.ka...@gmail.com wrote:
> Hi Bo, thanks much. Let me explain; please see the following code:
>
>     JavaPairRDD<String, PortableDataStream> pairRdd = javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>     JavaRDD<PortableDataStream> javardd = pairRdd.values();
>     DataFrame binDataFrame = sqlContext.createDataFrame(javardd, PortableDataStream.class);
>     binDataFrame.show(); // shows just one row with the above file path /hdfs/path/to/binfile
>
> I want the binary data from the above file in a DataFrame so that I can directly do analytics on it. My data is binary, so I can't use StructType with primitive data types, right, since everything is binary/bytes? My custom binary data format is the same as Parquet. I did not find any good example of where/how Parquet is read into a DataFrame. Please guide.
>
> On Sun, Aug 9, 2015 at 11:52 PM, bo yang bobyan...@gmail.com wrote:
>> Well, my post uses a raw-text JSON file to show how to create a data frame with a custom data schema. The key idea is to show the flexibility to deal with any data format by using your own schema. Sorry if I did not make that fully clear. Anyway, let us know once you figure out your problem.
>>
>> On Sun, Aug 9, 2015 at 11:10 AM, Umesh Kacha umesh.ka...@gmail.com wrote:
>>> Hi Bo, I know how to create a DataFrame. My question is how to create a DataFrame from binary files, and your blog covers raw-text JSON files. Please read my question properly, thanks.
>>>
>>> On Sun, Aug 9, 2015 at 11:21 PM, bo yang bobyan...@gmail.com wrote:
>>>> You can create your own data schema (StructType in Spark) and use the following method to create a data frame with your own data schema:
>>>>
>>>>     sqlContext.createDataFrame(yourRDD, structType);
>>>>
>>>> I wrote a post on how to do it. You can also get the sample code there:
>>>>
>>>> Light-Weight Self-Service Data Query through Spark SQL:
>>>> https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang
>>>>
>>>> Take a look and feel free to let me know if you have any questions.
>>>>
>>>> Best,
>>>> Bo
>>>>
>>>> On Sat, Aug 8, 2015 at 1:42 PM, unk1102 umesh.ka...@gmail.com wrote:
>>>>> Hi, how do we create a DataFrame from a binary file stored in HDFS? I was thinking of using:
>>>>>
>>>>>     JavaPairRDD<String, PortableDataStream> pairRdd = javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
>>>>>     JavaRDD<PortableDataStream> javardd = pairRdd.values();
>>>>>
>>>>> I can see that PortableDataStream has a method called toArray which converts it into a byte array. I was thinking, if I have a JavaRDD<byte[]>, can I call the following and get a DataFrame?
>>>>>
>>>>>     DataFrame binDataFrame = sqlContext.createDataFrame(javaBinRdd, Byte.class);
>>>>>
>>>>> Please guide, I am new to Spark. I have my own custom format, which is binary, and I was thinking that if I can convert my custom format into a DataFrame using binary operations, then I don't need to create my own custom Hadoop format. Am I on the right track? Will reading binary data into a DataFrame scale?
>>>>>
>>>>> --
>>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-DataFrame-from-a-binary-file-tp24179.html
>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
Re: How to create DataFrame from a binary file?
Hi Bo, thanks much. Let me explain; please see the following code:

    JavaPairRDD<String, PortableDataStream> pairRdd = javaSparkContext.binaryFiles("/hdfs/path/to/binfile");
    JavaRDD<PortableDataStream> javardd = pairRdd.values();
    DataFrame binDataFrame = sqlContext.createDataFrame(javardd, PortableDataStream.class);
    binDataFrame.show(); // shows just one row with the above file path /hdfs/path/to/binfile

I want the binary data from the above file in a DataFrame so that I can directly do analytics on it. My data is binary, so I can't use StructType with primitive data types, right, since everything is binary/bytes? My custom binary data format is the same as Parquet. I did not find any good example of where/how Parquet is read into a DataFrame. Please guide.
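A note on the single row reported by binDataFrame.show(): binaryFiles yields one (path, PortableDataStream) pair per file, so a DataFrame built directly from those values has one row per file, not one row per record. To get typed columns, the file's bytes have to be split into records and decoded first. The following is only a minimal sketch of that decoding step, using the JDK alone so it runs without Spark. The record layout (a 4-byte big-endian int followed by an 8-byte double) and the class name BinaryRecordSplitter are assumptions made for illustration, not the poster's actual format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-width record: 4-byte int id followed by 8-byte double value. */
final class BinaryRecordSplitter {
    // Assumed layout, not taken from the thread: 4 + 8 = 12 bytes per record.
    static final int RECORD_SIZE = Integer.BYTES + Double.BYTES;

    /** Decodes every record in a whole-file byte array into {id, value} pairs. */
    static List<Object[]> decode(byte[] fileBytes) {
        if (fileBytes.length % RECORD_SIZE != 0) {
            throw new IllegalArgumentException(
                "file length is not a multiple of " + RECORD_SIZE);
        }
        ByteBuffer buf = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
        List<Object[]> rows = new ArrayList<>();
        while (buf.remaining() >= RECORD_SIZE) {
            int id = buf.getInt();          // first 4 bytes of the record
            double value = buf.getDouble(); // next 8 bytes of the record
            rows.add(new Object[] {id, value});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Build a two-record sample "file" in memory instead of reading from HDFS.
        ByteBuffer sample = ByteBuffer.allocate(2 * RECORD_SIZE);
        sample.putInt(1).putDouble(2.5).putInt(2).putDouble(-4.0);
        for (Object[] row : decode(sample.array())) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

In a Spark job, the same decode could presumably be applied to the result of PortableDataStream.toArray() inside a flatMap, with each Object[] wrapped by RowFactory.create(...) and passed, together with a matching StructType, to sqlContext.createDataFrame. Since toArray reads the whole file into memory, this assumes each file is small enough to fit on one executor.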
Re: How to create DataFrame from a binary file?
You can create your own data schema (StructType in Spark) and use the following method to create a data frame with your own data schema:

    sqlContext.createDataFrame(yourRDD, structType);

I wrote a post on how to do it. You can also get the sample code there:

Light-Weight Self-Service Data Query through Spark SQL:
https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang

Take a look and feel free to let me know if you have any questions.

Best,
Bo
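The custom-schema suggestion comes down to describing the columns once and letting that description drive how each record is turned into a row. Here is a minimal JDK-only sketch of that idea; it is not Spark code. Field and FieldType are hypothetical stand-ins for StructField and Spark's data types, and the field names and types used in main are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

final class SchemaDrivenDecoder {
    /** Field types supported by this sketch (hypothetical, not Spark's DataTypes). */
    enum FieldType { INT, LONG, DOUBLE }

    /** A named, typed column: a stand-in for one StructField in a StructType. */
    static final class Field {
        final String name;
        final FieldType type;

        Field(String name, FieldType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Reads one value per schema field, in order, from the buffer's current position. */
    static Object[] decodeRow(ByteBuffer buf, List<Field> schema) {
        Object[] row = new Object[schema.size()];
        for (int i = 0; i < row.length; i++) {
            switch (schema.get(i).type) {
                case INT:    row[i] = buf.getInt();    break;
                case LONG:   row[i] = buf.getLong();   break;
                case DOUBLE: row[i] = buf.getDouble(); break;
                default: throw new IllegalStateException("unsupported field type");
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Invented example schema: int id, long timestamp, double reading (20 bytes).
        List<Field> schema = Arrays.asList(
            new Field("id", FieldType.INT),
            new Field("timestamp", FieldType.LONG),
            new Field("reading", FieldType.DOUBLE));
        ByteBuffer buf = ByteBuffer.allocate(20);
        buf.putInt(42).putLong(1439000000000L).putDouble(3.5);
        buf.flip(); // switch the buffer from writing to reading
        System.out.println(Arrays.toString(decodeRow(buf, schema)));
    }
}
```

In an actual job, each decoded Object[] would presumably be wrapped with RowFactory.create(...), and a StructType listing the same fields in the same order would be passed to sqlContext.createDataFrame(yourRDD, structType). Keeping the decoder and the schema in one place makes it harder for the two to drift apart.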
Re: How to create DataFrame from a binary file?
Hi Bo, I know how to create a DataFrame. My question is how to create a DataFrame from binary files, and your blog covers raw-text JSON files. Please read my question properly, thanks.
Re: How to create DataFrame from a binary file?
Well, my post uses a raw-text JSON file to show how to create a data frame with a custom data schema. The key idea is to show the flexibility to deal with any data format by using your own schema. Sorry if I did not make that fully clear. Anyway, let us know once you figure out your problem.