Re: spark reshape hive table and save to parquet
Hi Divya,

Thanks, it is exactly what I am looking for!

Anton

On Wed, Dec 14, 2016 at 6:01 PM, Divya Gehlot wrote:
> you can use udfs to do it
> http://stackoverflow.com/questions/31615657/how-to-add-a-new-struct-column-to-a-dataframe
>
> Hope it will help.
>
> Thanks,
> Divya
Re: spark reshape hive table and save to parquet
You can use UDFs to do it:
http://stackoverflow.com/questions/31615657/how-to-add-a-new-struct-column-to-a-dataframe

Hope it will help.

Thanks,
Divya

On 9 December 2016 at 00:53, Anton Kravchenko wrote:
> Hello,
>
> I wonder if there is a way (preferably efficient) in Spark to reshape hive
> table and save it to parquet.
>
> Here is a minimal example, input hive table:
> col1 col2 col3
> 1    2    3
> 4    5    6
>
> output parquet:
> col1 newcol2
> 1    [2 3]
> 4    [5 6]
>
> p.s. The real input hive table has ~1000 columns.
>
> Thank you,
> Anton
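For readers following the UDF suggestion, here is a minimal sketch of what it could look like for the three-column example. It assumes Spark 2.x with a local session; the names `UdfStructSketch`, `Packed`, `pack` and `reshape` are made up for illustration and are not taken from the linked answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfStructSketch {
  // Hypothetical case class; its field names become the struct's field names.
  case class Packed(col2: Int, col3: Int)

  // UDF that packs two integer columns into one struct value.
  val pack = udf((a: Int, b: Int) => Packed(a, b))

  // Keep col1 and replace col2/col3 with a single struct column "newcol2".
  def reshape(df: DataFrame): DataFrame =
    df.select(col("col1"), pack(col("col2"), col("col3")).as("newcol2"))

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only; a real job would use its own session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-struct-sketch")
      .getOrCreate()
    import spark.implicits._

    val input = Seq((1, 2, 3), (4, 5, 6)).toDF("col1", "col2", "col3")
    reshape(input).printSchema()
    spark.stop()
  }
}
```

Note that the typed `udf` helpers accept only a small, fixed number of arguments, so this form probably does not carry over directly to a table with ~1000 columns.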
Re: spark reshape hive table and save to parquet
I am looking for something like:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types._

    // prepare input data
    val input_schema = StructType(Seq(
      StructField("col1", IntegerType),
      StructField("col2", IntegerType),
      StructField("col3", IntegerType)))
    val input_data = spark.createDataFrame(
      sc.parallelize(Seq(
        Row(1, 2, 3),
        Row(4, 5, 6))),
      input_schema)

    // reshape input dataframe according to the output_schema and save to parquet
    val output_schema = StructType(Seq(
      StructField("col1", IntegerType),
      StructField("newcol2", StructType(Seq(
        StructField("col2", IntegerType),
        StructField("col3", IntegerType))))))

    val output_data = spark.createDataFrame(input_data, output_schema)  // does not work
    output_data.write.parquet("output_data.parquet")
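One way the reshape above might be expressed is with the built-in `struct` function rather than a second `createDataFrame` call. This is only a sketch, assuming Spark 2.x and a local session; `StructReshapeSketch`, `reshape`, `keyCol` and `nestedName` are illustrative names, and the output path is the one from the message.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object StructReshapeSketch {
  // Keep `keyCol` as is and nest every other column under one struct column.
  // The column list is taken from the DataFrame, so nothing is hard-coded.
  def reshape(df: DataFrame, keyCol: String, nestedName: String): DataFrame = {
    val rest = df.columns.filter(_ != keyCol).map(col)
    df.select(col(keyCol), struct(rest: _*).as(nestedName))
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-reshape-sketch")
      .getOrCreate()
    import spark.implicits._

    val input = Seq((1, 2, 3), (4, 5, 6)).toDF("col1", "col2", "col3")
    val output = reshape(input, "col1", "newcol2")
    output.printSchema()

    // Write the nested result as Parquet, replacing any previous output.
    output.write.mode("overwrite").parquet("output_data.parquet")
    spark.stop()
  }
}
```

For a real Hive table the input would presumably come from `spark.table(...)` on a session built with `enableHiveSupport()` instead of the in-memory `Seq`. Because the nested columns are derived from `df.columns`, the same call should apply to a wide table without listing the columns by hand.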
Re: spark reshape hive table and save to parquet
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html

Anton Kravchenko wrote on Thu, Dec 8, 2016 at 17:53:
> Hello,
>
> I wonder if there is a way (preferably efficient) in Spark to reshape hive
> table and save it to parquet.
>
> Here is a minimal example, input hive table:
> col1 col2 col3
> 1    2    3
> 4    5    6
>
> output parquet:
> col1 newcol2
> 1    [2 3]
> 4    [5 6]
>
> p.s. The real input hive table has ~1000 columns.
>
> Thank you,
> Anton
spark reshape hive table and save to parquet
Hello,

I wonder if there is a way (preferably efficient) in Spark to reshape a Hive table and save it to Parquet.

Here is a minimal example. Input Hive table:

col1 col2 col3
1    2    3
4    5    6

Output Parquet:

col1 newcol2
1    [2 3]
4    [5 6]

P.S. The real input Hive table has ~1000 columns.

Thank you,
Anton