Re: spark reshape hive table and save to parquet

2016-12-15 Thread Anton Kravchenko
Hi Divya,

Thanks, it is exactly what I am looking for!

Anton

On Wed, Dec 14, 2016 at 6:01 PM, Divya Gehlot wrote:

> You can use UDFs to do it:
> http://stackoverflow.com/questions/31615657/how-to-add-a-new-struct-column-to-a-dataframe
>
> Hope this helps.
>
>
> Thanks,
> Divya


Re: spark reshape hive table and save to parquet

2016-12-14 Thread Divya Gehlot
You can use UDFs to do it:
http://stackoverflow.com/questions/31615657/how-to-add-a-new-struct-column-to-a-dataframe

Hope this helps.


Thanks,
Divya

On 9 December 2016 at 00:53, Anton Kravchenko wrote:

> Hello,
>
> I wonder if there is a (preferably efficient) way in Spark to reshape a Hive
> table and save it to Parquet.
>
> Here is a minimal example. Input Hive table:
> col1 col2 col3
> 1    2    3
> 4    5    6
>
> Output Parquet:
> col1 newcol2
> 1    [2 3]
> 4    [5 6]
>
> P.S. The real input Hive table has ~1000 columns.
>
> Thank you,
> Anton
>
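Following the UDF suggestion above, a minimal sketch, assuming Spark 2.x or later with `spark-sql` on the classpath; the case class `NewCol2`, the object `ReshapeWithUdf`, and its members are hypothetical names for illustration. A Scala UDF whose return type is a case class yields a struct column whose fields are the case class fields.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical case class for the nested column; its field names
// (col2, col3) become the field names of the resulting struct.
case class NewCol2(col2: Int, col3: Int)

object ReshapeWithUdf {
  // UDF returning a case class, which Spark maps to a struct column.
  val toNewCol2 = udf((a: Int, b: Int) => NewCol2(a, b))

  // Keep col1 and replace col2/col3 with a single struct column newcol2.
  def reshape(df: DataFrame): DataFrame =
    df.select(col("col1"), toNewCol2(col("col2"), col("col3")).as("newcol2"))

  def main(args: Array[String]): Unit = {
    // Local session for trying this out; in spark-shell, use the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("reshape-udf").getOrCreate()
    import spark.implicits._

    val input = Seq((1, 2, 3), (4, 5, 6)).toDF("col1", "col2", "col3")
    val output = reshape(input)
    output.printSchema()
    output.write.mode("overwrite").parquet("output_data.parquet")
    spark.stop()
  }
}
```

A UDF takes a fixed number of arguments, so this does not stretch to ~1000 columns; for that, the built-in `struct` function in `org.apache.spark.sql.functions`, which accepts any number of columns, is probably the better fit.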


Re: spark reshape hive table and save to parquet

2016-12-14 Thread Anton Kravchenko
I am looking for something like:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// prepare input data (in spark-shell, where `spark` and `sc` are predefined)
val input_schema = StructType(Seq(
  StructField("col1", IntegerType),
  StructField("col2", IntegerType),
  StructField("col3", IntegerType)))
val input_data = spark.createDataFrame(
  sc.parallelize(Seq(
    Row(1, 2, 3),
    Row(4, 5, 6))),
  input_schema)

// reshape input dataframe according to the output_schema and save to parquet
val output_schema = StructType(Seq(
  StructField("col1", IntegerType),
  StructField("newcol2", StructType(Seq(
    StructField("col2", IntegerType),
    StructField("col3", IntegerType))))))
// does not work (createDataFrame has no overload taking a DataFrame and a schema)
val output_data = spark.createDataFrame(input_data, output_schema)
output_data.write.parquet("output_data.parquet")
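The nested layout of `output_schema` can likely be produced by building the column with the built-in `struct` function rather than re-applying a schema to existing rows. A minimal sketch, assuming Spark 2.x or later; the object `ReshapeToStruct` and the helper `nestColumns` are hypothetical. The columns to nest are taken from `df.columns`, so the same call should also work on the ~1000-column Hive table (e.g. a DataFrame obtained with `spark.table`) without spelling out each name.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object ReshapeToStruct {
  // Hypothetical helper: keep `keyCol` as-is and pack every other column,
  // in its original order, into one struct column named `nestedName`.
  def nestColumns(df: DataFrame, keyCol: String, nestedName: String): DataFrame = {
    val rest: Seq[Column] = df.columns.toSeq.filter(_ != keyCol).map(c => col(c))
    df.select(col(keyCol), struct(rest: _*).as(nestedName))
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying this out; in spark-shell, use the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("reshape-struct").getOrCreate()
    import spark.implicits._

    // Same data as input_data in the message above.
    val input = Seq((1, 2, 3), (4, 5, 6)).toDF("col1", "col2", "col3")
    // Resulting schema: col1 int, newcol2 struct<col2: int, col3: int>
    val output = nestColumns(input, "col1", "newcol2")
    output.printSchema()
    output.write.mode("overwrite").parquet("output_data.parquet")
    spark.stop()
  }
}
```

Since `struct` is a built-in expression rather than a UDF, Spark can optimize it like any other column expression, and it takes a variable number of columns.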


Re: spark reshape hive table and save to parquet

2016-12-08 Thread Georg Heiler
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html

Anton Kravchenko wrote on Thu, 8 Dec 2016 at 17:53:

> Hello,
>
> I wonder if there is a (preferably efficient) way in Spark to reshape a Hive
> table and save it to Parquet.
>
> Here is a minimal example. Input Hive table:
> col1 col2 col3
> 1    2    3
> 4    5    6
>
> Output Parquet:
> col1 newcol2
> 1    [2 3]
> 4    [5 6]
>
> P.S. The real input Hive table has ~1000 columns.
>
> Thank you,
> Anton
>


spark reshape hive table and save to parquet

2016-12-08 Thread Anton Kravchenko
Hello,

I wonder if there is a (preferably efficient) way in Spark to reshape a Hive
table and save it to Parquet.

Here is a minimal example. Input Hive table:
col1 col2 col3
1    2    3
4    5    6

Output Parquet:
col1 newcol2
1    [2 3]
4    [5 6]

P.S. The real input Hive table has ~1000 columns.

Thank you,
Anton
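The `[2 3]` notation in the expected output could also be read as an array column rather than a struct. If an array is what is wanted, a minimal sketch, assuming Spark 2.x or later and that all non-key columns share one type (here all integers, since `array` needs a common element type); `ReshapeToArray` and `packIntoArray` are hypothetical names.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col}

object ReshapeToArray {
  // Hypothetical helper: keep `keyCol` and collect all other columns,
  // in their original order, into one array column named `arrayName`.
  def packIntoArray(df: DataFrame, keyCol: String, arrayName: String): DataFrame = {
    val rest: Seq[Column] = df.columns.toSeq.filter(_ != keyCol).map(c => col(c))
    df.select(col(keyCol), array(rest: _*).as(arrayName))
  }

  def main(args: Array[String]): Unit = {
    // Local session for trying this out; in spark-shell, use the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("reshape-array").getOrCreate()
    import spark.implicits._

    val input = Seq((1, 2, 3), (4, 5, 6)).toDF("col1", "col2", "col3")
    // Resulting schema: col1 int, newcol2 array<int>; rows (1, [2, 3]) and (4, [5, 6])
    val output = packIntoArray(input, "col1", "newcol2")
    output.show()
    output.write.mode("overwrite").parquet("output_data.parquet")
    spark.stop()
  }
}
```

Unlike a struct, an array drops the individual column names, so a struct is the safer choice if `col2`, `col3`, ... must stay identifiable in the Parquet output.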