https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/
Thanks,
Ewan
-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 19 October 2015 06:32
To: Gavin Yue <yue.yuany...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: Should I convert json into parquet
Good formats are Parquet or ORC. Both can be used with compression, such as
Snappy. They are much faster than JSON. However, the table structure is up to
you and depends on your use case.
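As a rough illustration of the conversion, a minimal sketch (assuming a
Spark 1.5-era setup to match the thread's date; sc is the shell's
SparkContext, and the paths are hypothetical):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Have Parquet compress with Snappy instead of the default codec.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

    // Read the JSON files (schema is inferred) and write them back
    // out as Snappy-compressed Parquet.
    val events = sqlContext.read.json("hdfs:///data/events/*.json")
    events.write.parquet("hdfs:///data/events_parquet")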
> On 17 Oct 2015, at 23:07, Gavin Yue wrote:
>
> I have json files which contain timestamped events. Each event is
> associated with a user id. Now I want to group by user id, so I convert from
>
> Event1 -> UserIDA;
> Event2 -> UserIDA;
> Event3 -> UserIDB;
>
> to intermediate storage:
>
> UserIDA -> (Event1, Event2...)
> UserIDB -> (Event3...)
>
> Then I will label
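A minimal sketch of the grouping Gavin describes (assumptions: a Spark
1.5-era API to match the thread's date, a "userId" field name, and
hypothetical paths; none of these come from the thread itself):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read the timestamped JSON events; the schema is inferred.
    val events = sqlContext.read.json("hdfs:///data/events/*.json")

    // Key each event by its user id, then group so that all of a
    // user's events land together: UserIDA -> (Event1, Event2, ...).
    // Caveat: groupByKey shuffles every event for a user onto one
    // executor, so very active users can produce large partitions.
    val eventsByUser = events.rdd
      .map(row => (row.getAs[String]("userId"), row))
      .groupByKey()

    // Sanity check: how many events did the first few users get?
    eventsByUser.mapValues(_.size).take(5).foreach(println)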