https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/
Thanks,
Ewan
-----Original Message-----
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: 19 October 2015 06:32
To: Gavin Yue <yue.yuany...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: Should I convert json into parquet
Good formats are Parquet or ORC. Both can be used with compression, such as
Snappy. They are much faster than JSON. However, the table structure is up to
you and depends on your use case.
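As a rough illustration of the conversion, a minimal sketch (assuming a
Spark 1.5-era setup to match the thread's date; sc is the shell's
SparkContext, and the paths are hypothetical):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Have Parquet compress with Snappy instead of the default codec.
    sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

    // Read the JSON files (schema is inferred) and write them back
    // out as Snappy-compressed Parquet.
    val events = sqlContext.read.json("hdfs:///data/events/*.json")
    events.write.parquet("hdfs:///data/events_parquet")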
> On 17 Oct 2015, at 23:07, Gavin Yue wrote:
>
> I have json files which contain timestamped events. Each event is
> associated with a user id. Now I want to group by user id, so I convert from
>
> Event1 -> UserIDA;
> Event2 -> UserIDA;
> Event3 -> UserIDB;
>
> to intermediate storage:
>
> UserIDA -> (Event1, Event2...)
> UserIDB -> (Event3...)
>
> Then I will label
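A minimal sketch of the grouping Gavin describes (assumptions: a Spark
1.5-era API to match the thread's date, a "userId" field name, and
hypothetical paths; none of these come from the thread itself):

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Read the timestamped JSON events; the schema is inferred.
    val events = sqlContext.read.json("hdfs:///data/events/*.json")

    // Key each event by its user id, then group so that all of a
    // user's events land together: UserIDA -> (Event1, Event2, ...).
    // Caveat: groupByKey shuffles every event for a user onto one
    // executor, so very active users can produce large partitions.
    val eventsByUser = events.rdd
      .map(row => (row.getAs[String]("userId"), row))
      .groupByKey()

    // Sanity check: how many events did the first few users get?
    eventsByUser.mapValues(_.size).take(5).foreach(println)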