Re: Spark Partitioning Strategy with Parquet

2016-12-30 Thread titli batali
… part as a function. If the function meets certain criteria, such as being associative and commutative (like, say, addition or multiplication), you can use reduceByKey; otherwise you may use groupByKey. HTH. On 18 Nov 2016 06:45, "titli batali" <titlibat...@gmail.com> wrote:
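A minimal sketch of that advice, assuming a pair RDD of (userId, amount) tuples (the data and names here are hypothetical, not from the thread): addition is associative and commutative, so reduceByKey can combine values map-side before the shuffle, while groupByKey must ship every value across the network.

    import org.apache.spark.sql.SparkSession

    object ReduceVsGroup {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ReduceVsGroup").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical pair RDD: (userId, transactionAmount)
        val txns = sc.parallelize(Seq(("u1", 10.0), ("u2", 5.0), ("u1", 7.5)))

        // Addition is associative and commutative, so reduceByKey applies:
        // partial sums are computed on each partition, shrinking the shuffle.
        val totals = txns.reduceByKey(_ + _)

        // For per-user logic that cannot be expressed as such a reduction,
        // groupByKey gathers all of a key's values, at full shuffle cost.
        val perUser = txns.groupByKey()

        totals.collect().foreach(println)
        spark.stop()
      }
    }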

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
… partitioned on the first n letters of userid. On 17 November 2016 at 08:25, titli batali <titlibat...@gmail.com> wrote: Hi, I have a use case where we have 1000 CSV files with a column user_Id, holding 8 million unique users. The data contain…
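A sketch of that suggestion (input path, output path, and the prefix length n = 2 are assumptions for illustration): derive a prefix column from userid and use it as the Parquet partition key, so each user's rows land under a bounded number of directories instead of 8 million tiny ones.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, substring}

    object PrefixPartition {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("PrefixPartition").getOrCreate()

        // Hypothetical input path: the 1000 CSVs with userid,date,transaction.
        val df = spark.read
          .option("header", "true")
          .csv("/data/transactions/*.csv")

        // Derive a partition key from the first 2 letters of userid, so rows
        // for a given user co-locate in one directory.
        val withPrefix = df.withColumn("uid_prefix", substring(col("userid"), 1, 2))

        // Write Parquet partitioned by the prefix; queries that filter on
        // uid_prefix then prune directories instead of scanning everything.
        withPrefix.write
          .partitionBy("uid_prefix")
          .parquet("/data/transactions_parquet")

        spark.stop()
      }
    }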

Fwd: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
Hi, I have a use case where we have 1000 CSV files with a column user_Id, holding 8 million unique users. The data contains userid, date, and transaction columns, on which we run some queries. We have a case where we need to iterate over each transaction on a particular date for each user. There are three levels of nesting…
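One way to express that user/date/transaction nesting on the Dataset API (a sketch only; the column names come from the message, the Parquet path and everything else is assumed, and it presumes one user's rows fit in memory): group by user, then by date within each user, and iterate the transactions of each group.

    import org.apache.spark.sql.SparkSession

    // Hypothetical record shape matching the userid,date,transaction columns.
    case class Txn(userid: String, date: String, transaction: String)

    object NestedIteration {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("NestedIteration").getOrCreate()
        import spark.implicits._

        // Hypothetical Parquet written earlier, partitioned by uid_prefix.
        val ds = spark.read.parquet("/data/transactions_parquet")
          .select("userid", "date", "transaction")
          .as[Txn]

        ds.groupByKey(_.userid)                       // outer loop: per user
          .flatMapGroups { (user, rows) =>
            rows.toSeq                                 // assumes rows fit in memory
              .groupBy(_.date)                         // middle loop: per date
              .flatMap { case (date, txns) =>
                txns.map(t => (user, date, t.transaction)) // inner loop: per txn
              }
          }
          .show()

        spark.stop()
      }
    }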