Re: Spark Partitioning Strategy with Parquet

2016-12-30 Thread titli batali
Yeah, it works for me. Thanks.

On Fri, Nov 18, 2016 at 3:08 AM, ayan guha wrote:
> Hi
>
> I think you can use the map-reduce paradigm here. Create a key using user ID
> and date, and the record as a value. Then you can express your "do
> something" operation as a function. If …

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread ayan guha
Hi, I think you can use the map-reduce paradigm here. Create a key using user ID and date, and the record as a value. Then you can express your "do something" operation as a function. If the function meets certain criteria, such as being associative and commutative (say, addition or multiplication), you can …
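A minimal sketch of this suggestion in Scala, assuming the records have already been parsed into (userid, date, amount) triples; the paths and field positions are illustrative, not from the thread. Because addition is associative and commutative, reduceByKey can combine values map-side before the shuffle:

  import org.apache.spark.sql.SparkSession

  object KeyedReduce {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.appName("KeyedReduce").getOrCreate()
      val sc = spark.sparkContext

      // Parse each CSV line into ((userid, date), amount); the schema
      // userid,date,transaction comes from the original post.
      val pairs = sc.textFile("hdfs:///data/transactions/*.csv")
        .map(_.split(","))
        .map(f => ((f(0), f(1)), f(2).toDouble))

      // One aggregate per (user, date) key, computed without nested loops.
      val totals = pairs.reduceByKey(_ + _)
      totals.take(10).foreach(println)

      spark.stop()
    }
  }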

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
That would help, but then within a particular partition I would still need to iterate over all the customers whose user IDs share those first n letters. I want to get rid of the nested iterations. Thanks.

On Thu, Nov 17, 2016 at 10:28 PM, Xiaomeng Wan wrote:
> You can partition …

Re: Spark Partitioning Strategy with Parquet

2016-11-17 Thread Xiaomeng Wan
You can partition on the first n letters of userid.

On 17 November 2016 at 08:25, titli batali wrote:
> Hi,
>
> I have a use case where we have 1000 CSV files with a column user_Id
> and 8 million unique users. The data contains: userid, date, transaction,
> and we run some queries over it. We have a case where we …
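A sketch of this suggestion, assuming the CSVs are loaded as a DataFrame with a header row; the prefix length (2 here) and the paths are illustrative. Deriving a partition column from the first letters of user_Id and writing Parquet partitioned by it places all users sharing a prefix under one directory, so a per-user query only scans that slice:

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.{col, substring}

  object PrefixPartition {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.appName("PrefixPartition").getOrCreate()

      val df = spark.read
        .option("header", "true")
        .csv("hdfs:///data/transactions/*.csv")

      // uid_prefix is a derived column: the first 2 letters of user_Id.
      df.withColumn("uid_prefix", substring(col("user_Id"), 1, 2))
        .write
        .partitionBy("uid_prefix")   // one output directory per prefix
        .parquet("hdfs:///data/transactions_parquet")

      spark.stop()
    }
  }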

Fwd: Spark Partitioning Strategy with Parquet

2016-11-17 Thread titli batali
Hi, I have a use case where we have 1000 CSV files with a column user_Id and 8 million unique users. The data contains: userid, date, transaction, and we run some queries over it. We have a case where we need to iterate over each transaction on a particular date for each user. There are three nested …
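For reference, a hedged sketch of how the nested iteration (user, then date, then transaction) could collapse into a single grouping step; the paths, field positions, and the per-group placeholder are assumptions, not from the thread:

  import org.apache.spark.sql.SparkSession

  object PerUserPerDate {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder.appName("PerUserPerDate").getOrCreate()
      val sc = spark.sparkContext

      val rows = sc.textFile("hdfs:///data/transactions/*.csv")
        .map(_.split(","))           // userid,date,transaction
        .map(f => ((f(0), f(1)), f(2)))

      // Each (user, date) group arrives as an iterable of that user's
      // transactions for that date; replace the count with the real work.
      val processed = rows.groupByKey()
        .map { case ((user, date), txns) => (user, date, txns.size) }

      processed.take(5).foreach(println)
      spark.stop()
    }
  }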