Re: Whether Spark is appropriate for our use case.

2015-11-07 Thread Igor Berman
1. If you join on a specific field (e.g. user ID, account ID, or similar), you could try partitioning the Parquet file by that field; the join should then be more efficient.
2. You need to check the Spark metrics to see how a particular join performs: how many partitions there are, what the shuffle is
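A toy sketch in plain Python (not Spark's API) of the general idea behind partitioning both sides of a join by the join key. It assumes simple hash partitioning with the same number of partitions on both sides, which is a simplification of what Spark actually does with partitioned Parquet data. Because rows with equal keys always land in the same partition index, partition i of the left side only ever needs partition i of the right side, so no data has to move between partitions. The function names and sample data are hypothetical.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Split rows (dicts) into num_partitions buckets by the hash of rows[key]."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def local_join(left_part, right_part, key):
    """Inner hash join of two partitions that are already co-located by key."""
    index = defaultdict(list)
    for r in right_part:
        index[r[key]].append(r)
    out = []
    for l in left_part:
        for r in index.get(l[key], []):
            out.append({**l, **r})
    return out


def copartitioned_join(left, right, key, num_partitions=4):
    """Hypothetical co-partitioned join: partition both sides identically,
    then join matching partition pairs independently (no cross-partition lookups)."""
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_join(lp, rp, key))
    return result


users = [{"user_id": 1, "name": "a"}, {"user_id": 2, "name": "b"}]
events = [
    {"user_id": 1, "event": "login"},
    {"user_id": 1, "event": "click"},
    {"user_id": 3, "event": "view"},
]
joined = copartitioned_join(users, events, "user_id")
```

In a real cluster each partition pair could be joined on a different executor; the point of the sketch is only that equal keys never need to be looked up across partitions once both datasets share the same partitioning.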

Re: Whether Spark is appropriate for our use case.

2015-10-21 Thread Adrian Tanase
Can you share your approximate data size? All of these should be valid use cases for Spark; I'm wondering whether you are providing enough resources. Also, do you have any expectations in terms of performance? What does "slow down" mean? For this use case I would personally favor Parquet over a DB, and

Whether Spark is appropriate for our use case.

2015-10-20 Thread Aliaksei Tsyvunchyk
Hello all community members, I need the opinion of people who have used Spark before and can share their experience to help me select a technical approach. I have a project in the proof-of-concept phase, where we are evaluating the possibility of using Spark for our use case. Here is a brief task description.