Using Spark for high concurrent load tasks

2015-12-28 Thread Aliaksei Tsyvunchyk
Hello Spark community, We have a project where we want to use Spark as a computation engine to perform calculations and return results via REST services. Working with Spark, we have learned how to make it run faster and have finally optimized our code to produce results in acceptable time

DataFrame.toJavaRDD cause fetching data to driver, is it expected ?

2015-11-04 Thread Aliaksei Tsyvunchyk
Hello folks, Recently I have noticed unexpectedly heavy network traffic between the driver program and a worker node. While debugging, I found that it is caused by the following block of code --- Java --- DataFrame etpvRecords = context.sql(" SOME SQL query here"); Mapper m = new

Re: DataFrame.toJavaRDD cause fetching data to driver, is it expected ?

2015-11-04 Thread Aliaksei Tsyvunchyk
program ?
> On Nov 4, 2015, at 12:34 PM, Romi Kuntsman <r...@totango.com> wrote:
>
> I noticed that toJavaRDD causes a computation on the DataFrame, so is it
> considered an action, even though logically it's a transformation?
>
> On Nov 4, 2015 6:51 PM, "Aliaksei Tsyvun

Re: DataFrame.toJavaRDD cause fetching data to driver, is it expected ?

2015-11-04 Thread Aliaksei Tsyvunchyk
ons you have on the DF and RDD...
>
> On Nov 4, 2015 7:54 PM, "Aliaksei Tsyvunchyk" <atsyvunc...@exadel.com> wrote:
> Hello Romi,
>
> Do you mean that in my particular case I’m causing computation on dataFrame
> or it is regu
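
The thread above turns on the difference between a transformation (lazy, only describes work) and an action (forces the work to run). A minimal sketch of that distinction follows, using plain java.util.stream as an analogy only. It does not use Spark and makes no claim about what toJavaRDD does internally; the class name LazyVsEager and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; an analogy with Java streams, not Spark code.
public class LazyVsEager {
    // Records every element the mapping function actually processes.
    static final List<Integer> touched = new ArrayList<>();

    // "Transformation": attaches a map step to the pipeline; nothing runs yet.
    static Stream<Integer> buildPipeline(List<Integer> input) {
        return input.stream().map(x -> {
            touched.add(x);   // side effect lets us observe when the step executes
            return x * 2;
        });
    }

    // "Action": a terminal operation that forces the map step to execute.
    static List<Integer> runAction(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = buildPipeline(Arrays.asList(1, 2, 3));
        System.out.println("elements processed after map: " + touched.size());

        List<Integer> result = runAction(pipeline);
        System.out.println("elements processed after collect: " + touched.size());
        System.out.println("result: " + result);
    }
}
```

In this sketch the counter stays at zero until the terminal operation runs. Applying the same kind of check to a real job (for example, watching when stages appear in the Spark UI) is one way to tell whether a given call behaves as an action in a particular Spark version.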

Whether Spark is appropriate for our use case.

2015-10-20 Thread Aliaksei Tsyvunchyk
Hello all community members, I need the opinion of people who have used Spark before and can share their experience to help me select a technical approach. I have a project in the Proof of Concept phase, where we are evaluating the possibility of using Spark for our use case. Here is a brief task description.