Re: Using Java functions in Spark

2014-06-07 Thread Oleg Proudnikov

Increasing the number of partitions on the data file solved the problem.
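
For readers wondering why this helped: Spark runs one task per partition
within a stage, so a file split into only a few partitions keeps only that
many cores busy, however many executors are available. The sketch below is
plain Java, not Spark code. The class name PartitionMath and the core and
partition counts are illustrative assumptions, not figures from this thread.

```java
/** Back-of-the-envelope model: one task per partition, one running task per core. */
final class PartitionMath {
    private PartitionMath() {}

    /** Number of cores that can be busy at the same time. */
    static int busyCores(int partitions, int totalCores) {
        requirePositive(partitions, totalCores);
        return Math.min(partitions, totalCores);
    }

    /** Number of scheduling "waves" needed to run every task once. */
    static int waves(int partitions, int totalCores) {
        requirePositive(partitions, totalCores);
        return (partitions + totalCores - 1) / totalCores; // ceiling division
    }

    private static void requirePositive(int partitions, int totalCores) {
        if (partitions <= 0 || totalCores <= 0) {
            throw new IllegalArgumentException("partitions and totalCores must be positive");
        }
    }

    public static void main(String[] args) {
        int totalCores = 16; // assumed cluster size, for illustration only
        for (int partitions : new int[] {2, 16, 64}) {
            System.out.printf("partitions=%d busyCores=%d waves=%d%n",
                    partitions,
                    busyCores(partitions, totalCores),
                    waves(partitions, totalCores));
        }
    }
}
```

Under these assumed numbers, 2 partitions keep only 2 of the 16 cores busy,
while 64 partitions keep all 16 busy for 4 waves. In Spark the partition
count can be raised when the file is read (the second argument of
JavaSparkContext.textFile) or afterwards with repartition. Which of the two
was used here is not stated.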



-- 
Kind regards,

Oleg

Re: Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov

Additional observation: map and mapValues are pipelined and, as expected,
executed in pairs. For each key K there is a simple sequence of steps:
first the read from Cassandra, then the processing. This is exactly how a
normal Java loop with these two steps inside would behave. As I understand
it, this avoids batch-loading everything first and piling up massive text
arrays.
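
The per-key ordering described above can be mimicked in plain Java. This is
a sketch, not Spark code: a sequential java.util.stream pipeline stands in
for the RDD, and the static methods load and process are hypothetical
stand-ins for the Cassandra read and the text processing. It only records
the order in which the two steps run.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PipelineOrder {
    /** Records the order in which the two steps are executed. */
    private static final List<String> EVENTS = new ArrayList<>();

    private PipelineOrder() {}

    /** Hypothetical stand-in for the read: K -> (K, list of text blocks). */
    static Map.Entry<String, List<String>> load(String key) {
        EVENTS.add("load " + key);
        List<String> texts = Arrays.asList(key + " first block", key + " second block");
        return new SimpleEntry<>(key, texts);
    }

    /** Hypothetical stand-in for the processing: (K, texts) -> (K, small list). */
    static Map.Entry<String, List<Integer>> process(Map.Entry<String, List<String>> kv) {
        EVENTS.add("process " + kv.getKey());
        List<Integer> lengths = new ArrayList<>();
        for (String text : kv.getValue()) {
            lengths.add(text.length());
        }
        return new SimpleEntry<>(kv.getKey(), lengths);
    }

    /** Runs both steps over the keys and returns the recorded step order. */
    static List<String> run(List<String> keys) {
        EVENTS.clear();
        keys.stream()
            .map(PipelineOrder::load)     // static method passed as a function
            .map(PipelineOrder::process)  // chained lazily, applied per element
            .collect(Collectors.toList());
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        for (String event : run(Arrays.asList("a", "b"))) {
            System.out.println(event);
        }
    }
}
```

For the keys a and b the recorded order is load a, process a, load b,
process b. No second key is loaded before the first one has been processed,
so only one key's text blocks need to be held at a time.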

Also, the keys are relatively evenly distributed across the executors.

The question is - why is this still so slow? I would appreciate any
suggestions on where to focus my search.

Thank you,
Oleg



-- 
Kind regards,

Oleg

Using Java functions in Spark

2014-06-06 Thread Oleg Proudnikov

Hi All,

I am passing Java static methods into the RDD transformations map and
mapValues. The first map goes from a simple string K to a pair (K,V), where
V is a Java ArrayList of large text strings, 50K each, read from Cassandra.
mapValues then processes these text blocks into very small ArrayLists.

The code runs quite slowly compared to running it in parallel on the same
servers from plain Java.

I gave the same heap to the executors and to plain Java. Does Java run
slower under Spark, do I suffer from excess heap pressure, or am I missing
something?

Thank you for any insight,
Oleg
