Re: Sharing Spark RDDs with Ignite

2016-02-12 Thread Andrey Gura
Dmitry,

I repeated your test. On my laptop it took about 2300 ms.

Keeping in mind that an RDD is lazy by nature, I suspected that DataFrame is
lazy too. So I added a df.rdd().count() call before the RDD caching step in
order to measure its execution time separately, and got about 670 ms.
After that, the igniteRDD.saveValues(df.rdd()) call takes about 1500 ms.

For more accurate results I measured these operations in a loop and got
about 700 ms for RDD caching on a warmed-up JVM.

I created a pull request for clarity:
https://github.com/erasmas/ignite-playground/pull/1
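
The measuring approach described above (force the lazy source first, then time the save step on its own, and repeat in a loop so later runs hit a warmed-up JVM) can be sketched roughly as follows. This is only an illustration in plain Java with no Spark or Ignite involved: lazyRows() and save() are hypothetical stand-ins for the DataFrame load and the igniteRDD.saveValues() call, and the row count simply echoes the ~75K rows mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class LazyTimingSketch {
    /** Hypothetical stand-in for a lazy source: nothing is built until get() is called. */
    static Supplier<List<Integer>> lazyRows(int n) {
        return () -> {
            List<Integer> rows = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                rows.add(i * 31);
            }
            return rows;
        };
    }

    /** Hypothetical stand-in for the save step: copies materialized rows into a sink. */
    static int save(List<Integer> rows, List<Integer> sink) {
        sink.addAll(rows);
        return sink.size();
    }

    public static void main(String[] args) {
        Supplier<List<Integer>> source = lazyRows(75_000);

        // Repeat the measurement; the first iterations include JIT warm-up.
        for (int run = 1; run <= 5; run++) {
            // Step 1: force the lazy source so its cost is not charged to the save.
            long t0 = System.nanoTime();
            List<Integer> rows = source.get();
            long loadMicros = (System.nanoTime() - t0) / 1_000;

            // Step 2: time the save step alone, on already materialized data.
            List<Integer> sink = new ArrayList<>();
            long t1 = System.nanoTime();
            save(rows, sink);
            long saveMicros = (System.nanoTime() - t1) / 1_000;

            System.out.println("run " + run + ": load=" + loadMicros
                + " us, save=" + saveMicros + " us");
        }
    }
}
```

The numbers printed by this sketch say nothing about Ignite itself; the point is only that timing the two steps separately, over several runs, avoids attributing load time and JVM warm-up to the save.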




-- 
Andrey Gura
GridGain Systems, Inc.
www.gridgain.com


Re: Sharing Spark RDDs with Ignite

2016-02-12 Thread Dmitriy Morozov
Thanks Andrey!

It totally makes sense. I should have done a more accurate test. Appreciate
your help!




-- 
Kind regards,
Dima


Re: Sharing Spark RDDs with Ignite

2016-02-11 Thread Dmitriy Morozov
Hi Valentin,

Sorry, I realize I didn't get it right. I'm now using IgniteRDD to save the
RDD values and IgniteCache to cache the StructType.
I'm testing with a ~1 MB Parquet file that has ~75K rows. I noticed that
saving into IgniteRDD is expensive: it takes about 4 seconds on my laptop.
I tried both client and server mode for IgniteContext but still couldn't
make it faster.

Here's the code
<https://github.com/erasmas/ignite-playground/blob/master/src/main/java/ignite/CachedRddExample.java>
that I tried. I'd appreciate it if somebody could give a hint on how to
make it faster.

Thanks!




-- 
Kind regards,
Dima


Re: Sharing Spark RDDs with Ignite

2016-02-10 Thread vkulichenko
Hi Dmitry,

What are you trying to achieve by putting the RDD into the cache as a single
entry? If you want to save RDD data into the Ignite cache, it's better to
create an IgniteRDD and use its savePairs() or saveValues() methods. See [1]
for details.

[1]
https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-saving-values-to-ignite

-Val



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Sharing-Spark-RDDs-with-Ignite-tp2805p2941.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.


Re: Sharing Spark RDDs with Ignite

2016-02-02 Thread vkulichenko
Hi Dmitry,

Ignite provides better data distribution and better performance when there
are more partitions than nodes in the topology. 1024 is the default number
of partitions, but you can change it by providing a custom affinity function
configuration:

// 32 partitions instead of the default 1024.
CacheConfiguration cfg = new CacheConfiguration("hello-world-cache")
    .setAffinity(new RendezvousAffinityFunction(false, 32));
final IgniteRDD igniteRDD = igniteContext.fromCache(cfg);

You can try this and see if it gets better.
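
To give a feel for the data distribution point, here is a rough, self-contained sketch that assigns partitions to nodes by rendezvous (highest random weight) hashing and counts how many partitions each node ends up owning. It is a simplified model under stated assumptions, not Ignite's RendezvousAffinityFunction: the weight() mixer, the node numbering, and the choice of 4 nodes are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

public class PartitionSpreadSketch {
    /** Hypothetical mixer: turns a (partition, node) pair into a pseudo-random weight. */
    static long weight(int partition, int node) {
        long h = partition * 0x9E3779B97F4A7C15L + node * 0xC2B2AE3D27D4EB4FL;
        h ^= h >>> 32;
        h *= 0xD6E8FEB86659FD93L;
        h ^= h >>> 32;
        return h;
    }

    /** Rendezvous hashing: the owner is the node with the highest weight for the partition. */
    static int owner(int partition, int nodes) {
        int best = 0;
        long bestWeight = Long.MIN_VALUE;
        for (int n = 0; n < nodes; n++) {
            long w = weight(partition, n);
            if (w > bestWeight) {
                bestWeight = w;
                best = n;
            }
        }
        return best;
    }

    /** Counts how many partitions each node owns. */
    static int[] partitionsPerNode(int partitions, int nodes) {
        int[] counts = new int[nodes];
        for (int p = 0; p < partitions; p++) {
            counts[owner(p, nodes)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int nodes = 4; // assumed cluster size, for illustration only
        for (int partitions : new int[] {4, 32, 1024}) {
            System.out.println(partitions + " partitions -> "
                + Arrays.toString(partitionsPerNode(partitions, nodes)));
        }
    }
}
```

With a partition count close to the node count, a random-looking assignment like this can easily leave some nodes with few or no partitions, while larger counts tend to even out in relative terms. Whether 32 or 1024 performs better for a given cache still has to be measured on the actual cluster.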

Actually, I think that methods like isEmpty should be overridden in
IgniteRDD to use the native IgniteCache API, which will be much faster. I
created a ticket for this task [1]; feel free to provide your comments
there. Are there any other methods that should be optimized?

[1] https://issues.apache.org/jira/browse/IGNITE-2538

-Val



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Sharing-Spark-RDDs-with-Ignite-tp2805p2808.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.