Hello, guys.

I have created an example application that runs the Ignite Data Frame
integration on a standalone Spark cluster.
With the Data Frame API implementation there is no need to have any
Ignite files on the Spark worker nodes.

I ran this application on a free dataset: ATP tennis match statistics.

data - https://github.com/nizhikov/atp_matches
app - https://github.com/nizhikov/ignite-spark-df-example
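For readers who have not opened the example repository yet, here is a minimal
sketch of what reading an Ignite SQL table as a Spark DataFrame could look
like. This is an illustrative assumption, not code copied from the PR or the
example app: the format and option constants (FORMAT_IGNITE,
OPTION_CONFIG_FILE, OPTION_TABLE), the config file path, the table name, and
the master URL are all placeholders.

```scala
import org.apache.spark.sql.SparkSession

object IgniteDataFrameSketch extends App {
  // Assumes a running standalone Spark cluster and an Ignite cluster
  // described by ignite-config.xml; all names below are illustrative.
  val spark = SparkSession.builder()
    .appName("ignite-df-example")
    .master("spark://master:7077") // standalone Spark master
    .getOrCreate()

  // Read an Ignite SQL table as a Spark DataFrame via the proposed
  // DataSource; "ignite" format and option names are assumptions.
  val matches = spark.read
    .format("ignite")
    .option("config", "ignite-config.xml")
    .option("table", "atp_matches")
    .load()

  matches.printSchema()
  matches.filter(matches("surface") === "Clay").show()

  spark.stop()
}
```

Note that nothing here requires Ignite jars or configs on the worker nodes;
only the driver needs the Ignite configuration file.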

Valentin, will you have a chance to look at my changes?


2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:

> Hi Nikolay,
>
> Sorry for delay on this, got a little swamped lately. I will do my best to
> review the code this week.
>
> -Val
>
> On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>
>> Hello, Valentin.
>>
>> Did you have a chance to look at my changes?
>>
>> Now I think I have implemented almost all of the required features.
>> I want to run some performance tests to ensure my implementation works
>> properly with a significant amount of data.
>> And I definitely need some feedback on my changes.
>>
>>
>> 2017-10-09 18:45 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
>>
>>> Hello, guys.
>>>
>>> Which version of Spark do we want to use?
>>>
>>> 1. Currently, Ignite depends on Spark 2.1.0.
>>>
>>>     * Can be run on JDK 7.
>>>     * Still supported: 2.1.2 will be released soon.
>>>
>>> 2. Latest Spark version is 2.2.0.
>>>
>>>     * Can be run only on JDK 8+.
>>>     * Released Jul 11, 2017.
>>>     * Already supported by major vendors (Amazon, for example).
>>>
>>> Note that in IGNITE-3084 I implement some internal Spark API,
>>> so it will take some effort to switch between Spark 2.1 and 2.2.
>>>
>>>
>>> 2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>>>
>>>> I will review in the next few days.
>>>>
>>>> -Val
>>>>
>>>> On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <dma...@apache.org> wrote:
>>>>
>>>> > Hello Nikolay,
>>>> >
>>>> > This is good news. Finally this capability is coming to Ignite.
>>>> >
>>>> > Val, Vladimir, could you do a preliminary review?
>>>> >
>>>> > Answering on your questions.
>>>> >
>>>> > 1. Yardstick should be enough for performance measurements. As a Spark
>>>> > user, I will be curious to know what’s the point of this integration.
>>>> > Probably we need to compare Spark + Ignite and Spark + Hive or Spark +
>>>> > RDBMS cases.
>>>> >
>>>> > 2. If the Spark community is reluctant, let's include the module in the
>>>> > ignite-spark integration.
>>>> >
>>>> > —
>>>> > Denis
>>>> >
>>>> > On Sep 25, 2017, at 11:14 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
>>>> > >
>>>> > > Hello, guys.
>>>> > >
>>>> > > Currently, I’m working on integration between Spark and Ignite [1].
>>>> > >
>>>> > > So far, I have implemented the following:
>>>> > >    * Ignite DataSource implementation (IgniteRelationProvider).
>>>> > >    * DataFrame support for Ignite SQL tables.
>>>> > >    * IgniteCatalog implementation for transparent resolving of Ignite
>>>> > > SQL tables.
>>>> > >
>>>> > > The implementation can be found in PR [2].
>>>> > > It would be great if someone provided feedback on the prototype.
>>>> > >
>>>> > > I added some examples to the PR so you can see how the API is supposed
>>>> > > to be used [3], [4].
>>>> > >
>>>> > > I need some advice. Can you help me?
>>>> > >
>>>> > > 1. How should this PR be tested?
>>>> > >
>>>> > > Of course, I need to provide some unit tests. But what about
>>>> > > scalability tests, etc.?
>>>> > > Maybe we need a Yardstick benchmark or something similar?
>>>> > > What are your thoughts?
>>>> > > Which scenarios should I consider in the first place?
>>>> > >
>>>> > > 2. Should we provide a Spark Catalog implementation inside the Ignite
>>>> > > codebase?
>>>> > >
>>>> > > The current implementation of the Spark Catalog is based on *internal
>>>> > > Spark API*.
>>>> > > The Spark community seems uninterested in making the Catalog API public
>>>> > > or in including the Ignite Catalog in the Spark code base [5], [6].
>>>> > >
>>>> > > *Should we include the Spark internal API implementation inside the
>>>> > > Ignite code base?*
>>>> > > Or should we consider including the Catalog implementation in some
>>>> > > external module, created and released outside Ignite? (We could still
>>>> > > support and develop it inside the Ignite community.)
>>>> > >
>>>> > > [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>> > > [2] https://github.com/apache/ignite/pull/2742
>>>> > > [3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
>>>> > > [4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
>>>> > > [5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
>>>> > > [6] https://issues.apache.org/jira/browse/SPARK-17767
>>>> > >
>>>> > > --
>>>> > > Nikolay Izhikov
>>>> > > nizhikov....@gmail.com
>>>> >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Nikolay Izhikov
>>> nizhikov....@gmail.com
>>>
>>
>>
>>
>> --
>> Nikolay Izhikov
>> nizhikov....@gmail.com
>>
>
>


-- 
Nikolay Izhikov
nizhikov....@gmail.com
