[ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305057#comment-16305057 ]

Nikolay Izhikov commented on IGNITE-3084:
-----------------------------------------

{quote}
I don't agree on OPTION_GRID. In catalog implementation you already have Ignite 
instance, so I don't see why would you need to get it by name again. There 
should be another solution.
And in general, implementation detail should not drive public API. OPTION_GRID 
is very confusing and I still think it should be removed, at least in this 
iteration.
{quote}

I made {{OPTION_GRID}} private in {{IgniteExternalCatalog}} so it does not appear 
in the public API.
Is that OK?
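
For context, the change is essentially just a visibility tweak; a minimal sketch 
(the literal value of the constant is illustrative here):

{code:scala}
// Sketch only: the option key stays inside the catalog implementation and is
// not reachable through the public API. The literal value is illustrative.
class IgniteExternalCatalog {
  private val OPTION_GRID = "grid"
}
{code}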

Please see some details about how Spark resolves relations:

1. When Spark executes a SQL query, it resolves a catalog relation with 
{{FindDataSourceTable}}. The relation is resolved using the string options 
provided by the catalog:

https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L230

2. {{DataSource}} then constructs the relation through the {{RelationProvider}} 
({{IgniteRelationProvider}} in our case) with these string options; the relevant 
interface is sketched after the link:

https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L306
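
The key constraint is in that interface: the provider only ever receives a 
{{Map[String, String]}}, so anything it needs in order to locate an Ignite 
instance has to be encoded as a plain string. For reference, the Spark 2.2 trait 
we implement looks like this:

{code:scala}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation

// org.apache.spark.sql.sources.RelationProvider in Spark 2.2: only a
// string-to-string options map is available when the relation is built.
trait RelationProvider {
  def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation
}
{code}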

So we have to pass some string ID from the catalog to {{IgniteRelationProvider}} 
to get the existing Ignite instance.
As far as I can understand, {{Ignite.name}} serves this purpose well (see the 
sketch below).
What have I missed?
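
To make the idea concrete, here is a rough sketch of the lookup. The option key 
string and the relation-building helper are illustrative, not the actual patch:

{code:scala}
import org.apache.ignite.{Ignite, Ignition}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

// Rough sketch: the catalog writes the grid name into the options map and the
// provider resolves the already-running instance back from that name.
class IgniteRelationProvider extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // Illustrative option key; the real key is whatever the private
    // OPTION_GRID constant holds.
    val gridName = parameters("grid")

    // Ignition.ignite(name) returns the existing Ignite instance with that
    // name; it does not start a new node.
    val ignite: Ignite = Ignition.ignite(gridName)

    buildRelation(sqlContext, ignite, parameters)
  }

  // Placeholder for constructing the actual BaseRelation from the instance.
  private def buildRelation(
      sqlContext: SQLContext,
      ignite: Ignite,
      parameters: Map[String, String]): BaseRelation = ???
}
{code}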

Another option that I can see: keep an internal {{HashMap[String, Ignite]}} and 
use {{UUID.randomUUID()}} as the Ignite identifier (also sketched below).
What do you think?
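
For comparison, a sketch of that alternative; all names here are invented for 
illustration:

{code:scala}
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

import org.apache.ignite.Ignite

// Sketch of the alternative: an internal registry keyed by a random UUID, so
// the string that travels through Spark's options map is opaque rather than
// the grid name.
object IgniteInstanceRegistry {
  private val instances = new ConcurrentHashMap[String, Ignite]()

  // Called by the catalog: remember the instance and return the key that will
  // be put into the relation options.
  def register(ignite: Ignite): String = {
    val id = UUID.randomUUID().toString
    instances.put(id, ignite)
    id
  }

  // Called by IgniteRelationProvider: resolve the instance back from the key.
  def lookup(id: String): Option[Ignite] = Option(instances.get(id))
}
{code}

Compared to reusing {{Ignite.name}}, this avoids exposing the grid name in the 
options but adds a piece of internal mutable state that has to be kept in sync.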


> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
>
>
> Apache Spark already benefits from integration with Apache Ignite. The latter 
> provides shared RDDs, an implementation of Spark RDD, which helps Spark share 
> state between Spark workers and execute SQL queries much faster. The next 
> logical step is to enable support for the modern Spark Data Frames API in a 
> similar way.
> As a contributor, you will be fully in charge of the integration of the Spark 
> Data Frame API and Apache Ignite.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
