Nikolay,

Let's estimate the strategy implementation work, and then decide whether to merge the code in its current state or not. If anything is unclear, please start a separate discussion.

-Val
On Fri, Nov 24, 2017 at 5:42 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello, Val, Denis.

> Personally, I think that we should release the integration only after the strategy is fully supported.

I see two major reasons to propose merging the DataFrame API implementation without the custom strategy:

1. My PR is already relatively huge. From my experience of interacting with the Ignite community, the bigger a PR becomes, the more committer time is required to review it. So I propose to move in smaller but complete steps here.

2. It is not clear to me what exactly "custom strategy and optimization" includes. It seems additional discussion is required. I can put my thoughts on paper and start that discussion right after the basic implementation is done.

> Custom strategy implementation is actually very important for this integration.

Understood and fully agreed. I'm ready to continue working in that area.

On 23.11.2017 02:15, Denis Magda wrote:

Val, Nikolay,

Personally, I think that we should release the integration only after the strategy is fully supported. Without the strategy we don't really leverage Ignite's SQL engine, and we introduce redundant data movement between Ignite and Spark nodes.

How big is the effort to support the strategy in terms of the amount of work left? 40%, 60%, 80%?

— Denis

On Nov 22, 2017, at 2:57 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

Nikolay,

Custom strategy implementation is actually very important for this integration. Basically, it will allow us to create a SQL query for Ignite and execute it directly on the cluster. Your current implementation only adds a new DataSource, which means that Spark will fetch the data into its own memory first and then do most of the work there (like joins, for example). Does this make sense to you? Can you please take a look at this and share your thoughts on how much development is implied there?

The current code looks good to me, though, and I'm OK if the strategy is implemented as a next step in the scope of a separate ticket. I will do a final review early next week and will merge the PR if everything is OK.

-Val

On Thu, Oct 19, 2017 at 7:29 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello.

> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?

IgniteCatalog removed.

> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created... Is there any way to automate this and improve usability?

IgniteStrategy and IgniteOptimization are removed, as they are empty now.

> Actually, I think it makes sense to create a builder similar to SparkSession.builder()...

IgniteBuilder added. The syntax looks like:

```
val igniteSession = IgniteSparkSession.builder()
    .appName("Spark Ignite catalog example")
    .master("local")
    .config("spark.executor.instances", "2")
    .igniteConfig(CONFIG)
    .getOrCreate()

igniteSession.catalog.listTables().show()
```

Please see the updated PR - https://github.com/apache/ignite/pull/2742

2017-10-18 20:02 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:

Hello, Valentin.

My answers are below. Dmitry, do we need to move the discussion to Jira?

> 1. Why do we have org.apache.spark.sql.ignite package in our codebase?

As I mentioned earlier, to implement and override the Spark Catalog one has to use internal (private) Spark API. So I have to use the package `org.apache.spark.sql.***` to get access to private classes and variables.

For example, the SharedState class that stores the link to ExternalCatalog is declared as `private[sql] class SharedState` - i.e. package private.

> Can these classes reside under org.apache.ignite.spark instead?

No, as long as we want to have our own implementation of ExternalCatalog.
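For illustration, a minimal sketch of the visibility rule being described (assuming Spark 2.1.x, where `SharedState` resides in `org.apache.spark.sql.internal`): because the class is `private[sql]`, even naming the type compiles only for code under `org.apache.spark.sql`, which is why the catalog code cannot move to `org.apache.ignite.spark`.

```
// Sketch only: Scala package-private visibility forces the package choice.
// In Spark's sources SharedState is declared as `private[sql] class SharedState`,
// so it is visible only to code under the org.apache.spark.sql package.
package org.apache.spark.sql.ignite

import org.apache.spark.sql.internal.SharedState

// This compiles because org.apache.spark.sql.ignite is a sub-package of
// org.apache.spark.sql; the same trait under org.apache.ignite.spark
// would not compile, since SharedState is not visible there.
private[sql] trait IgniteSharedStateHolder {
  def state: SharedState
}
```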
> 2. IgniteRelationProvider contains multiple constants which I guess are some kind of config options. Can you describe the purpose of each of them?

I extended the comments for these options. Please see my commit [1] or the PR HEAD.

> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?

Good catch, thank you! After additional research I found that only IgniteExternalCatalog is required. I will update the PR with the IgniteCatalog removal in a few days.

> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are our plans on implementing them? Also, what exactly is planned in IgniteOptimization and what is its purpose?

Actually, this is a very good question :) And I need advice from experienced community members here.

The purpose of `IgniteOptimization` is to modify the query plan created by Spark. Currently, we have one optimization, described in IGNITE-3084 [2] by you, Valentin :) :

"If there are non-Ignite relations in the plan, we should fall back to native Spark strategies."

I think we can go a little further and reduce a join of two Ignite-backed Data Frames into a single Ignite SQL query. Currently, this feature is unimplemented.

*Do we need it now? Or can we postpone it and concentrate on the basic Data Frame and Catalog implementation?*

The purpose of `Strategy`, as you correctly mentioned in [2], is to transform a LogicalPlan into physical operators. I don't have ideas on how to use this opportunity, so I think we don't need IgniteStrategy.

Can you or anyone else suggest some optimization strategy to speed up SQL query execution?
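To make the join-reduction idea concrete, here is a rough sketch (not the actual PR code) of the shape such a Catalyst rule could take against Spark 2.1/2.2; `isIgniteRelation` is a hypothetical placeholder for a real check of the relation's provider.

```
import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Rough shape of the proposed optimization: find a Join whose both sides
// are Ignite-backed relations and collapse it into a single Ignite SQL scan.
object IgniteJoinFusion extends Rule[LogicalPlan] {
  // Hypothetical placeholder: a real check would inspect the relation's provider.
  private def isIgniteRelation(plan: LogicalPlan): Boolean = false

  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case join @ Join(left, right, _, _)
        if isIgniteRelation(left) && isIgniteRelation(right) =>
      join // TODO: replace with a single relation that runs the join in Ignite SQL
  }
}
```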
> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created... Is there any way to automate this and improve usability?

These classes are added to `extraOptimizations` when one uses IgniteSparkSession. As far as I know, there is no way to automatically add these classes to a regular SparkSession.
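For reference, the manual registration on a plain SparkSession would look roughly like the sketch below, using `experimental.extraOptimizations`, which is public (if experimental) Spark API; `IgniteJoinFusion` is the illustrative rule sketched above. Spark 2.2 also added `SparkSession.Builder.withExtensions`, which might eventually offer a cleaner hook.

```
// spark-shell style sketch; with a regular SparkSession every session has to
// register the rule itself, which is the step IgniteSparkSession would automate.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("manual registration")
  .master("local")
  .getOrCreate()

spark.experimental.extraOptimizations ++= Seq(IgniteJoinFusion)
```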
> 6. What is the purpose of IgniteSparkSession? I see it's used in IgniteCatalogExample but not in IgniteDataFrameExample, which is confusing.

The DataFrame API is *public* Spark API, so anyone can provide an implementation and plug it into Spark. That's why IgniteDataFrameExample doesn't need any Ignite-specific session.

The Catalog API is *internal* Spark API. There is no way to plug a custom catalog implementation into Spark [3]. So we have to use `IgniteSparkSession`, which extends the regular SparkSession and overrides the links to `ExternalCatalog`.

> 7. To create IgniteSparkSession we first create IgniteContext. Is it really needed? It looks like we can directly provide the configuration file; if IgniteSparkSession really requires IgniteContext, it can create it by itself under the hood.

Actually, IgniteContext is the base class for the Ignite <-> Spark integration for now, so I tried to reuse it here. I like the idea of removing the explicit usage of IgniteContext. I will implement it in a few days.

> Actually, I think it makes sense to create a builder similar to SparkSession.builder()...

Great idea! I will implement such a builder in a few days.

> 9. Do I understand correctly that IgniteCacheRelation is for the case when we don't have SQL configured on the Ignite side?

Yes, IgniteCacheRelation is the Data Frame implementation for a key-value cache.

> I thought we decided not to support this, no? Or is this something else?

My understanding is the following:

1. We can't support automatic resolving of key-value caches in the *ExternalCatalog*, because there is no way to reliably detect the key and value classes.

2. We can support key-value caches in the regular Data Frame implementation, because we can require the user to provide the key and value classes explicitly.
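A sketch of what point 2 could look like from the user's side; the format name and option keys below are made up for illustration only - the real constants live in IgniteRelationProvider in the PR.

```
// spark-shell style sketch, assuming a SparkSession `spark` is in scope;
// "ignite" and all option keys are illustrative, not the actual API.
val personCache = spark.read
  .format("ignite")
  .option("cache", "testCache")               // hypothetical option key
  .option("keyClass", "java.lang.Long")       // hypothetical: user names the key class
  .option("valueClass", "org.example.Person") // hypothetical: user names the value class
  .load()

personCache.printSchema()
// key             long
// value.name      string
// value.birthDate date
```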
> 8. Can you clarify the query syntax in IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?

It is a key-value cache:

    key - java.lang.Long
    value - case class Person(name: String, birthDate: java.util.Date)

The schema of the data frame for the cache is:

    key - long
    value.name - string
    value.birthDate - date

So we can select data from the cache:

    SELECT
        key, `value.name`, `value.birthDate`
    FROM
        testCache
    WHERE key >= 2 AND `value.name` like '%0'

[1] https://github.com/apache/ignite/pull/2742/commits/faf3ed6febf417bc59b0519156fd4d09114c8da7
[2] https://issues.apache.org/jira/browse/IGNITE-3084?focusedCommentId=15794210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15794210
[3] https://issues.apache.org/jira/browse/SPARK-17767?focusedCommentId=15543733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15543733

On 18.10.2017 04:39, Dmitriy Setrakyan wrote:

Val, thanks for the review. Can I ask you to add the same comments to the ticket?

On Tue, Oct 17, 2017 at 3:20 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

Nikolay, Anton,

I did a high-level review of the code. First of all, impressive results! However, I have some questions/comments.

1. Why do we have org.apache.spark.sql.ignite package in our codebase? Can these classes reside under org.apache.ignite.spark instead?
2. IgniteRelationProvider contains multiple constants which I guess are some kind of config options. Can you describe the purpose of each of them?
3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog implementations and what is the difference?
4. IgniteStrategy and IgniteOptimization are currently no-op. What are our plans on implementing them? Also, what exactly is planned in IgniteOptimization and what is its purpose?
5. I don't like that IgniteStrategy and IgniteOptimization have to be set manually on SQLContext each time it's created. This seems to be very error prone. Is there any way to automate this and improve usability?
6. What is the purpose of IgniteSparkSession? I see it's used in IgniteCatalogExample but not in IgniteDataFrameExample, which is confusing.
7. To create IgniteSparkSession we first create IgniteContext. Is it really needed? It looks like we can directly provide the configuration file; if IgniteSparkSession really requires IgniteContext, it can create it by itself under the hood. Actually, I think it makes sense to create a builder similar to SparkSession.builder(); it would be good if our APIs here are consistent with Spark APIs.
8. Can you clarify the query syntax in IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
9. Do I understand correctly that IgniteCacheRelation is for the case when we don't have SQL configured on the Ignite side? I thought we decided not to support this, no? Or is this something else?

Thanks!

-Val

On Tue, Oct 17, 2017 at 4:40 AM, Anton Vinogradov <avinogra...@gridgain.com> wrote:

Sounds awesome.

I'll try to review the API & tests this week.

Val, your review is still required :)

On Tue, Oct 17, 2017 at 2:36 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Yes

On Oct 17, 2017, at 2:34 PM, "Anton Vinogradov" <avinogra...@gridgain.com> wrote:

Nikolay,

So, it will be able to start regular Spark and Ignite clusters and, using peer classloading via the Spark context, perform any DataFrame request, correct?

On Tue, Oct 17, 2017 at 2:25 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello, Anton.

The example you provided is a path to a *local* file on the master. These libraries are added to the classpath of each remote node running the submitted job.

Please see the documentation:

http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar(java.lang.String)
http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addFile(java.lang.String)
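A small sketch of the mechanism being described, using only the documented SparkContext API; the paths are placeholders and a SparkSession `spark` is assumed to be in scope.

```
// The jar path is local to the submitting node; Spark ships the jar to every
// executor and adds it to the task classpath, so the workers need no
// pre-installed Ignite files.
spark.sparkContext.addJar("/local/path/to/ignite-core.jar")    // placeholder path

// addFile works the same way for plain files (e.g. an Ignite XML config);
// executors can then resolve the shipped copy via SparkFiles.get(...).
spark.sparkContext.addFile("/local/path/to/ignite-config.xml") // placeholder path
```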
2017-10-17 13:10 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:

Nikolay,

> With the Data Frame API implementation there are no requirements to have any Ignite files on Spark worker nodes.

What do you mean? I see code like:

spark.sparkContext.addJar(MAVEN_HOME + "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")

On Mon, Oct 16, 2017 at 5:22 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello, guys.

I have created an example application to run an Ignite Data Frame on a standalone Spark cluster. With the Data Frame API implementation there are no requirements to have any Ignite files on Spark worker nodes.

I ran this application on a free dataset: ATP tennis match statistics.

data - https://github.com/nizhikov/atp_matches
app - https://github.com/nizhikov/ignite-spark-df-example

Valentin, did you have a chance to look at my changes?

2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:

Hi Nikolay,

Sorry for the delay on this, got a little swamped lately. I will do my best to review the code this week.

-Val

On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello, Valentin.

Did you have a chance to look at my changes?

Now I think I have implemented almost all required features. I want to run some performance tests to ensure my implementation works properly with a significant amount of data. And I definitely need some feedback on my changes.

2017-10-09 18:45 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:

Hello, guys.

Which version of Spark do we want to use?

1. Currently, Ignite depends on Spark 2.1.0.
   * Can be run on JDK 7.
   * Still supported: 2.1.2 will be released soon.

2. The latest Spark version is 2.2.0.
   * Can be run only on JDK 8+.
   * Released Jul 11, 2017.
   * Already supported by major vendors (Amazon, for example).

Note that in IGNITE-3084 I implement some internal Spark API, so it will take some effort to switch between Spark 2.1 and 2.2.

2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:

I will review in the next few days.

-Val

On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <dma...@apache.org> wrote:

Hello Nikolay,

This is good news. Finally this capability is coming to Ignite.

Val, Vladimir, could you do a preliminary review?

Answering your questions:

1. Yardstick should be enough for performance measurements. As a Spark user, I would be curious to know what the point of this integration is. Probably we need to compare the Spark + Ignite case against Spark + Hive or Spark + RDBMS.

2. If the Spark community is reluctant, let's include the module in the ignite-spark integration.

— Denis

On Sep 25, 2017, at 11:14 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:

Hello, guys.

Currently, I'm working on the integration between Spark and Ignite [1].
For now, I have implemented the following:

* An Ignite DataSource implementation (IgniteRelationProvider).
* DataFrame support for Ignite SQL tables.
* An IgniteCatalog implementation for transparent resolving of Ignite SQL tables.

The implementation can be found in PR [2]. It would be great if someone provided feedback on the prototype. I made some examples in the PR so you can see how the API is supposed to be used [3], [4].
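Based on the examples referenced in [3] and [4], usage of the data source presumably looks something like the sketch below; the format short name and the option keys here are assumptions for illustration, not the final API.

```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ignite-dataframe-example")
  .master("local")
  .getOrCreate()

// Read an Ignite SQL table as a DataFrame via the new data source.
val persons = spark.read
  .format("ignite")                               // assumed short name
  .option("config", "/path/to/ignite-config.xml") // hypothetical option key
  .option("table", "person")                      // hypothetical option key
  .load()

persons.filter(persons("name").startsWith("A")).show()
```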
I need some advice. Can you help me?

1. How should this PR be tested?

Of course, I need to provide some unit tests. But what about scalability tests, etc.? Maybe we need a Yardstick benchmark or something similar? What are your thoughts? Which scenarios should I consider in the first place?

2. Should we provide the Spark Catalog implementation inside the Ignite codebase?

The current implementation of the Spark Catalog is based on *internal Spark API*. The Spark community seems uninterested in making the Catalog API public or in including an Ignite Catalog in the Spark code base [5], [6].

*Should we include a Spark internal API implementation inside the Ignite code base?*

Or should we consider including the Catalog implementation in some external module that would be created and released outside Ignite? (We could still support and develop it inside the Ignite community.)

[1] https://issues.apache.org/jira/browse/IGNITE-3084
[2] https://github.com/apache/ignite/pull/2742
[3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
[4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
[5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
[6] https://issues.apache.org/jira/browse/SPARK-17767

--
Nikolay Izhikov
nizhikov....@gmail.com