Hello, Stuart. Do you need any assistance with this task from me or any other community member?
On Tue, 04/09/2018 at 19:03 +0300, Nikolay Izhikov wrote:

Hello, Stuart.

Sorry for the silence; I was swamped over the last couple of days.

I think you can go ahead and implement the suggested solution. I'm -0 on it, so there is no block from my side, but I'm still not happy with the abstractions :).

On Mon, 03/09/2018 at 09:35 +0100, Stuart Macdonald wrote:

Nikolay, Val, it would be good if we could reach agreement here so that I can make the necessary modifications before the 2.7 cutoff.

Nikolay, would you be comfortable if I went ahead and made database=schema?

Stuart.

On Mon, Aug 27, 2018 at 10:22 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

Hi Nikolay,

I think it's actually unfortunate that Spark uses the term "database" here, as in my view it essentially refers to a schema. Usually a database is something you create a physical connection to, and the connection is bound to that database; to connect to another database you need to create a new connection. In Spark, however, you can switch between "databases" within a single session, which looks really weird to me because that is usually a characteristic of a schema. Having said that, I understand your concern, but I don't think there is an ideal solution.

As for your approach, I still don't understand how it will allow us to fully support schemas in the catalog:
- How will you get a list of tables within a particular schema? In other words, what would the listTables() method return?
- How will you switch between schemas?
- Etc.

I still think assuming database=schema is the best we can do here, but I would be happy to hear other opinions from community members.

OPTION_SCHEMA should definitely be introduced though (I thought we already did that, no?).
CREATE TABLE will be supported with this ticket: https://issues.apache.org/jira/browse/IGNITE-5780. For now we will have to throw an exception if a custom schema name is provided when creating a Spark session but the table does not exist yet.

-Val

On Sun, Aug 26, 2018 at 7:56 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Igniters,

Personally, I don't like the solution with database == schema name.

1. I think we should try to use the right abstractions, and schema == database doesn't sound right to me. Do we really want to answer all of our users with something like:

- "How can I change the Ignite SQL schema?"
- "That's obvious, just use setDatabase("MY_SCHEMA_NAME")."

2. I think we restrict the whole solution with that decision. If Ignite supports multiple databases in the future, we simply won't have a place for them.

I think we should do the following:

1. IgniteExternalCatalog should be able to return *all* tables within the Ignite instance. We shouldn't restrict the table list by schema by default, and we should return tables with their schema name: `schema.table`.

2. We should introduce `OPTION_SCHEMA` for a DataFrame to specify a schema.

There is an issue with the second step: we can't use a schema name in the `CREATE TABLE` clause. This is a restriction of current Ignite SQL.

I propose the following:

1. For all write modes that require the creation of a table, we should disallow use of tables outside of `SQL_PUBLIC` and use of `OPTION_SCHEMA`, and throw a proper exception in this case.

2. Create a ticket to support `CREATE TABLE` with a custom schema name.

3.
After resolving the ticket from step 2, we can add full support for custom schemas to the Spark integration.

4. We should throw an exception if a user tries to use setDatabase.

Does that make sense to you?

On Sun, 26/08/2018 at 14:09 +0100, Stuart Macdonald wrote:

I'll go ahead and make the changes to represent the schema name as the database name for the purposes of the Spark catalog.

If anyone knows of an existing way to list all available schemata within an Ignite instance, please let me know; otherwise the first task will be creating that mechanism.

Stuart.

On Fri, Aug 24, 2018 at 6:23 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

Nikolay,

If there are multiple configurations in the XML, IgniteContext will always use only one of them, so it looks like the current approach simply doesn't work. I propose to report the schema name as the 'database' in Spark. If there are multiple clients, you would create multiple sessions and multiple catalogs.

Makes sense?

-Val

On Fri, Aug 24, 2018 at 12:33 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Valentin.

> catalog exist in scope of a single IgniteSparkSession (and therefore single IgniteContext and single Ignite instance)?

Yes. Actually, I was thinking about the use case where we have several Ignite configurations in one XML file. Now I see that this may be too rare a use case to support.

Stuart, Valentin, what is your proposal?
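Nikolay's proposal above can be sketched as DataFrame options. This is illustrative only: `FORMAT_IGNITE`, `OPTION_TABLE`, and `OPTION_CONFIG_FILE` exist in the ignite-spark module's `IgniteDataFrameSettings` today, but `OPTION_SCHEMA` is the proposed addition, and the config path is hypothetical, so the final API may differ.

```scala
// Sketch of the proposed OPTION_SCHEMA usage; not the final API.
import org.apache.ignite.spark.IgniteDataFrameSettings._
import org.apache.spark.sql.SparkSession

object SchemaOptionSketch extends App {
  val spark = SparkSession.builder()
    .appName("ignite-schema-sketch")
    .master("local")
    .getOrCreate()

  // Step 2 of the proposal: select the schema explicitly instead of
  // encoding it into the table name.
  val persons = spark.read
    .format(FORMAT_IGNITE)
    .option(OPTION_CONFIG_FILE, "ignite-config.xml") // hypothetical config path
    .option(OPTION_TABLE, "person")
    .option("schema", "MY_SCHEMA") // proposed OPTION_SCHEMA, name assumed
    .load()

  // Step 1 of the write-mode restriction: a save that would require
  // CREATE TABLE outside SQL_PUBLIC (or with the schema option set)
  // would throw a proper exception until IGNITE-5780 is resolved.
}
```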
On Wed, 22/08/2018 at 08:56 -0700, Valentin Kulichenko wrote:

Nikolay,

Whatever we decide on would be right :) Basically, we need to answer this question: does the catalog exist in the scope of a single IgniteSparkSession (and therefore a single IgniteContext and a single Ignite instance)? In other words, in the rare case where a single Spark application connects to multiple Ignite clusters, would there be a catalog created per cluster?

If the answer is yes, the current logic doesn't make sense.

-Val

On Wed, Aug 22, 2018 at 1:44 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Valentin.

> I believe we should get rid of this logic and use Ignite schema name as database name in Spark's catalog.

When I developed the Ignite integration with Spark Data Frames, I used the following abstraction described by Vladimir Ozerov:

"1) Let's consider Ignite cluster as a single database ("catalog" in ANSI SQL'92 terms)." [1]

Was I wrong? If yes, let's fix it.
[1] http://apache-ignite-developers.2346864.n4.nabble.com/SQL-usability-catalogs-schemas-and-tables-td17148.html

On Wed, 22/08/2018 at 09:26 +0100, Stuart Macdonald wrote:

Hi Val, yes that's correct. I'd be happy to make the change to have the database reference the schema if Nikolay agrees. (I'll first need to do a bit of research into how to obtain the list of all available schemata...)

Thanks,
Stuart.

On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

Stuart,

Thanks for pointing this out; I was not aware that we use the Spark database concept this way. Actually, this confuses me a lot. As far as I understand, the catalog is created in the scope of a particular IgniteSparkSession, which in turn is assigned to a particular IgniteContext and therefore a single Ignite client. If that's the case, I don't think it should be aware of other Ignite clients that are connected to other clusters.
This doesn't look like correct behavior to me, not to mention that with this approach having multiple databases would be a very rare case. I believe we should get rid of this logic and use the Ignite schema name as the database name in Spark's catalog.

Nikolay, what do you think?

-Val

On Tue, Aug 21, 2018 at 8:17 AM Stuart Macdonald <stu...@stuwee.org> wrote:

Nikolay, Val,

The JDBC Spark datasource [1] -- as far as I can tell -- has no ExternalCatalog implementation; it just uses the database specified in the JDBC URL. So I don't believe there is any way to call listTables() or listDatabases() for the JDBC provider.

The Hive ExternalCatalog [2] makes the distinction between database and table using the actual database and table mechanisms built into the catalog, which is fine because Hive has a clear distinction and hierarchy of databases and tables.
*However*, Ignite already uses the "database" concept in the Ignite ExternalCatalog [3] to mean the name of an Ignite instance. So in Ignite we have instances containing schemas containing tables, while Spark only has the concepts of databases and tables, so it seems we must either ignore one of the three Ignite concepts or combine two of them into database or table. The current implementation in the pull request combines the Ignite schema and table attributes into the Spark table attribute.

Stuart.
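The combination described above (folding the Ignite schema into the Spark table attribute) can be sketched in a few lines of plain Scala. The names `IgniteTableRef` and `toSparkTableName` are illustrative only, not the actual IgniteExternalCatalog code:

```scala
// Sketch of the name mapping discussed above: Ignite has
// instance -> schema -> table, while Spark only has database -> table,
// so the pull request folds the Ignite schema into the Spark table name.
case class IgniteTableRef(schema: String, table: String)

def toSparkTableName(ref: IgniteTableRef): String =
  if (ref.schema.equalsIgnoreCase("PUBLIC")) ref.table // default schema stays unqualified
  else s"${ref.schema}.${ref.table}"                   // otherwise schema-qualified

println(toSparkTableName(IgniteTableRef("PUBLIC", "person")))   // person
println(toSparkTableName(IgniteTableRef("mySchema", "person"))) // mySchema.person
```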
[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
[2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
[3] https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/spark/sql/ignite/IgniteExternalCatalog.scala

On Tue, Aug 21, 2018 at 9:31 AM, Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Stuart.

Can you do some research and find out how the schema is handled in Data Frames for a regular RDBMS such as Oracle, MySQL, etc.?

On Mon, 20/08/2018 at 15:37 -0700, Valentin Kulichenko wrote:

Stuart, Nikolay,

I see that the 'Table' class (returned by the listTables method) has a 'database' field. Can we use this one to report the schema name?
In any case, I think we should look into how this is done in data source implementations for other databases. Any relational database has a notion of schema, and I'm sure Spark integrations take this into account somehow.

-Val

On Mon, Aug 20, 2018 at 6:12 AM Nikolay Izhikov <nizhi...@apache.org> wrote:

Hello, Stuart.

Personally, I think we should change the current table naming and return tables in the form `schema.table`.

Valentin, could you share your opinion?
On Mon, 20/08/2018 at 10:04 +0100, Stuart Macdonald wrote:

Igniters,

While reviewing the changes for IGNITE-9228 [1,2], Nikolay and I have been discussing whether to introduce a change which may impact backwards compatibility; Nikolay suggested we take the discussion to this list.

Ignite implements a custom Spark catalog which provides an API by which Spark users can list the tables available in Ignite that can be queried via Spark SQL.
Currently that table name list includes just the names of the tables, but IGNITE-9228 introduces a change which allows optional prefixing of schema names to table names, to disambiguate multiple tables with the same name in different schemas. For the "list tables" API we therefore have two options:

1. List the tables using both their table names and their schema-qualified table names (e.g. [ "myTable", "mySchema.myTable" ]), even though they are the same underlying table. This retains backwards compatibility with users who expect "myTable" to appear in the catalog.

2. List the tables using only their schema-qualified names.
This eliminates duplication of names in the catalog but will potentially break compatibility with users who expect the plain table name in the catalog.

With either option we will allow Spark SQL SELECT statements to use either table names or schema-qualified table names; this change would purely impact the API used to list the available tables.

Any opinions would be welcome.

Thanks,
Stuart.

[1] https://issues.apache.org/jira/browse/IGNITE-9228
[2] https://github.com/apache/ignite/pull/4551
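The two listing options can be sketched in plain Scala as follows. `Tbl` and the two functions are illustrative names, not the actual catalog implementation:

```scala
// Sketch of the two "list tables" options for IGNITE-9228.
case class Tbl(schema: String, name: String)

// Option 1: both the plain and the schema-qualified name, preserving
// backwards compatibility at the cost of duplicate entries per table.
def listTablesOption1(tables: Seq[Tbl]): Seq[String] =
  tables.flatMap(t => Seq(t.name, s"${t.schema}.${t.name}")).distinct

// Option 2: only the schema-qualified name, unambiguous but potentially
// breaking for users who expect the plain table name.
def listTablesOption2(tables: Seq[Tbl]): Seq[String] =
  tables.map(t => s"${t.schema}.${t.name}")

val tables = Seq(Tbl("mySchema", "myTable"))
println(listTablesOption1(tables)) // List(myTable, mySchema.myTable)
println(listTablesOption2(tables)) // List(mySchema.myTable)
```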