Nikolay, Val, it would be good if we could reach agreement here so that I can make the necessary modifications before the 2.7 cutoff.
Nikolay - would you be comfortable if I went ahead and made database=schema?

Stuart.

On Mon, Aug 27, 2018 at 10:22 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:

> Hi Nikolay,
>
> I think it's actually pretty unfortunate that Spark uses the term "database" here, as it
> essentially refers to a schema in my view. Usually, a database is something you create a
> physical connection to, and the connection is bound to that database. To connect to another
> database you need to create a new connection. In Spark, however, you can switch between
> "databases" within a single session, which looks really weird to me because that is usually
> a characteristic of a schema. Having said that, I understand your concern, but I don't
> think there is an ideal solution.
>
> As for your approach, I still don't understand how it will allow us to fully support
> schemas in the catalog.
> - How will you get a list of tables within a particular schema? In other words, what would
>   the listTables() method return?
> - How will you switch between the schemas?
> - Etc.
>
> I still think assuming database=schema is the best we can do here, but I would be happy to
> hear other opinions from other community members.
>
> OPTION_SCHEMA should definitely be introduced though (I thought we already did, no?).
> CREATE TABLE will be supported with this ticket:
> https://issues.apache.org/jira/browse/IGNITE-5780. For now we will have to throw an
> exception if a custom schema name is provided when creating a Spark session, but the table
> does not exist yet.
>
> -Val
>
> On Sun, Aug 26, 2018 at 7:56 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
>
> > Igniters,
> >
> > Personally, I don't like the solution with database == schema name.
> >
> > 1. I think we should try to use the right abstractions.
> > schema == database doesn't sound right to me.
> >
> > Do you want to answer all of our users with something like this:
> >
> > - "How can I change the Ignite SQL schema?"
> > - "This is obvious, just use setDatabase("MY_SCHEMA_NAME")".
> >
> > 2. I think we restrict the whole solution with that decision.
> > If Ignite supports multiple databases in the future, we just won't have a place for them.
> >
> > I think we should do the following:
> >
> > 1. IgniteExternalCatalog should be able to return *ALL* tables within an Ignite instance.
> > We shouldn't restrict the table list by schema by default.
> > We should return tables with the schema name - `schema.table`
> >
> > 2. We should introduce `OPTION_SCHEMA` for a dataframe to specify a schema.
> >
> > There is an issue with the second step: we can't use a schema name in the `CREATE TABLE`
> > clause. This is a restriction of current Ignite SQL.
> >
> > I propose the following:
> >
> > 1. For all write modes that require the creation of a table, we should disallow usage of
> > a table outside of `SQL_PUBLIC` or usage of `OPTION_SCHEMA`. We should throw a proper
> > exception in this case.
> >
> > 2. Create a ticket to support `CREATE TABLE` with a custom schema name.
> >
> > 3. After resolving the ticket from step 2 we can add full support of custom schemas to
> > the Spark integration.
> >
> > 4. We should throw an exception if a user tries to use setDatabase.
> >
> > Does that make sense to you?
> >
> > On Sun, 26/08/2018 at 14:09 +0100, Stuart Macdonald wrote:
> > > I'll go ahead and make the changes to represent the schema name as the database name
> > > for the purposes of the Spark catalog.
> > >
> > > If anyone knows of an existing way to list all available schemata within an Ignite
> > > instance please let me know, otherwise the first task will be creating that mechanism.
> > >
> > > Stuart.
> > >
> > > On Fri, Aug 24, 2018 at 6:23 PM Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > >
> > > > Nikolay,
> > > >
> > > > If there are multiple configurations in XML, IgniteContext will always use only one of them.
> > > > Looks like the current approach simply doesn't work. I propose to report the schema
> > > > name as 'database' in Spark. If there are multiple clients, you would create multiple
> > > > sessions and multiple catalogs.
> > > >
> > > > Makes sense?
> > > >
> > > > -Val
> > > >
> > > > On Fri, Aug 24, 2018 at 12:33 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > >
> > > > > Hello, Valentin.
> > > > >
> > > > > > catalog exist in scope of a single IgniteSparkSession (and therefore
> > > > > > single IgniteContext and single Ignite instance)?
> > > > >
> > > > > Yes.
> > > > > Actually, I was thinking about the use case when we have several Ignite
> > > > > configurations in one XML file.
> > > > > Now I see that this may be too rare a use case to support.
> > > > >
> > > > > Stuart, Valentin, what is your proposal?
> > > > >
> > > > > On Wed, 22/08/2018 at 08:56 -0700, Valentin Kulichenko wrote:
> > > > > > Nikolay,
> > > > > >
> > > > > > Whatever we decide on would be right :) Basically, we need to answer this
> > > > > > question: does the catalog exist in scope of a single IgniteSparkSession
> > > > > > (and therefore single IgniteContext and single Ignite instance)? In other
> > > > > > words, in the rare use case when a single Spark application connects to
> > > > > > multiple Ignite clusters, would there be a catalog created per cluster?
> > > > > >
> > > > > > If the answer is yes, the current logic doesn't make sense.
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Wed, Aug 22, 2018 at 1:44 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > >
> > > > > > > Hello, Valentin.
> > > > > > >
> > > > > > > > I believe we should get rid of this logic and use Ignite schema name as
> > > > > > > > database name in Spark's catalog.
> > > > > > >
> > > > > > > When I developed the Ignite integration with Spark Data Frame I used the
> > > > > > > following abstraction described by Vladimir Ozerov:
> > > > > > >
> > > > > > > "1) Let's consider Ignite cluster as a single database ("catalog" in ANSI
> > > > > > > SQL'92 terms)." [1]
> > > > > > >
> > > > > > > Was I wrong? If yes - let's fix it.
> > > > > > >
> > > > > > > [1] http://apache-ignite-developers.2346864.n4.nabble.com/SQL-usability-catalogs-schemas-and-tables-td17148.html
> > > > > > >
> > > > > > > On Wed, 22/08/2018 at 09:26 +0100, Stuart Macdonald wrote:
> > > > > > > > Hi Val, yes that's correct. I'd be happy to make the change to have the
> > > > > > > > database reference the schema if Nikolay agrees. (I'll first need to do a
> > > > > > > > bit of research into how to obtain the list of all available schemata...)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Stuart.
> > > > > > > >
> > > > > > > > On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <valentin.kuliche...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Stuart,
> > > > > > > > >
> > > > > > > > > Thanks for pointing this out, I was not aware that we use the Spark
> > > > > > > > > database concept this way. Actually, this confuses me a lot. As far as I
> > > > > > > > > understand, the catalog is created in the scope of a particular
> > > > > > > > > IgniteSparkSession, which in turn is assigned to a particular
> > > > > > > > > IgniteContext and therefore a single Ignite client. If that's the case, I
> > > > > > > > > don't think it should be aware of other Ignite clients that are connected
> > > > > > > > > to other clusters.
> > > > > > > > > This doesn't look like correct behavior to me, not to mention that with
> > > > > > > > > this approach having multiple databases would be a very rare case. I
> > > > > > > > > believe we should get rid of this logic and use the Ignite schema name as
> > > > > > > > > the database name in Spark's catalog.
> > > > > > > > >
> > > > > > > > > Nikolay, what do you think?
> > > > > > > > >
> > > > > > > > > -Val
> > > > > > > > >
> > > > > > > > > On Tue, Aug 21, 2018 at 8:17 AM Stuart Macdonald <stu...@stuwee.org> wrote:
> > > > > > > > >
> > > > > > > > > > Nikolay, Val,
> > > > > > > > > >
> > > > > > > > > > The JDBC Spark datasource[1] -- as far as I can tell -- has no
> > > > > > > > > > ExternalCatalog implementation, it just uses the database specified in
> > > > > > > > > > the JDBC URL. So I don't believe there is any way to call listTables()
> > > > > > > > > > or listDatabases() for the JDBC provider.
> > > > > > > > > >
> > > > > > > > > > The Hive ExternalCatalog[2] makes the distinction between database and
> > > > > > > > > > table using the actual database and table mechanisms built into the
> > > > > > > > > > catalog, which is fine because Hive has a clear distinction and
> > > > > > > > > > hierarchy of databases and tables.
> > > > > > > > > >
> > > > > > > > > > *However* Ignite already uses the "database" concept in the Ignite
> > > > > > > > > > ExternalCatalog[3] to mean the name of an Ignite instance.
> > > > > > > > > > So in Ignite we have instances containing schemas containing tables,
> > > > > > > > > > and Spark only has the concept of databases and tables, so it seems
> > > > > > > > > > like we either ignore one of the three Ignite concepts or combine two
> > > > > > > > > > of them into database or table. The current implementation in the pull
> > > > > > > > > > request combines Ignite schema and table attributes into the Spark
> > > > > > > > > > table attribute.
> > > > > > > > > >
> > > > > > > > > > Stuart.
> > > > > > > > > >
> > > > > > > > > > [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
> > > > > > > > > > [2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
> > > > > > > > > > [3] https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/spark/sql/ignite/IgniteExternalCatalog.scala
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 21, 2018 at 9:31 AM, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello, Stuart.
> > > > > > > > > > >
> > > > > > > > > > > Can you do some research and find out how the schema is handled in
> > > > > > > > > > > Data Frames for a regular RDBMS such as Oracle, MySQL, etc?
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 20/08/2018 at 15:37 -0700, Valentin Kulichenko wrote:
> > > > > > > > > > > > Stuart, Nikolay,
> > > > > > > > > > > >
> > > > > > > > > > > > I see that the 'Table' class (returned by the listTables method) has
> > > > > > > > > > > > a 'database' field. Can we use this one to report the schema name?
> > > > > > > > > > > >
> > > > > > > > > > > > In any case, I think we should look into how this is done in data
> > > > > > > > > > > > source implementations for other databases. Any relational database
> > > > > > > > > > > > has a notion of schema, and I'm sure Spark integrations take this
> > > > > > > > > > > > into account somehow.
> > > > > > > > > > > >
> > > > > > > > > > > > -Val
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Aug 20, 2018 at 6:12 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > > > > > Hello, Stuart.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Personally, I think we should change the current table naming and
> > > > > > > > > > > > > return tables in the form `schema.table`.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Valentin, could you share your opinion?
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 20/08/2018 at 10:04 +0100, Stuart Macdonald wrote:
> > > > > > > > > > > > > > Igniters,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > While reviewing the changes for IGNITE-9228 [1,2], Nikolay and I
> > > > > > > > > > > > > > are discussing whether to introduce a change which may impact
> > > > > > > > > > > > > > backwards compatibility; Nikolay suggested we take the discussion
> > > > > > > > > > > > > > to this list.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ignite implements a custom Spark catalog which provides an API by
> > > > > > > > > > > > > > which Spark users can list the tables which are available in
> > > > > > > > > > > > > > Ignite and can be queried via Spark SQL. Currently that table
> > > > > > > > > > > > > > name list includes just the names of the tables, but IGNITE-9228
> > > > > > > > > > > > > > is introducing a change which allows optional prefixing of schema
> > > > > > > > > > > > > > names to table names to disambiguate multiple tables with the
> > > > > > > > > > > > > > same name in different schemas.
> > > > > > > > > > > > > > For the "list tables" API we therefore have two options:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. List the tables using both their table names and
> > > > > > > > > > > > > > schema-qualified table names (e.g. [ "myTable", "mySchema.myTable" ])
> > > > > > > > > > > > > > even though they are the same underlying table. This retains
> > > > > > > > > > > > > > backwards compatibility with users who expect "myTable" to appear
> > > > > > > > > > > > > > in the catalog.
> > > > > > > > > > > > > > 2. List the tables using only their schema-qualified names. This
> > > > > > > > > > > > > > eliminates duplication of names in the catalog but will
> > > > > > > > > > > > > > potentially break compatibility with users who expect the bare
> > > > > > > > > > > > > > table name in the catalog.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > With either option we will allow Spark SQL SELECT statements to
> > > > > > > > > > > > > > use either table names or schema-qualified table names; this
> > > > > > > > > > > > > > change would purely impact the API which is used to list
> > > > > > > > > > > > > > available tables.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Any opinions would be welcome.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Stuart.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9228
> > > > > > > > > > > > > > [2] https://github.com/apache/ignite/pull/4551
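
For reference, the listing options debated in this thread can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not call the Spark or Ignite APIs, and the sample tables and all function names (list_both, list_qualified, list_databases, list_tables) are hypothetical.

```python
# Hypothetical sample data: (schema, table) pairs within one Ignite instance.
TABLES = [("PUBLIC", "person"), ("PUBLIC", "city"), ("mySchema", "person")]


def list_both(tables):
    """Option 1: list bare table names plus schema-qualified names."""
    names = []
    for schema, table in tables:
        if table not in names:
            names.append(table)  # bare name, listed once even if ambiguous
        names.append(f"{schema}.{table}")
    return names


def list_qualified(tables):
    """Option 2: list schema-qualified names only."""
    return [f"{schema}.{table}" for schema, table in tables]


def list_databases(tables):
    """database=schema: every schema is reported as a database."""
    return sorted({schema for schema, _ in tables})


def list_tables(tables, database):
    """database=schema: list only the tables of the selected schema."""
    return sorted(table for schema, table in tables if schema == database)


if __name__ == "__main__":
    print(list_both(TABLES))
    print(list_qualified(TABLES))
    for db in list_databases(TABLES):
        print(db, list_tables(TABLES, db))
```

In this toy model, option 1 lists "person" once although two schemas contain a table with that name, which is the ambiguity the schema prefix is meant to resolve. With database=schema the table names stay unqualified and the schema is selected by choosing the database.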