Hello, Valentin.

> I believe we should get rid of this logic and use Ignite schema name as
> database name in Spark's catalog.

When I developed the Ignite integration with Spark Data Frames, I used the
following abstraction described by Vladimir Ozerov:

"1) Let's consider Ignite cluster as a single database ("catalog" in ANSI
SQL'92 terms)." [1]

Was I wrong? If so, let's fix it.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/SQL-usability-catalogs-schemas-and-tables-td17148.html

On Wed, 22/08/2018 at 09:26 +0100, Stuart Macdonald wrote:
> Hi Val, yes that's correct. I'd be happy to make the change to have the
> database reference the schema if Nikolay agrees. (I'll first need to do a
> bit of research into how to obtain the list of all available schemata...)
>
> Thanks,
> Stuart.
>
> On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Stuart,
> >
> > Thanks for pointing this out, I was not aware that we use Spark database
> > concept this way. Actually, this confuses me a lot. As far as I understand,
> > catalog is created in the scope of a particular IgniteSparkSession, which
> > in turn is assigned to a particular IgniteContext and therefore single
> > Ignite client. If that's the case, I don't think it should be aware of
> > other Ignite clients that are connected to other clusters. This doesn't
> > look like correct behavior to me, not to mention that with this approach
> > having multiple databases would be a very rare case. I believe we should
> > get rid of this logic and use Ignite schema name as database name in
> > Spark's catalog.
> >
> > Nikolay, what do you think?
> >
> > -Val
> >
> > On Tue, Aug 21, 2018 at 8:17 AM Stuart Macdonald <stu...@stuwee.org>
> > wrote:
> >
> > > Nikolay, Val,
> > >
> > > The JDBC Spark datasource[1] -- as far as I can tell -- has no
> > > ExternalCatalog implementation, it just uses the database specified in the
> > > JDBC URL. So I don't believe there is any way to call listTables() or
> > > listDatabases() for JDBC provider.
> > >
> > > The Hive ExternalCatalog[2] makes the distinction between database and
> > > table using the actual database and table mechanisms built into the
> > > catalog, which is fine because Hive has the clear distinction and
> > > hierarchy of databases and tables.
> > >
> > > *However* Ignite already uses the "database" concept in the Ignite
> > > ExternalCatalog[3] to mean the name of an Ignite instance. So in Ignite we
> > > have instances containing schemas containing tables, and Spark only has
> > > the concept of databases and tables so it seems like either we ignore one
> > > of the three Ignite concepts or combine two of them into database or
> > > table. The current implementation in the pull request combines Ignite
> > > schema and table attributes into the Spark table attribute.
> > >
> > > Stuart.
> > >
> > > [1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
> > > [2] https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
> > > [3] https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/spark/sql/ignite/IgniteExternalCatalog.scala
> > >
> > > On Tue, Aug 21, 2018 at 9:31 AM, Nikolay Izhikov <nizhi...@apache.org>
> > > wrote:
> > >
> > > > Hello, Stuart.
> > > >
> > > > Can you do some research and find out how schema is handled in Data
> > > > Frames for a regular RDBMS such as Oracle, MySQL, etc?
> > > >
> > > > On Mon, 20/08/2018 at 15:37 -0700, Valentin Kulichenko wrote:
> > > > > Stuart, Nikolay,
> > > > >
> > > > > I see that the 'Table' class (returned by listTables method) has a
> > > > > 'database' field. Can we use this one to report schema name?
> > > > >
> > > > > In any case, I think we should look into how this is done in data
> > > > > source implementations for other databases. Any relational database
> > > > > has a notion of schema, and I'm sure Spark integrations take this
> > > > > into account somehow.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Mon, Aug 20, 2018 at 6:12 AM Nikolay Izhikov <nizhi...@apache.org>
> > > > > wrote:
> > > > > > Hello, Stuart.
> > > > > >
> > > > > > Personally, I think we should change current tables naming and
> > > > > > return table in form of `schema.table`.
> > > > > >
> > > > > > Valentin, could you share your opinion?
> > > > > >
> > > > > > On Mon, 20/08/2018 at 10:04 +0100, Stuart Macdonald wrote:
> > > > > > > Igniters,
> > > > > > >
> > > > > > > While reviewing the changes for IGNITE-9228 [1,2], Nikolay and I
> > > > > > > are discussing whether to introduce a change which may impact
> > > > > > > backwards compatibility; Nikolay suggested we take the discussion
> > > > > > > to this list.
> > > > > > >
> > > > > > > Ignite implements a custom Spark catalog which provides an API by
> > > > > > > which Spark users can list the tables which are available in
> > > > > > > Ignite which can be queried via Spark SQL. Currently that table
> > > > > > > name list includes just the names of the tables, but IGNITE-9228
> > > > > > > is introducing a change which allows optional prefixing of schema
> > > > > > > names to table names to disambiguate multiple tables with the
> > > > > > > same name in different schemas. For the "list tables" API we
> > > > > > > therefore have two options:
> > > > > > >
> > > > > > > 1. List the tables using both their table names and
> > > > > > > schema-qualified table names (e.g. [ "myTable",
> > > > > > > "mySchema.myTable" ]) even though they are the same underlying
> > > > > > > table. This retains backwards compatibility with users who
> > > > > > > expect "myTable" to appear in the catalog.
> > > > > > > 2. List the tables using only their schema-qualified names. This
> > > > > > > eliminates duplication of names in the catalog but will
> > > > > > > potentially break compatibility with users who expect the table
> > > > > > > name in the catalog.
> > > > > > >
> > > > > > > With either option we will allow for Spark SQL SELECT statements
> > > > > > > to use either table name or schema-qualified table names, this
> > > > > > > change would purely impact the API which is used to list
> > > > > > > available tables.
> > > > > > >
> > > > > > > Any opinions would be welcome.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Stuart.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9228
> > > > > > > [2] https://github.com/apache/ignite/pull/4551
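
For reference, the two "list tables" options from the original message, plus the schema-as-database alternative raised later in the thread, can be sketched in plain Python. This is only an illustration under stated assumptions: `SCHEMAS` and the three functions are hypothetical names, the sample data is invented, and nothing here is part of the Ignite or Spark APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one Ignite cluster: schema name -> table names.
SCHEMAS: Dict[str, List[str]] = {
    "PUBLIC": ["person"],
    "mySchema": ["myTable"],
}


def list_tables_both(schemas: Dict[str, List[str]]) -> List[str]:
    """Option 1: report each table under its plain and its qualified name."""
    names: List[str] = []
    for schema, tables in schemas.items():
        for table in tables:
            names.append(table)
            names.append(f"{schema}.{table}")
    return names


def list_tables_qualified(schemas: Dict[str, List[str]]) -> List[str]:
    """Option 2: report each table once, under its schema-qualified name."""
    return [f"{schema}.{table}"
            for schema, tables in schemas.items()
            for table in tables]


def list_tables_by_database(schemas: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Alternative: treat the schema as the database and report
    (database, table) pairs instead of encoding the schema in the name."""
    return [(schema, table)
            for schema, tables in schemas.items()
            for table in tables]


if __name__ == "__main__":
    print(list_tables_both(SCHEMAS))
    print(list_tables_qualified(SCHEMAS))
    print(list_tables_by_database(SCHEMAS))
```

Note that with option 1 the plain names would repeat whenever two schemas contain a table with the same name, which is the ambiguity the schema prefix is meant to remove.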