[ https://issues.apache.org/jira/browse/CALCITE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205381#comment-17205381 ]
Ruben Q L commented on CALCITE-4034:
------------------------------------

[~neoremind] any update on the current issue?

> Implement a MySQL InnoDB adapter
> --------------------------------
>
>                 Key: CALCITE-4034
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4034
>             Project: Calcite
>          Issue Type: New Feature
>          Components: csv-adapter
>    Affects Versions: 1.23.0
>            Reporter: neoremind
>            Assignee: Julian Hyde
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.26.0
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Calcite’s InnoDB adapter allows you to query data in InnoDB data files
> (also known as .ibd files) directly. This adapter is different from the
> JDBC adapter, which requires a MySQL server to serve responses. With .ibd
> files and the corresponding DDLs, the InnoDB adapter is able to work like a
> simple "MySQL server": it accepts a SQL query and attempts to compile the
> query based on the InnoDB file-access APIs provided by
> [innodb-java-reader|https://github.com/alibaba/innodb-java-reader] (I am
> the author of the library, which is well tested and production-ready), and
> it exploits projecting, filtering and sorting directly in the InnoDB data
> file where possible. What’s more, with DDLs the adapter is "index aware":
> it leverages rules to choose the right index to scan, for example using the
> primary key or secondary keys to look up data, and then tries to push down
> some conditions into the storage engine. The adapter also accepts hints, so
> that the user can force the optimizer to use one specific index.
> The InnoDB adapter can:
> 1) Full scan a table.
> 2) Scan by index such as primary key or secondary key, single column or
> composite keys.
> 3) Push down filtering conditions to the storage engine, supporting both
> point queries and range queries.
> 4) Push down projection.
> 5) Push down ordering if it matches the natural collation of an index.
> 6) Support almost all the commonly used data types.
> 7) Work as a simple MySQL server being able to read .ibd data files directly
> through the file system; this can be a tool to query or dump table data by
> offloading from the MySQL process under some conditions.
> Before I describe the adapter and its design, I was wondering if it is
> appropriate to deliver the work by contributing the codebase back to Calcite,
> or if it would be better to keep it in another project, probably referred to
> from a Calcite adapter link. Looking forward to Calcite developers' advice.
>
> Below is the first version of the usage.
> ----------
> [MySQL|https://www.mysql.com/] is the most popular open-source SQL database
> management system; it is developed, distributed, and supported by Oracle
> Corporation. InnoDB is a general-purpose storage engine that balances high
> reliability and high performance in MySQL; since MySQL 5.6, InnoDB has been
> the default storage engine.
> Calcite’s InnoDB adapter allows you to query data in InnoDB data files
> (also known as .ibd files) directly. It leverages
> [innodb-java-reader|https://github.com/alibaba/innodb-java-reader]. This
> adapter is different from the JDBC adapter, which maps a schema in a JDBC
> data source and requires a MySQL server to serve responses. With .ibd files
> and the corresponding DDLs, the InnoDB adapter is able to work like a simple
> "MySQL server": it accepts a SQL query and attempts to compile the query
> based on the InnoDB file-access APIs provided by innodb-java-reader, and it
> exploits projecting, filtering and sorting directly in the InnoDB data file
> where possible.
> What’s more, with DDLs the adapter is "index aware": it leverages rules to
> choose the right index to scan, for example using the primary key or
> secondary keys to look up data, and then tries to push down some conditions
> into the storage engine. The adapter also accepts hints, so that the user
> can force the optimizer to use one specific index.
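The "index aware" rules described above boil down to a left-prefix check: a composite secondary index is usable when the equality-filtered columns cover a leading prefix of the index's columns, and only that prefix of the conditions can be pushed down. A minimal sketch of the rule (hypothetical class and method names, not the adapter's actual code):

```java
import java.util.*;

// Simplified sketch of the left-prefix rule implied by the description above:
// conditions push down only while they match leading index columns.
// All names here are hypothetical illustrations.
public class LeftPrefixSketch {
    /** Returns how many leading index columns are matched by equality conditions. */
    static int matchedPrefixLength(List<String> indexColumns, Set<String> equalityColumns) {
        int n = 0;
        for (String col : indexColumns) {
            if (!equalityColumns.contains(col)) {
                break; // prefix broken; remaining conditions stay in the Calcite plan
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> deptnoMgrKey = Arrays.asList("DEPTNO", "MGR");
        // WHERE deptno = 20 AND mgr = 7566: both conditions push down.
        System.out.println(matchedPrefixLength(deptnoMgrKey,
            new HashSet<>(Arrays.asList("DEPTNO", "MGR"))));   // prints 2
        // WHERE mgr = 7566 alone: MGR is not a leading column, nothing pushes down.
        System.out.println(matchedPrefixLength(deptnoMgrKey,
            new HashSet<>(Arrays.asList("MGR"))));             // prints 0
    }
}
```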
> A basic example of a model file is given below; this schema reads from a
> MySQL "scott" database:
> {code:java}
> {
>   "version": "1.0",
>   "defaultSchema": "scott",
>   "schemas": [
>     {
>       "name": "scott",
>       "type": "custom",
>       "factory": "org.apache.calcite.adapter.innodb.InnodbSchemaFactory",
>       "operand": {
>         "sqlFilePath": [ "/path/scott.sql" ],
>         "ibdDataFileBasePath": "/usr/local/mysql/data/scott"
>       }
>     }
>   ]
> }
> {code}
> `sqlFilePath` is a list of DDL files; you can generate table definitions by
> executing `mysqldump -d -u<username> -p<password> -h <hostname> <dbname>`
> on the command line.
> The file content of `/path/scott.sql` is given below:
> {code:java}
> CREATE TABLE `DEPT`(
>   `DEPTNO` TINYINT NOT NULL,
>   `DNAME` VARCHAR(50) NOT NULL,
>   `LOC` VARCHAR(20),
>   UNIQUE KEY `DEPT_PK` (`DEPTNO`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> CREATE TABLE `EMP`(
>   `EMPNO` INT(11) NOT NULL,
>   `ENAME` VARCHAR(100) NOT NULL,
>   `JOB` VARCHAR(15) NOT NULL,
>   `AGE` SMALLINT,
>   `MGR` BIGINT,
>   `HIREDATE` DATE,
>   `SAL` DECIMAL(8,2) NOT NULL,
>   `COMM` DECIMAL(6,2),
>   `DEPTNO` TINYINT,
>   `EMAIL` VARCHAR(100) DEFAULT NULL,
>   `CREATE_DATETIME` DATETIME,
>   `CREATE_TIME` TIME,
>   `UPSERT_TIME` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
>   PRIMARY KEY (`EMPNO`),
>   KEY `ENAME_KEY` (`ENAME`),
>   KEY `HIREDATE_KEY` (`HIREDATE`),
>   KEY `CREATE_DATETIME_JOB_KEY` (`CREATE_DATETIME`, `JOB`),
>   KEY `CREATE_TIME_KEY` (`CREATE_TIME`),
>   KEY `UPSERT_TIME_KEY` (`UPSERT_TIME`),
>   KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),
>   KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
>   KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`),
>   KEY `AGE_KEY` (`AGE`)
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
> {code}
> `ibdDataFileBasePath` is the parent directory of the .ibd files.
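The adapter builds its picture of a table's indexes from these DDL files. Conceptually, that amounts to extracting each KEY clause and its column list; the rough, hypothetical sketch below shows the kind of information recovered (the real adapter uses a proper SQL parser, not a regex):

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical illustration only: collect secondary-index names and their
// column lists from a CREATE TABLE statement, to show what "index aware"
// planning works from. Not the adapter's actual parsing code.
public class DdlIndexSketch {
    static Map<String, List<String>> indexesOf(String ddl) {
        Map<String, List<String>> indexes = new LinkedHashMap<>();
        Matcher m = Pattern.compile("KEY `(\\w+)` \\(([^)]*)\\)").matcher(ddl);
        while (m.find()) {
            List<String> cols = new ArrayList<>();
            for (String c : m.group(2).split(",")) {
                cols.add(c.trim().replace("`", ""));
            }
            indexes.put(m.group(1), cols);
        }
        return indexes;
    }

    public static void main(String[] args) {
        String ddl = "CREATE TABLE `EMP`(`EMPNO` INT NOT NULL, "
            + "KEY `ENAME_KEY` (`ENAME`), "
            + "KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`))";
        // prints {ENAME_KEY=[ENAME], DEPTNO_MGR_KEY=[DEPTNO, MGR]}
        System.out.println(indexesOf(ddl));
    }
}
```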
> Assuming the model file is stored as `model.json`, you can connect to the
> InnoDB data files to perform a query via
> [sqlline|https://github.com/julianhyde/sqlline] as follows:
> {code:java}
> sqlline> !connect jdbc:calcite:model=model.json admin admin
> {code}
> We can issue a simple query to fetch all employees by writing standard SQL:
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP";
> +-------+--------+-----------+-----+------+
> | EMPNO | ENAME  | JOB       | AGE | MGR  |
> +-------+--------+-----------+-----+------+
> | 7369  | SMITH  | CLERK     | 30  | 7902 |
> | 7499  | ALLEN  | SALESMAN  | 24  | 7698 |
> | 7521  | WARD   | SALESMAN  | 41  | 7698 |
> | 7566  | JONES  | MANAGER   | 28  | 7839 |
> | 7654  | MARTIN | SALESMAN  | 27  | 7698 |
> | 7698  | BLAKE  | MANAGER   | 38  | 7839 |
> | 7782  | CLARK  | MANAGER   | 32  | 7839 |
> | 7788  | SCOTT  | ANALYST   | 45  | 7566 |
> | 7839  | KING   | PRESIDENT | 22  | null |
> | 7844  | TURNER | SALESMAN  | 54  | 7698 |
> | 7876  | ADAMS  | CLERK     | 35  | 7788 |
> | 7900  | JAMES  | CLERK     | 40  | 7698 |
> | 7902  | FORD   | ANALYST   | 28  | 7566 |
> | 7934  | MILLER | CLERK     | 32  | 7782 |
> +-------+--------+-----------+-----+------+
> {code}
> While executing this query, the InnoDB adapter scans the InnoDB data file
> `EMP.ibd` using the primary key, also known as the clustering B+ tree index
> in MySQL, and is able to push down projection to the underlying storage
> engine. Projection can reduce the size of the data fetched from the storage
> engine.
> We can look up one employee by filtering. The InnoDB adapter retrieves all
> indexes through the DDL file provided in `model.json`.
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where empno = 7782;
> +-------+-------+---------+-----+------+
> | EMPNO | ENAME | JOB     | AGE | MGR  |
> +-------+-------+---------+-----+------+
> | 7782  | CLARK | MANAGER | 32  | 7839 |
> +-------+-------+---------+-----+------+
> {code}
> The InnoDB adapter is able to recognize that `empno` is the primary key and
> does a point lookup using the clustering index instead of a full table scan.
> We can do a range query on the primary key as well:
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where empno > 7782 and empno < 7900;
> {code}
> Note that such a query with a reasonable range is usually efficient in MySQL
> with the InnoDB storage engine, because in a clustering B+ tree index,
> records that are close in the index are close in the data file, which is
> good for scanning.
> We can look up an employee by secondary key. For example, the filtering
> condition here is on the `VARCHAR` field `ename`:
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where ename = 'smith';
> +-------+-------+-------+-----+------+
> | EMPNO | ENAME | JOB   | AGE | MGR  |
> +-------+-------+-------+-----+------+
> | 7369  | SMITH | CLERK | 30  | 7902 |
> +-------+-------+-------+-----+------+
> {code}
> The InnoDB adapter works well on almost all the commonly used data types in
> MySQL; for more information on supported data types, please refer to
> [innodb-java-reader|https://github.com/alibaba/innodb-java-reader#3-features].
> We can query by composite key. For example, given the secondary index
> `DEPTNO_MGR_KEY`:
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and mgr = 7566;
> +-------+-------+---------+-----+------+
> | EMPNO | ENAME | JOB     | AGE | MGR  |
> +-------+-------+---------+-----+------+
> | 7788  | SCOTT | ANALYST | 45  | 7566 |
> | 7902  | FORD  | ANALYST | 28  | 7566 |
> +-------+-------+---------+-----+------+
> {code}
> The InnoDB adapter leverages the matching key `DEPTNO_MGR_KEY` to push down
> the filtering condition `deptno = 20 and mgr = 7566`.
> In some cases, only part of the conditions can be pushed down, since there
> is a limitation in the underlying storage engine API; the remainder
> conditions are left in the rest of the plan. Given the SQL below, only
> `deptno = 20` is pushed down:
> {code:java}
> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and upsert_time > '2018-01-01 00:00:00';
> {code}
> `innodb-java-reader` only supports range queries with lower and upper bounds
> on an index, not full Index Condition Pushdown (ICP). The storage engine
> returns a range of rows, and Calcite evaluates the rest of the `WHERE`
> condition on the rows fetched.
> For the SQL below, multiple indexes satisfy the left-prefix index rule; the
> possible indexes are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` and
> `DEPTNO_MGR_KEY`. The InnoDB adapter chooses one of them according to the
> ordinal defined in the DDL; only the `deptno = 20` condition is pushed down,
> leaving the rest of the `WHERE` condition to be handled by Calcite's
> built-in execution engine.
> {code:java}
> sqlline> select empno,deptno,sal from "EMP" where deptno = 20 and sal > 2000;
> +-------+--------+---------+
> | EMPNO | DEPTNO | SAL     |
> +-------+--------+---------+
> | 7788  | 20     | 3000.00 |
> | 7902  | 20     | 3000.00 |
> | 7566  | 20     | 2975.00 |
> +-------+--------+---------+
> {code}
> Accessing rows through a secondary key requires scanning by the secondary
> index and then retrieving records back through the clustering index in
> InnoDB. For a "big" scan, that would introduce many random I/O operations,
> so performance is usually not good enough. Note that the query above can be
> more performant by using the `DEPTNO_SAL_COMM_KEY` index, because a covering
> index does not need to retrieve back through the clustering index. We can
> force the use of the `DEPTNO_SAL_COMM_KEY` index with a hint, as below:
> {code:java}
> sqlline> select empno,ename,job,age,sal from "EMP"/*+ index(DEPTNO_SAL_COMM_KEY) */ where deptno = 20 and sal > 2000;
> {code}
> Hints can be configured in `SqlToRelConverter`; to enable hints, you should
> register the `index` HintStrategy on `TableScan` in
> `SqlToRelConverter.ConfigBuilder`. An index hint takes effect on the base
> `TableScan` relational node; if there are conditions matching the index, the
> index condition can be pushed down as well. For the SQL below, although none
> of the indexes can be used directly, by leveraging a covering index the
> performance is better than a full table scan; we can force the use of
> `DEPTNO_MGR_KEY` to scan the secondary index:
> {code:java}
> sqlline> select empno,mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 7839;
> {code}
> Ordering can be pushed down if it matches the natural collation of the index
> used.
> {code:java}
> sqlline> select deptno,ename,hiredate from "EMP" where hiredate < '2020-01-01' order by hiredate desc;
> +--------+--------+------------+
> | DEPTNO | ENAME  | HIREDATE   |
> +--------+--------+------------+
> | 20     | ADAMS  | 1987-05-23 |
> | 20     | SCOTT  | 1987-04-19 |
> | 10     | MILLER | 1982-01-23 |
> | 20     | FORD   | 1981-12-03 |
> | 30     | JAMES  | 1981-12-03 |
> | 10     | KING   | 1981-11-17 |
> | 30     | MARTIN | 1981-09-28 |
> | 30     | TURNER | 1981-09-08 |
> | 10     | CLARK  | 1981-06-09 |
> | 30     | WARD   | 1981-02-22 |
> | 30     | ALLEN  | 1981-02-20 |
> | 20     | JONES  | 1981-02-04 |
> | 30     | BLAKE  | 1981-01-05 |
> | 20     | SMITH  | 1980-12-17 |
> +--------+--------+------------+
> {code}
> *Limitations*
> `innodb-java-reader` has some prerequisites for `.ibd` files; please refer to
> [Prerequisites|https://github.com/alibaba/innodb-java-reader#2-prerequisites].
> You can think of the adapter as a simple MySQL server, with the ability to
> query and dump table data by offloading from the MySQL process under some
> conditions. If pages have not been flushed from the InnoDB buffer pool to
> disk, the result may be inconsistent (the LSN in the `.ibd` file might be
> smaller than that of the in-memory pages). InnoDB leverages a write-ahead
> log for performance, so there is no command available to flush all dirty
> pages; only internal mechanisms, like the page cleaner thread and adaptive
> flushing, manage when and where to persist pages to disk.
> Currently the InnoDB adapter is not aware of the row count and cardinality
> of a `.ibd` data file, so it relies only on simple rules to perform
> optimization. Once the underlying storage engine can provide such metrics
> and metadata, they could be integrated into Calcite by leveraging cost-based
> optimization in the future.

-- This message was sent by Atlassian Jira (v8.3.4#803005)