[ 
https://issues.apache.org/jira/browse/CALCITE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210871#comment-17210871
 ] 

neoremind commented on CALCITE-4034:
------------------------------------

[~julianhyde] [~rubenql] Sorry about that I miss the alert of my mail, I was 
taking vacation these days.

I was trying to rebase before 1.26.0-release but end up with some failed test 
cases, glad that you split time to work on the PR, I will port to 
1.26.0-release ASAP, any problem will let you know.

> Implement a MySQL InnoDB adapter
> --------------------------------
>
>                 Key: CALCITE-4034
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4034
>             Project: Calcite
>          Issue Type: New Feature
>          Components: csv-adapter
>    Affects Versions: 1.23.0
>            Reporter: neoremind
>            Assignee: Julian Hyde
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Calcite’s InnoDB adapter allows you to query the data based on InnoDB data 
> files directy, data files are also known as .ibd files, this adapter is 
> different from JDBC adapter which requires a MySQL server to serve response. 
> With .ibd files and the corresponding DDLs, InnoDB adapter is able to work 
> like a simple "MySQL server", it accepts SQL query and attempts to compile 
> the query based on InnoDB file accessing APIs provided by 
> [innodb-java-reader|[https://github.com/alibaba/innodb-java-reader]] (I am 
> the author of the library, which is well-tested and production-ready), it 
> will exploit projecting, filtering and sorting directly in InnoDB data file 
> where possible. What’s more, with DDLs, the adapter is "index aware", it 
> leverages rules to choose the right index to scan, for example, using primary 
> key or secondary keys to look up data, then it tries to push down some 
> conditions into storage engine. Also, the adapter leaves option to provide 
> hint as well, so that user can indicate the optimizer to force use one 
> specific index.
> The InnoDB adapter can,
> 1) Full scan a table.
> 2) Scan by index such as primary key or secondary key, single column or 
> composite keys.
> 3) Push down filtering condition to storage engine, support both point query 
> and range query.
> 4) Push down projection.
> 5) Push down ordering if it matches the natural collation of an index.
> 6) Support almost all the commonly used data types.
> 7) Work as a simple MySQL server being able to read .ibd data files directly 
> through file system, this can be a tool to query or dump table data by 
> offloading from MySQL process under some conditions.
> Before I describe the adapter and its design, I was wondering if it is 
> appropriate to deliver the work by contributing the codebase back to Calcite, 
> or if it would be better to stay in another project, probably being referred 
> from Calcite adapter link. Looking forward to Calcite developer's advice.
>  
> Below is the first version of the usage.
> ----------
> [MySQL]([https://www.mysql.com/]) is the most popular Open Source SQL 
> database management system, is developed, distributed, and supported by 
> Oracle Corporation. InnoDB is a general-purpose storage engine that balances 
> high reliability and high performance in MySQL, since 5.6 InnoDB has become 
> the default MySQL storage engine.
> Calcite’s InnoDB adapter allows you to query the data based on InnoDB data 
> files directy, data files are also known as .ibd files. It leverages 
> [innodb-java-reader]([https://github.com/alibaba/innodb-java-reader]). This 
> adapter is different from JDBC adapter which maps a schema in a JDBC data 
> source and requires a MySQL server to serve response. With .ibd files and the 
> corresponding DDLs, InnoDB adapter is able to work like a simple "MySQL 
> server", it accepts SQL query and attempts to compile the query based on 
> InnoDB file accessing APIs provided by innodb-java-reader, it will exploit 
> projecting, filtering and sorting directly in InnoDB data file where possible.
> What’s more, with DDLs, the adapter is "index aware", it leverages rules to 
> choose the right index to scan, for example, using primary key or secondary 
> keys to look up data, then it tries to push down some conditions into storage 
> engine. Also, the adapter leaves option to provide hint as well, so that user 
> can indicate the optimizer to force use one specific index.
> A basic example of a model file is given below, this schema reads from a 
> MySQL "scott" database:
> {code:java}
> {
>   "version": "1.0",
>   "defaultSchema": "scott",
>   "schemas": [
>     {
>       "name": "scott",
>       "type": "custom",
>       "factory": "org.apache.calcite.adapter.innodb.InnodbSchemaFactory",
>       "operand": {
>         "sqlFilePath": [ "/path/scott.sql" ],
>         "ibdDataFileBasePath": "/usr/local/mysql/data/scott"
>       }
>     }
>   ]
> }
> {code}
> `sqlFilePath` is a list of DDL files, you can generate table definitions by 
> executing `_mysqldump -d -u<username> -p<password> -h <hostname> <dbname>_` 
> in command-line.
> The file content of `/path/scott.sql` is given below:
> {code:java}
> CREATE TABLE `DEPT`(
> `DEPTNO` TINYINT NOT NULL,
> `DNAME` VARCHAR(50) NOT NULL,
> `LOC` VARCHAR(20),
> UNIQUE KEY `DEPT_PK` (`DEPTNO`)
> )ENGINE=InnoDB DEFAULT CHARSET=latin1;
> CREATE TABLE `EMP`(
> `EMPNO` INT(11) NOT NULL,
> `ENAME` VARCHAR(100) NOT NULL,
> `JOB` VARCHAR(15) NOT NULL,
> `AGE` SMALLINT,
> `MGR` BIGINT,
> `HIREDATE` DATE,
> `SAL` DECIMAL(8,2) NOT NULL,
> `COMM` DECIMAL(6,2),
> `DEPTNO` TINYINT,
> `EMAIL` VARCHAR(100) DEFAULT NULL,
> `CREATE_DATETIME` DATETIME,
> `CREATE_TIME` TIME,
> `UPSERT_TIME` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
> CURRENT_TIMESTAMP,
> PRIMARY KEY (`EMPNO`),
> KEY `ENAME_KEY` (`ENAME`),
> KEY `HIREDATE_KEY` (`HIREDATE`),
> KEY `CREATE_DATETIME_JOB_KEY` (`CREATE_DATETIME`, `JOB`),
> KEY `CREATE_TIME_KEY` (`CREATE_TIME`),
> KEY `UPSERT_TIME_KEY` (`UPSERT_TIME`),
> KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),
> KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
> KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`),
> KEY `AGE_KEY` (`AGE`)
> )ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
> {code}
> ibdDataFileBasePath is the parent file path of .ibd files.
> Assuming the model file is stored as `model.json`, you can connect to InnoDB 
> data file to performa query via 
> [sqlline]([https://github.com/julianhyde/sqlline]) as follows:
> {code:java}
> sqlline> !connect jdbc:calcite:model=model.json admin admin{code}
> We can issue a simple query to query all employees by writing standard SQL:
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP";
> +-------+--------+-----------+-----+------+
> | EMPNO | ENAME  |    JOB    | AGE | MGR  |
> +-------+--------+-----------+-----+------+
> | 7369  | SMITH  | CLERK     | 30  | 7902 |
> | 7499  | ALLEN  | SALESMAN  | 24  | 7698 |
> | 7521  | WARD   | SALESMAN  | 41  | 7698 |
> | 7566  | JONES  | MANAGER   | 28  | 7839 |
> | 7654  | MARTIN | SALESMAN  | 27  | 7698 |
> | 7698  | BLAKE  | MANAGER   | 38  | 7839 |
> | 7782  | CLARK  | MANAGER   | 32  | 7839 |
> | 7788  | SCOTT  | ANALYST   | 45  | 7566 |
> | 7839  | KING   | PRESIDENT | 22  | null |
> | 7844  | TURNER | SALESMAN  | 54  | 7698 |
> | 7876  | ADAMS  | CLERK     | 35  | 7788 |
> | 7900  | JAMES  | CLERK     | 40  | 7698 |
> | 7902  | FORD   | ANALYST   | 28  | 7566 |
> | 7934  | MILLER | CLERK     | 32  | 7782 |
> +-------+--------+-----------+-----+------+
> {code}
> While executing this query, the InnoDB adapter scans the InnoDB data file 
> `EMP.ibd` using primary key, also known as clustering B+ tree index in MySQL, 
> and is able to 
> push down projection to underlying storage engine. Projection can reduce the 
> size of data fetched from the storage engine.
> We can look up one employee by filtering. The InnoDB adapter retrieves all 
> indexes through DDL file provided in `model.json`.
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where empno = 7782;
> +-------+-------+---------+-----+------+
> | EMPNO | ENAME |   JOB   | AGE | MGR  |
> +-------+-------+---------+-----+------+
> | 7782  | CLARK | MANAGER | 32  | 7839 |
> +-------+-------+---------+-----+------+
> {code}
> The InnoDB adapter is able to recognize that `empno` is the primary key and 
> do a point query lookup by using clustering index instead of full table scan.
> We can do range query on primary key as well.
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where empno > 7782 and 
> empno < 7900;{code}
> Note that such query with acceptable range is usually efficient in MySQL with 
> InnoDB storage engine, because for clustering B+ tree index, records close in 
> index are close in data file, which is good for scanning.
> We can look up employee by secondary key. For example, the filtering 
> condition will be on a `VARCHAR` field `ename`.
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where ename = 'smith';
> +-------+-------+-------+-----+------+
> | EMPNO | ENAME |  JOB  | AGE | MGR  |
> +-------+-------+-------+-----+------+
> | 7369  | SMITH | CLERK | 30  | 7902 |
> +-------+-------+-------+-----+------+
> {code}
> The InnoDB adapter works well on almost all the commonly used data types in 
> MySQL, for more information on supported data types, please refer to 
> [innodb-java-reader](https://github.com/alibaba/innodb-java-reader#3-features).
> We can query by composite key. For example, given secondary index of 
> `DEPTNO_MGR_KEY`.
> {code:java}
> sqlline> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and mgr 
> = 7566;
> +-------+-------+---------+-----+------+
> | EMPNO | ENAME |   JOB   | AGE | MGR  |
> +-------+-------+---------+-----+------+
> | 7788  | SCOTT | ANALYST | 45  | 7566 |
> | 7902  | FORD  | ANALYST | 28  | 7566 |
> +-------+-------+---------+-----+------+
> {code}
> The InnoDB adapter will leverage the matched key `DEPTNO_MGR_KEY` to push 
> down filtering condition of `deptno = 20 and mgr = 7566`.
> In some cases, only part of the conditions can be pushed down since there is 
> a limitation in the underlying storage engine API, leaving unpushed remainder 
> conditions in the rest of the plan. Given the below SQL, only `deptno = 20` 
> is pushed down.
> {code:java}
> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and upsert_time > 
> '2018-01-01 00:00:00';
> {code}
> `innodb-java-reader` only supports range query with lower and upper bound 
> using an index, not fully `Index Condition Pushdown (ICP)`. The storage 
> engine returns a range of rows and Calcite will evaluates the rest of `WHERE` 
> condition from the rows fetched.
> For the below SQL, there are multiple indexes satisfying the left-prefix 
> index rule, the possible indexes are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` 
> and `DEPTNO_MGR_KEY`, the Innod adapter will choose one of them according to 
> the ordinal defined in DDL, only `deptno = 20` condition is pushed down, 
> leaving the rest of `WHERE` condition handled by Calcite built-in execution 
> engine.
> {code:java}
> sqlline> select empno,deptno,sal from "EMP" where deptno = 20 and sal > 2000;
> +-------+--------+---------+
> | EMPNO | DEPTNO |   SAL   |
> +-------+--------+---------+
> | 7788  | 20     | 3000.00 |
> | 7902  | 20     | 3000.00 |
> | 7566  | 20     | 2975.00 |
> +-------+--------+---------+
> {code}
> Accessing rows through secondary key requires scanning by secondary index and 
> retrieving records back to clustering index in InnoDB, for a "big" scan, that 
> would introduce many random I/O operations, so performance is usually not 
> good enough. Note that the query above can be more performant by using 
> `EPTNO_SAL_COMM_KEY` index, because covering index does not need to retrieve 
> back to clustering index. We can force using `DEPTNO_SAL_COMM_KEY` index by 
> hint as below.
> {code:java}
> sqlline> select empno,ename,job,age,sal from "emp"/*+ 
> index(DEPTNO_SAL_COMM_KEY) */ where deptno = 20 and sal > 2000;
> {code}
> Hint can be configured in `SqlToRelConverter`, to enable hint, you should 
> register `index` HintStrategy on `TableScan` in 
> `SqlToRelConverter.ConfigBuilder`. Index hint takes effect on the base 
> `TableScan` relational node, if there are conditions matching the index, 
> index condition can be pushed down as well. For the below SQL, although none 
> of the indexes can be used, but by leveraging covering index, the performance 
> is better than full table scan, we can force to use `DEPTNO_MGR_KEY` to scan 
> in secondary index.
> {code:java}
> sqlline> select empno,mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 
> 7839;
> {code}
> Ordering can be pushed down if it matches the natural collation of the index 
> used.
> {code:java}
> sqlline> select deptno,ename,hiredate from "EMP" where hiredate < 
> '2020-01-01' order by hiredate desc;
> +--------+--------+------------+
> | DEPTNO | ENAME  |  HIREDATE  |
> +--------+--------+------------+
> | 20     | ADAMS  | 1987-05-23 |
> | 20     | SCOTT  | 1987-04-19 |
> | 10     | MILLER | 1982-01-23 |
> | 20     | FORD   | 1981-12-03 |
> | 30     | JAMES  | 1981-12-03 |
> | 10     | KING   | 1981-11-17 |
> | 30     | MARTIN | 1981-09-28 |
> | 30     | TURNER | 1981-09-08 |
> | 10     | CLARK  | 1981-06-09 |
> | 30     | WARD   | 1981-02-22 |
> | 30     | ALLEN  | 1981-02-20 |
> | 20     | JONES  | 1981-02-04 |
> | 30     | BLAKE  | 1981-01-05 |
> | 20     | SMITH  | 1980-12-17 |
> +--------+--------+------------+
> {code}
> *Limitations*
> `innodb-java-reader` has some prerequisites for `.ibd` files, please refer to 
> [Prerequisites](https://github.com/alibaba/innodb-java-reader#2-prerequisites).
> You can think of the adapter as a simple MySQL server, with the ability to 
> query, dump data by offloading from MySQL process under some conditions. If 
> pages are not flushed from InnoDB Buffer Pool to disk, then the result may be 
> inconsistent (the LSN in `.ibd` file might smaller than in-memory pages). 
> InnoDB leverages write ahead log in terms of performance, so there is no 
> command available to flush all dirty pages. Only internal mechanism manages 
> when and where to persist pages to disk, like Page Cleaner thread, adaptive 
> flushing, etc.
> Currently the InnoDB adapter does not aware row count and cardinality of a 
> `.ibd` data file, so it will only rely on simple rules to perform 
> optimization, once underlying storage engine could provide such metrics and 
> metadata, this can be integrated in Calcite by leveraging cost based 
> optimization in the future.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to