[ https://issues.apache.org/jira/browse/FLINK-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147470#comment-16147470 ]
ASF GitHub Bot commented on FLINK-6233: --------------------------------------- GitHub user xccui opened a pull request: https://github.com/apache/flink/pull/4625 [FLINK-6233] [table] Support time-bounded stream inner join in the SQL API ## What is the purpose of the change This PR aims add an implementation of the time-bounded stream inner join for both proctime and rowtime in the SQL API. For example, ``SELECT * from L, R WHERE L.pid = R.pid AND L.time between R.time + X and R.time + Y``. A design document for this problem can be found [here](http://goo.gl/VW5Gpd). ## Brief change log - I fill the missing part of the compiling stage for the rowtime stream inner join. - Some logics are added to the `WindowJoinUtil` to extract the rowtime indices. - A general `TimeBoundedStreamInnerJoin` is provided. - To test the new join function, I add a `TimeBoundedJoinExample` and some new tests to the `JoinHarnessTest`. ## Verifying this change This change added tests to the existing JoinHarnessTest. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (**no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**no**) - The serializers: (**no**) - The runtime per-record code paths (performance sensitive): (**no**) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (**no**) ## Documentation - Does this pull request introduce a new feature? (**yes**) - If yes, how is the feature documented? (**not documented yet**) You can merge this pull request into a Git repository by running: $ git pull https://github.com/xccui/flink FLINK-6233 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/4625.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4625 ---- commit c79588b134a1270956a6d32b7a0a13ff4e3f483d Author: Xingcan Cui <xingc...@gmail.com> Date: 2017-08-30T05:57:38Z [FLINK-6233] [table] Support rowtime inner equi-join between two streams in the SQL API ---- > Support rowtime inner equi-join between two streams in the SQL API > ------------------------------------------------------------------ > > Key: FLINK-6233 > URL: https://issues.apache.org/jira/browse/FLINK-6233 > Project: Flink > Issue Type: Sub-task > Components: Table API & SQL > Reporter: hongyuhong > Assignee: Xingcan Cui > > The goal of this issue is to add support for inner equi-join on proc time > streams to the SQL interface. > Queries similar to the following should be supported: > {code} > SELECT o.rowtime , o.productId, o.orderId, s.rowtime AS shipTime > FROM Orders AS o > JOIN Shipments AS s > ON o.orderId = s.orderId > AND o.rowtime BETWEEN s.rowtime AND s.rowtime + INTERVAL '1' HOUR; > {code} > The following restrictions should initially apply: > * The join hint only support inner join > * The ON clause should include equi-join condition > * The time-condition {{o.rowtime BETWEEN s.rowtime AND s.rowtime + INTERVAL > '1' HOUR}} only can use rowtime that is a system attribute, the time > condition only support bounded time range like {{o.rowtime BETWEEN s.rowtime > - INTERVAL '1' HOUR AND s.rowtime + INTERVAL '1' HOUR}}, not support > unbounded like {{o.rowtime < s.rowtime}} , and should include both two > stream's rowtime attribute, {{o.rowtime between rowtime () and rowtime () + > 1}} should also not be supported. > An row-time streams join will not be able to handle late data, because this > would mean in insert a row into a sorted order shift all other computations. > This would be too expensive to maintain. Therefore, we will throw an error if > a user tries to use an row-time stream join with late data handling. > This issue includes: > * Design of the DataStream operator to deal with stream join > * Translation from Calcite's RelNode representation (LogicalJoin). -- This message was sent by Atlassian JIRA (v6.4.14#64029)