[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 > In addition, the TableSource should know project information (including order) not just fieldIncluded. So maybe we should also adapt RowCsvInputFormat. Do you mean we need to shuffle field in row according to projection while scanning a file? Why not to map row fields after that? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox , the [`RowCsvInputFormat`](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/io/RowCsvInputFormat.scala#L100) read columns from left to right and write the value into `holders`. I mean we should have a `fieldMapping` to determine this value should be put into `holders` under which index. I think it will have a better performance than applying a map after that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 @wuchong I added noticed items --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 @wuchong could you review my changes? cc @fhueske @StephanEwen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox , the overall change looks good to me. I will do more thorough review in this weekend. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox, I'll have a look over the weekend or early next week as well. Thanks, Fabian --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user wuchong commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox , thanks for the update. I'm thinking of an idea that `StreamProjectableTableSourceScan` seems duplicate with `StreamTableSourceScan`, why not reuse the `StreamTableSourceScan` with an addition field `projects: Option[util.List[_ <: RexNode]]`. What do you think ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 Hi @wuchong , thank you for the review. Do you mean only for `Stream`? And why we shouldn't to override computeSelfCost for DataStreamRel ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox, can you please drop a comment when the PR is ready for review again. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 Hi @fhueske , I see conflicts PR check, should I rebase new commits, or merge it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Please rebase. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 @wuchong, @fhueske, question about field shuffling, should I shuffle it in RowCsvInputFormat by setting an order to scan, or let LogicalCal do it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 This is a good question @tonycox. First of all, we need to agree on the semantics of the `ProjectableTableSource.setProjection()` method. IMO, the table source must return a `DataSet` (or `DataStream`) with the fields in the order requested by `setProjection()`. If the `InputFormat` (or `SourceFunction`) is not able to control the order of fields, the table source needs to add a `MapFunction` to fix the order. This would happen in `BatchTableSource.getDataSet()` (or `StreamTableSource.getDataStream()` and hence be opaque to Calcite and not involve a `LogicalCalc`. From a performance point of view I agree with @wuchong that it would be good to have correct records produced by the `RowCsvInputFormat`. However, I would see this as a "nice to have" feature. If you do not want to include it, it would be nice if you could open a JIRA to address this issue at some later point in time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 I see. @fhueske. Set an order of scaning is much better way. I have problem when plan is choosing. Best expression is not correct. Rules work, but planner decides to choose `MapFunction` anyway, without projection --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox, the plan choice issues might be related to FLINK-5226. I have a PR pending to fix this issue: #2926. You could try to built your PR on top of my PR (or for the sake of simplicity simply copy the changes). Please let me know if that solves the problem. If not, I'll be happy to check our your branch to investigate what is going wrong. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox , i want to appologize first for opening another "project pushdown" issue when this one is still in active. We actually come up with some whole new idea and designs compare to the original design here, so we think maybe it will be easier to open another pull request and let's discuss with some codes. But i noticed that this PR is becoming more similar with our design now (only some slightly differences), even some core codes are copied from our pull request. I think we should work out a plan to merge our work together. A litter opinion here: I think it's pointless to let CsvTableSource be a ProjecableTableSource since it simply cannot avoid reading extra fields in the first place. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 Hi @fhueske, your PR do solve my problem. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 @tonycox, great. Thanks for the notice! Is the PR ready to review from your point of view or are you still working on it? @KurtYoung Thanks for reaching out to this PR. I'd like to propose the following. I'll have a look at both PRs and see how we can maybe split the effort into two commits. So, everybody gets a contribution out of this effort. Regarding making `CsvTableSource` a `ProjectableTableSource`, it is true that `CsvTableSource` will not benefit as much as other sources. Even with pushed-down projections, it will still read all data and split it into rows. However, we can save some time in field parsing (esp. tailing fields are not parsed at all) and object creation (and later garbage collection). So, making `CsvTableSource` a `ProjectableTableSource` will mainly safe some CPU time and not IO. In addition, this is currently the only `BatchTableSource` and rather easy to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 @fhueske Basically yes, but I think we need finish projection rule for Stream, after FLINK-5251 merging --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Hi @tonycox, @beyond1920, and @KurtYoung, I think I found a good way to split the issue. Let me first summarize what both of your PRs implement: @tonycox (PR #2810) - good test coverage with tests that check the optimized plan (`TableTestBase`) - adapted `RowCsvInputFormat` + `CsvTableSource` to support projection - first changes for projection pushdown in streaming - good cost function @beyond1920 (PR #2923) - Extracts fields from `RexProgram` and converts `RexProgram` (seems to reused in PR #2810) - Rules work on `DataSetRel` instead of `LogicalRel` (#2810 rules seem to be inspired by this rule) - too many ITCases - no changes for streaming - no projection support for `CsvTableSource`. From my point of view PR #2923 provides a solid basis for this feature which seem to be partially reused in PR #2810 but lacks a few things that have been implemented in PR #2810. Therefore, I would propose the following: PR #2923 is modified to cover the following: - `ProjectableTableSource` - `RexProgramProjectExtractor` + missing unit test - Rule to push projection into `BatchTableSourceScan` - Remove all ITCases except for one (the other PR will add more tests) - Adapt the cost model for the `BatchTableSourceScan` as in PR #2810. PR #2810 builds on PR#2923 and adds the following: - Translation for `StreamTableSourceScan` + one ITCase - `CsvTableSource` + `RowCsvInputFormat` changes (including tests and tests refactoring) - The extensive plan tests based on `TableTestBase` for batch and streaming If we agree on this plan, it would be good if PR #2923 would be adapted as soon as possible such that PR #2810 can be built on top. What do you think @tonycox, @beyond1920, @KurtYoung? Best, Fabian --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2810 Hi @fhueske, I think this is a fair enough approach, thanks for your effort. We will change our pr ASAP. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 Hi @fhueske and @KurtYoung I totally agree with the plan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 #2926 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 Hi @fhueske , what should I finish first, this PR or FLINK-5188 ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Thanks for asking @tonycox! FLINK-5188 is more urgent. We want to do the feature freeze by end of the week. Would be great if you could open a PR at Thursday the latest. If this is too soon, please let me know and I can work on FLINK-5188. Thanks, Fabian --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user tonycox commented on the issue: https://github.com/apache/flink/pull/2810 @fhueske I update this PR according to last changes in master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2810 Thanks for the update @tonycox. Merging this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---