[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-22 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
> In addition, the TableSource should know project information (including 
order) not just fieldIncluded. So maybe we should also adapt RowCsvInputFormat.

Do you mean we need to shuffle field in row according to projection while 
scanning a file? Why not to map row fields after that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-22 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox , the 
[`RowCsvInputFormat`](https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/io/RowCsvInputFormat.scala#L100)
  read columns from left to right and write the value into `holders`. I mean we 
should have a `fieldMapping` to determine this value should be put into 
`holders` under which index. 

I think it will have a better performance than applying a map after that. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-22 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
@wuchong I added noticed items


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-25 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
@wuchong could you review my changes?
cc @fhueske @StephanEwen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-25 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox , the overall change looks good to me. I will do more thorough 
review in this weekend.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-25 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox, I'll have a look over the weekend or early next week as well.
Thanks, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-28 Thread wuchong
Github user wuchong commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox , thanks for the update. I'm thinking of an idea that 
`StreamProjectableTableSourceScan` seems duplicate with 
`StreamTableSourceScan`, why not reuse the `StreamTableSourceScan` with an 
addition field  `projects: Option[util.List[_ <: RexNode]]`.

What do you think ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-11-28 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @wuchong , thank you for the review. 
Do you mean only for `Stream`?
And why we shouldn't to override computeSelfCost for DataStreamRel ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-02 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox, can you please drop a comment when the PR is ready for review 
again.
Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske , I see conflicts PR check, should I rebase new commits, or 
merge it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Please rebase. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
@wuchong, @fhueske, question about field shuffling, should I shuffle it in 
RowCsvInputFormat by setting an order to scan, or let LogicalCal do it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
This is a good question @tonycox. First of all, we need to agree on the 
semantics of the `ProjectableTableSource.setProjection()` method. IMO, the 
table source must return a `DataSet` (or `DataStream`) with the fields in the 
order requested by `setProjection()`. If the `InputFormat` (or 
`SourceFunction`) is not able to control the order of fields, the table source 
needs to add a `MapFunction` to fix the order. This would happen in 
`BatchTableSource.getDataSet()` (or `StreamTableSource.getDataStream()` and 
hence be opaque to Calcite and not involve a `LogicalCalc`.

From a performance point of view I agree with @wuchong that it would be 
good to have correct records produced by the `RowCsvInputFormat`. However, I 
would see this as a "nice to have" feature. If you do not want to include it, 
it would be nice if you could open a JIRA to address this issue at some later 
point in time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
I see. @fhueske. Set an order of scaning is much better way. 
I have problem when plan is choosing. Best expression is not correct. Rules 
work, but planner decides to choose `MapFunction` anyway, without projection


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox, the plan choice issues might be related to FLINK-5226. I have 
a PR pending to fix this issue: #2926. You could try to built your PR on top of 
my PR (or for the sake of simplicity simply copy the changes).

Please let me know if that solves the problem. If not, I'll be happy to 
check our your branch to investigate what is going wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-05 Thread KurtYoung
Github user KurtYoung commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox , i want to appologize first for opening another "project 
pushdown" issue when this one is still in active. We actually come up with some 
whole new idea and designs compare to the original design here, so we think 
maybe it will be easier to open another pull request and let's discuss with 
some codes. 

But i noticed that this PR is becoming more similar with our design now 
(only some slightly differences), even some core codes are copied from our pull 
request. I think we should work out a plan to merge our work together. 

A litter opinion here: I think it's pointless to let CsvTableSource be a 
ProjecableTableSource since it simply cannot avoid reading extra fields in the 
first place. 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-06 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske, your PR do solve my problem. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-06 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
@tonycox, great. Thanks for the notice! Is the PR ready to review from your 
point of view or are you still working on it?

@KurtYoung Thanks for reaching out to this PR. I'd like to propose the 
following. I'll have a look at both PRs and see how we can maybe split the 
effort into two commits. So, everybody gets a contribution out of this effort.

Regarding making `CsvTableSource` a `ProjectableTableSource`, it is true 
that `CsvTableSource` will not benefit as much as other sources. Even with 
pushed-down projections, it will still read all data and split it into rows. 
However, we can save some time in field parsing (esp. tailing fields are not 
parsed at all) and object creation (and later garbage collection). So, making 
`CsvTableSource` a `ProjectableTableSource` will mainly safe some CPU time and 
not IO. In addition, this is currently the only `BatchTableSource` and rather 
easy to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-06 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
@fhueske Basically yes, but I think we need finish projection rule for 
Stream, after FLINK-5251 merging


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-06 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @tonycox, @beyond1920, and @KurtYoung,

I think I found a good way to split the issue. Let me first summarize what 
both of your PRs implement:

@tonycox (PR #2810)
- good test coverage with tests that check the optimized plan 
(`TableTestBase`)
- adapted `RowCsvInputFormat` + `CsvTableSource` to support projection
- first changes for projection pushdown in streaming
- good cost function

@beyond1920 (PR #2923)
- Extracts fields from `RexProgram` and converts `RexProgram` (seems to 
reused in PR #2810)
- Rules work on `DataSetRel` instead of `LogicalRel` (#2810 rules seem to 
be inspired by this rule)
- too many ITCases
- no changes for streaming
- no projection support for `CsvTableSource`.

From my point of view PR #2923 provides a solid basis for this feature 
which seem to be partially reused in PR #2810 but lacks a few things that have 
been implemented in PR #2810. 
Therefore, I would propose the following:

PR #2923 is modified to cover the following:
- `ProjectableTableSource`
- `RexProgramProjectExtractor` + missing unit test
- Rule to push projection into `BatchTableSourceScan`
- Remove all ITCases except for one (the other PR will add more tests)
- Adapt the cost model for the `BatchTableSourceScan` as in PR #2810.

PR #2810 builds on PR#2923 and adds the following:
- Translation for `StreamTableSourceScan` + one ITCase
- `CsvTableSource` + `RowCsvInputFormat` changes (including tests and tests 
refactoring)
- The extensive plan tests based on `TableTestBase` for batch and streaming

If we agree on this plan, it would be good if PR #2923 would be adapted as 
soon as possible such that PR #2810 can be built on top.

What do you think @tonycox, @beyond1920, @KurtYoung?

Best, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-06 Thread KurtYoung
Github user KurtYoung commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske, I think this is a fair enough approach, thanks for your 
effort. We will change our pr ASAP.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-07 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske and @KurtYoung I totally agree with the plan


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-08 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
#2926 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-13 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
Hi @fhueske , what should I finish first, this PR or FLINK-5188 ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-13 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Thanks for asking @tonycox! FLINK-5188 is more urgent. We want to do the 
feature freeze by end of the week. Would be great if you could open a PR at 
Thursday the latest. If this is too soon, please let me know and I can work on 
FLINK-5188.

Thanks, Fabian


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-15 Thread tonycox
Github user tonycox commented on the issue:

https://github.com/apache/flink/pull/2810
  
@fhueske I update this PR according to last changes in master 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2810: [FLINK-3848] Add ProjectableTableSource interface and tra...

2016-12-16 Thread fhueske
Github user fhueske commented on the issue:

https://github.com/apache/flink/pull/2810
  
Thanks for the update @tonycox.
Merging this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---