[GitHub] [flink] flinkbot edited a comment on issue #10815: [FLINK-15537][table-planner-blink] Type of keys should be `BinaryRow`…

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10815: [FLINK-15537][table-planner-blink] 
Type of keys should be `BinaryRow`…
URL: https://github.com/apache/flink/pull/10815#issuecomment-572566376
 
 
   
   ## CI report:
   
   * 19a4290f709495491fe460037c8c31d106984ea8 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/143732723) Azure: 
[FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4229)
 
   * c3ef5ea345a343170806de8112163edb7df31f69 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/144110200) Azure: 
[FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4284)
 
   * 941a5d4725dee3317ca05f8ab16eb103f61d3fcb Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144255612) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4312)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-15577) WindowAggregate RelNodes missing Window specs in digest

2020-01-13 Thread Benoit Hanotte (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014856#comment-17014856
 ] 

Benoit Hanotte commented on FLINK-15577:


Hi [~ykt836], yes, I'll push a PR today

> WindowAggregate RelNodes missing Window specs in digest
> ---
>
> Key: FLINK-15577
> URL: https://issues.apache.org/jira/browse/FLINK-15577
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Legacy Planner
>Affects Versions: 1.9.1
>Reporter: Benoit Hanotte
>Priority: Critical
>
> The RelNode's digest (AbstractRelNode.getDigest()), along with its RowType, is 
> used by the Calcite HepPlanner to avoid adding duplicate Vertices to the 
> graph. If an equivalent vertex is already present in the graph, then that 
> vertex is used in place of the newly generated one: 
> https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L828
> This means that *the digest needs to contain all the information necessary to 
> identify a vertex and distinguish it from similar - but not equivalent - 
> vertices*.
> In the case of `LogicalWindowAggregation` and 
> `FlinkLogicalWindowAggregation`, the window specs are currently not in the 
> digest, meaning that two aggregations with the same signatures and 
> expressions but different windows are considered equivalent by the planner, 
> which is not correct and will lead to an invalid Physical Plan.
> For instance, the following query would give an invalid plan:
> {code}
> WITH window_1h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR)
> ),
> window_2h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR)
> )
> (SELECT * FROM window_1h)
> UNION ALL
> (SELECT * FROM window_2h)
> {code}
> The invalid plan generated by the planner is the following (*Please note that 
> the windows in the two DataStreamGroupWindowAggregate nodes are the same when 
> they should be different*):
> {code}
> DataStreamUnion(all=[true], union all=[timestamp]): rowcount = 200.0, 
> cumulative cost = {800.0 rows, 802.0 cpu, 0.0 io}, id = 176
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 173
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 172
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 175
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 174
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
> {code}
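
A minimal sketch of one possible direction for the fix (illustrative only, not 
the actual patch; `window` and `namedProperties` are assumed field names on the 
aggregate node): emit the window spec in the node's explain terms, which feed 
AbstractRelNode.getDigest():

{code:scala}
// Illustrative sketch, not the actual FLINK-15577 patch.
// Inside e.g. LogicalWindowAggregate (assumed context): emitting the window
// spec in explainTerms makes it part of the digest, so two aggregations that
// differ only in their windows are no longer considered equivalent.
import org.apache.calcite.rel.RelWriter

override def explainTerms(pw: RelWriter): RelWriter =
  super.explainTerms(pw)
    .item("window", window)
    .item("properties", namedProperties.map(_.name).mkString(", "))
{code}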



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on issue #10736: [FLINK-15010][Network] Add shutdown hook to ensure cleanup netty shuffle directories

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10736: [FLINK-15010][Network] Add shutdown 
hook to ensure cleanup netty shuffle directories
URL: https://github.com/apache/flink/pull/10736#issuecomment-570066854
 
 
   
   ## CI report:
   
   * 4b605068e32d3eb15a51f838ed29918d1224959a Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/142804653) Azure: 
[FAILURE](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4021)
 
   * 9430066683a67318f9685de8a58904972c5dbaca Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/142829633) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4026)
 
   * 2185007c824d21817356c9dfb9c9e09846e27f7e UNKNOWN
   * d7ab35b18c5964b837be1d52611623d7c271dc99 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144257501) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4316)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-15579) Can not use jdbc connector on Blink batch mode

2020-01-13 Thread Shu Li Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shu Li Zheng updated FLINK-15579:
-
Affects Version/s: 1.9.0
   1.9.1

> Can not use jdbc connector on Blink batch mode 
> ---
>
> Key: FLINK-15579
> URL: https://issues.apache.org/jira/browse/FLINK-15579
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.9.1
>Reporter: Shu Li Zheng
>Priority: Major
>
> Because JDBCTableSourceSinkFactory.createStreamTableSink() creates a 
> JDBCUpsertTableSink, but BatchExecSink cannot work with 
> UpsertStreamTableSink.
> {code:scala}
>   override protected def translateToPlanInternal(
>   planner: BatchPlanner): Transformation[Any] = {
> val resultTransformation = sink match {
>   case _: RetractStreamTableSink[T] | _: UpsertStreamTableSink[T] =>
> throw new TableException("RetractStreamTableSink and 
> UpsertStreamTableSink is not" +
>   " supported in Batch environment.")
> {code}
> DDL like:
> CREATE TABLE USER_RESULT(
> NAME VARCHAR,
> CITY VARCHAR,
> SCORE BIGINT
> ) WITH (
> 'connector.type' = 'jdbc',
> 'connector.url' = '',
> 'connector.table' = '',
> 'connector.driver' = 'com.mysql.cj.jdbc.Driver',
> 'connector.username' = 'root',
> 'connector.password' = '',
> 'connector.write.flush.interval' = '1s')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15576) remove isTemporary property from CatalogFunction API

2020-01-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014874#comment-17014874
 ] 

Jingsong Lee commented on FLINK-15576:
--

Hi [~phoenixjiangnan], it looks like this is just an internal code refactor with 
no public API change, so perhaps it should not be a blocker?

> remove isTemporary property from CatalogFunction API
> 
>
> Key: FLINK-15576
> URL: https://issues.apache.org/jira/browse/FLINK-15576
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.10.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> According to FLIP-79, CatalogFunction shouldn't have an "isTemporary" property. 
> Move it from CatalogFunction to Create/AlterCatalogFunctionOperation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] KarmaGYZ commented on issue #10668: [hotfix] Align the parameter pattern of retry_times with retry_times_…

2020-01-13 Thread GitBox
KarmaGYZ commented on issue #10668: [hotfix] Align the parameter pattern of 
retry_times with retry_times_…
URL: https://github.com/apache/flink/pull/10668#issuecomment-574025389
 
 
   Kindly ping @tillrohrmann 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-15249) Improve PipelinedRegions calculation with Union Set

2020-01-13 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014900#comment-17014900
 ] 

Zhu Zhu commented on FLINK-15249:
-

Hi [~nppoly], I tried your patch with a testing topology. The result turned out 
to be that region building is even slower with the patch.
Here's the result (the value is the region building time elapsed, in ms):

Pure master (2020-01-14):

build regions: 700
build regions: 675
build regions: 477
build regions: 447
build regions: 461
build regions: 430
build regions: 459
build regions: 482
build regions: 733
build regions: 455

Apply #10572 on master (2020-01-14):

build regions: 1298
build regions: 1384
build regions: 1029
build regions: 867
build regions: 995
build regions: 1042
build regions: 965
build regions: 1161
build regions: 1050
build regions: 1021

The testing topology is attached.
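
For reference, a minimal disjoint-set (union-find) sketch with union by size and 
path compression, the structure the patch builds on (illustrative only, not the 
code from the attached diff):

{code:scala}
// Minimal union-find: merge is amortized near-O(1) thanks to union by size
// and path compression. Illustrative only.
class UnionFind(n: Int) {
  private val parent = Array.tabulate(n)(identity)
  private val size = Array.fill(n)(1)

  def find(x: Int): Int = {
    var root = x
    while (parent(root) != root) root = parent(root)
    var cur = x
    while (parent(cur) != root) { // path compression: re-point nodes at root
      val next = parent(cur)
      parent(cur) = root
      cur = next
    }
    root
  }

  def union(a: Int, b: Int): Unit = {
    val (ra, rb) = (find(a), find(b))
    if (ra != rb) {
      if (size(ra) < size(rb)) { parent(ra) = rb; size(rb) += size(ra) }
      else { parent(rb) = ra; size(ra) += size(rb) }
    }
  }
}
{code}

Better amortized complexity does not by itself guarantee a wall-clock win; the 
numbers above suggest constant factors dominate at this scale.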

> Improve PipelinedRegions calculation with Union Set
> ---
>
> Key: FLINK-15249
> URL: https://issues.apache.org/jira/browse/FLINK-15249
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Chongchen Chen
>Priority: Major
>  Labels: pull-request-available
> Attachments: PipelinedRegionComputeUtil.diff
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The merge operation of a union set (disjoint set) costs O(1); the current 
> implementation is O(N). The attachment is a patch.
> [Disjoint Set Data 
> Structure|https://en.wikipedia.org/wiki/Disjoint-set_data_structure]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on issue #10845: [FLINK-15355][plugins] Classloader avoids loading unrelated services.

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10845: [FLINK-15355][plugins] Classloader 
avoids loading unrelated services.
URL: https://github.com/apache/flink/pull/10845#issuecomment-573736294
 
 
   
   ## CI report:
   
   * 0942cb8b913ba50ecf8d7ca28832c4c92bf78e6c Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/144178023) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4303)
 
   * ce62e6539d1394a5d27d8ef51db010104852e433 Travis: 
[PENDING](https://travis-ci.com/flink-ci/flink/builds/144207965) Azure: 
[PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4304)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-11899) Introduce vectorized parquet InputFormat for blink runtime

2020-01-13 Thread Zhenqiu Huang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014906#comment-17014906
 ] 

Zhenqiu Huang commented on FLINK-11899:
---

[~lzljs3620320] 
I want to leverage the existing 
org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader. So 
ParquetColumnarRowSplitReader will need a wrapper around the Hive column vector, 
as with ORC. I'm not sure what the efficient way is to read data directly into a 
Flink Vector.
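
A rough sketch of the wrapping idea (the Flink-side interface here is an 
assumption for illustration, not Flink's actual vector API):

{code:scala}
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector

// Illustrative only: adapt a Hive column vector to a Flink-style column
// vector interface. IntColumnVector is an assumed interface for the sketch.
trait IntColumnVector {
  def getInt(rowId: Int): Int
  def isNullAt(rowId: Int): Boolean
}

class HiveLongVectorWrapper(hive: LongColumnVector) extends IntColumnVector {
  override def getInt(rowId: Int): Int = hive.vector(rowId).toInt
  override def isNullAt(rowId: Int): Boolean = !hive.noNulls && hive.isNull(rowId)
}
{code}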

> Introduce vectorized parquet InputFormat for blink runtime
> --
>
> Key: FLINK-11899
> URL: https://issues.apache.org/jira/browse/FLINK-11899
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: Jingsong Lee
>Assignee: Zhenqiu Huang
>Priority: Major
> Fix For: 1.11.0
>
>
> VectorizedParquetInputFormat is introduced to read parquet data in batches.
> When returning each row of data, instead of actually retrieving each field, 
> we use BaseRow's abstraction to return a columnar, row-like view.
> This will greatly improve downstream filtering scenarios, since there is no 
> need to access redundant fields of filtered-out data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-15581) SpillingResettableMutableObjectIterator data overflow

2020-01-13 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-15581.
--
Resolution: Duplicate

> SpillingResettableMutableObjectIterator data overflow
> -
>
> Key: FLINK-15581
> URL: https://issues.apache.org/jira/browse/FLINK-15581
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.1, 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>
> As [reported by a user on the mailing 
> list|https://lists.apache.org/thread.html/r1e3c53eaddfd8050c94ee4e521da4fc96a119662937cf801801bde52%40%3Cuser.flink.apache.org%3E]
> {quote}
> SpillingResettableMutableObjectIterator has a data overflow problem if the 
> number of elements in a single input exceeds Integer.MAX_VALUE.
> The reason is that inside the SpillingResettableMutableObjectIterator, it 
> tracks the total number of elements and the number of elements currently read 
> with two int-typed fields (elementCount and currentElementNum), and if the 
> number of elements exceeds Integer.MAX_VALUE, they will overflow.
> If there is an overflow, then in the next iteration, after resetting the 
> input, the data will not be read or only part of the data will be read.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15581) SpillingResettableMutableObjectIterator data overflow

2020-01-13 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-15581:
--

 Summary: SpillingResettableMutableObjectIterator data overflow
 Key: FLINK-15581
 URL: https://issues.apache.org/jira/browse/FLINK-15581
 Project: Flink
  Issue Type: Bug
  Components: API / DataSet
Affects Versions: 1.9.1, 1.8.3, 1.7.2, 1.6.4, 1.10.0
Reporter: Piotr Nowojski


As [reported by a user on the mailing 
list|https://lists.apache.org/thread.html/r1e3c53eaddfd8050c94ee4e521da4fc96a119662937cf801801bde52%40%3Cuser.flink.apache.org%3E]
{quote}
SpillingResettableMutableObjectIterator has a data overflow problem if the 
number of elements in a single input exceeds Integer.MAX_VALUE.

The reason is that inside the SpillingResettableMutableObjectIterator, it tracks 
the total number of elements and the number of elements currently read with two 
int-typed fields (elementCount and currentElementNum), and if the number of 
elements exceeds Integer.MAX_VALUE, they will overflow.

If there is an overflow, then in the next iteration, after resetting the input, 
the data will not be read or only part of the data will be read.
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15549) integer overflow in SpillingResettableMutableObjectIterator

2020-01-13 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-15549:
---
Affects Version/s: 1.6.4
   1.7.2
   1.8.3
   1.9.1

> integer overflow in SpillingResettableMutableObjectIterator
> ---
>
> Key: FLINK-15549
> URL: https://issues.apache.org/jira/browse/FLINK-15549
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.1, 1.10.0
>Reporter: caojian0613
>Priority: Major
>  Labels: overflow
>
> The SpillingResettableMutableObjectIterator has a data overflow problem if 
> the number of elements in a single input exceeds Integer.MAX_VALUE.
> The reason is that inside the SpillingResettableMutableObjectIterator, it 
> tracks the total number of elements and the number of elements currently read 
> with two int-typed fields (elementCount and currentElementNum), and if the 
> number of elements exceeds Integer.MAX_VALUE, they will overflow.
> If there is an overflow, then in the next iteration, after resetting the 
> input, the data will not be read or only part of the data will be read.
> Therefore, we should change the type of these two fields of the 
> SpillingResettableIterator* classes from int to long, and we also need a 
> pre-check mechanism before such a numerical overflow occurs.
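
A sketch of the proposed change (illustrative; the real class is Java and this 
is not an actual patch):

{code:scala}
// Sketch: widen the two counters from int to long so inputs with more than
// Integer.MAX_VALUE elements no longer overflow. Field names follow the issue
// text; the surrounding class is simplified.
class SpillingCounters {
  private var elementCount: Long = 0L      // was: int
  private var currentElementNum: Long = 0L // was: int

  def elementWritten(): Unit = elementCount += 1
  def elementRead(): Unit = currentElementNum += 1

  def reset(): Unit = currentElementNum = 0L

  // after reset(), iteration continues while reads lag behind writes
  def hasNext: Boolean = currentElementNum < elementCount
}
{code}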



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15549) integer overflow in SpillingResettableMutableObjectIterator

2020-01-13 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-15549:
---
Component/s: (was: Runtime / Task)
 API / DataSet

> integer overflow in SpillingResettableMutableObjectIterator
> ---
>
> Key: FLINK-15549
> URL: https://issues.apache.org/jira/browse/FLINK-15549
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.1, 1.10.0
>Reporter: caojian0613
>Priority: Major
>  Labels: overflow
>
> The SpillingResettableMutableObjectIterator has a data overflow problem if 
> the number of elements in a single input exceeds Integer.MAX_VALUE.
> The reason is that inside the SpillingResettableMutableObjectIterator, it 
> tracks the total number of elements and the number of elements currently read 
> with two int-typed fields (elementCount and currentElementNum), and if the 
> number of elements exceeds Integer.MAX_VALUE, they will overflow.
> If there is an overflow, then in the next iteration, after resetting the 
> input, the data will not be read or only part of the data will be read.
> Therefore, we should change the type of these two fields of the 
> SpillingResettableIterator* classes from int to long, and we also need a 
> pre-check mechanism before such a numerical overflow occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15580) Add baseurl to docs/build_docs.sh

2020-01-13 Thread Benchao Li (Jira)
Benchao Li created FLINK-15580:
--

 Summary: Add baseurl to docs/build_docs.sh
 Key: FLINK-15580
 URL: https://issues.apache.org/jira/browse/FLINK-15580
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.9.1, 1.10.0
Reporter: Benchao Li


As discussed in https://issues.apache.org/jira/browse/FLINK-15559

Currently we do not set {{baseurl}}, which makes broken links caused by a 
missing {{baseurl}} hard to detect.

So I propose we add a {{baseurl}} to {{build_docs.sh}}'s -i & -p modes, for 
example {{/projects/flink/local-baseurl}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] Li-Aihua commented on issue #10378: [FLINK-14919] [flink-end-to-end-perf-tests] Add performance e2e test suite for basic operations

2020-01-13 Thread GitBox
Li-Aihua commented on issue #10378: [FLINK-14919] [flink-end-to-end-perf-tests] 
Add performance e2e test suite for basic operations
URL: https://github.com/apache/flink/pull/10378#issuecomment-574011447
 
 
   > I'm not convinced that the `PerformanceTestJobBase` is the right approach. 
Could you expand a bit on the design and why you selected this one?
   > 
   > I'm wondering what the point of all these setters are; do we expect some 
subclasses to override them?
   > Why are some methods returning values despite them not being used?
   > Why are _all_ methods public despite not being used?
   
   There were some problems with the previous implementation; I have updated it 
again. Could you review it again? Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] flinkbot edited a comment on issue #10378: [FLINK-14919] [flink-end-to-end-perf-tests] Add performance e2e test suite for basic operations

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10378: [FLINK-14919] 
[flink-end-to-end-perf-tests] Add performance e2e test suite for basic 
operations
URL: https://github.com/apache/flink/pull/10378#issuecomment-560291002
 
 
   
   ## CI report:
   
   * 5a9b8cc32f275d416e8358ee148ad4daff2626ba Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138907609) 
   * cc2ef5f9dfc57a895046c3f478f22cb508f2a716 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138941833) 
   * 5b46ddf6c512ca856e30a4aae56e0be771b94393 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138957594) 
   * abbb96dbf131d246b9697590824916a4590f4bf3 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139076775) 
   * 496f4834a8345968ba0f988cd45af065b5f73db3 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139083484) 
   * b5e3b371ac86fa7dc6e24010f060d6ee8395f991 Travis: 
[CANCELED](https://travis-ci.com/flink-ci/flink/builds/139790553) 
   * 286491ed4bb010c501cea2e5084d348290bf1357 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139795221) 
   * b8b10dbdfa1f15379ba7dec6ee7a99d8eb73c3c5 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139865418) 
   * 0c419a072b4ae3448a5f21aa8622d0354e2f6193 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/141182278) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3629)
 
   * 63a748011eeabfedcc4a52d0264a4780fb63ad3b Travis: 
[PENDING](https://travis-ci.com/flink-ci/flink/builds/144267619) Azure: 
[PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4317)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-14163) Execution#producedPartitions is possibly not assigned when used

2020-01-13 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014886#comment-17014886
 ] 

Yuan Mei commented on FLINK-14163:
--

 

Thanks [~azagrebin] and [~zhuzh]! I do not have a strong opinion on whether to 
add a time-out or not. If I have to choose, I would probably prefer going with a 
simple check, because this makes system behavior consistent and easy to reason 
about.
 

I double checked with [~zjwang] before he left (he is on vacation right now), 
and he is fine with either case as well.
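
For illustration, a sketch of what such a simple check could look like (not the 
actual fix; names are assumptions):

{code:scala}
import java.util.concurrent.CompletableFuture

// Illustrative sketch of the "simple check": fail fast if the shuffle
// master's async registration has not completed when the result is needed.
def registeredPartitionsOrFail[T](registration: CompletableFuture[T]): T = {
  if (!registration.isDone) {
    throw new IllegalStateException(
      "Produced partitions accessed before " +
        "ShuffleMaster#registerPartitionWithProducer completed.")
  }
  registration.join() // already completed, so this does not block
}
{code}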

> Execution#producedPartitions is possibly not assigned when used
> ---
>
> Key: FLINK-14163
> URL: https://issues.apache.org/jira/browse/FLINK-14163
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0, 1.10.0
>Reporter: Zhu Zhu
>Assignee: Yuan Mei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.10.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently {{Execution#producedPartitions}} is assigned after the partitions 
> have completed their registration with the shuffle master in 
> {{Execution#registerProducedPartitions(...)}}.
> The partition registration is an async interface 
> ({{ShuffleMaster#registerPartitionWithProducer(...)}}), so 
> {{Execution#producedPartitions}} is possibly[1] not set when used. 
> Usages include:
> 1. deploying this task, so that the task may be deployed without its result 
> partitions assigned, and the job would hang. (DefaultScheduler issue only, 
> since the legacy scheduler handled this case)
> 2. generating input descriptors for downstream tasks: 
> 3. retrieving {{ResultPartitionID}} for partition releasing: 
> [1] If a user uses Flink's default shuffle master {{NettyShuffleMaster}}, it 
> is not problematic at the moment, since it returns a completed future on 
> registration, so the process is synchronous. However, if users implement 
> their own shuffle service in which 
> {{ShuffleMaster#registerPartitionWithProducer}} returns a pending future, it 
> can be a problem. This is possible since customizable shuffle services have 
> been open to users since 1.9 (via the config "shuffle-service-factory.class").
> To avoid such issues, we may either 
> 1. fix all the usages of {{Execution#producedPartitions}} regarding the async 
> assignment, or 
> 2. change {{ShuffleMaster#registerPartitionWithProducer(...)}} to a sync 
> interface



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] AHeise commented on issue #10845: [FLINK-15355][plugins] Classloader avoids loading unrelated services.

2020-01-13 Thread GitBox
AHeise commented on issue #10845: [FLINK-15355][plugins] Classloader avoids 
loading unrelated services.
URL: https://github.com/apache/flink/pull/10845#issuecomment-574032048
 
 
   @flinkbot run travis


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] AHeise commented on issue #10845: [FLINK-15355][plugins] Classloader avoids loading unrelated services.

2020-01-13 Thread GitBox
AHeise commented on issue #10845: [FLINK-15355][plugins] Classloader avoids 
loading unrelated services.
URL: https://github.com/apache/flink/pull/10845#issuecomment-574032079
 
 
   @flinkbot run azure


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (FLINK-15249) Improve PipelinedRegions calculation with Union Set

2020-01-13 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-15249:

Attachment: RegionFailoverPerfTest.java

> Improve PipelinedRegions calculation with Union Set
> ---
>
> Key: FLINK-15249
> URL: https://issues.apache.org/jira/browse/FLINK-15249
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Reporter: Chongchen Chen
>Priority: Major
>  Labels: pull-request-available
> Attachments: PipelinedRegionComputeUtil.diff, 
> RegionFailoverPerfTest.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The merge operation of a union set (disjoint set) costs O(1); the current 
> implementation is O(N). The attachment is a patch.
> [Disjoint Set Data 
> Structure|https://en.wikipedia.org/wiki/Disjoint-set_data_structure]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15577) WindowAggregate RelNodes missing Window specs in digest

2020-01-13 Thread Benoit Hanotte (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014902#comment-17014902
 ] 

Benoit Hanotte commented on FLINK-15577:


[~ykt836] yes, I'll have a look at both

> WindowAggregate RelNodes missing Window specs in digest
> ---
>
> Key: FLINK-15577
> URL: https://issues.apache.org/jira/browse/FLINK-15577
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Legacy Planner
>Affects Versions: 1.9.1
>Reporter: Benoit Hanotte
>Assignee: Benoit Hanotte
>Priority: Critical
>
> The RelNode's digest (AbstractRelNode.getDigest()), along with its RowType, 
> is used by the Calcite HepPlanner to avoid adding duplicate Vertices to the 
> graph. If an equivalent vertex is already present in the graph, then that 
> vertex is used in place of the newly generated one: 
> https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L828
> This means that *the digest needs to contain all the information necessary to 
> identify a vertex and distinguish it from similar - but not equivalent - 
> vertices*.
> In the case of `LogicalWindowAggregation` and 
> `FlinkLogicalWindowAggregation`, the window specs are currently not in the 
> digest, meaning that two aggregations with the same signatures and 
> expressions but different windows are considered equivalent by the planner, 
> which is not correct and will lead to an invalid Physical Plan.
> For instance, the following query would give an invalid plan:
> {code}
> WITH window_1h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR)
> ),
> window_2h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR)
> )
> (SELECT * FROM window_1h)
> UNION ALL
> (SELECT * FROM window_2h)
> {code}
> The invalid plan generated by the planner is the following (*Please note that 
> the windows in the two DataStreamGroupWindowAggregate nodes are the same when 
> they should be different*):
> {code}
> DataStreamUnion(all=[true], union all=[timestamp]): rowcount = 200.0, 
> cumulative cost = {800.0 rows, 802.0 cpu, 0.0 io}, id = 176
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 173
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 172
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 175
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 174
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on issue #10849: [hotfix] [javadocs] fix typo in KafkaTopicPartitionStateWithPunctuatedWatermarks class introduction

2020-01-13 Thread GitBox
flinkbot commented on issue #10849: [hotfix] [javadocs] fix typo in 
KafkaTopicPartitionStateWithPunctuatedWatermarks class introduction
URL: https://github.com/apache/flink/pull/10849#issuecomment-574047859
 
 
   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit c780c93aad6554dab17acb7110246b957d6daf77 (Tue Jan 14 
07:48:09 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-11899) Introduce vectorized parquet InputFormat for blink runtime

2020-01-13 Thread Jingsong Lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014909#comment-17014909
 ] 

Jingsong Lee commented on FLINK-11899:
--

Hi [~hpeter], another way is rewriting the readers instead of reusing the Hive 
readers, for example:

[https://github.com/flink-tpc-ds/flink/tree/tpcds-master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/parquet]

I think that is the better way; otherwise we will have to deal with multiple 
Hive versions again, which is annoying.

> Introduce vectorized parquet InputFormat for blink runtime
> --
>
> Key: FLINK-11899
> URL: https://issues.apache.org/jira/browse/FLINK-11899
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Reporter: Jingsong Lee
>Assignee: Zhenqiu Huang
>Priority: Major
> Fix For: 1.11.0
>
>
> VectorizedParquetInputFormat is introduced to read parquet data in batches.
> When returning each row of data, instead of actually retrieving each field, 
> we use BaseRow's abstraction to return a columnar, row-like view.
> This will greatly improve downstream filtering scenarios, since there is no 
> need to access redundant fields of filtered-out data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-15549) integer overflow in SpillingResettableMutableObjectIterator

2020-01-13 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski reassigned FLINK-15549:
--

Assignee: caojian0613

> integer overflow in SpillingResettableMutableObjectIterator
> ---
>
> Key: FLINK-15549
> URL: https://issues.apache.org/jira/browse/FLINK-15549
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.1, 1.10.0
>Reporter: caojian0613
>Assignee: caojian0613
>Priority: Major
>  Labels: overflow
>
> The SpillingResettableMutableObjectIterator has a data overflow problem if 
> the number of elements in a single input exceeds Integer.MAX_VALUE.
> The reason is that inside the SpillingResettableMutableObjectIterator, it 
> tracks the total number of elements and the number of elements currently read 
> with two int-typed fields (elementCount and currentElementNum), and if the 
> number of elements exceeds Integer.MAX_VALUE, they will overflow.
> If there is an overflow, then in the next iteration, after resetting the 
> input, the data will not be read or only part of the data will be read.
> Therefore, we should change the type of these two fields of the 
> SpillingResettableIterator* classes from int to long, and we also need a 
> pre-check mechanism before such a numerical overflow occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15581) SpillingResettableMutableObjectIterator data overflow

2020-01-13 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014911#comment-17014911
 ] 

Benchao Li commented on FLINK-15581:


Hi [~pnowojski], I noticed that [~caojian0613] has already created an issue to 
track this: https://issues.apache.org/jira/browse/FLINK-15549

> SpillingResettableMutableObjectIterator data overflow
> -
>
> Key: FLINK-15581
> URL: https://issues.apache.org/jira/browse/FLINK-15581
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataSet
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.1, 1.10.0
>Reporter: Piotr Nowojski
>Priority: Minor
>
> As [reported by a user on the mailing 
> list|https://lists.apache.org/thread.html/r1e3c53eaddfd8050c94ee4e521da4fc96a119662937cf801801bde52%40%3Cuser.flink.apache.org%3E]
> {quote}
> SpillingResettableMutableObjectIterator has a data overflow problem if the 
> number of elements in a single input exceeds Integer.MAX_VALUE.
> The reason is that inside the SpillingResettableMutableObjectIterator, it 
> tracks the total number of elements and the number of elements currently read 
> with two int-typed fields (elementCount and currentElementNum), and if the 
> number of elements exceeds Integer.MAX_VALUE, they will overflow.
> If there is an overflow, then in the next iteration, after resetting the 
> input, the data will not be read or only part of the data will be read.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on issue #10378: [FLINK-14919] [flink-end-to-end-perf-tests] Add performance e2e test suite for basic operations

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10378: [FLINK-14919] 
[flink-end-to-end-perf-tests] Add performance e2e test suite for basic 
operations
URL: https://github.com/apache/flink/pull/10378#issuecomment-560291002
 
 
   
   ## CI report:
   
   * 5a9b8cc32f275d416e8358ee148ad4daff2626ba Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138907609) 
   * cc2ef5f9dfc57a895046c3f478f22cb508f2a716 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138941833) 
   * 5b46ddf6c512ca856e30a4aae56e0be771b94393 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138957594) 
   * abbb96dbf131d246b9697590824916a4590f4bf3 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139076775) 
   * 496f4834a8345968ba0f988cd45af065b5f73db3 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139083484) 
   * b5e3b371ac86fa7dc6e24010f060d6ee8395f991 Travis: 
[CANCELED](https://travis-ci.com/flink-ci/flink/builds/139790553) 
   * 286491ed4bb010c501cea2e5084d348290bf1357 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139795221) 
   * b8b10dbdfa1f15379ba7dec6ee7a99d8eb73c3c5 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139865418) 
   * 0c419a072b4ae3448a5f21aa8622d0354e2f6193 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/141182278) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3629)
 
   * 63a748011eeabfedcc4a52d0264a4780fb63ad3b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-15579) Can not use jdbc connector on Blink batch mode

2020-01-13 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014875#comment-17014875
 ] 

Kurt Young commented on FLINK-15579:


So you want to use JDBC as a batch sink?

> Can not use jdbc connector on Blink batch mode 
> ---
>
> Key: FLINK-15579
> URL: https://issues.apache.org/jira/browse/FLINK-15579
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.9.1
>Reporter: Shu Li Zheng
>Priority: Major
>
> Because JDBCTableSourceSinkFactory.createStreamTableSink() creates a 
> JDBCUpsertTableSink, but BatchExecSink cannot work with 
> UpsertStreamTableSink.
> {code:scala}
>   override protected def translateToPlanInternal(
>   planner: BatchPlanner): Transformation[Any] = {
> val resultTransformation = sink match {
>   case _: RetractStreamTableSink[T] | _: UpsertStreamTableSink[T] =>
> throw new TableException("RetractStreamTableSink and 
> UpsertStreamTableSink is not" +
>   " supported in Batch environment.")
> {code}
> DDL like:
> CREATE TABLE USER_RESULT(
> NAME VARCHAR,
> CITY VARCHAR,
> SCORE BIGINT
> ) WITH (
> 'connector.type' = 'jdbc',
> 'connector.url' = '',
> 'connector.table' = '',
> 'connector.driver' = 'com.mysql.cj.jdbc.Driver',
> 'connector.username' = 'root',
> 'connector.password' = '',
> 'connector.write.flush.interval' = '1s')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-15579) Can not use jdbc connector on Blink batch mode

2020-01-13 Thread Kurt Young (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young updated FLINK-15579:
---
Issue Type: Improvement  (was: Bug)

> Can not use jdbc connector on Blink batch mode 
> ---
>
> Key: FLINK-15579
> URL: https://issues.apache.org/jira/browse/FLINK-15579
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.9.1
>Reporter: Shu Li Zheng
>Priority: Major
>
> Because JDBCTableSourceSinkFactory.createStreamTableSink() creates a 
> JDBCUpsertTableSink, but BatchExecSink cannot work with 
> UpsertStreamTableSink.
> {code:scala}
>   override protected def translateToPlanInternal(
>   planner: BatchPlanner): Transformation[Any] = {
> val resultTransformation = sink match {
>   case _: RetractStreamTableSink[T] | _: UpsertStreamTableSink[T] =>
> throw new TableException("RetractStreamTableSink and 
> UpsertStreamTableSink is not" +
>   " supported in Batch environment.")
> {code}
> DDL like:
> CREATE TABLE USER_RESULT(
> NAME VARCHAR,
> CITY VARCHAR,
> SCORE BIGINT
> ) WITH (
> 'connector.type' = 'jdbc',
> 'connector.url' = '',
> 'connector.table' = '',
> 'connector.driver' = 'com.mysql.cj.jdbc.Driver',
> 'connector.username' = 'root',
> 'connector.password' = '',
> 'connector.write.flush.interval' = '1s')



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-15577) WindowAggregate RelNodes missing Window specs in digest

2020-01-13 Thread Kurt Young (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Young reassigned FLINK-15577:
--

Assignee: Benoit Hanotte

> WindowAggregate RelNodes missing Window specs in digest
> ---
>
> Key: FLINK-15577
> URL: https://issues.apache.org/jira/browse/FLINK-15577
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Legacy Planner
>Affects Versions: 1.9.1
>Reporter: Benoit Hanotte
>Assignee: Benoit Hanotte
>Priority: Critical
>
> The RelNode's digest (AbstractRelNode.getDigest()), along with its RowType, 
> is used by the Calcite HepPlanner to avoid adding duplicate Vertices to the 
> graph. If an equivalent vertex is already present in the graph, then that 
> vertex is used in place of the newly generated one: 
> https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L828
> This means that *the digest needs to contain all the information necessary to 
> identify a vertex and distinguish it from similar - but not equivalent - 
> vertices*.
> In the case of `LogicalWindowAggregation` and 
> `FlinkLogicalWindowAggregation`, the window specs are currently not in the 
> digest, meaning that two aggregations with the same signatures and 
> expressions but different windows are considered equivalent by the planner, 
> which is not correct and will lead to an invalid Physical Plan.
> For instance, the following query would give an invalid plan:
> {code}
> WITH window_1h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR)
> ),
> window_2h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR)
> )
> (SELECT * FROM window_1h)
> UNION ALL
> (SELECT * FROM window_2h)
> {code}
> The invalid plan generated by the planner is the following (*Please note that 
> the windows in the two DataStreamGroupWindowAggregate nodes are the same when 
> they should be different*):
> {code}
> DataStreamUnion(all=[true], union all=[timestamp]): rowcount = 200.0, 
> cumulative cost = {800.0 rows, 802.0 cpu, 0.0 io}, id = 176
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 173
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 172
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 175
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 174
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-15577) WindowAggregate RelNodes missing Window specs in digest

2020-01-13 Thread Kurt Young (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014876#comment-17014876
 ] 

Kurt Young commented on FLINK-15577:


[~b.hanotte] Thanks for the fix. It looks like the blink planner also has this 
issue; could you fix both?

> WindowAggregate RelNodes missing Window specs in digest
> ---
>
> Key: FLINK-15577
> URL: https://issues.apache.org/jira/browse/FLINK-15577
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Legacy Planner
>Affects Versions: 1.9.1
>Reporter: Benoit Hanotte
>Priority: Critical
>
> The RelNode's digest (AbstractRelNode.getDigest()), along with its RowType, 
> is used by the Calcite HepPlanner to avoid adding duplicate Vertices to the 
> graph. If an equivalent vertex is already present in the graph, then that 
> vertex is used in place of the newly generated one: 
> https://github.com/apache/calcite/blob/branch-1.21/core/src/main/java/org/apache/calcite/plan/hep/HepPlanner.java#L828
> This means that *the digest needs to contain all the information necessary to 
> identify a vertex and distinguish it from similar - but not equivalent - 
> vertices*.
> In the case of `LogicalWindowAggregation` and 
> `FlinkLogicalWindowAggregation`, the window specs are currently not in the 
> digest, meaning that two aggregations with the same signatures and 
> expressions but different windows are considered equivalent by the planner, 
> which is not correct and will lead to an invalid Physical Plan.
> For instance, the following query would give an invalid plan:
> {code}
> WITH window_1h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '1' HOUR)
> ),
> window_2h AS (
> SELECT HOP_ROWTIME(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR) as 
> `timestamp`
> FROM my_table
> GROUP BY HOP(`timestamp`, INTERVAL '1' HOUR, INTERVAL '2' HOUR)
> )
> (SELECT * FROM window_1h)
> UNION ALL
> (SELECT * FROM window_2h)
> {code}
> The invalid plan generated by the planner is the following (*Please note that 
> the windows in the two DataStreamGroupWindowAggregate nodes are the same when 
> they should be different*):
> {code}
> DataStreamUnion(all=[true], union all=[timestamp]): rowcount = 200.0, 
> cumulative cost = {800.0 rows, 802.0 cpu, 0.0 io}, id = 176
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 173
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 172
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
>   DataStreamCalc(select=[w$rowtime AS timestamp]): rowcount = 100.0, 
> cumulative cost = {300.0 rows, 301.0 cpu, 0.0 io}, id = 175
> DataStreamGroupWindowAggregate(window=[SlidingGroupWindow('w$, 
> 'timestamp, 720.millis, 360.millis)], select=[start('w$) AS w$start, 
> end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]): 
> rowcount = 100.0, cumulative cost = {200.0 rows, 201.0 cpu, 0.0 io}, id = 174
>   DataStreamScan(id=[1], fields=[timestamp]): rowcount = 100.0, 
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 171
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on issue #10378: [FLINK-14919] [flink-end-to-end-perf-tests] Add performance e2e test suite for basic operations

2020-01-13 Thread GitBox
flinkbot edited a comment on issue #10378: [FLINK-14919] 
[flink-end-to-end-perf-tests] Add performance e2e test suite for basic 
operations
URL: https://github.com/apache/flink/pull/10378#issuecomment-560291002
 
 
   
   ## CI report:
   
   * 5a9b8cc32f275d416e8358ee148ad4daff2626ba Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138907609) 
   * cc2ef5f9dfc57a895046c3f478f22cb508f2a716 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138941833) 
   * 5b46ddf6c512ca856e30a4aae56e0be771b94393 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/138957594) 
   * abbb96dbf131d246b9697590824916a4590f4bf3 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139076775) 
   * 496f4834a8345968ba0f988cd45af065b5f73db3 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139083484) 
   * b5e3b371ac86fa7dc6e24010f060d6ee8395f991 Travis: 
[CANCELED](https://travis-ci.com/flink-ci/flink/builds/139790553) 
   * 286491ed4bb010c501cea2e5084d348290bf1357 Travis: 
[FAILURE](https://travis-ci.com/flink-ci/flink/builds/139795221) 
   * b8b10dbdfa1f15379ba7dec6ee7a99d8eb73c3c5 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/139865418) 
   * 0c419a072b4ae3448a5f21aa8622d0354e2f6193 Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/141182278) Azure: 
[SUCCESS](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=3629)
 
   * 63a748011eeabfedcc4a52d0264a4780fb63ad3b Travis: 
[SUCCESS](https://travis-ci.com/flink-ci/flink/builds/144267619) Azure: 
[PENDING](https://dev.azure.com/rmetzger/5bd3ef0a-4359-41af-abca-811b04098d2e/_build/results?buildId=4317)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (FLINK-15552) SQL Client can not correctly create kafka table using --library to indicate a kafka connector directory

2020-01-13 Thread Terry Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014903#comment-17014903
 ] 

Terry Wang commented on FLINK-15552:


Hi, [~jark]. I'm sure about it. As for the SQL CLI e2e tests, maybe there is a 
kafka jar under the /lib directory.

> SQL Client can not correctly create kafka table using --library to indicate a 
> kafka connector directory
> ---
>
> Key: FLINK-15552
> URL: https://issues.apache.org/jira/browse/FLINK-15552
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Client, Table SQL / Runtime
>Reporter: Terry Wang
>Priority: Major
>
> How to Reproduce:
> First, I start a SQL client using `-l` to point to a kafka connector 
> directory.
> `
>  bin/sql-client.sh embedded -l /xx/connectors/kafka/
> `
> Then, I create a Kafka table like the following: 
> `
> Flink SQL> CREATE TABLE MyUserTable (
> >   content String
> > ) WITH (
> >   'connector.type' = 'kafka',
> >   'connector.version' = 'universal',
> >   'connector.topic' = 'test',
> >   'connector.properties.zookeeper.connect' = 'localhost:2181',
> >   'connector.properties.bootstrap.servers' = 'localhost:9092',
> >   'connector.properties.group.id' = 'testGroup',
> >   'connector.startup-mode' = 'earliest-offset',
> >   'format.type' = 'csv'
> >  );
> [INFO] Table has been created.
> `
> Then I select from the just-created table and an exception is thrown: 
> `
> Flink SQL> select * from MyUserTable;
> [ERROR] Could not execute SQL statement. Reason:
> org.apache.flink.table.api.NoMatchingTableFactoryException: Could not find a 
> suitable table factory for 
> 'org.apache.flink.table.factories.TableSourceFactory' in
> the classpath.
> Reason: Required context properties mismatch.
> The matching candidates:
> org.apache.flink.table.sources.CsvBatchTableSourceFactory
> Mismatched properties:
> 'connector.type' expects 'filesystem', but is 'kafka'
> The following properties are requested:
> connector.properties.bootstrap.servers=localhost:9092
> connector.properties.group.id=testGroup
> connector.properties.zookeeper.connect=localhost:2181
> connector.startup-mode=earliest-offset
> connector.topic=test
> connector.type=kafka
> connector.version=universal
> format.type=csv
> schema.0.data-type=VARCHAR(2147483647)
> schema.0.name=content
> The following factories have been considered:
> org.apache.flink.table.sources.CsvBatchTableSourceFactory
> org.apache.flink.table.sources.CsvAppendTableSourceFactory
> `
> Potential Reasons:
> Now we use `TableFactoryUtil#findAndCreateTableSource` to convert a 
> CatalogTable to a TableSource, but when calling `TableFactoryService.find` we 
> don't pass the current classLoader to this method, so the default loader will 
> be the bootstrap ClassLoader, which cannot find our factory.
> I verified on my machine that it is indeed caused by this behavior.
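
A sketch of the proposed direction (the wiring is illustrative, not the actual 
patch; it assumes the ClassLoader-aware overload of `TableFactoryService.find`):

{code:scala}
import org.apache.flink.table.factories.{TableFactoryService, TableSourceFactory}

// Illustrative sketch: pass the user classloader (which sees jars added via
// -l/--library) so the factory lookup can discover the Kafka table factory.
def findTableSourceFactory(
    properties: java.util.Map[String, String],
    userClassLoader: ClassLoader): TableSourceFactory[_] =
  TableFactoryService.find(classOf[TableSourceFactory[_]], properties, userClassLoader)
{code}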



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dawidwys commented on issue #10674: [FLINK-15220][Connector/Kafka][Table] Add startFromTimestamp in KafkaTableSource

2020-01-13 Thread GitBox
dawidwys commented on issue #10674: [FLINK-15220][Connector/Kafka][Table] Add 
startFromTimestamp in KafkaTableSource
URL: https://github.com/apache/flink/pull/10674#issuecomment-574046614
 
 
   Thank you for the update @link3280. Will merge it later today.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [flink] guliziduo opened a new pull request #10849: [hotfix] [javadocs] fix typo in KafkaTopicPartitionStateWithPunctuatedWatermarks class introduction

2020-01-13 Thread GitBox
guliziduo opened a new pull request #10849: [hotfix] [javadocs] fix typo in 
KafkaTopicPartitionStateWithPunctuatedWatermarks class introduction
URL: https://github.com/apache/flink/pull/10849
 
 
   ## What is the purpose of the change
  Fix a typo in the KafkaTopicPartitionStateWithPunctuatedWatermarks class 
introduction.
   
   ## Brief change log
 - Fix a typo in the KafkaTopicPartitionStateWithPunctuatedWatermarks class 
introduction
   
   ## Verifying this change
 This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   - Dependencies (does it add or upgrade a dependency): (no)
   - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
   - The serializers: (no)
   - The runtime per-record code paths (performance sensitive): (no)
   - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
   - The S3 file system connector: (no)
   
   ## Documentation
 - Does this pull request introduce a new feature? (no)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

