[jira] [Created] (FLINK-17995) Add a new overview page for the new SQL connectors

2020-05-28 Thread Jark Wu (Jira)
Jark Wu created FLINK-17995:
---

 Summary: Add a new overview page for the new SQL connectors
 Key: FLINK-17995
 URL: https://issues.apache.org/jira/browse/FLINK-17995
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation, Table SQL / API
Reporter: Jark Wu
Assignee: Jark Wu
 Fix For: 1.11.0


A lot of the content in 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connect.html#overview
 is outdated, and there are many points of friction in the Descriptor API and 
YAML file. I propose to remove them from the new overview page; we should 
encourage users to use DDL for now. We can add them back once the Descriptor 
API and YAML API are ready again.
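
For reference, a minimal sketch (in Java, against the Table API) of the DDL 
style the new overview page would promote. The connector and its options are 
illustrative only, not content from this issue:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().build());
        // Register a table with DDL instead of the Descriptor API / YAML file.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount DOUBLE" +
                ") WITH (" +
                "  'connector' = 'datagen'" + // illustrative connector choice
                ")");
    }
}
{code}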



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)
godfrey he created FLINK-17996:
--

 Summary: NPE in 
CatalogTableStatisticsConverter.convertToColumnStats method
 Key: FLINK-17996
 URL: https://issues.apache.org/jira/browse/FLINK-17996
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.11.0
Reporter: godfrey he


Currently, the Hive catalog only supports a few kinds of statistics and 
returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there is 
a decimal statistic, an NPE will occur in the 
CatalogTableStatisticsConverter.convertToColumnStats method:

Caused by: java.lang.NullPointerException
	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
	at java.util.Optional.map(Optional.java:215)
	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
	at org.apache.calcite.
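
A minimal sketch of the kind of null guard that would avoid the NPE. The 
generic shape below is illustrative, not the actual patch:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative null guard: skip statistics the catalog returned as null. */
public final class NullSafeStatsConversion {
    public static <S, T> Map<String, T> convertNonNull(
            Map<String, S> perColumnStats, Function<S, T> converter) {
        Map<String, T> result = new HashMap<>();
        for (Map.Entry<String, S> e : perColumnStats.entrySet()) {
            // HiveStatsUtil#createTableColumnStats returns null for
            // unsupported statistics kinds (e.g. decimal); skip those.
            if (e.getValue() != null) {
                result.put(e.getKey(), converter.apply(e.getValue()));
            }
        }
        return result;
    }
}
{code}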



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-17996:
---
Affects Version/s: (was: 1.11.0)

> NPE in CatalogTableStatisticsConverter.convertToColumnStats method
> --
>
> Key: FLINK-17996
> URL: https://issues.apache.org/jira/browse/FLINK-17996
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: godfrey he
>Priority: Major
>
> Currently, the Hive catalog only supports a few kinds of statistics and 
> returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there 
> is a decimal statistic, an NPE will occur in the 
> CatalogTableStatisticsConverter.convertToColumnStats method:
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
> 	at java.util.Optional.map(Optional.java:215)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
> 	at org.apache.calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-17996:
---
Fix Version/s: 1.11.0

> NPE in CatalogTableStatisticsConverter.convertToColumnStats method
> --
>
> Key: FLINK-17996
> URL: https://issues.apache.org/jira/browse/FLINK-17996
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: godfrey he
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, the Hive catalog only supports a few kinds of statistics and 
> returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there 
> is a decimal statistic, an NPE will occur in the 
> CatalogTableStatisticsConverter.convertToColumnStats method:
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
> 	at java.util.Optional.map(Optional.java:215)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
> 	at org.apache.calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17857) All Kubernetes e2e tests could not run on Mac after migration

2020-05-28 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118383#comment-17118383
 ] 

Robert Metzger commented on FLINK-17857:


I agree with Yang that it would be useful to have some convenience utility for 
building the docker image from the local dist (see also the discussion here: 
https://github.com/apache/flink/pull/12172/files#r425820527).

I still have not understood the problem here. Isn't {{build_image}} 
automatically building the image from a local build? Is the {{build_image}} 
script not working on OS X?
Or is there an issue with minikube on Mac?

> All Kubernetes e2e tests could not run on Mac after migration
> -
>
> Key: FLINK-17857
> URL: https://issues.apache.org/jira/browse/FLINK-17857
> Project: Flink
>  Issue Type: Test
>  Components: Deployment / Kubernetes, Tests
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.11.0
>
>
> In FLINK-17656, we migrated all the e2e tests from {{flink-container/docker}} 
> to {{apache/flink-docker}}. After the migration, when building a docker 
> image, we need to start a file server and then download the dist from it in 
> the {{Dockerfile}}.
> This works well in a Linux environment since minikube is started with 
> "vm-driver=none". However, that is not true on Mac: we usually start a 
> virtual machine for minikube, which means "localhost" does not work.
> So I suggest supporting a local Flink dist when building the docker image, 
> instead of always needing to download from a URL.
>  
> cc [~chesnay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-17996:
---
Affects Version/s: 1.9.0
   1.10.0

> NPE in CatalogTableStatisticsConverter.convertToColumnStats method
> --
>
> Key: FLINK-17996
> URL: https://issues.apache.org/jira/browse/FLINK-17996
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.10.0
>Reporter: godfrey he
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, the Hive catalog only supports a few kinds of statistics and 
> returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there 
> is a decimal statistic, an NPE will occur in the 
> CatalogTableStatisticsConverter.convertToColumnStats method:
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
> 	at java.util.Optional.map(Optional.java:215)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
> 	at org.apache.calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-17996:
---
Affects Version/s: 1.11.0

> NPE in CatalogTableStatisticsConverter.convertToColumnStats method
> --
>
> Key: FLINK-17996
> URL: https://issues.apache.org/jira/browse/FLINK-17996
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.10.0, 1.11.0
>Reporter: godfrey he
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, the Hive catalog only supports a few kinds of statistics and 
> returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there 
> is a decimal statistic, an NPE will occur in the 
> CatalogTableStatisticsConverter.convertToColumnStats method:
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
> 	at java.util.Optional.map(Optional.java:215)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
> 	at org.apache.calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17404) Running HA per-job cluster (rocks, non-incremental) gets stuck killing a non-existing pid in Hadoop 3 build profile

2020-05-28 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-17404:
---
Summary: Running HA per-job cluster (rocks, non-incremental) gets stuck 
killing a non-existing pid in Hadoop 3 build profile  (was: Running HA per-job 
cluster (rocks, non-incremental) gets stuck killing a non-existing pid)

> Running HA per-job cluster (rocks, non-incremental) gets stuck killing a 
> non-existing pid in Hadoop 3 build profile
> ---
>
> Key: FLINK-17404
> URL: https://issues.apache.org/jira/browse/FLINK-17404
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Test Infrastructure, Tests
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: test-stability
> Attachments: 255
>
>
> CI log: https://api.travis-ci.org/v3/job/678609505/log.txt
> {code}
> Waiting for text Completed checkpoint [1-9]* for job 
>  to appear 2 of times in logs...
> grep: 
> /home/travis/build/apache/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> grep: 
> /home/travis/build/apache/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> Starting standalonejob daemon on host 
> travis-job-e606668f-b674-49c0-8590-e3508e22b99d.
> grep: 
> /home/travis/build/apache/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> grep: 
> /home/travis/build/apache/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/log/*standalonejob-2*.log:
>  No such file or directory
> Killed TM @ 18864
> kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or 
> kill -l [sigspec]
> Killed TM @ 
> No output has been received in the last 10m0s, this potentially indicates a 
> stalled build or something wrong with the build itself.
> Check the details on how to adjust your build configuration on: 
> https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received
> The build has been terminated
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17996) NPE in CatalogTableStatisticsConverter.convertToColumnStats method

2020-05-28 Thread godfrey he (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

godfrey he updated FLINK-17996:
---
Affects Version/s: (was: 1.11.0)
   (was: 1.10.0)
   (was: 1.9.0)

> NPE in CatalogTableStatisticsConverter.convertToColumnStats method
> --
>
> Key: FLINK-17996
> URL: https://issues.apache.org/jira/browse/FLINK-17996
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: godfrey he
>Priority: Major
> Fix For: 1.11.0
>
>
> Currently, the Hive catalog only supports a few kinds of statistics and 
> returns null otherwise (see HiveStatsUtil#createTableColumnStats). If there 
> is a decimal statistic, an NPE will occur in the 
> CatalogTableStatisticsConverter.convertToColumnStats method:
> Caused by: java.lang.NullPointerException
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStats(CatalogTableStatisticsConverter.java:77)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToColumnStatsMap(CatalogTableStatisticsConverter.java:68)
> 	at org.apache.flink.table.planner.utils.CatalogTableStatisticsConverter.convertToTableStats(CatalogTableStatisticsConverter.java:57)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.extractTableStats(DatabaseCalciteSchema.java:113)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getStatistic(DatabaseCalciteSchema.java:97)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.lambda$getTable$0(DatabaseCalciteSchema.java:74)
> 	at java.util.Optional.map(Optional.java:215)
> 	at org.apache.flink.table.planner.catalog.DatabaseCalciteSchema.getTable(DatabaseCalciteSchema.java:72)
> 	at org.apache.calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16681) Jdbc JDBCOutputFormat and JDBCLookupFunction PreparedStatement loses connection if no records are written for a long time.

2020-05-28 Thread Lijie Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118389#comment-17118389
 ] 

Lijie Wang commented on FLINK-16681:


Thank you [~jark], I see, I will also fix JdbcRowDataOutputFormat.

> Jdbc JDBCOutputFormat and JDBCLookupFunction PreparedStatement loses 
> connection if no records are written for a long time.
> 
>
> Key: FLINK-16681
> URL: https://issues.apache.org/jira/browse/FLINK-16681
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / JDBC
>Reporter: LakeShen
>Assignee: Lijie Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In my view, the JDBC connector is one of the most frequently used connectors 
> in Flink, but it may have a problem. For example, if there are no records to 
> write or to join with a dimension table for a long time, an exception like 
> this is thrown:
> java.sql.SQLException: No operations allowed after statement closed
> Because no records have been written for a long time, the PreparedStatement 
> loses its connection.
>  If there is no other jira to solve this problem, can you assign this jira 
> to me? I will try my best to solve it, thank you.
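
As context for the fix, a minimal sketch (plain JDBC, with illustrative names, 
not the connector's actual code) of the usual remedy: validate the connection 
before writing and re-open it, together with the PreparedStatement, if it has 
gone stale:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Illustrative reconnect-before-write guard for a long-idle JDBC writer. */
public final class ReconnectingWriter {
    private final String url;
    private final String insertSql;
    private Connection connection;
    private PreparedStatement statement;

    public ReconnectingWriter(String url, String insertSql) throws SQLException {
        this.url = url;
        this.insertSql = insertSql;
        reconnect();
    }

    private void reconnect() throws SQLException {
        connection = DriverManager.getConnection(url);
        statement = connection.prepareStatement(insertSql);
    }

    public void write(long id, String value) throws SQLException {
        // isValid() pings the server; if the idle connection was dropped,
        // re-establish it and re-prepare the statement before writing.
        if (connection == null || !connection.isValid(5)) {
            reconnect();
        }
        statement.setLong(1, id);
        statement.setString(2, value);
        statement.executeUpdate();
    }
}
{code}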



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-16572) CheckPubSubEmulatorTest is flaky on Azure

2020-05-28 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118392#comment-17118392
 ] 

Robert Metzger commented on FLINK-16572:


This failure is from a pull request which is based on 1.11-SNAPSHOT: 
https://github.com/flink-ci/flink/blob/ci_12361_956eb8c52c2e828b82a04f8cd41d984cc7a4d9bf/pom.xml.
 We have not merged the workaround to 1.11-SNAPSHOT.

> CheckPubSubEmulatorTest is flaky on Azure
> -
>
> Key: FLINK-16572
> URL: https://issues.apache.org/jira/browse/FLINK-16572
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub, Tests
>Affects Versions: 1.11.0
>Reporter: Aljoscha Krettek
>Assignee: Richard Deurwaarder
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Log: 
> https://dev.azure.com/aljoschakrettek/Flink/_build/results?buildId=56&view=logs&j=1f3ed471-1849-5d3c-a34c-19792af4ad16&t=ce095137-3e3b-5f73-4b79-c42d3d5f8283&l=7842



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] rmetzger merged pull request #12371: [FLINK-17375][hotfix] Fix print_stacktraces multiline-behavior

2020-05-28 Thread GitBox


rmetzger merged pull request #12371:
URL: https://github.com/apache/flink/pull/12371


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #12371: [FLINK-17375][hotfix] Fix print_stacktraces multiline-behavior

2020-05-28 Thread GitBox


rmetzger commented on pull request #12371:
URL: https://github.com/apache/flink/pull/12371#issuecomment-635153196


   Thanks! merging



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-17992) Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17992:
---
Labels: pull-request-available  (was: )

> Exception from RemoteInputChannel#onBuffer should not fail the whole 
> NetworkClientHandler
> -
>
> Key: FLINK-17992
> URL: https://issues.apache.org/jira/browse/FLINK-17992
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Zhijiang
>Assignee: Zhijiang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> RemoteInputChannel#onBuffer is invoked by 
> CreditBasedPartitionRequestClientHandler while receiving and decoding 
> network data. #onBuffer can throw exceptions, which would tag the error in 
> the client handler and fail all the input channels added to the handler. 
> This can cause a tricky potential issue, as follows.
> If the RemoteInputChannel is being canceled by the canceler thread, the task 
> thread might exit before the canceler thread terminates. That means the 
> PartitionRequestClient might not yet be closed (that is triggered by the 
> canceler thread) while the new task attempt is already deployed onto this 
> TaskManager. The new task might therefore reuse the previous 
> PartitionRequestClient when requesting partitions, but the respective client 
> handler was already tagged with an error during the earlier 
> RemoteInputChannel#onBuffer. This causes an unnecessary next round of 
> failover.
> This potential issue is hard to find in production because the job finally 
> recovers after one or more additional failovers. We found it through 
> UnalignedCheckpointITCase because that test defines the precise number of 
> restarts within the configured failures.
> The solution is to fail only the respective task when its 
> RemoteInputChannel#onBuffer throws an exception, instead of failing all 
> channels inside the client handler. The client then stays healthy and can 
> also be reused by other input channels as long as it is not released yet.
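
A minimal sketch of the proposed shape of the fix; the types below are 
simplified stand-ins, not Flink's actual classes:

{code:java}
interface InputChannel {
    void onBuffer(byte[] buffer) throws Exception;
    void onError(Throwable cause); // fails only the owning task
}

final class ClientHandlerSketch {
    // Before: any onBuffer failure tagged the shared handler as errored,
    // poisoning it for every channel that reuses the client later.
    void deliver(InputChannel channel, byte[] buffer) {
        try {
            channel.onBuffer(buffer);
        } catch (Exception e) {
            // After: scope the failure to the single channel so the shared
            // PartitionRequestClient stays healthy for the other channels.
            channel.onError(e);
        }
    }
}
{code}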



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW opened a new pull request #12374: [FLINK-17992][checkpointing] Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread GitBox


zhijiangW opened a new pull request #12374:
URL: https://github.com/apache/flink/pull/12374


   ## What is the purpose of the change
   
   RemoteInputChannel#onBuffer is invoked by 
CreditBasedPartitionRequestClientHandler while receiving and decoding network 
data. #onBuffer can throw exceptions, which would tag the error in the client 
handler and fail all the input channels added to the handler. This can cause a 
tricky potential issue, as follows.
   
   If the RemoteInputChannel is being canceled by the canceler thread, the task 
thread might exit before the canceler thread terminates. That means the 
PartitionRequestClient might not yet be closed (that is triggered by the 
canceler thread) while the new task attempt is already deployed onto the same 
TaskManager. The new task might therefore reuse the previous 
PartitionRequestClient when requesting partitions, but the respective client 
handler was already tagged with an error during the earlier 
RemoteInputChannel#onBuffer, causing an unnecessary next round of failover.
   
   The solution is to fail only the respective task when its 
RemoteInputChannel#onBuffer throws an exception, instead of failing all 
channels inside the client handler. The client then stays healthy and can also 
be reused by other input channels as long as it is not released yet.
   
   ## Brief change log
   
 - *Do not fail the whole network client handler when an exception occurs in 
`RemoteInputChannel#onBuffer`*
   
   ## Verifying this change
   
   Added new unit test 
`CreditBasedPartitionRequestClientHandlerTest#testRemoteInputChannelOnBufferException`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17375) Clean up CI system related scripts

2020-05-28 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118393#comment-17118393
 ] 

Robert Metzger commented on FLINK-17375:


Merged a small follow up fixing the stacktrace printing: 
https://github.com/apache/flink/commit/d7834fcec226ca2232ff2413965665a6c168eb3e

> Clean up CI system related scripts
> --
>
> Key: FLINK-17375
> URL: https://issues.apache.org/jira/browse/FLINK-17375
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Build System / Azure Pipelines
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>  Labels: pull-request-available
>
> Once we have only one CI system in place for Flink (again), it makes sense to 
> clean up the available scripts:
> - Separate "Azure-specific" from "CI-generic" files (names of files, methods, 
> build profiles)
> - separate "log handling" from "build timeout" in "travis_watchdog"
> - remove workarounds needed because of Travis limitations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17736) Add flink CEP examples

2020-05-28 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118397#comment-17118397
 ] 

Robert Metzger commented on FLINK-17736:


[~dwysakowicz] I would value your opinion here

> Add flink CEP examples
> --
>
> Key: FLINK-17736
> URL: https://issues.apache.org/jira/browse/FLINK-17736
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples, Library / CEP
>Reporter: dengziming
>Priority: Minor
>  Labels: pull-request-available, starer
>
> There is no Flink CEP example; we could add one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17736) Add flink CEP examples

2020-05-28 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-17736:
-
Labels: pull-request-available starter  (was: pull-request-available starer)

> Add flink CEP examples
> --
>
> Key: FLINK-17736
> URL: https://issues.apache.org/jira/browse/FLINK-17736
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples, Library / CEP
>Reporter: dengziming
>Priority: Minor
>  Labels: pull-request-available, starter
>
> There is no Flink CEP example; we could add one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] rkhachatryan commented on a change in pull request #12365: [FLINK-17988][checkpointing] Discard only unique channel state delegates

2020-05-28 Thread GitBox


rkhachatryan commented on a change in pull request #12365:
URL: https://github.com/apache/flink/pull/12365#discussion_r431629099



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/OperatorSubtaskState.java
##
@@ -231,8 +232,7 @@ public void discardState() {
 		toDispose.addAll(rawOperatorState);
 		toDispose.addAll(managedKeyedState);
 		toDispose.addAll(rawKeyedState);
-		toDispose.addAll(inputChannelState);
-		toDispose.addAll(resultSubpartitionState);
+		toDispose.addAll(collectUniqueDelegates(inputChannelState, resultSubpartitionState));

Review comment:
   Totally agree.
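
   For readers of the thread, a sketch of what a `collectUniqueDelegates` 
helper could look like (the types are simplified stand-ins, not the actual 
change): several channel state handles may share one underlying delegate, so 
the delegates are deduplicated by identity before disposal.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class DelegateCollector {
    interface StateHandle { Object getDelegate(); }

    @SafeVarargs
    static List<Object> collectUniqueDelegates(Collection<? extends StateHandle>... handleSets) {
        // Identity-based set: two handles wrapping the same delegate object
        // must lead to exactly one discard.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Object> unique = new ArrayList<>();
        for (Collection<? extends StateHandle> handles : handleSets) {
            for (StateHandle handle : handles) {
                if (seen.add(handle.getDelegate())) {
                    unique.add(handle.getDelegate());
                }
            }
        }
        return unique;
    }
}
```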





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17736) Add flink CEP examples

2020-05-28 Thread Dawid Wysakowicz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118400#comment-17118400
 ] 

Dawid Wysakowicz commented on FLINK-17736:
--

I agree with Robert here. Just a personal opinion, but I have never found the 
examples useful. For simple code usage, in my opinion the IT tests we have in 
the particular modules are much better than an artificial simple example in a 
separate module.

If the purpose is to explain certain intricacies in more depth, then a code 
walkthrough in the documentation, as Robert said, is a much better choice.

> Add flink CEP examples
> --
>
> Key: FLINK-17736
> URL: https://issues.apache.org/jira/browse/FLINK-17736
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples, Library / CEP
>Reporter: dengziming
>Priority: Minor
>  Labels: pull-request-available, starter
>
> There is no Flink CEP example; we could add one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12373: [FLINK-17865][checkpoint] Increase default size of 'state.backend.fs.memory-threshold'

2020-05-28 Thread GitBox


flinkbot commented on pull request #12373:
URL: https://github.com/apache/flink/pull/12373#issuecomment-635160213


   
   ## CI report:
   
   * 5771a99b8c3a5491e4114c3e3c75a1d86a1c17b3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12372: [FLINK-17946][python] fix the bug that the config option 'pipeline.jars' doesn't work.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12372:
URL: https://github.com/apache/flink/pull/12372#issuecomment-635142364


   
   ## CI report:
   
   * 65c92cf7403d9a42250667682184481e3dd76c92 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2335)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12374: [FLINK-17992][checkpointing] Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread GitBox


flinkbot commented on pull request #12374:
URL: https://github.com/apache/flink/pull/12374#issuecomment-635160458


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit eeaa680286b6dd389acebbf1d8695ba94ff5ec59 (Thu May 28 
07:23:19 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur commented on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


curcur commented on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635161998


   Hey @AHeise, I've spent some time walking through the code. I think the bug 
is clear: the consumer's reference counter of `bufferBuilder` for `randomEmit` 
is not counted correctly. But I was wondering whether this fix works as 
expected when the `record` passed to `randomEmit` is more than one buffer can 
hold.
   
   In `BroadcastRecordWriter#randomEmit`, randomly triggered data is emitted in 
the line `emit(record, targetChannelIndex);`.
   
   If you trace the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`, and there 
`requestNewBufferBuilder` is called multiple times if the data to serialize is 
more than one buffer can hold.
   
   And as you can see, all the references to the remaining bufferBuilders are 
lost, since the `addConsumer` in `randomEmit` is done after `emit(record, 
targetChannelIndex)`; at least it looks that way from the code perspective.
   
   But I am not quite familiar with this part of the code, so I am not sure 
whether this will cause real problems. 
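
   To make the concern concrete, a hypothetical simplification of the flow in 
question (not Flink's actual code), showing why a consumer registered only 
after `emit()` returns could miss builders requested mid-record:

```java
final class SpanningEmitSketch {
    interface Serializer { boolean hasRemaining(); void copyTo(BufferBuilder b); }
    interface BufferBuilder { boolean isFull(); void finish(); }
    interface BuilderSource { BufferBuilder requestNewBufferBuilder(int channel); }

    static void emit(Serializer serializer, BuilderSource source, int channel) {
        BufferBuilder builder = source.requestNewBufferBuilder(channel);
        while (serializer.hasRemaining()) {
            if (builder.isFull()) {
                builder.finish();
                // A fresh builder replaces the previous one here; a consumer
                // added only after this method returns never sees the earlier ones.
                builder = source.requestNewBufferBuilder(channel);
            }
            serializer.copyTo(builder);
        }
        builder.finish();
    }
}
```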
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur edited a comment on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


curcur edited a comment on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635161998


   Hey @AHeise, I've spent some time walking through the code. I think the bug 
is clear: the consumer's reference counter of `bufferBuilder` for `randomEmit` 
is not counted correctly. But I was wondering whether this fix works as 
expected when the `record` passed to `randomEmit` is more than one buffer can 
hold.
   
   In `BroadcastRecordWriter#randomEmit`, randomly triggered data is emitted in 
the line `emit(record, targetChannelIndex);`.
   
   If you trace the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`, and there 
`requestNewBufferBuilder` is called multiple times if the data to serialize is 
more than one buffer can hold.
   
   And as you can see, all the references to the remaining bufferBuilders are 
lost, since the `addConsumer` in `randomEmit` is done after `emit(record, 
targetChannelIndex)`; at least it looks that way from the code perspective.
   
   But I am not quite familiar with this part of the code, so I am not sure 
whether this will cause real problems. 
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur edited a comment on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


curcur edited a comment on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635161998


   Hey @AHeise, I've spent some time walking through the code. I think the bug 
is clear: the consumer's reference counter of `bufferBuilder` for `randomEmit` 
is not counted correctly. But I was wondering whether this fix works as 
expected when the `record` passed to `randomEmit` is more than one buffer can 
hold.
   
   In `BroadcastRecordWriter#randomEmit`, randomly triggered data is emitted in 
the line `emit(record, targetChannelIndex);`.
   
   If you trace the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`, and there 
`requestNewBufferBuilder` is called multiple times if the data to serialize is 
more than one buffer can hold.
   
   And as you can see, all the references to the remaining bufferBuilders are 
lost, since the `addConsumer` in `randomEmit` is done after `emit(record, 
targetChannelIndex)`; at least it looks so purely from the code perspective.
   
   But I am not quite familiar with this part of the code, so I am not sure 
whether this will cause real problems. 
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on a change in pull request #12340: [FLINK-17844][build] Enforce @PublicEvolving compatibility for minor versions

2020-05-28 Thread GitBox


rmetzger commented on a change in pull request #12340:
URL: https://github.com/apache/flink/pull/12340#discussion_r431635006



##
File path: tools/releasing/update_japicmp_configuration.sh
##
@@ -0,0 +1,83 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+if [ -z "${NEW_VERSION}" ]; then
+echo "NEW_VERSION was not set."
+exit 1
+fi
+
+CURR_DIR=`pwd`
+if [[ `basename $CURR_DIR` != "tools" ]] ; then
+  echo "You have to call the script from the tools/ dir"
+  exit 1
+fi
+
+# Idealized use-cases:
+# Scenario A) New major release X.Y.0
+#   Premise:
+#     There is a master branch with a version X.Y-SNAPSHOT, with a japicmp reference version of X.(Y-1).0 .
+#   Release flow:
+#     - update the master to X.(Y+1)-SNAPSHOT, but keep the reference version intact since X.Y.0 is not released (yet)
+#     - create X.Y-SNAPSHOT branch, but keep the reference version intact since X.Y.0 is not released (yet)
+#     - release X.Y.0
+#     - update the japicmp reference version of both master and X.Y-SNAPSHOT to X.Y.0
+#     - enable stronger compatibility constraints for X.Y-SNAPSHOT to ensure compatibility for PublicEvolving
+# Scenario B) New minor release X.Y.Z
+#   Premise:
+#     There is a snapshot branch with a version X.Y-SNAPSHOT, with a japicmp reference version of X.Y.(Z-1)
+#   Release flow:
+#     - create X.Y.Z-rc branch
+#     - update the japicmp reference version of X.Y.Z to X.Y.(Z-1)
+#     - release X.Y.Z
+#     - update the japicmp reference version of X.Y-SNAPSHOT to X.Y.Z
+
+POM=../pom.xml
+function enable_public_evolving_compatibility_checks() {
+  perl -pi -e 's##${1}#' ${POM}
+  perl -pi -e 's#\t+\@org.apache.flink.annotation.PublicEvolving.*\n##' ${POM}
+}
+
+function set_japicmp_reference_version() {
+  local version=$1
+
+  perl -pi -e 's#(<japicmp.referenceVersion>).*(</japicmp.referenceVersion>)#${1}'${version}'${2}#' ${POM}
+}
+
+current_branch=$(git rev-parse --abbrev-ref HEAD)
+
+if [[ ${current_branch} =~ -rc ]]; then
+  # release branch
+  version_prefix=$(echo "${NEW_VERSION}" | perl -p -e 's#(\d+\.\d+)\.\d+#$1#')
+  minor=$(echo "${NEW_VERSION}" | perl -p -e 's#\d+\.\d+\.(\d+)#$1#')
+  if ! [[ ${minor} == "0" ]]; then
+set_japicmp_reference_version ${version_prefix}.$((minor - 1))
+    # this is a safeguard in case the manual step of enabling checks after the X.Y.0 release was forgotten
+enable_public_evolving_compatibility_checks
+  fi
+elif [[ ${current_branch} =~ -SNAPSHOT ]]; then

Review comment:
   Will this also be a manual invocation of the script, as described in the 
release guide?
   The RM will have to rename `release-1.11` to `release-1.11-SNAPSHOT` and 
then run this script after 1.11.0 has been released?
   
   I still wonder why this check is based on the branch name, and not the version.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] WeiZhong94 commented on pull request #12372: [FLINK-17946][python] fix the bug that the config option 'pipeline.jars' doesn't work.

2020-05-28 Thread GitBox


WeiZhong94 commented on pull request #12372:
URL: https://github.com/apache/flink/pull/12372#issuecomment-635164299


   @dianfu Thanks for your review! I have addressed your comments. It is 
unnecessary to execute `_before_execute()` in `sql_update` as it does not 
actually execute the pipeline.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-05-28 Thread GitBox


zhuzhurk commented on a change in pull request #12256:
URL: https://github.com/apache/flink/pull/12256#discussion_r430231588



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SchedulerImpl.java
##
@@ -562,4 +575,110 @@ private void releaseSharedSlot(
 	public boolean requiresPreviousExecutionGraphAllocations() {
 		return slotSelectionStrategy instanceof PreviousAllocationSlotSelectionStrategy;
 	}
+
+	@Override
+	public CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
+			final Collection<PhysicalSlotRequest> physicalSlotRequests,
+			final Time timeout) {
+
+		final PhysicalSlotRequestBulk slotRequestBulk = new PhysicalSlotRequestBulk(physicalSlotRequests);
+
+		final List<CompletableFuture<PhysicalSlotRequest.Result>> resultFutures = new ArrayList<>(physicalSlotRequests.size());
+		for (PhysicalSlotRequest request : physicalSlotRequests) {
+			final CompletableFuture<PhysicalSlotRequest.Result> resultFuture =
+				allocatePhysicalSlot(request, timeout).thenApply(result -> {
+					slotRequestBulk.markRequestFulfilled(
+						result.getSlotRequestId(),
+						result.getPhysicalSlot().getAllocationId());
+
+					return result;
+				});
+			resultFutures.add(resultFuture);
+		}
+
+		slotRequestBulkTracker.track(slotRequestBulk);
+		schedulePendingRequestBulkTimeoutCheck(slotRequestBulk, timeout);
+
+		return FutureUtils.combineAll(resultFutures)
+			.whenComplete((ignore, throwable) -> slotRequestBulkTracker.untrack(slotRequestBulk));
+	}
+
+	private CompletableFuture<PhysicalSlotRequest.Result> allocatePhysicalSlot(
+			final PhysicalSlotRequest physicalSlotRequest,
+			final Time timeout) {
+
+		final SlotRequestId slotRequestId = physicalSlotRequest.getSlotRequestId();
+		final SlotProfile slotProfile = physicalSlotRequest.getSlotProfile();
+
+		final Optional<PhysicalSlot> availablePhysicalSlot = tryAllocateFromAvailable(slotRequestId, slotProfile);

Review comment:
   I think that the majority of `SchedulerImpl` will not be needed anymore in 
the future.
   Introducing a separate interface like `BulkSlotProvider` might make it 
easier for us to drop the deprecated components in the future.
   
   > Do you think we can reuse SchedulerImpl in future?
   
   I think not.
   
   > Would just duplicating tryAllocateFromAvailable/cancelSlotRequest in 
BulkSlotProviderImpl bring less confusion in future?
   
   I think yes. Let's do it this way.
   
   > Will we need single slot provider eventually for pipeline region 
scheduling at all?
   
   I think not. 
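
   For reference, a sketch of how such a separate interface could look; the 
signatures follow the diff above, with generics restored where the email 
rendering dropped them. This shape is a sketch, not the merged API:

```java
import java.util.Collection;
import java.util.concurrent.CompletableFuture;

// PhysicalSlotRequest, SlotRequestId and Time are the Flink types used in
// the diff above and are assumed to be on the classpath.
public interface BulkSlotProvider {
    CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
            Collection<PhysicalSlotRequest> physicalSlotRequests,
            Time timeout);

    void cancelSlotRequest(SlotRequestId slotRequestId, Throwable cause);
}
```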





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #12174: [FLINK-17736] Add flink cep example

2020-05-28 Thread GitBox


rmetzger commented on pull request #12174:
URL: https://github.com/apache/flink/pull/12174#issuecomment-635164273


   Based on the JIRA discussion, I propose to close this PR until the 
discussion has been resolved.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on pull request #12350: [FLINK-17824][e2e] Introduce timeout to 'resume savepoint' test

2020-05-28 Thread GitBox


rmetzger commented on pull request #12350:
URL: https://github.com/apache/flink/pull/12350#issuecomment-635164566


   Thanks a lot for your review! I will address your comments.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17857) All Kubernetes e2e tests could not run on Mac after migration

2020-05-28 Thread Chesnay Schepler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118404#comment-17118404
 ] 

Chesnay Schepler commented on FLINK-17857:
--

Actually, [~rmetzger] has a point. I don't see what the VM has got to do with 
it, since the image is built outside the VM; see {{common_docker.sh#build_image}}.

> All Kubernetes e2e tests could not run on Mac after migration
> -
>
> Key: FLINK-17857
> URL: https://issues.apache.org/jira/browse/FLINK-17857
> Project: Flink
>  Issue Type: Test
>  Components: Deployment / Kubernetes, Tests
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.11.0
>
>
> In FLINK-17656, we migrated all the e2e tests from {{flink-container/docker}} 
> to {{apache/flink-docker}}. After the migration, when building a docker 
> image, we need to start a file server and then download the dist from it in 
> the {{Dockerfile}}.
> This works well in a Linux environment since minikube is started with 
> "vm-driver=none". However, that is not true on Mac: we usually start a 
> virtual machine for minikube, which means "localhost" does not work.
> So I suggest supporting a local Flink dist when building the docker image, 
> instead of always needing to download from a URL.
>  
> cc [~chesnay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17961) Create an Elasticsearch source

2020-05-28 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118405#comment-17118405
 ] 

Etienne Chauchot commented on FLINK-17961:
--

[~aljoscha] can you assign me this ticket?

> Create an Elasticsearch source
> --
>
> Key: FLINK-17961
> URL: https://issues.apache.org/jira/browse/FLINK-17961
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / ElasticSearch
>Reporter: Etienne Chauchot
>Priority: Minor
>
> There is only an Elasticsearch sink available. There are open-source GitHub 
> repos such as [this 
> one|https://github.com/mnubo/flink-elasticsearch-source-connector]. Also, 
> the Apache Bahir project does not provide an Elasticsearch source connector 
> for Flink either. IMHO the project would benefit from having a bundled 
> source connector for ES alongside the available sink connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17989) java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter

2020-05-28 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-17989.

Fix Version/s: (was: 1.11.1)
   (was: 1.10.1)
   (was: 1.9.3)
   (was: 1.11.0)
   Resolution: Duplicate

> java.lang.NoClassDefFoundError: 
> org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
> ---
>
> Key: FLINK-17989
> URL: https://issues.apache.org/jira/browse/FLINK-17989
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem, FileSystems
>Affects Versions: 1.9.3, 1.10.1, 1.11.0, 1.11.1
> Environment: Ubuntu 18
> Java 1.8
> Flink 1.9.x and 1.10.x
>Reporter: Israel Ekpo
>Priority: Critical
>
> In the POM.xml, classes from certain packages are relocated and filtered out 
> of the final jar.
>  This is causing errors for customers and users that are using the 
> StreamingFileSink with Azure Blob Storage in Flink versions 1.9.x, 1.10.x and 
> possibly 1.11.x:
> [https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-azure-fs-hadoop/pom.xml#L170]
> [https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-fs-hadoop-shaded/pom.xml#L233]
> I would like to know why the relocation is happening and the reasoning behind 
> the exclusion and filtering of the classes.
>  It seems to affect just the Azure file systems in my sample implementations.
> {code:java}
> String outputPath = 
> "wasbs://contai...@account.blob.core.windows.net/streaming-output/";
> final StreamingFileSink<String> sink = StreamingFileSink
>     .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
>     .build();
> stream.addSink(sink);
> // execute program
> env.execute(StreamingJob.class.getCanonicalName());
> {code}
> {code:java}
> //Exception Details Below
> 2020-05-27 17:23:16
> java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
> 	at org.apache.flink.fs.azure.common.hadoop.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
> 	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
> 	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
> 	at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)
> 	at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)
> 	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
> 	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
> 	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
> 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> 	at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:60)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai opened a new pull request #120: [FLINK-17875] [core] State TTL for remote functions

2020-05-28 Thread GitBox


tzulitai opened a new pull request #120:
URL: https://github.com/apache/flink-statefun/pull/120


   This PR is based on the refactoring work in #119, and completes the initial 
support for state TTL in remote functions. Only the last 5 commits are relevant.
   
   Users define state TTL in their YAML modules like so:
   ```
   functions:
     - function:
         states:
           - name: <state name>
             expireAfter: 6millisecond or 5sec / etc. # optional key
   ```
   
   The current implementation has some limitations due to how we are 
multiplexing remote function's user state in a single PersistedTable (see 
FLINK-17954):
   
   - The actual TTL being set will be the longest duration across all 
registered state
   - The only supported expiration mode for now is `AFTER_READ_AND_WRITE`. That 
can be added as an `expireMode` key in the YAML spec in the future.
   
   ## Brief change log
   
   - 1dbfa07 to 199d3fd Introduces a `StateSpec` class which captures the state 
name and configured expiration timeout.
   - 435060f Updates the `FunctionJsonEntity` to recognize the new format (for 
version 2.0)
   - fb76fc5 Parameterize `JsonModuleTest` so that it runs tests for both v1.0 
and v2.0 formats.
   
   ## Verifying
   
   - The parameterized `JsonModuleTest` should cover this change.
   - Manually verified functionality by locally running a modified version of 
the Python greeter example, with state TTL enabled.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rkhachatryan commented on a change in pull request #12364: [FLINK-17986] Fix check in FsCheckpointStateOutputStream.write

2020-05-28 Thread GitBox


rkhachatryan commented on a change in pull request #12364:
URL: https://github.com/apache/flink/pull/12364#discussion_r431639052



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStreamFactoryTest.java
##
@@ -57,6 +57,23 @@ public void createStateDirectories() throws IOException {
//  tests
// 

 
+   @Test
+   @SuppressWarnings("ConstantConditions")
+   public void testWriteFlushesIfAboveThreshold() throws IOException {
+       int fileSizeThreshold = 100;
+       final FsCheckpointStreamFactory factory = createFactory(FileSystem.getLocalFileSystem(), fileSizeThreshold, fileSizeThreshold);
+       final FsCheckpointStreamFactory.FsCheckpointStateOutputStream stream = factory.createCheckpointStateOutputStream(CheckpointedStateScope.EXCLUSIVE);
+       stream.write(new byte[fileSizeThreshold]);
+       File[] files = new File(exclusiveStateDir.toUri()).listFiles();
+       assertEquals(1, files.length);
+       File file = files[0];
+       assertEquals(fileSizeThreshold, file.length());
+       for (int i = 0; i < fileSizeThreshold; i++) {
+           stream.write(127); // should buffer without flushing
+       }

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17726) Scheduler should take care of tasks directly canceled by TaskManager

2020-05-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118408#comment-17118408
 ] 

Till Rohrmann commented on FLINK-17726:
---

Hi [~nicholasjiang], what's the state of this issue?

> Scheduler should take care of tasks directly canceled by TaskManager
> 
>
> Key: FLINK-17726
> URL: https://issues.apache.org/jira/browse/FLINK-17726
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Zhu Zhu
>Assignee: Nicholas Jiang
>Priority: Critical
> Fix For: 1.11.0, 1.12.0
>
>
> JobManager will not trigger failure handling when receiving a CANCELED task 
> update. 
> This is because CANCELED tasks are usually caused by another FAILED task, and 
> these CANCELED tasks will be restarted by the failover process triggered by 
> the FAILED task.
> However, if a task is directly CANCELED by the TaskManager due to its own 
> runtime issue, the task will not be recovered by the JM and thus the job 
> would hang. This is a potential issue and we should avoid it.
> A possible solution is to let the JobManager treat tasks transitioning to 
> CANCELED from all states except CANCELING as failed tasks. 
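> For illustration only, the proposed handling could look roughly like the 
> following sketch (the method names here are hypothetical, not the actual 
> scheduler code):
> {code:java}
> // Sketch: treat a CANCELED transition that did not come from CANCELING as a
> // failure, so that the failover process restarts the task.
> void onTaskExecutionStateUpdate(ExecutionState previousState, ExecutionState newState) {
>     if (newState == ExecutionState.CANCELED && previousState != ExecutionState.CANCELING) {
>         handleTaskFailure(new FlinkException("Task was directly canceled by the TaskManager"));
>     }
> }
> {code}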



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] curcur edited a comment on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


curcur edited a comment on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635161998


   Hey @AHeise , I've spent some time walking through the code. I think the bug 
is clear: the consumer-side reference counter of the `bufferBuilder` used by 
`randomEmit` is not updated correctly. But I was wondering whether this fix 
works as expected when the `record` passed to `randomEmit` is larger than a 
single buffer can hold.
   
   In `BroadcastRecordWriter#randomEmit`, the randomly triggered data is 
emitted in the line `emit(record, targetChannelIndex);`.
   
   If you trace the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`, and there 
`requestNewBufferBuilder` is called multiple times if the data to serialize is 
more than one buffer can hold.
   
   As a result, all references to the intermediate bufferBuilders are lost, 
since the `addConsumer` in `randomEmit` happens only after `emit(record, 
targetChannelIndex)` returns; at least that is how it looks purely from the 
code perspective. A short sketch of the flow I mean follows.
   
   But I am not quite familiar with this part of the code, so I am not sure 
whether this will cause real problems. 
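   
   A very rough sketch of that flow (simplified pseudo-Java, not the actual 
Flink code; the names follow the discussion above):
   ```
   // randomEmit picks one channel and emits the record to it.
   void randomEmit(T record) throws IOException {
       int targetChannelIndex = rng.nextInt(numberOfChannels);
       // For a record larger than one buffer, emit() internally calls
       // requestNewBufferBuilder() several times while copying the data.
       emit(record, targetChannelIndex);
       // addConsumer() only sees the latest bufferBuilder at this point;
       // references to the intermediate builders requested inside emit()
       // are no longer reachable from here.
       addConsumer(targetChannelIndex);
   }
   ```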
   
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





[GitHub] [flink] flinkbot edited a comment on pull request #12264: [FLINK-17558][netty] Release partitions asynchronously

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12264:
URL: https://github.com/apache/flink/pull/12264#issuecomment-631349883


   
   ## CI report:
   
   * 19c5f57b94cc56b70002031618c32d9e6f68effb UNKNOWN
   * bb313e40f5a72dbf20cd0a8b48267063fd4f00af UNKNOWN
   * 8cf00b9f6fd8b76256883eedbdb8e79dea3c35dc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2303)
 
   * c5dcd45e890a076e0a66d4e6e24fdab192e3a724 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12355: [FLINK-17893] [sql-client] SQL CLI should print the root cause if the statement is invalid

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12355:
URL: https://github.com/apache/flink/pull/12355#issuecomment-634548726


   
   ## CI report:
   
   * 1c6c2a8dc337e9d01bfbe48af599441571b0d4ee Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2321)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12320: [FLINK-17887][table][connector] Improve interface of ScanFormatFactory and SinkFormatFactory

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12320:
URL: https://github.com/apache/flink/pull/12320#issuecomment-633558182


   
   ## CI report:
   
   * 1ad6eea2421dd2c9950f19ff2fd5ee54bb66c6f0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2327)
 
   * 6023f5f43cffbe959b1c35092e2ed4a79f2ed09c UNKNOWN
   * 80bfdeac6274c0cfa3218da26c88f7e05d517f86 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12364: [FLINK-17986] Fix check in FsCheckpointStateOutputStream.write

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12364:
URL: https://github.com/apache/flink/pull/12364#issuecomment-634816980


   
   ## CI report:
   
   * 6eef6ad43c519d62837d4e22246a4090a1aa1bd8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2302)
 
   * aa65f31c2e89e08a8671a3748e2a516ee183220c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12373: [FLINK-17865][checkpoint] Increase default size of 'state.backend.fs.memory-threshold'

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12373:
URL: https://github.com/apache/flink/pull/12373#issuecomment-635160213


   
   ## CI report:
   
   * 5771a99b8c3a5491e4114c3e3c75a1d86a1c17b3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2337)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12372: [FLINK-17946][python] fix the bug that the config option 'pipeline.jars' doesn't work.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12372:
URL: https://github.com/apache/flink/pull/12372#issuecomment-635142364


   
   ## CI report:
   
   * 65c92cf7403d9a42250667682184481e3dd76c92 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2335)
 
   * b9759904d50a3d2f99aa9dbbcb42dc3b2d64f5b9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12374: [FLINK-17992][checkpointing] Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread GitBox


flinkbot commented on pull request #12374:
URL: https://github.com/apache/flink/pull/12374#issuecomment-635171486


   
   ## CI report:
   
   * eeaa680286b6dd389acebbf1d8695ba94ff5ec59 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] curcur edited a comment on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


curcur edited a comment on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635161998


   Hey @AHeise , I've spent some time walking through the code. I think the bug 
is clear: the consumer-side reference counter of the `bufferBuilder` used by 
`randomEmit` is not updated correctly. But I was a bit confused about whether 
this fix works as expected when the `record` passed to `randomEmit` is larger 
than a single buffer can hold. But I am not an expert, could be wrong :-)
   
   In `BroadcastRecordWriter#randomEmit`, the randomly triggered data is 
emitted in the line `emit(record, targetChannelIndex);`.
   
   If you trace the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`, and there 
`requestNewBufferBuilder` is called multiple times if the data to serialize is 
more than one buffer can hold.
   
   As a result, all references to the intermediate bufferBuilders are lost, 
since the `addConsumer` in `randomEmit` happens only after `emit(record, 
targetChannelIndex)` returns; at least that is how it looks purely from the 
code perspective.
   
   But I am not quite familiar with this part of the code, so I am not sure 
whether this will cause real problems. 
   Or we can include a test to see whether it works as expected in this case?
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12365: [FLINK-17988][checkpointing] Discard only unique channel state delegates

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12365:
URL: https://github.com/apache/flink/pull/12365#issuecomment-634981483


   
   ## CI report:
   
   * fbdee39abdb6795fb67caeb907adf0f9d2619d8b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2308)
 
   * 74a2a8ec4d2d6510b67af3705d2a0bf50f05ab3e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk opened a new pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

2020-05-28 Thread GitBox


zhuzhurk opened a new pull request #12375:
URL: https://github.com/apache/flink/pull/12375


   ## What is the purpose of the change
   
   This PR introduces a BulkSlotProvider which supports bulk slot allocation. 
In this way we are able to check whether the resource requirements of a slot 
request bulk can all be fulfilled at the same time. A rough sketch of the idea 
follows.
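   
   As a sketch only (the exact interface in this PR may differ; the signature 
below is an assumption based on this description, not the actual API):
   ```
   // Hypothetical shape of a bulk slot provider: the returned future completes
   // successfully only if all slot requests in the bulk can be fulfilled.
   public interface BulkSlotProvider {
   
       CompletableFuture<Collection<PhysicalSlotRequest.Result>> allocatePhysicalSlots(
               Collection<PhysicalSlotRequest> physicalSlotRequests,
               Time timeout);
   }
   ```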
   
   ## Brief change log
   
 - *Enabled to set and get whether a physical slot will be occupied 
indefinitely*
 - *Introduced BulkSlotProvider and its default implementation*
   
   ## Verifying this change
   
 - *Added AllocatedSlotOccupationTest*
 - *Added BulkSlotAllocationTest*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (**yes** / no / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

2020-05-28 Thread GitBox


flinkbot commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635177408


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 4fe765587c0423b2d29d5dde946b94edff79398b (Thu May 28 
07:51:24 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-05-28 Thread GitBox


zhuzhurk commented on pull request #12256:
URL: https://github.com/apache/flink/pull/12256#issuecomment-635178001


   @azagrebin @GJL #12375 has been opened for the bulk slot allocation part, 
with a `BulkSlotProvider` introduced. The related comments above are also 
addressed in it.
   Would you take a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann closed pull request #12357: [FLINK-17958][core] Fix MathUtils#divideRoundUp bug for handling zero or negative values.

2020-05-28 Thread GitBox


tillrohrmann closed pull request #12357:
URL: https://github.com/apache/flink/pull/12357


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-statefun] tzulitai commented on pull request #118: [FLINK-17963] [core] Revert execution environment patching in StatefulFunctionsJob

2020-05-28 Thread GitBox


tzulitai commented on pull request #118:
URL: https://github.com/apache/flink-statefun/pull/118#issuecomment-635181173


   Also manually verified that with this revert, settings in `flink-conf.yaml` 
are being picked up.
   
   Merging this ...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-17958) Kubernetes session constantly allocates taskmanagers after cancel a job

2020-05-28 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-17958.
-
Resolution: Fixed

Fixed via

master: 1386e0db962a794e5cb78b72917a1c5340bd027e
1.11.0: 2cac04d7a74ff8fa7987bd137056381020775f07

> Kubernetes session constantly allocates taskmanagers after cancel a job
> ---
>
> Key: FLINK-17958
> URL: https://issues.apache.org/jira/browse/FLINK-17958
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Yang Wang
>Assignee: Xintong Song
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> When I was testing {{kubernetes-session.sh}}, I found that the 
> {{KubernetesResourceManager}} constantly allocates taskmanagers after 
> canceling a job. I think this is caused by a bug in the following code: when 
> the {{dividend}} is 0 and the {{divisor}} is bigger than 1, the return value 
> will be 1, while we expect it to be 0.
> {code:java}
> /**
>  * Divide and rounding up to integer.
>  * E.g., divideRoundUp(3, 2) returns 2.
>  * @param dividend value to be divided by the divisor
>  * @param divisor value by which the dividend is to be divided
>  * @return the quotient rounding up to integer
>  */
> public static int divideRoundUp(int dividend, int divisor) {
>return (dividend - 1) / divisor + 1;
> }{code}
>  
> How to reproduce this issue?
>  # Start a Kubernetes session
>  # Submit a Flink job to the existing session
>  # Cancel the job and wait for the TaskManager released via idle timeout
>  # More and more TaskManagers will be allocated
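> For illustration, a corrected helper might look like the following sketch 
> (not necessarily the exact fix that was merged; {{checkArgument}} stands in 
> for a Preconditions-style helper):
> {code:java}
> // Worked example of the bug: divideRoundUp(0, 4) evaluates (0 - 1) / 4 + 1,
> // and since Java integer division truncates toward zero this is 0 + 1 = 1,
> // while the expected result is 0.
> public static int divideRoundUp(int dividend, int divisor) {
>     checkArgument(divisor > 0, "divisor must be positive");
>     checkArgument(dividend >= 0, "dividend must be non-negative");
>     return dividend == 0 ? 0 : (dividend - 1) / divisor + 1;
> }
> {code}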



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai closed pull request #118: [FLINK-17963] [core] Revert execution environment patching in StatefulFunctionsJob

2020-05-28 Thread GitBox


tzulitai closed pull request #118:
URL: https://github.com/apache/flink-statefun/pull/118


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12320: [FLINK-17887][table][connector] Improve interface of ScanFormatFactory and SinkFormatFactory

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12320:
URL: https://github.com/apache/flink/pull/12320#issuecomment-633558182


   
   ## CI report:
   
   * 1ad6eea2421dd2c9950f19ff2fd5ee54bb66c6f0 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2327)
 
   * 6023f5f43cffbe959b1c35092e2ed4a79f2ed09c UNKNOWN
   * 80bfdeac6274c0cfa3218da26c88f7e05d517f86 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2338)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12374: [FLINK-17992][checkpointing] Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12374:
URL: https://github.com/apache/flink/pull/12374#issuecomment-635171486


   
   ## CI report:
   
   * eeaa680286b6dd389acebbf1d8695ba94ff5ec59 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2341)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] liuyongvs commented on pull request #12144: [FLINK-17384][flink-dist] support read hbase conf dir from flink.conf.

2020-05-28 Thread GitBox


liuyongvs commented on pull request #12144:
URL: https://github.com/apache/flink/pull/12144#issuecomment-635184211


   > Thanks for the contribution @liuyongvs ! Could you add a unit test 
referring to 
[HadoopConfigLoadingTest#loadFromEnvVariables](https://github.com/apache/flink/blob/1cd696d92c3e088a5bd8e5e11b54aacf46e92ae8/flink-filesystems/flink-hadoop-fs/src/test/java/org/apache/flink/runtime/fs/hdfs/HadoopConfigLoadingTest.java#L119)
 as a safeguard? Thanks.
   
   hi @carp84 , I think it may not be possible to add a unit test in the style 
of HadoopConfigLoadingTest#loadFromEnvVariables here. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zhuzhurk commented on a change in pull request #12256: [FLINK-17018][runtime] Allocates slots in bulks for pipelined region scheduling

2020-05-28 Thread GitBox


zhuzhurk commented on a change in pull request #12256:
URL: https://github.com/apache/flink/pull/12256#discussion_r431653239



##
File path: flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/slotpool/SlotOccupationTest.java
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.jobmaster.slotpool;
+
+import org.apache.flink.runtime.clusterframework.types.AllocationID;
+import org.apache.flink.runtime.clusterframework.types.ResourceProfile;
+import org.apache.flink.runtime.executiongraph.utils.SimpleAckingTaskManagerGateway;
+import org.apache.flink.runtime.jobmanager.scheduler.Locality;
+import org.apache.flink.runtime.jobmanager.slots.TestingSlotOwner;
+import org.apache.flink.runtime.jobmaster.LogicalSlot;
+import org.apache.flink.runtime.jobmaster.SlotRequestId;
+import org.apache.flink.runtime.taskmanager.LocalTaskManagerLocation;
+import org.apache.flink.util.TestLogger;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Tests whether the slot occupation works correctly.
+ */
+public class SlotOccupationTest extends TestLogger {
+
+   @Test
+   public void testSingleTaskOccupyingSlotIndefinitely() {
+   final PhysicalSlot physicalSlot = createPhysicalSlot();
+   allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+   assertTrue(physicalSlot.willBeOccupiedIndefinitely());
+   }
+
+   @Test
+   public void testSingleTaskNotOccupyingSlotIndefinitely() {
+   final PhysicalSlot physicalSlot = createPhysicalSlot();
+   allocateSingleLogicalSlotFromPhysicalSlot(physicalSlot, true);
+
+   assertTrue(physicalSlot.willBeOccupiedIndefinitely());

Review comment:
   done.

##
File path: flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/slotpool/SlotPoolImpl.java
##
@@ -469,6 +469,47 @@ public void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwabl
                .collect(Collectors.toList());
        }
 
+   @Override
+   public boolean isSlotRequestBulkFulfillable(final PhysicalSlotRequestBulk slotRequestBulk) {
+       final Set<AllocationID> assignedSlots = new HashSet<>(slotRequestBulk.getFulfilledRequests().values());
+       final Set<SlotInfo> reusableSlots = getReusableSlots(assignedSlots);
+       return areRequestsFulfillableWithSlots(slotRequestBulk.getPendingRequests().values(), reusableSlots);
+   }
+
+   private Set<SlotInfo> getReusableSlots(final Set<AllocationID> slotsToExclude) {
+       return Stream
+           .concat(
+               getAvailableSlotsInformation().stream(),
+               getAllocatedSlotsInformation().stream())
+           .filter(slotInfo -> !slotInfo.willBeOccupiedIndefinitely())
+           .filter(slotInfo -> !slotsToExclude.contains(slotInfo.getAllocationId()))
+           .collect(Collectors.toSet());
+   }
+
+   private static boolean areRequestsFulfillableWithSlots(
+           final Collection<PhysicalSlotRequest> requests,
+           final Set<SlotInfo> slots) {
+
+       final Set<SlotInfo> remainingSlots = new HashSet<>(slots);
+       for (PhysicalSlotRequest request : requests) {
+           final Optional<SlotInfo> matchedSlot = findMatchingSlotForRequest(request, remainingSlots);
+           if (matchedSlot.isPresent()) {
+               remainingSlots.remove(matchedSlot.get());
+           } else {
+               return false;
+           }
+       }
+       return true;
+   }
+
+   private static Optional<SlotInfo> findMatchingSlotForRequest(
+           final PhysicalSlotRequest request,
+           final Collection<SlotInfo> slots) {
+
+       final ResourceProfile requiredResource = request.getSlotProfile().getPhysicalSlotResourceProfile();
Review comment:
   done.





This is an automated message from the Apache Git Service.
To respond to the me

[GitHub] [flink] flinkbot commented on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

2020-05-28 Thread GitBox


flinkbot commented on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   
   ## CI report:
   
   * 4fe765587c0423b2d29d5dde946b94edff79398b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12365: [FLINK-17988][checkpointing] Discard only unique channel state delegates

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12365:
URL: https://github.com/apache/flink/pull/12365#issuecomment-634981483


   
   ## CI report:
   
   * fbdee39abdb6795fb67caeb907adf0f9d2619d8b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2308)
 
   * 74a2a8ec4d2d6510b67af3705d2a0bf50f05ab3e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2340)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12364: [FLINK-17986] Fix check in FsCheckpointStateOutputStream.write

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12364:
URL: https://github.com/apache/flink/pull/12364#issuecomment-634816980


   
   ## CI report:
   
   * 6eef6ad43c519d62837d4e22246a4090a1aa1bd8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2302)
 
   * aa65f31c2e89e08a8671a3748e2a516ee183220c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2339)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-17994) Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-28 Thread Zhijiang (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijiang updated FLINK-17994:
-
Description: 
The race condition issue happens as follows:
 * ch1 is received from the network for one input channel by the netty thread, 
which schedules ch1 into the mailbox via #notifyBarrierReceived.
 * ch2 is received from the network for another input channel by the netty 
thread, but this barrier is inserted into the channel's data queue before 
#notifyBarrierReceived is called. As a result, the task thread may process ch2 
earlier than the netty thread's #notifyBarrierReceived for it runs.
 * The task thread executes the checkpoint for ch2 immediately because ch2 > ch1.
 * After that, the previously scheduled ch1 action is executed from the mailbox 
by the task thread, which causes the IllegalArgumentException inside 
SubtaskCheckpointCoordinatorImpl#checkpointState because it breaks the initial 
assumption that checkpoints are executed in an incremental way in aligned mode.

The key problem is that we currently cannot remove the checkpoint action from 
the mailbox queue before the next checkpoint executes. One possible solution is 
to record the previously aborted checkpoint ids inside 
SubtaskCheckpointCoordinatorImpl#abortedCheckpointIds; then, when the queued 
checkpoint inside the mailbox executes, it exits directly if it finds that the 
checkpoint id was already aborted before.

  was:
The race condition issue happens as follows:
 * ch1 is received from the network by the netty thread, which schedules ch1 
into the mailbox via #notifyBarrierReceived.
 * ch2 is received from the network by the netty thread, but this barrier is 
inserted into the channel's data queue before #notifyBarrierReceived is called. 
As a result, the task thread may process ch2 earlier than the netty thread's 
#notifyBarrierReceived for it runs.
 * The task thread executes the checkpoint for ch2 directly because ch2 > ch1.
 * After that, the previously scheduled ch1 action is executed from the mailbox 
by the task thread, which causes the IllegalArgumentException inside 
SubtaskCheckpointCoordinatorImpl#checkpointState because it breaks the 
assumption that checkpoints are executed in an incremental way. 

One possible solution for this race condition is to insert the received barrier 
into the channel's data queue after calling #notifyBarrierReceived; then we can 
keep the assumption that the checkpoint is always triggered by the netty 
thread, which simplifies the current situation where a checkpoint might be 
triggered either by the task thread or the netty thread. 

Doing so also avoids the task thread accessing the #notifyBarrierReceived 
method while processing the barrier, which simplifies the logic inside 
CheckpointBarrierUnaligner.


> Fix the race condition between CheckpointBarrierUnaligner#processBarrier and 
> #notifyBarrierReceived
> ---
>
> Key: FLINK-17994
> URL: https://issues.apache.org/jira/browse/FLINK-17994
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Reporter: Zhijiang
>Assignee: Zhijiang
>Priority: Blocker
> Fix For: 1.11.0
>
>
> The race condition issue happens as follows:
>  * ch1 is received from the network for one input channel by the netty 
> thread, which schedules ch1 into the mailbox via #notifyBarrierReceived.
>  * ch2 is received from the network for another input channel by the netty 
> thread, but this barrier is inserted into the channel's data queue before 
> #notifyBarrierReceived is called. As a result, the task thread may process 
> ch2 earlier than the netty thread's #notifyBarrierReceived for it runs.
>  * The task thread executes the checkpoint for ch2 immediately because 
> ch2 > ch1.
>  * After that, the previously scheduled ch1 action is executed from the 
> mailbox by the task thread, which causes the IllegalArgumentException inside 
> SubtaskCheckpointCoordinatorImpl#checkpointState because it breaks the 
> initial assumption that checkpoints are executed in an incremental way in 
> aligned mode.
> The key problem is that we currently cannot remove the checkpoint action from 
> the mailbox queue before the next checkpoint executes. One possible solution 
> is to record the previously aborted checkpoint ids inside 
> SubtaskCheckpointCoordinatorImpl#abortedCheckpointIds; then, when the queued 
> checkpoint inside the mailbox executes, it exits directly if it finds that 
> the checkpoint id was already aborted before.
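> A minimal sketch of the proposed guard (field and method names are 
> assumptions based on the description above, not the actual implementation):
> {code:java}
> // Inside the subtask checkpoint coordinator (hypothetical names):
> private final Set<Long> abortedCheckpointIds = new HashSet<>();
> 
> void abortCheckpoint(long checkpointId) {
>     abortedCheckpointIds.add(checkpointId);
> }
> 
> void runQueuedCheckpointAction(long checkpointId) throws Exception {
>     // Exit directly if this checkpoint was already aborted before the queued
>     // mailbox action got a chance to run.
>     if (abortedCheckpointIds.remove(checkpointId)) {
>         return;
>     }
>     checkpointState(checkpointId);
> }
> {code}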



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17963) Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)

2020-05-28 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-17963:

Affects Version/s: (was: statefun-2.1.0)
   (was: statefun-2.0.1)

> Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)
> ---
>
> Key: FLINK-17963
> URL: https://issues.apache.org/jira/browse/FLINK-17963
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: pull-request-available
>
> In FLINK-16926, we explicitly "patched" the {{StreamExecutionEnvironment}} 
> due to FLINK-16560.
> Now that we have upgraded the Flink version in StateFun to 1.10.1 which 
> includes a fix for FLINK-16560, we can now revert the patching of 
> {{StreamExecutionEnvironment}} in the {{StatefulFunctionsJob}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17963) Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)

2020-05-28 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-17963:

Fix Version/s: statefun-2.1.0
   statefun-2.0.1

> Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)
> ---
>
> Key: FLINK-17963
> URL: https://issues.apache.org/jira/browse/FLINK-17963
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: statefun-2.0.1, statefun-2.1.0
>
>
> In FLINK-16926, we explicitly "patched" the {{StreamExecutionEnvironment}} 
> due to FLINK-16560.
> Now that we have upgraded the Flink version in StateFun to 1.10.1 which 
> includes a fix for FLINK-16560, we can now revert the patching of 
> {{StreamExecutionEnvironment}} in the {{StatefulFunctionsJob}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-17963) Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)

2020-05-28 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai closed FLINK-17963.
---
Resolution: Fixed

master: c14d07af0853d698fadaf183468c591f7101067c
release-2.0: 89fdde49b19919375b74f006b528fb8f3ac76e15

> Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)
> ---
>
> Key: FLINK-17963
> URL: https://issues.apache.org/jira/browse/FLINK-17963
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: statefun-2.0.1, statefun-2.1.0
>
>
> In FLINK-16926, we explicitly "patched" the {{StreamExecutionEnvironment}} 
> due to FLINK-16560.
> Now that we have upgraded the Flink version in StateFun to 1.10.1 which 
> includes a fix for FLINK-16560, we can now revert the patching of 
> {{StreamExecutionEnvironment}} in the {{StatefulFunctionsJob}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17997) Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE files content in Stateful Functions distribution artifact

2020-05-28 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-17997:
---

 Summary: Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE 
files content in Stateful Functions distribution artifact
 Key: FLINK-17997
 URL: https://issues.apache.org/jira/browse/FLINK-17997
 Project: Flink
  Issue Type: Task
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: statefun-2.0.1, statefun-2.1.0


We manually merged the contents in FLINK-16901, because at the time the 
upstream Flink Kinesis connector wasn't yet properly handling the content.

Now that this is fixed upstream, we can revert the fix in StateFun.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17997) Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE files content in Stateful Functions distribution artifact

2020-05-28 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-17997:

Component/s: Stateful Functions

> Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE files content in 
> Stateful Functions distribution artifact
> ---
>
> Key: FLINK-17997
> URL: https://issues.apache.org/jira/browse/FLINK-17997
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.0.1, statefun-2.1.0
>
>
> We manually merged the contents in FLINK-16901, because at the time the 
> upstream Flink Kinesis connector wasn't yet properly handling the content.
> Now that this is fixed upstream, we can revert the fix in StateFun.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] GJL opened a new pull request #12376: [FLINK-17463][tests] Avoid concurrent directory creation and deletion

2020-05-28 Thread GitBox


GJL opened a new pull request #12376:
URL: https://github.com/apache/flink/pull/12376


   ## What is the purpose of the change
   
   *See commit*
   
   
   ## Brief change log
   
 - *See commit*
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
*BlobCacheCleanupTest*.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dengziming closed pull request #12174: [FLINK-17736] Add flink cep example

2020-05-28 Thread GitBox


dengziming closed pull request #12174:
URL: https://github.com/apache/flink/pull/12174


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-17463) BlobCacheCleanupTest.testPermanentBlobCleanup:133->verifyJobCleanup:432 » FileAlreadyExists

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17463:
---
Labels: pull-request-available test-stability  (was: test-stability)

> BlobCacheCleanupTest.testPermanentBlobCleanup:133->verifyJobCleanup:432 » 
> FileAlreadyExists
> ---
>
> Key: FLINK-17463
> URL: https://issues.apache.org/jira/browse/FLINK-17463
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination, Tests
>Affects Versions: 1.11.0
>Reporter: Robert Metzger
>Assignee: Gary Yao
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.0
>
>
> CI run: 
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=317&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=4ed44b66-cdd6-5dcf-5f6a-88b07dda665d
> {code}
> [ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 2.73 
> s <<< FAILURE! - in org.apache.flink.runtime.blob.BlobCacheCleanupTest
> [ERROR] 
> testPermanentBlobCleanup(org.apache.flink.runtime.blob.BlobCacheCleanupTest)  
> Time elapsed: 2.028 s  <<< ERROR!
> java.nio.file.FileAlreadyExistsException: 
> /tmp/junit7984674749832216773/junit1629420330972938723/blobStore-296d1a51-8917-4db1-a920-5d4e17e6fa36/job_3bafac5425979b4fe2fa2c7726f8dd5b
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>   at java.nio.file.Files.createDirectory(Files.java:674)
>   at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>   at java.nio.file.Files.createDirectories(Files.java:727)
>   at 
> org.apache.flink.runtime.blob.BlobUtils.getStorageLocation(BlobUtils.java:196)
>   at 
> org.apache.flink.runtime.blob.PermanentBlobCache.getStorageLocation(PermanentBlobCache.java:222)
>   at 
> org.apache.flink.runtime.blob.BlobServerCleanupTest.checkFilesExist(BlobServerCleanupTest.java:213)
>   at 
> org.apache.flink.runtime.blob.BlobCacheCleanupTest.verifyJobCleanup(BlobCacheCleanupTest.java:432)
>   at 
> org.apache.flink.runtime.blob.BlobCacheCleanupTest.testPermanentBlobCleanup(BlobCacheCleanupTest.java:133)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-statefun] tzulitai opened a new pull request #121: [FLINK-17997] [legal] Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE content

2020-05-28 Thread GitBox


tzulitai opened a new pull request #121:
URL: https://github.com/apache/flink-statefun/pull/121


   We manually merged the contents in FLINK-16901, because at the time the 
upstream Flink Kinesis connector wasn't yet properly handling the content.
   
   Now that this is fixed upstream, we can revert the fix in StateFun.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-17997) Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE files content in Stateful Functions distribution artifact

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17997:
---
Labels: pull-request-available  (was: )

> Revert manual merging of AWS KPL's THIRD_PARTY_NOTICE files content in 
> Stateful Functions distribution artifact
> ---
>
> Key: FLINK-17997
> URL: https://issues.apache.org/jira/browse/FLINK-17997
> Project: Flink
>  Issue Type: Task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
>  Labels: pull-request-available
> Fix For: statefun-2.0.1, statefun-2.1.0
>
>
> We manually merged the contents in FLINK-16901, because at the time the 
> upstream Flink Kinesis connector wasn't yet properly handling the content.
> Now that this is fixed upstream, we can revert the fix in StateFun.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12376: [FLINK-17463][tests] Avoid concurrent directory creation and deletion

2020-05-28 Thread GitBox


flinkbot commented on pull request #12376:
URL: https://github.com/apache/flink/pull/12376#issuecomment-635197812


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit d413281f6b19dd0379ed3012c94049763d914be8 (Thu May 28 
08:28:17 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17857) All Kubernetes e2e tests could not run on Mac after migration

2020-05-28 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118440#comment-17118440
 ] 

Yang Wang commented on FLINK-17857:
---

The root cause is that when building the image we use {{localhost:}} to fetch 
the flink.tgz. However, in the K8s e2e tests on Mac OS we are actually building 
the image in a virtual machine, not on the physical machine, so the local host 
cannot be used to download the tgz.

> Why do we build the image in a virtual machine?

Because we set {{eval $(minikube docker-env)}} in {{common_kubernetes.sh}}. 

> All Kubernetes e2e tests could not run on Mac after migration
> -
>
> Key: FLINK-17857
> URL: https://issues.apache.org/jira/browse/FLINK-17857
> Project: Flink
>  Issue Type: Test
>  Components: Deployment / Kubernetes, Tests
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.11.0
>
>
> In FLINK-17656, we migrated all the e2e tests from {{flink-container/docker}} 
> to {{apache/flink-docker}}. After the migration, when building a docker 
> image, we need to start a file server and then download the dist from it in 
> the {{Dockerfile}}.
> It works well in a Linux environment since minikube is started with 
> "vm-driver=none". However, this is not the case on Mac: we usually start a 
> virtual machine for minikube, which makes "localhost" unusable.
> So I suggest supporting a local Flink dist when building the docker image, 
> instead of always downloading it from a URL.
>  
> cc [~chesnay]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] EchoLee5 commented on pull request #12339: [FLINK-17744] StreamContextEnvironment#execute cannot be call JobListener#onJobExecuted

2020-05-28 Thread GitBox


EchoLee5 commented on pull request #12339:
URL: https://github.com/apache/flink/pull/12339#issuecomment-635200665


   
   @kl0u I got it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on a change in pull request #12370: [FLINK-17923][python] Allow Python worker to use off-heap memory

2020-05-28 Thread GitBox


xintongsong commented on a change in pull request #12370:
URL: https://github.com/apache/flink/pull/12370#discussion_r431671864



##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/plan/nodes/CommonPythonBase.scala
##
@@ -150,6 +154,35 @@ trait CommonPythonBase {
     }
     realEnv
   }
+
+  private def usingManagedMemory(config: Configuration): Boolean = {
+    val clazz = loadClass("org.apache.flink.python.PythonOptions")
+    config.getBoolean(clazz.getField("USE_MANAGED_MEMORY").get(null)
+      .asInstanceOf[ConfigOption[java.lang.Boolean]])
+  }
+
+  private def getPythonWorkerMemory(config: Configuration): Long = {
+    val clazz = loadClass("org.apache.flink.python.PythonOptions")
+    val pythonFrameworkMemorySize = MemorySize.parse(
+      config.getString(
+        clazz.getField("PYTHON_FRAMEWORK_MEMORY_SIZE").get(null)
+          .asInstanceOf[ConfigOption[String]]))
+    val pythonBufferMemorySize = MemorySize.parse(
+      config.getString(
+        clazz.getField("PYTHON_DATA_BUFFER_MEMORY_SIZE").get(null)
+          .asInstanceOf[ConfigOption[String]]))
+    pythonFrameworkMemorySize.add(pythonBufferMemorySize).getBytes
+  }
+
+  private def setDefaultConfigurations(config: Configuration): Unit = {
+    if (!usingManagedMemory(config) &&
+        !config.contains(TaskManagerOptions.TASK_OFF_HEAP_MEMORY)) {
+      // Set a default value for the task off-heap memory if it is not set
+      // and the Python worker isn't using managed memory. Note that this is
+      // a best-effort attempt: if a task slot contains multiple Python
+      // workers, users should set the task off-heap memory manually.
+      config.setString(TaskManagerOptions.TASK_OFF_HEAP_MEMORY.key(),
+        getPythonWorkerMemory(config) + "b")
+    }

Review comment:
   I'm not sure about this behavior.
   I think this kind of auto magic is usually implicit and hard to understand.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] AHeise commented on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


AHeise commented on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-635203522


   > Hey @AHeise , I've spent some time walking through the code. I think the 
bug is clear: the consumer's reference counter of `bufferBuilder` for 
`randomEmit` is not counted correctly. But I was a bit confused about whether 
this fix works as expected in the case where the `record` passed to 
`randomEmit` is more than one buffer can hold. But I am not an expert, could 
be wrong :-)
   > 
   > In `BroadcastRecordWriter#randomEmit`, randomly triggered data is 
emitted in the line `emit(record, targetChannelIndex);`
   > 
   > Tracing the code down, the data is eventually serialized through 
`RecordWriter#copyFromSerializerToTargetChannel`,
   > 
   > and there, `requestNewBufferBuilder` is called multiple times if the 
data to serialize is more than one buffer can hold.
   > 
   > And as you can see, all the references to the remaining bufferBuilders 
are lost, since the `addConsumer` in `randomEmit` is done after `emit(record, 
targetChannelIndex)`; at least it looks like so purely from the code 
perspective.
   > 
   > But I am not quite familiar with this part of the code, so I'm not sure 
whether this will cause real problems.
   > Or we can include a test to see whether it works as expected in this 
case?
   
   I think this is actually by design (this fix doesn't modify that behavior 
from @zhijiangW). 
   
   If a record that is longer than one buffer is randomly emitted, ultimately 
only the last buffer is shared between all channels. The other buffers hold 
no relevant data for the non-target channels.
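   
   To make the sharing pattern concrete, here is a toy Scala model of what is 
described above; it is not Flink's actual writer code, just a sketch of which 
buffers each channel ends up referencing (assumes a non-empty record):
   
```scala
// Toy model, not Flink internals: a randomly emitted record that spans
// several buffers is fully visible only to its target channel; the other
// channels ultimately share just the last, partially filled buffer.
case class Buffer(bytes: Seq[Byte])

def randomEmitModel(record: Seq[Byte], bufferSize: Int, channels: Int): Seq[Seq[Buffer]] = {
  val parts = record.grouped(bufferSize).map(Buffer(_)).toSeq
  val target = 0 // assume channel 0 was picked at random
  Seq.tabulate(channels) { ch =>
    if (ch == target) parts // the target channel sees every buffer
    else Seq(parts.last)    // the others only reference the last buffer
  }
}
```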



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12144: [FLINK-17384][flink-dist] support read hbase conf dir from flink.conf.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12144:
URL: https://github.com/apache/flink/pull/12144#issuecomment-628475376


   
   ## CI report:
   
   * 5addfcff9eb59b5153e76627ed74e7255fe8b69b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1292)
 
   * 5ab91c03e9eb771441a920876047849406f43261 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12264: [FLINK-17558][netty] Release partitions asynchronously

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12264:
URL: https://github.com/apache/flink/pull/12264#issuecomment-631349883


   
   ## CI report:
   
   * 19c5f57b94cc56b70002031618c32d9e6f68effb UNKNOWN
   * bb313e40f5a72dbf20cd0a8b48267063fd4f00af UNKNOWN
   * 8cf00b9f6fd8b76256883eedbdb8e79dea3c35dc Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2303)
 
   * c5dcd45e890a076e0a66d4e6e24fdab192e3a724 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2344)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12376: [FLINK-17463][tests] Avoid concurrent directory creation and deletion

2020-05-28 Thread GitBox


flinkbot commented on pull request #12376:
URL: https://github.com/apache/flink/pull/12376#issuecomment-635204220


   
   ## CI report:
   
   * d413281f6b19dd0379ed3012c94049763d914be8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12372: [FLINK-17946][python] fix the bug that the config option 'pipeline.jars' doesn't work.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12372:
URL: https://github.com/apache/flink/pull/12372#issuecomment-635142364


   
   ## CI report:
   
   * 65c92cf7403d9a42250667682184481e3dd76c92 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2335)
 
   * b9759904d50a3d2f99aa9dbbcb42dc3b2d64f5b9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2345)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12375: [FLINK-17017][runtime] Implements bulk allocation for physical slots

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12375:
URL: https://github.com/apache/flink/pull/12375#issuecomment-635184560


   
   ## CI report:
   
   * 4fe765587c0423b2d29d5dde946b94edff79398b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2346)
 
   * c0483c6347992b8c4412da489b5584879834c396 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12367: [FLINK-17960][python][docs] Improve commands in the "Common Questions" document for PyFlink

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12367:
URL: https://github.com/apache/flink/pull/12367#issuecomment-635056734


   
   ## CI report:
   
   * 4b6d0edc58e4ae0b5ab77f65a73f9219dd6eee16 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2326)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118460#comment-17118460
 ] 

Xintong Song edited comment on FLINK-17923 at 5/28/20, 8:47 AM:


[~dian.fu], [~sunjincheng121]

I'm not familiar with the flink-python code base, thus I cannot speak much to 
the PR.

My only concern is regarding auto-magically setting task.off-heap.size for 
users. I wonder whether we are trying to be a bit over-smart. It might save 
some user effort in many cases, but could also make things hard to understand 
in other cases. It is one of the main motivations for FLIP-49 to make sure 
all the memory calculations happen in one place, without this kind of 
implicit logic.

I would suggest not overriding the task.off-heap.size configuration. Instead, 
we can explain how to set this configuration in both the memory configuration 
and Python UDF docs. This is similar to the RocksDBStateBackend: when managed 
memory is disabled, users need to explicitly make sure enough native memory 
is reserved for RocksDB.

WDYT?


was (Author: xintongsong):
[~dian.fu][~sunjincheng121]

I'm not familiar with the flink-python code base, thus I cannot speak much to 
the PR.

My only concern is regarding auto-magically setting task.off-heap.size for 
users. I wonder whether we are trying to be a bit over-smart. It might safe 
some user efforts in many cases, but could also make things hard to understand 
in other cases. It is one of the main motivations for FLIP-49 to make sure all 
the memory calculations happen at one place, without such kind of implicit 
logics.

I would suggest not to override the task.off-heap.size configuration. Instead, 
we can suggest how to set this configuration in both memory configuration and 
python udf docs. This is similar to RocksDBStateBackend, when managed memory is 
disabled users need to explicitly make sure enough native memory is reserved 
for RocksDB.

WDYT?

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
>

[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118460#comment-17118460
 ] 

Xintong Song commented on FLINK-17923:
--

[~dian.fu], [~sunjincheng121]

I'm not familiar with the flink-python code base, thus I cannot speak much to 
the PR.

My only concern is regarding auto-magically setting task.off-heap.size for 
users. I wonder whether we are trying to be a bit over-smart. It might save 
some user effort in many cases, but could also make things hard to understand 
in other cases. It is one of the main motivations for FLIP-49 to make sure 
all the memory calculations happen in one place, without this kind of 
implicit logic.

I would suggest not overriding the task.off-heap.size configuration. Instead, 
we can explain how to set this configuration in both the memory configuration 
and Python UDF docs. This is similar to the RocksDBStateBackend: when managed 
memory is disabled, users need to explicitly make sure enough native memory 
is reserved for RocksDB.

WDYT?
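
For reference, the kind of explicit setting this would document looks roughly 
as follows in flink-conf.yaml. The option keys are the ones behind 
TASK_OFF_HEAP_MEMORY, PYTHON_FRAMEWORK_MEMORY_SIZE and 
PYTHON_DATA_BUFFER_MEMORY_SIZE in the PR under discussion; the sizes shown 
are the documented defaults and should be treated as illustrative assumptions:

{code}
# Reserve native memory for the Python worker explicitly instead of relying
# on any automatic override (64mb framework + 15mb data buffer = 79mb):
taskmanager.memory.task.off-heap.size: 79mb
python.fn-execution.framework.memory.size: 64mb
python.fn-execution.buffer.memory.size: 15mb
{code}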

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateS

[GitHub] [flink] dianfu commented on a change in pull request #12370: [FLINK-17923][python] Allow Python worker to use off-heap memory

2020-05-28 Thread GitBox


dianfu commented on a change in pull request #12370:
URL: https://github.com/apache/flink/pull/12370#discussion_r431679353



##
File path: 
flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/plan/nodes/CommonPythonBase.scala
##
@@ -150,6 +154,35 @@ trait CommonPythonBase {
     }
     realEnv
   }
+
+  private def usingManagedMemory(config: Configuration): Boolean = {
+    val clazz = loadClass("org.apache.flink.python.PythonOptions")
+    config.getBoolean(clazz.getField("USE_MANAGED_MEMORY").get(null)
+      .asInstanceOf[ConfigOption[java.lang.Boolean]])
+  }
+
+  private def getPythonWorkerMemory(config: Configuration): Long = {
+    val clazz = loadClass("org.apache.flink.python.PythonOptions")
+    val pythonFrameworkMemorySize = MemorySize.parse(
+      config.getString(
+        clazz.getField("PYTHON_FRAMEWORK_MEMORY_SIZE").get(null)
+          .asInstanceOf[ConfigOption[String]]))
+    val pythonBufferMemorySize = MemorySize.parse(
+      config.getString(
+        clazz.getField("PYTHON_DATA_BUFFER_MEMORY_SIZE").get(null)
+          .asInstanceOf[ConfigOption[String]]))
+    pythonFrameworkMemorySize.add(pythonBufferMemorySize).getBytes
+  }
+
+  private def setDefaultConfigurations(config: Configuration): Unit = {
+    if (!usingManagedMemory(config) &&
+        !config.contains(TaskManagerOptions.TASK_OFF_HEAP_MEMORY)) {
+      // Set a default value for the task off-heap memory if it is not set
+      // and the Python worker isn't using managed memory. Note that this is
+      // a best-effort attempt: if a task slot contains multiple Python
+      // workers, users should set the task off-heap memory manually.
+      config.setString(TaskManagerOptions.TASK_OFF_HEAP_MEMORY.key(),
+        getPythonWorkerMemory(config) + "b")
+    }

Review comment:
   It's a best-effort attempt and should work in most scenarios. It's only 
activated if managed memory is disabled for the Python UDF and the user has 
not configured the task off-heap memory. In that case, it does no harm to set 
a default task off-heap memory for the Python UDF.
   
   Besides, for Python users ease of use is the most important thing. 
Configuring the task off-heap memory is not easy, so we want to let users 
think about it as little as possible.
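   
   As a worked example of the derived default (a sketch; the 64mb and 15mb 
values are the documented defaults of the two Python options, assumed here 
rather than read from the configuration as the PR does):
   
```scala
import org.apache.flink.configuration.MemorySize

// 64mb framework memory + 15mb data buffer memory => 79mb task off-heap.
val derived = MemorySize.parse("64mb").add(MemorySize.parse("15mb"))
// derived.getBytes == 82837504, so "82837504b" would be written into
// taskmanager.memory.task.off-heap.size when the user has set nothing.
```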





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118460#comment-17118460
 ] 

Xintong Song edited comment on FLINK-17923 at 5/28/20, 8:48 AM:


[~dian.fu], [~sunjincheng121]

I'm not familiar with the flink-python code base, thus I cannot speak much to 
the PR.

My only concern is regarding auto-magically setting task.off-heap.size for 
users. I wonder whether we are trying to be a bit over-smart. It might save 
some user effort in many cases, but could also make things hard to understand 
in other cases. It is one of the main motivations for FLIP-49 to make sure 
all the memory calculations happen in one place, without this kind of 
implicit logic.

I would suggest not overriding the task.off-heap.size configuration. Instead, 
we can explain how to set this configuration in both the memory configuration 
and Python UDF docs. This is similar to the RocksDBStateBackend: when managed 
memory is disabled, users need to explicitly make sure enough native memory 
is reserved for RocksDB.

WDYT?


was (Author: xintongsong):
[~dian.fu], [~sunjincheng121]

I'm not familiar with the flink-python code base, thus I cannot speak much to 
the PR.

My only concern is regarding auto-magically setting task.off-heap.size for 
users. I wonder whether we are trying to be a bit over-smart. It might safe 
some user efforts in many cases, but could also make things hard to understand 
in other cases. It is one of the main motivations for FLIP-49 to make sure all 
the memory calculations happen at one place, without such kind of implicit 
logics.

I would suggest not to override the task.off-heap.size configuration. Instead, 
we can suggest how to set this configuration in both memory configuration and 
python udf docs. This is similar to RocksDBStateBackend, when managed memory is 
disabled users need to explicitly make sure enough native memory is reserved 
for RocksDB.

WDYT?

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
>  

[GitHub] [flink] flinkbot edited a comment on pull request #12144: [FLINK-17384][flink-dist] support read hbase conf dir from flink.conf.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12144:
URL: https://github.com/apache/flink/pull/12144#issuecomment-628475376


   
   ## CI report:
   
   * 5addfcff9eb59b5153e76627ed74e7255fe8b69b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=1292)
 
   * 5ab91c03e9eb771441a920876047849406f43261 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2347)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12369: [FLINK-17678][Connectors/HBase]Support fink-sql-connector-hbase

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12369:
URL: https://github.com/apache/flink/pull/12369#issuecomment-635090403


   
   ## CI report:
   
   * 518368f03c225bf1cdc2065382539f7f49f6c4e7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2333)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-17678) Support fink-sql-connector-hbase

2020-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17678:
---
Labels: pull-request-available  (was: )

> Support fink-sql-connector-hbase
> 
>
> Key: FLINK-17678
> URL: https://issues.apache.org/jira/browse/FLINK-17678
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / HBase
>Reporter: ShenDa
>Assignee: ShenDa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Currently, Flink doesn't contain an HBase uber jar, so users have to add the 
> HBase dependency manually.
> Could I create a new module called flink-sql-connector-hbase, like the 
> Elasticsearch and Kafka SQL connectors?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12374: [FLINK-17992][checkpointing] Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12374:
URL: https://github.com/apache/flink/pull/12374#issuecomment-635171486


   
   ## CI report:
   
   * eeaa680286b6dd389acebbf1d8695ba94ff5ec59 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2341)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12353: [FLINK-17322][network] Fixes BroadcastRecordWriter overwriting memory segments on first finished BufferConsumer.

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12353:
URL: https://github.com/apache/flink/pull/12353#issuecomment-634518102


   
   ## CI report:
   
   * 6141cd7164c5febb45e97f68bc9873dcd6789e21 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2255)
 
   * 1e0e6b50bfecaf86fac285ff25dfb32aa52dde2c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12376: [FLINK-17463][tests] Avoid concurrent directory creation and deletion

2020-05-28 Thread GitBox


flinkbot edited a comment on pull request #12376:
URL: https://github.com/apache/flink/pull/12376#issuecomment-635204220


   
   ## CI report:
   
   * d413281f6b19dd0379ed3012c94049763d914be8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2348)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17916) Provide API to separate KafkaShuffle's Producer and Consumer to different jobs

2020-05-28 Thread Yuan Mei (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118473#comment-17118473
 ] 

Yuan Mei commented on FLINK-17916:
--

This one is moved to 1.12.0.

The PR is ready; the review is postponed until the 1.11 release blockers are 
resolved.

> Provide API to separate KafkaShuffle's Producer and Consumer to different jobs
> --
>
> Key: FLINK-17916
> URL: https://issues.apache.org/jira/browse/FLINK-17916
> Project: Flink
>  Issue Type: Improvement
>  Components: API / DataStream, Connectors / Kafka
>Affects Versions: 1.11.0
>Reporter: Yuan Mei
>Assignee: Yuan Mei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Follow up of FLINK-15670
> *Separate sink (producer) and source (consumer) to different jobs*
>  * In the same job, a sink and a source are recovered independently according 
> to regional failover. However, they share the same checkpoint coordinator and 
> correspondingly, share the same global checkpoint snapshot.
> * That is, if the consumer fails, the producer cannot commit written data 
> because of the two-phase commit set-up (the producer needs a 
> checkpoint-complete signal to complete the second stage)
> * The same applies to the producer
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-13782) Implement type inference for logic functions

2020-05-28 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz reassigned FLINK-13782:


Assignee: Dawid Wysakowicz

> Implement type inference for logic functions
> 
>
> Key: FLINK-13782
> URL: https://issues.apache.org/jira/browse/FLINK-13782
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Jingsong Lee
>Assignee: Dawid Wysakowicz
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118478#comment-17118478
 ] 

Dian Fu commented on FLINK-17923:
-

[~xintongsong] Appreciated your suggestions. It makes sense to me.
I want to adjust it a bit as follows:
- Set Python UDF to use managed memory by default.
- If Python UDF and RocksDB are used together, throw an exception with a 
meaningful suggestion.
- If Python UDF is configured to use off-heap memory and the task off-heap 
memory cannot meet the requirement, throw an exception with a meaningful 
suggestion.

In this case, when we support letting Python UDF and RocksDB both use managed 
memory in the future, we can just remove the checks and there will be no 
potential backward compatibility issues.

What do you think?
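
A hypothetical sketch of the proposed checks (the method and parameter names 
here are invented for illustration, not actual Flink code):

{code}
def validatePythonMemorySetup(
    usesRocksDB: Boolean,
    udfUsesManagedMemory: Boolean,
    taskOffHeapBytes: Long,
    requiredOffHeapBytes: Long): Unit = {
  if (udfUsesManagedMemory && usesRocksDB) {
    // Fail fast instead of hitting MemoryAllocationException at runtime.
    throw new IllegalStateException(
      "Python UDFs and the RocksDB state backend cannot both use managed " +
        "memory in the same slot yet; disable managed memory for Python " +
        "UDFs and reserve task off-heap memory instead.")
  }
  if (!udfUsesManagedMemory && taskOffHeapBytes < requiredOffHeapBytes) {
    throw new IllegalStateException(
      s"Python workers need $requiredOffHeapBytes bytes of task off-heap " +
        s"memory but only $taskOffHeapBytes bytes are configured; increase " +
        "taskmanager.memory.task.off-heap.size.")
  }
}
{code}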

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
>   at 
> org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
>   ... 9 more
> Caused by: java.io.IOException: Failed to acquire shared cache resource for 
> RocksDB
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.allocateSharedCachesIfConfigured(RocksDBOperationUtils.java:212)
>   at 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:516)
>   at 
> org.apache.flink.streaming.api.oper

[jira] [Comment Edited] (FLINK-17923) It will throw MemoryAllocationException if rocksdb statebackend and Python UDF are used in the same slot

2020-05-28 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118478#comment-17118478
 ] 

Dian Fu edited comment on FLINK-17923 at 5/28/20, 9:16 AM:
---

[~xintongsong] Appreciated your suggestions. It makes sense to me.
I want to adjust the PR a bit as follows:
- Set Python UDF to use managed memory by default.
- If Python UDF and RocksDB are used together, throw an exception with a 
meaningful suggestion.
- If Python UDF is configured to use off-heap memory and the task off-heap 
memory cannot meet the requirement, throw an exception with a meaningful 
suggestion.

In this case, when we support letting Python UDF and RocksDB both use managed 
memory in the future, we can just remove the checks and there will be no 
potential backward compatibility issues.

What do you think?


was (Author: dian.fu):
[~xintongsong] Appreciated your suggestions. It makes sense to me.
I want to adjust it a bit as following:
- Set Python UDF to use managed memory by default.
- If Python UDF and RocksDB is used together, throw exceptions with meaningful 
suggestions.
- If Python UDF is configured to use off-heap memory and the task off-heap 
memory could not meet the requirement, throw exceptions with meaningful 
suggestions.

In this case, when we support to let Python UDF and RocksDB both use managed 
memory in the future, we could just remove the checks and there will be no 
potential backward compatibility issues.

What do you think?

> It will throw MemoryAllocationException if rocksdb statebackend and Python 
> UDF are used in the same slot  
> --
>
> Key: FLINK-17923
> URL: https://issues.apache.org/jira/browse/FLINK-17923
> Project: Flink
>  Issue Type: Bug
>  Components: API / Python, Runtime / State Backends
>Affects Versions: 1.10.0, 1.11.0
>Reporter: Dian Fu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> For the following job:
> {code}
> import logging
> import os
> import shutil
> import sys
> import tempfile
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment, DataTypes
> from pyflink.table.udf import udf
> def word_count():
> content = "line Licensed to the Apache Software Foundation ASF under one 
> " \
>   "line or more contributor license agreements See the NOTICE 
> file " \
>   "line distributed with this work for additional information " \
>   "line regarding copyright ownership The ASF licenses this file 
> " \
>   "to you under the Apache License Version the " \
>   "License you may not use this file except in compliance " \
>   "with the License"
> t_config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, t_config)
> # register Results table in table environment
> tmp_dir = tempfile.gettempdir()
> result_path = tmp_dir + '/result'
> if os.path.exists(result_path):
> try:
> if os.path.isfile(result_path):
> os.remove(result_path)
> else:
> shutil.rmtree(result_path)
> except OSError as e:
> logging.error("Error removing directory: %s - %s.", e.filename, 
> e.strerror)
> logging.info("Results directory: %s", result_path)
> sink_ddl = """
> create table Results(
> word VARCHAR,
> `count` BIGINT
> ) with (
> 'connector' = 'blackhole'
> )
> """
> t_env.sql_update(sink_ddl)
> @udf(input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
> def inc(count):
> return count + 1
> t_env.register_function("inc", inc)
> elements = [(word, 1) for word in content.split(" ")]
> t_env.from_elements(elements, ["word", "count"]) \
>  .group_by("word") \
>  .select("word, count(1) as count") \
>  .select("word, inc(count) as count") \
>  .insert_into("Results")
> t_env.execute("word_count")
> if __name__ == '__main__':
> logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
> format="%(message)s")
> word_count()
> {code}
> It will throw the following exception if rocksdb state backend is used:
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed 
> state backend for KeyedProcessOperator_c27dcf7b54ef6bfd6cff02ca8870b681_(1/1) 
> from any of the 1 provided restore options.
>   at 
> org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
>   at 
> org.a

[jira] [Commented] (FLINK-17788) scala shell in yarn mode is broken

2020-05-28 Thread Kostas Kloudas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17118479#comment-17118479
 ] 

Kostas Kloudas commented on FLINK-17788:


Thanks a lot [~zjffdu] for the clarification. Let me know if you find anything 
and I will also have a look.

> scala shell in yarn mode is broken
> --
>
> Key: FLINK-17788
> URL: https://issues.apache.org/jira/browse/FLINK-17788
> Project: Flink
>  Issue Type: Bug
>  Components: Scala Shell
>Affects Versions: 1.11.0
>Reporter: Jeff Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When I start the scala shell in yarn mode, one yarn app is launched, and 
> after I write some flink code and trigger a flink job, another yarn app is 
> launched but fails to start due to some conflicts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

