[jira] [Updated] (FLINK-17506) SavepointEnvironment does not honour 'io.tmp.dirs' property

2020-05-04 Thread David Artiga (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Artiga updated FLINK-17506:
-
Description: {{SavepointEnvironment}} [creates an 
IOManagerAsync|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointEnvironment.java#L106]
 using its [default 
constructor|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManagerAsync.java#L62],
 meaning it [uses the system property 
"java.io.tmpdir"|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L227]
 instead of values from the "io.tmp.dirs" config property.  (was: 
{{SavepointEnvironment}} [creates an 
IOManagerAsync|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointEnvironment.java#L106]
 using it's [default 
constructor|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManagerAsync.java#L62],
 meaning it [uses env var 
"java.io.tmpdir"|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L227]
 instead of values from "io.tmp.dirs" config property,)

> SavepointEnvironment does not honour 'io.tmp.dirs' property
> ---
>
> Key: FLINK-17506
> URL: https://issues.apache.org/jira/browse/FLINK-17506
> Project: Flink
>  Issue Type: Bug
>  Components: API / State Processor
>Reporter: David Artiga
>Assignee: Seth Wiesman
>Priority: Major
>
> {{SavepointEnvironment}} [creates an 
> IOManagerAsync|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointEnvironment.java#L106]
>  using its [default 
> constructor|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManagerAsync.java#L62],
>  meaning it [uses the system property 
> "java.io.tmpdir"|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L227]
>  instead of values from the "io.tmp.dirs" config property.
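The fallback semantics described above can be sketched in a self-contained way. The class and method names below are illustrative, not Flink API; in Flink itself the fix would presumably pass the directories parsed from the configuration (e.g. via `ConfigurationUtils.parseTempDirectories(...)`) into the `IOManagerAsync(String[] tempDirs)` constructor instead of relying on the no-arg default.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper, not Flink API: resolves temp directories the way
// 'io.tmp.dirs' is documented to behave, falling back to the JVM system
// property 'java.io.tmpdir' only when the configured value is unset.
public class TmpDirs {
    public static List<String> resolve(String ioTmpDirs) {
        if (ioTmpDirs == null || ioTmpDirs.trim().isEmpty()) {
            // The behaviour the bug report describes: default to the JVM property.
            return Arrays.asList(System.getProperty("java.io.tmpdir"));
        }
        // 'io.tmp.dirs' accepts comma- or path-separator-delimited lists.
        List<String> dirs = new ArrayList<>();
        for (String dir : ioTmpDirs.split(",|" + File.pathSeparator)) {
            if (!dir.trim().isEmpty()) {
                dirs.add(dir.trim());
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/data/tmp1,/data/tmp2")); // [/data/tmp1, /data/tmp2]
        System.out.println(resolve(null)); // falls back to java.io.tmpdir
    }
}
```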



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * 945427a9cf24c43097883cc9fccbac2983ab8bac Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=587)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * de26ba11042772d77be2416fd6c829d80c9c66b7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=568)
 
   * 945427a9cf24c43097883cc9fccbac2983ab8bac Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=587)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * de26ba11042772d77be2416fd6c829d80c9c66b7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=568)
 
   * 945427a9cf24c43097883cc9fccbac2983ab8bac UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] RocMarshal commented on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


RocMarshal commented on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623845807


   Hi, @wuchong . I have completed the translation of this page and made 
corresponding improvements according to the suggestions of community members. 
If you have free time, would you please review it for me?
   
   Thank you very much.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-17291) Translate training lesson on event-driven applications to chinese

2020-05-04 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099479#comment-17099479
 ] 

RocMarshal edited comment on FLINK-17291 at 5/5/20, 3:29 AM:
-

Hi, [~alpinegizmo].
 I have completed the translation of this page, and made corresponding 
improvements according to the suggestions of the community reviewers. 


was (Author: rocmarshal):
Hi,[~alpinegizmo].
 I have completed the translation of this page, and made corresponding 
improvements according to the suggestions of the community reviewers. If you 
have free time, please review it for me. 
 Thank you.

> Translate training lesson on event-driven applications to chinese
> -
>
> Key: FLINK-17291
> URL: https://issues.apache.org/jira/browse/FLINK-17291
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation / Training
>Reporter: David Anderson
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available
>
> Translate docs/training/event_driven.zh.md to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-17291) Translate training lesson on event-driven applications to chinese

2020-05-04 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099479#comment-17099479
 ] 

RocMarshal edited comment on FLINK-17291 at 5/5/20, 1:46 AM:
-

Hi, [~alpinegizmo].
 I have completed the translation of this page, and made corresponding 
improvements according to the suggestions of the community reviewers. If you 
have free time, please review it for me. 
 Thank you.


was (Author: rocmarshal):
Hi,[~alpinegizmo].
 I have completed the translation of this page, and made corresponding 
improvements according to the suggestions of the community reviewer. If you 
have free time, please review it for me. 
 Thank you.

> Translate training lesson on event-driven applications to chinese
> -
>
> Key: FLINK-17291
> URL: https://issues.apache.org/jira/browse/FLINK-17291
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation / Training
>Reporter: David Anderson
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available
>
> Translate docs/training/event_driven.zh.md to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17291) Translate training lesson on event-driven applications to chinese

2020-05-04 Thread RocMarshal (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099479#comment-17099479
 ] 

RocMarshal commented on FLINK-17291:


Hi, [~alpinegizmo].
 I have completed the translation of this page, and made corresponding 
improvements according to the suggestions of the community reviewer. If you 
have free time, please review it for me. 
 Thank you.

> Translate training lesson on event-driven applications to chinese
> -
>
> Key: FLINK-17291
> URL: https://issues.apache.org/jira/browse/FLINK-17291
> Project: Flink
>  Issue Type: Improvement
>  Components: chinese-translation, Documentation / Training
>Reporter: David Anderson
>Assignee: RocMarshal
>Priority: Major
>  Labels: pull-request-available
>
> Translate docs/training/event_driven.zh.md to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11952:
URL: https://github.com/apache/flink/pull/11952#issuecomment-621589853


   
   ## CI report:
   
   * 5a2f6e8bd534d439b30e10b41a17821a3ea93590 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=586)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] RocMarshal commented on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


RocMarshal commented on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623785660


   Hi, @XBaith, I have updated the document according to your suggestions. 
They were very helpful for the translation of the document. Thank you 
very much for your help.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] RocMarshal commented on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


RocMarshal commented on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623783106


   @flinkbot run travis



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] caozhen1937 commented on pull request #11978: [FLINK-16086][chinese-translation]Translate "Temporal Tables" page of "Streaming Concepts" into Chinese

2020-05-04 Thread GitBox


caozhen1937 commented on pull request #11978:
URL: https://github.com/apache/flink/pull/11978#issuecomment-623782234


   Hi @wuchong, if you have free time, please review it. Thank you.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 1597769183be673722456fb55e296d38d0fad837 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=584)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-17506) SavepointEnvironment does not honour 'io.tmp.dirs' property

2020-05-04 Thread Seth Wiesman (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Wiesman reassigned FLINK-17506:


Assignee: Seth Wiesman

> SavepointEnvironment does not honour 'io.tmp.dirs' property
> ---
>
> Key: FLINK-17506
> URL: https://issues.apache.org/jira/browse/FLINK-17506
> Project: Flink
>  Issue Type: Bug
>  Components: API / State Processor
>Reporter: David Artiga
>Assignee: Seth Wiesman
>Priority: Major
>
> {{SavepointEnvironment}} [creates an 
> IOManagerAsync|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointEnvironment.java#L106]
>  using its [default 
> constructor|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManagerAsync.java#L62],
>  meaning it [uses the system property 
> "java.io.tmpdir"|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L227]
>  instead of values from the "io.tmp.dirs" config property.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11952:
URL: https://github.com/apache/flink/pull/11952#issuecomment-621589853


   
   ## CI report:
   
   * 2113227fdc0076eb9e2f72ae7883c2d5d245cbab Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=523)
 
   * 5a2f6e8bd534d439b30e10b41a17821a3ea93590 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=586)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11952:
URL: https://github.com/apache/flink/pull/11952#issuecomment-621589853


   
   ## CI report:
   
   * 2113227fdc0076eb9e2f72ae7883c2d5d245cbab Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=523)
 
   * 5a2f6e8bd534d439b30e10b41a17821a3ea93590 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-17465) Update Chinese user documentation for job manager memory model

2020-05-04 Thread Andrey Zagrebin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin reassigned FLINK-17465:
---

Assignee: Xintong Song

> Update Chinese user documentation for job manager memory model
> --
>
> Key: FLINK-17465
> URL: https://issues.apache.org/jira/browse/FLINK-17465
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Andrey Zagrebin
>Assignee: Xintong Song
>Priority: Major
> Fix For: 1.11.0
>
>
> This is a follow-up for FLINK-16946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17509) Support OracleDialect

2020-05-04 Thread Flavio Pompermaier (Jira)
Flavio Pompermaier created FLINK-17509:
--

 Summary: Support OracleDialect
 Key: FLINK-17509
 URL: https://issues.apache.org/jira/browse/FLINK-17509
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Flavio Pompermaier


Support OracleDialect in JDBCDialects



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17508) Develop OracleCatalog

2020-05-04 Thread Flavio Pompermaier (Jira)
Flavio Pompermaier created FLINK-17508:
--

 Summary: Develop OracleCatalog
 Key: FLINK-17508
 URL: https://issues.apache.org/jira/browse/FLINK-17508
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Flavio Pompermaier


Similarly to https://issues.apache.org/jira/browse/FLINK-16471



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17507) Training figure program_dataflow.svg should use preferred parts of the API

2020-05-04 Thread David Anderson (Jira)
David Anderson created FLINK-17507:
--

 Summary: Training figure program_dataflow.svg should use preferred 
parts of the API
 Key: FLINK-17507
 URL: https://issues.apache.org/jira/browse/FLINK-17507
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training
Reporter: David Anderson


It would be better if fig/program_dataflow.svg used a 
{{ProcessWindowFunction}}, rather than a {{WindowFunction}}.

It also uses a {{BucketingSink}}, which sets a bad example. 

Note that this is not a trivial edit, since it doesn't work to simply replace 
{{new BucketingSink}} with {{new StreamingFileSink}}. Something like this would 
be better:

 
{{final StreamingFileSink sink = StreamingFileSink}}
{{        .forBulkFormat(...)}}
{{        .build();}}

{{stats.addSink(sink);}}

Note: This figure is only used once, in the Training Overview page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-17507) Training figure program_dataflow.svg should use preferred parts of the API

2020-05-04 Thread David Anderson (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Anderson updated FLINK-17507:
---
Description: 
It would be better if fig/program_dataflow.svg used a 
{{ProcessWindowFunction}}, rather than a {{WindowFunction}}.

It also uses a {{BucketingSink}}, which sets a bad example. 

Note that this is not a trivial edit, since it doesn't work to simply replace 
{{new BucketingSink}} with {{new StreamingFileSink}}. Something like this would 
be better:

 
 {{final StreamingFileSink sink = StreamingFileSink}}
 {{        .forBulkFormat(...)}}
 {{        .build();}}

 {{stats.addSink(sink);}}

 Note: This figure is only used once, in the Training Overview page.

  was:
It would be better if fig/program_dataflow.svg used a 
{{ProcessWindowFunction}}, rather than a {{WindowFunction}}.

It also uses a {{BucketingSink}}, which sets a bad example. 

Note that this is not a trivial edit, since it doesn't work to simply replace 
{{new BucketingSink}} with {{new StreamingFileSink}}. Something like this would 
be better:

 
{{final StreamingFileSink sink = StreamingFileSink}}
{{        .forBulkFormat(...)}}
{{        .build();}}
{{}}
{{stats.addSink(sink);}}
{{}}
Note: This figure is only used once, in the Training Overview page.
{{}}


> Training figure program_dataflow.svg should use preferred parts of the API
> --
>
> Key: FLINK-17507
> URL: https://issues.apache.org/jira/browse/FLINK-17507
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation / Training
>Reporter: David Anderson
>Priority: Major
>
> It would be better if fig/program_dataflow.svg used a 
> {{ProcessWindowFunction}}, rather than a {{WindowFunction}}.
> It also uses a {{BucketingSink}}, which sets a bad example. 
> Note that this is not a trivial edit, since it doesn't work to simply replace 
> {{new BucketingSink}} with {{new StreamingFileSink}}. Something like this 
> would be better:
>  
>  {{final StreamingFileSink sink = StreamingFileSink}}
>  {{        .forBulkFormat(...)}}
>  {{        .build();}}
>  {{stats.addSink(sink);}}
>  Note: This figure is only used once, in the Training Overview page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 4cc9434270958dcbdf322483531c97c108c4a2e8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=571)
 
   * 1597769183be673722456fb55e296d38d0fad837 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=584)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rkhachatryan commented on pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

2020-05-04 Thread GitBox


rkhachatryan commented on pull request #11952:
URL: https://github.com/apache/flink/pull/11952#issuecomment-623702887


   @edu05 can you please squash the commits and rebase to the latest master 
(Azure build still fails)?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 4cc9434270958dcbdf322483531c97c108c4a2e8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=571)
 
   * 1597769183be673722456fb55e296d38d0fad837 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] fpompermaier commented on pull request #11900: [FLINK-17284][jdbc][postgres] Support serial fields

2020-05-04 Thread GitBox


fpompermaier commented on pull request #11900:
URL: https://github.com/apache/flink/pull/11900#issuecomment-623697631


   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] fpompermaier removed a comment on pull request #11900: [FLINK-17284][jdbc][postgres] Support serial fields

2020-05-04 Thread GitBox


fpompermaier removed a comment on pull request #11900:
URL: https://github.com/apache/flink/pull/11900#issuecomment-623697329


   @flinkbot run travis
   
   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] fpompermaier removed a comment on pull request #11900: [FLINK-17284][jdbc][postgres] Support serial fields

2020-05-04 Thread GitBox


fpompermaier removed a comment on pull request #11900:
URL: https://github.com/apache/flink/pull/11900#issuecomment-623310522


   @flinkbot run travis
   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] fpompermaier commented on pull request #11900: [FLINK-17284][jdbc][postgres] Support serial fields

2020-05-04 Thread GitBox


fpompermaier commented on pull request #11900:
URL: https://github.com/apache/flink/pull/11900#issuecomment-623697329


   @flinkbot run travis
   
   @flinkbot run azure



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] zentol commented on a change in pull request #11983: [FLINK-11086] Replace flink-shaded-hadoop dependencies; add Hadoop 3 test profile

2020-05-04 Thread GitBox


zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r419698801



##
File path: flink-dist/pom.xml
##
@@ -137,8 +137,8 @@ under the License.
${project.version}


-   org.apache.flink
-   
flink-shaded-hadoop-2
+   org.apache.hadoop
+   *

Review comment:
   hmm, should've read MNG-3832 more carefully; it _does_ work with all 
Maven 3 versions, just prints a warning on everything below 3.2.1.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] rmetzger commented on a change in pull request #11983: [FLINK-11086] Replace flink-shaded-hadoop dependencies; add Hadoop 3 test profile

2020-05-04 Thread GitBox


rmetzger commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r419675846



##
File path: flink-dist/pom.xml
##
@@ -137,8 +137,8 @@ under the License.
${project.version}


-   org.apache.flink
-   
flink-shaded-hadoop-2
+   org.apache.hadoop
+   *

Review comment:
   Oh, I didn't know that this is a newer feature. However, we already have 
a case in master with a star exclude: 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/pom.xml#L68
   
   Can we raise the minimum maven version to 
[3.2.1](https://maven.apache.org/docs/3.2.1/release-notes.html) then?
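For readers following the thread, a minimal sketch of the wildcard exclusion under discussion (the dependency coordinates here are illustrative; the actual change is in `flink-dist/pom.xml`):

```xml
<!-- Wildcard excludes work on all Maven 3 versions but print a
     warning below 3.2.1 (MNG-3832). Coordinates are illustrative. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-dist_2.11</artifactId>
    <version>${project.version}</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```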





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11988: [FLINK-17244][docs] Update the Getting Started page

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11988:
URL: https://github.com/apache/flink/pull/11988#issuecomment-623640141


   
   ## CI report:
   
   * cc4fa4fe94ef8a6ff97a2a9b7bb752302224c5c4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=573)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-17464) Standalone HA Cluster crash with non-recoverable cluster state - need to wipe cluster to recover service

2020-05-04 Thread John Lonergan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099232#comment-17099232
 ] 

John Lonergan commented on FLINK-17464:
---

Yep, I think failing the job is the lesser evil.

A wrecked cluster due to one sick job is pretty catastrophic.

So the recommendation is that a shared cluster should just fail any jobs
that get sick during recovery. It seems that the use of the execute-on-master
hook is the architectural weakness?







> Standalone HA Cluster crash with non-recoverable cluster state - need to wipe 
> cluster to recover service
> ---
>
> Key: FLINK-17464
> URL: https://issues.apache.org/jira/browse/FLINK-17464
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: John Lonergan
>Priority: Critical
>
> When recovering job graphs after a failover of the JobManager, or after a 
> restart of the cluster, the HA Cluster can get into a state where it cannot 
> be restarted, and the only resolution we have identified is to destroy the 
> ZooKeeper job graph store.
> This happens when any job graph that is being recovered throws an exception 
> during recovery on the master. 
> Whilst we encountered this issues on a sink that extends "InitialiseOnMaster" 
> we believe the vulnerability is generic in nature and the unrecolverable 
> problems encountered will occur if the application code throws any exception 
> for any reason during recovery on the main line. 
> These application exceptions propagate up to the JobManager's ClusterEntrypoint 
> class, at which point the JM leader does a System.exit(). If there are remaining 
> JobManagers then they will also follow leader election and encounter the 
> same sequence of events. Ultimately all JMs exit and then all TMs fail 
> as well. 
> The entire cluster is destroyed.
> Because these events happen during job graph recovery, merely attempting a 
> restart of the cluster will fail, leaving destroying the job graph state as 
> the only option. 
> If one is running a shared cluster with many jobs then this is effectively a 
> DoS and results in prolonged downtime, as code or data changes are necessary 
> to work around the issue.
> --
> Of course if the same exception were to be thrown during job submission using 
> the CLI, then we would not see the cluster crashing nor the cluster being 
> corrupted; the job would merely fail.
> Our feeling is that the job graph recovery process ought to behave in a 
> similar fashion to the job submission processes.
> If a job submission fails then the job is recorded as failed and there is no 
> further impact on the cluster. However, if job recovery fails then the entire 
> cluster is taken down and may, as we have seen, become inoperable.
> We feel that a failure to restore a single job graph ought merely to result 
> in the job being recorded as failed. It should not result in a cluster-wide 
> impact.
> We do not understand the logic of the design in this space. However, if the 
> existing logic was for the benefit of single-job clusters then this is a poor 
> result for multi-job clusters. In that case we ought to be able to configure 
> a cluster for "multi-job mode" so that job graph recovery is "sandboxed" and 
> doesn't take out the entire cluster.
> ---
> It is easy to demonstrate the problem using the built-in Flink streaming Word 
> Count example.
> In order for this to work you configure the job to write a single output file 
> and also write this to HDFS, not to a local disk. 
> You will note that the class FileOutputFormat extends InitializeOnMaster and 
> the initializeGlobal() function executes only when the file is on HDFS, not 
> on local disk.
> When this function runs it will generate an exception if the output already 
> exists.
> Therefore, to demonstrate the issue, do the following:
> - configure the job to write a single file to HDFS
> - configure the job to read a large file so that the job takes some time 
> to execute and we have time to complete the next few steps before the job 
> finishes.
> - run the job on a HA cluster with two JM nodes
> - wait for the job to start and the output file to be created
> - kill the leader JM before the job has finished 
> - observe JM failover occurring ... 
> - recovery during failover will NOT succeed because the recovery of the Word 
> Count job will fail due to the presence of the output file
> - observe all JM's and TM's ultimately terminating
> Once the cluster has failed outright, try to restart it.
> During restart the cluster will detect the presence of job graphs in Zk and 
> attempt to restore them. This, however, is doomed due to the same 
> vulnerability that causes the global outage above.
> ---
> For 
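The sandboxing suggested above can be sketched as follows. This is a hypothetical illustration, not Flink's actual recovery code: each job graph is recovered in isolation, and a failure marks only that job as FAILED instead of letting the exception propagate and exit the JobManager process.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch (illustrative names, not Flink's API): instead of one
// failed job-graph recovery propagating up to the ClusterEntrypoint and
// killing the JobManager, each recovery is sandboxed and a failure only
// marks that job as FAILED.
public class SandboxedRecovery {

    public static Map<String, String> recoverAll(
            List<String> jobIds, Consumer<String> recoverOne) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String jobId : jobIds) {
            try {
                recoverOne.accept(jobId); // may throw, e.g. "output file already exists"
                results.put(jobId, "RUNNING");
            } catch (Exception e) {
                // Record the failure instead of taking down the whole cluster.
                results.put(jobId, "FAILED");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> r = recoverAll(
                List.of("wordCount", "otherJob"),
                jobId -> {
                    if (jobId.equals("wordCount")) {
                        throw new IllegalStateException("Output already exists");
                    }
                });
        System.out.println(r); // {wordCount=FAILED, otherJob=RUNNING}
    }
}
```

In this sketch the Word Count job from the reproduction steps would simply end up FAILED while the other jobs keep running.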

[GitHub] [flink] flinkbot commented on pull request #11988: [FLINK-17244][docs] Update the Getting Started page

2020-05-04 Thread GitBox


flinkbot commented on pull request #11988:
URL: https://github.com/apache/flink/pull/11988#issuecomment-623622490


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit cc4fa4fe94ef8a6ff97a2a9b7bb752302224c5c4 (Mon May 04 
18:14:38 UTC 2020)
   
   **Warnings:**
* Documentation files were touched, but no `.zh.md` files: Update Chinese 
documentation or file Jira ticket.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-17244) Update Getting Started / Overview: training and python

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17244:
---
Labels: pull-request-available  (was: )

> Update Getting Started / Overview: training and python
> --
>
> Key: FLINK-17244
> URL: https://issues.apache.org/jira/browse/FLINK-17244
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Major
>  Labels: pull-request-available
>
> The Getting Started page needs a bit of general editing, and should also 
> mention the Training section and the Python Table API walkthrough.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-17500) Deploy JobGraph from file in StandaloneClusterEntrypoint

2020-05-04 Thread Kostas Kloudas (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099208#comment-17099208
 ] 

Kostas Kloudas commented on FLINK-17500:


[~uce] and [~trohrmann] with the introduction of the application mode, I 
believe that there is no reason for the {{ClassPathJobGraphRetriever}}. 
In fact, on the current master the {{ClassPathJobGraphRetriever}} has already been 
replaced by the {{ClassPathPackagedProgramRetriever}}.

The reason is that the {{ClassPathJobGraphRetriever}} executed 
the user's {{main()}} on the cluster, in a special environment, only to 
extract the {{JobGraph}}, which it then submitted for execution. Now, with 
the application mode (see 
[FLIP-85|https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode]),
 the user's {{main()}} is properly executed on the cluster, so there is no need 
for hijacking the {{execute()}}.

That said, allowing users to execute a job graph that they have extracted by 
other means on the client may make sense. So adding the option to use the 
{{FileJobGraphRetriever}} in a standalone deployment would be a valid addition.

> Deploy JobGraph from file in StandaloneClusterEntrypoint
> 
>
> Key: FLINK-17500
> URL: https://issues.apache.org/jira/browse/FLINK-17500
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / Docker
>Reporter: Ufuk Celebi
>Priority: Minor
>
> We have a requirement to deploy a pre-generated {{JobGraph}} from a file in 
> {{StandaloneClusterEntrypoint}}.
> Currently, {{StandaloneClusterEntrypoint}} only supports deployment of a 
> Flink job from the class path using {{ClassPathPackagedProgramRetriever}}. 
> Our desired behaviour would be as follows:
> If {{internal.jobgraph-path}} is set, prepare a {{PackagedProgram}} from a 
> local {{JobGraph}} file using {{FileJobGraphRetriever}}. Otherwise, deploy 
> using {{ClassPathPackagedProgramRetriever}} (current behavior).
> ---
> I understand that this requirement is pretty niche, but wanted to get 
> feedback whether the Flink community would be open to supporting this 
> nonetheless.
>  
>  
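The dispatch described in the ticket can be sketched as below. This is an illustrative sketch only: the returned strings stand in for the actual retriever objects, and only the option name `internal.jobgraph-path` is taken from the description above.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposed entrypoint behavior: if
// 'internal.jobgraph-path' is set, recover the JobGraph from that file;
// otherwise fall back to the current class-path based retrieval.
// Names are illustrative, not Flink's actual API.
public class RetrieverSelection {

    static String selectRetriever(Map<String, String> config) {
        return Optional.ofNullable(config.get("internal.jobgraph-path"))
                .map(path -> "FileJobGraphRetriever(" + path + ")")
                .orElse("ClassPathPackagedProgramRetriever");
    }

    public static void main(String[] args) {
        // With the option set, the file-based retriever is chosen.
        System.out.println(selectRetriever(Map.of("internal.jobgraph-path", "/opt/job.graph")));
        // Without it, current behavior is preserved.
        System.out.println(selectRetriever(Map.of()));
    }
}
```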



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zentol commented on a change in pull request #11983: [FLINK-11086] Replace flink-shaded-hadoop dependencies; add Hadoop 3 test profile

2020-05-04 Thread GitBox


zentol commented on a change in pull request #11983:
URL: https://github.com/apache/flink/pull/11983#discussion_r419630074



##
File path: flink-dist/pom.xml
##
@@ -137,8 +137,8 @@ under the License.
			<version>${project.version}</version>
			<exclusions>
				<exclusion>
-					<groupId>org.apache.flink</groupId>
-					<artifactId>flink-shaded-hadoop-2</artifactId>
+					<groupId>org.apache.hadoop</groupId>
+					<artifactId>*</artifactId>
				</exclusion>

Review comment:
   This breaks compatibility with maven 3.1









[GitHub] [flink] flinkbot commented on pull request #11987: [hotfix] Show hostname in failure error message

2020-05-04 Thread GitBox


flinkbot commented on pull request #11987:
URL: https://github.com/apache/flink/pull/11987#issuecomment-623621331


   
   ## CI report:
   
   * db7481419288348aebf6a09127410424ff517b01 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] alpinegizmo opened a new pull request #11988: [FLINK-17244][docs] Update the Getting Started page

2020-05-04 Thread GitBox


alpinegizmo opened a new pull request #11988:
URL: https://github.com/apache/flink/pull/11988


   ## What is the purpose of the change
   
   * Generally improve the content of this page
   * Add the Python Table API code walkthrough
   * Add the Hands-on Training
   
   ## Translation
   
   * I created FLINK-17504 for updating the Chinese translation







[GitHub] [flink] flinkbot commented on pull request #11987: [hotfix] Show hostname in failure error message

2020-05-04 Thread GitBox


flinkbot commented on pull request #11987:
URL: https://github.com/apache/flink/pull/11987#issuecomment-623611390


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit db7481419288348aebf6a09127410424ff517b01 (Mon May 04 
17:53:44 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] flinkbot edited a comment on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 4cc9434270958dcbdf322483531c97c108c4a2e8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=571)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] qqibrow opened a new pull request #11987: [hotfix] Show hostname in failure error message

2020-05-04 Thread GitBox


qqibrow opened a new pull request #11987:
URL: https://github.com/apache/flink/pull/11987


   Adding the hostname to the error message could help us detect host issues. The new 
error message would look like:
   ```
   2019-12-07 05:04:43,264 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph- KeyedProcess -> 
Sink: Unnamed (1/1) (52bd0b7c29cdcf040448660e7c52a03d) switched from RUNNING to 
FAILED on container_e07_1575407974804_0477_01_02 @ 
monarch-dev-021-20181106-data-slave-dev-0a02472a.ec2.pin220.com 
(dataPort=35719).
   org.apache.flink.util.FlinkException: The assigned slot 
container_e07_1575407974804_0477_01_02_0 was removed.
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:893)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:863)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:1058)
at 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:385)
at 
org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:830)
at 
org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:363)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at 
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   ```
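The formatting idea above can be sketched with a small helper; this is an illustrative sketch with hypothetical helper and host names, not the actual change in Flink's ExecutionGraph logging.

```java
// Illustrative sketch of the message this PR produces: append the
// TaskManager hostname after the container ID so host-level issues stand
// out in the logs. Names and the exact format are assumptions.
public class FailureMessageSketch {

    static String failureLocation(String containerId, String hostname, int dataPort) {
        return containerId + " @ " + hostname + " (dataPort=" + dataPort + ")";
    }

    public static void main(String[] args) {
        System.out.println("switched from RUNNING to FAILED on "
                + failureLocation("container_e07_1575407974804_0477_01_02",
                        "some-worker.example.com", 35719));
    }
}
```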







[jira] [Updated] (FLINK-17313) Validation error when insert decimal/varchar with precision into sink using TypeInformation of row

2020-05-04 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-17313:
-
Fix Version/s: 1.11.0

> Validation error when insert decimal/varchar with precision into sink using 
> TypeInformation of row
> --
>
> Key: FLINK-17313
> URL: https://issues.apache.org/jira/browse/FLINK-17313
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Reporter: Terry Wang
>Assignee: Terry Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Test code like the following (in blink planner):
> {code:java}
> tEnv.sqlUpdate("create table randomSource (" +
>         " a varchar(10)," +
>         " b decimal(20,2)" +
>         ") with (" +
>         " 'type' = 'random'," +
>         " 'count' = '10'" +
>         ")");
> tEnv.sqlUpdate("create table printSink (" +
>         " a varchar(10)," +
>         " b decimal(22,2)," +
>         ") with (" +
>         " 'type' = 'print'" +
>         ")");
> tEnv.sqlUpdate("insert into printSink select * from randomSource");
> tEnv.execute("");
> {code}
> Print TableSink implements UpsertStreamTableSink and its getRecordType is as 
> follows:
> {code:java}
> public TypeInformation<Row> getRecordType() {
>     return getTableSchema().toRowType();
> }
> {code}
> Varchar column validation exception is:
> org.apache.flink.table.api.ValidationException: Type VARCHAR(10) of table 
> field 'a' does not match with the physical type STRING of the 'a' field of 
> the TableSink consumed type.
>   at 
> org.apache.flink.table.utils.TypeMappingUtils.lambda$checkPhysicalLogicalTypeCompatible$4(TypeMappingUtils.java:165)
>   at 
> org.apache.flink.table.utils.TypeMappingUtils$1.defaultMethod(TypeMappingUtils.java:278)
>   at 
> org.apache.flink.table.utils.TypeMappingUtils$1.defaultMethod(TypeMappingUtils.java:255)
>   at 
> org.apache.flink.table.types.logical.utils.LogicalTypeDefaultVisitor.visit(LogicalTypeDefaultVisitor.java:67)
>   at 
> org.apache.flink.table.types.logical.VarCharType.accept(VarCharType.java:157)
>   at 
> org.apache.flink.table.utils.TypeMappingUtils.checkIfCompatible(TypeMappingUtils.java:255)
>   at 
> org.apache.flink.table.utils.TypeMappingUtils.checkPhysicalLogicalTypeCompatible(TypeMappingUtils.java:161)
>   at 
> org.apache.flink.table.planner.sinks.TableSinkUtils$$anonfun$validateLogicalPhysicalTypesCompatible$1.apply$mcVI$sp(TableSinkUtils.scala:315)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at 
> org.apache.flink.table.planner.sinks.TableSinkUtils$.validateLogicalPhysicalTypesCompatible(TableSinkUtils.scala:308)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase$$anonfun$2.apply(PlannerBase.scala:195)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase$$anonfun$2.apply(PlannerBase.scala:191)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:191)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase$$anonfun$1.apply(PlannerBase.scala:150)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase$$anonfun$1.apply(PlannerBase.scala:150)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.flink.table.planner.delegation.PlannerBase.translate(PlannerBase.scala:150)
>   at 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 4cc9434270958dcbdf322483531c97c108c4a2e8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=571)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Created] (FLINK-17506) SavepointEnvironment does not honour 'io.tmp.dirs' property

2020-05-04 Thread David Artiga (Jira)
David Artiga created FLINK-17506:


 Summary: SavepointEnvironment does not honour 'io.tmp.dirs' 
property
 Key: FLINK-17506
 URL: https://issues.apache.org/jira/browse/FLINK-17506
 Project: Flink
  Issue Type: Bug
  Components: API / State Processor
Reporter: David Artiga


{{SavepointEnvironment}} [creates an 
IOManagerAsync|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/runtime/SavepointEnvironment.java#L106]
 using its [default 
constructor|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManagerAsync.java#L62],
 meaning it [uses env var 
"java.io.tmpdir"|https://github.com/apache/flink/blob/d6439c8d0e7792961635e3e4297c3dbfb01938e3/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L227]
 instead of the values from the "io.tmp.dirs" config property.
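A minimal sketch of the expected behaviour, assuming a helper that prefers the configured directories and falls back to {{java.io.tmpdir}} only when 'io.tmp.dirs' is unset. Names are illustrative, not Flink's actual API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: resolve the spill directories from the
// 'io.tmp.dirs' configuration value when present, and only fall back to
// the JVM's 'java.io.tmpdir' otherwise (which is what the no-arg
// IOManagerAsync constructor effectively does today).
public class TmpDirResolution {

    static List<String> resolveTmpDirs(String ioTmpDirs) {
        if (ioTmpDirs == null || ioTmpDirs.isBlank()) {
            return List.of(System.getProperty("java.io.tmpdir"));
        }
        // 'io.tmp.dirs' may list several paths separated by ',', '|' or the
        // platform path separator.
        return Arrays.asList(ioTmpDirs.split("[,|" + java.io.File.pathSeparator + "]"));
    }

    public static void main(String[] args) {
        System.out.println(resolveTmpDirs("/data/tmp1,/data/tmp2"));
        // -> [/data/tmp1, /data/tmp2]
    }
}
```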





[GitHub] [flink] flinkbot commented on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot commented on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623579672


   
   ## CI report:
   
   * 4cc9434270958dcbdf322483531c97c108c4a2e8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


flinkbot commented on pull request #11986:
URL: https://github.com/apache/flink/pull/11986#issuecomment-623574742


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 4cc9434270958dcbdf322483531c97c108c4a2e8 (Mon May 04 
16:41:58 UTC 2020)
   
   **Warnings:**
* Documentation files were touched, but no `.zh.md` files: Update Chinese 
documentation or file Jira ticket.
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-17361).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Commented] (FLINK-17361) Support creating of a JDBC table using a custom query

2020-05-04 Thread Flavio Pompermaier (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099106#comment-17099106
 ] 

Flavio Pompermaier commented on FLINK-17361:


As I commented in the PR, I named the property 'connector.read.query' because I 
saw that there's still no consensus about the property renaming. I think it 
will be safer to leave all of the renaming to a dedicated PR.

> Support creating of a JDBC table using a custom query
> -
>
> Key: FLINK-17361
> URL: https://issues.apache.org/jira/browse/FLINK-17361
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Reporter: Flavio Pompermaier
>Priority: Major
>  Labels: pull-request-available
>
> In a long discussion on the mailing list it emerged that it is not 
> possible to create a JDBC table that extracts data using a custom query.
> A temporary workaround could be to assign the target query as 
> 'connector.table'.
> However, this is undesirable. 
> Moreover, in relation to https://issues.apache.org/jira/browse/FLINK-17360, a 
> query could actually be a statement that requires parameters to be filled by 
> the custom parameter values provider.





[GitHub] [flink] fpompermaier opened a new pull request #11986: [FLINK-17361] [jdbc] Added custom query on JDBC tables

2020-05-04 Thread GitBox


fpompermaier opened a new pull request #11986:
URL: https://github.com/apache/flink/pull/11986


   
   
   ## What is the purpose of the change
   
   Enable users to create a JDBC source table using a custom query / prepared 
statement.
   
   
   ## Brief change log
   
   - Handle 'connector.read.query' to create a (read-only) source table using 
a custom query.
   - Currently a read-only table cannot be created as such. Once a final decision 
is taken on that, we should set such a flag for tables having this property.
   - Once a final decision is made on the renaming of connector properties, 
this property could probably become simply 'scan.query', but I decided to leave 
the renaming of those properties to another dedicated PR.
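
   Based on the change log above, a source table backed by a custom query might be declared as below. This DDL is a hypothetical sketch: the URL and columns are invented, and the property name may still change (e.g. to 'scan.query').

   ```sql
   CREATE TABLE custom_query_source (
     id BIGINT,
     total DECIMAL(10, 2)
   ) WITH (
     'connector.type' = 'jdbc',
     'connector.url' = 'jdbc:postgresql://localhost:5432/mydb',
     -- custom query instead of 'connector.table'; makes the table read-only
     'connector.read.query' = 'SELECT id, SUM(amount) AS total FROM orders GROUP BY id'
   );
   ```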
   
   
   ## Verifying this change
   
   This change is already covered by existing tests, such as 
*JDBCTableSourceSinkFactoryTest* and *JDBCTableSourceITCase*.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): *no*
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: *no*
 - The serializers: *no*
 - The runtime per-record code paths (performance sensitive): *no*
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper:*no*
 - The S3 file system connector: *no*
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? docs







[jira] [Updated] (FLINK-17361) Support creating of a JDBC table using a custom query

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-17361:
---
Labels: pull-request-available  (was: )

> Support creating of a JDBC table using a custom query
> -
>
> Key: FLINK-17361
> URL: https://issues.apache.org/jira/browse/FLINK-17361
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / API
>Reporter: Flavio Pompermaier
>Priority: Major
>  Labels: pull-request-available
>
> In a long discussion on the mailing list it emerged that it is not 
> possible to create a JDBC table that extracts data using a custom query.
> A temporary workaround could be to assign the target query as 
> 'connector.table'.
> However, this is undesirable. 
> Moreover, in relation to https://issues.apache.org/jira/browse/FLINK-17360, a 
> query could actually be a statement that requires parameters to be filled by 
> the custom parameter values provider.





[GitHub] [flink] flinkbot edited a comment on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * de26ba11042772d77be2416fd6c829d80c9c66b7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=568)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11854: [FLINK-17407] Introduce external resource framework

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11854:
URL: https://github.com/apache/flink/pull/11854#issuecomment-617586491


   
   ## CI report:
   
   * bddb0e274da11bbe99d15c6e0bb55e8d8c0e658a UNKNOWN
   * dc7a9c5c7d1fac82518815b9277809dfb82ddaac UNKNOWN
   * 2238559b0e2245e77204e7c7d0ef34c7a97e3766 UNKNOWN
   * b4311ee10a3e6df9a129c2b971231e2312b63c37 Travis: 
[SUCCESS](https://travis-ci.com/github/flink-ci/flink/builds/163725498) Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=563)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] walterddr commented on pull request #11936: [FLINK-17386][Security][hotfix] fix LinkageError not captured

2020-05-04 Thread GitBox


walterddr commented on pull request #11936:
URL: https://github.com/apache/flink/pull/11936#issuecomment-623568748


   @aljoscha could you take another look?







[GitHub] [flink] pnowojski commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


pnowojski commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419564857



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannelTest.java
##
@@ -731,33 +726,29 @@ public void testFailureInNotifyBufferAvailable() throws 
Exception {
buffer = checkNotNull(bufferPool.requestBuffer());
 
// trigger subscription to buffer pool
-   failingRemoteIC.onSenderBacklog(1);
-   successfulRemoteIC.onSenderBacklog(numExclusiveBuffers 
+ 1);
-   // recycling will call 
RemoteInputChannel#notifyBufferAvailable() which will fail and
-   // this exception will be swallowed and set as an error 
in failingRemoteIC
+   channelWithoutPartition.onSenderBacklog(1);
+   
channelWithPartition.onSenderBacklog(numExclusiveBuffers + 1);
+
+   // recycling will call 
RemoteInputChannel#notifyBufferAvailable() which will not increase
+   // the unannounced credit if the channel has not 
requested partition
buffer.recycleBuffer();
-   buffer = null;
-   try {
-   failingRemoteIC.checkError();
-   fail("The input channel should have an error 
based on the failure in RemoteInputChannel#notifyBufferAvailable()");
-   } catch (IOException e) {
-   assertThat(e, hasProperty("cause", 
isA(IllegalStateException.class)));
-   }

Review comment:
   It seems like we are losing test coverage here? However, this test 
was quite fragile in the first place, and I'm not entirely sure what it is 
supposed to test.









[GitHub] [flink] flinkbot edited a comment on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * de26ba11042772d77be2416fd6c829d80c9c66b7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=568)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-605459909


   
   ## CI report:
   
   * 14e9fe3bfdeeae480047848801243f9fbed03cb4 UNKNOWN
   * a11dbc2d6b25ff16ef3cff4ecc751538d6867e68 UNKNOWN
   * 5b241a0386f6ae029759e076afef6b155b10b328 UNKNOWN
   * 475dd8aa6478eaf51b758d3d201f0d4a38c27dd8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=565)
 
   * 2ffa6112f5f8fd094f68aeb4013ebf5d026156d8 UNKNOWN
   * a460858a2f8ff4da137ac5d7533cf4cdab7c84f1 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] edu05 commented on pull request #11952: [FLINK-16638][runtime][checkpointing] Flink checkStateMappingCompleteness doesn't include UserDefinedOperatorIDs

2020-05-04 Thread GitBox


edu05 commented on pull request #11952:
URL: https://github.com/apache/flink/pull/11952#issuecomment-623554001


   @rkhachatryan Are there any outstanding issues?







[GitHub] [flink] flinkbot edited a comment on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623137252


   
   ## CI report:
   
   * 51f00c041b126b54616f191f669750e819c733db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=543)
 
   * 00a4324f613f50c3183df16d4cc5d4bada7316c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=546)
 
   * e34fdb3a37d8c673e9349936f7a390cda5b87ce3 UNKNOWN
   * c36ba5313969ea13c44cced005de17ec88c82696 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=547)
 
   * 0e6925923af7e86c8e4356fee25460b610040e69 UNKNOWN
   * 2ba60790d4e8f52ff6cd2845c81a7ca424a3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=564)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot commented on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623545781


   
   ## CI report:
   
   * de26ba11042772d77be2416fd6c829d80c9c66b7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-605459909


   
   ## CI report:
   
   * 14e9fe3bfdeeae480047848801243f9fbed03cb4 UNKNOWN
   * a11dbc2d6b25ff16ef3cff4ecc751538d6867e68 UNKNOWN
   * 5b241a0386f6ae029759e076afef6b155b10b328 UNKNOWN
   * 475dd8aa6478eaf51b758d3d201f0d4a38c27dd8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=565)
 
   * 2ffa6112f5f8fd094f68aeb4013ebf5d026156d8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] zhijiangW commented on pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on pull request #11687:
URL: https://github.com/apache/flink/pull/11687#issuecomment-623538520


   Thanks for the further reviews @pnowojski ! 
   
   > I'm still not sure if this is the right approach. It's better compared to 
the original proposal, at a cost of much more added lines of code, while the 
fundamental issues still remain the same: input channels are even more 
complicated compared to master branch. Now input channels are interconnected 
with BufferManager (both Local/RemoteInputChannel and BufferManager are 
calling/accessing one another many times during a single method invocation) and 
they are intertwined with RecoveredInputChannel.
   
   Actually, I was also a bit torn during implementation, and I am not quite 
sure which option is the best approach unless we fully implement every 
option and compare them. When I tried to implement the option of working on 
`CheckpointedInputGate`, I also encountered some trouble and confusion in 
other aspects, which finally made me give up on it.







[GitHub] [flink] flinkbot commented on pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


flinkbot commented on pull request #11985:
URL: https://github.com/apache/flink/pull/11985#issuecomment-623535573


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit de26ba11042772d77be2416fd6c829d80c9c66b7 (Mon May 04 
15:32:57 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Commented] (FLINK-17327) Kafka unavailability could cause Flink TM shutdown

2020-05-04 Thread Aljoscha Krettek (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099045#comment-17099045
 ] 

Aljoscha Krettek commented on FLINK-17327:
--

I believe that {{TransactionalRequestResult.await()}} is the culprit for the 
indefinite blocking, the latch is not counted down in the failure case: 
https://github.com/apache/kafka/blob/2.2/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionalRequestResult.java#L38.

I also believe that this bug in Kafka was fixed here as an unrelated change: 
https://github.com/apache/kafka/commit/df13fc93d0aebfe0ecc40dd4af3c5fb19b35f710#diff-8a2c4f47dcec247ce2ecebf082b3d0b1R42.
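The indefinite blocking described above can be sketched with a minimal stand-in for the latch pattern. This is a toy reconstruction, not Kafka's actual `TransactionalRequestResult` (which uses a Java `CountDownLatch`): if the failure path records the error without releasing the latch, a caller blocked in `await()` can only ever return by timing out.

```python
import threading


class RequestResult:
    """Toy stand-in for the latch pattern described above (not Kafka's
    actual TransactionalRequestResult)."""

    def __init__(self):
        self._latch = threading.Event()  # analogous to CountDownLatch(1)
        self._error = None

    def succeed(self):
        self._latch.set()

    def fail_buggy(self, err):
        # Bug: the error is recorded, but the latch is never released,
        # so any thread blocked in await_result() waits until timeout.
        self._error = err

    def fail_fixed(self, err):
        # Fix (as in the later Kafka change): always release waiters.
        self._error = err
        self._latch.set()

    def await_result(self, timeout=None):
        completed = self._latch.wait(timeout)
        if completed and self._error is not None:
            raise self._error
        return completed
```

With `fail_buggy`, `await_result()` called without a timeout would block forever, which matches the TM-shutdown symptom: the producer thread never returns from the transactional call.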

> Kafka unavailability could cause Flink TM shutdown
> --
>
> Key: FLINK-17327
> URL: https://issues.apache.org/jira/browse/FLINK-17327
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.0
>Reporter: Jun Qin
>Priority: Major
> Attachments: Standalonesession.log, TM.log, TM_produer_only_task.log
>
>
> Steps to reproduce:
>  # Start a Flink 1.10 standalone cluster
>  # Run a Flink job which reads from one Kafka topic and writes to another 
> topic, with exactly-once checkpointing enabled
>  # Stop all Kafka Brokers after a few successful checkpoints
> When Kafka brokers are down:
>  # {{org.apache.kafka.clients.NetworkClient}} reported connection to broker 
> could not be established
>  # Then, Flink could not complete snapshot due to {{Timeout expired while 
> initializing transactional state in 6ms}}
>  # After several snapshot failures, Flink reported {{Too many ongoing 
> snapshots. Increase kafka producers pool size or decrease number of 
> concurrent checkpoints.}}
>  # Eventually, Flink tried to cancel the task which did not succeed within 3 
> min. According to logs, consumer was cancelled, but producer is still running
>  # Then {{Fatal error occurred while executing the TaskManager. Shutting it 
> down...}}
> I will attach the logs to show the details. It is worth noting that if there 
> were no consumer but only a producer in the task, the behavior would be different:
>  # {{org.apache.kafka.clients.NetworkClient}} reported connection to broker 
> could not be established
>  # after {{delivery.timeout.ms}} (2min by default), producer reports: 
> {{FlinkKafkaException: Failed to send data to Kafka: Expiring 4 record(s) for 
> output-topic-0:120001 ms has passed since batch creation}}
>  # Flink tried to cancel the upstream tasks and created a new producer
>  # The new producer obviously reported connectivity issue to brokers
>  # This continues till Kafka brokers are back. 
>  # Flink reported {{Too many ongoing snapshots. Increase kafka producers pool 
> size or decrease number of concurrent checkpoints.}}
>  # Flink cancelled the tasks and restarted them
>  # The job continues, and new checkpoint succeeded. 
>  # TM runs all the time in this scenario
> I set Kafka transaction time out to 1 hour just to avoid transaction timeout 
> during the test.
> To get a producer only task, I called {{env.disableOperatorChaining();}} in 
> the second scenario. 
>  
>  
>  





[GitHub] [flink] flinkbot edited a comment on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623137252


   
   ## CI report:
   
   * 51f00c041b126b54616f191f669750e819c733db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=543)
 
   * 00a4324f613f50c3183df16d4cc5d4bada7316c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=546)
 
   * e34fdb3a37d8c673e9349936f7a390cda5b87ce3 UNKNOWN
   * c36ba5313969ea13c44cced005de17ec88c82696 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=547)
 
   * 0e6925923af7e86c8e4356fee25460b610040e69 UNKNOWN
   * 2ba60790d4e8f52ff6cd2845c81a7ca424a3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419522975



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/reader/AbstractRecordReader.java
##
@@ -63,6 +63,8 @@ protected AbstractRecordReader(InputGate inputGate, String[] 
tmpDirectories) {
}
 
protected boolean getNextRecord(T target) throws IOException, 
InterruptedException {
+   inputGate.requestPartitions();

Review comment:
   Agreed, it is not a perfect way, but it is the feasible, simple way ATM to avoid 
maintaining many different code paths; it is also not sensitive for the batch code path.









[jira] [Updated] (FLINK-16989) Support ScanTableSource in planner

2020-05-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-16989:
---
Labels: pull-request-available  (was: )

> Support ScanTableSource in planner
> --
>
> Key: FLINK-16989
> URL: https://issues.apache.org/jira/browse/FLINK-16989
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Planner
>Reporter: Timo Walther
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
>
> Support the {{ScanTableSource}} interface in planner.
> Utility methods for creating type information and the data structure 
> converters might not be implemented yet.
> Not all changelog modes might be supported initially. This depends on 
> FLINK-16887.





[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419521815



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
##
@@ -97,6 +100,18 @@ public LocalInputChannel(
// Consume
// 

 
+   @Override
+   public void readRecoveredState(ChannelStateReader reader) throws 
IOException, InterruptedException {
+   synchronized (bufferQueue) {
+   // In most of cases we only need one buffer for reading 
recovered state except in very large record case.
+   // Then only one floating buffer is required to avoid 
receive more floating buffers after recovery. Even
+   // though we need more buffers for recovery in large 
record case, it only increases some interactions with pool.
+   numRequiredBuffers = 1;
+   }
+
+   super.readRecoveredState(reader);

Review comment:
   What is the issue here?









[GitHub] [flink] wuchong opened a new pull request #11985: [FLINK-16989][table] Support ScanTableSource in blink planner

2020-05-04 Thread GitBox


wuchong opened a new pull request #11985:
URL: https://github.com/apache/flink/pull/11985


   
   
   ## What is the purpose of the change
   
   Support the `ScanTableSource` interface in the planner. We don't support the 
changelog mode `[I, UA, D]` (`UPDATE_AFTER` without `UPDATE_BEFORE`) in the first 
version; we may support it in the future. 
   
   ## Brief change log
   
   - The first 3 commits are from #11959, so this pull request depends on 
#11959.
   - We rename the existing `TableSourceTable` into `LegacyTableSourceTable` 
and `TableSourceScan` into `LegacyTableSourceScan`, and use `TableSourceTable` 
and `TableSourceScan` for the new `DynamicTableSource`.
 - We don't want to use `LogicalDynamicTableSourceScan` because the 
`Dynamic` is verbose and misleading. 
 - Besides, `LegacyTableSourceScan` will be dropped in the near future, we 
can easily remove all `Legacy` classes at that time. 
   - Implements logical node and physical nodes for `ScanTableSource` and 
integrate with changelog program.
   
   ## Verifying this change
   
   - Introduce a `TestValuesTableFactory` for testing
   - Use the `values` source in `TableScanTest` and add validation tests in it. 
   - Add `DynamicTableSourceITCase` for integration tests 
   - Add `testChangelog` and `testAggregateOnChangelogSource` for integration 
tests for changelog source.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): yes
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? not applicable 
   







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419521605



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##
@@ -371,7 +373,14 @@ boolean isWaitingForFloatingBuffers() {
 
@VisibleForTesting
public Buffer getNextReceivedBuffer() {
-   return receivedBuffers.poll();
+   synchronized (receivedBuffers) {
+   return receivedBuffers.poll();
+   }

Review comment:
   It is out of date, and I cannot find this code path now.









[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419520547



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##
@@ -105,12 +98,15 @@ public RemoteInputChannel(
int maxBackoff,
InputChannelMetrics metrics) {
 
-   super(inputGate, channelIndex, partitionId, initialBackOff, 
maxBackoff,
-   metrics.getNumBytesInRemoteCounter(), 
metrics.getNumBuffersInRemoteCounter());
+   super(inputGate, channelIndex, partitionId, initialBackOff, 
maxBackoff, metrics);
 
this.connectionId = checkNotNull(connectionId);
this.connectionManager = checkNotNull(connectionManager);
-   this.bufferManager = new BufferManager(this, 0);
+   // In theory it should get the total number of states to 
indicate the numRequiredBuffers.
+   // Since we can not get this information in advance, and 
considering only one input channel
+   // will read state at the same time by design, then we give a 
maximum value here to reduce
+   // unnecessary interactions with buffer pool during recovery.
+   this.bufferManager = new BufferManager(this, Integer.MAX_VALUE);

Review comment:
   As I explained for the `LocalInputChannel` case, this `numRequiredBuffers` 
setting is only a small optimization; actually we could unify them as 0 and 
adjust the value while actually requesting floating buffers. 
   
   ATM only one input channel is unspilling at a time, so it makes sense to grab 
all the available floating buffers for this channel. After this channel 
finishes unspilling, it releases all the floating buffers back to 
`LocalBufferPool` to be reused by the next unspilling channel.
   
   It was a bit tricky to design the role of `numRequiredBuffers` before. When 
an exclusive buffer is recycled, or a floating buffer is recycled and the 
listener is notified of its availability, it double-checks, based on 
`numRequiredBuffers`, whether the current listener still needs more floating 
buffers ATM. If not, the floating buffer is returned to the local pool to be 
assigned to other listeners. 
   
   For the input channel unspill case, we can assume that the current channel 
always needs more floating buffers until it finishes, to avoid returning 
floating buffers to the local pool only to request them from the pool again the 
next time they are needed.
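The double-check on recycling described above can be sketched roughly as follows. This is a toy model, not Flink's actual `BufferManager`; the class and method names are invented for illustration. A recycled floating buffer stays with the listener only while `num_required_buffers` says it still needs one; otherwise it goes back to the pool for other listeners.

```python
class FloatingBufferPool:
    """Toy pool that just counts buffers returned to it."""

    def __init__(self):
        self.returned = 0

    def give_back(self, buffer):
        self.returned += 1


class BufferListener:
    """Toy model of the recycling double-check described above."""

    def __init__(self, num_required_buffers):
        self.num_required_buffers = num_required_buffers
        self.held = 0

    def notify_buffer_available(self, buffer, pool):
        # Double-check: does this listener still need more floating buffers?
        if self.held < self.num_required_buffers:
            self.held += 1          # keep the recycled buffer
            return True
        pool.give_back(buffer)      # not needed: return it for other listeners
        return False
```

With `num_required_buffers` set to a very large value (the `Integer.MAX_VALUE` trick in the diff), a recovering channel keeps every buffer offered to it until it finishes unspilling, avoiding the release-and-re-request round trips.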









[jira] [Commented] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool

2020-05-04 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099035#comment-17099035
 ] 

Robert Metzger commented on FLINK-16947:


MSFT hosted: 
https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7944=logs=1f3ed471-1849-5d3c-a34c-19792af4ad16=2f5b54d0-1d28-5b01-d344-aa50ffe0cdf8

> ArtifactResolutionException: Could not transfer artifact.  Entry [...] has 
> not been leased from this pool
> -
>
> Key: FLINK-16947
> URL: https://issues.apache.org/jira/browse/FLINK-16947
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Piotr Nowojski
>Assignee: Robert Metzger
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.11.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> Build of flink-metrics-availability-test failed with:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) 
> on project flink-metrics-availability-test: Unable to generate classpath: 
> org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not 
> transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 
> from/to google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry 
> [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null]
>  has not been leased from this pool
> [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] 
> [ERROR] from the specified remote repositories:
> [ERROR] google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/, 
> releases=true, snapshots=false),
> [ERROR] apache.snapshots (https://repository.apache.org/snapshots, 
> releases=false, snapshots=true)
> [ERROR] Path to dependency:
> [ERROR] 1) dummy:dummy:jar:1.0
> [ERROR] 2) org.apache.maven.surefire:surefire-junit47:jar:2.22.1
> [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1
> [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-metrics-availability-test
> {noformat}





[GitHub] [flink] rmetzger commented on pull request #11983: [FLINK-11086] Replace flink-shaded-hadoop dependencies; add Hadoop 3 test profile

2020-05-04 Thread GitBox


rmetzger commented on pull request #11983:
URL: https://github.com/apache/flink/pull/11983#issuecomment-623529249


   The YARN failure on Azure is a known YARN issue: 
https://issues.apache.org/jira/browse/FLINK-15534







[GitHub] [flink] flinkbot edited a comment on pull request #11854: [FLINK-17407] Introduce external resource framework

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11854:
URL: https://github.com/apache/flink/pull/11854#issuecomment-617586491


   
   ## CI report:
   
   * bddb0e274da11bbe99d15c6e0bb55e8d8c0e658a UNKNOWN
   * dc7a9c5c7d1fac82518815b9277809dfb82ddaac UNKNOWN
   * 2238559b0e2245e77204e7c7d0ef34c7a97e3766 UNKNOWN
   * debf54c1009e424381dcfa589dc18be512d4d32c Travis: 
[FAILURE](https://travis-ci.com/github/flink-ci/flink/builds/163665169) Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=557)
 
   * b4311ee10a3e6df9a129c2b971231e2312b63c37 Travis: 
[PENDING](https://travis-ci.com/github/flink-ci/flink/builds/163725498) Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=563)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-605459909


   
   ## CI report:
   
   * 14e9fe3bfdeeae480047848801243f9fbed03cb4 UNKNOWN
   * a11dbc2d6b25ff16ef3cff4ecc751538d6867e68 UNKNOWN
   * 360f3e0be00e518c2f268f26aab06d6f27764acf Travis: 
[CANCELED](https://travis-ci.com/github/flink-ci/flink/builds/163393154) Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=539)
 
   * 5b241a0386f6ae029759e076afef6b155b10b328 UNKNOWN
   * 475dd8aa6478eaf51b758d3d201f0d4a38c27dd8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419513791



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
##
@@ -168,6 +175,12 @@ public void run() {
Optional getNextBuffer() throws IOException, 
InterruptedException {
checkError();
 
+   BufferAndAvailability bufferAndAvailability = 
getNextRecoveredStateBuffer();
+   if (bufferAndAvailability != null) {

Review comment:
   My previous assumption was that the local channel would not be chosen by 
`SingleInputGate` for reading if no buffers had been inserted into 
`RecoveredInputChannel#receivedBuffers` to trigger 
`SingleInputGate.notifyChannelNonEmpty` beforehand.
   Or did I miss some other corner case?









[jira] [Commented] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool

2020-05-04 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099029#comment-17099029
 ] 

Robert Metzger commented on FLINK-16947:


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=558=logs=1454c523-5777-5d91-a870-f026a11d0383=160c3171-f94a-5870-7346-5c8980c235f3

> ArtifactResolutionException: Could not transfer artifact.  Entry [...] has 
> not been leased from this pool
> -
>
> Key: FLINK-16947
> URL: https://issues.apache.org/jira/browse/FLINK-16947
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines
>Reporter: Piotr Nowojski
>Assignee: Robert Metzger
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.11.0
>
>
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982=logs=c88eea3b-64a0-564d-0031-9fdcd7b8abee=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
> Build of flink-metrics-availability-test failed with:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) 
> on project flink-metrics-availability-test: Unable to generate classpath: 
> org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not 
> transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 
> from/to google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry 
> [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null]
>  has not been leased from this pool
> [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] 
> [ERROR] from the specified remote repositories:
> [ERROR] google-maven-central 
> (https://maven-central-eu.storage-download.googleapis.com/maven2/, 
> releases=true, snapshots=false),
> [ERROR] apache.snapshots (https://repository.apache.org/snapshots, 
> releases=false, snapshots=true)
> [ERROR] Path to dependency:
> [ERROR] 1) dummy:dummy:jar:1.0
> [ERROR] 2) org.apache.maven.surefire:surefire-junit47:jar:2.22.1
> [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1
> [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1
> [ERROR] -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR] 
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-metrics-availability-test
> {noformat}





[GitHub] [flink] rmetzger commented on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


rmetzger commented on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-623522364


   The bot needs quite some time to update. I have already notified Chesnay to 
look into it.







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419508820



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
##
@@ -168,6 +175,12 @@ public void run() {
Optional getNextBuffer() throws IOException, 
InterruptedException {
checkError();
 
+   BufferAndAvailability bufferAndAvailability = 
getNextRecoveredStateBuffer();

Review comment:
   I am neutral on this option, because a similar approach already exists in 
many other places. E.g. we have `BufferStorage` in `CheckpointedInputGate` for 
caching the blocked channels' buffers; then in `getNextBuffer` we also need to 
check first whether there are any pending buffers to be read from 
`BufferStorage`.
   
   I absolutely agree that it would be better not to have different paths, but 
I also think it is not so bad if there is no other easy option.





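The read order being debated in this review — serve recovered channel state before the live read path in `getNextBuffer` — can be sketched in isolation. This is a hedged, self-contained illustration: `RecoveredFirstChannel` and `Buffer` are names invented here, not Flink classes.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

// Invented stand-ins (not Flink classes) for the "recovered state
// first" read order discussed in the review.
class Buffer {
    final String payload;
    Buffer(String payload) { this.payload = payload; }
}

class RecoveredFirstChannel {
    final Queue<Buffer> recoveredStateBuffers = new ArrayDeque<>();
    final Queue<Buffer> liveBuffers = new ArrayDeque<>();

    // Drain buffers unspilled from checkpointed channel state before
    // falling through to the normal (live) read path.
    Optional<Buffer> getNextBuffer() {
        Buffer recovered = recoveredStateBuffers.poll();
        if (recovered != null) {
            return Optional.of(recovered);
        }
        return Optional.ofNullable(liveBuffers.poll());
    }
}
```

The design trade-off in the thread is exactly this extra branch: one unified `getNextBuffer` path would avoid it, at the cost of funnelling recovered buffers through the live queue.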




[GitHub] [flink] flinkbot edited a comment on pull request #11979: [FLINK-17291][docs] Translate 'docs/training/event_driven.zh.md' to C…

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11979:
URL: https://github.com/apache/flink/pull/11979#issuecomment-623137252


   
   ## CI report:
   
   * 51f00c041b126b54616f191f669750e819c733db Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=543)
 
   * 00a4324f613f50c3183df16d4cc5d4bada7316c8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=546)
 
   * e34fdb3a37d8c673e9349936f7a390cda5b87ce3 UNKNOWN
   * c36ba5313969ea13c44cced005de17ec88c82696 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=547)
 
   * 0e6925923af7e86c8e4356fee25460b610040e69 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419502301



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/LocalInputChannel.java
##
@@ -87,10 +90,14 @@ public LocalInputChannel(
int maxBackoff,
InputChannelMetrics metrics) {
 
-   super(inputGate, channelIndex, partitionId, initialBackoff, 
maxBackoff, metrics.getNumBytesInLocalCounter(), 
metrics.getNumBuffersInLocalCounter());
+   super(inputGate, channelIndex, partitionId, initialBackoff, 
maxBackoff, metrics);
 
this.partitionManager = checkNotNull(partitionManager);
this.taskEventPublisher = checkNotNull(taskEventPublisher);
+   // In most cases we only need one buffer for reading recovered 
state except for very large record.
+   // Then only one floating buffer is required. Even though we 
need more buffers for recovery for
+   // large record, it only increases some interactions with pool.
+   this.bufferManager = new BufferManager(this, 1);

Review comment:
   Yes, I agree it is a bit hard to understand.
   
   The `numRequiredBuffers` factor is the only complex point of this buffer 
manager model, and it exists just to gain a small optimization. Actually we 
could give `numRequiredBuffers` any initial value (e.g. 0) to unify the local 
and remote channels. Ideally we should adjust this value based on how many 
total channel states are still being unspilled, exactly like the concept of 
`backlog` in credit-based mode.
   
   Any value for `numRequiredBuffers` works correctly now; the only cost is 
some unnecessary interactions between `BufferManager` and `LocalBufferPool`. 
I was really a bit torn during implementation about whether to retain this 
optimization.









[GitHub] [flink] becketqin commented on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


becketqin commented on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-623516373


   @rmetzger Thanks for the help. I just tried a rebase as well as an empty 
commit, but it seems the CI test was still not triggered.







[GitHub] [flink] flinkbot edited a comment on pull request #11854: [FLINK-17407] Introduce external resource framework

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11854:
URL: https://github.com/apache/flink/pull/11854#issuecomment-617586491


   
   ## CI report:
   
   * bddb0e274da11bbe99d15c6e0bb55e8d8c0e658a UNKNOWN
   * dc7a9c5c7d1fac82518815b9277809dfb82ddaac UNKNOWN
   * 2238559b0e2245e77204e7c7d0ef34c7a97e3766 UNKNOWN
   * debf54c1009e424381dcfa589dc18be512d4d32c Travis: 
[FAILURE](https://travis-ci.com/github/flink-ci/flink/builds/163665169) Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=557)
 
   * b4311ee10a3e6df9a129c2b971231e2312b63c37 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Resolved] (FLINK-17499) LazyTimerService used to register timers via State Processing API incorrectly mixes event time timers with processing time timers

2020-05-04 Thread Seth Wiesman (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Wiesman resolved FLINK-17499.
--
Resolution: Fixed

> LazyTimerService used to register timers via State Processing API incorrectly 
> mixes event time timers with processing time timers
> -
>
> Key: FLINK-17499
> URL: https://issues.apache.org/jira/browse/FLINK-17499
> Project: Flink
>  Issue Type: Bug
>  Components: API / State Processor
>Affects Versions: 1.11.0, 1.10.2
>Reporter: Adam Laczynski
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> @Override
> public void registerProcessingTimeTimer(long time) {
>   ensureInitialized();
>   internalTimerService.registerEventTimeTimer(VoidNamespace.INSTANCE, time);
> }
> {code}
> Same issue for both registerEventTimeTimer and registerProcessingTimeTimer.
> [https://github.com/apache/flink/blob/master/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/output/operators/LazyTimerService.java#L62]



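The delegation bug described in this issue can be shown in miniature. The sketch below is a hedged, self-contained illustration — `InternalTimers` and `LazyTimerServiceSketch` are simplified stand-ins invented here, not the Flink classes — showing the fix: each register method must delegate to its matching internal registration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the pattern in LazyTimerService; these are
// NOT the Flink classes, only an illustration of the delegation bug.
class InternalTimers {
    final List<Long> eventTimeTimers = new ArrayList<>();
    final List<Long> processingTimeTimers = new ArrayList<>();

    void registerEventTimeTimer(long t) { eventTimeTimers.add(t); }
    void registerProcessingTimeTimer(long t) { processingTimeTimers.add(t); }
}

class LazyTimerServiceSketch {
    final InternalTimers internal = new InternalTimers();

    // Fixed: delegates to the matching processing-time registration.
    // The reported bug was delegating to registerEventTimeTimer here.
    public void registerProcessingTimeTimer(long time) {
        internal.registerProcessingTimeTimer(time);
    }

    public void registerEventTimeTimer(long time) {
        internal.registerEventTimeTimer(time);
    }
}
```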


[jira] [Commented] (FLINK-17499) LazyTimerService used to register timers via State Processing API incorrectly mixes event time timers with processing time timers

2020-05-04 Thread Seth Wiesman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099011#comment-17099011
 ] 

Seth Wiesman commented on FLINK-17499:
--

Fixed in master: d6439c8d0e7792961635e3e4297c3dbfb01938e3
Fixed in release-1.10: ebba1589d8b407f53bcc4e325cd63eaa6b30870b

> LazyTimerService used to register timers via State Processing API incorrectly 
> mixes event time timers with processing time timers
> -
>
> Key: FLINK-17499
> URL: https://issues.apache.org/jira/browse/FLINK-17499
> Project: Flink
>  Issue Type: Bug
>  Components: API / State Processor
>Affects Versions: 1.11.0, 1.10.2
>Reporter: Adam Laczynski
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> @Override
> public void registerProcessingTimeTimer(long time) {
>   ensureInitialized();
>   internalTimerService.registerEventTimeTimer(VoidNamespace.INSTANCE, time);
> }
> {code}
> Same issue for both registerEventTimeTimer and registerProcessingTimeTimer.
> [https://github.com/apache/flink/blob/master/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/output/operators/LazyTimerService.java#L62]





[jira] [Updated] (FLINK-17499) LazyTimerService used to register timers via State Processing API incorrectly mixes event time timers with processing time timers

2020-05-04 Thread Seth Wiesman (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Wiesman updated FLINK-17499:
-
Affects Version/s: (was: 1.10.0)
   1.10.2

> LazyTimerService used to register timers via State Processing API incorrectly 
> mixes event time timers with processing time timers
> -
>
> Key: FLINK-17499
> URL: https://issues.apache.org/jira/browse/FLINK-17499
> Project: Flink
>  Issue Type: Bug
>  Components: API / State Processor
>Affects Versions: 1.11.0, 1.10.2
>Reporter: Adam Laczynski
>Assignee: Seth Wiesman
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> @Override
> public void registerProcessingTimeTimer(long time) {
>   ensureInitialized();
>   internalTimerService.registerEventTimeTimer(VoidNamespace.INSTANCE, time);
> }
> {code}
> Same issue for both registerEventTimeTimer and registerProcessingTimeTimer.
> [https://github.com/apache/flink/blob/master/flink-libraries/flink-state-processing-api/src/main/java/org/apache/flink/state/api/output/operators/LazyTimerService.java#L62]





[jira] [Commented] (FLINK-17453) KyroSerializer throws IndexOutOfBoundsException type java.util.PriorityQueue

2020-05-04 Thread Jiayi Liao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099007#comment-17099007
 ] 

Jiayi Liao commented on FLINK-17453:


[~aljoscha] It's {{HashMap}}. I've attached the udaf implementation in case I 
missed something.

> KyroSerializer throws IndexOutOfBoundsException type java.util.PriorityQueue
> 
>
> Key: FLINK-17453
> URL: https://issues.apache.org/jira/browse/FLINK-17453
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Table SQL / Runtime
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major
> Attachments: udaf
>
>
> We're using SQL UDAF with a {{PriorityQueue}} as {{Accumulator}}, and when 
> recovering from checkpoint, the error occurs.
> {code:java}
> 2020-04-28 22:28:18,659 INFO  
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer  - 
> IndexOutOfBoundsException type java.util.PriorityQueue source data is: [2, 0, 
> 0, 0, 0, 0, 0, 0].
> 2020-04-28 22:28:18,660 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally GroupWindowAggregate -> 
> Calc_select_live_id__2 -> SinkConversionToTupl -> Map -> Filter (37/40)- 
> execution # 0 (4636858426452f0a437d2f6d9564f34d).
> 2020-04-28 22:28:18,660 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - GroupWindowAggregate -> Calc_select_live_id__2 -> 
> SinkConversionToTupl -> Map -> Filter (37/40)- execution # 0 
> (4636858426452f0a437d2f6d9564f34d) switched from RUNNING to FAILED.
> org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught 
> exception while processing timer.
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:967)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:941)
> at 
> org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:285)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: TimerException{com.esotericsoftware.kryo.KryoException: 
> java.io.EOFException: No more bytes left.}
> at 
> org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:284)
> ... 7 more
> Caused by: com.esotericsoftware.kryo.KryoException: java.io.EOFException: No 
> more bytes left.
> at 
> org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:79)
> at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
> at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:361)
> at 
> org.apache.flink.util.InstantiationUtil.deserializeFromByteArray(InstantiationUtil.java:536)
> at 
> org.apache.flink.table.dataformat.BinaryGeneric.getJavaObjectFromBinaryGeneric(BinaryGeneric.java:86)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$GenericConverter.toExternalImpl(DataFormatConverters.java:628)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$GenericConverter.toExternalImpl(DataFormatConverters.java:633)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toExternal(DataFormatConverters.java:320)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$PojoConverter.toExternalImpl(DataFormatConverters.java:1293)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$PojoConverter.toExternalImpl(DataFormatConverters.java:1257)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toExternal(DataFormatConverters.java:302)
> at GroupingWindowAggsHandler$57.setAccumulators(Unknown Source)
> at 
> 

[jira] [Updated] (FLINK-17453) KyroSerializer throws IndexOutOfBoundsException type java.util.PriorityQueue

2020-05-04 Thread Jiayi Liao (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayi Liao updated FLINK-17453:
---
Attachment: udaf

> KyroSerializer throws IndexOutOfBoundsException type java.util.PriorityQueue
> 
>
> Key: FLINK-17453
> URL: https://issues.apache.org/jira/browse/FLINK-17453
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System, Table SQL / Runtime
>Affects Versions: 1.9.0
>Reporter: Jiayi Liao
>Priority: Major
> Attachments: udaf
>
>
> We're using SQL UDAF with a {{PriorityQueue}} as {{Accumulator}}, and when 
> recovering from checkpoint, the error occurs.
> {code:java}
> 2020-04-28 22:28:18,659 INFO  
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer  - 
> IndexOutOfBoundsException type java.util.PriorityQueue source data is: [2, 0, 
> 0, 0, 0, 0, 0, 0].
> 2020-04-28 22:28:18,660 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - Attempting to fail task externally GroupWindowAggregate -> 
> Calc_select_live_id__2 -> SinkConversionToTupl -> Map -> Filter (37/40)- 
> execution # 0 (4636858426452f0a437d2f6d9564f34d).
> 2020-04-28 22:28:18,660 INFO  org.apache.flink.runtime.taskmanager.Task   
>   - GroupWindowAggregate -> Calc_select_live_id__2 -> 
> SinkConversionToTupl -> Map -> Filter (37/40)- execution # 0 
> (4636858426452f0a437d2f6d9564f34d) switched from RUNNING to FAILED.
> org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught 
> exception while processing timer.
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:967)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:941)
> at 
> org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:285)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: TimerException{com.esotericsoftware.kryo.KryoException: 
> java.io.EOFException: No more bytes left.}
> at 
> org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:284)
> ... 7 more
> Caused by: com.esotericsoftware.kryo.KryoException: java.io.EOFException: No 
> more bytes left.
> at 
> org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:79)
> at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
> at 
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:361)
> at 
> org.apache.flink.util.InstantiationUtil.deserializeFromByteArray(InstantiationUtil.java:536)
> at 
> org.apache.flink.table.dataformat.BinaryGeneric.getJavaObjectFromBinaryGeneric(BinaryGeneric.java:86)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$GenericConverter.toExternalImpl(DataFormatConverters.java:628)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$GenericConverter.toExternalImpl(DataFormatConverters.java:633)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toExternal(DataFormatConverters.java:320)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$PojoConverter.toExternalImpl(DataFormatConverters.java:1293)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$PojoConverter.toExternalImpl(DataFormatConverters.java:1257)
> at 
> org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toExternal(DataFormatConverters.java:302)
> at GroupingWindowAggsHandler$57.setAccumulators(Unknown Source)
> at 
> org.apache.flink.table.runtime.operators.window.internal.GeneralWindowProcessFunction.getWindowAggregationResult(GeneralWindowProcessFunction.java:73)
> at 
> 

[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419494982



##
File path: 
flink-tests/src/test/java/org/apache/flink/test/streaming/runtime/BackPressureITCase.java
##
@@ -94,7 +94,7 @@ private static Configuration 
createNetworkBufferConfiguration() {
final Configuration configuration = new Configuration();
 
final int memorySegmentSizeKb = 32;
-   final MemorySize networkBuffersMemory = 
MemorySize.parse(memorySegmentSizeKb * (NUM_TASKS + 2) + "kb");
+   final MemorySize networkBuffersMemory = 
MemorySize.parse(memorySegmentSizeKb * 6 + "kb");

Review comment:
   Previously the `LocalBufferPool` for `SingleInputGate` had 0 required 
buffers, but now we adjust it to guarantee at least one required buffer for 
local channel state recovery.
   
   In this ITCase, the exclusive buffers for the map and sink vertices should 
be `2 * 2`, and the floating buffers in `LocalBufferPool` should be `2 * 1`, 
so the total minimum buffer amount should be `6`.





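The buffer arithmetic from this review comment can be spelled out as a hedged sketch. `NetworkBufferBudget` is a helper invented here for illustration (not a Flink class); the numbers — 2 gates, 2 exclusive buffers, 1 floating buffer, 32 KB segments — come from the quoted discussion, giving 2 × (2 + 1) = 6 buffers and 6 × 32kb = 192kb.

```java
// Invented helper mirroring the ITCase computation
// MemorySize.parse(memorySegmentSizeKb * 6 + "kb").
class NetworkBufferBudget {
    // Total minimum buffers: (exclusive + floating) per gate, summed
    // across gates.
    static int minimumBuffers(int gates, int exclusivePerGate, int floatingPerGate) {
        return gates * (exclusivePerGate + floatingPerGate);
    }

    // Renders the budget the way the test builds its config string.
    static String memoryString(int buffers, int segmentSizeKb) {
        return (buffers * segmentSizeKb) + "kb";
    }
}
```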





[GitHub] [flink] rmetzger commented on pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


rmetzger commented on pull request #11554:
URL: https://github.com/apache/flink/pull/11554#issuecomment-623508089


   Sadly, Flinkbot does not re-run the CI if the last status is "UNKNOWN". You 
need to do another push to the branch (empty commit, rebase).







[jira] [Reopened] (FLINK-17327) Kafka unavailability could cause Flink TM shutdown

2020-05-04 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek reopened FLINK-17327:
--

I'm re-opening for now since I think the KafkaConsumer is working as designed, 
i.e. FLINK-16482 is not a bug (though I don't like the exception throwing 
behaviour).

Btw, the Kafka Producer is stuck on a lock, that's why the TM is eventually 
killed:
{code}
2020-05-04 16:43:21,297 WARN  org.apache.flink.runtime.taskmanager.Task 
- Task 'Map -> Sink: Unnamed (1/1)' did not react to cancelling 
signal for 30 seconds, but is stuck in method:
 sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.kafka.clients.producer.internals.TransactionalRequestResult.await(TransactionalRequestResult.java:50)
org.apache.kafka.clients.producer.KafkaProducer.commitTransaction(KafkaProducer.java:698)
org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer.commitTransaction(FlinkKafkaInternalProducer.java:103)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.recoverAndCommit(FlinkKafkaProducer.java:920)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.recoverAndCommit(FlinkKafkaProducer.java:98)
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.recoverAndCommitInternal(TwoPhaseCommitSinkFunction.java:405)
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.initializeState(TwoPhaseCommitSinkFunction.java:358)
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.initializeState(FlinkKafkaProducer.java:1042)
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:284)
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:989)
org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:453)
org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$122/1846623322.run(Unknown
 Source)
org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:448)
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:460)
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
java.lang.Thread.run(Thread.java:748)
{code}

> Kafka unavailability could cause Flink TM shutdown
> --
>
> Key: FLINK-17327
> URL: https://issues.apache.org/jira/browse/FLINK-17327
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.0
>Reporter: Jun Qin
>Priority: Major
> Attachments: Standalonesession.log, TM.log, TM_produer_only_task.log
>
>
> Steps to reproduce:
>  # Start a Flink 1.10 standalone cluster
>  # Run a Flink job which reads from one Kafka topic and writes to another 
> topic, with exactly-once checkpointing enabled
>  # Stop all Kafka Brokers after a few successful checkpoints
> When Kafka brokers are down:
>  # {{org.apache.kafka.clients.NetworkClient}} reported connection to broker 
> could not be established
>  # Then, Flink could not complete snapshot due to {{Timeout expired while 
> initializing transactional state in 6ms}}
>  # After several snapshot failures, Flink reported {{Too many ongoing 
> snapshots. Increase kafka producers pool size or decrease number of 
> concurrent checkpoints.}}
>  # Eventually, Flink tried to cancel the task which did not succeed within 3 
> min. According to logs, consumer was cancelled, but producer is still running
>  # Then {{Fatal error occurred while executing the TaskManager. Shutting it 
> down...}}
> I will attach the logs to show the details. Worth noting that if the task 
> had only a producer and no consumer, the behavior is different:
>  # {{org.apache.kafka.clients.NetworkClient}} 

[GitHub] [flink] becketqin opened a new pull request #11554: [FLINK-15101][connector/common] Add SourceCoordinator implementation.

2020-05-04 Thread GitBox


becketqin opened a new pull request #11554:
URL: https://github.com/apache/flink/pull/11554


   ## What is the purpose of the change
   This patch is a part of 
[FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface).
 It adds the implementation for `SourceCoordinator` which extends 
`OperatorCoordinator`.
   
   ## Brief change log
   The following major classes are added:
   * SourceCoordinator
   * SourceCoordinatorContext
   * SourceCoordinatorProvider
   * SplitAssignmentTracker
   
   ## Verifying this change
   This change added related unit tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes)
 - The serializers: (yes)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes)
 - If yes, how is the feature documented? (JavaDocs)
   







[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419490643



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/BufferManager.java
##
@@ -65,14 +65,10 @@
@GuardedBy("bufferQueue")
private int numRequiredBuffers;
 
-   public BufferManager(
-   MemorySegmentProvider globalPool,
-   InputChannel inputChannel,
-   int numRequiredBuffers) {
-
-   this.globalPool = checkNotNull(globalPool);
-   this.inputChannel = checkNotNull(inputChannel);
+   public BufferManager(InputChannel inputChannel, int numRequiredBuffers) {
checkArgument(numRequiredBuffers >= 0);
+   this.inputChannel = checkNotNull(inputChannel);
+   this.globalPool = inputChannel.inputGate.getMemorySegmentProvider();

Review comment:
   Actually I injected it in the constructor in the previous version, but since we cannot completely get rid of `InputChannel` in `BufferManager`, as said above, I removed this unnecessary constructor argument.









[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419489175



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/BufferManager.java
##
@@ -0,0 +1,356 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition.consumer;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.core.memory.MemorySegment;
+import org.apache.flink.core.memory.MemorySegmentProvider;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferListener;
+import org.apache.flink.runtime.io.network.buffer.BufferPool;
+import org.apache.flink.runtime.io.network.buffer.BufferRecycler;
+import org.apache.flink.runtime.io.network.buffer.NetworkBuffer;
+import org.apache.flink.util.ExceptionUtils;
+
+import javax.annotation.Nullable;
+import javax.annotation.concurrent.GuardedBy;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * The general buffer manager used by {@link InputChannel} to request/recycle
+ * exclusive or floating buffers.
+ */
+public class BufferManager implements BufferListener, BufferRecycler {
+
+   /** The available buffer queue wraps both exclusive and requested floating buffers. */
+   private final AvailableBufferQueue bufferQueue = new AvailableBufferQueue();
+
+   /** The buffer provider for requesting exclusive buffers. */
+   private final MemorySegmentProvider globalPool;
+
+   /** The input channel to own this buffer manager. */
+   private final InputChannel inputChannel;
+
+   /** The tag indicates whether it is waiting for additional floating buffers from the buffer pool. */
+   @GuardedBy("bufferQueue")
+   private boolean isWaitingForFloatingBuffers;
+
+   /** The total number of required buffers for the respective input channel. */
+   @GuardedBy("bufferQueue")
+   private int numRequiredBuffers;
+
+   public BufferManager(
+   MemorySegmentProvider globalPool,
+   InputChannel inputChannel,
+   int numRequiredBuffers) {
+
+   this.globalPool = checkNotNull(globalPool);
+   this.inputChannel = checkNotNull(inputChannel);
+   checkArgument(numRequiredBuffers >= 0);
+   this.numRequiredBuffers = numRequiredBuffers;
+   }
+
+   // ------------------------------------------------------------------------
+   // Buffer request
+   // ------------------------------------------------------------------------
+
+   @Nullable
+   Buffer requestBuffer() {
+   synchronized (bufferQueue) {
+   return bufferQueue.takeBuffer();
+   }
+   }
+
+   /**
+* Requests exclusive buffers from the provider and returns the number of requested amount.
+*/
+   int requestExclusiveBuffers() throws IOException {
+   Collection<MemorySegment> segments = globalPool.requestMemorySegments();
+   checkArgument(!segments.isEmpty(), "The number of exclusive buffers per channel should be larger than 0.");
+
+   synchronized (bufferQueue) {
+   for (MemorySegment segment : segments) {
+   bufferQueue.addExclusiveBuffer(new NetworkBuffer(segment, this), numRequiredBuffers);
+   }
+   }
+   return segments.size();
+   }
+
+   /**
+* Requests floating buffers from the buffer pool based on the given required amount, and returns the actual
+* requested amount. If the required amount is not fully satisfied, it will register as a listener.
+*/
+   int requestFloatingBuffers(int numRequired) throws IOException {
+ 

[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419487035



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/BufferManager.java
##
@@ -0,0 +1,356 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.io.network.partition.consumer;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.core.memory.MemorySegment;
+import org.apache.flink.core.memory.MemorySegmentProvider;
+import org.apache.flink.runtime.io.network.buffer.Buffer;
+import org.apache.flink.runtime.io.network.buffer.BufferListener;
+import org.apache.flink.runtime.io.network.buffer.BufferPool;
+import org.apache.flink.runtime.io.network.buffer.BufferRecycler;
+import org.apache.flink.runtime.io.network.buffer.NetworkBuffer;
+import org.apache.flink.util.ExceptionUtils;
+
+import javax.annotation.Nullable;
+import javax.annotation.concurrent.GuardedBy;
+
+import java.io.IOException;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+import static org.apache.flink.util.Preconditions.checkState;
+
+/**
+ * The general buffer manager used by {@link InputChannel} to request/recycle
+ * exclusive or floating buffers.
+ */
+public class BufferManager implements BufferListener, BufferRecycler {
+
+   /** The available buffer queue wraps both exclusive and requested floating buffers. */
+   private final AvailableBufferQueue bufferQueue = new AvailableBufferQueue();
+
+   /** The buffer provider for requesting exclusive buffers. */
+   private final MemorySegmentProvider globalPool;
+
+   /** The input channel to own this buffer manager. */
+   private final InputChannel inputChannel;
+
+   /** The tag indicates whether it is waiting for additional floating buffers from the buffer pool. */
+   @GuardedBy("bufferQueue")
+   private boolean isWaitingForFloatingBuffers;
+
+   /** The total number of required buffers for the respective input channel. */
+   @GuardedBy("bufferQueue")
+   private int numRequiredBuffers;
+
+   public BufferManager(
+   MemorySegmentProvider globalPool,
+   InputChannel inputChannel,

Review comment:
   Yeah, I also noticed this cyclic dependency issue during the implementation, and I tried to break it where that was easy to do. But I found it actually needs more effort, because there are many interactions between `InputChannel` and `BufferManager`.
   
   1. `BufferManager` relies on `BufferPool`, so during `SingleInputGate#setup` the respective `BufferPool` needs to be registered for every `InputChannel#BufferManager`. This is the main reason the current implementation bypasses this issue.
   
   2. We would also need to maintain a separate `isReleased` variable inside `BufferManager` so that it does not rely on `InputChannel#isReleased`.
   
   3. We might need another separate interface, implemented by `InputChannel`, so that we can decouple the other two dependent methods `InputChannel#onError` and `InputChannel#notifyBufferAvailable`.
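   As a rough illustration of point 3, the decoupling could look like this (a sketch only; the interface and class names are hypothetical, not Flink code):

```java
// Hypothetical sketch: a narrow callback interface that InputChannel could
// implement, so BufferManager need not depend on the concrete class.
interface BufferOwner {
    void onError(Throwable cause);
    void notifyBufferAvailable(int numBuffers);
}

class DecoupledBufferManager {
    private final BufferOwner owner;       // instead of a direct InputChannel field
    private volatile boolean isReleased;   // point 2: its own released flag

    DecoupledBufferManager(BufferOwner owner) {
        this.owner = owner;
    }

    void release() {
        isReleased = true;
    }

    // Recycled buffers are only announced while the manager is not released.
    void onBuffersRecycled(int numBuffers) {
        if (!isReleased) {
            owner.notifyBufferAvailable(numBuffers);
        }
    }
}
```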









[GitHub] [flink] flinkbot edited a comment on pull request #11984: [FLINK-17501][qs] Improve logging in AbstractServerHandler#channelRead(ChannelHandlerContext, Object)

2020-05-04 Thread GitBox


flinkbot edited a comment on pull request #11984:
URL: https://github.com/apache/flink/pull/11984#issuecomment-623399971


   
   ## CI report:
   
   * 3f9eb6a6f30b0a00b44eef6f3b331da442c0ddab Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=559)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] pnowojski commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


pnowojski commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419484431



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/reader/AbstractRecordReader.java
##
@@ -63,6 +63,8 @@ protected AbstractRecordReader(InputGate inputGate, String[] 
tmpDirectories) {
}
 
protected boolean getNextRecord(T target) throws IOException, 
InterruptedException {
+   inputGate.requestPartitions();

Review comment:
   Can't we keep the previous setup logic when unaligned checkpoints are disabled or not configured (that would include batch)? And add a `checkState` somewhere asserting that unaligned checkpoints cannot be used without `StreamTask`/outside of streaming, or something like that?









[jira] [Comment Edited] (FLINK-17464) Standalone HA Cluster crash with non-recoverable cluster state - need to wipe cluster to recover service

2020-05-04 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098982#comment-17098982
 ] 

Till Rohrmann edited comment on FLINK-17464 at 5/4/20, 2:35 PM:


Thanks for reporting this issue [~johnlon]. Your description of Flink's 
behaviour is correct. The reasoning behind this behaviour is that Flink must 
not lose jobs due to transient exceptions. When recovering jobs, Flink needs to 
interact with external systems and it can happen that certain operations fail. 
If Flink encounters a problem, it takes the conservative approach to fail the 
complete process so that a new process can take over and try again to recover 
the persisted jobs. This will work if the encountered problem eventually disappears.

Unfortunately, it won't work when the problem repeats deterministically as in 
your case or if someone would meddle around with some internal state of Flink 
(e.g. removing persisted blobs belonging to a submitted job). This problem is even more problematic in the case of a session cluster, where other jobs are affected by one faulty job.

Ideally, one would like to distinguish between transient exceptions and 
deterministic ones. If this were possible, then one could retry for the former 
ones and fail the jobs in case one encounters the latter ones. Since this is in 
general a hard problem for which I don't know a good solution, I guess it is a 
good proposal to make the failure behaviour in case of recoveries configurable. 
As you have suggested such a sandbox mode could simply transition the job into 
a failed state instead of failing the whole process. 

The drawback of such a mode would be that you might fail some jobs which might 
be recoverable if retried a bit more.
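The distinction sketched above could in principle look like the following (an illustrative sketch only, under the stated assumption that a reliable transient/deterministic classifier existed, which is exactly the hard part; none of these names are Flink code):

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Illustrative sketch: retry recovery for exceptions classified as transient,
// and give up (failing just this job, not the process) on deterministic ones.
class RecoveryRetrySketch {
    static <T> T recoverWithRetry(Callable<T> recovery,
                                  Predicate<Exception> isTransient,
                                  int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return recovery.call();
            } catch (Exception e) {
                if (!isTransient.test(e)) {
                    throw e; // deterministic: fail this job only, keep the process alive
                }
                last = e;    // transient: try again
            }
        }
        throw last; // transient but retries exhausted
    }
}
```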


was (Author: till.rohrmann):
Thanks for reporting this issue [~johnlon]. Your description of Flink's 
behaviour is correct. The reasoning behind this behaviour is that Flink must 
not lose jobs due to ephemeral exceptions. When recovering jobs, Flink needs to 
interact with external systems and it can happen that certain operations fail. 
If Flink encounters a problem, it takes the conservative approach to fail the 
complete process so that a new process can take over and try again to recover 
the persisted jobs. This will work if the encountered problem eventually disappears.

Unfortunately, it won't work when the problem repeats deterministically as in 
your case or if someone would meddle around with some internal state of Flink 
(e.g. removing persisted blobs belonging to a submitted job). This problem is even more problematic in the case of a session cluster, where other jobs are affected by one faulty job.

Ideally, one would like to distinguish between ephemeral exceptions and 
deterministic ones. If this were possible, then one could retry for the former 
ones and fail the jobs in case one encounters the latter ones. Since this is in 
general a hard problem for which I don't know a good solution, I guess it is a 
good proposal to make the failure behaviour in case of recoveries configurable. 
As you have suggested such a sandbox mode could simply transition the job into 
a failed state instead of failing the whole process. 

The drawback of such a mode would be that you might fail some jobs which might 
be recoverable if retried a bit more.

> Standalone HA Cluster crash with non-recoverable cluster state - need to wipe 
> cluster to recover service
> ---
>
> Key: FLINK-17464
> URL: https://issues.apache.org/jira/browse/FLINK-17464
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: John Lonergan
>Priority: Critical
>
> When recovering job graphs after a failover of the JobManager, or after a 
> restart of the cluster, the HA Cluster can get into a state where it cannot 
> be restarted, and the only resolution we have identified is to destroy the 
> ZooKeeper job graph store.
> This happens when any job graph that is being recovered throws an exception 
> during recovery on the master. 
> Whilst we encountered this issue on a sink that extends "InitialiseOnMaster", 
> we believe the vulnerability is generic in nature and the unrecoverable 
> problems encountered will occur if the application code throws any exception 
> for any reason during recovery on the main line. 
> These application exceptions propagate up to the JobManager ClusterEntryPoint 
> class at which point the JM leader does a system.exit. If there are remaining 
> JobManagers then they will also follow leader election and also encounter the 
> same sequence of events. Ultimately all JMs exit, and then all TMs fail 
> as well. 
> The entire cluster is destroyed.
> Because these events happen during job 

[jira] [Commented] (FLINK-17464) Standalone HA Cluster crash with non-recoverable cluster state - need to wipe cluster to recover service

2020-05-04 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098982#comment-17098982
 ] 

Till Rohrmann commented on FLINK-17464:
---

Thanks for reporting this issue [~johnlon]. Your description of Flink's 
behaviour is correct. The reasoning behind this behaviour is that Flink must 
not lose jobs due to ephemeral exceptions. When recovering jobs, Flink needs to 
interact with external systems and it can happen that certain operations fail. 
If Flink encounters a problem, it takes the conservative approach to fail the 
complete process so that a new process can take over and try again to recover 
the persisted jobs. This will work if the encountered problem eventually disappears.

Unfortunately, it won't work when the problem repeats deterministically as in 
your case or if someone would meddle around with some internal state of Flink 
(e.g. removing persisted blobs belonging to a submitted job). This problem is even more problematic in the case of a session cluster, where other jobs are affected by one faulty job.

Ideally, one would like to distinguish between ephemeral exceptions and 
deterministic ones. If this were possible, then one could retry for the former 
ones and fail the jobs in case one encounters the latter ones. Since this is in 
general a hard problem for which I don't know a good solution, I guess it is a 
good proposal to make the failure behaviour in case of recoveries configurable. 
As you have suggested such a sandbox mode could simply transition the job into 
a failed state instead of failing the whole process. 

The drawback of such a mode would be that you might fail some jobs which might 
be recoverable if retried a bit more.

> Standalone HA Cluster crash with non-recoverable cluster state - need to wipe 
> cluster to recover service
> ---
>
> Key: FLINK-17464
> URL: https://issues.apache.org/jira/browse/FLINK-17464
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: John Lonergan
>Priority: Critical
>
> When recovering job graphs after a failover of the JobManager, or after a 
> restart of the cluster, the HA Cluster can get into a state where it cannot 
> be restarted, and the only resolution we have identified is to destroy the 
> ZooKeeper job graph store.
> This happens when any job graph that is being recovered throws an exception 
> during recovery on the master. 
> Whilst we encountered this issue on a sink that extends "InitialiseOnMaster", 
> we believe the vulnerability is generic in nature and the unrecoverable 
> problems encountered will occur if the application code throws any exception 
> for any reason during recovery on the main line. 
> These application exceptions propagate up to the JobManager ClusterEntryPoint 
> class at which point the JM leader does a system.exit. If there are remaining 
> JobManagers then they will also follow leader election and also encounter the 
> same sequence of events. Ultimately all JM's exit and then all TM's fail 
> also. 
> The entire cluster is destroyed.
> Because these events happen during job graph recovery, merely attempting a 
> restart of the cluster will fail, leaving the only option as destroying the 
> job graph state. 
> If one is running a shared cluster with many jobs then this is effectively a 
> DOS and results in prolonged down time as code or data changes are necessary 
> to work around the issue.
> --
> Of course if the same exception were to be thrown during job submission using 
> the CLI, then we would not see the cluster crashing nor the cluster being 
> corrupted; the job would merely fail.
> Our feeling is that the job graph recovery process ought to behave in a 
> similar fashion to the job submission processes.
> If a job submission fails then the job is recorded as failed and there is no 
> further impact on the cluster. However, if job recovery fails then the entire 
> cluster is taken down and may, as we have seen, become inoperable.
> We feel that a failure to restore a single job graph ought merely to result 
> in the job being recorded as failed. It should not result in a cluster-wide 
> impact.
> We do not understand the logic of the design in this space. However, if the 
> existing logic was for the benefit of single job clusters then this is a poor 
> result for multi job clusters. In which case we ought to be able to configure 
> a cluster for "multi-job mode" so that job graph recovery is "sandboxed"  and 
> doesn't take out the entire cluster.
> ---
> It is easy to demonstrate the problem using the built in Flink streaming Word 
> Count example.
> In order for this to work you configure the job to write a single output file 
> and also write this to 

[jira] [Commented] (FLINK-11499) Extend StreamingFileSink BulkFormats to support arbitrary roll policies

2020-05-04 Thread Piotr Nowojski (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098975#comment-17098975
 ] 

Piotr Nowojski commented on FLINK-11499:


A small status update here.

1. I was doing some PoC in that direction and quickly realised that it would 
require modifying most of the existing StreamingFileSink (SFS) classes. 
SFS/Bucket/Buckets/… have hardcoded assumptions about working on a single path 
and lack support for reading or cleaning up/deleting files.
2. There is some concurrent effort on using StreamingFileSink with Hadoop based 
file systems, without using our RecoverableFileSystem abstraction, which would 
probably conflict with WAL changes ([~maguowei] is taking care of it).
3. We haven’t figured out how to deal with changes to the record format, for 
example on job upgrades. With the current SFS there are no issues with that, as 
there is no in-flight data: a record is written once, and once it is written we 
can completely forget about it. With a WAL, upon recovery we need to read such 
records, which creates a problem: what if the record schema/format has changed? 
This could be dealt with in a couple of ways (either supporting some 
migration/backward compatibility, or requiring a clean cut/completely empty WAL 
on job upgrade when using a savepoint), but either way it would be a source of 
extra complexity.

Because of that we started to consider going first with another approach to the 
problem: https://issues.apache.org/jira/browse/FLINK-17505 .


> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> ---
>
> Key: FLINK-11499
> URL: https://issues.apache.org/jira/browse/FLINK-11499
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / FileSystem
>Reporter: Seth Wiesman
>Priority: Major
>  Labels: usability
> Fix For: 1.11.0
>
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only be 
> combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress 
> part file on every checkpoint.
> However, many bulk formats such as parquet are most efficient when written as 
> large files; this is not possible when frequent checkpointing is enabled. 
> Currently the only work-around is to use long checkpoint intervals, which is 
> not ideal.
>  
> The StreamingFileSink should be enhanced to support arbitrary roll policies 
> so users may write large bulk files while retaining frequent checkpoints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhijiangW commented on a change in pull request #11687: [FLINK-16536][network][checkpointing] Implement InputChannel state recovery for unaligned checkpoint

2020-05-04 Thread GitBox


zhijiangW commented on a change in pull request #11687:
URL: https://github.com/apache/flink/pull/11687#discussion_r419472411



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java
##
@@ -321,111 +274,43 @@ private void notifyCreditAvailable() {
partitionRequestClient.notifyCreditAvailable(this);
}
 
-   /**
-* Exclusive buffer is recycled to this input channel directly and it 
may trigger return extra
-* floating buffer and notify increased credit to the producer.
-*
-* @param segment The exclusive segment of this channel.
-*/
-   @Override
-   public void recycle(MemorySegment segment) {
-   int numAddedBuffers;
-
-   synchronized (bufferQueue) {
-   // Similar to notifyBufferAvailable(), make sure that 
we never add a buffer
-   // after releaseAllResources() released all buffers 
(see below for details).
-   if (isReleased.get()) {
-   try {
-   
memorySegmentProvider.recycleMemorySegments(Collections.singletonList(segment));
-   return;
-   } catch (Throwable t) {
-   ExceptionUtils.rethrow(t);
-   }
-   }
-   numAddedBuffers = bufferQueue.addExclusiveBuffer(new 
NetworkBuffer(segment, this), numRequiredBuffers);
-   }
-
-   if (numAddedBuffers > 0 && 
unannouncedCredit.getAndAdd(numAddedBuffers) == 0) {
-   notifyCreditAvailable();
-   }
-   }
-
public int getNumberOfAvailableBuffers() {
-   synchronized (bufferQueue) {
-   return bufferQueue.getAvailableBufferSize();
-   }
+   return bufferManager.getNumberOfAvailableBuffers();
}
 
public int getNumberOfRequiredBuffers() {
-   return numRequiredBuffers;
+   return bufferManager.getNumberOfRequiredBuffers();
}
 
public int getSenderBacklog() {

Review comment:
   Exactly as you said; we can remove it in a separate ticket, or even as a hotfix commit in this PR if you prefer.








