[GitHub] [flink] flinkbot edited a comment on pull request #15458: [FLINK-22064][table] Don't submit statement set when no insert is added in the sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15458:
URL: https://github.com/apache/flink/pull/15458#issuecomment-811149954


   
   ## CI report:
   
   * e97c9b4bc77bb3842f3fb72396dcee2acac9fb5d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15918)
 
   * 8dcf34c88cd35a8e7d1e044a1adf9c6e46b16bd3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15952)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-22003) UnalignedCheckpointITCase fail

2021-03-31 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312877#comment-17312877
 ] 

Yun Gao commented on FLINK-22003:
-

Hi, I think [~roman_khachatryan] is exactly right about the conditions for 
successful triggering, but after reading the log I found there might still be 
an issue, since the source tasks 3/5, 4/5 and 5/5 transitioned to RUNNING on 
the JM side after checkpoint 11 was triggered. 

After analyzing the log, I think it is very likely that when the tasks to 
trigger were computed, the failover was not yet finished, so some sources 
might have been acquired and checked against their prior execution. It also 
seems FLINK-21067 indeed has a problem in that it ignores tasks in the 
CANCELING/CANCELED states, which differs from the prior logic. When this case 
happens, the prior execution fails the triggering and the checkpoint has to 
wait for the timeout before it is canceled. Very sorry for introducing the 
bug; I'll open a PR today to change the condition to only allow the RUNNING 
and FINISHED statuses. 

It also occurs to me that there _might be_ another case: when we compute the 
tasks to trigger and check their status, the failover might not have happened 
yet, so all the tasks are still running and the checkpoint gets triggered. The 
triggering would then fail because the prior execution is gone, and the 
checkpoint would again have to wait for the timeout. This behavior should 
already exist before FLINK-21067, if the case does occur. I'll double-check 
this case.
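The state check described above can be sketched as follows. This is a toy
Python sketch with hypothetical names (`can_trigger`,
`compute_tasks_to_trigger`), not Flink's actual `CheckpointCoordinator` code;
it only illustrates the proposed condition of allowing RUNNING and FINISHED
tasks and declining the checkpoint immediately for any other state:

```python
from enum import Enum, auto

class ExecutionState(Enum):
    CREATED = auto()
    RUNNING = auto()
    FINISHED = auto()
    CANCELING = auto()
    CANCELED = auto()
    FAILED = auto()

def can_trigger(task_state: ExecutionState) -> bool:
    # Only RUNNING (or already FINISHED) tasks may take part in triggering;
    # tasks still in a failover-related state must decline the checkpoint
    # right away instead of letting it wait for the timeout.
    return task_state in (ExecutionState.RUNNING, ExecutionState.FINISHED)

def compute_tasks_to_trigger(tasks):
    # tasks: list of (task_name, ExecutionState) pairs.
    if not all(can_trigger(state) for _, state in tasks):
        raise RuntimeError("checkpoint declined: task not in RUNNING/FINISHED")
    return [name for name, _ in tasks]
```

Declining eagerly is the point of the fix: the failure surfaces at trigger
time rather than as a checkpoint that silently times out.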

> UnalignedCheckpointITCase fail
> --
>
> Key: FLINK-22003
> URL: https://issues.apache.org/jira/browse/FLINK-22003
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.13.0
>Reporter: Guowei Ma
>Priority: Major
>  Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15601=logs=119bbba7-f5e3-5e08-e72d-09f1529665de=7dc1f5a9-54e1-502e-8b02-c7df69073cfc=4142
> {code:java}
> [ERROR] execute[parallel pipeline with remote channels, p = 
> 5](org.apache.flink.test.checkpointing.UnalignedCheckpointITCase)  Time 
> elapsed: 60.018 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 6 
> milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1859)
>   at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:69)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1839)
>   at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1822)
>   at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointTestBase.execute(UnalignedCheckpointTestBase.java:138)
>   at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointITCase.execute(UnalignedCheckpointITCase.java:184)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wangyang0918 commented on a change in pull request #15396: [FLINK-21008][coordination] Register a shutdown supplier in the SignalHandler for ClusterEntrypoint

2021-03-31 Thread GitBox


wangyang0918 commented on a change in pull request #15396:
URL: https://github.com/apache/flink/pull/15396#discussion_r605389503



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/util/SignalHandler.java
##
@@ -53,6 +63,13 @@ public void handle(Signal signal) {
 "RECEIVED SIGNAL {}: SIG{}. Shutting down as requested.",
 signal.getNumber(),
 signal.getName());
+try {
+shutdownSupplier
+.get()
+.get(SHUTDOWN_TIMEOUT.toMilliseconds(), 
TimeUnit.MILLISECONDS);

Review comment:
   I am not aware of other signals that terminate the JVM process. On Yarn 
and Kubernetes, when a container needs to be terminated, a `SIGTERM` is sent 
first, followed by a `SIGKILL` after a timeout. 
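The pattern the quoted diff implements (start the asynchronous shutdown, then
wait for it with a bounded timeout so the handler never blocks until the
`SIGKILL` arrives) can be sketched in Python; the names and the timeout value
here are hypothetical, not Flink's actual `SignalHandler`:

```python
import concurrent.futures
import signal

# Hypothetical value; the PR under review uses a SHUTDOWN_TIMEOUT constant.
SHUTDOWN_TIMEOUT_SECONDS = 5.0

def await_shutdown(future, timeout=SHUTDOWN_TIMEOUT_SECONDS):
    """Wait for an async shutdown to finish, but never longer than the timeout."""
    try:
        future.result(timeout=timeout)
        return True
    except concurrent.futures.TimeoutError:
        # A SIGKILL may follow; do not block the signal handler forever.
        return False

def install_sigterm_handler(shutdown_supplier):
    """On SIGTERM, start the shutdown and wait for it with a bounded timeout."""
    def handler(signum, frame):
        await_shutdown(shutdown_supplier())
    signal.signal(signal.SIGTERM, handler)
```

The bounded wait is what keeps the process responsive to the follow-up
`SIGKILL` described above.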




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on a change in pull request #15396: [FLINK-21008][coordination] Register a shutdown supplier in the SignalHandler for ClusterEntrypoint

2021-03-31 Thread GitBox


wangyang0918 commented on a change in pull request #15396:
URL: https://github.com/apache/flink/pull/15396#discussion_r605388703



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/util/SignalHandler.java
##
@@ -53,6 +63,13 @@ public void handle(Signal signal) {
 "RECEIVED SIGNAL {}: SIG{}. Shutting down as requested.",
 signal.getNumber(),
 signal.getName());
+try {
+shutdownSupplier
+.get()
+.get(SHUTDOWN_TIMEOUT.toMilliseconds(), 
TimeUnit.MILLISECONDS);

Review comment:
   I do not think `SIGKILL` can be handled via `sun.misc.SignalHandler`. 
`SIGKILL` means a force-kill; the JVM does not get a chance to run the 
signal handler or the shutdown hooks.
   
   Currently, the `ClusterEntrypoint` exits when receiving `SIGTERM`, 
`SIGHUP`, or `SIGINT`, so the `shutdownSupplier` will be executed in those 
cases.
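This is an operating-system guarantee rather than a JVM limitation: `SIGKILL`
(like `SIGSTOP`) cannot be caught, blocked, or ignored by any process. A quick
illustration in Python, using a hypothetical helper written only for this
demonstration:

```python
import signal

def try_install(sig):
    """Return True if a handler could be installed for the given signal."""
    try:
        signal.signal(sig, lambda signum, frame: None)
        return True
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGKILL and SIGSTOP.
        return False
```

Installing a `SIGTERM` handler succeeds, while the same call for `SIGKILL`
fails at the OS level, which is why only `SIGTERM`/`SIGHUP`/`SIGINT` can run
the `shutdownSupplier`.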




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-22082) Nested projection push down doesn't work for data such as row(array(row))

2021-03-31 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312872#comment-17312872
 ] 

Jark Wu commented on FLINK-22082:
-

{{FileSystemTableSource}} doesn't support nested projection pushdown yet. 
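For context, nested projection pushdown means the source reads only the leaf
fields a query actually touches instead of the full nested row. A toy
illustration in plain Python (the `project_nested` helper and dotted-path
convention are hypothetical; this is not Flink's `SupportsProjectionPushDown`
API):

```python
def project_nested(record: dict, paths: list) -> dict:
    """Keep only the given dotted leaf paths of a nested dict, e.g. 'Result.data'."""
    out = {}
    for path in paths:
        src, dst = record, out
        keys = path.split(".")
        for key in keys[:-1]:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[keys[-1]] = src[keys[-1]]
    return out
```

A source that supports the optimization would receive such paths from the
planner and skip deserializing everything else.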

> Nested projection push down doesn't work for data such as row(array(row))
> -
>
> Key: FLINK-22082
> URL: https://issues.apache.org/jira/browse/FLINK-22082
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Dian Fu
>Priority: Major
>
> For the following job:
> {code}
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment
> config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, config)
> source_ddl = """
> CREATE TABLE InTable (
> `ID` STRING,
> `Timestamp` TIMESTAMP(3),
> `Result` ROW(
> `data` ROW(`value` BIGINT) ARRAY),
> WATERMARK FOR `Timestamp` AS `Timestamp`
> ) WITH (
> 'connector' = 'filesystem',
> 'format' = 'json',
> 'path' = '/tmp/1.txt'
> )
> """
> sink_ddl = """
> CREATE TABLE OutTable (
> `ID` STRING,
> `value` BIGINT
> ) WITH (
> 'connector' = 'print'
> )
> """
> t_env.execute_sql(source_ddl)
> t_env.execute_sql(sink_ddl)
> table = t_env.from_path('InTable')
> table \
> .select(
> table.ID,
> table.Result.data.at(1).value) \
> .execute_insert('OutTable') \
> .wait()
> {code}
> It will throw the following exception:
> {code}
> : scala.MatchError: ITEM($2.data, 1) (of class org.apache.calcite.rex.RexCall)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.internalVisit$1(NestedProjectionUtil.scala:273)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:283)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:269)
>   at org.apache.calcite.rex.RexFieldAccess.accept(RexFieldAccess.java:92)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:112)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:111)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$.build(NestedProjectionUtil.scala:111)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil.build(NestedProjectionUtil.scala)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.getUsedFieldsInTopLevelProjectAndWatermarkAssigner(ProjectWatermarkAssignerTransposeRule.java:155)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.matches(ProjectWatermarkAssignerTransposeRule.java:65)
> {code}
> See 
> https://stackoverflow.com/questions/66888486/pyflink-extract-nested-fields-from-json-array
>  for more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wangyang0918 commented on a change in pull request #15396: [FLINK-21008][coordination] Register a shutdown supplier in the SignalHandler for ClusterEntrypoint

2021-03-31 Thread GitBox


wangyang0918 commented on a change in pull request #15396:
URL: https://github.com/apache/flink/pull/15396#discussion_r605386244



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/ClusterEntrypoint.java
##
@@ -366,7 +366,7 @@ protected MetricRegistryImpl createMetricRegistry(
 return shutDownAsync(
 ApplicationStatus.UNKNOWN,
 "Cluster entrypoint has been closed externally.",
-true)
+false)

Review comment:
   I will do that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15442: [FLINK-22081][flink-core] handle entropy injection metadata path in pluggable HadoopS3FileSystem

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15442:
URL: https://github.com/apache/flink/pull/15442#issuecomment-810703449


   
   ## CI report:
   
   * 35f4bea03e4bfd96ca125d9980bc3fcdced52d79 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15850)
 
   * f7bb6c41ecbeee4c64486a29f73c4cf69196bd1d UNKNOWN
   * 9ebb37a4e839400a0c392335e984e3b01bd8079f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15407: [FLINK-21942][coordination] Remove job from LeaderIdService when disconnecting JobManager with globally terminal state

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15407:
URL: https://github.com/apache/flink/pull/15407#issuecomment-809165650


   
   ## CI report:
   
   * 5611a9d38557ba108df1e64b9f306f5cf98268d5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15848)
 
   * 2107b03de95d36a1d547ff5ac438b18ea07a328a UNKNOWN
   * 1f7d1658c2c8b0eed83a14ee4b5d55d79da5c21d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on pull request #15407: [FLINK-21942][coordination] Remove job from LeaderIdService when disconnecting JobManager with globally terminal state

2021-03-31 Thread GitBox


wangyang0918 commented on pull request #15407:
URL: https://github.com/apache/flink/pull/15407#issuecomment-811652456


   @tillrohrmann Thanks for your review. Comments addressed and force pushed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-22082) Nested projection push down doesn't work for data: row(array(row))

2021-03-31 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312868#comment-17312868
 ] 

Dian Fu commented on FLINK-22082:
-

cc [~fsk119]

> Nested projection push down doesn't work for data: row(array(row))
> --
>
> Key: FLINK-22082
> URL: https://issues.apache.org/jira/browse/FLINK-22082
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Dian Fu
>Priority: Major
>
> For the following job:
> {code}
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment
> config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, config)
> source_ddl = """
> CREATE TABLE InTable (
> `ID` STRING,
> `Timestamp` TIMESTAMP(3),
> `Result` ROW(
> `data` ROW(`value` BIGINT) ARRAY),
> WATERMARK FOR `Timestamp` AS `Timestamp`
> ) WITH (
> 'connector' = 'filesystem',
> 'format' = 'json',
> 'path' = '/tmp/1.txt'
> )
> """
> sink_ddl = """
> CREATE TABLE OutTable (
> `ID` STRING,
> `value` BIGINT
> ) WITH (
> 'connector' = 'print'
> )
> """
> t_env.execute_sql(source_ddl)
> t_env.execute_sql(sink_ddl)
> table = t_env.from_path('InTable')
> table \
> .select(
> table.ID,
> table.Result.data.at(1).value) \
> .execute_insert('OutTable') \
> .wait()
> {code}
> It will throw the following exception:
> {code}
> : scala.MatchError: ITEM($2.data, 1) (of class org.apache.calcite.rex.RexCall)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.internalVisit$1(NestedProjectionUtil.scala:273)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:283)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:269)
>   at org.apache.calcite.rex.RexFieldAccess.accept(RexFieldAccess.java:92)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:112)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:111)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$.build(NestedProjectionUtil.scala:111)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil.build(NestedProjectionUtil.scala)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.getUsedFieldsInTopLevelProjectAndWatermarkAssigner(ProjectWatermarkAssignerTransposeRule.java:155)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.matches(ProjectWatermarkAssignerTransposeRule.java:65)
> {code}
> See 
> https://stackoverflow.com/questions/66888486/pyflink-extract-nested-fields-from-json-array
>  for more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22082) Nested projection push down doesn't work for data such as row(array(row))

2021-03-31 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-22082:

Summary: Nested projection push down doesn't work for data such as 
row(array(row))  (was: Nested projection push down doesn't work for data: 
row(array(row)))

> Nested projection push down doesn't work for data such as row(array(row))
> -
>
> Key: FLINK-22082
> URL: https://issues.apache.org/jira/browse/FLINK-22082
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.12.0, 1.13.0
>Reporter: Dian Fu
>Priority: Major
>
> For the following job:
> {code}
> from pyflink.datastream import StreamExecutionEnvironment
> from pyflink.table import TableConfig, StreamTableEnvironment
> config = TableConfig()
> env = StreamExecutionEnvironment.get_execution_environment()
> t_env = StreamTableEnvironment.create(env, config)
> source_ddl = """
> CREATE TABLE InTable (
> `ID` STRING,
> `Timestamp` TIMESTAMP(3),
> `Result` ROW(
> `data` ROW(`value` BIGINT) ARRAY),
> WATERMARK FOR `Timestamp` AS `Timestamp`
> ) WITH (
> 'connector' = 'filesystem',
> 'format' = 'json',
> 'path' = '/tmp/1.txt'
> )
> """
> sink_ddl = """
> CREATE TABLE OutTable (
> `ID` STRING,
> `value` BIGINT
> ) WITH (
> 'connector' = 'print'
> )
> """
> t_env.execute_sql(source_ddl)
> t_env.execute_sql(sink_ddl)
> table = t_env.from_path('InTable')
> table \
> .select(
> table.ID,
> table.Result.data.at(1).value) \
> .execute_insert('OutTable') \
> .wait()
> {code}
> It will throw the following exception:
> {code}
> : scala.MatchError: ITEM($2.data, 1) (of class org.apache.calcite.rex.RexCall)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.internalVisit$1(NestedProjectionUtil.scala:273)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:283)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:269)
>   at org.apache.calcite.rex.RexFieldAccess.accept(RexFieldAccess.java:92)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:112)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:111)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$.build(NestedProjectionUtil.scala:111)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil.build(NestedProjectionUtil.scala)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.getUsedFieldsInTopLevelProjectAndWatermarkAssigner(ProjectWatermarkAssignerTransposeRule.java:155)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.matches(ProjectWatermarkAssignerTransposeRule.java:65)
> {code}
> See 
> https://stackoverflow.com/questions/66888486/pyflink-extract-nested-fields-from-json-array
>  for more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22082) Nested projection push down doesn't work for data: row(array(row))

2021-03-31 Thread Dian Fu (Jira)
Dian Fu created FLINK-22082:
---

 Summary: Nested projection push down doesn't work for data: 
row(array(row))
 Key: FLINK-22082
 URL: https://issues.apache.org/jira/browse/FLINK-22082
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.12.0, 1.13.0
Reporter: Dian Fu


For the following job:

{code}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import TableConfig, StreamTableEnvironment

config = TableConfig()
env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env, config)

source_ddl = """
CREATE TABLE InTable (
`ID` STRING,
`Timestamp` TIMESTAMP(3),
`Result` ROW(
`data` ROW(`value` BIGINT) ARRAY),
WATERMARK FOR `Timestamp` AS `Timestamp`
) WITH (
'connector' = 'filesystem',
'format' = 'json',
'path' = '/tmp/1.txt'
)
"""

sink_ddl = """
CREATE TABLE OutTable (
`ID` STRING,
`value` BIGINT
) WITH (
'connector' = 'print'
)
"""

t_env.execute_sql(source_ddl)
t_env.execute_sql(sink_ddl)

table = t_env.from_path('InTable')
table \
.select(
table.ID,
table.Result.data.at(1).value) \
.execute_insert('OutTable') \
.wait()
{code}

It will throw the following exception:
{code}
: scala.MatchError: ITEM($2.data, 1) (of class org.apache.calcite.rex.RexCall)
at 
org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.internalVisit$1(NestedProjectionUtil.scala:273)
at 
org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:283)
at 
org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:269)
at org.apache.calcite.rex.RexFieldAccess.accept(RexFieldAccess.java:92)
at 
org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:112)
at 
org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:111)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$.build(NestedProjectionUtil.scala:111)
at 
org.apache.flink.table.planner.plan.utils.NestedProjectionUtil.build(NestedProjectionUtil.scala)
at 
org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.getUsedFieldsInTopLevelProjectAndWatermarkAssigner(ProjectWatermarkAssignerTransposeRule.java:155)
at 
org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.matches(ProjectWatermarkAssignerTransposeRule.java:65)
{code}

See 
https://stackoverflow.com/questions/66888486/pyflink-extract-nested-fields-from-json-array
 for more details



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312867#comment-17312867
 ] 

Chen Qin commented on FLINK-22081:
--

[~AHeise] could you assign this Jira to me and help review pr?

> Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin
> ---
>
> Key: FLINK-22081
> URL: https://issues.apache.org/jira/browse/FLINK-22081
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Reporter: Chen Qin
>Assignee: Prem Santosh
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 
> 1.11.3, 1.11.4, 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.12.3
>
> Attachments: image (13).png
>
>
> Using flink 1.11.2
> I added the flink-s3-fs-hadoop jar in the plugins dir but I am seeing 
> checkpoint paths like 
> {{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}},
>  which means the entropy injection key is not being resolved. After some 
> debugging I found that in the 
> [EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
>  the {{getEntropyFs}} method checks whether the given file system is a 
> {{SafetyNetWrapperFileSystem}} (and, if so, unwraps its delegate), but does 
> not check for 
> {{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
>  directly, which is the type the file system has when the S3 file system 
> dependencies are added as a plugin.
>  
> Repro steps: 
> Run Flink 1.11.2 with flink-s3-fs-hadoop as a plugin and turn on entropy 
> injection with the key _entropy_.
> Observe that the checkpoint dir still contains the entropy marker:
> s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  
> Compare to the marker being resolved when running Flink 1.9.1:
> s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  
> Add some logging to getEntropyFs and observe that it returns null, because 
> the passed-in parameter is not a {{SafetyNetWrapperFileSystem}} but a 
> {{ClassLoaderFixingFileSystem}}.
> Apply the patch, build a release, and run the same job; the issue is 
> resolved, as the attachment shows.
>  
>  
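Conceptually, the entropy resolution being skipped here is a simple
substitution: the configured entropy key in the checkpoint path is replaced by
random characters so that S3 spreads the objects across key-space partitions.
A minimal sketch in Python (hypothetical function and defaults; not Flink's
actual `EntropyInjector` API):

```python
import random
import string

def resolve_entropy(path: str, entropy_key: str = "_entropy_", length: int = 4) -> str:
    """Replace the entropy marker in a checkpoint path with random characters.

    If the marker is never replaced (the bug described above), the literal
    key ends up verbatim in the S3 object key.
    """
    if entropy_key not in path:
        return path
    entropy = "".join(random.choices(string.ascii_lowercase + string.digits, k=length))
    return path.replace(entropy_key, entropy, 1)
```

The reported symptom corresponds to this substitution never running because
the wrapper type check misses `ClassLoaderFixingFileSystem`.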



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Description: 
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in the plugins dir but I am seeing 
checkpoint paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}},
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 the {{getEntropyFs}} method checks whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, unwraps its delegate), but does 
not check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which is the type the file system has when the S3 file system 
dependencies are added as a plugin.

 

Repro steps: 

Run Flink 1.11.2 with flink-s3-fs-hadoop as a plugin and turn on entropy 
injection with the key _entropy_.
Observe that the checkpoint dir still contains the entropy marker:
s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  
Compare to the marker being resolved when running Flink 1.9.1:
s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  

Add some logging to getEntropyFs and observe that it returns null, because 
the passed-in parameter is not a {{SafetyNetWrapperFileSystem}} but a 
{{ClassLoaderFixingFileSystem}}.

Apply the patch, build a release, and run the same job; the issue is 
resolved, as the attachment shows.

 

 

  was:
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check if the given fileSystem is of type {{ClassLoaderFixingFileSystem}} 
and if so we check if the filesysystem is of type {{SafetyNetWrapperFileSystem 
as well as it's delegate }}but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly in getEntorpyFs method which would be the type if S3 file system 
dependencies are added as a plugin.

 

Repro steps: 

Flink 1.11.2 with flink-s3-fs-hadoop as plugin and turn on entropy injection 
key _entropy_

observe checkpoint dir with entropy marker not removed.

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  

compare to removed when running Flink 1.9.1

s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  

 

Add some logging to getEntropyFs, observe it return null because passed in 
parameter is not {{SafetyNetWrapperFileSystem}} but 
{{ClassLoaderFixingFileSystem}}

 

Apply patch, build release and run same job, resolved issue as attachment shows

 

 


> Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin
> ---
>
> Key: FLINK-22081
> URL: https://issues.apache.org/jira/browse/FLINK-22081
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystems
>Reporter: Chen Qin
>Assignee: Prem Santosh
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 
> 1.11.3, 1.11.4, 1.12.0, 1.12.1, 1.12.2, 1.13.0, 1.12.3
>
> Attachments: image (13).png
>
>
> Using flink 1.11.2
> I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
> checkpoints paths like 
> {{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
>  which means the entropy injection key is not being resolved. After some 
> debugging I found that in the 
> [EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
>  we check in the getEntropyFs method whether the given file system is a 
> {{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
> {{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
>  directly, which would be the wrapper type if the S3 file system 
> dependencies are added as a 

[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Description: 
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

 

Repro steps: 

Flink 1.11.2 with flink-s3-fs-hadoop as plugin and turn on entropy injection 
key _entropy_

observe checkpoint dir with entropy marker not removed.

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  

compare to removed when running Flink 1.9.1

s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  

 

Add some logging to getEntropyFs, observe it return null because passed in 
parameter is not {{SafetyNetWrapperFileSystem}} but 
{{ClassLoaderFixingFileSystem}}

 

Apply patch, build release and run same job, resolved issue as attachment shows

 

 

  was:
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

 

Repro steps: 

Flink 1.11.2 with flink-s3-fs-hadoop as plugin and turn on entropy injection 
key _entropy_

observe checkpoint dir with entropy marker not removed.

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  

compare to removed when running Flink 1.9.1

s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  

 

Add some logging to getEntropyFs, observe it return null because passed in 
parameter is not {{SafetyNetWrapperFileSystem}} but 
{{ClassLoaderFixingFileSystem}}

 

Apply patch, build release and run same job

 

 



[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Attachment: image (13).png




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Description: 
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

 

Repro steps: 

Flink 1.11.2 with flink-s3-fs-hadoop as plugin and turn on entropy injection 
key _entropy_

observe checkpoint dir with entropy marker not removed.

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/jobid/chk-5/  

compare to removed when running Flink 1.9.1

s3a://xxx/dev/checkpoints/xenon/event-stream-splitter/jobid/chk-5/  

 

Add some logging to getEntropyFs, observe it return null because passed in 
parameter is not {{SafetyNetWrapperFileSystem}} but 
{{ClassLoaderFixingFileSystem}}

 

Apply patch, build release and run same job

 

 

  was:
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

 

Current behavior when using hadoop-s3

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/6da165c7b3c8422125abbfdb97ca9c04/chk-5/
   

 



[GitHub] [flink] flinkbot edited a comment on pull request #15458: [FLINK-22064][table] Don't submit statement set when no insert is added in the sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15458:
URL: https://github.com/apache/flink/pull/15458#issuecomment-811149954


   
   ## CI report:
   
   * e97c9b4bc77bb3842f3fb72396dcee2acac9fb5d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15918)
 
   * 8dcf34c88cd35a8e7d1e044a1adf9c6e46b16bd3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Description: 
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

 

Current behavior when using hadoop-s3

s3a://xxx/dev/checkpoints/_entropy_/xenon/event-stream-splitter/6da165c7b3c8422125abbfdb97ca9c04/chk-5/
   

 

  was:
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.







[GitHub] [flink] flinkbot edited a comment on pull request #15442: [FLINK-22081][flink-core] handle entropy injection metadata path in pluggable HadoopS3FileSystem

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15442:
URL: https://github.com/apache/flink/pull/15442#issuecomment-810703449


   
   ## CI report:
   
   * 35f4bea03e4bfd96ca125d9980bc3fcdced52d79 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15850)
 
   * f7bb6c41ecbeee4c64486a29f73c4cf69196bd1d UNKNOWN
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #15407: [FLINK-21942][coordination] Remove job from LeaderIdService when disconnecting JobManager with globally terminal state

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15407:
URL: https://github.com/apache/flink/pull/15407#issuecomment-809165650


   
   ## CI report:
   
   * 5611a9d38557ba108df1e64b9f306f5cf98268d5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15848)
 
   * 2107b03de95d36a1d547ff5ac438b18ea07a328a UNKNOWN
   
   




[GitHub] [flink] chenqin commented on pull request #15442: [FLINK-22081][flink-core] handle entropy injection metadata path in pluggable HadoopS3FileSystem

2021-03-31 Thread GitBox


chenqin commented on pull request #15442:
URL: https://github.com/apache/flink/pull/15442#issuecomment-811641843


   > I just noticed that you recycled a rather old ticket. Could you please 
create a new one that I can assign to you?
   
   FLINK-22081






[jira] [Updated] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-22081:
-
Fix Version/s: 1.10.2
   1.10.3
   1.11.1
   1.11.2
   1.11.3
   1.12.0
   1.12.1
   1.12.2
   1.12.3
   1.13.0
   1.11.4
   1.10.4
  Description: 
Using flink 1.11.2

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check in the getEntropyFs method whether the given file system is a 
{{SafetyNetWrapperFileSystem}} (and, if so, its delegate), but don't check for 
{{[ClassLoaderFixingFileSystem|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 directly, which would be the wrapper type if the S3 file system 
dependencies are added as a plugin.

  was:
Using flink 1.10

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check if the given fileSystem is of type {{SafetyNetWrapperFileSystem}} and 
if so we check if the delegate is of type {{EntropyInjectingFileSystem}} but 
don't check for {{[ClassLoaderFixingFileSystem 
|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 which would be the type if S3 file system dependencies are added as a plugin.

 Priority: Minor  (was: Major)






[GitHub] [flink] flinkbot edited a comment on pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15411:
URL: https://github.com/apache/flink/pull/15411#issuecomment-809191175


   
   ## CI report:
   
   * 386a106bff7eda5396dc95f088c7121bbb55a73f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15828)
 
   * 9ae0c40165b92bfa89bfb3b776f60fe113f4922c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15940)
 
   * e8c27398f3222803af1968b5063c33fd9a12a7d4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15949)
 
   
   




[jira] [Created] (FLINK-22081) Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin

2021-03-31 Thread Chen Qin (Jira)
Chen Qin created FLINK-22081:


 Summary: Entropy key not resolved if flink-s3-fs-hadoop is added 
as a plugin
 Key: FLINK-22081
 URL: https://issues.apache.org/jira/browse/FLINK-22081
 Project: Flink
  Issue Type: Bug
  Components: FileSystems
Reporter: Chen Qin
Assignee: Prem Santosh
 Fix For: 1.10.1, 1.11.0


Using flink 1.10

I added the flink-s3-fs-hadoop jar in plugins dir but I am seeing the 
checkpoints paths like 
{{s3://my_app/__ENTROPY__/app_name-staging/flink/checkpoints/e10f47968ae74934bd833108d2272419/chk-3071}}
 which means the entropy injection key is not being resolved. After some 
debugging I found that in the 
[EntropyInjector|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/EntropyInjector.java#L97]
 we check if the given fileSystem is of type {{SafetyNetWrapperFileSystem}} and 
if so we check if the delegate is of type {{EntropyInjectingFileSystem}} but 
don't check for {{[ClassLoaderFixingFileSystem 
|https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/core/fs/PluginFileSystemFactory.java#L65]}}
 which would be the type if S3 file system dependencies are added as a plugin.





[GitHub] [flink] wangyang0918 commented on a change in pull request #15407: [FLINK-21942][coordination] Remove job from LeaderIdService when disconnecting JobManager with globally terminal state

2021-03-31 Thread GitBox


wangyang0918 commented on a change in pull request #15407:
URL: https://github.com/apache/flink/pull/15407#discussion_r605371557



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
##
@@ -509,6 +509,11 @@ public void disconnectTaskManager(final ResourceID 
resourceId, final Exception c
 @Override
 public void disconnectJobManager(
 final JobID jobId, JobStatus jobStatus, final Exception cause) {
+
+if (jobStatus.isGloballyTerminalState()) {

Review comment:
   Yes. The only difference is that we also check for existence in 
`jobManagerRegistrations` when `jobStatus` is globally terminal, and it is 
reasonable to do that.








[jira] [Created] (FLINK-22080) Refactor SqlClient for better testing

2021-03-31 Thread Jark Wu (Jira)
Jark Wu created FLINK-22080:
---

 Summary: Refactor SqlClient for better testing
 Key: FLINK-22080
 URL: https://issues.apache.org/jira/browse/FLINK-22080
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Jark Wu
 Fix For: 1.13.0


Currently, we add a JUnit rule {{TerminalStreamsResource}} that replaces the 
{{System.in}} and {{System.out}} streams in order to capture the output of 
SqlClient. However, this is not safe, especially when used by multiple tests. 

We should refactor {{SqlClient}} to expose a testing-purpose {{main}} 
method that accepts a custom {{InputStream}} and {{OutputStream}}.
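The refactoring could look like the following sketch; {{SqlClientSketch}}, {{mainForTesting}}, and the echo loop are hypothetical names and stand-ins for illustration, not the actual Flink API:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch of the idea: route all terminal I/O through injected
 * streams so tests never have to swap System.in/System.out globally.
 */
public class SqlClientSketch {

    private final InputStream in;
    private final PrintStream out;

    private SqlClientSketch(InputStream in, PrintStream out) {
        this.in = in;
        this.out = out;
    }

    /** Production entry point keeps the old behavior. */
    public static void main(String[] args) {
        mainForTesting(System.in, System.out, args);
    }

    /** Testing entry point: callers pass in their own streams. */
    public static void mainForTesting(InputStream in, PrintStream out, String[] args) {
        new SqlClientSketch(in, out).run();
    }

    private void run() {
        // Stand-in REPL: echo each statement instead of executing it.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Example: a test captures output without touching the global streams. */
    static String runWithInput(String input) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        mainForTesting(new ByteArrayInputStream(input.getBytes()),
                       new PrintStream(buf), new String[0]);
        return buf.toString();
    }
}
```

Because each test gets its own input and output streams, tests can run concurrently without racing on the shared {{System.in}}/{{System.out}} state that the JUnit rule mutates.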





[GitHub] [flink] wuchong commented on a change in pull request #15458: [FLINK-22064][table] Don't submit statement set when no insert is added in the sql client

2021-03-31 Thread GitBox


wuchong commented on a change in pull request #15458:
URL: https://github.com/apache/flink/pull/15458#discussion_r605364874



##
File path: flink-table/flink-sql-client/src/test/resources/sql/statement_set.q
##
@@ -152,3 +152,10 @@ SELECT * FROM BatchTable;
 +-+--+
 Received a total of 7 rows
 !ok
+
+BEGIN STATEMENT SET;
+[INFO] Begin a statement set.
+!info
+END;

Review comment:
   nit: would be better to add an empty line before `END;`.








[GitHub] [flink] flinkbot edited a comment on pull request #15437: [FLINK-20320][sql-client] support init sql file in sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15437:
URL: https://github.com/apache/flink/pull/15437#issuecomment-810431644


   
   ## CI report:
   
   * 71c1edd98caa64cc1fe447f878f8cbc1321eb40b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15913)
 
   * 3ff81e4f6020dae58a29197f167fb67da014535e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15948)
 
   
   




[GitHub] [flink] flinkbot edited a comment on pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15411:
URL: https://github.com/apache/flink/pull/15411#issuecomment-809191175


   
   ## CI report:
   
   * 386a106bff7eda5396dc95f088c7121bbb55a73f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15828)
 
   * 9ae0c40165b92bfa89bfb3b776f60fe113f4922c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15940)
 
   * e8c27398f3222803af1968b5063c33fd9a12a7d4 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Updated] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Description: 
h3. Introduction

In FLINK-21330 we implemented the calculation of cross-region ConsumedPartitionGroups in {{PipelinedRegionSchedulingStrategy}} incorrectly. It slows down the procedure of {{onExecutionStateChange}}, increasing the complexity from O(N) to O(N^2). Also, the semantics of cross-region are wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need to schedule all regions with no external blocking edges, i.e., source regions. To decrease the complexity, we choose to schedule all the regions that have no external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, region 2 has a ConsumedPartitionGroup which has both internal and external blocking IntermediateResultPartitions. If we pick one partition to represent the entire ConsumedPartitionGroup, we may pick the internal one and treat the entire group as internal, so region 2 will be scheduled.

As region 1 is not finished, region 2 cannot transition to running. A deadlock may happen if resources are not sufficient for both regions.

To make this right, we introduced cross-region ConsumedPartitionGroups in FLINK-21330: ConsumedPartitionGroups with both internal and external blocking IntermediateResultPartitions are recorded, and when we call {{startScheduling}} these groups are treated as external ones, so region 2 is not scheduled.

However, the current implementation of cross-region is wrong. It treats every ConsumedPartitionGroup that has multiple producer regions as a cross-region group, which is not the logic described above; the semantics are wrong. In particular, all ALL-TO-ALL BLOCKING ConsumedPartitionGroups are treated as cross-region groups, since their producers are in different regions (each producer has its own region). This increases the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can compute the producer regions of every ConsumedPartitionGroup and then iterate over all regions and their ConsumedPartitionGroups. If a ConsumedPartitionGroup has two or more producer regions, and those producer regions contain the current region, it is a cross-region ConsumedPartitionGroup. This matches the intended semantics, ensures ALL-TO-ALL BLOCKING ConsumedPartitionGroups are not treated as cross-region ones, and decreases the complexity from O(N^2) back to O(N).
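The proposed check can be sketched with a small, self-contained model (hypothetical names, not the actual Flink classes): a ConsumedPartitionGroup is reduced to the set of regions producing its partitions, and a group is cross-region for a given region iff it has two or more producer regions, one of which is that region.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: integer ids stand in for pipelined regions, and a
// ConsumedPartitionGroup is modeled only by the set of its producer regions.
class CrossRegionGroups {
    // A group is cross-region w.r.t. a region iff it mixes partitions that are
    // internal to that region with partitions produced by other regions.
    static boolean isCrossRegion(Set<Integer> producerRegions, int currentRegion) {
        return producerRegions.size() >= 2 && producerRegions.contains(currentRegion);
    }

    public static void main(String[] args) {
        // Group consumed by region 2 but produced by regions 1 and 2
        // (internal + external blocking partitions, as in FLINK-22017):
        Set<Integer> mixed = new HashSet<>(Arrays.asList(1, 2));
        System.out.println(isCrossRegion(mixed, 2)); // treated as external for region 2

        // The same group seen from region 3 is purely external, hence an
        // ALL-TO-ALL BLOCKING group is NOT cross-region for its consumers:
        System.out.println(isCrossRegion(mixed, 3));
    }
}
```

Note the second case: because the consumer region is not among the producers, the group is an ordinary external group, which is exactly what keeps ALL-TO-ALL BLOCKING edges at O(N).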

  was:
h3. Introduction

We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
procedure of {{onExecutionStateChange}}, makes the complexity increase from 
O(N) to O(N^2). Also the semantic of cross-region is totally wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all region with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that has no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The ConsumedPartitionGroups with both internal and external 
blocking IntermediateResultPartitions will be recorded. When we call 
{{startScheduling}}, these ConsumedPartitionGroups will be treated as external 
ones, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that has multiple producer regions will be treated as 
cross-region groups. It is not the same logic as we mentioned above. The 
semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and its ConsumedPartitionGroups. If the 
ConsumedPartitionGroup has two or more producer regions, and the 

[jira] [Updated] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Description: 
h3. Introduction

In FLINK-21330 we implemented the calculation of cross-region ConsumedPartitionGroups in {{PipelinedRegionSchedulingStrategy}} incorrectly. It slows down the procedure of {{onExecutionStateChange}}, increasing the complexity from O(N) to O(N^2). Also, the semantics of cross-region are wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need to schedule all regions with no external blocking edges, i.e., source regions. To decrease the complexity, we choose to schedule all the regions that have no external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, region 2 has a ConsumedPartitionGroup which has both internal and external blocking IntermediateResultPartitions. If we pick one partition to represent the entire ConsumedPartitionGroup, we may pick the internal one and treat the entire group as internal, so region 2 will be scheduled.

As region 1 is not finished, region 2 cannot transition to running. A deadlock may happen if resources are not sufficient for both regions.

To make this right, we introduced cross-region ConsumedPartitionGroups in FLINK-21330: ConsumedPartitionGroups with both internal and external blocking IntermediateResultPartitions are recorded, and when we call {{startScheduling}} these groups are treated as external ones, so region 2 is not scheduled.

However, the current implementation of cross-region is wrong. It treats every ConsumedPartitionGroup that has multiple producer regions as a cross-region group, which is not the logic described above; the semantics are wrong. In particular, all ALL-TO-ALL BLOCKING ConsumedPartitionGroups are treated as cross-region groups, since their producers are in different regions (each producer has its own region). This increases the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can compute the producer regions of every ConsumedPartitionGroup and then iterate over all regions and their ConsumedPartitionGroups. If a ConsumedPartitionGroup has two or more producer regions, and those producer regions contain the current region, it is a cross-region ConsumedPartitionGroup. This matches the intended semantics, ensures ALL-TO-ALL BLOCKING ConsumedPartitionGroups are not treated as cross-region ones, and decreases the complexity from O(N^2) back to O(N).

  was:
h3. Introduction

We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
procedure of {{onExecutionStateChange}}, makes the complexity increase from 
O(N) to O(N^2). Also the semantic of cross-region is totally wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all region with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that has no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The ConsumedPartitionGroups with both internal and external 
blocking IntermediateResultPartitions will be recorded. When we call 
{{startScheduling}}, these ConsumedPartitionGroups will be treated as external 
ones, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that has multiple producer regions will be treated as 
cross-region groups. It is not the same logic as we mentioned above. The 
semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and its ConsumedPartitionGroups. If the 
ConsumedPartitionGroup has two or more producer regions, and the 

[GitHub] [flink] wuchong commented on a change in pull request #15458: [FLINK-22064][table] Don't submit statement set when no insert is added in the sql client

2021-03-31 Thread GitBox


wuchong commented on a change in pull request #15458:
URL: https://github.com/apache/flink/pull/15458#discussion_r605363528



##
File path: 
flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/cli/CliStrings.java
##
@@ -212,6 +212,9 @@ private CliStrings() {
 
 public static final String MESSAGE_BEGIN_STATEMENT_SET = "Begin a 
statement set.";
 
+public static final String MESSAGE_NO_INSERT_STATEMENT_IN_STATEMENT_SET =
+"No insert statement in statement set, skip submit.";

Review comment:
   ```suggestion
   "No statement in the statement set, skip submit.";
   ```
   
   I think we don't need to mention the statement kind, as we may support more DML 
statements (e.g. `MERGE INTO`) in the future. 








[jira] [Updated] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Description: 
h3. Introduction

In FLINK-21330 we implemented the calculation of cross-region ConsumedPartitionGroups in {{PipelinedRegionSchedulingStrategy}} incorrectly. It slows down the procedure of {{onExecutionStateChange}}, increasing the complexity from O(N) to O(N^2). Also, the semantics of cross-region are wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need to schedule all regions with no external blocking edges, i.e., source regions. To decrease the complexity, we choose to schedule all the regions that have no external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, region 2 has a ConsumedPartitionGroup which has both internal and external blocking IntermediateResultPartitions. If we pick one partition to represent the entire ConsumedPartitionGroup, we may pick the internal one and treat the entire group as internal, so region 2 will be scheduled.

As region 1 is not finished, region 2 cannot transition to running. A deadlock may happen if resources are not sufficient for both regions.

To make this right, we introduced cross-region ConsumedPartitionGroups in FLINK-21330: ConsumedPartitionGroups with both internal and external blocking IntermediateResultPartitions are recorded, and when we call {{startScheduling}} these groups are treated as external ones, so region 2 is not scheduled.

However, the current implementation of cross-region is wrong. It treats every ConsumedPartitionGroup that has multiple producer regions as a cross-region group, which is not the logic described above; the semantics are wrong. In particular, all ALL-TO-ALL BLOCKING ConsumedPartitionGroups are treated as cross-region groups, since their producers are in different regions (each producer has its own region). This increases the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can compute the producer regions of every ConsumedPartitionGroup and then iterate over all regions and their ConsumedPartitionGroups. If a ConsumedPartitionGroup has two or more producer regions, and those producer regions contain the current region, it is a cross-region ConsumedPartitionGroup. This matches the intended semantics, ensures ALL-TO-ALL BLOCKING ConsumedPartitionGroups are not treated as cross-region ones, and decreases the complexity from O(N^2) back to O(N).

  was:
h3. Introduction

We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
procedure of {{onExecutionStateChange}}, makes the complexity increase from 
O(N) to O(N^2). Also the semantic of cross-region is totally wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all region with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that has no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The ConsumedPartitionGroups with both internal and external 
blocking IntermediateResultPartitions will be recorded. When we call 
{{startScheduling}}, these ConsumedPartitionGroups will be treated as external 
ones, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that has multiple producer regions will be treated as 
cross-region groups. It is not the same logic as we mentioned above. The 
semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and its ConsumedPartitionGroups. If the 
ConsumedPartitionGroup has two or more producer regions, and the 

[jira] [Issue Comment Deleted] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Comment: was deleted

(was: cc [~trohrmann] I personally think it's necessary to add this bug-fix to 
release 1.13.
 )

> Wrong way to calculate cross-region ConsumedPartitionGroups in 
> PipelinedRegionSchedulingStrategy
> 
>
> Key: FLINK-22077
> URL: https://issues.apache.org/jira/browse/FLINK-22077
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Zhilong Hong
>Priority: Critical
> Fix For: 1.13.0
>
>
> h3. Introduction
> We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
> {{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
> procedure of {{onExecutionStateChange}}, makes the complexity increase from 
> O(N) to O(N^2). Also the semantic of cross-region is totally wrong.
> h3. Details
> In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we 
> need to schedule all region with no external blocking edges, i.e., source 
> regions. To decrease the complexity, we choose to schedule all the regions 
> that has no external BLOCKING ConsumedPartitionGroups.
> However, for the case illustrated in FLINK-22017, the region 2 has a 
> ConsumedPartitionGroup, which has both internal and external blocking 
> IntermediateResultPartitions. If we choose one to represent the entire 
> ConsumedPartitionGroup, it may choose the internal one, and make the entire 
> group internal. Region 2 will be scheduled.
> As Region 1 is not finished, Region 2 cannot transition to running. A 
> deadlock may happen if resource is not enough for both two regions.
> To make it right, we introduced cross-region ConsumedPartitionGroups in 
> FLINK-21330. The ConsumedPartitionGroups with both internal and external 
> blocking IntermediateResultPartitions will be recorded. When we call 
> {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
> external ones, and region 2 will not be scheduled.
> But we have to admit that the implementation of cross-region is wrong. The 
> ConsumedPartitionGroups that has multiple producer regions will be treated as 
> cross-region groups. It is not the same logic as we mentioned above. The 
> semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
> ConsumedPartitionGroups will be treated as cross-region groups, since their 
> producers are in different regions. (Each producer has its own region.) This 
> slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
> h3. Solution
> To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
> calculate the producer regions for all ConsumedPartitionGroups, and then 
> iterate all the regions and its ConsumedPartitionGroups. If the 
> ConsumedPartitionGroup has two or more producer regions, and the regions 
> contains current region, it is a cross-region ConsumedPartitionGroup. This 
> meets the correct semantics, and make sure ALL-TO-ALL BLOCKING 
> ConsumedPartitionGroups will not be treated as cross-region one. This fix 
> will also decreases the complexity from O(N^2) to O(N).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-22079) ReactiveModeITCase stuck with fine grained resource manager.

2021-03-31 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312850#comment-17312850
 ] 

Xintong Song commented on FLINK-22079:
--

cc [~rmetzger]

> ReactiveModeITCase stuck with fine grained resource manager.
> 
>
> Key: FLINK-22079
> URL: https://issues.apache.org/jira/browse/FLINK-22079
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Xintong Song
>Priority: Major
>  Labels: test-stability
>
> CI stage for fine grained resource management hangs. According to the maven 
> logs, it is {{ReactiveModeITCase}} that never finishes.
> [https://dev.azure.com/sewen0794/Flink/_build/results?buildId=249=logs=cc649950-03e9-5fae-8326-2f1ad744b536=51cab6ca-669f-5dc0-221d-1e4f7dc4fc85=9962]
> Not sure if this is related to {{FineGrainedSlotManager}} or not. Although we 
> say reactive mode does not support fine grained resource management, the 
> {{FineGrainedSlotManager}} itself is expected to work with both fine-grained 
> and coarse-grained resource requirements.





[jira] [Created] (FLINK-22078) Introduce a Printer interface to redirect the output to the desired stream

2021-03-31 Thread Shengkai Fang (Jira)
Shengkai Fang created FLINK-22078:
-

 Summary: Introduce a Printer interface to redirect the output to 
the desired stream
 Key: FLINK-22078
 URL: https://issues.apache.org/jira/browse/FLINK-22078
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Client
Reporter: Shengkai Fang


The reason we introduce this is that the {{Terminal}} does not allow changing the output stream after the {{Terminal}} is built.

Therefore, we can introduce a Printer interface to control this behaviour.
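A minimal sketch of the idea (all names are hypothetical, not the eventual Flink API): callers print through a Printer indirection whose target stream can be swapped at runtime, even though the underlying Terminal's stream is fixed once built.

```java
import java.io.PrintStream;

// Hypothetical sketch: depending on Printer instead of the Terminal's fixed
// stream lets the output destination be redirected after construction.
interface Printer {
    void print(String message);
}

class RedirectablePrinter implements Printer {
    // volatile so a redirect is immediately visible to printing threads
    private volatile PrintStream target;

    RedirectablePrinter(PrintStream initial) {
        this.target = initial;
    }

    // Swap the destination stream without rebuilding anything.
    void redirect(PrintStream newTarget) {
        this.target = newTarget;
    }

    @Override
    public void print(String message) {
        target.println(message);
    }
}

class PrinterDemo {
    public static void main(String[] args) {
        RedirectablePrinter printer = new RedirectablePrinter(System.out);
        printer.print("written to stdout");
        printer.redirect(System.err);
        printer.print("written to stderr");
    }
}
```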





[jira] [Created] (FLINK-22079) ReactiveModeITCase stuck with fine grained resource manager.

2021-03-31 Thread Xintong Song (Jira)
Xintong Song created FLINK-22079:


 Summary: ReactiveModeITCase stuck with fine grained resource 
manager.
 Key: FLINK-22079
 URL: https://issues.apache.org/jira/browse/FLINK-22079
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Xintong Song


CI stage for fine grained resource management hangs. According to the maven 
logs, it is {{ReactiveModeITCase}} that never finishes.

[https://dev.azure.com/sewen0794/Flink/_build/results?buildId=249=logs=cc649950-03e9-5fae-8326-2f1ad744b536=51cab6ca-669f-5dc0-221d-1e4f7dc4fc85=9962]

Not sure if this is related to {{FineGrainedSlotManager}} or not. Although we 
say reactive mode does not support fine grained resource management, the 
{{FineGrainedSlotManager}} itself is expected to work with both fine-grained 
and coarse-grained resource requirements.





[jira] [Commented] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312849#comment-17312849
 ] 

Zhilong Hong commented on FLINK-22077:
--

cc [~trohrmann] I personally think it's necessary to add this bug-fix to 
release 1.13.
 

> Wrong way to calculate cross-region ConsumedPartitionGroups in 
> PipelinedRegionSchedulingStrategy
> 
>
> Key: FLINK-22077
> URL: https://issues.apache.org/jira/browse/FLINK-22077
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.13.0
>Reporter: Zhilong Hong
>Priority: Critical
> Fix For: 1.13.0
>
>
> h3. Introduction
> We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
> {{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
> procedure of {{onExecutionStateChange}}, makes the complexity increase from 
> O(N) to O(N^2). Also the semantic of cross-region is totally wrong.
> h3. Details
> In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we 
> need to schedule all region with no external blocking edges, i.e., source 
> regions. To decrease the complexity, we choose to schedule all the regions 
> that has no external BLOCKING ConsumedPartitionGroups.
> However, for the case illustrated in FLINK-22017, the region 2 has a 
> ConsumedPartitionGroup, which has both internal and external blocking 
> IntermediateResultPartitions. If we choose one to represent the entire 
> ConsumedPartitionGroup, it may choose the internal one, and make the entire 
> group internal. Region 2 will be scheduled.
> As Region 1 is not finished, Region 2 cannot transition to running. A 
> deadlock may happen if resource is not enough for both two regions.
> To make it right, we introduced cross-region ConsumedPartitionGroups in 
> FLINK-21330. The ConsumedPartitionGroups with both internal and external 
> blocking IntermediateResultPartitions will be recorded. When we call 
> {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
> external ones, and region 2 will not be scheduled.
> But we have to admit that the implementation of cross-region is wrong. The 
> ConsumedPartitionGroups that has multiple producer regions will be treated as 
> cross-region groups. It is not the same logic as we mentioned above. The 
> semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
> ConsumedPartitionGroups will be treated as cross-region groups, since their 
> producers are in different regions. (Each producer has its own region.) This 
> slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
> h3. Solution
> To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
> calculate the producer regions for all ConsumedPartitionGroups, and then 
> iterate all the regions and its ConsumedPartitionGroups. If the 
> ConsumedPartitionGroup has two or more producer regions, and the regions 
> contains current region, it is a cross-region ConsumedPartitionGroup. This 
> meets the correct semantics, and make sure ALL-TO-ALL BLOCKING 
> ConsumedPartitionGroups will not be treated as cross-region one. This fix 
> will also decreases the complexity from O(N^2) to O(N).





[jira] [Updated] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Description: 
h3. Introduction

In FLINK-21330 we implemented the calculation of cross-region ConsumedPartitionGroups in {{PipelinedRegionSchedulingStrategy}} incorrectly. It slows down the procedure of {{onExecutionStateChange}}, increasing the complexity from O(N) to O(N^2). Also, the semantics of cross-region are wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need to schedule all regions with no external blocking edges, i.e., source regions. To decrease the complexity, we choose to schedule all the regions that have no external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, region 2 has a ConsumedPartitionGroup which has both internal and external blocking IntermediateResultPartitions. If we pick one partition to represent the entire ConsumedPartitionGroup, we may pick the internal one and treat the entire group as internal, so region 2 will be scheduled.

As region 1 is not finished, region 2 cannot transition to running. A deadlock may happen if resources are not sufficient for both regions.

To make this right, we introduced cross-region ConsumedPartitionGroups in FLINK-21330: ConsumedPartitionGroups with both internal and external blocking IntermediateResultPartitions are recorded, and when we call {{startScheduling}} these groups are treated as external ones, so region 2 is not scheduled.

However, the current implementation of cross-region is wrong. It treats every ConsumedPartitionGroup that has multiple producer regions as a cross-region group, which is not the logic described above; the semantics are wrong. In particular, all ALL-TO-ALL BLOCKING ConsumedPartitionGroups are treated as cross-region groups, since their producers are in different regions (each producer has its own region). This increases the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can compute the producer regions of every ConsumedPartitionGroup and then iterate over all regions and their ConsumedPartitionGroups. If a ConsumedPartitionGroup has two or more producer regions, and those producer regions contain the current region, it is a cross-region ConsumedPartitionGroup. This matches the intended semantics, ensures ALL-TO-ALL BLOCKING ConsumedPartitionGroups are not treated as cross-region ones, and decreases the complexity from O(N^2) back to O(N).

  was:
h3. Introduction

We implement a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}} in FLINK-21330, it slows down the 
procedure of {{onExecutionStateChange}}, makes the complexity from O(N) to 
O(N^2). Also the semantic of cross-region is totally wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all region with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that has no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The regions which has ConsumedPartitionGroups with both internal 
and external blocking IntermediateResultPartitions will be recorded. When we 
call {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
external, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that has multiple producer regions will be treated as 
cross-region groups. It is not the same logic as we mentioned above. The 
semantic is totally wrong. Also all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
slows down the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and their ConsumedPartitionGroups. If a 
ConsumedPartitionGroup has two or more producer regions, and those regions 
contain the current region, it is a cross-region ConsumedPartitionGroup.

[jira] [Updated] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhilong Hong updated FLINK-22077:
-
Description: 
h3. Introduction

We implemented a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}} in FLINK-21330. It slows down the 
procedure of {{onExecutionStateChange}}, raising the complexity from O(N) to 
O(N^2). The semantics of cross-region are also wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all regions with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that have no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The regions which has ConsumedPartitionGroups with both internal 
and external blocking IntermediateResultPartitions will be recorded. When we 
call {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
external, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that have multiple producer regions will be treated as 
cross-region groups. This is not the same logic as we mentioned above; the 
semantics are totally wrong. Also, all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
raises the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and their ConsumedPartitionGroups. If a 
ConsumedPartitionGroup has two or more producer regions, and those regions 
contain the current region, it is a cross-region ConsumedPartitionGroup. This 
meets the correct semantics, and makes sure ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will not be treated as cross-region ones. This fix also 
decreases the complexity from O(N^2) to O(N). I think it's necessary to 
include this bug fix in release 1.13.

  was:
h3. Introduction

We implemented a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}}. It slows down the procedure of 
{{onExecutionStateChange}}, raising the complexity from O(N) to O(N^2). The 
semantics of cross-region are also wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all regions with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that have no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The regions which has ConsumedPartitionGroups with both internal 
and external blocking IntermediateResultPartitions will be recorded. When we 
call {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
external, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that have multiple producer regions will be treated as 
cross-region groups. This is not the same logic as we mentioned above; the 
semantics are totally wrong. Also, all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
raises the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and their ConsumedPartitionGroups. If a 
ConsumedPartitionGroup has two or more producer regions, and those regions 
contain the current region, it is a cross-region ConsumedPartitionGroup.

[GitHub] [flink] flinkbot edited a comment on pull request #15437: [FLINK-20320][sql-client] support init sql file in sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15437:
URL: https://github.com/apache/flink/pull/15437#issuecomment-810431644


   
   ## CI report:
   
   * 71c1edd98caa64cc1fe447f878f8cbc1321eb40b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15913)
 
   * 3ff81e4f6020dae58a29197f167fb67da014535e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15280: [FLINK-21714][table-api] Use TIMESTAMP_LTZ as return type for function PROCTIME()

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15280:
URL: https://github.com/apache/flink/pull/15280#issuecomment-802654466


   
   ## CI report:
   
   * 3b4b5fcd9d8108b51e0bf62a0bee888b4fdb5186 UNKNOWN
   * f80daf3fbf6f5f5ce1d39813fbecf9866eb50efb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15895)
 
   * 9d51cdd516ab6606275a16c324927637ad21d46a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15947)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15119: [FLINK-21736][state] Introduce state scope latency tracking metrics

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15119:
URL: https://github.com/apache/flink/pull/15119#issuecomment-792934307


   
   ## CI report:
   
   * 0cfa410c4bfa324f1fab08032f3992833fb4ceb7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15861)
 
   * c816af323a8d685ad8dd05f2e46e39efb0a4866a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15946)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] xintongsong commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


xintongsong commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605325307



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +269,268 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/** test run the runnable in the main thread of the underlying RPC 
endpoint. */

Review comment:
   ```suggestion
   /** Tests running the runnable in the main thread of the underlying RPC 
endpoint. */
   ```

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +269,268 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/** test run the runnable in the main thread of the underlying RPC 
endpoint. */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+final RpcEndpoint endpoint = new BaseEndpoint(rpcService);
+CompletableFuture<Boolean> finishedFuture = new CompletableFuture<>();
+try {
+endpoint.start();
+endpoint.getMainThreadExecutor()
+.runAsync(
+() -> {
+// no need to catch the validation failure
+// if the validation fail, the future will 
never complete
+endpoint.validateRunsInMainThread();
+finishedFuture.complete(true);
+});
+Boolean actualFinished = finishedFuture.get(TIMEOUT.getSize(), 
TIMEOUT.getUnit());
+assertTrue(actualFinished);
+} finally {
+RpcUtils.terminateRpcEndpoint(endpoint, TIMEOUT);
+}
+}
+
+/**
+ * test scheduling the runnable in the main thread of the underlying RPC 
endpoint, with a delay
+ * of the given number of milliseconds.
+ */
+@Test
+public void testScheduleRunAsyncTime()
+throws InterruptedException, ExecutionException, TimeoutException {
+final RpcEndpoint endpoint = new BaseEndpoint(rpcService);
+final long expectedDelayMs = 500L;

Review comment:
   100ms might be good enough.
   
   I was suggesting 500ms for the test case with different units because that 
case needs at least 1s anyway.
   
   And in that case, we might want to relax the assertion range to (0.5, 1.5).
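   The relaxed-range check suggested here could look like the following hypothetical sketch. The class and method names are made up for illustration and are not Flink's actual test code; it only demonstrates the (0.5, 1.5) tolerance idea:

```java
/**
 * Hypothetical sketch of a relaxed timing assertion: accept any measured
 * delay within (0.5x, 1.5x) of the expected delay, since exact scheduling
 * times are not deterministic on CI machines.
 */
public class DelayAssertionSketch {

    /** True iff the actual delay lies strictly inside the relaxed range. */
    static boolean withinRelaxedRange(long actualMs, long expectedMs) {
        return actualMs > expectedMs * 0.5 && actualMs < expectedMs * 1.5;
    }

    public static void main(String[] args) throws Exception {
        long expectedDelayMs = 100L;
        long start = System.nanoTime();
        Thread.sleep(expectedDelayMs); // stands in for the scheduled runnable
        long actualMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(withinRelaxedRange(actualMs, expectedDelayMs));
    }
}
```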

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +269,268 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/** test run the runnable in the main thread of the underlying RPC 
endpoint. */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+final RpcEndpoint endpoint = new BaseEndpoint(rpcService);
+CompletableFuture<Boolean> finishedFuture = new CompletableFuture<>();
+try {
+endpoint.start();
+endpoint.getMainThreadExecutor()
+.runAsync(
+() -> {
+// no need to catch the validation failure
+// if the validation fail, the future will 
never complete
+endpoint.validateRunsInMainThread();
+finishedFuture.complete(true);
+});
+Boolean actualFinished = finishedFuture.get(TIMEOUT.getSize(), 
TIMEOUT.getUnit());
+assertTrue(actualFinished);
+} finally {
+RpcUtils.terminateRpcEndpoint(endpoint, TIMEOUT);
+}
+}
+
+/**
+ * test scheduling the runnable in the main thread of the underlying RPC 
endpoint, with a delay
+ * of the given number of milliseconds.
+ */
+@Test
+public void testScheduleRunAsyncTime()

Review comment:
   nit: It would be better to align the name of the test case with the name of 
the method being tested.
   ```suggestion
   public void testScheduleRunAsync()
   ```

##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +269,268 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/** test run the runnable in the main thread of the underlying RPC 
endpoint. */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+final RpcEndpoint endpoint = new BaseEndpoint(rpcService);
+CompletableFuture<Boolean> finishedFuture = new CompletableFuture<>();
+try {
+endpoint.start();
+

[jira] [Commented] (FLINK-20816) NotifyCheckpointAbortedITCase failed due to timeout

2021-03-31 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312831#comment-17312831
 ] 

Yun Tang commented on FLINK-20816:
--

From the comparison of success and failure logs, I think the root cause is 
that {{DeclineSink}} did not execute the sync phase of the snapshot for 
checkpoint-2. However, we expect checkpoint-2 to fail during the async phase, 
and that's why the test timed out while waiting for the expected checkpoint 
failure.

If we extract all related logs of {{DeclineSink}} from failure log:
{code:java}
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.io.network.partition.consumer.ChannelStatePersister [] 
- found barrier 2, lastSeenBarrier = 1 (COMPLETED) @ 
InputChannelInfo{gateIdx=0, inputChannelIdx=0}
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler
 [] - DeclineSink (1/1)#0 (7457bf515844f409738c9929fffc54f7): Received barrier 
from channel InputChannelInfo{gateIdx=0, inputChannelIdx=0} @ 2.
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriterImpl [] - 
DeclineSink (1/1)#0 starting checkpoint 2 (CheckpointOptions {checkpointType = 
CHECKPOINT, targetLocation = (default), isExactlyOnceMode = true, 
isUnalignedCheckpoint = true, alignmentTimeout = 9223372036854775807})
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.io.network.partition.consumer.ChannelStatePersister [] 
- startPersisting 2, lastSeenBarrier = 2 (BARRIER_RECEIVED) @ 
InputChannelInfo{gateIdx=0, inputChannelIdx=0}
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler
 [] - DeclineSink (1/1)#0 (7457bf515844f409738c9929fffc54f7): Triggering 
checkpoint 2 on the barrier announcement at 1617138259258.
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.streaming.runtime.tasks.StreamTask  [] - Starting 
checkpoint (2) CHECKPOINT on task DeclineSink (1/1)#0
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriterImpl [] - 
DeclineSink (1/1)#0 finishing output data, checkpoint 2
21:04:19,272 [ DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.checkpoint.channel.ChannelStateWriterImpl [] - 
DeclineSink (1/1)#0 requested write result, checkpoint 2
21:04:19,273 [Channel state writer DeclineSink (1/1)#0] DEBUG 
org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter [] - 
complete output, input completed: false
21:05:58,298 [flink-akka.actor.default-dispatcher-12] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DeclineSink 
(1/1) (7457bf515844f409738c9929fffc54f7) switched from RUNNING to CANCELING.
{code}

Since I am not familiar with unaligned checkpoints, I think {{Channel state 
writer DeclineSink}} should wait until the input is completed, and then the 
[code 
below|https://github.com/apache/flink/blob/04bbf03a0cdb2f455c1b06569dea95ace6fa7e7c/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/SubtaskCheckpointCoordinatorImpl.java#L305-L317]
 could execute:

{code:java}
// Step (4): Take the state snapshot. This should be largely asynchronous,
// to not impact progress of the streaming topology

Map<OperatorID, OperatorSnapshotFutures> snapshotFutures =
        new HashMap<>(operatorChain.getNumberOfOperators());
try {
    if (takeSnapshotSync(
            snapshotFutures, metadata, metrics, options, operatorChain, isRunning)) {
        finishAndReportAsync(snapshotFutures, metadata, metrics, isRunning);
    } else {
        cleanup(snapshotFutures, metadata, metrics, new Exception("Checkpoint declined"));
    }
} catch (Exception ex) {
    cleanup(snapshotFutures, metadata, metrics, ex);
    throw ex;
}
{code}
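The expectation described in the comment, that the checkpoint should fail during the *async* phase after the sync phase has run, can be illustrated with a minimal standalone sketch. This is not Flink code; the class and method names below are purely illustrative of the two-phase snapshot pattern:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/**
 * Minimal sketch (not Flink code) of the two-phase snapshot pattern the
 * comment refers to: a synchronous phase that must run on the task thread,
 * followed by an asynchronous phase whose failure is observed later.
 */
public class TwoPhaseSnapshotSketch {

    /** Sync phase: produces a handle; must complete before the async phase starts. */
    static String syncPhase() {
        return "state-handle";
    }

    /** Async phase: persists the handle; may fail after the sync phase succeeded. */
    static CompletableFuture<String> asyncPhase(String handle, boolean failAsync) {
        return CompletableFuture.supplyAsync(() -> {
            if (failAsync) {
                throw new CompletionException(new Exception("async phase failed"));
            }
            return handle + "-persisted";
        });
    }

    public static void main(String[] args) {
        // If the sync phase never runs (as in the failing test), the async
        // failure the test waits for can never be observed -> timeout.
        String handle = syncPhase();
        try {
            asyncPhase(handle, true).join();
        } catch (CompletionException e) {
            System.out.println("observed async failure: " + e.getCause().getMessage());
        }
    }
}
```

The point of the sketch: the async failure is only observable once the sync phase has produced its futures, which matches the hypothesis that {{DeclineSink}} never reached the sync phase.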




> NotifyCheckpointAbortedITCase failed due to timeout
> ---
>
> Key: FLINK-20816
> URL: https://issues.apache.org/jira/browse/FLINK-20816
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.2, 1.13.0
>Reporter: Matthias
>Assignee: Matthias
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.13.0
>
> Attachments: flink-20816-failure.log, flink-20816-success.log
>
>
> [This 
> build|https://dev.azure.com/mapohl/flink/_build/results?buildId=152=logs=0a15d512-44ac-5ba5-97ab-13a5d066c22c=634cd701-c189-5dff-24cb-606ed884db87=4245]
>  failed caused by a failing of {{NotifyCheckpointAbortedITCase}} due to a 
> timeout.
> {code}
> 2020-12-29T21:48:40.9430511Z [INFO] Running 
> org.apache.flink.test.checkpointing.NotifyCheckpointAbortedITCase
> 

[jira] [Created] (FLINK-22077) Wrong way to calculate cross-region ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy

2021-03-31 Thread Zhilong Hong (Jira)
Zhilong Hong created FLINK-22077:


 Summary: Wrong way to calculate cross-region 
ConsumedPartitionGroups in PipelinedRegionSchedulingStrategy
 Key: FLINK-22077
 URL: https://issues.apache.org/jira/browse/FLINK-22077
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.13.0
Reporter: Zhilong Hong
 Fix For: 1.13.0


h3. Introduction

We implemented a wrong way to calculate cross-region ConsumedPartitionGroups in 
{{PipelinedRegionSchedulingStrategy}}. It slows down the procedure of 
{{onExecutionStateChange}}, raising the complexity from O(N) to O(N^2). The 
semantics of cross-region are also wrong.
h3. Details

In {{PipelinedRegionSchedulingStrategy#startScheduling}}, as expected, we need 
to schedule all regions with no external blocking edges, i.e., source regions. 
To decrease the complexity, we choose to schedule all the regions that have no 
external BLOCKING ConsumedPartitionGroups.

However, for the case illustrated in FLINK-22017, the region 2 has a 
ConsumedPartitionGroup, which has both internal and external blocking 
IntermediateResultPartitions. If we choose one to represent the entire 
ConsumedPartitionGroup, it may choose the internal one, and make the entire 
group internal. Region 2 will be scheduled.

As Region 1 is not finished, Region 2 cannot transition to running. A deadlock 
may happen if resource is not enough for both two regions.

To make it right, we introduced cross-region ConsumedPartitionGroups in 
FLINK-21330. The regions which has ConsumedPartitionGroups with both internal 
and external blocking IntermediateResultPartitions will be recorded. When we 
call {{startScheduling}}, these ConsumedPartitionGroups will be treated as 
external, and region 2 will not be scheduled.

But we have to admit that the implementation of cross-region is wrong. The 
ConsumedPartitionGroups that have multiple producer regions will be treated as 
cross-region groups. This is not the same logic as we mentioned above; the 
semantics are totally wrong. Also, all the ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will be treated as cross-region groups, since their 
producers are in different regions. (Each producer has its own region.) This 
raises the complexity from O(N) to O(N^2) for ALL-TO-ALL BLOCKING edges.
h3. Solution

To correctly calculate the cross-region ConsumedPartitionGroups, we can just 
calculate the producer regions for all ConsumedPartitionGroups, and then 
iterate all the regions and their ConsumedPartitionGroups. If a 
ConsumedPartitionGroup has two or more producer regions, and those regions 
contain the current region, it is a cross-region ConsumedPartitionGroup. This 
meets the correct semantics, and makes sure ALL-TO-ALL BLOCKING 
ConsumedPartitionGroups will not be treated as cross-region ones. This fix also 
decreases the complexity from O(N^2) to O(N). I think it's necessary to 
include this bug fix in release 1.13.
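The producer-region check proposed above can be sketched as follows. This is a hypothetical standalone illustration: the string group names, integer region ids, and method names are made up for the example and are not Flink's actual scheduler classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the proposed fix: a ConsumedPartitionGroup is cross-region for a
 * region R iff its set of producer regions contains R and at least one other
 * region. Group names and region ids are illustrative only.
 */
public class CrossRegionSketch {

    /** Precompute, for each group, the set of regions producing into it. */
    static Map<String, Set<Integer>> producerRegions(Map<String, List<Integer>> groupProducers) {
        Map<String, Set<Integer>> result = new HashMap<>();
        groupProducers.forEach((group, regions) -> result.put(group, new HashSet<>(regions)));
        return result;
    }

    /** Returns the groups that are cross-region with respect to the given region. */
    static Set<String> crossRegionGroups(int region, Map<String, Set<Integer>> producers) {
        Set<String> cross = new HashSet<>();
        for (Map.Entry<String, Set<Integer>> entry : producers.entrySet()) {
            Set<Integer> regions = entry.getValue();
            // Two or more producer regions, one of which is the current region.
            if (regions.size() >= 2 && regions.contains(region)) {
                cross.add(entry.getKey());
            }
        }
        return cross;
    }

    public static void main(String[] args) {
        // "g1" is produced by regions 1 and 2 -> cross-region for region 2.
        // "g2" is produced only by region 3 -> never cross-region.
        Map<String, Set<Integer>> producers =
                producerRegions(Map.of("g1", List.of(1, 2), "g2", List.of(3)));
        System.out.println(crossRegionGroups(2, producers)); // prints [g1]
        System.out.println(crossRegionGroups(3, producers)); // prints []
    }
}
```

An ALL-TO-ALL BLOCKING group whose producer regions do not include the consuming region never enters the cross-region set, which is what keeps the computation linear.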



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #15280: [FLINK-21714][table-api] Use TIMESTAMP_LTZ as return type for function PROCTIME()

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15280:
URL: https://github.com/apache/flink/pull/15280#issuecomment-802654466


   
   ## CI report:
   
   * 3b4b5fcd9d8108b51e0bf62a0bee888b4fdb5186 UNKNOWN
   * f80daf3fbf6f5f5ce1d39813fbecf9866eb50efb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15895)
 
   * 9d51cdd516ab6606275a16c324927637ad21d46a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15119: [FLINK-21736][state] Introduce state scope latency tracking metrics

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15119:
URL: https://github.com/apache/flink/pull/15119#issuecomment-792934307


   
   ## CI report:
   
   * 0cfa410c4bfa324f1fab08032f3992833fb4ceb7 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15861)
 
   * c816af323a8d685ad8dd05f2e46e39efb0a4866a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15456: [BP-1.12][FLINK-21685][k8s] Use dedicated thread pool for Kubernetes client IO operations

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15456:
URL: https://github.com/apache/flink/pull/15456#issuecomment-811085523


   
   ## CI report:
   
   * 4f07860cd382e6e188ac739b4aa9262e1a9da7bf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15945)
 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15910)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #15456: [BP-1.12][FLINK-21685][k8s] Use dedicated thread pool for Kubernetes client IO operations

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15456:
URL: https://github.com/apache/flink/pull/15456#issuecomment-811085523


   
   ## CI report:
   
   * 4f07860cd382e6e188ac739b4aa9262e1a9da7bf Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15910)
 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15945)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-21808) Support DQL/DML in HiveParser

2021-03-31 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-21808:
---

Assignee: Rui Li

> Support DQL/DML in HiveParser
> -
>
> Key: FLINK-21808
> URL: https://issues.apache.org/jira/browse/FLINK-21808
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Rui Li
>Assignee: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-21808) Support DQL/DML in HiveParser

2021-03-31 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-21808.
---
Resolution: Fixed

Fixed in master: 9222c55da340d4e5fa1dbfe67286df3bc90665cb... 
04bbf03a0cdb2f455c1b06569dea95ace6fa7e7c

> Support DQL/DML in HiveParser
> -
>
> Key: FLINK-21808
> URL: https://issues.apache.org/jira/browse/FLINK-21808
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: Rui Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wuchong closed pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser

2021-03-31 Thread GitBox


wuchong closed pull request #15253:
URL: https://github.com/apache/flink/pull/15253


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on a change in pull request #15385: [FLINK-21685][k8s] Introduce dedicated thread pool and cache for Kubernetes client

2021-03-31 Thread GitBox


wangyang0918 commented on a change in pull request #15385:
URL: https://github.com/apache/flink/pull/15385#discussion_r605336670



##
File path: 
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/Fabric8FlinkKubeClient.java
##
@@ -75,12 +77,12 @@
 private final String namespace;
 private final int maxRetryAttempts;
 
-private final Executor kubeClientExecutorService;
+private final ExecutorService kubeClientExecutorService;
 
 public Fabric8FlinkKubeClient(
 Configuration flinkConfig,
 NamespacedKubernetesClient client,
-Supplier<Executor> asyncExecutorFactory) {
+Supplier<ExecutorService> asyncExecutorFactory) {

Review comment:
   Thanks for the updating. It makes sense to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wangyang0918 commented on pull request #15456: [BP-1.12][FLINK-21685][k8s] Use dedicated thread pool for Kubernetes client IO operations

2021-03-31 Thread GitBox


wangyang0918 commented on pull request #15456:
URL: https://github.com/apache/flink/pull/15456#issuecomment-811592529


   The azure pipeline failed because something is wrong with the 
alicloud-mvn-mirror. Trigger again.
   
   @flinkbot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wuchong commented on pull request #15253: [FLINK-21808][hive] Support DQL/DML in HiveParser

2021-03-31 Thread GitBox


wuchong commented on pull request #15253:
URL: https://github.com/apache/flink/pull/15253#issuecomment-811591878


   I will merge this for now. The failed test should be related to FLINK-20558.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-20558) ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific test failure

2021-03-31 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312812#comment-17312812
 ] 

Jark Wu edited comment on FLINK-20558 at 4/1/21, 2:28 AM:
--

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15900=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20

Different test method, but the similar error. 
{code}
[ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.064 s 
<<< FAILURE! - in 
org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase
[ERROR] 
testWriteParquetAvroReflect(org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase)
  Time elapsed: 3.213 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.validateResults(ParquetAvroStreamingFileSinkITCase.java:161)
at 
org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroReflect(ParquetAvroStreamingFileSinkITCase.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}



was (Author: jark):
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15900=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20

> ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific test failure
> 
>
> Key: FLINK-20558
> URL: https://issues.apache.org/jira/browse/FLINK-20558
> Project: Flink
>  Issue Type: Test
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.13.0
>Reporter: Matthias
>Priority: Major
>  Labels: testability
>
> [Build|https://dev.azure.com/mapohl/flink/_build/results?buildId=135=results]
>  failed due to failing test 
> \{{ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific}}.
> {code:java}
> [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 10.193 s <<< FAILURE! - in 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase
> [ERROR] 
> testWriteParquetAvroSpecific(org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase)
>   Time elapsed: 0.561 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.validateResults(ParquetAvroStreamingFileSinkITCase.java:160)
>   at 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific(ParquetAvroStreamingFileSinkITCase.java:95)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> 

[jira] [Commented] (FLINK-20558) ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific test failure

2021-03-31 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312812#comment-17312812
 ] 

Jark Wu commented on FLINK-20558:
-

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=15900=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=03dca39c-73e8-5aaf-601d-328ae5c35f20

> ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific test failure
> 
>
> Key: FLINK-20558
> URL: https://issues.apache.org/jira/browse/FLINK-20558
> Project: Flink
>  Issue Type: Test
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>Affects Versions: 1.13.0
>Reporter: Matthias
>Priority: Major
>  Labels: testability
>
> [Build|https://dev.azure.com/mapohl/flink/_build/results?buildId=135=results]
>  failed due to failing test 
> \{{ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific}}.
> {code:java}
> [ERROR] Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 10.193 s <<< FAILURE! - in 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase
> [ERROR] 
> testWriteParquetAvroSpecific(org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase)
>   Time elapsed: 0.561 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.validateResults(ParquetAvroStreamingFileSinkITCase.java:160)
>   at 
> org.apache.flink.formats.parquet.avro.ParquetAvroStreamingFileSinkITCase.testWriteParquetAvroSpecific(ParquetAvroStreamingFileSinkITCase.java:95)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748) {code}
> The assertion was caused by [this assert call checking the number of files in 
> the 
> bucket|https://github.com/apache/flink/blob/fdea3cdc47052d59fc20611e1be019d223d77501/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/ParquetAvroStreamingFileSinkITCase.java#L160].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-21191) Support reducing buffer for upsert-kafka sink

2021-03-31 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-21191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-21191.
---
Resolution: Fixed

Fixed in master: ec9b0c5b60290697769415eb3e1b1ed2052460ac

> Support reducing buffer for upsert-kafka sink
> -
>
> Key: FLINK-21191
> URL: https://issues.apache.org/jira/browse/FLINK-21191
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Kafka, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: Shengkai Fang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Currently, if there is a job agg -> filter -> upsert-kafka, then upsert-kafka 
> will receive a -U and a +U for every update instead of only a +U. This will 
> produce a lot of tombstone messages in Kafka. It's not just about the 
> unnecessary data volume in Kafka: users may have processes that trigger side 
> effects when a tombstone record is ingested from a Kafka topic. 
> A simple solution would be to add a reducing buffer for upsert-kafka, to 
> reduce the -U and +U pairs before emitting to the underlying sink. This should be 
> very similar to the implementation of the upsert JDBC sink. 
> We can even extract the reducing logic out of the JDBC connector so that it can 
> be reused by other connectors. 
> This could be something like `BufferedUpsertSinkFunction`, which has a 
> reducing buffer and flushes to the underlying SinkFunction
> on checkpoint or buffer timeout. We can put it in `flink-connector-base`, 
> which can be shared by builtin connectors and custom connectors. 
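The reducing idea described above can be sketched as follows. This is a minimal, hypothetical illustration (the class `ReducingBuffer` and its methods are made up for this sketch; it is not Flink's actual `BufferedUpsertSinkFunction`): keep only the latest change per upsert key, so a -U immediately followed by a +U collapses into a single upsert, and emit the buffer contents on checkpoint or buffer timeout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a reducing buffer for an upsert sink:
 * only the latest change per key survives until the next flush.
 */
public class ReducingBuffer {
    // latest pending change per key; null marks a pending tombstone (-U/-D)
    private final Map<String, String> buffer = new LinkedHashMap<>();

    /** rowKind is "+I", "+U", "-U" or "-D"; key is the upsert key. */
    public void add(String rowKind, String key, String value) {
        if (rowKind.equals("-U") || rowKind.equals("-D")) {
            // retraction: kept only until a +U for the same key overwrites it
            buffer.put(key, null);
        } else {
            // +I/+U overwrite any pending retraction for the key
            buffer.put(key, value);
        }
    }

    /** Called on checkpoint or buffer timeout: emit one record per key. */
    public Map<String, String> flush() {
        Map<String, String> out = new LinkedHashMap<>(buffer);
        buffer.clear();
        return out;
    }

    public static void main(String[] args) {
        ReducingBuffer b = new ReducingBuffer();
        b.add("+I", "k1", "v1");
        b.add("-U", "k1", "v1");
        b.add("+U", "k1", "v2"); // the -U/+U pair reduces to one upsert
        Map<String, String> flushed = b.flush();
        System.out.println(flushed.size() + " " + flushed.get("k1")); // prints "1 v2"
    }
}
```

With this shape, the -U never reaches Kafka, so no tombstone is produced for a plain update; a real implementation would additionally have to emit tombstones for keys whose last pending change is a deletion.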





[GitHub] [flink] wuchong closed pull request #15434: [FLINK-21191][upsert-kafka] Support buffered sink function for upsert…

2021-03-31 Thread GitBox


wuchong closed pull request #15434:
URL: https://github.com/apache/flink/pull/15434


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-22076) Python Test failed with "OSError: [Errno 12] Cannot allocate memory"

2021-03-31 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-22076:


 Summary: Python Test failed with "OSError: [Errno 12] Cannot 
allocate memory"
 Key: FLINK-22076
 URL: https://issues.apache.org/jira/browse/FLINK-22076
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.13.0
Reporter: Huang Xingbo


https://dev.azure.com/sewen0794/Flink/_build/results?buildId=249=logs=fba17979-6d2e-591d-72f1-97cf42797c11=443dc6bf-b240-56df-6acf-c882d4b238da=21533

Python Test failed with "OSError: [Errno 12] Cannot allocate memory" in Azure 
Pipeline. I am not sure if it is caused by insufficient machine memory on Azure.






[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * a682564bcc9fe670b70fd526258b799f71431437 UNKNOWN
   * 9fcc466f5b1fbab8bc1827cc6e090c9d3ce75e8e UNKNOWN
   * 569696902bdfdecf3afd840ffc46e3878eb11b97 UNKNOWN
   * f1117ee096285bf2534b70c470694eac0406ca53 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15933)
 
   * edbc6251d9d4a3a08b2a85ac817ae17afe8f851a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15941)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15461: [FLINK-21817][connector/kafka] Remove split assignment tracking and make worker thread stateless in Kafka enumerator

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15461:
URL: https://github.com/apache/flink/pull/15461#issuecomment-811315471


   
   ## CI report:
   
   * 610bda77e51cd92403f69ac09ae73c3d9a463386 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15930)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15434: [FLINK-21191][upsert-kafka] Support buffered sink function for upsert…

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15434:
URL: https://github.com/apache/flink/pull/15434#issuecomment-810339027


   
   ## CI report:
   
   * 3ccebd310ade4b05520d8875c071c1290779ce55 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15928)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15460: [FLINK-22050][runtime] Don't release StreamTask network resources from TaskCanceller

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15460:
URL: https://github.com/apache/flink/pull/15460#issuecomment-811196253


   
   ## CI report:
   
   * 76187017457341aae886aa4a3180b61ae2fa93de Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15925)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * a682564bcc9fe670b70fd526258b799f71431437 UNKNOWN
   * 4a35885502b6eafe8b46b8bd6df36aa3477c9929 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15729)
 
   * 9fcc466f5b1fbab8bc1827cc6e090c9d3ce75e8e UNKNOWN
   * 569696902bdfdecf3afd840ffc46e3878eb11b97 UNKNOWN
   * f1117ee096285bf2534b70c470694eac0406ca53 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15933)
 
   * edbc6251d9d4a3a08b2a85ac817ae17afe8f851a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15941)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Commented] (FLINK-21976) Move Flink ML pipeline API and library code to a separate repository named flink-ml

2021-03-31 Thread Dian Fu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312789#comment-17312789
 ] 

Dian Fu commented on FLINK-21976:
-

Removed flink-ml from Flink repo via 60b3895db28c2270f6e21a02e14b0834557be101 ~ 
75e1f60ad741f87af3e15c5ab3633ce7e4c17643

> Move Flink ML pipeline API and library code to a separate repository named 
> flink-ml
> ---
>
> Key: FLINK-21976
> URL: https://issues.apache.org/jira/browse/FLINK-21976
> Project: Flink
>  Issue Type: Improvement
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
>  Labels: pull-request-available
>
> According to the discussion in [1] and [2], move Flink ML pipeline API and 
> library code to a separate repository named flink-ml.
> The new repo will have the following setup:
> - The repo will be created at https://github.com/apache/flink-ml. This repo 
> will depend on the core Flink repo.
> - The flink-ml documentation will be linked from the existing main Flink docs 
> similar to https://ci.apache.org/projects/flink/flink-statefun-docs-master.
> - The new repo will be under namespace org.apache.flink.
> - We can revisit whether we should put it back to the core Flink repo after 
> the above issue is resolved and if there is good reason to make the change.
> [1] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Move-Flink-ML-pipeline-API-and-library-code-to-a-separate-repository-named-flink-ml-tc49420.html
> [2] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Move-Flink-ML-pipeline-API-and-library-code-to-a-separate-repository-named-flink-ml-tt49568.html





[GitHub] [flink] dianfu closed pull request #15405: [Flink-21976] Remove Flink ML pipeline API and library code from the Flink repo

2021-03-31 Thread GitBox


dianfu closed pull request #15405:
URL: https://github.com/apache/flink/pull/15405


   






[GitHub] [flink] flinkbot edited a comment on pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15411:
URL: https://github.com/apache/flink/pull/15411#issuecomment-809191175


   
   ## CI report:
   
   * 386a106bff7eda5396dc95f088c7121bbb55a73f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15828)
 
   * 9ae0c40165b92bfa89bfb3b776f60fe113f4922c Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15940)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15458: [FLINK-22064][table] Don't submit statement set when no insert is added in the sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15458:
URL: https://github.com/apache/flink/pull/15458#issuecomment-811149954


   
   ## CI report:
   
   * e97c9b4bc77bb3842f3fb72396dcee2acac9fb5d Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15918)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15437: [FLINK-20320][sql-client] support init sql file in sql client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15437:
URL: https://github.com/apache/flink/pull/15437#issuecomment-810431644


   
   ## CI report:
   
   * 71c1edd98caa64cc1fe447f878f8cbc1321eb40b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15913)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15411:
URL: https://github.com/apache/flink/pull/15411#issuecomment-809191175


   
   ## CI report:
   
   * 386a106bff7eda5396dc95f088c7121bbb55a73f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15828)
 
   * 9ae0c40165b92bfa89bfb3b776f60fe113f4922c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * a682564bcc9fe670b70fd526258b799f71431437 UNKNOWN
   * 4a35885502b6eafe8b46b8bd6df36aa3477c9929 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15729)
 
   * 9fcc466f5b1fbab8bc1827cc6e090c9d3ce75e8e UNKNOWN
   * 569696902bdfdecf3afd840ffc46e3878eb11b97 UNKNOWN
   * f1117ee096285bf2534b70c470694eac0406ca53 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15933)
 
   * edbc6251d9d4a3a08b2a85ac817ae17afe8f851a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] StephanEwen commented on pull request #15119: [FLINK-21736][state] Introduce state scope latency tracking metrics

2021-03-31 Thread GitBox


StephanEwen commented on pull request #15119:
URL: https://github.com/apache/flink/pull/15119#issuecomment-811513817


   Looks good to me, +1 to merge after rebasing.






[GitHub] [flink] est08zw commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


est08zw commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605263050



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +267,136 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable. See also 
{@link
+ * org.apache.flink.runtime.rpc.RpcEndpoint#runAsync(Runnable)}
+ */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+final OneShotLatch latch = new OneShotLatch();
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.runAsync(latch::trigger);
+latch.await(TIMEOUT.getSize(), TIMEOUT.getUnit());
+assertTrue(latch.isTriggered());
+} finally {
+RpcUtils.terminateRpcEndpoint(dummyRpcEndpoint, TIMEOUT);
+}
+}
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable with a 
delay specified in Time.
+ * See also {@link 
org.apache.flink.runtime.rpc.RpcEndpoint#scheduleRunAsync(Runnable, Time)}
+ */
+@Test
+public void testScheduleRunAsyncTime()
+throws InterruptedException, ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+OneShotLatch latch = new OneShotLatch();
+Time delay = Time.seconds(1);
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.scheduleRunAsync(latch::trigger, delay);
+assertFalse(latch.isTriggered());
+if (addExtraDelay) {
+TimeUnit.MILLISECONDS.sleep(extraDelayMilliSeconds);
+}
+latch.await(delay.getSize(), delay.getUnit());
+assertTrue(latch.isTriggered());
+} finally {
+RpcUtils.terminateRpcEndpoint(dummyRpcEndpoint, TIMEOUT);
+}
+}
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable with a 
delay specified in
+ * TimeUnit. See also {@link 
org.apache.flink.runtime.rpc.RpcEndpoint#scheduleRunAsync(Runnable,
+ * long, TimeUnit)}
+ */
+@Test
+public void testScheduleRunAsyncTimeUnit()
+throws InterruptedException, ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+OneShotLatch latch = new OneShotLatch();
+final long delayMinutes = 1;
+final Time delayTime = Time.minutes(delayMinutes);
+
+try {
+dummyRpcEndpoint.start();
+
+dummyRpcEndpoint.scheduleRunAsync(latch::trigger, delayMinutes, 
TimeUnit.MINUTES);
+assertFalse(latch.isTriggered());
+if (addExtraDelay) {
+TimeUnit.MILLISECONDS.sleep(extraDelayMilliSeconds);
+}
+latch.await(delayTime.getSize(), delayTime.getUnit());
+assertTrue(latch.isTriggered());
+latch.reset();

Review comment:
   Use a [delay * 0.8, delay * 1.2] range to check the actual delay.
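   The suggested tolerance check could look like the following sketch (the helper `withinTolerance` is hypothetical, not part of the PR): accept a measured delay within [expected * 0.8, expected * 1.2] instead of asserting an exact value, to tolerate scheduler jitter.

   ```java
   /** Hypothetical helper illustrating the suggested range check. */
   public class DelayRangeCheck {
       static boolean withinTolerance(long actualMillis, long expectedMillis) {
           // accept anything within +/-20% of the expected delay
           return actualMillis >= expectedMillis * 0.8
                   && actualMillis <= expectedMillis * 1.2;
       }

       public static void main(String[] args) {
           long expected = 1000;
           System.out.println(withinTolerance(950, expected));  // prints "true"
           System.out.println(withinTolerance(1500, expected)); // prints "false"
       }
   }
   ```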








[GitHub] [flink] est08zw commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


est08zw commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605262390



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +267,136 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable. See also 
{@link
+ * org.apache.flink.runtime.rpc.RpcEndpoint#runAsync(Runnable)}
+ */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+final OneShotLatch latch = new OneShotLatch();
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.runAsync(latch::trigger);
+latch.await(TIMEOUT.getSize(), TIMEOUT.getUnit());
+assertTrue(latch.isTriggered());
+} finally {
+RpcUtils.terminateRpcEndpoint(dummyRpcEndpoint, TIMEOUT);
+}
+}
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable with a 
delay specified in Time.
+ * See also {@link 
org.apache.flink.runtime.rpc.RpcEndpoint#scheduleRunAsync(Runnable, Time)}
+ */
+@Test
+public void testScheduleRunAsyncTime()
+throws InterruptedException, ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+OneShotLatch latch = new OneShotLatch();
+Time delay = Time.seconds(1);
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.scheduleRunAsync(latch::trigger, delay);
+assertFalse(latch.isTriggered());
+if (addExtraDelay) {
+TimeUnit.MILLISECONDS.sleep(extraDelayMilliSeconds);
+}
+latch.await(delay.getSize(), delay.getUnit());
+assertTrue(latch.isTriggered());
+} finally {
+RpcUtils.terminateRpcEndpoint(dummyRpcEndpoint, TIMEOUT);
+}
+}
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable with a 
delay specified in
+ * TimeUnit. See also {@link 
org.apache.flink.runtime.rpc.RpcEndpoint#scheduleRunAsync(Runnable,
+ * long, TimeUnit)}
+ */
+@Test
+public void testScheduleRunAsyncTimeUnit()

Review comment:
   Just test MILLISECONDS and SECONDS. Reduce the delay and schedule the two 
commands immediately; do not wait for the first command to finish before 
scheduling the second one.
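   The scheduling pattern suggested here can be sketched as follows, using a plain `ScheduledExecutorService` as a stand-in for the RpcEndpoint main thread executor (the class name and delays are made up for illustration): both delayed commands are scheduled up front, so the total wait is bounded by the longest delay rather than the sum of both.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.Executors;
   import java.util.concurrent.ScheduledExecutorService;
   import java.util.concurrent.TimeUnit;

   /** Sketch: schedule two delayed commands back to back, then await both. */
   public class ScheduleBothUpFront {
       public static void main(String[] args) throws Exception {
           ScheduledExecutorService executor =
                   Executors.newSingleThreadScheduledExecutor();
           CountDownLatch both = new CountDownLatch(2);

           // schedule both commands immediately, one per time unit under test
           executor.schedule(both::countDown, 50, TimeUnit.MILLISECONDS);
           executor.schedule(both::countDown, 1, TimeUnit.SECONDS);

           // wait once for both; bounded by the longest delay, not the sum
           boolean done = both.await(5, TimeUnit.SECONDS);
           executor.shutdown();
           System.out.println(done); // prints "true"
       }
   }
   ```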








[GitHub] [flink] est08zw commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


est08zw commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605261249



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +267,136 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable. See also 
{@link
+ * org.apache.flink.runtime.rpc.RpcEndpoint#runAsync(Runnable)}
+ */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+final OneShotLatch latch = new OneShotLatch();
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.runAsync(latch::trigger);
+latch.await(TIMEOUT.getSize(), TIMEOUT.getUnit());
+assertTrue(latch.isTriggered());
+} finally {
+RpcUtils.terminateRpcEndpoint(dummyRpcEndpoint, TIMEOUT);
+}
+}
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable with a 
delay specified in Time.
+ * See also {@link 
org.apache.flink.runtime.rpc.RpcEndpoint#scheduleRunAsync(Runnable, Time)}
+ */
+@Test
+public void testScheduleRunAsyncTime()
+throws InterruptedException, ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+OneShotLatch latch = new OneShotLatch();
+Time delay = Time.seconds(1);
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.scheduleRunAsync(latch::trigger, delay);

Review comment:
   Use RpcEndpoint#getMainThreadExecutor() to schedule the command indirectly. 








[GitHub] [flink] est08zw commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


est08zw commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605260386



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +267,136 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable. See also 
{@link
+ * org.apache.flink.runtime.rpc.RpcEndpoint#runAsync(Runnable)}
+ */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};
+final OneShotLatch latch = new OneShotLatch();
+try {
+dummyRpcEndpoint.start();
+dummyRpcEndpoint.runAsync(latch::trigger);

Review comment:
   Use RpcEndpoint#validateRunsInMainThread() to check that the runnable runs in 
the main thread.








[GitHub] [flink] est08zw commented on a change in pull request #15411: [FLINK-22055][runtime] Fix RPCEndpoint MainThreadExecutor scheduling command with wrong time unit.

2021-03-31 Thread GitBox


est08zw commented on a change in pull request #15411:
URL: https://github.com/apache/flink/pull/15411#discussion_r605259670



##
File path: 
flink-runtime/src/test/java/org/apache/flink/runtime/rpc/RpcEndpointTest.java
##
@@ -262,4 +267,136 @@ protected ExtendedEndpoint(
 return CompletableFuture.completedFuture(isRunning());
 }
 }
+
+/**
+ * Tests that the {@link RpcEndpoint} can execute the runnable. See also 
{@link
+ * org.apache.flink.runtime.rpc.RpcEndpoint#runAsync(Runnable)}
+ */
+@Test
+public void testRunAsync() throws InterruptedException, 
ExecutionException, TimeoutException {
+RpcEndpoint dummyRpcEndpoint = new RpcEndpoint(rpcService) {};

Review comment:
   Add a one-parameter constructor for BaseEndpoint and use baseEndpoint for 
the tests.








[GitHub] [flink] StephanEwen commented on pull request #15397: [FLINK-21817][connector/common] Remove split assignment tracker from coordinator state

2021-03-31 Thread GitBox


StephanEwen commented on pull request #15397:
URL: https://github.com/apache/flink/pull/15397#issuecomment-811503088


   Sorry, had to abort the merge because the compile failed with a checkstyle 
issue...






[GitHub] [flink] StephanEwen closed pull request #15429: [FLINK-21935] Remove 'state.backend.async' option

2021-03-31 Thread GitBox


StephanEwen closed pull request #15429:
URL: https://github.com/apache/flink/pull/15429


   






[GitHub] [flink] StephanEwen closed pull request #15449: [FLINK-22053][core] Allow NumberSequenceSource to have less splits than parallelism.

2021-03-31 Thread GitBox


StephanEwen closed pull request #15449:
URL: https://github.com/apache/flink/pull/15449


   






[GitHub] [flink] flinkbot edited a comment on pull request #15441: [FLINK-22052][python] Add FLIP-142 public classes to python API

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15441:
URL: https://github.com/apache/flink/pull/15441#issuecomment-810695518


   
   ## CI report:
   
   * ca1680265a1fd5b1049d597202efbb992f71fb95 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15902)
 
   * 5f2a1ada2fd1124bc6950fdbfea8747fecf0ab1d Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15938)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] StephanEwen commented on a change in pull request #15461: [FLINK-21817][connector/kafka] Remove split assignment tracking and make worker thread stateless in Kafka enumerator

2021-03-31 Thread GitBox


StephanEwen commented on a change in pull request #15461:
URL: https://github.com/apache/flink/pull/15461#discussion_r605201241



##
File path: 
flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/enumerator/KafkaSourceEnumerator.java
##
@@ -98,7 +95,7 @@ public KafkaSourceEnumerator(
 stoppingOffsetInitializer,
 properties,
 context,
-new HashMap<>());
+new HashSet<>());

Review comment:
   Can be `Collections.emptySet()`
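
   A standalone illustration of why this suggestion works (demo code, not the Kafka enumerator itself): `Collections.emptySet()` returns a shared immutable instance, so it avoids allocating a fresh `HashSet` whenever the callee only reads the collection; if the callee mutates the argument, `new HashSet<>()` must be kept instead.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Collections.emptySet() vs. new HashSet<>() as a "no elements" argument.
public class EmptySetDemo {

    // Read-only use of the argument: safe with Collections.emptySet().
    static int countAssigned(Set<String> alreadyAssigned) {
        return alreadyAssigned.size();
    }

    public static void main(String[] args) {
        Set<String> empty = Collections.emptySet();
        System.out.println(countAssigned(empty)); // 0, no allocation per call

        // The empty set is immutable: mutation throws.
        try {
            empty.add("x");
        } catch (UnsupportedOperationException e) {
            System.out.println("immutable");
        }

        // If the callee needs to mutate, a fresh HashSet is still required.
        Set<String> mutable = new HashSet<>();
        mutable.add("x");
        System.out.println(countAssigned(mutable)); // 1
    }
}
```

   The method name `countAssigned` is invented for the example; the point is only the read-only vs. mutating contract of the parameter.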








[GitHub] [flink] flinkbot edited a comment on pull request #15456: [BP-1.12][FLINK-21685][k8s] Use dedicated thread pool for Kubernetes client IO operations

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15456:
URL: https://github.com/apache/flink/pull/15456#issuecomment-811085523


   
   ## CI report:
   
   * 4f07860cd382e6e188ac739b4aa9262e1a9da7bf Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15910)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15441: [FLINK-22052][python] Add FLIP-142 public classes to python API

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15441:
URL: https://github.com/apache/flink/pull/15441#issuecomment-810695518


   
   ## CI report:
   
   * ca1680265a1fd5b1049d597202efbb992f71fb95 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15902)
 
   * 5f2a1ada2fd1124bc6950fdbfea8747fecf0ab1d UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[jira] [Closed] (FLINK-17012) Expose stage of task initialization

2021-03-31 Thread Piotr Nowojski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-17012.
--
Fix Version/s: 1.13.0
 Release Note: A task's RUNNING state was split into two states: RECOVERING 
and RUNNING. A task is RECOVERING while its state is being initialized and, in 
the case of unaligned checkpoints, until all of the in-flight data has been 
recovered.
   Resolution: Fixed

Merged to master as c0c156bd638, 22c41b06b73 and dbf1221debe

> Expose stage of task initialization
> ---
>
> Key: FLINK-17012
> URL: https://issues.apache.org/jira/browse/FLINK-17012
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Task
>Reporter: Wenlong Lyu
>Assignee: Anton Kalashnikov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Currently a task switches to RUNNING before it is fully initialized: state 
> initialization and operator initialization (#open) are not taken into 
> account, even though they may take a long time to finish. As a result, there 
> is a confusing phenomenon where all tasks are RUNNING but throughput is 0. 
> I think it would be good if we could expose the initialization stage of 
> tasks. What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] pnowojski closed pull request #15375: [FLINK-17012][streaming] Implemented the separated 'restore' method for StreamTask

2021-03-31 Thread GitBox


pnowojski closed pull request #15375:
URL: https://github.com/apache/flink/pull/15375


   






[GitHub] [flink] pnowojski commented on pull request #15375: [FLINK-17012][streaming] Implemented the separated 'restore' method for StreamTask

2021-03-31 Thread GitBox


pnowojski commented on pull request #15375:
URL: https://github.com/apache/flink/pull/15375#issuecomment-811465586


   The final build is almost done, except for the e2e tests. Compared to 
previous builds there were no significant changes (javadocs vs. the previous 
build, or renames vs. the build before), and those previous builds succeeded 
or failed only for unrelated issues, so I'm going to merge this.






[GitHub] [flink] sjwiesman commented on pull request #15441: [FLINK-22052][python] Add FLIP-142 public classes to python API

2021-03-31 Thread GitBox


sjwiesman commented on pull request #15441:
URL: https://github.com/apache/flink/pull/15441#issuecomment-811465353


   Thanks @dianfu. I've made the final fix. Since this is based on #15429, I 
will wait for that PR to finish before merging. 






[GitHub] [flink] flinkbot edited a comment on pull request #15441: [FLINK-22052][python] Add FLIP-142 public classes to python API

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15441:
URL: https://github.com/apache/flink/pull/15441#issuecomment-810695518


   
   ## CI report:
   
   * ca1680265a1fd5b1049d597202efbb992f71fb95 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15902)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15434: [FLINK-21191][upsert-kafka] Support buffered sink function for upsert…

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15434:
URL: https://github.com/apache/flink/pull/15434#issuecomment-810339027


   
   ## CI report:
   
   * 7c32b2a7767dc356df8f7949646e4e586c65ed8a Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15907)
 
   * 3ccebd310ade4b05520d8875c071c1290779ce55 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15928)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15405: [Flink-21976] Remove Flink ML pipeline API and library code from the Flink repo

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15405:
URL: https://github.com/apache/flink/pull/15405#issuecomment-809101943


   
   ## CI report:
   
   * 3e06be64db57497dd4ba30dd471cd9cf94a215e1 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15906)
 
   * c5f15eb10b748f8ab53a8bd8b289b7eb6f4d2e7c UNKNOWN
   * 78a3da1265dbdaa94862d83c3d274b1bf45f3201 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15923)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15385: [FLINK-21685][k8s] Introduce dedicated thread pool and cache for Kubernetes client

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15385:
URL: https://github.com/apache/flink/pull/15385#issuecomment-808104280


   
   ## CI report:
   
   * c628943d4da91fddf19f7c47cb8c9120f61c3ce5 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15908)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15397: [FLINK-21817][connector/common] Remove split assignment tracker from coordinator state

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15397:
URL: https://github.com/apache/flink/pull/15397#issuecomment-808916074


   
   ## CI report:
   
   * 3ff767f2f01190feee3c93c09048f92e977d8b8f UNKNOWN
   * ca1e7a3bfb2f8fada3699e84d12eddb17aef2a87 UNKNOWN
   * 36ec1871dcf7ac4ce33e840fda6a7927aea0a479 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15927)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15375: [FLINK-17012][streaming] Implemented the separated 'restore' method for StreamTask

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15375:
URL: https://github.com/apache/flink/pull/15375#issuecomment-807081996


   
   ## CI report:
   
   * 6366c66172146c943ef8dda07f3c044c87379ef1 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15905)
 
   * d928ce9276c8f8e741cbad1a92fc988dd447c478 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15915)
 
   * 59cca4c2874732397766eac71a03750bc42f2a4b UNKNOWN
   * 4cc069ab7ad54a0198a4dcfb7f3783640cdd4ff3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15922)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15121: The method $(String) is undefined for the type TableExample

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15121:
URL: https://github.com/apache/flink/pull/15121#issuecomment-793480240


   
   ## CI report:
   
   * cf26ce895a7956d258acc073817f578558e78227 UNKNOWN
   * 619fedb31ed33e3501be115fc618aa41f5dab8bc Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15904)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] flinkbot edited a comment on pull request #15054: [FLINK-13550][runtime][ui] Operator's Flame Graph

2021-03-31 Thread GitBox


flinkbot edited a comment on pull request #15054:
URL: https://github.com/apache/flink/pull/15054#issuecomment-788337524


   
   ## CI report:
   
   * 26a28f2d83f56cb386e1365fd4df4fb8a2f2bf86 UNKNOWN
   * 0b5aaf42f2861a38e26a80e25cf2324e7cf06bb7 UNKNOWN
   * a682564bcc9fe670b70fd526258b799f71431437 UNKNOWN
   * 4a35885502b6eafe8b46b8bd6df36aa3477c9929 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15729)
 
   * 9fcc466f5b1fbab8bc1827cc6e090c9d3ce75e8e UNKNOWN
   * 569696902bdfdecf3afd840ffc46e3878eb11b97 UNKNOWN
   * f1117ee096285bf2534b70c470694eac0406ca53 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=15933)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   






[GitHub] [flink] StephanEwen commented on pull request #15461: [FLINK-21817][connector/kafka] Remove split assignment tracking and make worker thread stateless in Kafka enumerator

2021-03-31 Thread GitBox


StephanEwen commented on pull request #15461:
URL: https://github.com/apache/flink/pull/15461#issuecomment-811445749


   Thanks for picking up these changes and taking the initiative to change the 
bookkeeping of splits, in addition to the bugfix.
   I originally meant to postpone the bookkeeping refactoring to a new PR, 
because we were still discussing it and to save time before the feature 
freeze. So, sorry to have caused extra work here.
   
   From my side, I think it looks fine at first glance. A full review is below.
   But given the big changes here, I think this also needs a look from 
@becketqin.





