[jira] [Comment Edited] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-20 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236606#comment-17236606
 ] 

Zhu Zhu edited comment on FLINK-20220 at 11/21/20, 7:32 AM:


I guess the reason is that the state of a vertex to be scheduled was changed in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
during the invocation of {{schedulerOperations.allocateSlotsAndDeploy(...)}} on 
other vertices.
E.g. ev1 and ev2 are in the same pipelined region and are restarted one by one 
in the scheduling loop in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}}.
Both are in CREATED state at that moment. ev1 is scheduled first but 
immediately fails due to a slot allocation error, and ev2 is canceled as a 
result. So when ev2 is scheduled, its state is CANCELED and the state check 
fails.

[~rmetzger] would you share the full JobManager log? I can check whether the 
cause is what I guessed.
If that is the cause, a possible fix is to change 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
so that the vertex filtering and the {{allocateSlotsAndDeploy()}} invocation 
happen in the same loop.
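A minimal sketch of that fix (hypothetical helper names like 
inputConstraintsSatisfied/deploymentOptionFor, not the actual Flink code):
{code}
// Sketch only: filter and deploy in the same pass, re-checking the vertex
// state right before scheduling it, because a previous
// allocateSlotsAndDeploy(...) call may already have canceled this vertex.
for (SchedulingExecutionVertex vertex : verticesToSchedule) {
    if (vertex.getState() == ExecutionState.CREATED && inputConstraintsSatisfied(vertex)) {
        schedulerOperations.allocateSlotsAndDeploy(
                Collections.singletonList(deploymentOptionFor(vertex)));
    }
}
{code}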


was (Author: zhuzh):
I guess the reason is that the state of a vertex to be scheduled was changed in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
during the invocation of {{schedulerOperations.allocateSlotsAndDeploy(...)}} on 
other vertices.
E.g. {ev1, ev2} are in the same pipelined region and are restarted one by one 
in the scheduling loop in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}}.
Both are in CREATED state at that moment. ev1 is scheduled first but 
immediately fails due to a slot allocation error, and ev2 is canceled as a 
result. So when ev2 is scheduled, its state is CANCELED and the state check 
fails.

[~rmetzger] would you share the full JobManager log? I can check whether the 
cause is what I guessed.
If that is the cause, a possible fix is to change 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
so that the vertex filtering and the {{allocateSlotsAndDeploy()}} invocation 
happen in the same loop.

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
>
> Running the {{PageRank}} example in Flink, I accidentally tried to collect() 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableExcepti

[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-20 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236606#comment-17236606
 ] 

Zhu Zhu commented on FLINK-20220:
-

I guess the reason is that the state of a vertex to be scheduled was changed in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
during the invocation of {{schedulerOperations.allocateSlotsAndDeploy(...)}} on 
other vertices.
E.g. {ev1, ev2} are in the same pipelined region and are restarted one by one 
in the scheduling loop in 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}}.
Both are in CREATED state at that moment. ev1 is scheduled first but 
immediately fails due to a slot allocation error, and ev2 is canceled as a 
result. So when ev2 is scheduled, its state is CANCELED and the state check 
fails.

[~rmetzger] would you share the full JobManager log? I can check whether the 
cause is what I guessed.
If that is the cause, a possible fix is to change 
{{LazyFromSourcesSchedulingStrategy#allocateSlotsAndDeployExecutionVertices()}} 
so that the vertex filtering and the {{allocateSlotsAndDeploy()}} invocation 
happen in the same loop.
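For context, the collect() pattern from the issue description amounts to 
roughly this minimal sketch (standard DataSet API, not the reporter's exact 
job):
{code}
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CollectSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> data = env.generateSequence(0L, 10_000_000L);
        // collect() ships the whole result back to the client via accumulators,
        // i.e. inside TaskExecutionState messages, so large results can blow
        // past RPC frame limits.
        List<Long> result = data.collect();
        System.out.println(result.size());
    }
}
{code}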

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
>
> Running the {{PageRank}} example in Flink, I accidentally tried to collect() 
> 125MB of data to my client via accumulators.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcAc

[GitHub] [flink] kezhuw commented on pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-20 Thread GitBox


kezhuw commented on pull request #14140:
URL: https://github.com/apache/flink/pull/14140#issuecomment-731520618


   @AHeise I am getting confused. We probably have essential divergences on 
what `StreamTaskTestHarness.waitForInputProcessing` should do. From my 
understanding, it should wait until **all currently available input** has been 
processed, not until the end of the stream. It is waiting for an intermediate 
status, and could occur several times for a single test harness, say three 
times in `TwoInputStreamTaskTest.testWatermarkMetrics`. What you propose here 
should already be covered by the combination of `StreamTaskTestHarness.endInput` 
and `StreamTaskTestHarness.waitForTaskCompletion`. That combination waits for 
task termination, which is a terminal status, and should occur at most once 
for a single test harness.
   
   @rkhachatryan gave a similar suggestion in the previous review cycle, so I 
think we should align on what `StreamTaskTestHarness.waitForInputProcessing` 
should do.
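   
   A rough sketch of the two kinds of waits as I understand them (the test body 
is hypothetical; the harness methods are the ones named above):
   
   ```java
   // Intermediate wait: all *currently available* input has been processed;
   // this can happen several times for a single harness.
   testHarness.processElement(new StreamRecord<>("a"));
   testHarness.waitForInputProcessing();
   // ... assert on intermediate state (e.g. watermark metrics), feed more input ...
   
   // Terminal wait: end the stream and wait for task termination;
   // this should happen at most once per harness.
   testHarness.endInput();
   testHarness.waitForTaskCompletion();
   ```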







[jira] [Updated] (FLINK-20266) New FileSource prevents IntelliJ from stopping spawned JVM when running a job

2020-11-20 Thread Dian Fu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated FLINK-20266:

Priority: Critical  (was: Major)

> New FileSource prevents IntelliJ from stopping spawned JVM when running a job
> -
>
> Key: FLINK-20266
> URL: https://issues.apache.org/jira/browse/FLINK-20266
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.12.0
>Reporter: Till Rohrmann
>Priority: Critical
> Fix For: 1.12.0
>
>
> When trying out the new {{FileSource}} I noticed that jobs which I started 
> from my IDE wouldn't properly terminate. To be more precise, the spawned 
> JVM for the jobs wouldn't properly terminate. I cannot really tell what the 
> {{FileSource}} does differently, but when not using this source, the JVM 
> terminates properly.
> The stack trace of the hanging JVM is
> {code}
> 2020-11-20 18:20:02
> Full thread dump OpenJDK 64-Bit Server VM (11.0.2+9 mixed mode):
> Threads class SMR info:
> _java_thread_list=0x7fb5bc15f1b0, length=19, elements={
> 0x7fb60d807000, 0x7fb60d80c000, 0x7fb60d80f000, 
> 0x7fb60d809000,
> 0x7fb60d81a000, 0x7fb61f00b000, 0x7fb63d80e000, 
> 0x7fb61d826800,
> 0x7fb61d829800, 0x7fb61e80, 0x7fb63d95d800, 
> 0x7fb63e2f8800,
> 0x7fb5ba37a800, 0x7fb5afe1a800, 0x7fb61dff6800, 
> 0x7fb63da49800,
> 0x7fb63e8d0800, 0x7fb5be001000, 0x7fb5bb8a4000
> }
> "Reference Handler" #2 daemon prio=10 os_prio=31 cpu=10.05ms elapsed=86.35s 
> tid=0x7fb60d807000 nid=0x4b03 waiting on condition  [0x736e9000]
>java.lang.Thread.State: RUNNABLE
>   at 
> java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.2/Native 
> Method)
>   at 
> java.lang.ref.Reference.processPendingReferences(java.base@11.0.2/Reference.java:241)
>   at 
> java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.2/Reference.java:213)
> "Finalizer" #3 daemon prio=8 os_prio=31 cpu=0.90ms elapsed=86.35s 
> tid=0x7fb60d80c000 nid=0x3803 in Object.wait()  [0x737ec000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@11.0.2/Native Method)
>   - waiting on <0x000600204780> (a java.lang.ref.ReferenceQueue$Lock)
>   at 
> java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:155)
>   - waiting to re-lock in wait() <0x000600204780> (a 
> java.lang.ref.ReferenceQueue$Lock)
>   at 
> java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:176)
>   at 
> java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.2/Finalizer.java:170)
> "Signal Dispatcher" #4 daemon prio=9 os_prio=31 cpu=0.31ms elapsed=86.34s 
> tid=0x7fb60d80f000 nid=0x4203 runnable  [0x]
>java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" #5 daemon prio=9 os_prio=31 cpu=2479.36ms elapsed=86.34s 
> tid=0x7fb60d809000 nid=0x3f03 waiting on condition  [0x]
>java.lang.Thread.State: RUNNABLE
>No compile task
> "C1 CompilerThread0" #8 daemon prio=9 os_prio=31 cpu=1412.88ms elapsed=86.34s 
> tid=0x7fb60d81a000 nid=0x3d03 waiting on condition  [0x]
>java.lang.Thread.State: RUNNABLE
>No compile task
> "Sweeper thread" #9 daemon prio=9 os_prio=31 cpu=42.82ms elapsed=86.34s 
> tid=0x7fb61f00b000 nid=0xa803 runnable  [0x]
>java.lang.Thread.State: RUNNABLE
> "Common-Cleaner" #10 daemon prio=8 os_prio=31 cpu=3.25ms elapsed=86.29s 
> tid=0x7fb63d80e000 nid=0x5703 in Object.wait()  [0x73cfb000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@11.0.2/Native Method)
>   - waiting on <0x000600205aa0> (a java.lang.ref.ReferenceQueue$Lock)
>   at 
> java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:155)
>   - waiting to re-lock in wait() <0x000600205aa0> (a 
> java.lang.ref.ReferenceQueue$Lock)
>   at 
> jdk.internal.ref.CleanerImpl.run(java.base@11.0.2/CleanerImpl.java:148)
>   at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
>   at 
> jdk.internal.misc.InnocuousThread.run(java.base@11.0.2/InnocuousThread.java:134)
> "JDWP Transport Listener: dt_socket" #11 daemon prio=10 os_prio=31 
> cpu=43.46ms elapsed=86.27s tid=0x7fb61d826800 nid=0xa603 runnable  
> [0x]
>java.lang.Thread.State: RUNNABLE
> "JDWP Event Helper Thread" #12 daemon prio=10 os_prio=31 cpu=220.06ms 
> elapsed=86.27s tid=0x7fb61d829800 nid=0x5e03 runnable  
> [0x]
>java.lang.Thread.State: RUNNABLE
> "JDWP Command Reader" #13 daemon prio=10 os_prio=31 cpu=27.26ms 
> elapsed=86.27s ti

[GitHub] [flink] flinkbot edited a comment on pull request #14141: [FLINK-20145][checkpointing] Fix priority event handling.

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14141:
URL: https://github.com/apache/flink/pull/14141#issuecomment-730461048


   
   ## CI report:
   
   * e67078cc99525b0cd2ed6fb23c1eac9063600191 UNKNOWN
   * 214ce39805f2ea2968ac2b1c7ba983715a66cdbd Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9889)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14005: Support mounting custom PVCs, secrets and config maps to job/Task manager pods

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14005:
URL: https://github.com/apache/flink/pull/14005#issuecomment-724364977


   
   ## CI report:
   
   * f5a79145e3f3cc20392e5d755eeb85a572f551d4 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9892)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14005: Support mounting custom PVCs, secrets and config maps to job/Task manager pods

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14005:
URL: https://github.com/apache/flink/pull/14005#issuecomment-724364977


   
   ## CI report:
   
   * 37ee9403c42852684b9de8116dac1dc45743f6ad Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9842)
 
   * f5a79145e3f3cc20392e5d755eeb85a572f551d4 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9892)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14005: Support mounting custom PVCs, secrets and config maps to job/Task manager pods

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14005:
URL: https://github.com/apache/flink/pull/14005#issuecomment-724364977


   
   ## CI report:
   
   * 37ee9403c42852684b9de8116dac1dc45743f6ad Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9842)
 
   * f5a79145e3f3cc20392e5d755eeb85a572f551d4 UNKNOWN
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14141: [FLINK-20145][checkpointing] Fix priority event handling.

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14141:
URL: https://github.com/apache/flink/pull/14141#issuecomment-730461048


   
   ## CI report:
   
   * e67078cc99525b0cd2ed6fb23c1eac9063600191 UNKNOWN
   * 449b6e02f6b0b873a7fbed41185663e85e30105b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9880)
 
   * 214ce39805f2ea2968ac2b1c7ba983715a66cdbd Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9889)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14157: [FLINK-19969] CLI print run-application help msg

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14157:
URL: https://github.com/apache/flink/pull/14157#issuecomment-731272625


   
   ## CI report:
   
   * 14d6e11c4a97f05f743d24760654633ee04ff841 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9884)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14156:
URL: https://github.com/apache/flink/pull/14156#issuecomment-731272355


   
   ## CI report:
   
   * 3b54979f7f27138f764d39cac89229316b76afb7 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9883)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14141: [FLINK-20145][checkpointing] Fix priority event handling.

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14141:
URL: https://github.com/apache/flink/pull/14141#issuecomment-730461048


   
   ## CI report:
   
   * e67078cc99525b0cd2ed6fb23c1eac9063600191 UNKNOWN
   * 449b6e02f6b0b873a7fbed41185663e85e30105b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9880)
 
   * 214ce39805f2ea2968ac2b1c7ba983715a66cdbd UNKNOWN
   
   







[GitHub] [flink] AHeise commented on pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-20 Thread GitBox


AHeise commented on pull request #14140:
URL: https://github.com/apache/flink/pull/14140#issuecomment-731379700


   > Hi @AHeise, let me list the existing alternatives below:
   > 
   > 1. Migrate the dependent tests to use `StreamTaskMailboxTestHarness`.
   > 
   > 2. Introduce `TaskMailbox.isMailboxThreadBlocked` to let the testing 
thread query the status of the mailbox thread concurrently.
   > 
   > 3. Use `MailboxExecutor` to query whether all input has been processed.
   >a. Use 
`StreamTask.getInputOutputJointFuture(InputStatus.NOTHING_AVAILABLE).isDone()`
   >b. Use `MailboxProcessor.isDefaultActionUnavailable()`.
   > 
   > 
   > First, compared to `MailboxExecutor`, 
`TaskMailbox.isMailboxThreadBlocked` is intrusive and undermines the 
encapsulation the mailbox model is trying to provide. I think we have converged 
on avoiding this approach.
   > 
   > Second, the migration may require significant work since there are about 
53 dependent tests, as you counted. Personally, I think it would be nice if we 
could fix the unstable `StreamTaskTestHarness.waitForInputProcessing` with few 
changes before the migration. But it is totally up to you and/or other 
committers to decide whether that is worthwhile.
   > 
   > Third, I think 3(a) or something similar may be what you suggested in 
Jira. Together with the all-queues-empty while-loop, 3(a) and 3(b) should have 
the same effect. I notice that there are some optimizations in `UnionInputGate` 
which cause `UnionInputGate.getAvailableFuture().isDone()` to return `false` 
while there is cached data. If we dropped the all-queues-empty while-loop, 3(a) 
would fail due to those optimizations. I am inclined to prefer 
`MailboxProcessor.isDefaultActionUnavailable()`, since it is resistant to 
these optimizations.
   
   Thank you very much for your deep investigation. I asked Roman to assess the 
solution and the alternatives, as he is much more adept at threading issues 
than I am.
   
   In theory, I'd go with the first approach, but I understand that this is 
hardly feasible. So I like your current fix in most regards (details may or may 
not be improved). 
   
   One more idea: couldn't we also inject an `END_OF_INPUT` `StreamStatus` at 
the beginning of `allInputProcessed`? I was hoping that the thread would then 
eventually terminate itself and we could simply `join`.







[GitHub] [flink] flinkbot edited a comment on pull request #14155: [FLINK-19775][tests] Fix SystemProcessingTimeServiceTest.testImmediateShutdown

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14155:
URL: https://github.com/apache/flink/pull/14155#issuecomment-731224882


   
   ## CI report:
   
   * d567c706a74fbcf18f7d8721bb7f5cfe414592b6 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9882)
 
   
   







[GitHub] [flink] flinkbot edited a comment on pull request #14154: [FLINK-19878][table-planner-blink] Fix WatermarkAssigner shouldn't be after ChangelogNormalize

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14154:
URL: https://github.com/apache/flink/pull/14154#issuecomment-731223999


   
   ## CI report:
   
   * b476a71cd82914eca67cfa71abae513dad56ac6f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9881)
 
   
   







[jira] [Closed] (FLINK-20239) Confusing pages: "Hive Read & Write" and "Hive Streaming"

2020-11-20 Thread Seth Wiesman (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Wiesman closed FLINK-20239.

Resolution: Fixed

> Confusing pages: "Hive Read & Write" and "Hive Streaming"
> -
>
> Key: FLINK-20239
> URL: https://issues.apache.org/jira/browse/FLINK-20239
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Documentation
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Assignee: Seth Wiesman
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The two pages describe how to read from and write to Hive. It is not very 
> clear what the relation between the two pages is. Moreover, the {{Hive 
> Streaming}} page is far more comprehensive.
> Personally, I found the {{Hive Read & Write}} page unhelpful and bloated 
> with irrelevant sections, e.g. Formats and Limit pushdown, which often 
> contain a single sentence.





[jira] [Commented] (FLINK-20239) Confusing pages: "Hive Read & Write" and "Hive Streaming"

2020-11-20 Thread Seth Wiesman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236386#comment-17236386
 ] 

Seth Wiesman commented on FLINK-20239:
--

resolved in master: 5659c8faddca384ba725fe5eab5e09a62f0bf722

> Confusing pages: "Hive Read & Write" and "Hive Streaming"
> -
>
> Key: FLINK-20239
> URL: https://issues.apache.org/jira/browse/FLINK-20239
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive, Documentation
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Assignee: Seth Wiesman
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The two pages describe how to read from and write to Hive. It is not very 
> clear what the relation between the two pages is. Moreover, the {{Hive 
> Streaming}} page is far more comprehensive.
> Personally, I found the {{Hive Read & Write}} page unhelpful and bloated 
> with irrelevant sections, e.g. Formats and Limit pushdown, which often 
> contain a single sentence.





[GitHub] [flink] sjwiesman closed pull request #14145: [FLINK-20239][docs] Confusing pages: Hive Read & Write and Hive Strea…

2020-11-20 Thread GitBox


sjwiesman closed pull request #14145:
URL: https://github.com/apache/flink/pull/14145


   







[GitHub] [flink] sjwiesman commented on pull request #14145: [FLINK-20239][docs] Confusing pages: Hive Read & Write and Hive Strea…

2020-11-20 Thread GitBox


sjwiesman commented on pull request #14145:
URL: https://github.com/apache/flink/pull/14145#issuecomment-731359621


   Thanks for the review, merging . . . 







[jira] [Updated] (FLINK-20165) YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialize

2020-11-20 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-20165:
---
Fix Version/s: (was: 1.11.4)
   1.11.3

> YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during 
> initialization of boot layer java.lang.IllegalStateException: Module system 
> already initialized
> --
>
> Key: FLINK-20165
> URL: https://issues.apache.org/jira/browse/FLINK-20165
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.3
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=8560c56f-9ec1-5c40-4ff5-9d3e882d
> {code}
> 2020-11-15T22:42:03.3053212Z 22:42:03,303 [   Time-limited test] INFO  
> org.apache.flink.yarn.YARNSessionFIFOITCase  [] - Finished 
> testDetachedMode()
> 2020-11-15T22:42:37.9020133Z [ERROR] Tests run: 5, Failures: 2, Errors: 0, 
> Skipped: 2, Time elapsed: 67.485 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase
> 2020-11-15T22:42:37.9022015Z [ERROR] 
> testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase)  Time 
> elapsed: 12.841 s  <<< FAILURE!
> 2020-11-15T22:42:37.9023701Z java.lang.AssertionError: 
> 2020-11-15T22:42:37.9025649Z Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1605480087188_0002/container_1605480087188_0002_01_02/taskmanager.out
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> 2020-11-15T22:42:37.9026730Z [
> 2020-11-15T22:42:37.9027080Z Error occurred during initialization of boot 
> layer
> 2020-11-15T22:42:37.9027623Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033278Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033825Z ]
> 2020-11-15T22:42:37.9034291Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-15T22:42:37.9034971Z  at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:479)
> 2020-11-15T22:42:37.9035814Z  at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:83)
> {code}





[GitHub] [flink] rmetzger commented on pull request #14119: [FLINK-20165] Update test docker image & improve YARN logging

2020-11-20 Thread GitBox


rmetzger commented on pull request #14119:
URL: https://github.com/apache/flink/pull/14119#issuecomment-731352020


   Thanks for the review!







[jira] [Commented] (FLINK-20165) YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initiali

2020-11-20 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236380#comment-17236380
 ] 

Robert Metzger commented on FLINK-20165:


update & improve logging in release-1.11: 
https://github.com/apache/flink/commit/ffb14e41a4b251d7c8f3644faca8f3b9877b243e

same for master: 
https://github.com/apache/flink/commit/6e3870513ae49e7b960edf71779b6df6d227194c

> YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during 
> initialization of boot layer java.lang.IllegalStateException: Module system 
> already initialized
> --
>
> Key: FLINK-20165
> URL: https://issues.apache.org/jira/browse/FLINK-20165
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.11.4
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=8560c56f-9ec1-5c40-4ff5-9d3e882d
> {code}
> 2020-11-15T22:42:03.3053212Z 22:42:03,303 [   Time-limited test] INFO  
> org.apache.flink.yarn.YARNSessionFIFOITCase  [] - Finished 
> testDetachedMode()
> 2020-11-15T22:42:37.9020133Z [ERROR] Tests run: 5, Failures: 2, Errors: 0, 
> Skipped: 2, Time elapsed: 67.485 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase
> 2020-11-15T22:42:37.9022015Z [ERROR] 
> testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase)  Time 
> elapsed: 12.841 s  <<< FAILURE!
> 2020-11-15T22:42:37.9023701Z java.lang.AssertionError: 
> 2020-11-15T22:42:37.9025649Z Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1605480087188_0002/container_1605480087188_0002_01_02/taskmanager.out
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> 2020-11-15T22:42:37.9026730Z [
> 2020-11-15T22:42:37.9027080Z Error occurred during initialization of boot 
> layer
> 2020-11-15T22:42:37.9027623Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033278Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033825Z ]
> 2020-11-15T22:42:37.9034291Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-15T22:42:37.9034971Z  at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:479)
> 2020-11-15T22:42:37.9035814Z  at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:83)
> {code}





[jira] [Closed] (FLINK-20165) YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialized

2020-11-20 Thread Robert Metzger (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-20165.
--
Fix Version/s: 1.12.0
   Resolution: Fixed

> YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during 
> initialization of boot layer java.lang.IllegalStateException: Module system 
> already initialized
> --
>
> Key: FLINK-20165
> URL: https://issues.apache.org/jira/browse/FLINK-20165
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Dian Fu
>Assignee: Robert Metzger
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.0, 1.11.4
>
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=8560c56f-9ec1-5c40-4ff5-9d3e882d
> {code}
> 2020-11-15T22:42:03.3053212Z 22:42:03,303 [   Time-limited test] INFO  
> org.apache.flink.yarn.YARNSessionFIFOITCase  [] - Finished 
> testDetachedMode()
> 2020-11-15T22:42:37.9020133Z [ERROR] Tests run: 5, Failures: 2, Errors: 0, 
> Skipped: 2, Time elapsed: 67.485 s <<< FAILURE! - in 
> org.apache.flink.yarn.YARNSessionFIFOSecuredITCase
> 2020-11-15T22:42:37.9022015Z [ERROR] 
> testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase)  Time 
> elapsed: 12.841 s  <<< FAILURE!
> 2020-11-15T22:42:37.9023701Z java.lang.AssertionError: 
> 2020-11-15T22:42:37.9025649Z Found a file 
> /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1605480087188_0002/container_1605480087188_0002_01_02/taskmanager.out
>  with a prohibited string (one of [Exception, Started 
> SelectChannelConnector@0.0.0.0:8081]). Excerpts:
> 2020-11-15T22:42:37.9026730Z [
> 2020-11-15T22:42:37.9027080Z Error occurred during initialization of boot 
> layer
> 2020-11-15T22:42:37.9027623Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033278Z java.lang.IllegalStateException: Module system 
> already initialized
> 2020-11-15T22:42:37.9033825Z ]
> 2020-11-15T22:42:37.9034291Z  at org.junit.Assert.fail(Assert.java:88)
> 2020-11-15T22:42:37.9034971Z  at 
> org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:479)
> 2020-11-15T22:42:37.9035814Z  at 
> org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:83)
> {code}





[GitHub] [flink] rmetzger merged pull request #14119: [FLINK-20165] Update test docker image & improve YARN logging

2020-11-20 Thread GitBox


rmetzger merged pull request #14119:
URL: https://github.com/apache/flink/pull/14119


   







[GitHub] [flink] rmetzger merged pull request #14118: [FLINK-20165][1.11] Update test docker image & improve YARN logging

2020-11-20 Thread GitBox


rmetzger merged pull request #14118:
URL: https://github.com/apache/flink/pull/14118


   







[GitHub] [flink] rmetzger commented on pull request #14118: [FLINK-20165][1.11] Update test docker image & improve YARN logging

2020-11-20 Thread GitBox


rmetzger commented on pull request #14118:
URL: https://github.com/apache/flink/pull/14118#issuecomment-731351429


   Thank you for the review!







[GitHub] [flink] rmetzger merged pull request #14129: [hotfix] Reduce logging verbosity from the Checkpoint-related REST handlers

2020-11-20 Thread GitBox


rmetzger merged pull request #14129:
URL: https://github.com/apache/flink/pull/14129


   







[GitHub] [flink] flinkbot edited a comment on pull request #14141: [FLINK-20145][checkpointing] Fix priority event handling.

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14141:
URL: https://github.com/apache/flink/pull/14141#issuecomment-730461048


   
   ## CI report:
   
   * e67078cc99525b0cd2ed6fb23c1eac9063600191 UNKNOWN
   * 449b6e02f6b0b873a7fbed41185663e85e30105b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9880)
 
   
   







[GitHub] [flink] sjwiesman commented on a change in pull request #14145: [FLINK-20239][docs] Confusing pages: Hive Read & Write and Hive Strea…

2020-11-20 Thread GitBox


sjwiesman commented on a change in pull request #14145:
URL: https://github.com/apache/flink/pull/14145#discussion_r527904542



##
File path: docs/dev/table/hive/hive_read_write.md
##
@@ -22,119 +22,199 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-Using the `HiveCatalog` and Flink's connector to Hive, Flink can read and 
write from Hive data as an alternative to Hive's batch engine.
-Be sure to follow the instructions to include the correct [dependencies]({{ 
site.baseurl }}/dev/table/hive/#depedencies) in your application.
-And please also note that Hive connector only works with blink planner.
+Using the HiveCatalog, Apache Flink can be used for unified `BATCH` and STREAM 
processing of Apache 
+Hive Tables. This means Flink can be used as a more performant alternative to 
Hive’s batch engine,
+or to continuously read and write data into and out of Hive tables to power 
real-time data
+warehousing applications. 
+
+
+   IMPORTANT: Reading and writing to and from Apache Hive is only 
supported by the Blink table planner.
+
 
 * This will be replaced by the TOC
 {:toc}
 
-## Reading From Hive
+## Reading
 
-Assume Hive contains a single table in its `default` database, named people 
that contains several rows.
+Flink supports reading data from Hive in both `BATCH` and `STREAMING` modes. 
When run as a `BATCH`
+application, Flink will execute its query over the state of the table at the 
point in time when the
+query is executed. `STREAMING` reads will continuously monitor the table and 
incrementally fetch
+new data as it is made available. Flink will read tables as bounded by default.
+
+`STREAMING` reads support consuming both partitioned and non-partitioned 
tables. 
+For partitioned tables, Flink will monitor the generation of new partitions, 
and read
+them incrementally when available. For non-partitioned tables, Flink will 
monitor the generation
+of new files in the folder and read new files incrementally.
+
+| Key | Default | Type | Description |
+| --- | --- | --- | --- |
+| streaming-source.enable | false | Boolean | Enable streaming source or not. NOTES: Please make sure that each partition/file is written atomically, otherwise the reader may get incomplete data. |
+| streaming-source.monitor-interval | 1 m | Duration | Time interval for consecutively monitoring partition/file. |
+| streaming-source.consume-order | create-time | String | The consume order of the streaming source; supports create-time and partition-time. create-time compares the partition/file creation time; this is not the partition create time in the Hive metastore, but the folder/file modification time in the filesystem. partition-time compares the time represented by the partition name; if the partition folder somehow gets updated, e.g. a new file is added into the folder, it can affect how the data is consumed. For non-partitioned tables, this value should always be 'create-time'. |
+| streaming-source.consume-start-offset | 1970-00-00 | String | Start offset for streaming consuming. How to parse and compare offsets depends on the consume order. For create-time and partition-time, it should be a timestamp string (yyyy-[m]m-[d]d [hh:mm:ss]). For partition-time, the partition time extractor will be used to extract the time from the partition name. |
+
+[SQL Hints]({% link dev/table/sql/hints.md %}) can be used to apply 
configurations to a Hive table
+without changing its definition in the Hive metastore.
+
+{% highlight sql %}
+
+SELECT * 
+FROM hive_table 
+/*+ OPTIONS('streaming-source.enable'='true', 
'streaming-source.consume-start-offset'='2020-05-20') */;
 
-{% highlight bash %}
-hive> show databases;
-OK
-default
-Time taken: 0.841 seconds, Fetched: 1 row(s)
-
-hive> show tables;
-OK
-Time taken: 0.087 seconds
-
-hive> CREATE TABLE mytable(name string, value double);
-OK
-Time taken: 0.127 seconds
-
-hive> SELECT * FROM mytable;
-OK
-Tom   4.72
-John  8.0
-Tom   24.2
-Bob   3.14
-Bob   4.72
-Tom   34.9
-Mary  4.79
-Tiff  2.72
-Bill  4.33
-Mary  77.7
-Time taken: 0.097 seconds, Fetched: 10 row(s)
 {% endhighlight %}
 
-With the data ready your can connect to Hive [connect to an existing Hive 
installation]({{ site.baseurl }}/dev/table/hive/#connecting-to-hive) and begin 
querying. 
+**Notes**
 
-{% highlight bash %}
+- The monitor strategy is to scan all directories/files currently in the 
location path, so many partitions may cause performance degradation.
+- Streaming reads for non-partitioned tables require that each file be 
written atomically into the target directory.
+- Streaming reads for partitioned tables require that each partition be 
added atomically, as seen by the Hive metastore; otherwise, new data added to 
an existing partition will be consumed.
+- Streaming reads do not support watermark grammar in Flink DDL. These tab

[GitHub] [flink] sjwiesman commented on a change in pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


sjwiesman commented on a change in pull request #14156:
URL: https://github.com/apache/flink/pull/14156#discussion_r527895688



##
File path: docs/dev/table/connectors/index.md
##
@@ -95,20 +95,23 @@ Flink natively support various connectors. The following 
tables list all availab
 How to use connectors
 
 
-Flink supports to use SQL CREATE TABLE statement to register a table. One can 
define the table name, the table schema, and the table options for connecting 
to an external system.
+Flink supports to use SQL `CREATE TABLE` statement to register a table. One 
can define the table name,
+the table schema, and the table options for connecting to an external system.

Review comment:
   ```suggestion
   Flink supports using SQL `CREATE TABLE` statements to register tables. One 
can define the table name,
   the table schema, and the table options for connecting to an external system.
   ```

##
File path: docs/dev/table/sql/create.md
##
@@ -182,24 +187,165 @@ CREATE TABLE [catalog_name.][db_name.]table_name
 
 {% endhighlight %}
 
-Creates a table with the given name. If a table with the same name already 
exists in the catalog, an exception is thrown.
+The statement above creates a table with the given name. If a table with the 
same name already exists
+in the catalog, an exception is thrown.
+
+### Columns
 
-**COMPUTED COLUMN**
+**Physical / Regular Columns**
 
-A computed column is a virtual column that is generated using the syntax  
"`column_name AS computed_column_expression`". It is generated from a non-query 
expression that uses other columns in the same table and is not physically 
stored within the table. For example, a computed column could be defined as 
`cost AS price * quantity`. The expression may contain any combination of 
physical column, constant, function, or variable. The expression cannot contain 
a subquery.
+Physical columns are regular columns known from databases. They define the 
names, the types, and the
+order of fields in the physical data. Thus, physical columns represent the 
payload that is read from
+and written to an external system. Connectors and formats are using these 
columns (in the defined order)
+to configure themselves. Other kinds of columns can be declared between 
physical columns but will not
+influence the final physical schema.
 
-Computed columns are commonly used in Flink for defining [time attributes]({{ 
site.baseurl}}/dev/table/streaming/time_attributes.html) in CREATE TABLE 
statements.
-A [processing time attribute]({{ 
site.baseurl}}/dev/table/streaming/time_attributes.html#processing-time) can be 
defined easily via `proc AS PROCTIME()` using the system `PROCTIME()` function.
-On the other hand, computed column can be used to derive event time column 
because an event time column may need to be derived from existing fields, e.g. 
the original field is not `TIMESTAMP(3)` type or is nested in a JSON string.
+The following statement creates a table with only regular columns:
 
-Notes:
+{% highlight sql %}
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `name` STRING
+) WITH (
+  ...
+);
+{% endhighlight %}
 
-- A computed column defined on a source table is computed after reading from 
the source, it can be used in the following SELECT query statements.
-- A computed column cannot be the target of an INSERT statement. In INSERT 
statements, the schema of SELECT clause should match the schema of the target 
table without computed columns.
+**Metadata Columns**
+
+Metadata columns are an extension to the SQL standard and allow access to 
connector- and/or format-specific
fields for every row of a table. A metadata column is indicated by the 
`METADATA` keyword. For example,
+a metadata column can be used to read and write the timestamp from and to 
Kafka records for time-based
+operations. The [connector and format documentation]({% link 
dev/table/connectors/index.md %}) lists the
+available metadata fields for every component. However, declaring a metadata 
column in a table's schema
+is optional.
+
+The following statement creates a table with an additional metadata column 
that references the metadata field `timestamp`:
+
+{% highlight sql %}
+CREATE TABLE MyTable (
+  `user_id` BIGINT,
+  `name` STRING,
+  `record_time` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp'
-- reads and writes a Kafka record's timestamp
+) WITH (
+  'connector' = 'kafka'
+  ...
+);
+{% endhighlight %}
 
-**WATERMARK**
+Every metadata field is identified by a string-based key and has a documented 
data type. For example,
+the Kafka connector exposes a metadata field with key `timestamp` and data 
type `TIMESTAMP(3) WITH LOCAL TIME ZONE`
+that can be used for both reading and writing records.
 
-The `WATERMARK` defines the event time attributes of a table and takes the 
form `WATERMARK FOR rowtime_column_name  AS watermark_strategy_expression`.
+In the example above, the metadata column `record_time` become

[jira] [Commented] (FLINK-20267) JaasModule prevents Flink from starting if working directory is a symbolic link

2020-11-20 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236355#comment-17236355
 ] 

Arvid Heise commented on FLINK-20267:
-

There is a workaround for my setup, which is to disable Jaas with
{noformat}
security.module.factory.classes: 
"org.apache.flink.runtime.security.modules.HadoopModuleFactory";"org.apache.flink.runtime.security.modules.ZookeeperModuleFactory"
 {noformat}
But of course, if you need Jaas, it's not an option.

> JaasModule prevents Flink from starting if working directory is a symbolic 
> link
> ---
>
> Key: FLINK-20267
> URL: https://issues.apache.org/jira/browse/FLINK-20267
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.12.0
>Reporter: Till Rohrmann
>Priority: Blocker
> Fix For: 1.12.0
>
>
> [~AHeise] reported that starting Flink on EMR fails with
> {code}
> java.lang.RuntimeException: unable to generate a JAAS configuration file
> at 
> org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:170)
> at 
> org.apache.flink.runtime.security.modules.JaasModule.install(JaasModule.java:94)
> at 
> org.apache.flink.runtime.security.SecurityUtils.installModules(SecurityUtils.java:78)
> at 
> org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:59)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1045)
> Caused by: java.nio.file.FileAlreadyExistsException: /tmp
> at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at 
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:727)
> at 
> org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:162)
> ... 4 more
> {code}
> The problem is that on EMR {{/tmp}} is a symbolic link. Due to FLINK-19252 
> where we introduced the [creation of the working 
> directory|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/JaasModule.java#L162]
>  in order to create the default Jaas config file, the startup process fails 
> if the path for the working directory is not a directory (apparently 
> {{Files.createDirectories}} cannot deal with symbolic links).
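
A minimal sketch of the failure and a possible workaround (assuming the working 
directory path is a symlink, as with {{/tmp}} on EMR):
{code}
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkWorkingDirSketch {
    public static void main(String[] args) throws Exception {
        Path workingDir = Paths.get("/tmp"); // on EMR, /tmp is a symbolic link
        try {
            // createDirectories re-checks an existing path with NOFOLLOW_LINKS,
            // so an existing symlink-to-directory is reported as a conflict.
            Files.createDirectories(workingDir);
        } catch (FileAlreadyExistsException e) {
            // Possible fix: resolve symbolic links before creating directories.
            Files.createDirectories(workingDir.toRealPath());
        }
    }
}
{code}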





[GitHub] [flink] flinkbot edited a comment on pull request #14129: [hotfix] Reduce logging verbosity from the Checkpoint-related REST handlers

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14129:
URL: https://github.com/apache/flink/pull/14129#issuecomment-730245586


   
   ## CI report:
   
   * 6e4ce598d82761026e8da8af6b6236e45e760ac2 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9878)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] kezhuw commented on pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-20 Thread GitBox


kezhuw commented on pull request #14140:
URL: https://github.com/apache/flink/pull/14140#issuecomment-731332718


   Hi @AHeise, let me list the existing alternatives below:
   1. Migrate the dependent tests to use `StreamTaskMailboxTestHarness`.
   2. Introduce `TaskMailbox.isMailboxThreadBlocked` to let the testing thread 
query the status of the mailbox thread concurrently.
   3. Use `MailboxExecutor` to query whether all input has been processed.
  a. Use 
`StreamTask.getInputOutputJointFuture(InputStatus.NOTHING_AVAILABLE).isDone()`.
  b. Use `MailboxProcessor.isDefaultActionUnavailable()`.
   
   First, compared to `MailboxExecutor`, `TaskMailbox.isMailboxThreadBlocked` 
is intrusive and undermines the encapsulation that the mailbox model tries to 
provide. I think we have converged on avoiding this approach.
   
   Second, the migration may require a lot of hard work, since there are some 
53 dependent tests, as you counted. Personally, I think it would be nice if we 
could fix the unstable `StreamTaskTestHarness.waitForInputProcessing` with few 
changes before the migration. But it is totally up to you and/or other 
committers to decide whether that is worthwhile or not.
   
   Third, I think 3(a) or something similar may be what you suggested in the 
jira. Together with the all-queues-empty while-loop, 3(a) and 3(b) should have 
the same effect. I noticed that there are some optimizations in 
`UnionInputGate` which cause `UnionInputGate.getAvailableFuture().isDone()` to 
return `false` while there is cached data. If we drop the all-queues-empty 
while-loop, 3(a) will fail due to those optimizations. I am somewhat inclined 
towards `MailboxProcessor.isDefaultActionUnavailable()`, since it is resistant 
to these optimizations.
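   
   For illustration, a minimal self-contained sketch of the 3(b)-style waiting 
(the `AtomicBoolean` stands in for `MailboxProcessor.isDefaultActionUnavailable()`; 
all names here are illustrative, not the actual harness API):
   
```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitForInputProcessingSketch {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for MailboxProcessor.isDefaultActionUnavailable().
        AtomicBoolean defaultActionUnavailable = new AtomicBoolean(false);

        Thread mailboxThread = new Thread(() -> {
            // ... process all buffered test input here ...
            defaultActionUnavailable.set(true); // no more input is available
        });
        mailboxThread.start();

        // The testing thread polls the status flag instead of inspecting
        // Thread.getState(), which is unpredictable under concurrent class
        // loading.
        while (!defaultActionUnavailable.get()) {
            Thread.sleep(1);
        }
        System.out.println("all input processed");
    }
}
```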



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14085: [FLINK-19997] Implement an e2e test for sql-client with Confluent Registry Avro format

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14085:
URL: https://github.com/apache/flink/pull/14085#issuecomment-728305152


   
   ## CI report:
   
   * 6e64fefa42cb1e5f3fb47d209fc6cc2e902f2f08 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9876)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20267) JaasModule prevents Flink from starting if working directory is a symbolic link

2020-11-20 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-20267:
-

 Summary: JaasModule prevents Flink from starting if working 
directory is a symbolic link
 Key: FLINK-20267
 URL: https://issues.apache.org/jira/browse/FLINK-20267
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0


[~AHeise] reported that starting Flink on EMR fails with

{code}
java.lang.RuntimeException: unable to generate a JAAS configuration file
at 
org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:170)
at 
org.apache.flink.runtime.security.modules.JaasModule.install(JaasModule.java:94)
at 
org.apache.flink.runtime.security.SecurityUtils.installModules(SecurityUtils.java:78)
at 
org.apache.flink.runtime.security.SecurityUtils.install(SecurityUtils.java:59)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1045)
Caused by: java.nio.file.FileAlreadyExistsException: /tmp
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:727)
at 
org.apache.flink.runtime.security.modules.JaasModule.generateDefaultConfigFile(JaasModule.java:162)
... 4 more
{code}

The problem is that on EMR {{/tmp}} is a symbolic link. Due to FLINK-19252, 
where we introduced the [creation of the working 
directory|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/JaasModule.java#L162]
 in order to create the default Jaas config file, the startup process fails if 
the path for the working directory is not a directory (apparently 
{{Files.createDirectories}} cannot deal with symbolic links).
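
A minimal standalone reproduction of this behaviour, together with one possible 
workaround, might look as follows (a sketch only; the paths are illustrative, 
whereas on EMR the symbolic link is {{/tmp}} itself):

{code:java}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CreateDirectoriesSymlinkSketch {

    public static void main(String[] args) throws IOException {
        // Set up base/real-tmp and a symbolic link base/tmp -> base/real-tmp
        // (symlink creation assumes a POSIX file system).
        Path base = Files.createTempDirectory("base");
        Path realTmp = Files.createDirectory(base.resolve("real-tmp"));
        Path tmpLink = Files.createSymbolicLink(base.resolve("tmp"), realTmp);

        try {
            // Fails: "tmp" already exists, and since it is a symbolic link
            // rather than a directory (when checked without following links),
            // the FileAlreadyExistsException is rethrown.
            Files.createDirectories(tmpLink);
        } catch (FileAlreadyExistsException e) {
            System.out.println("createDirectories failed on: " + e.getFile());
        }

        // Possible workaround: resolve symbolic links first, then create the
        // directory hierarchy on the real path.
        Files.createDirectories(tmpLink.toRealPath());
    }
}
{code}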



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20115) Test Batch execution for the DataStream API 

2020-11-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236327#comment-17236327
 ] 

Till Rohrmann commented on FLINK-20115:
---

I continued testing and noticed that when running a Flink job using the new 
{{FileSource}} from the IDE (meaning the IDE spawns a JVM to run the job), the 
spawned JVM won't properly terminate. I am a bit clueless as to why this is the 
case, but somehow the new {{FileSource}} must be doing something which the JVM 
termination does not like. See FLINK-20266 for the respective ticket.

> Test Batch execution for the DataStream API 
> 
>
> Key: FLINK-20115
> URL: https://issues.apache.org/jira/browse/FLINK-20115
> Project: Flink
>  Issue Type: Sub-task
>  Components: API / DataStream
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Till Rohrmann
>Priority: Critical
> Fix For: 1.12.0
>
>
> Test the following new features:
>  - https://issues.apache.org/jira/browse/FLINK-19316
>  - https://issues.apache.org/jira/browse/FLINK-19268
>  - https://issues.apache.org/jira/browse/FLINK-19758 
> The three issues can really only be tested in combination. FLINK-19316 is 
> done but missing documentation.
> Write an example that uses a (new) FileSource, a (new) FileSink, and some 
> random transformations.
> Run the example in BATCH mode.
> How ergonomic is the API/configuration?
> Are there any weird log messages/exceptions in the JM/TM logs?
> Maybe try something that doesn't work in BATCH execution, such as 
> iterations/feedback edges.
> 
> 
> [General Information about the Flink 1.12 release 
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand?
> - Are the error messages, log messages, APIs etc. easy to understand?
> - Is the feature working as expected under normal conditions?
> - Is the feature working / failing as expected with invalid input, induced 
> errors etc.?
> If you find a problem during testing, please file a ticket 
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing, and once you are finished, please write a short summary 
> of all things you have tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20266) New FileSource prevents IntelliJ from stopping spawned JVM when running a job

2020-11-20 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-20266:
-

 Summary: New FileSource prevents IntelliJ from stopping spawned 
JVM when running a job
 Key: FLINK-20266
 URL: https://issues.apache.org/jira/browse/FLINK-20266
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem
Affects Versions: 1.12.0
Reporter: Till Rohrmann
 Fix For: 1.12.0


When trying out the new {{FileSource}}, I noticed that the jobs which I started 
from my IDE won't properly terminate. To be more precise, the spawned JVM for 
the jobs wouldn't properly terminate. I cannot really tell what the 
{{FileSource}} does differently, but when not using this source, the JVM 
terminates properly.

The stack trace of the hanging JVM is

{code}
2020-11-20 18:20:02
Full thread dump OpenJDK 64-Bit Server VM (11.0.2+9 mixed mode):

Threads class SMR info:
_java_thread_list=0x7fb5bc15f1b0, length=19, elements={
0x7fb60d807000, 0x7fb60d80c000, 0x7fb60d80f000, 0x7fb60d809000,
0x7fb60d81a000, 0x7fb61f00b000, 0x7fb63d80e000, 0x7fb61d826800,
0x7fb61d829800, 0x7fb61e80, 0x7fb63d95d800, 0x7fb63e2f8800,
0x7fb5ba37a800, 0x7fb5afe1a800, 0x7fb61dff6800, 0x7fb63da49800,
0x7fb63e8d0800, 0x7fb5be001000, 0x7fb5bb8a4000
}

"Reference Handler" #2 daemon prio=10 os_prio=31 cpu=10.05ms elapsed=86.35s 
tid=0x7fb60d807000 nid=0x4b03 waiting on condition  [0x736e9000]
   java.lang.Thread.State: RUNNABLE
at 
java.lang.ref.Reference.waitForReferencePendingList(java.base@11.0.2/Native 
Method)
at 
java.lang.ref.Reference.processPendingReferences(java.base@11.0.2/Reference.java:241)
at 
java.lang.ref.Reference$ReferenceHandler.run(java.base@11.0.2/Reference.java:213)

"Finalizer" #3 daemon prio=8 os_prio=31 cpu=0.90ms elapsed=86.35s 
tid=0x7fb60d80c000 nid=0x3803 in Object.wait()  [0x737ec000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.2/Native Method)
- waiting on <0x000600204780> (a java.lang.ref.ReferenceQueue$Lock)
at 
java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:155)
- waiting to re-lock in wait() <0x000600204780> (a 
java.lang.ref.ReferenceQueue$Lock)
at 
java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:176)
at 
java.lang.ref.Finalizer$FinalizerThread.run(java.base@11.0.2/Finalizer.java:170)

"Signal Dispatcher" #4 daemon prio=9 os_prio=31 cpu=0.31ms elapsed=86.34s 
tid=0x7fb60d80f000 nid=0x4203 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=31 cpu=2479.36ms elapsed=86.34s 
tid=0x7fb60d809000 nid=0x3f03 waiting on condition  [0x]
   java.lang.Thread.State: RUNNABLE
   No compile task

"C1 CompilerThread0" #8 daemon prio=9 os_prio=31 cpu=1412.88ms elapsed=86.34s 
tid=0x7fb60d81a000 nid=0x3d03 waiting on condition  [0x]
   java.lang.Thread.State: RUNNABLE
   No compile task

"Sweeper thread" #9 daemon prio=9 os_prio=31 cpu=42.82ms elapsed=86.34s 
tid=0x7fb61f00b000 nid=0xa803 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"Common-Cleaner" #10 daemon prio=8 os_prio=31 cpu=3.25ms elapsed=86.29s 
tid=0x7fb63d80e000 nid=0x5703 in Object.wait()  [0x73cfb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@11.0.2/Native Method)
- waiting on <0x000600205aa0> (a java.lang.ref.ReferenceQueue$Lock)
at 
java.lang.ref.ReferenceQueue.remove(java.base@11.0.2/ReferenceQueue.java:155)
- waiting to re-lock in wait() <0x000600205aa0> (a 
java.lang.ref.ReferenceQueue$Lock)
at 
jdk.internal.ref.CleanerImpl.run(java.base@11.0.2/CleanerImpl.java:148)
at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
at 
jdk.internal.misc.InnocuousThread.run(java.base@11.0.2/InnocuousThread.java:134)

"JDWP Transport Listener: dt_socket" #11 daemon prio=10 os_prio=31 cpu=43.46ms 
elapsed=86.27s tid=0x7fb61d826800 nid=0xa603 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"JDWP Event Helper Thread" #12 daemon prio=10 os_prio=31 cpu=220.06ms 
elapsed=86.27s tid=0x7fb61d829800 nid=0x5e03 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"JDWP Command Reader" #13 daemon prio=10 os_prio=31 cpu=27.26ms elapsed=86.27s 
tid=0x7fb61e80 nid=0x6103 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"Service Thread" #14 daemon prio=9 os_prio=31 cpu=0.06ms elapsed=86.19s 
tid=0x7fb63d95d800 nid=0xa203 runnable  [0x]
   java.lang.Thread.State: RUNNABLE

"ForkJoinPool.commonPool-worker-19" #25 daemon prio=1 os_prio=31 cpu=2.00ms 
elapsed=84.76s tid=0x7fb

[GitHub] [flink] flinkbot edited a comment on pull request #14153: [FLINK-19585][tests] Waiting for job to run before savepointing in UnalignedCheckpointCompatibilityITCase. [1.11]

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14153:
URL: https://github.com/apache/flink/pull/14153#issuecomment-731191113


   
   ## CI report:
   
   * c6476a7c07039dde8ab76dffd5c42e05f565e635 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9879)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete invocation context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the functions 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states [A, B].
- The function receives the request, but since it requires states [A, B, C, D], 
it responds with an {{IncompleteInvocationContext}} response that indicates 
that state values for [C, D] are missing.
- StateFun receives this response, and registers new Flink state handles for 
[C, D].
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for [A, B, C, D], is resent to the 
function.

This JIRA only targets updating the Protobuf messages {{ToFunction}} and 
{{FromFunction}} to fulfill the extended protocol, and supporting the handling 
of {{IncompleteInvocationContext}} responses in the request dispatcher.

Updating the SDKs should be done in separate subtask JIRAs.

  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states [A, B].
- Function receives request, but since it requires states [A, B, C, D], it 
responds with a {{IncompleteInvocationContext}} response that indicates state 
values for [C, D] is missing.
- StateFun receives this response, and registers new Flink state handles for 
[C, D].
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for [A, B, C, D] is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete 
> invocation context
> ---
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the functions 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states [A, B].
> - The function receives the request, but since it requires states [A, B, C, 
> D], it responds with an {{IncompleteInvocationContext}} response that 
> indicates that state values for [C, D] are missing.
> - StateFun receives this response, and registers new Flink state handles for 
> [C, D].
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for [A, B, C, D], is resent 
> to the function.
> This JIRA only targets updating the Protobuf messages {{ToFunction}} and 
> {{FromFunction}} to fulfill the extended protocol, and supporting the 
> handling of {{IncompleteInvocationContext}} responses in the request 
> dispatcher.
> Updating the SDKs should be done in separate subtask JIRAs.
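
For illustration, a self-contained toy model of the round trip sketched above 
(all class and method names here are illustrative; the real protocol is carried 
by the {{ToFunction}} and {{FromFunction}} Protobuf messages):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IncompleteContextSketch {

    // States the function requires; anything not provided is reported missing.
    static final Set<String> REQUIRED =
            new HashSet<>(Arrays.asList("A", "B", "C", "D"));

    /** Returns the missing states; an empty set means the invocation succeeded. */
    static Set<String> invoke(Set<String> providedStates) {
        Set<String> missing = new HashSet<>(REQUIRED);
        missing.removeAll(providedStates);
        return missing; // non-empty == IncompleteInvocationContext response
    }

    public static void main(String[] args) {
        Set<String> registered = new HashSet<>(Arrays.asList("A", "B"));

        // 1) The first dispatch with states [A, B] reports [C, D] as missing.
        Set<String> missing = invoke(registered);

        // 2) The runtime registers new state handles for the missing states...
        registered.addAll(missing);

        // 3) ...and resends the "patched" request, which now succeeds.
        System.out.println("missing after retry: " + invoke(registered)); // []
    }
}
{code}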



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20264) Zero-downtime / dynamic function upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20264:

Description: 
Currently, due to how functions can be executed as stateless deployments 
separate to the StateFun runtime, they can be easily upgraded with 
zero-downtime.

However, up to now there are still some restrictions to what can be done 
without restarting StateFun processes:

* Users can not upgrade existing functions to declare new persisted state
* Users can not add new addressable (can be routed messages to it by the 
StateFun runtime) functions to an existing StateFun application

The end goal of this epic is to enable the above operations for function 
deployments, without the need to restart the StateFun runtime. Further details 
can be found in subtasks of this JIRA.

  was:
Currently, due to how functions can be executed as stateless deployments 
separate to the StateFun runtime, they can be easily upgraded with 
zero-downtime.

However, up to now there are still some restrictions to what can be done 
without restarting StateFun processes:

* Users can not upgrade existing functions to declare new persisted state
* Users can not add new functions to an existing StateFun application, and have 
messages routed to it

The end goal of this epic is to enable the above operations for function 
deployments, without the need to restart the StateFun runtime. Further details 
can be found in subtasks of this JIRA.


> Zero-downtime / dynamic function upgrades in Stateful Functions
> ---
>
> Key: FLINK-20264
> URL: https://issues.apache.org/jira/browse/FLINK-20264
> Project: Flink
>  Issue Type: New Feature
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, since functions can be executed as stateless deployments separate 
> from the StateFun runtime, they can be easily upgraded with zero downtime.
> However, up to now there are still some restrictions on what can be done 
> without restarting StateFun processes:
> * Users cannot upgrade existing functions to declare new persisted state
> * Users cannot add new addressable functions (functions to which the StateFun 
> runtime can route messages) to an existing StateFun application
> The end goal of this epic is to enable the above operations for function 
> deployments, without the need to restart the StateFun runtime. Further 
> details can be found in the subtasks of this JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states [A, B].
- Function receives request, but since it requires states [A, B, C, D], it 
responds with a {{IncompleteInvocationContext}} response that indicates state 
values for [C, D] is missing.
- StateFun receives this response, and registers new Flink state handles for 
[C, D].
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for [A, B, C, D] is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states [A, B].
- Function receives request, but since it requires [A, B, C, D], it responds 
with a {{IncompleteInvocationContext}} response that indicates state values for 
[C, D] is missing.
- StateFun receives this response, and registers new Flink state handles for 
[C, D].
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for [A, B, C, D] is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states [A, B].
> - Function receives request, but since it requires states [A, B, C, D], it 
> responds with a {{IncompleteInvocationContext}} response that indicates state 
> values for [C, D] is missing.
> - StateFun receives this response, and registers new Flink state handles for 
> [C, D].
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for [A, B, C, D] is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete invocation context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Summary: Extend invocation protocol to allow functions to indicate 
incomplete invocation context  (was: Extend invocation protocol to allow 
functions to indicate incomplete state context)

> Extend invocation protocol to allow functions to indicate incomplete 
> invocation context
> ---
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states [A, B].
> - Function receives request, but since it requires states [A, B, C, D], it 
> responds with a {{IncompleteInvocationContext}} response that indicates state 
> values for [C, D] is missing.
> - StateFun receives this response, and registers new Flink state handles for 
> [C, D].
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for [A, B, C, D] is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states [A, B].
- Function receives request, but since it requires [A, B, C, D], it responds 
with a {{IncompleteInvocationContext}} response that indicates state values for 
[C, D] is missing.
- StateFun receives this response, and registers new Flink state handles for 
[C, D].
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for [A, B, C, D] is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
- Function receives request, but since it requires {A, B, C, D} i
- StateFun receives this response, and registers new Flink state handles for 
{C, D}.
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states [A, B].
> - Function receives request, but since it requires [A, B, C, D], it responds 
> with a {{IncompleteInvocationContext}} response that indicates state values 
> for [C, D] is missing.
> - StateFun receives this response, and registers new Flink state handles for 
> [C, D].
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for [A, B, C, D] is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
- Function receives request, but since it requires {A, B, C, D} i
- StateFun receives this response, and registers new Flink state handles for 
{C, D}.
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
- Function receives request, but since it requires {A, B, C, D} it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
- StateFun receives this response, and registers new Flink state handles for 
{C, D}.
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states {A, B}.
> - Function receives request, but since it requires {A, B, C, D} i
> - StateFun receives this response, and registers new Flink state handles for 
> {C, D}.
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for {A, B, C, D} is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
# Function receives request, but since it requires {A, B, C, D} it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
# StateFun receives this response, and registers new Flink state handles for 
{C, D}.
# Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:

# StateFun dispatches an invocation request, with states {A, B}.
# Function receives request, but since it requires {A, B, C, D}, it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
# StateFun receives this response, and registers new Flink state handles for 
{C, D}.
# Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states {A, B}.
> # Function receives request, but since it requires {A, B, C, D} it responds 
> with a IncompleteInvocationContext response indicating that state values for 
> C, D is missing.
> # StateFun receives this response, and registers new Flink state handles for 
> {C, D}.
> # Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for {A, B, C, D} is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:

* StateFun dispatches an invocation request, with states {A, B}.
* Function receives request, but since it requires {A, B, C, D}, it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
* StateFun receives this response, and registers new Flink state handles for 
{C, D}.
* Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.

- Function receives request, but since it requires {A, B, C, D}, it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing

- StateFun receives this response, and registers new Flink state handles for 
{C, D}.

- Then, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.




> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> * StateFun dispatches an invocation request, with states {A, B}.
> * Function receives request, but since it requires {A, B, C, D}, it responds 
> with a IncompleteInvocationContext response indicating that state values for 
> C, D is missing.
> * StateFun receives this response, and registers new Flink state handles for 
> {C, D}.
> * Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for {A, B, C, D} is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:

# StateFun dispatches an invocation request, with states {A, B}.
# Function receives request, but since it requires {A, B, C, D}, it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
# StateFun receives this response, and registers new Flink state handles for 
{C, D}.
# Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:

* StateFun dispatches an invocation request, with states {A, B}.
* Function receives request, but since it requires {A, B, C, D}, it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
* StateFun receives this response, and registers new Flink state handles for 
{C, D}.
* Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> # StateFun dispatches an invocation request, with states {A, B}.
> # Function receives request, but since it requires {A, B, C, D}, it responds 
> with a IncompleteInvocationContext response indicating that state values for 
> C, D is missing.
> # StateFun receives this response, and registers new Flink state handles for 
> {C, D}.
> # Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for {A, B, C, D} is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20265:

Description: 
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
- Function receives request, but since it requires {A, B, C, D} it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
- StateFun receives this response, and registers new Flink state handles for 
{C, D}.
- Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.


  was:
Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the function 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.
# Function receives request, but since it requires {A, B, C, D} it responds 
with a IncompleteInvocationContext response indicating that state values for C, 
D is missing.
# StateFun receives this response, and registers new Flink state handles for 
{C, D}.
# Finally, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D} is resent to the 
function.



> Extend invocation protocol to allow functions to indicate incomplete state 
> context
> --
>
> Key: FLINK-20265
> URL: https://issues.apache.org/jira/browse/FLINK-20265
> Project: Flink
>  Issue Type: Sub-task
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, users declare the states a function will access with a module YAML 
> definition file. The modules are loaded once when starting a StateFun 
> cluster, meaning that the state specifications remain static throughout the 
> cluster's execution lifetime.
> We propose that state specifications should be declared by the function 
> themselves via the language SDKs, instead of being declared in the module 
> YAMLs.
> The state specifications, now living in the functions, can be made 
> discoverable by the StateFun runtime through the invocation request-reply 
> protocol.
> Brief simplified sketch of the extended protocol:
> - StateFun dispatches an invocation request, with states {A, B}.
> - Function receives request, but since it requires {A, B, C, D} it responds 
> with a IncompleteInvocationContext response indicating that state values for 
> C, D is missing.
> - StateFun receives this response, and registers new Flink state handles for 
> {C, D}.
> - Finally, a new invocation request with the same input messages, but 
> "patched" with new states to contain all values for {A, B, C, D} is resent to 
> the function.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20265) Extend invocation protocol to allow functions to indicate incomplete state context

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-20265:
---

 Summary: Extend invocation protocol to allow functions to indicate 
incomplete state context
 Key: FLINK-20265
 URL: https://issues.apache.org/jira/browse/FLINK-20265
 Project: Flink
  Issue Type: Sub-task
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai
 Fix For: statefun-2.3.0


Currently, users declare the states a function will access with a module YAML 
definition file. The modules are loaded once when starting a StateFun cluster, 
meaning that the state specifications remain static throughout the cluster's 
execution lifetime.

We propose that state specifications should be declared by the functions 
themselves via the language SDKs, instead of being declared in the module YAMLs.

The state specifications, now living in the functions, can be made discoverable 
by the StateFun runtime through the invocation request-reply protocol.

Brief simplified sketch of the extended protocol:
- StateFun dispatches an invocation request, with states {A, B}.

- The function receives the request, but since it requires {A, B, C, D}, it 
responds with an IncompleteInvocationContext response indicating that state 
values for C, D are missing.

- StateFun receives this response, and registers new Flink state handles for 
{C, D}.

- Then, a new invocation request with the same input messages, but "patched" 
with new states to contain all values for {A, B, C, D}, is resent to the 
function.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20256) UDAF type inference will fail if accumulator contains MapView with Pojo value type

2020-11-20 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236316#comment-17236316
 ] 

Timo Walther commented on FLINK-20256:
--

[~TsReaper] the reason is that {{MyPojo}} is not a valid structured type: it 
has neither a default constructor nor a fully assigning constructor. However, 
this issue is still valid, because the exception is swallowed due to the 
special {{MapView}} path in the logic.
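
For reference, a version of {{MyPojo}} that would qualify as a valid structured 
type could look as follows (a sketch; either constructor alone is sufficient):

{code:java}
public static class MyPojo implements Serializable {
	public String a;
	public int b;

	// a default constructor makes the class a valid structured type ...
	public MyPojo() {}

	// ... as does a constructor that fully assigns all fields
	public MyPojo(String a, int b) {
		this.a = a;
		this.b = b;
	}
}
{code}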

> UDAF type inference will fail if accumulator contains MapView with Pojo value 
> type
> --
>
> Key: FLINK-20256
> URL: https://issues.apache.org/jira/browse/FLINK-20256
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Caizhi Weng
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.12.0
>
>
> To reproduce this bug, add the following test to {{FunctionITCase.java}}.
> {code:java}
> public static class MyPojo implements Serializable {
>   public String a;
>   public int b;
>   public MyPojo(String s) {
>   this.a = s;
>   this.b = s.length();
>   }
> }
> public static class MyAcc implements Serializable {
>   public MapView<String, MyPojo> view = new MapView<>();
>   public MyAcc() {}
>   public void add(String a, String b) {
>   try {
>   view.put(a, new MyPojo(b));
>   } catch (Exception e) {
>   throw new RuntimeException(e);
>   }
>   }
> }
> public static class TestUDAF extends AggregateFunction<String, MyAcc> {
>   @Override
>   public MyAcc createAccumulator() {
>   return new MyAcc();
>   }
>   public void accumulate(MyAcc acc, String value) {
>   if (value != null) {
>   acc.add(value, value);
>   }
>   }
>   @Override
>   public String getValue(MyAcc acc) {
>   return "test";
>   }
> }
> @Test
> public void myTest() throws Exception {
>   String ddl = "create function MyACC as '" + TestUDAF.class.getName() + 
> "'";
>   tEnv().executeSql(ddl).await();
>   try (CloseableIterator<Row> it = tEnv().executeSql("SELECT 
> MyACC('123')").collect()) {
>   while (it.hasNext()) {
>   System.out.println(it.next());
>   }
>   }
> }
> {code}
> And we'll get the following exception stack
> {code}
> java.lang.ClassCastException: org.apache.flink.table.types.AtomicDataType 
> cannot be cast to org.apache.flink.table.types.KeyValueDataType
>   at 
> org.apache.flink.table.planner.typeutils.DataViewUtils$MapViewSpec.getKeyDataType(DataViewUtils.java:257)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.org$apache$flink$table$planner$codegen$agg$AggsHandlerCodeGenerator$$addReusableDataViewSerializer(AggsHandlerCodeGenerator.scala:1294)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1228)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1211)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.addReusableStateDataViews(AggsHandlerCodeGenerator.scala:1211)
>   at 
> org.apache.flink.table.planner.codegen.agg.ImperativeAggCodeGen.<init>(ImperativeAggCodeGen.scala:112)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:233)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:214)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at

[jira] [Comment Edited] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment

2020-11-20 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236289#comment-17236289
 ] 

Ethan Li edited comment on FLINK-17641 at 11/20/20, 5:06 PM:
-

Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue (or at least part of it) internally 
and I'd like to share it once we have it working in production. Thanks!


was (Author: ethanli):
Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue internally and I'd like to share it 
once we have it working in production. Thanks!

> How to secure flink applications on yarn on multi-tenant environment
> 
>
> Key: FLINK-17641
> URL: https://issues.apache.org/jira/browse/FLINK-17641
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / YARN
>Reporter: Ethan Li
>Priority: Major
>
> This is a question I wish to get some insights on. 
> We are trying to support and secure flink on shared yarn cluster. Besides the 
> security provided by yarn side (queueACL, kerberos), what I noticed is that 
> flink CLI can still interact with the flink job as long as it knows the 
> jobmanager rpc port/hostname and rest.port, which can be obtained easily with 
> yarn command. 
> Also on the UI side, on yarn cluster, users can visit flink job UI via yarn 
> proxy using browser. As long as the user can authenticate and view yarn 
> resourcemanager webpage, he/she can visit the flink UI without any problem. 
> This basically means Flink UI is wide-open to corp internal users.
> On the internal connection side, I am aware of the support added in 1.10 to 
> limit the mTLS connection by configuring 
> security.ssl.internal.cert.fingerprint 
> (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html)
> This works but it is not very flexible. Users need to update the config if 
> the cert changes before they submit a new job.
> I asked the similar question on the mailing list before. I am really 
> interested in how other folks deal with this issue. Thanks.
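
As context for the pinning option quoted above, a sketch of the relevant 
{{flink-conf.yaml}} entries (placeholder values; keystore passwords and the 
remaining SSL options are omitted — see the linked docs):

{code}
security.ssl.internal.enabled: true
security.ssl.internal.keystore: /path/to/internal.keystore
security.ssl.internal.truststore: /path/to/internal.truststore
# Pin the expected certificate; this must be updated whenever the cert
# rotates, which is the inflexibility described in the issue.
security.ssl.internal.cert.fingerprint: <fingerprint of the internal certificate>
{code}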





[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236292#comment-17236292
 ] 

Till Rohrmann commented on FLINK-20220:
---

Please also use the latest master if possible [~rmetzger].

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data using accumulators to my client.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.acto

[jira] [Commented] (FLINK-17641) How to secure flink applications on yarn on multi-tenant environment

2020-11-20 Thread Ethan Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236289#comment-17236289
 ] 

Ethan Li commented on FLINK-17641:
--

Thanks very much for your reply [~rmetzger]. The suggestions are helpful.

Sorry I haven't been able to come back to this Jira and reply.

We currently have a solution for this issue internally and I'd like to share it 
once we have it working in production. Thanks!

> How to secure flink applications on yarn on multi-tenant environment
> 
>
> Key: FLINK-17641
> URL: https://issues.apache.org/jira/browse/FLINK-17641
> Project: Flink
>  Issue Type: Wish
>  Components: Deployment / YARN
>Reporter: Ethan Li
>Priority: Major
>
> This is a question I wish to get some insights on. 
> We are trying to support and secure flink on shared yarn cluster. Besides the 
> security provided by yarn side (queueACL, kerberos), what I noticed is that 
> flink CLI can still interact with the flink job as long as it knows the 
> jobmanager rpc port/hostname and rest.port, which can be obtained easily with 
> yarn command. 
> Also on the UI side, on yarn cluster, users can visit flink job UI via yarn 
> proxy using browser. As long as the user can authenticate and view yarn 
> resourcemanager webpage, he/she can visit the flink UI without any problem. 
> This basically means Flink UI is wide-open to corp internal users.
> On the internal connection side, I am aware of the support added in 1.10 to 
> limit the mTLS connection by configuring 
> security.ssl.internal.cert.fingerprint 
> (https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html)
> This works but it is not very flexible. Users need to update the config if 
> the cert changes before they submit a new job.
> I asked the similar question on the mailing list before. I am really 
> interested in how other folks deal with this issue. Thanks.





[GitHub] [flink] tillrohrmann commented on a change in pull request #14114: [FLINK-20153] Add documentation for BATCH execution mode

2020-11-20 Thread GitBox


tillrohrmann commented on a change in pull request #14114:
URL: https://github.com/apache/flink/pull/14114#discussion_r527824264



##
File path: docs/dev/datastream_execution_mode.md
##
@@ -0,0 +1,241 @@
+---
+title: "Execution Mode (Batch/Streaming)"
+nav-id: datastream_execution_mode
+nav-parent_id: streaming
+nav-pos: 1
+---
+
+
+The DataStream API supports different runtime execution modes from which you
+can choose depending on the requirements of your use case and the
+characteristics of your job.
+
+There is the "classic" execution behavior of the DataStream API, which we call
+`STREAMING` execution mode. This should be used for unbounded jobs that require
+continuous incremental processing and are expected to stay online indefinitely.
+
+Additionally, there is a batch-style execution mode that we call `BATCH`
+execution mode. This executes jobs in a way that is more reminiscent of batch
+processing frameworks such as MapReduce. This should be used for bounded jobs
+for which you have a known fixed input and which do not run continuously.
+
+We have these different execution modes because `BATCH` execution allows some
+additional optimizations that we can only do when we know that our input is
+bounded. For example, different join/aggregation strategies can be used, in
+addition to a different shuffle implementation that allows more efficient
+failure recovery behavior. We will go into some of the details of the execution
+behavior below.
+
+* This will be replaced by the TOC
+{:toc}
+
+## When can/should I use BATCH execution mode?
+
+The BATCH execution mode can only be used for Jobs/Flink Programs that are
+_bounded_. Boundedness is a property of a data source that tells us whether all
+the input coming from that source is known before execution or whether new data
+will show up, potentially indefinitely. A job, in turn, is bounded if all its
+sources are bounded, and unbounded otherwise.
+
+STREAMING execution mode, on the other hand, can be used for both bounded and
+unbounded jobs.
+
+As a rule of thumb, you should be using BATCH execution mode when your program
+is bounded because this will be more efficient. You have to use STREAMING
+execution mode when your program is unbounded because only this mode is general
+enough to be able to deal with continuous data streams.
+
+One obvious outlier case is when you want to use a bounded job to bootstrap
+some job state that you then want to use in an unbounded job. For example, by
+running a bounded job using STREAMING mode, taking a savepoint, and then
+restoring that savepoint on an unbounded job. This is a very specific use case
+and one that might soon become obsolete when we allow producing a savepoint as
+additional output of a BATCH execution job.
+
+Another case where you might run a bounded job using STREAMING mode is when
+writing tests for code that will eventually run with unbounded sources. For
+testing it can be more natural to use a bounded source in those cases.
+
+## Configuring BATCH execution mode
+
+The execution mode can be configured via the `execution.runtime-mode` setting.
+There are three possible values:
+
+ - `STREAMING`: The classic DataStream execution mode
+ - `BATCH`: Batch-style execution on the DataStream API
+ - `AUTOMATIC`: Let the system decide based on the boundedness of the sources
+
+ This can be configured either in `flink-conf.yaml`, via command line
+ parameters of `bin/flink run ...`, or programmatically when
+ creating/configuring the `StreamExecutionEnvironment`.
+
+ Here's how you can configure the execution mode via the command line:
+
+ ```bash
+ $ bin/flink run -Dexecution.runtime-mode=BATCH 
examples/streaming/WordCount.jar
+ ```
+
+ This example shows how you can configure the execution mode in code:
+
+ ```java
+Configuration config = new Configuration();
+config.set(ExecutionOptions.RUNTIME_MODE, RuntimeExecutionMode.BATCH);
+StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(config);
+ ```
+
+## Execution Behavior
+
+This section provides an overview of the execution behavior of BATCH execution
+mode and contrasts it with STREAMING execution mode. For more details, please
+refer to the FLIPs that introduced this feature:
+[FLIP-134](https://cwiki.apache.org/confluence/x/4i94CQ) and

Review comment:
   Out of curiosity, how did you get the shortened URLs? Moreover, are they 
stable across renames of the FLIP?

##
File path: docs/dev/datastream_execution_mode.md
##
@@ -0,0 +1,241 @@
+---
+title: "Execution Mode (Batch/Streaming)"
+nav-id: datastream_execution_mode
+nav-parent_id: streaming
+nav-pos: 1
+---
+
+
+The DataStream API supports different runtime execution modes from which you
+can choose depending on the requirements of your use case and the
+characteristics of your job.
+
+There is the "classic" execution behavior of the DataStream API, which we call
+`STREAMING` execution mode. This should be used for unbounded j

[GitHub] [flink] tillrohrmann commented on a change in pull request #14114: [FLINK-20153] Add documentation for BATCH execution mode

2020-11-20 Thread GitBox


tillrohrmann commented on a change in pull request #14114:
URL: https://github.com/apache/flink/pull/14114#discussion_r527823106



##
File path: docs/dev/datastream_execution_mode.md
##
@@ -0,0 +1,241 @@
+---
+title: "Execution Mode (Batch/Streaming)"
+nav-id: datastream_execution_mode
+nav-parent_id: streaming
+nav-pos: 1
+---
+
+
+The DataStream API supports different runtime execution modes from which you
+can choose depending on the requirements of your use case and the
+characteristics of your job.
+
+There is the "classic" execution behavior of the DataStream API, which we call
+`STREAMING` execution mode. This should be used for unbounded jobs that require
+continuous incremental processing and are expected to stay online indefinitely.
+
+Additionally, there is a batch-style execution mode that we call `BATCH`
+execution mode. This executes jobs in a way that is more reminiscent of batch
+processing frameworks such as MapReduce. This should be used for bounded jobs
+for which you have a known fixed input and which do not run continuously.
+
+We have these different execution modes because `BATCH` execution allows some
+additional optimizations that we can only do when we know that our input is
+bounded. For example, different join/aggregation strategies can be used, in
+addition to a different shuffle implementation that allows more efficient
+failure recovery behavior. We will go into some of the details of the execution
+behavior below.
+
+* This will be replaced by the TOC
+{:toc}
+
+## When can/should I use BATCH execution mode?
+
+The BATCH execution mode can only be used for Jobs/Flink Programs that are
+_bounded_. Boundedness is a property of a data source that tells us whether all
+the input coming from that source is known before execution or whether new data
+will show up, potentially indefinitely. A job, in turn, is bounded if all its
+sources are bounded, and unbounded otherwise.
+
+STREAMING execution mode, on the other hand, can be used for both bounded and
+unbounded jobs.
+
+As a rule of thumb, you should be using BATCH execution mode when your program
+is bounded because this will be more efficient. You have to use STREAMING
+execution mode when your program is unbounded because only this mode is general
+enough to be able to deal with continuous data streams.
+
+One obvious outlier case is when you want to use a bounded job to bootstrap
+some job state that you then want to use in an unbounded job. For example, by
+running a bounded job using STREAMING mode, taking a savepoint, and then
+restoring that savepoint on an unbounded job. This is a very specific use case
+and one that might soon become obsolete when we allow producing a savepoint as
+additional output of a BATCH execution job.
+
+Another case where you might run a bounded job using STREAMING mode is when
+writing tests for code that will eventually run with unbounded sources. For
+testing it can be more natural to use a bounded source in those cases.
+
+## Configuring BATCH execution mode
+
+The execution mode can be configured via the `execution.runtime-mode` setting.
+There are three possible values:
+
+ - `STREAMING`: The classic DataStream execution mode
+ - `BATCH`: Batch-style execution on the DataStream API
+ - `AUTOMATIC`: Let the system decide based on the boundedness of the sources
+
+ This can be configured either in `flink-conf.yaml`, via command line

Review comment:
   If we don't want to encourage people to put the execution mode into the 
`flink-conf.yaml`, then we shouldn't mention it.









[jira] [Commented] (FLINK-20256) UDAF type inference will fail if accumulator contains MapView with Pojo value type

2020-11-20 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236287#comment-17236287
 ] 

Timo Walther commented on FLINK-20256:
--

[~rmetzger] I will investigate that now and come back shortly. But map views 
are more of an internal feature anyway. We recently updated the docs to mention 
them in one sentence but without examples.

> UDAF type inference will fail if accumulator contains MapView with Pojo value 
> type
> --
>
> Key: FLINK-20256
> URL: https://issues.apache.org/jira/browse/FLINK-20256
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Caizhi Weng
>Assignee: Timo Walther
>Priority: Blocker
> Fix For: 1.12.0
>
>
> To reproduce this bug, add the following test to {{FunctionITCase.java}}.
> {code:java}
> public static class MyPojo implements Serializable {
>   public String a;
>   public int b;
>   public MyPojo(String s) {
>   this.a = s;
>   this.b = s.length();
>   }
> }
> public static class MyAcc implements Serializable {
>   public MapView<String, MyPojo> view = new MapView<>();
>   public MyAcc() {}
>   public void add(String a, String b) {
>   try {
>   view.put(a, new MyPojo(b));
>   } catch (Exception e) {
>   throw new RuntimeException(e);
>   }
>   }
> }
> public static class TestUDAF extends AggregateFunction<String, MyAcc> {
>   @Override
>   public MyAcc createAccumulator() {
>   return new MyAcc();
>   }
>   public void accumulate(MyAcc acc, String value) {
>   if (value != null) {
>   acc.add(value, value);
>   }
>   }
>   @Override
>   public String getValue(MyAcc acc) {
>   return "test";
>   }
> }
> @Test
> public void myTest() throws Exception {
>   String ddl = "create function MyACC as '" + TestUDAF.class.getName() + 
> "'";
>   tEnv().executeSql(ddl).await();
>   try (CloseableIterator<Row> it = tEnv().executeSql("SELECT 
> MyACC('123')").collect()) {
>   while (it.hasNext()) {
>   System.out.println(it.next());
>   }
>   }
> }
> {code}
> And we'll get the following exception stack
> {code}
> java.lang.ClassCastException: org.apache.flink.table.types.AtomicDataType 
> cannot be cast to org.apache.flink.table.types.KeyValueDataType
>   at 
> org.apache.flink.table.planner.typeutils.DataViewUtils$MapViewSpec.getKeyDataType(DataViewUtils.java:257)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1$$anonfun$22.apply(AggsHandlerCodeGenerator.scala:1231)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.org$apache$flink$table$planner$codegen$agg$AggsHandlerCodeGenerator$$addReusableDataViewSerializer(AggsHandlerCodeGenerator.scala:1294)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1228)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$addReusableStateDataViews$1.apply(AggsHandlerCodeGenerator.scala:1211)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$.addReusableStateDataViews(AggsHandlerCodeGenerator.scala:1211)
>   at 
> org.apache.flink.table.planner.codegen.agg.ImperativeAggCodeGen.<init>(ImperativeAggCodeGen.scala:112)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:233)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerCodeGenerator$$anonfun$3.apply(AggsHandlerCodeGenerator.scala:214)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.flink.table.planner.codegen.agg.AggsHandlerC

[GitHub] [flink] flinkbot edited a comment on pull request #14157: [FLINK-19969] CLI print run-application help msg

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14157:
URL: https://github.com/apache/flink/pull/14157#issuecomment-731272625


   
   ## CI report:
   
   * 14d6e11c4a97f05f743d24760654633ee04ff841 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9884)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14156:
URL: https://github.com/apache/flink/pull/14156#issuecomment-731272355


   
   ## CI report:
   
   * 3b54979f7f27138f764d39cac89229316b76afb7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9883)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #14085: [FLINK-19997] Implement an e2e test for sql-client with Confluent Registry Avro format

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14085:
URL: https://github.com/apache/flink/pull/14085#issuecomment-728305152


   
   ## CI report:
   
   * f7bf961107516f7998ffce948f4d4eb3c830bce3 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9874)
 
   * 6e64fefa42cb1e5f3fb47d209fc6cc2e902f2f08 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9876)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #14157: [FLINK-19969] CLI print run-application help msg

2020-11-20 Thread GitBox


flinkbot commented on pull request #14157:
URL: https://github.com/apache/flink/pull/14157#issuecomment-731272625


   
   ## CI report:
   
   * 14d6e11c4a97f05f743d24760654633ee04ff841 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


flinkbot commented on pull request #14156:
URL: https://github.com/apache/flink/pull/14156#issuecomment-731272355


   
   ## CI report:
   
   * 3b54979f7f27138f764d39cac89229316b76afb7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Timo Walther (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236273#comment-17236273
 ] 

Timo Walther commented on FLINK-20263:
--

I'm fine if that helps. I also opened a PR with a lot of docs for the usage.

> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Assignee: Timo Walther
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest to slightly improve the exception message when there is a 
> mismatch in the field name. It would be nice to provide with an example of a 
> valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'headers' -- I think it would be fine to 
> simply pick up the first key, I understand it would be hard to derive the 
> desired property
> {code}
> This would let users easier figure out the error in the syntax. I might've 
> copied the example from somewhere, because of being lazy and I might not be 
> aware of the other syntax. It would be hard for me to figure out what is the 
> problem from the original exception.





[jira] [Assigned] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Timo Walther (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther reassigned FLINK-20263:


Assignee: Timo Walther

> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Assignee: Timo Walther
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest to slightly improve the exception message when there is a 
> mismatch in the field name. It would be nice to provide with an example of a 
> valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'headers' -- I think it would be fine to 
> simply pick up the first key, I understand it would be hard to derive the 
> desired property
> {code}
> This would let users easier figure out the error in the syntax. I might've 
> copied the example from somewhere, because of being lazy and I might not be 
> aware of the other syntax. It would be hard for me to figure out what is the 
> problem from the original exception.





[GitHub] [flink] flinkbot commented on pull request #14157: [FLINK-19969] CLI print run-application help msg

2020-11-20 Thread GitBox


flinkbot commented on pull request #14157:
URL: https://github.com/apache/flink/pull/14157#issuecomment-731267534


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 14d6e11c4a97f05f743d24760654633ee04ff841 (Fri Nov 20 
16:25:05 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-20264) Zero-downtime / dynamic application upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20264:

Summary: Zero-downtime / dynamic application upgrades in Stateful Functions 
 (was: Zero-downtime / dynamic function upgrades in Stateful Functions)

> Zero-downtime / dynamic application upgrades in Stateful Functions
> --
>
> Key: FLINK-20264
> URL: https://issues.apache.org/jira/browse/FLINK-20264
> Project: Flink
>  Issue Type: New Feature
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, due to how functions can be executed as stateless deployments 
> separate to the StateFun runtime, they can be easily upgraded with 
> zero-downtime.
> However, up to now there are still some restrictions to what can be done 
> without restarting StateFun processes:
> * Users can not upgrade existing functions to declare new persisted state
> * Users can not add new functions to an existing StateFun application, and 
> have messages routed to it
> The end goal of this epic is to enable the above operations for function 
> deployments, without the need to restart the StateFun runtime. Further 
> details can be found in subtasks of this JIRA.





[jira] [Updated] (FLINK-20264) Zero-downtime / dynamic function upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20264:

Summary: Zero-downtime / dynamic function upgrades in Stateful Functions  
(was: Zero-downtime / dynamic application upgrades in Stateful Functions)

> Zero-downtime / dynamic function upgrades in Stateful Functions
> ---
>
> Key: FLINK-20264
> URL: https://issues.apache.org/jira/browse/FLINK-20264
> Project: Flink
>  Issue Type: New Feature
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, due to how functions can be executed as stateless deployments 
> separate to the StateFun runtime, they can be easily upgraded with 
> zero-downtime.
> However, up to now there are still some restrictions to what can be done 
> without restarting StateFun processes:
> * Users can not upgrade existing functions to declare new persisted state
> * Users can not add new functions to an existing StateFun application, and 
> have messages routed to it
> The end goal of this epic is to enable the above operations for function 
> deployments, without the need to restart the StateFun runtime. Further 
> details can be found in subtasks of this JIRA.





[jira] [Updated] (FLINK-19969) CliFrontendParser does not provide any help for run-application

2020-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19969:
---
Labels: pull-request-available  (was: )

> CliFrontendParser does not provide any help for run-application
> ---
>
> Key: FLINK-19969
> URL: https://issues.apache.org/jira/browse/FLINK-19969
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.11.2
>Reporter: Flavio Pompermaier
>Assignee: Kostas Kloudas
>Priority: Minor
>  Labels: pull-request-available
>
> flink CLI doesn't show any help about run-application





[jira] [Updated] (FLINK-20264) Zero-downtime / dynamic function upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20264:

Description: 
Currently, due to how functions can be executed as stateless deployments 
separate to the StateFun runtime, they can be easily upgraded with 
zero-downtime.

However, up to now there are still some restrictions to what can be done 
without restarting StateFun processes:

* Users can not upgrade existing functions to declare new persisted state
* Users can not add new functions to an existing StateFun application, and have 
messages routed to it

The end goal of this epic is to enable the above operations for function 
deployments, without the need to restart the StateFun runtime. Further details 
can be found in subtasks of this JIRA.

  was:
Currently, due to how functions can be executed as stateless deployments 
separate to the StateFun runtime, they can be easily upgraded with 
zero-downtime.

However, up to now there are still some restrictions to what can be done 
without restarting StateFun processes:

* Can't upgrade existing functions to declare new persisted state
* Can't add new functions to an existing StateFun application, and have 
messages routed to it

The end goal of this epic is to enable the above operations for function 
deployments, without the need to restart the StateFun runtime. Further details 
can be found in subtasks of this JIRA.


> Zero-downtime / dynamic function upgrades in Stateful Functions
> ---
>
> Key: FLINK-20264
> URL: https://issues.apache.org/jira/browse/FLINK-20264
> Project: Flink
>  Issue Type: New Feature
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, due to how functions can be executed as stateless deployments 
> separate to the StateFun runtime, they can be easily upgraded with 
> zero-downtime.
> However, up to now there are still some restrictions to what can be done 
> without restarting StateFun processes:
> * Users can not upgrade existing functions to declare new persisted state
> * Users can not add new functions to an existing StateFun application, and 
> have messages routed to it
> The end goal of this epic is to enable the above operations for function 
> deployments, without the need to restart the StateFun runtime. Further 
> details can be found in subtasks of this JIRA.





[GitHub] [flink] kl0u opened a new pull request #14157: [FLINK-19969] CLI print run-application help msg

2020-11-20 Thread GitBox


kl0u opened a new pull request #14157:
URL: https://github.com/apache/flink/pull/14157


   ## What is the purpose of the change
   
   Adds a help message for the `run-application` CLI command.
   
   ## Brief change log
   
   The main change is in the `CliFrontendParser.printHelpForRunApplication()`.
   
   ## Verifying this change
   
   Build Flink and run from the CLI: `run-application -h`
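
   For example (a sketch; the exact text printed depends on the new help 
message):

   ```bash
   # from the built Flink distribution
   ./bin/flink run-application -h
   ```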
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / **no**)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
 - The serializers: (yes / **no** / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
 - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / **no**)
 - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   







[jira] [Updated] (FLINK-20264) Zero-downtime / dynamic function upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tzu-Li (Gordon) Tai updated FLINK-20264:

Issue Type: New Feature  (was: Task)

> Zero-downtime / dynamic function upgrades in Stateful Functions
> ---
>
> Key: FLINK-20264
> URL: https://issues.apache.org/jira/browse/FLINK-20264
> Project: Flink
>  Issue Type: New Feature
>  Components: Stateful Functions
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Major
> Fix For: statefun-2.3.0
>
>
> Currently, due to how functions can be executed as stateless deployments 
> separate to the StateFun runtime, they can be easily upgraded with 
> zero-downtime.
> However, up to now there are still some restrictions to what can be done 
> without restarting StateFun processes:
> * Can't upgrade existing functions to declare new persisted state
> * Can't add new functions to an existing StateFun application, and have 
> messages routed to it
> The end goal of this epic is to enable the above operations for function 
> deployments, without the need to restart the StateFun runtime. Further 
> details can be found in subtasks of this JIRA.





[jira] [Created] (FLINK-20264) Zero-downtime / dynamic function upgrades in Stateful Functions

2020-11-20 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-20264:
---

 Summary: Zero-downtime / dynamic function upgrades in Stateful 
Functions
 Key: FLINK-20264
 URL: https://issues.apache.org/jira/browse/FLINK-20264
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Tzu-Li (Gordon) Tai
 Fix For: statefun-2.3.0


Currently, due to how functions can be executed as stateless deployments 
separate to the StateFun runtime, they can be easily upgraded with 
zero-downtime.

However, up to now there are still some restrictions to what can be done 
without restarting StateFun processes:

* Can't upgrade existing functions to declare new persisted state
* Can't add new functions to an existing StateFun application, and have 
messages routed to it

The end goal of this epic is to enable the above operations for function 
deployments, without the need to restart the StateFun runtime. Further details 
can be found in subtasks of this JIRA.





[jira] [Assigned] (FLINK-19969) CliFrontendParser does not provide any help for run-application

2020-11-20 Thread Kostas Kloudas (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kostas Kloudas reassigned FLINK-19969:
--

Assignee: Kostas Kloudas

> CliFrontendParser does not provide any help for run-application
> ---
>
> Key: FLINK-19969
> URL: https://issues.apache.org/jira/browse/FLINK-19969
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.11.2
>Reporter: Flavio Pompermaier
>Assignee: Kostas Kloudas
>Priority: Minor
>
> flink CLI doesn't show any help about run-application





[GitHub] [flink] flinkbot commented on pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


flinkbot commented on pull request #14156:
URL: https://github.com/apache/flink/pull/14156#issuecomment-731259626


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 91d5af092ff9853f1194232465c70c9925cebda8 (Fri Nov 20 
16:10:33 UTC 2020)
   
   ✅ no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
   The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[GitHub] [flink] rkhachatryan commented on a change in pull request #13827: [FLINK-19681][checkpointing] Timeout aligned checkpoints based on checkpointStartDelay

2020-11-20 Thread GitBox


rkhachatryan commented on a change in pull request #13827:
URL: https://github.com/apache/flink/pull/13827#discussion_r527794600



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/SingleCheckpointBarrierHandler.java
##
@@ -235,6 +257,23 @@ int getNumOpenChannels() {
return numOpenChannels;
}
 
+   private CheckpointBarrier maybeTimeout(CheckpointBarrier barrier) {
+   CheckpointOptions options = barrier.getCheckpointOptions();
+   boolean shouldTimeout = (options.isTimeoutable()) && (
+   barrier.getId() == timeoutedBarrierId ||
+   (System.currentTimeMillis() - barrier.getTimestamp()) > 
options.getAlignmentTimeout());

Review comment:
   After a discussion with @NicoK, @sjwiesman and @alpinegizmo we decided 
to:
   1. Decide to timeout based on the alignment start time (see the sketch 
after this list)
   1. By default, propagate this decision downstream; provide an option to 
disable propagation
   1. In the UI, show checkpoint type for each subtask; on a checkpoint level 
display unaligned if at least one subtask did UC
   1. Consider renaming `alignment timeout` option to  `subtask alignment 
timeout` 
   
   Considerations:
   - the overhead of UC (persisting channels) should ideally be localized
   - the less global the decision is, the more difficult it might be to debug 
UC-related issues
   - In a common scenario, backpressure comes from sinks; buffers will be full, 
so disabling propagation doesn't make a difference
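
   Regarding point 1, a rough sketch of deciding the timeout from the 
alignment start (illustrative only; the field and method names here are 
invented, while `isTimeoutable()` and `getAlignmentTimeout()` come from the 
diff above):

   ```java
   // Sketch: measure how long *this* subtask has been aligning, instead of
   // comparing against the barrier's global trigger timestamp.
   private long alignmentStartNanos; // set when the first barrier of a checkpoint arrives

   private boolean shouldTimeout(CheckpointBarrier barrier) {
       CheckpointOptions options = barrier.getCheckpointOptions();
       long alignedMillis = (System.nanoTime() - alignmentStartNanos) / 1_000_000;
       return options.isTimeoutable() && alignedMillis > options.getAlignmentTimeout();
   }
   ```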
   









[jira] [Updated] (FLINK-20072) Add documentation for FLIP-107

2020-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-20072:
---
Labels: pull-request-available  (was: )

> Add documentation for FLIP-107
> --
>
> Key: FLINK-20072
> URL: https://issues.apache.org/jira/browse/FLINK-20072
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Timo Walther
>Assignee: Timo Walther
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Add documentation for FLIP-107:
> - Connector/format metadata in general
> - Kafka key/value and metadata
> - Debezium metadata



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] twalthr opened a new pull request #14156: [FLINK-20072][docs] Add documentation for FLIP-107

2020-11-20 Thread GitBox


twalthr opened a new pull request #14156:
URL: https://github.com/apache/flink/pull/14156


   ## What is the purpose of the change
   
   Documents all features of FLIP-107.
   
   ## Brief change log
   
   Updates:
   - DDL
   - Kafka connector
   - Upsert Kafka connector
   - Debezium format
   
   Fixes various issues with:
   - Kinesis connector
   - Format page
   - Connector page
   
   Additionally, fixes nullability in Debezium metadata.
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? docs
   







[jira] [Updated] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-20263:
-
Description: 
I'd suggest to slightly improve the exception message when there is a mismatch 
in the field name. It would be nice to provide with an example of a valid 
syntax.

I used:
{code}
   tstmp TIMESTAMP(3) METADATA,
{code}

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'headers' -- I think it would be fine to 
simply pick up the first key, I understand it would be hard to derive the 
desired property
{code}

This would let users easier figure out the error in the syntax. I might've 
copied the example from somewhere, because of being lazy and I might not be 
aware of the other syntax. It would be hard for me to figure out what is the 
problem from the original exception.

  was:
I'd suggest to slightly improve the exception message when there is a mismatch 
in the field name. It would be nice to provide with an example of a valid 
syntax.

I used:
{code}
   tstmp TIMESTAMP(3) METADATA,
{code}

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'headers'
{code}

This would let users easier figure out the error in the syntax. I might've 
copied the example from somewhere, because of being lazy and I might not be 
aware of the other syntax. It would be hard for me to figure out what is the 
problem from the original exception.


> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest to slightly improve the exception message when there is a 
> mismatch in the field name. It would be nice to provide with an example of a 
> valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'headers' -- I think it would be fine to simply pick the first key; I understand it would be hard to derive the desired property
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20157) SourceCoordinatorProvider kills JobManager with IllegalStateException on job submission

2020-11-20 Thread Jiangjie Qin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236253#comment-17236253
 ] 

Jiangjie Qin commented on FLINK-20157:
--

[~rmetzger] Thanks for reporting the issue. It seems that there are a few bugs 
related to this issue.

I have filed a few tickets, including FLINK-20193, FLINK-20194, FLINK-20222, FLINK-20223 and FLINK-20081. With the patches from those tickets, the {{StateMachineExample}} runs fine.

> SourceCoordinatorProvider kills JobManager with IllegalStateException on job 
> submission
> ---
>
> Key: FLINK-20157
> URL: https://issues.apache.org/jira/browse/FLINK-20157
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.12.0
>Reporter: Robert Metzger
>Assignee: Jiangjie Qin
>Priority: Blocker
> Fix For: 1.12.0
>
>
> While setting up a test job using the new Kafka source for testing the RC1 of 
> Flink 1.12, my JobManager died with a fatal exception:
> {code}
> 2020-11-13 17:05:53,947 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Deploying 
> Flat Map -> Sink: Print to Std. Out (1/1) (attempt #0) with attempt id 
> fc36327d85e775204e82fc8507bf4264 to 192.168.1.25:57387-78ca68 @ 
> robertsbabamac2.localdomain (dataPort=57390) with allocation id 
> a8d918c0cfb57305908ce5a4f4787034
> 2020-11-13 17:05:53,988 ERROR 
> org.apache.flink.runtime.util.FatalExitExceptionHandler  [] - FATAL: 
> Thread 'SourceCoordinator-Source: Kafka Source' produced an uncaught 
> exception. Stopping the process...
> java.lang.IllegalStateException: Should never happen. This factory should 
> only be used by a SingleThreadExecutor.
> at 
> org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:619)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>  ~[?:1.8.0_222]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_222]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
> {code}
> I'm using the KafkaSource as documented, with a single partition topic:
> {code:java}
>   KafkaSource source = KafkaSource
>       .builder()
>       .setBootstrapServers(brokers)
>       .setGroupId("myGroup")
>       .setTopics(Arrays.asList(kafkaTopic))
>       .setDeserializer(new NewEventDeserializer())
>       .build();
> {code}
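
To make the failure mode easier to see outside of Flink, here is a minimal, self-contained sketch. The factory below is a hypothetical stand-in for the single-use check in {{SourceCoordinatorProvider$CoordinatorExecutorThreadFactory}}, not the actual class:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleThreadFactoryRepro {

    // Hypothetical re-implementation of the "only one thread, ever" check.
    static class OneShotThreadFactory implements ThreadFactory {
        private final AtomicBoolean created = new AtomicBoolean(false);

        @Override
        public Thread newThread(Runnable r) {
            if (!created.compareAndSet(false, true)) {
                throw new IllegalStateException(
                    "Should never happen. This factory should only be used by a SingleThreadExecutor.");
            }
            return new Thread(r, "coordinator-like-thread");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(new OneShotThreadFactory());

        // An exception escaping the task kills the worker thread. ThreadPoolExecutor then
        // calls processWorkerExit -> addWorker -> factory.newThread to replace the worker,
        // which is exactly the call chain in the stack trace above.
        executor.execute(() -> { throw new RuntimeException("simulated task failure"); });

        Thread.sleep(500); // give the dying worker time to hit the factory a second time
        executor.shutdownNow();
    }
}
{code}

Running this prints the IllegalStateException as an uncaught exception on the dying worker thread, matching the FATAL log entry in the report.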



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-20263:
-
Description: 
I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.

I used:
{code}
   tstmp TIMESTAMP(3) METADATA,
{code}

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

It would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'headers'
{code}

This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.

  was:
I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.

I used:
{code}
   tstmp TIMESTAMP(3) METADATA,
{code}

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

It would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
{code}

This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.


> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'headers'
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236251#comment-17236251
 ] 

Jark Wu commented on FLINK-20263:
-

Nit: add quotes around the column name:


{code:java}
Example:
`tstmp` TIMESTAMP(3) METADATA FROM 'timestamp'
{code}


> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'headers'
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-20263:
-
Summary: Improve exception when metadata name mismatch  (was: Improve 
exception when metada name mismatch)

> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20263) Improve exception when metadata name mismatch

2020-11-20 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236250#comment-17236250
 ] 

Jark Wu commented on FLINK-20263:
-

Great idea!

> Improve exception when metadata name mismatch
> -
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (FLINK-20059) Outdated SQL docs on aggregate functions' merge

2020-11-20 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reopened FLINK-20059:
-

> Outdated SQL docs on aggregate functions' merge
> ---
>
> Key: FLINK-20059
> URL: https://issues.apache.org/jira/browse/FLINK-20059
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table SQL / API
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Nico Kruber
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> In the Javadocs as well as the user docs, the {{merge}} method of an aggregation UDF is described as optional, e.g.
> {quote}Merges a group of accumulator instances into one accumulator instance. 
> This function must be implemented for data stream session window grouping 
> aggregates and data set grouping aggregates.{quote}
> However, it seems that nowadays this method is required in more cases (I 
> stumbled on this for a HOP window in streaming):
> {code}
> StreamExecGlobalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap, mergedAccExternalTypes)
> StreamExecGroupWindowAggregateBase.scala
>   generator.needMerge(mergedAccOffset = 0, mergedAccOnHeap = false)
> StreamExecIncrementalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap = true, 
> mergedAccExternalTypes)
> StreamExecLocalGroupAggregate.scala
>   .needMerge(mergedAccOffset = 0, mergedAccOnHeap = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20263) Improve exception when metada name mismatch

2020-11-20 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-20263:
-
Description: 
I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.

I used:
{code}
   tstmp TIMESTAMP(3) METADATA,
{code}

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

It would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
{code}

This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.

  was:
I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

It would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
{code}

This would let users figure out the syntax error more easily.


> Improve exception when metada name mismatch
> ---
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> I used:
> {code}
>tstmp TIMESTAMP(3) METADATA,
> {code}
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
> {code}
> This would let users figure out the syntax error more easily. I might've copied the example from somewhere out of laziness, and I might not be aware of the other syntax. It would be hard for me to figure out what the problem is from the original exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20059) Outdated SQL docs on aggregate functions' merge

2020-11-20 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236247#comment-17236247
 ] 

Jark Wu commented on FLINK-20059:
-

Sure [~twalthr], AFAIK, `TableAggregateFunction` also requires "merge" in 
hopping windows. But `TableAggregateFunction` doesn't support local-global 
optimization yet. 
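
For readers following along, a minimal sketch of an aggregation UDF that implements the {{merge}} contract (the function and its accumulator are illustrative assumptions, not code from this ticket):

{code:java}
import org.apache.flink.table.functions.AggregateFunction;

public class LongSum extends AggregateFunction<Long, LongSum.Acc> {

    public static class Acc {
        public long sum = 0L;
    }

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    @Override
    public Long getValue(Acc acc) {
        return acc.sum;
    }

    public void accumulate(Acc acc, Long value) {
        if (value != null) {
            acc.sum += value;
        }
    }

    // The method under discussion: required e.g. for HOP windows and
    // local-global aggregation, where partial accumulators must be combined.
    public void merge(Acc acc, Iterable<Acc> others) {
        for (Acc other : others) {
            acc.sum += other.sum;
        }
    }
}
{code}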

> Outdated SQL docs on aggregate functions' merge
> ---
>
> Key: FLINK-20059
> URL: https://issues.apache.org/jira/browse/FLINK-20059
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Table SQL / API
>Affects Versions: 1.12.0, 1.11.2
>Reporter: Nico Kruber
>Assignee: Jark Wu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> In the Javadocs as well as the user docs, the {{merge}} method of an aggregation UDF is described as optional, e.g.
> {quote}Merges a group of accumulator instances into one accumulator instance. 
> This function must be implemented for data stream session window grouping 
> aggregates and data set grouping aggregates.{quote}
> However, it seems that nowadays this method is required in more cases (I 
> stumbled on this for a HOP window in streaming):
> {code}
> StreamExecGlobalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap, mergedAccExternalTypes)
> StreamExecGroupWindowAggregateBase.scala
>   generator.needMerge(mergedAccOffset = 0, mergedAccOnHeap = false)
> StreamExecIncrementalGroupAggregate.scala
>   .needMerge(mergedAccOffset, mergedAccOnHeap = true, 
> mergedAccExternalTypes)
> StreamExecLocalGroupAggregate.scala
>   .needMerge(mergedAccOffset = 0, mergedAccOnHeap = true)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20263) Improve exception when metada name mismatch

2020-11-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-20263:
--
Summary: Improve exception when metada name mismatch  (was: Improve 
exception when metada name mismath)

> Improve exception when metada name mismatch
> ---
>
> Key: FLINK-20263
> URL: https://issues.apache.org/jira/browse/FLINK-20263
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: Dawid Wysakowicz
>Priority: Critical
> Fix For: 1.12.0
>
>
> I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.
> Right now we get:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> {code}
> It would be nice to have something like:
> {code}
> org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' 
> in column 'tstmp' of table 
> 'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
> class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
> supports the following metadata keys for writing:
> headers
> timestamp
> Example:
> tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
> {code}
> This would let users figure out the syntax error more easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] kezhuw commented on a change in pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-20 Thread GitBox


kezhuw commented on a change in pull request #14140:
URL: https://github.com/apache/flink/pull/14140#discussion_r527781089



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskTestHarness.java
##
@@ -401,20 +401,36 @@ public void waitForInputProcessing() throws Exception {
}
}
 
-		// then wait for the Task Thread to be in a blocked state
-		// Check whether the state is blocked, this should be the case if it cannot
-		// notifyNonEmpty more input, i.e. all currently available input has been processed.
-		while (true) {
-			Thread.State state = taskThread.getState();
-			if (state == Thread.State.BLOCKED || state == Thread.State.TERMINATED ||
-					state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
+		// Wait until all currently available input has been processed.
+		final AtomicBoolean allInputProcessed = new AtomicBoolean();
+		final MailboxProcessor mailboxProcessor = taskThread.task.mailboxProcessor;
+		final MailboxExecutor mailboxExecutor = mailboxProcessor.getMainMailboxExecutor();
+		while (taskThread.isAlive()) {
+			try {
+				final CountDownLatch latch = new CountDownLatch(1);
+				mailboxExecutor.execute(() -> {
+					allInputProcessed.set(mailboxProcessor.isDefaultActionUnavailable());
+					latch.countDown();
+				}, "query-whether-processInput-has-suspend-itself");
+				// Mail could be dropped due to a task exception, so we do a timed await here.
+				latch.await(1, TimeUnit.SECONDS);

Review comment:
   This `await` has two purposes here:
   1. Wait until the posted mail has been processed, so we can query `allInputProcessed` safely.
   2. If the posted mail has been dropped due to a task exception, break out of the otherwise indefinite wait.

   It does not serve as a sleep to yield control to the mailbox thread. Without the `sleep`, the testing thread and the mailbox thread may play ping-pong between process-one-element and execute-one-mail.

   I tend to keep it; at the least, it does not affect correctness.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-20263) Improve exception when metada name mismath

2020-11-20 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-20263:


 Summary: Improve exception when metada name mismath
 Key: FLINK-20263
 URL: https://issues.apache.org/jira/browse/FLINK-20263
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.12.0
Reporter: Dawid Wysakowicz
 Fix For: 1.12.0


I'd suggest slightly improving the exception message when there is a mismatch in the field name. It would be nice to provide an example of valid syntax.

Right now we get:
{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp
{code}

It would be nice to have something like:

{code}
org.apache.flink.table.api.ValidationException: Invalid metadata key 'tstmp' in 
column 'tstmp' of table 
'default_catalog.default_database.pageviews_per_region'. The DynamicTableSink 
class 'org.apache.flink.streaming.connectors.kafka.table.KafkaDynamicSink' 
supports the following metadata keys for writing:
headers
timestamp

Example:
tstmp TIMESTAMP(3) METADATA FROM 'timestamp'
{code}

This would let users easier figure out the error in the syntax.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (FLINK-19125) Avoid memory fragmentation when running flink docker image

2020-11-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-19125.
-
Resolution: Fixed

Fixed via b183c32ded5bf1776360d14c2793928a0d3d118e in the apache/flink-docker 
repository.

> Avoid memory fragmentation when running flink docker image
> --
>
> Key: FLINK-19125
> URL: https://issues.apache.org/jira/browse/FLINK-19125
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes, Runtime / State Backends
>Affects Versions: 1.12.0, 1.11.1
>Reporter: Yun Tang
>Assignee: Yun Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
>
> This ticket tracks the problem of memory fragmentation when launching the 
> default Flink docker image.
> In FLINK-18712, a user reported that if they submit a job with the RocksDB 
> state backend on a k8s session cluster again and again once it finishes, the 
> memory usage of the task manager grows continuously until it is OOM killed.
> I reproduced this problem with the official Flink docker image no matter how 
> we use RocksDB (whether managed memory is enabled or not).
> I dug into the problem and found this is due to memory fragmentation caused 
> by {{glibc}}, which does not return memory to the kernel gracefully (please 
> refer to [glibc 
> bugzilla|https://sourceware.org/bugzilla/show_bug.cgi?id=15321] and the [glibc 
> manual|https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc])
> I found that limiting MALLOC_ARENA_MAX to 2 could mitigate this problem 
> (please refer to 
> [choose-for-malloc_arena_max|https://devcenter.heroku.com/articles/tuning-glibc-memory-behavior#what-value-to-choose-for-malloc_arena_max]
>  for more details).
> And if we choose to use jemalloc to allocate memory by rebuilding another 
> docker image, the problem goes away. 
> {code:java}
> RUN apt-get -y install libjemalloc-dev
> ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so
> {code}
> Jemalloc intends to [emphasize fragmentation 
> avoidance|https://github.com/jemalloc/jemalloc/wiki/Background#intended-use] 
> and we might consider refactoring our Dockerfile to be based on jemalloc to 
> avoid memory fragmentation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-20 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236240#comment-17236240
 ] 

Robert Metzger commented on FLINK-20220:


[~trohrmann] note that these logs are from 1.11.2. I will try to reproduce the issue with DEBUG-level logging.

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data using accumulators to my client.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2

[GitHub] [flink] flinkbot edited a comment on pull request #14154: [FLINK-19878][table-planner-blink] Fix WatermarkAssigner shouldn't be after ChangelogNormalize

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14154:
URL: https://github.com/apache/flink/pull/14154#issuecomment-731223999


   
   ## CI report:
   
   * b476a71cd82914eca67cfa71abae513dad56ac6f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9881)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14155: [FLINK-19775][tests] Fix SystemProcessingTimeServiceTest.testImmediateShutdown

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14155:
URL: https://github.com/apache/flink/pull/14155#issuecomment-731224882


   
   ## CI report:
   
   * d567c706a74fbcf18f7d8721bb7f5cfe414592b6 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9882)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #14141: [FLINK-20145][checkpointing] Fix priority event handling.

2020-11-20 Thread GitBox


flinkbot edited a comment on pull request #14141:
URL: https://github.com/apache/flink/pull/14141#issuecomment-730461048


   
   ## CI report:
   
   * e67078cc99525b0cd2ed6fb23c1eac9063600191 UNKNOWN
   * 523aa4920e1d5a5eb15f7716ef4857ad052c3833 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9871)
 
   * 449b6e02f6b0b873a7fbed41185663e85e30105b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9880)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] tillrohrmann commented on a change in pull request #14132: [FLINK-20214][k8s] Fix the unnecessary warning logs when Hadoop environment is not set

2020-11-20 Thread GitBox


tillrohrmann commented on a change in pull request #14132:
URL: https://github.com/apache/flink/pull/14132#discussion_r527764469



##
File path: 
flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/parameters/AbstractKubernetesParameters.java
##
@@ -159,16 +160,16 @@ public boolean hasLog4j() {
 
@Override
public Optional getLocalHadoopConfigurationDirectory() {
-		final String[] possibleHadoopConfPaths = new String[] {
-			System.getenv(Constants.ENV_HADOOP_CONF_DIR),
-			System.getenv(Constants.ENV_HADOOP_HOME) + "/etc/hadoop", // hadoop 2.2
-			System.getenv(Constants.ENV_HADOOP_HOME) + "/conf"

Review comment:
   Why is it ok to remove this line here?
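
   For context on why the warning in FLINK-20214 mentions {{null/etc/hadoop}}: when HADOOP_HOME is unset, `System.getenv` returns null and the string concatenation yields the literal path "null/etc/hadoop". A null-guarded variant would look roughly like the sketch below (an illustration of the problem, not the PR's actual fix; variable names are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Build the candidate list only from environment variables that are actually set.
List<String> possibleHadoopConfPaths = new ArrayList<>();

String hadoopConfDir = System.getenv("HADOOP_CONF_DIR");
if (hadoopConfDir != null) {
    possibleHadoopConfPaths.add(hadoopConfDir);
}

String hadoopHome = System.getenv("HADOOP_HOME");
if (hadoopHome != null) {
    possibleHadoopConfPaths.add(hadoopHome + "/etc/hadoop"); // hadoop 2.2+ layout
    possibleHadoopConfPaths.add(hadoopHome + "/conf");       // legacy layout
}
```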





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] kezhuw commented on a change in pull request #14140: [FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading

2020-11-20 Thread GitBox


kezhuw commented on a change in pull request #14140:
URL: https://github.com/apache/flink/pull/14140#discussion_r527763762



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskTestHarness.java
##
@@ -401,20 +401,36 @@ public void waitForInputProcessing() throws Exception {
}
}
 
-		// then wait for the Task Thread to be in a blocked state
-		// Check whether the state is blocked, this should be the case if it cannot
-		// notifyNonEmpty more input, i.e. all currently available input has been processed.
-		while (true) {
-			Thread.State state = taskThread.getState();
-			if (state == Thread.State.BLOCKED || state == Thread.State.TERMINATED ||
-					state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
+		// Wait until all currently available input has been processed.
+		final AtomicBoolean allInputProcessed = new AtomicBoolean();
+		final MailboxProcessor mailboxProcessor = taskThread.task.mailboxProcessor;
+		final MailboxExecutor mailboxExecutor = mailboxProcessor.getMainMailboxExecutor();
+		while (taskThread.isAlive()) {
+			try {
+				final CountDownLatch latch = new CountDownLatch(1);
+				mailboxExecutor.execute(() -> {
+					allInputProcessed.set(mailboxProcessor.isDefaultActionUnavailable());

Review comment:
   I may have picked a wrong name for `allInputProcessed`; it should express that all currently available input has been processed, not end of input. What `StreamTaskTestHarness.waitForInputProcessing` does is wait until the currently available input has been processed, so that follow-up testing code can do post-processing assertions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (FLINK-19857) FLIP-149: Introduce the upsert-kafka Connector

2020-11-20 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-19857:
---

Assignee: Shengkai Fang

> FLIP-149: Introduce the upsert-kafka Connector
> --
>
> Key: FLINK-19857
> URL: https://issues.apache.org/jira/browse/FLINK-19857
> Project: Flink
>  Issue Type: New Feature
>  Components: Connectors / Kafka, Table SQL / API, Table SQL / 
> Ecosystem
>Reporter: Jark Wu
>Assignee: Shengkai Fang
>Priority: Major
>
> This is the umbrella issue for FLIP-149: Introduce the upsert-kafka Connector
> FLIP-149: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20214) Unnecessary warning log when starting a k8s session cluster

2020-11-20 Thread Till Rohrmann (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-20214:
--
Fix Version/s: 1.12.0

> Unnecessary warning log when starting a k8s session cluster
> ---
>
> Key: FLINK-20214
> URL: https://issues.apache.org/jira/browse/FLINK-20214
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.0
>Reporter: Guowei Ma
>Assignee: Yang Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> 2020-11-18 17:46:36,727 WARN 
> org.apache.flink.kubernetes.kubeclient.decorators.HadoopConfMountDecorator [] 
> - Found 0 files in directory null/etc/hadoop, skip to mount the Hadoop 
> Configuration ConfigMap.
> 2020-11-18 17:46:36,727 WARN 
> org.apache.flink.kubernetes.kubeclient.decorators.HadoopConfMountDecorator [] 
> - Found 0 files in directory null/etc/hadoop, skip to create the Hadoop 
> Configuration ConfigMap.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-19795) Fix Flink SQL throws exception when changelog source contains duplicate change events

2020-11-20 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236224#comment-17236224
 ] 

Jark Wu commented on FLINK-19795:
-

I would propose to add a job option 
{{table.exec.source.cdc-events-duplicate=true|false}} to indicate whether the 
CDC source produces duplicate messages that require the framework to 
deduplicate. By default, the value is false (for backward compatibility). When 
it is set to true, a primary key must be set on the source, and the 
framework will use the primary key to deduplicate/normalize the changelog 
(using the ChangelogNormalize operator). 

This option only takes effect on CDC sources which produce a full 
changelog, including INSERT/UPDATE_BEFORE/UPDATE_AFTER/DELETE events. So it 
doesn't affect upsert sources, because upsert sources always generate a 
ChangelogNormalize operator after them. 

 

What do you think about this? [~Leonard Xu] [~godfreyhe]
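
A sketch of how a user would switch the proposed behavior on (the option key is exactly the one proposed above; it does not exist in any released Flink version at the time of this comment):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

// Proposed option: tell the planner that the CDC source may emit duplicate change
// events, so it should deduplicate them on the primary key via ChangelogNormalize.
tEnv.getConfig().getConfiguration()
        .setBoolean("table.exec.source.cdc-events-duplicate", true);
{code}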

> Fix Flink SQL throws exception when changelog source contains duplicate 
> change events
> -
>
> Key: FLINK-19795
> URL: https://issues.apache.org/jira/browse/FLINK-19795
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.2
>Reporter: jinxin
>Assignee: Jark Wu
>Priority: Major
> Fix For: 1.12.0
>
>
> We are using Canal to synchronize MySQL data into Kafka. The synchronization 
> delivery is not exactly-once, so there might be duplicate 
> INSERT/UPDATE/DELETE messages for the same primary key. We are using 
> {{'connector' = 'kafka', 'format' = 'canal-json'}} to consume such a topic. 
> However, when applying a TopN query on this created source table, the TopN 
> operator throws an exception: {{Caused by: java.lang.RuntimeException: Can 
> not retract a non-existent record. This should never happen.}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20220) DataSet.collect() uses TaskExecutionState for transferring user-payload

2020-11-20 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236221#comment-17236221
 ] 

Till Rohrmann commented on FLINK-20220:
---

[~rmetzger] the last stack trace looks like a problem in the scheduler. Do you 
have the debug logs for this run? I think this is something we should look 
into. cc [~azagrebin], [~zhuzh].

> DataSet.collect() uses TaskExecutionState for transferring user-payload
> ---
>
> Key: FLINK-20220
> URL: https://issues.apache.org/jira/browse/FLINK-20220
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Reporter: Robert Metzger
>Priority: Critical
>
> Running the {{PageRank}} example in Flink, I accidentally tried collect()-ing 
> 125MB of data using accumulators to my client.
> From a user's perspective, my job failed with this exception:
> {code}
> 2020-11-18 12:56:06,897 INFO  
> org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - DataSink 
> (collect()) (3/4) (a1389a18cccabe10339099064032d6b8) switched from RUNNING to 
> FAILED on 
> org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@6af1e09b.
> org.apache.flink.util.FlinkException: Execution 
> a1389a18cccabe10339099064032d6b8 is unexpectedly no longer running on task 
> executor 192.168.1.25:56111-928d60.
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$1.onMissingDeploymentsOf(JobMaster.java:248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.DefaultExecutionDeploymentReconciler.reconcileExecutionDeployments(DefaultExecutionDeploymentReconciler.java:55)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1248)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.reportPayload(JobMaster.java:1235)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl.receiveHeartbeat(HeartbeatManagerImpl.java:199)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.heartbeatFromTaskManager(JobMaster.java:686)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source) ~[?:?]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_222]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
> {code}
> The root cause for this problem is the following exception on the TaskManager:
> {code}
> 2020-11-18 12:56:05,972 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor  
>  [] - Caught exception while executing runnable in main thread.
> java.lang.reflect.UndeclaredThrowableException: null
> at com.sun.proxy.$Proxy25.updateTaskExecutionState(Unknown Source) 
> ~[?:?]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.updateTaskExecutionState(TaskExecutor.java:1563)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.unregisterTaskAndNotifyFinalState(TaskExecutor.java:1593)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.access$2400(TaskExecutor.java:174)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor$TaskManagerActionsImpl.lambda$updateTaskExecutionState$1(TaskExecutor.java:1925)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>  ~[flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) 
> [flink-dist_2.11-1.12.0.jar:1.12.0]
> at 
>

[jira] [Closed] (FLINK-18826) Support to emit and encode upsert messages to Kafka

2020-11-20 Thread Jark Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu closed FLINK-18826.
---
  Assignee: (was: Shengkai Fang)
Resolution: Duplicate

> Support to emit and encode upsert messages to Kafka
> ---
>
> Key: FLINK-18826
> URL: https://issues.apache.org/jira/browse/FLINK-18826
> Project: Flink
>  Issue Type: Sub-task
>  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / Ecosystem
>Reporter: Jark Wu
>Priority: Major
>
> Currently, it is not possible to emit the results of a group-by query into 
> Kafka. We could provide a format (e.g. {{upsert-json}}?) to encode changelogs 
> to make it possible to write into Kafka. Another idea is to provide an 
> {{update-mode=upsert}} property; in this way, we can support upsert behavior 
> for all the append-only formats (avro, avro-confluent, json, csv) without 
> changing the implementation of the formats. The update-mode property came 
> from the kafka connector [1] before 1.10, but was dropped after that. 
> [1]: 
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#update-modes
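
Rendered as DDL, the second idea would look roughly like the following (assuming a {{TableEnvironment}} named {{tEnv}}; the {{'update-mode'}} property is the proposal itself and is not supported by any released connector):

{code:java}
tEnv.executeSql(
    "CREATE TABLE region_counts (" +
    "  region STRING," +
    "  cnt BIGINT," +
    "  PRIMARY KEY (region) NOT ENFORCED" +
    ") WITH (" +
    "  'connector' = 'kafka'," +
    "  'topic' = 'region_counts'," +
    "  'properties.bootstrap.servers' = 'localhost:9092'," +
    "  'format' = 'json'," +
    "  'update-mode' = 'upsert'" + // hypothetical: reuse an append-only format for upserts
    ")");
{code}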



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

