[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint coordinator specific log messages

2019-05-30 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852658#comment-16852658
 ] 

Congxian Qiu(klion26) commented on FLINK-12683:
---

In my opinion, the JM log in the current version (1.8) is sufficient (we can 
already find the task location there), so I don't think we need this patch.

> Provide task manager's location information for checkpoint coordinator 
> specific log messages
> 
>
> Key: FLINK-12683
> URL: https://issues.apache.org/jira/browse/FLINK-12683
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the {{AcknowledgeCheckpoint}} message does not contain the task 
> manager's location information. When a task sends the ack message for its 
> snapshot to the coordinator, we can only log this message:
> {code:java}
> Received late message for now expired checkpoint attempt 6035 from 
> ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f.
> {code}
> Sometimes we need this subtask's location information for further debugging, 
> e.g. a stack trace dump. Without the location information, the log does not 
> help us quickly locate the problem.
>  
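> For illustration, a minimal sketch of what the enriched log call could look 
> like, assuming the coordinator can resolve a {{TaskManagerLocation}} for the 
> acking execution attempt (a hypothetical lookup; {{AcknowledgeCheckpoint}} does 
> not carry it today):
> {code:java}
> // Hypothetical logging sketch: only the attempt id and job id are logged today;
> // taskManagerLocation is the piece of information this issue proposes to add.
> LOG.info("Received late message for now expired checkpoint attempt {} from {} ({}) of job {}.",
>     checkpointId, executionAttemptId, taskManagerLocation, jobId);
> {code}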



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message

2019-05-30 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852572#comment-16852572
 ] 

Congxian Qiu(klion26) commented on FLINK-12683:
---

Please have a look at FLINK-1165. :)

> Provide task manager's location information for checkpoint acknowledge 
> message to improve the checkpoint log message
> 
>
> Key: FLINK-12683
> URL: https://issues.apache.org/jira/browse/FLINK-12683
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Currently, the {{AcknowledgeCheckpoint}} message does not contain the task 
> manager's location information. When a task sends the ack message for its 
> snapshot to the coordinator, we can only log this message:
> {code:java}
> Received late message for now expired checkpoint attempt 6035 from 
> ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f.
> {code}
> Sometimes we need this subtask's location information for further debugging, 
> e.g. a stack trace dump. Without the location information, the log does not 
> help us quickly locate the problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message

2019-05-30 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852572#comment-16852572
 ] 

Congxian Qiu(klion26) edited comment on FLINK-12683 at 5/31/19 2:03 AM:


Please have a look at FLINK-11165. :)


was (Author: klion26):
Please have a look at FLINK-1165. :)

> Provide task manager's location information for checkpoint acknowledge 
> message to improve the checkpoint log message
> 
>
> Key: FLINK-12683
> URL: https://issues.apache.org/jira/browse/FLINK-12683
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Currently, the {{AcknowledgeCheckpoint}} message does not contain the task 
> manager's location information. When a task sends the ack message for its 
> snapshot to the coordinator, we can only log this message:
> {code:java}
> Received late message for now expired checkpoint attempt 6035 from 
> ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f.
> {code}
> Sometimes we need this subtask's location information for further debugging, 
> e.g. a stack trace dump. Without the location information, the log does not 
> help us quickly locate the problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message

2019-05-30 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851694#comment-16851694
 ] 

Congxian Qiu(klion26) commented on FLINK-12683:
---

I think we can currently find the host and container ID in the JM log by using 
the taskExecutionId.

> Provide task manager's location information for checkpoint acknowledge 
> message to improve the checkpoint log message
> 
>
> Key: FLINK-12683
> URL: https://issues.apache.org/jira/browse/FLINK-12683
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>
> Currently, the {{AcknowledgeCheckpoint}} message does not contain the task 
> manager's location information. When a task sends the ack message for its 
> snapshot to the coordinator, we can only log this message:
> {code:java}
> Received late message for now expired checkpoint attempt 6035 from 
> ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f.
> {code}
> Sometimes we need this subtask's location information for further debugging, 
> e.g. a stack trace dump. Without the location information, the log does not 
> help us quickly locate the problem.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB

2019-05-29 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851503#comment-16851503
 ] 

Congxian Qiu(klion26) commented on FLINK-11947:
---

Hi [~cre...@gmail.com], I've filed a PR for this: 
[https://github.com/apache/flink/pull/8565]

> Support MapState value schema evolution for RocksDB
> ---
>
> Key: FLINK-11947
> URL: https://issues.apache.org/jira/browse/FLINK-11947
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we do not attempt to perform state schema evolution if the key or 
> value's schema of a user {{MapState}} has changed when using {{RocksDB}}:
> https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542
> This was disallowed in the initial support for state schema evolution because 
> the way we did state evolution in the RocksDB state backend was simply 
> overwriting values.
> For {{MapState}} key evolution, only overwriting RocksDB values does not 
> work, since RocksDB entries for {{MapState}} use a composite key containing 
> the map state key. This means that when evolving {{MapState}} with an evolved 
> key schema, we will end up with new entries while the stale old ones remain.
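> As a rough illustration (hypothetical helper names; only the conceptual 
> RocksDB key layout for {{MapState}} is assumed), an evolved user-key 
> serializer yields a different composite key, so overwriting values cannot 
> clean up the old entry:
> {code:java}
> // Conceptual layout: [keyGroup | backendKey | namespace | userMapKey] -> userMapValue.
> // compose(), serializeV1() and serializeV2() are illustrative placeholders.
> byte[] oldRocksKey = compose(keyGroup, backendKey, namespace, serializeV1(userKey));
> byte[] newRocksKey = compose(keyGroup, backendKey, namespace, serializeV2(userKey));
> // Writing the evolved entry under newRocksKey leaves the stale entry under
> // oldRocksKey behind, which is why key evolution was initially disallowed.
> db.put(newRocksKey, serializeValue(userValue));
> {code}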



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11947) Support MapState key / value schema evolution for RocksDB

2019-05-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850497#comment-16850497
 ] 

Congxian Qiu(klion26) commented on FLINK-11947:
---

[~cresny] I'm interested in this issue and would like to work on a fix. I'm 
glad that [~tzulitai] will shepherd the effort.

> Support MapState key / value schema evolution for RocksDB
> -
>
> Key: FLINK-11947
> URL: https://issues.apache.org/jira/browse/FLINK-11947
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Runtime / State Backends
>Reporter: Tzu-Li (Gordon) Tai
>Priority: Critical
>
> Currently, we do not attempt to perform state schema evolution if the key or 
> value's schema of a user {{MapState}} has changed when using {{RocksDB}}:
> https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542
> This was disallowed in the initial support for state schema evolution because 
> the way we did state evolution in the RocksDB state backend was simply 
> overwriting values.
> For {{MapState}} key evolution, only overwriting RocksDB values does not 
> work, since RocksDB entries for {{MapState}} use a composite key containing 
> the map state key. This means that when evolving {{MapState}} with an evolved 
> key schema, we will end up with new entries while the stale old ones remain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-05-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850490#comment-16850490
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

Thanks for the reminder, [~sunjincheng121].

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]
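> To make the collision concrete, a small illustrative call sequence against 
> the provider above ({{provider}} being an instance of 
> {{LocalRecoveryDirectoryProviderImpl}}); the assertion reflects the current 
> buggy behavior, since the operator is not part of the path:
> {code:java}
> // Two stateful operators chained in the same subtask resolve checkpoint 1 to
> // the identical directory, so one operator's files overwrite the other's.
> File dirForOperatorA = provider.subtaskSpecificCheckpointDirectory(1L);
> File dirForOperatorB = provider.subtaskSpecificCheckpointDirectory(1L);
> assert dirForOperatorA.equals(dirForOperatorB); // ".../vtx_<vertexId>_sti_<idx>/chk_1"
> {code}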



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-05-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850320#comment-16850320
 ] 

Congxian Qiu(klion26) edited comment on FLINK-12296 at 5/29/19 2:42 AM:


Since the commits have been merged into master, release-1.8, release-1.7 and 
release-1.6 (see the commits' info in the previous comments), I'm resolving 
this issue.


was (Author: klion26):
Since the commits have been merged into master, release-1.8, release-1.7 and 
release-1.6 (see the commits' info in the last comments), I'm resolving this issue.

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-05-28 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) resolved FLINK-12296.
---
Resolution: Fixed

Since the commits have been merged into master, release-1.8, release-1.7 and 
release-1.6 (see the commits' info in the last comments), I'm resolving this issue.

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint

2019-05-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849589#comment-16849589
 ] 

Congxian Qiu(klion26) commented on FLINK-12619:
---

[~aljoscha] thanks for your reply.

For this issue, I plan the following steps (most of them will reuse the code 
of FLINK-11458); a rough sketch of the coordinator entry point follows the list:
 * add the required functions and RPCs in JM and TM for this issue
 ** {{CheckpointCoordinator#triggerSynchronousCheckpoint}} (aligned with 
{{triggerSynchronousSavepoint}})
 ** {{SchedulerNG#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}})
 ** a new {{CheckpointType}} named {{SYNC_CHECKPOINT}} (aligned with 
{{SYNC_SAVEPOINT}})
 ** {{JobMaster#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}})
 ** adjust the alignment logic to allow a sync checkpoint (currently only a 
sync savepoint is supported)
 ** the needed tests for this
 * expose this in the CLI
 ** add an option that takes no parameter (it will reuse the preconfigured 
checkpoint directory), much like {{CliFrontendParser.STOP_WITH_SAVEPOINT}}
 * add a REST API for this
 ** add an endpoint, RESTful gateway, trigger handler, request body and so 
on (like FLINK-11458)
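A minimal sketch of the coordinator entry point, assuming we mirror 
{{triggerSynchronousSavepoint}} (the method name, signature and the 
{{forSyncCheckpoint()}} factory are my assumptions, not merged code):
{code:java}
// Hypothetical sketch only: mirrors the existing synchronous-savepoint entry
// point; SYNC_CHECKPOINT and forSyncCheckpoint() are part of the proposal.
public CompletableFuture<CompletedCheckpoint> triggerSynchronousCheckpoint(
        long timestamp,
        boolean advanceToEndOfEventTime) {
    // Same flow as triggerSynchronousSavepoint, but the target is the
    // preconfigured checkpoint directory rather than an explicit savepoint path.
    CheckpointProperties properties = CheckpointProperties.forSyncCheckpoint();
    return triggerCheckpoint(timestamp, properties, null, false, advanceToEndOfEventTime);
}
{code}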

What do you think?

> Support TERMINATE/SUSPEND Job with Checkpoint
> -
>
> Key: FLINK-12619
> URL: https://issues.apache.org/jira/browse/FLINK-12619
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Inspired by the idea of FLINK-11458, we propose to support terminating/suspending 
> a job with a checkpoint. This improvement cooperates with the incremental and 
> externalized checkpoint features: if retained checkpoints are enabled and this 
> feature is configured, we will trigger a checkpoint before the job stops. It 
> could accelerate job recovery a lot since:
> 1. No source rewinding is required any more.
> 2. It's much faster than taking a savepoint when incremental checkpointing is 
> enabled.
> Please note that conceptually savepoints are different from checkpoints in a 
> similar way that backups are different from recovery logs in traditional 
> database systems. So we suggest using this feature only for job recovery, 
> while sticking with FLINK-11458 for the 
> upgrading/cross-cluster-job-migration/state-backend-switch cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint

2019-05-24 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-12619:
-

 Summary: Support TERMINATE/SUSPEND Job with Checkpoint
 Key: FLINK-12619
 URL: https://issues.apache.org/jira/browse/FLINK-12619
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / State Backends
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Inspired by the idea of FLINK-11458, we propose to support terminating/suspending 
a job with a checkpoint. This improvement cooperates with the incremental and 
externalized checkpoint features: if retained checkpoints are enabled and this 
feature is configured, we will trigger a checkpoint before the job stops. It 
could accelerate job recovery a lot since:
1. No source rewinding is required any more.
2. It's much faster than taking a savepoint when incremental checkpointing is 
enabled.

Please note that conceptually savepoints are different from checkpoints in a 
similar way that backups are different from recovery logs in traditional 
database systems. So we suggest using this feature only for job recovery, while 
sticking with FLINK-11458 for the 
upgrading/cross-cluster-job-migration/state-backend-switch cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12607) Introduce a REST API that returns the maxParallelism of a job

2019-05-23 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847179#comment-16847179
 ] 

Congxian Qiu(klion26) commented on FLINK-12607:
---

Thanks for filing the issue. I like this improvement, since it would let people 
find this value easily.

Until this is implemented, I think we can calculate the value using the formula 
in the docs [1].

[1][https://ci.apache.org/projects/flink/flink-docs-stable/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly]
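A small sketch of my reading of that formula (the 1.5x factor and the 128/32768 
bounds come from the linked page):
{code:java}
// Default max parallelism: round operatorParallelism * 1.5 up to the next
// power of two, then clamp the result into [128, 32768].
static int computeDefaultMaxParallelism(int operatorParallelism) {
    int target = operatorParallelism + (operatorParallelism / 2);
    return Math.min(Math.max(roundUpToPowerOfTwo(target), 128), 32768);
}

static int roundUpToPowerOfTwo(int x) {
    // e.g. 3 -> 4, 8 -> 8, 9 -> 16
    return x <= 1 ? 1 : Integer.highestOneBit(x - 1) << 1;
}
{code}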

> Introduce a REST API that returns the maxParallelism of a job
> -
>
> Key: FLINK-12607
> URL: https://issues.apache.org/jira/browse/FLINK-12607
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.6.3
>Reporter: Akshay Kanfade
>Priority: Minor
>
> Today, Flink does not offer any way to get the maxParallelism for a job and 
> its operators through any of the REST APIs. Since the internal state 
> already tracks maxParallelism for a job and its operators, we should expose 
> this via the REST APIs so that application developers can get more insight 
> into the current state.
> There are two approaches to do this:
> Approach 1:
> Modify the existing REST API response models to additionally expose a new 
> field 'maxParallelism'. Some of the REST APIs that would be affected by this:
> |h5. */jobs/:jobid/vertices/:vertexid*|
> |h5. */jobs/:jobid*|
>  
> Approach 2:
> Create a new REST API that only returns the maxParallelism for a job and 
> its operators.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-05-22 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846382#comment-16846382
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

merged
 * master         ee60846dc588b1a832a497ff9522d7a3a282c350
 * release-1.8  531d727f9b32c310d8d63b253019b8cc4a23a3eb
 * release-1.7  1ce2efd7a38d091fc004a8dba034ece0bcc42385
 * release-1.6  0dda6fe9dff4f667b110cda39bfe9738ba615b24

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-12536) Make BufferOrEventSequence#getNext() non-blocking

2019-05-22 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-12536:
-

Assignee: Congxian Qiu(klion26)

> Make BufferOrEventSequence#getNext() non-blocking
> -
>
> Key: FLINK-12536
> URL: https://issues.apache.org/jira/browse/FLINK-12536
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Affects Versions: 1.9.0
>Reporter: Piotr Nowojski
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently it is non-blocking in the case of credit-based flow control (the 
> default); however, for {{SpilledBufferOrEventSequence}} it blocks on reading 
> from a file. We might want to consider reimplementing it to be non-blocking 
> with a {{CompletableFuture isAvailable()}} method.
>  
> Otherwise we will block mailbox processing for the duration of the file read; 
> for example, we will block processing-time timers and potentially, in the 
> future, network flushes.
>  
> This is not a high-priority change, since it affects a non-default 
> configuration option AND, at the moment, only processing-time timers are 
> planned to be moved to the mailbox for 1.9.
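> As a sketch of the proposed shape (an assumed interface, not a final Flink 
> API): callers poll {{getNext()}} and, when nothing is ready, wait on the 
> returned future instead of blocking the mailbox thread on file I/O:
> {code:java}
> // Assumed shape for a non-blocking SpilledBufferOrEventSequence.
> interface NonBlockingBufferOrEventSequence {
>     // Returns the next element, or null immediately if none is ready yet.
>     @Nullable
>     BufferOrEvent getNext() throws IOException;
>
>     // Completes once the next element has been read from the spill file.
>     CompletableFuture<?> isAvailable();
> }
> {code}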



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11634) Translate "State Backends" page into Chinese

2019-05-22 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845702#comment-16845702
 ] 

Congxian Qiu(klion26) commented on FLINK-11634:
---

[~zxhe] I can't find your ID in the assignee list; maybe the issue can only be 
assigned to you once you become a contributor.

If you really want to contribute to this issue, maybe you could file a PR 
directly? You'll become a contributor after the PR has been merged.

> Translate "State Backends" page into Chinese
> 
>
> Key: FLINK-11634
> URL: https://issues.apache.org/jira/browse/FLINK-11634
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> The doc is located at flink/docs/dev/stream/state/state_backends.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11560) Translate "Flink Applications" page into Chinese

2019-05-21 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845437#comment-16845437
 ] 

Congxian Qiu(klion26) commented on FLINK-11560:
---

Thanks for your contribution, [~Brian Zhou]. I'll review it today.

> Translate "Flink Applications" page into Chinese
> 
>
> Key: FLINK-11560
> URL: https://issues.apache.org/jira/browse/FLINK-11560
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Project Website
>Reporter: Jark Wu
>Assignee: Zhou Yumin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Translate "Flink Applications" page into Chinese.
> The markdown file is located in: flink-web/flink-applications.zh.md
> The url link is: https://flink.apache.org/zh/flink-applications.html
> Please adjust the links in the page to Chinese pages when translating. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11637) Translate "Checkpoints" page into Chinese

2019-05-21 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844658#comment-16844658
 ] 

Congxian Qiu(klion26) commented on FLINK-11637:
---

[~guanghui] If there is no assignee yet and you want to contribute to the 
issue, I think it's OK to assign it to yourself.

> Translate "Checkpoints" page into Chinese
> -
>
> Key: FLINK-11637
> URL: https://issues.apache.org/jira/browse/FLINK-11637
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Assignee: guanghui.rong
>Priority: Major
> Fix For: 1.9.0
>
>
> The doc is located at flink/docs/ops/state/checkpoints.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12400) NullpointerException using SimpleStringSchema with Kafka

2019-05-04 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833187#comment-16833187
 ] 

Congxian Qiu(klion26) edited comment on FLINK-12400 at 5/5/19 1:52 AM:
---

[~PierreZ] thanks for filing this issue. I think there is already an issue that 
aims to fix this; please have a look at 
https://issues.apache.org/jira/browse/FLINK-11820

And there is a PR for FLINK-11820: [https://github.com/apache/flink/pull/7987]


was (Author: klion26):
[~PierreZ] thanks for filing this issue. I think there is already an issue that 
aims to fix this; please have a look at 
https://issues.apache.org/jira/browse/FLINK-11820

> NullpointerException using SimpleStringSchema with Kafka
> 
>
> Key: FLINK-12400
> URL: https://issues.apache.org/jira/browse/FLINK-12400
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Affects Versions: 1.7.2, 1.8.0
> Environment: Flink 1.7.2 job on 1.8 cluster
> Kafka 0.10 with a topic in log-compaction
>Reporter: Pierre Zemb
>Assignee: Pierre Zemb
>Priority: Minor
>
> Hi!
> Yesterday, we saw strange behavior with our Flink job and Kafka. We are 
> consuming a Kafka topic set up in 
> [log-compaction|https://kafka.apache.org/documentation/#compaction] mode. As 
> such, sending a message with a null payload acts as a tombstone.
> We are consuming Kafka like this:
> {code:java}
> new FlinkKafkaConsumer010<>  ("topic", new SimpleStringSchema(), 
> this.kafkaProperties)
> {code}
> When we sent the message, the job failed because of a NullPointerException 
> [here|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/serialization/SimpleStringSchema.java#L75]:
> `byte[] message` was null, causing the NPE. 
> We forked the class and added a basic null check, returning null in that 
> case. It fixed our issue. 
> Should we add it to the main class?
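> For reference, a minimal sketch of the workaround described above (a 
> user-side subclass, not the upstream fix, which is tracked in FLINK-11820):
> {code:java}
> // Treat Kafka tombstones (null payloads from a log-compacted topic) as null
> // records instead of hitting the NPE inside SimpleStringSchema#deserialize.
> public class NullTolerantStringSchema extends SimpleStringSchema {
>     @Override
>     public String deserialize(byte[] message) {
>         return message == null ? null : super.deserialize(message);
>     }
> }
> {code}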



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12400) NullpointerException using SimpleStringSchema with Kafka

2019-05-04 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833187#comment-16833187
 ] 

Congxian Qiu(klion26) commented on FLINK-12400:
---

[~PierreZ] thanks for filing this issue. I think there is already an issue that 
aims to fix this; please have a look at 
https://issues.apache.org/jira/browse/FLINK-11820

> NullpointerException using SimpleStringSchema with Kafka
> 
>
> Key: FLINK-12400
> URL: https://issues.apache.org/jira/browse/FLINK-12400
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Affects Versions: 1.7.2, 1.8.0
> Environment: Flink 1.7.2 job on 1.8 cluster
> Kafka 0.10 with a topic in log-compaction
>Reporter: Pierre Zemb
>Assignee: Pierre Zemb
>Priority: Minor
>
> Hi!
> Yesterday, we saw strange behavior with our Flink job and Kafka. We are 
> consuming a Kafka topic set up in 
> [log-compaction|https://kafka.apache.org/documentation/#compaction] mode. As 
> such, sending a message with a null payload acts as a tombstone.
> We are consuming Kafka like this:
> {code:java}
> new FlinkKafkaConsumer010<>  ("topic", new SimpleStringSchema(), 
> this.kafkaProperties)
> {code}
> When we sent the message, the job failed because of a NullPointerException 
> [here|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/serialization/SimpleStringSchema.java#L75]:
> `byte[] message` was null, causing the NPE. 
> We forked the class and added a basic null check, returning null in that 
> case. It fixed our issue. 
> Should we add it to the main class?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11637) Translate "Checkpoints" page into Chinese

2019-05-03 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832987#comment-16832987
 ] 

Congxian Qiu(klion26) commented on FLINK-11637:
---

[~lsy] I'm not working on this now; if you want to contribute, please feel free 
to assign it to yourself.

> Translate "Checkpoints" page into Chinese
> -
>
> Key: FLINK-11637
> URL: https://issues.apache.org/jira/browse/FLINK-11637
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> The doc is located at flink/docs/ops/state/checkpoints.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-27 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827806#comment-16827806
 ] 

Congxian Qiu(klion26) edited comment on FLINK-12296 at 4/28/19 4:44 AM:


After discussing offline with [~srichter], I will change the directory to the following:

{{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/operatorIdentifier}}
 


was (Author: klion26):
After discussing offline with [~srichter], I will change the directory to the following:

{{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/rocksdb_operatorIdentifier}}
 

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-27 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827806#comment-16827806
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

After discussing offline with [~srichter], I will change the directory to the following:

{{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/rocksdb_operatorIdentifier}}
 

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11635) Translate "Checkpointing" page into Chinese

2019-04-24 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825674#comment-16825674
 ] 

Congxian Qiu(klion26) commented on FLINK-11635:
---

[~pengyang] Please feel free to take it. You can assign it to yourself if you 
want.

> Translate "Checkpointing" page into Chinese
> ---
>
> Key: FLINK-11635
> URL: https://issues.apache.org/jira/browse/FLINK-11635
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> The doc is located at flink/docs/dev/stream/state/checkpointing.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-24 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825199#comment-16825199
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

[~srichter] I first implemented it as you suggested [1], but in 
{{TaskLocalStateStoreImpl#dispose()}} I can't get the {{operatorId}} needed 
to find the data path (I added a {{UUID}} as a temporary placeholder in the 
patch), so I proposed the path as 
{{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1*_operator_id*/(state)}}

 

[1] 
https://github.com/klion26/flink/blob/9efd8907dd08a0f50ff81e04b028cb87945da0ca/flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java#L258

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-24 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825034#comment-16825034
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

[~srichter] from what [~ningshi] posted on the mailing list [1], the two states 
are user state and timer state:
{noformat}
There are two stateful operators in the chain, one is a
CoBroadcastWithKeyedOperator, the other one is a StreamMapper. The
CoBroadcastWithKeyedOperator creates timer states in RocksDB, the latter
doesn't. Because of the checkpoint directory collision bug, we always end up
saving the states for CoBroadcastWithKeyedOperator.{noformat}
 

Currently, I want to add the {{operatorId}} to the path, so the path will look like

{{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1_operator_id/(state)}}

 

We use {{operatorId}} instead of the {{operatorIdentifier}} because we need to 
find the local data path for a specific checkpoint 
({{TaskLocalStateStoreImpl#discardLocalStateForCheckpoint(long, 
TaskStateSnapshot)}}).

 

I will change the code as follows:
{code:java}
//LocalRecoveryDirectoryProviderImpl.java
@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId, String 
operatorIdentifier) {
   return new File(subtaskBaseDirectory(checkpointId), 
checkpointDirString(operatorIdentifier, checkpointId));
}
@VisibleForTesting
String checkpointDirString(String operatorIdentifier, long checkpointId) {
   return "opid_" + 
OperatorSubtaskDescriptionText.getOperatorIdByOperatorIdentifier(operatorIdentifier)
 + "_chk_" + checkpointId;
}

//OperatorSubtaskDescriptionText.java
public static String getOperatorIdByOperatorIdentifier(String 
operatorIdentifier) { ... }
{code}
 

What do you think about this?

 

[1][http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2imv4i2nd.wl-nings...@gmail.com%3E]
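For clarity, the assumed before/after local-state layout under this proposal 
(illustrative placeholders only):
{noformat}
before: .../jid_<jobId>/vtx_<vertexId>_sti_<idx>/chk_1             (shared by all chained operators)
after:  .../jid_<jobId>/vtx_<vertexId>_sti_<idx>/opid_<opA>_chk_1  (one directory per operator)
        .../jid_<jobId>/vtx_<vertexId>_sti_<idx>/opid_<opB>_chk_1
{noformat}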

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Blocker
> Fix For: 1.7.3, 1.9.0, 1.8.1
>
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Description: 
As reported on the mailing list [1], when more than one stateful operator is 
chained in a single task, we can silently lose data.

Currently, the local directory we use looks like

../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).

 

If more than one stateful operator is chained in a single task, all the 
operators will share the same local directory (because the vertex_id is the 
same), which leads to silent data loss.

 

The path generation logic is below:
{code:java}
// LocalRecoveryDirectoryProviderImpl.java

@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId) {
   return new File(subtaskBaseDirectory(checkpointId), 
checkpointDirString(checkpointId));
}


@VisibleForTesting
String subtaskDirString() {
   return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
subtaskIndex).toString();
}

@VisibleForTesting
String checkpointDirString(long checkpointId) {
   return "chk_" + checkpointId;
}
{code}
[1] 
[http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]

  was:
As reported on the mailing list [1], when more than one stateful operator is 
chained in a single task, we can silently lose data.

Currently, the local directory we use looks like

../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).

 

If more than one stateful operator is chained in a single task, all the 
operators will share the same local directory (because the vertex_id is the 
same), which leads to silent data loss.

 

The path generation logic is below:
{code:java}
// LocalRecoveryDirectoryProviderImpl.java

@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId) {
   return new File(subtaskBaseDirectory(checkpointId), 
checkpointDirString(checkpointId));
}


@VisibleForTesting
String subtaskDirString() {
   return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
subtaskIndex).toString();
}

@VisibleForTesting
String checkpointDirString(long checkpointId) {
   return "chk_" + checkpointId;
}
{code}
[1] 
[https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]


> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11636) Translate "State Schema Evolution" into Chinese

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824762#comment-16824762
 ] 

Congxian Qiu(klion26) commented on FLINK-11636:
---

[~yangfei] I'm not working on it now; please assign it to yourself if you want.

> Translate "State Schema Evolution" into Chinese
> ---
>
> Key: FLINK-11636
> URL: https://issues.apache.org/jira/browse/FLINK-11636
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> The doc is located at flink/docs/dev/stream/state/schema_evolution.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Affects Version/s: 1.6.3
   1.6.4
   1.7.2
   1.8.0

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Priority: Critical  (was: Major)

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823998#comment-16823998
 ] 

Congxian Qiu(klion26) edited comment on FLINK-12296 at 4/23/19 11:46 AM:
-

I think this is a long-existing issue, present since release-1.5;

the related issue is [FLINK-8360|https://issues.apache.org/jira/browse/FLINK-8360]


was (Author: klion26):
I think this is a long-existing issue, present since release-1.5.

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> As reported on the mailing list [1], when more than one stateful operator is 
> chained in a single task, we can silently lose data.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state).
>  
> If more than one stateful operator is chained in a single task, all the 
> operators will share the same local directory (because the vertex_id is the 
> same), which leads to silent data loss.
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823998#comment-16823998
 ] 

Congxian Qiu(klion26) commented on FLINK-12296:
---

I think this is a long-existing issue, present since release-1.5.

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> As the mailing list thread said [1], when more than one operator is chained 
> in a single task and all the operators have states, we may encounter a 
> silent data loss problem.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state),
>  
> if more than one operator is chained in a single task and all the operators 
> have states, then all the operators will share the same local 
> directory (because the vertex_id is the same), which will lead to a data loss 
> problem. 
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Summary: Data loss silently in RocksDBStateBackend when more than one 
operator(has states) chained in a single task   (was: Data loss silently in 
RocksDBStateBackend when more than one operator chained in a single task )

> Data loss silently in RocksDBStateBackend when more than one operator(has 
> states) chained in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> As the mailing list thread said [1], when more than one operator is chained 
> in a single task and all the operators have states, we may encounter a 
> silent data loss problem.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state),
>  
> if more than one operator is chained in a single task and all the operators 
> have states, then all the operators will share the same local 
> directory (because the vertex_id is the same), which will lead to a data loss 
> problem. 
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Description: 
As the mailing list thread said [1], when more than one operator is chained in a 
single task and all the operators have states, we may encounter a silent data 
loss problem.

Currently, the local directory we use looks like

../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state),

if more than one operator is chained in a single task and all the operators have 
states, then all the operators will share the same local directory (because the 
vertex_id is the same), which will lead to a data loss problem. 

 

The path generation logic is below:
{code:java}
// LocalRecoveryDirectoryProviderImpl.java

@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId) {
   return new File(subtaskBaseDirectory(checkpointId), 
checkpointDirString(checkpointId));
}


@VisibleForTesting
String subtaskDirString() {
   return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
subtaskIndex).toString();
}

@VisibleForTesting
String checkpointDirString(long checkpointId) {
   return "chk_" + checkpointId;
}
{code}
[1] 
[https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]
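
For illustration, here is a minimal, self-contained sketch of why the collision 
happens (class and method names are hypothetical, not Flink's): the path is 
derived only from job id, vertex id, subtask index and checkpoint id, so every 
stateful operator in a chain resolves to the same directory. The trailing op_ 
suffix shows one conceivable disambiguation, not the actual fix.
{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChainedPathCollision {

    // Mirrors subtaskDirString()/checkpointDirString() above: the path is
    // derived only from job id, vertex id, subtask index and checkpoint id.
    static Path localStatePath(String jobId, String vertexId, int subtaskIdx, long checkpointId) {
        return Paths.get("jid_" + jobId, "vtx_" + vertexId + "_sti_" + subtaskIdx, "chk_" + checkpointId);
    }

    public static void main(String[] args) {
        // Two stateful operators chained into one task share the same vertex id,
        // so their local recovery directories are identical.
        Path operatorA = localStatePath("job1", "vertexX", 0, 1);
        Path operatorB = localStatePath("job1", "vertexX", 0, 1);
        System.out.println(operatorA.equals(operatorB)); // true -> collision

        // Hypothetical disambiguation: append a per-operator identifier.
        Path fixedA = operatorA.resolve("op_a");
        Path fixedB = operatorB.resolve("op_b");
        System.out.println(fixedA.equals(fixedB)); // false
    }
}
{code}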

  was:
As the mailing list thread said [1], when more than one operator is chained in a 
single task and all the operators have states, data will be lost silently.

 

[1] 
https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7


> Data loss silently in RocksDBStateBackend when more than one operator chained 
> in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> As the mailing list thread said [1], when more than one operator is chained 
> in a single task and all the operators have states, we may encounter a 
> silent data loss problem.
> Currently, the local directory we use looks like
> ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state),
>  
> if more than one operator is chained in a single task and all the operators 
> have states, then all the operators will share the same local 
> directory (because the vertex_id is the same), which will lead to a data loss 
> problem. 
>  
> The path generation logic is below:
> {code:java}
> // LocalRecoveryDirectoryProviderImpl.java
> @Override
> public File subtaskSpecificCheckpointDirectory(long checkpointId) {
>return new File(subtaskBaseDirectory(checkpointId), 
> checkpointDirString(checkpointId));
> }
> @VisibleForTesting
> String subtaskDirString() {
>return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + 
> subtaskIndex).toString();
> }
> @VisibleForTesting
> String checkpointDirString(long checkpointId) {
>return "chk_" + checkpointId;
> }
> {code}
> [1] 
> [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-12296:
--
Issue Type: Bug  (was: Test)

> Data loss silently in RocksDBStateBackend when more than one operator chained 
> in a single task 
> ---
>
> Key: FLINK-12296
> URL: https://issues.apache.org/jira/browse/FLINK-12296
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> As the mailing list thread said [1], when more than one operator is chained 
> in a single task and all the operators have states, data will be lost 
> silently.
>  
> [1] 
> https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task

2019-04-23 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-12296:
-

 Summary: Data loss silently in RocksDBStateBackend when more than 
one operator chained in a single task 
 Key: FLINK-12296
 URL: https://issues.apache.org/jira/browse/FLINK-12296
 Project: Flink
  Issue Type: Test
  Components: Runtime / State Backends
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


As the mailing list thread said [1], when more than one operator is chained in a 
single task and all the operators have states, data will be lost silently.

 

[1] 
https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11633) Translate "Working with State" into Chinese

2019-04-22 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11633:
-

Assignee: Congxian Qiu(klion26)

> Translate "Working with State" into Chinese
> ---
>
> Key: FLINK-11633
> URL: https://issues.apache.org/jira/browse/FLINK-11633
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Doc is located in flink/doc/dev/state/state.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-12281) SortDistinctAggregateITCase>DistinctAggregateITCaseBase.testSomeColumnsBothInDistinctAggAndGroupBy Failed

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-12281:
-

 Summary: 
SortDistinctAggregateITCase>DistinctAggregateITCaseBase.testSomeColumnsBothInDistinctAggAndGroupBy
 Failed
 Key: FLINK-12281
 URL: https://issues.apache.org/jira/browse/FLINK-12281
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Congxian Qiu(klion26)


01:07:26.060 [INFO] Running 
org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase
01:08:38.157 [ERROR] Tests run: 23, Failures: 0, Errors: 1, Skipped: 2, Time 
elapsed: 72.093 s <<< FAILURE! - in 
org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase
01:08:38.157 [ERROR] 
testSomeColumnsBothInDistinctAggAndGroupBy(org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase)
 Time elapsed: 5.972 s <<< ERROR!
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
Caused by: java.lang.RuntimeException: 
org.apache.flink.runtime.memory.MemoryAllocationException: Could not allocate 
64 pages. Only 60 pages are remaining.
Caused by: org.apache.flink.runtime.memory.MemoryAllocationException: Could not 
allocate 64 pages. Only 60 pages are remaining.

[https://travis-ci.org/apache/flink/jobs/522602981]
[https://travis-ci.org/apache/flink/jobs/522603433]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11635) Translate "Checkpointing" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11635:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "Checkpointing" page into Chinese
> ---
>
> Key: FLINK-11635
> URL: https://issues.apache.org/jira/browse/FLINK-11635
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> doc is located in flink/docs/dev/stream/state/checkpointing.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11636) Translate "State Schema Evolution" into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11636:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "State Schema Evolution" into Chinese
> ---
>
> Key: FLINK-11636
> URL: https://issues.apache.org/jira/browse/FLINK-11636
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> doc is located in flink/docs/dev/stream/state/schema_evolution.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11638) Translate "Savepoints" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11638:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "Savepoints" page into Chinese
> 
>
> Key: FLINK-11638
> URL: https://issues.apache.org/jira/browse/FLINK-11638
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> doc is located in flink/docs/ops/state/savepoints.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11634) Translate "State Backends" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11634:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "State Backends" page into Chinese
> 
>
> Key: FLINK-11634
> URL: https://issues.apache.org/jira/browse/FLINK-11634
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> doc is located in flink/docs/dev/stream/state/state_backends.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11637) Translate "Checkpoints" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11637:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "Checkpoints" page into Chinese
> -
>
> Key: FLINK-11637
> URL: https://issues.apache.org/jira/browse/FLINK-11637
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> doc is located in flink/docs/ops/state/checkpoints.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11633) Translate "Working with State" into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11633:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "Working with State" into Chinese
> ---
>
> Key: FLINK-11633
> URL: https://issues.apache.org/jira/browse/FLINK-11633
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation
>Reporter: Congxian Qiu(klion26)
>Priority: Major
>
> Doc is located in flink/doc/dev/state/state.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11561) Translate "Flink Architecture" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822475#comment-16822475
 ] 

Congxian Qiu(klion26) commented on FLINK-11561:
---

[~jark] thanks for the reminder, I do not have more time to translate this 
issue, so I've released the ticket.

[~linjie] If you still have time, please feel free to take it.

> Translate "Flink Architecture" page into Chinese
> 
>
> Key: FLINK-11561
> URL: https://issues.apache.org/jira/browse/FLINK-11561
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Project Website
>Reporter: Jark Wu
>Priority: Major
>
> Translate "Flink Architecture" page into Chinese.
> The markdown file is located in: flink-web/flink-architecture.zh.md
> The url link is: https://flink.apache.org/zh/flink-architecture.html
> Please adjust the links in the page to Chinese pages when translating. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11561) Translate "Flink Architecture" page into Chinese

2019-04-20 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11561:
-

Assignee: (was: Congxian Qiu(klion26))

> Translate "Flink Architecture" page into Chinese
> 
>
> Key: FLINK-11561
> URL: https://issues.apache.org/jira/browse/FLINK-11561
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Project Website
>Reporter: Jark Wu
>Priority: Major
>
> Translate "Flink Architecture" page into Chinese.
> The markdown file is located in: flink-web/flink-architecture.zh.md
> The url link is: https://flink.apache.org/zh/flink-architecture.html
> Please adjust the links in the page to Chinese pages when translating. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11948) When kafka sink parallelism unbalance

2019-03-18 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794775#comment-16794775
 ] 

Congxian Qiu(klion26) commented on FLINK-11948:
---

As the doc said,
{noformat}
To avoid such an unbalanced partitioning, use a round-robin kafka partitioner 
(note that this will
* cause a lot of network connections between all the Flink instances and all 
the Kafka brokers).{noformat}
Maybe we could have a round-robin partitioner here?
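
A minimal sketch of such a round-robin partitioner, written against the 1.7-era 
{{FlinkKafkaPartitioner}} API (the class name is an assumption; nothing like it 
ships with Flink). It could then be passed to the producer constructor in place 
of the default partitioner; as the quoted Javadoc warns, this opens connections 
from every subtask to every broker.
{code:java}
import org.apache.flink.streaming.connectors.kafka.partitioner.FlinkKafkaPartitioner;

public class RoundRobinKafkaPartitioner<T> extends FlinkKafkaPartitioner<T> {

    private int parallelInstanceId;
    private long counter;

    @Override
    public void open(int parallelInstanceId, int parallelInstances) {
        this.parallelInstanceId = parallelInstanceId;
    }

    @Override
    public int partition(T record, byte[] key, byte[] value, String targetTopic, int[] partitions) {
        // Offset by the subtask id so subtasks start on different partitions,
        // then rotate through all partitions for subsequent records. This keeps
        // every partition busy even when parallelism < partitions.length.
        return partitions[(int) ((parallelInstanceId + counter++) % partitions.length)];
    }
}
{code}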

> When kafka sink parallelism unbalance
> -
>
> Key: FLINK-11948
> URL: https://issues.apache.org/jira/browse/FLINK-11948
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.6.3, 1.6.4, 1.7.2
>Reporter: qi quan
>Priority: Major
>
> The default FlinkFixedPartitioner returns partitions[subtaskid % 
> partitions.length]. When the kafka sink parallelism is smaller than the 
> number of kafka partitions, only the first few kafka partitions will receive 
> data.
> I think it needs to be improved here.
> {code:java}
>   @Override
>   public int partition(T record, byte[] key, byte[] value, String 
> targetTopic, int[] partitions) {
>   Preconditions.checkArgument(
>   partitions != null && partitions.length > 0,
>   "Partitions of the target topic is empty.");
>   return partitions[parallelInstanceId % partitions.length];
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11939) Adjust components that need be modified for support FSCSOS

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11939:
--
Summary: Adjust components that need be modified for support FSCSOS  (was: 
Adjust components that need be modified for support FS)

> Adjust components that need be modified for support FSCSOS
> --
>
> Key: FLINK-11939
> URL: https://issues.apache.org/jira/browse/FLINK-11939
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> In this sub-task, we'll adjust all the components that need to be modified, 
> such as {{SharedStateRegistry}} and {{PlaceholderStreamStateHandle}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11939) Adjust components that need be modified to support FSCSOS

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11939:
--
Summary: Adjust components that need be modified to support FSCSOS  (was: 
Adjust components that need be modified for support FSCSOS)

> Adjust components that need be modified to support FSCSOS
> -
>
> Key: FLINK-11939
> URL: https://issues.apache.org/jira/browse/FLINK-11939
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> In this sub-task, we'll adjust all the components that need to be modified, 
> such as {{SharedStateRegistry}} and {{PlaceholderStreamStateHandle}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11938) Introduce all the new components to support FSCSOS

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11938:
--
Summary: Introduce all the new components to support FSCSOS  (was: 
Introduce all the new components)

> Introduce all the new components to support FSCSOS
> --
>
> Key: FLINK-11938
> URL: https://issues.apache.org/jira/browse/FLINK-11938
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> In this sub-task, we'll introduce all the new components, such as 
> {{FileSegmentCheckpointStreamFactory}}, 
> {{FileSegmentCheckpointStateOutputStream}}, 
> {{FileSegmentCheckpointLocationStorage}}, {{FileSegmentStateHandle}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11939) Adjust components that need be modified for support FS

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11939:
--
Summary: Adjust components that need be modified for support FS  (was: 
Adjust components that need be modified)

> Adjust components that need be modified for support FS
> --
>
> Key: FLINK-11939
> URL: https://issues.apache.org/jira/browse/FLINK-11939
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Checkpointing
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> In this sub-task, we'll adjust all the components that need to be modified, 
> such as {{SharedStateRegistry}} and {{PlaceholderStreamStateHandle}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11941) Reuse single FileSegmentCheckpointStateOutputStream for multiple checkpoints

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11941:
-

 Summary: Reuse single FileSegmentCheckpointStateOutputStream for 
multiple checkpoints
 Key: FLINK-11941
 URL: https://issues.apache.org/jira/browse/FLINK-11941
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Once the previous sub-tasks have been resolved, the small-file problem will be 
solved by writing all states of one checkpoint into a single physical file. In 
this sub-task, we'll try to share the underlying physical file across multiple 
checkpoints.

This is configurable and disabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11940) Reduce storage amplification for FSCSOS

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11940:
-

 Summary: Reduce storage amplification for FSCSOS
 Key: FLINK-11940
 URL: https://issues.apache.org/jira/browse/FLINK-11940
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


In this sub-task, we'll solve the storage amplification problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11939) Adjust components that need be modified

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11939:
-

 Summary: Adjust components that need be modified
 Key: FLINK-11939
 URL: https://issues.apache.org/jira/browse/FLINK-11939
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


In this sub-task, we'll adjust all the components that need to be modified, 
such as {{SharedStateRegistry}} and {{PlaceholderStreamStateHandle}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11938) Introduce all the new components

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11938:
-

 Summary: Introduce all the new components
 Key: FLINK-11938
 URL: https://issues.apache.org/jira/browse/FLINK-11938
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


In this sub-task, we'll introduce all the new components, such as 
{{FileSegmentCheckpointStreamFactory}}, 
{{FileSegmentCheckpointStateOutputStream}}, 
{{FileSegmentCheckpointLocationStorage}}, {{FileSegmentStateHandle}}. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11937) Resolve small file problem in RocksDB incremental checkpoint

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11937:
--
Description: 
Currently, when incremental checkpointing is enabled in RocksDBStateBackend, a 
separate file will be generated on DFS for each sst file. This may cause a 
“file flood” when running intensive workloads (many jobs with high parallelism) 
in a big cluster. According to our observation in Alibaba production, such a 
file flood introduces at least two drawbacks when using HDFS as the checkpoint 
storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may 
burst its response queue; 2) a huge number of files puts big pressure on the 
NN’s on-heap memory.

In Flink we noticed a similar small-file flood problem before and tried to 
resolve it by introducing ByteStreamStateHandle (FLINK-2808), but this solution 
has a limitation: if we configure the threshold too low there will still be too 
many small files, while if it is too high the JM will eventually OOM. It 
therefore can hardly resolve the issue when using RocksDBStateBackend with the 
incremental snapshot strategy.

We propose a new OutputStream called 
FileSegmentCheckpointStateOutputStream (FSCSOS) to fix the problem. FSCSOS will 
reuse the same underlying distributed file until its size exceeds a preset 
threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, 
secondly resolve the specific storage amplification issue of FSCSOS, and lastly 
add an option to reuse FSCSOS across multiple checkpoints to further reduce the 
DFS file number.

For more details, please refer to the attached design doc.
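
To make the segment idea concrete, here is a conceptual sketch only (all names 
are illustrative and not taken from the design doc): each state blob becomes a 
(file, offset, length) handle into a shared file, instead of owning a 
standalone DFS file of its own.
{code:java}
import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: a handle that points at a segment of a shared file.
final class FileSegmentHandle {
    final String filePath;
    final long offset;
    final long length;

    FileSegmentHandle(String filePath, long offset, long length) {
        this.filePath = filePath;
        this.offset = offset;
        this.length = length;
    }
}

final class SegmentWriter {
    private final String filePath;
    private final OutputStream out;       // stream to the shared DFS file
    private final long rollOverThreshold; // roll to a new file beyond this size
    private long position;

    SegmentWriter(String filePath, OutputStream out, long rollOverThreshold) {
        this.filePath = filePath;
        this.out = out;
        this.rollOverThreshold = rollOverThreshold;
    }

    /** Appends one state blob and returns a handle to its segment. */
    FileSegmentHandle write(byte[] state) throws IOException {
        long start = position;
        out.write(state);
        position += state.length;
        // A real implementation would roll to a fresh file once
        // position > rollOverThreshold and flush on checkpoint completion.
        return new FileSegmentHandle(filePath, start, state.length);
    }
}
{code}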

  was:
Currently, when incremental checkpointing is enabled in RocksDBStateBackend, a 
separate file will be generated on DFS for each sst file. This may cause a 
“file flood” when running intensive workloads (many jobs with high parallelism) 
in a big cluster. According to our observation in Alibaba production, such a 
file flood introduces at least two drawbacks when using HDFS as the checkpoint 
storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may 
burst its response queue; 2) a huge number of files puts big pressure on the 
NN’s on-heap memory.

In Flink we noticed a similar small-file flood problem before and tried to 
resolve it by introducing ByteStreamStateHandle (FLINK-2818), but this solution 
has a limitation: if we configure the threshold too low there will still be too 
many small files, while if it is too high the JM will eventually OOM. It 
therefore can hardly resolve the issue when using RocksDBStateBackend with the 
incremental snapshot strategy.

We propose a new OutputStream called 
FileSegmentCheckpointStateOutputStream (FSCSOS) to fix the problem. FSCSOS will 
reuse the same underlying distributed file until its size exceeds a preset 
threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, 
secondly resolve the specific storage amplification issue of FSCSOS, and lastly 
add an option to reuse FSCSOS across multiple checkpoints to further reduce the 
DFS file number.

For more details, please refer to the attached design doc.

[1] 
[https://www.slideshare.net/dataArtisans/stephan-ewen-experiences-running-flink-at-very-large-scale]
 


> Resolve small file problem in RocksDB incremental checkpoint
> 
>
> Key: FLINK-11937
> URL: https://issues.apache.org/jira/browse/FLINK-11937
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Checkpointing
>Affects Versions: 1.7.2
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently, when incremental checkpointing is enabled in RocksDBStateBackend, 
> a separate file will be generated on DFS for each sst file. This may cause a 
> “file flood” when running intensive workloads (many jobs with high 
> parallelism) in a big cluster. According to our observation in Alibaba 
> production, such a file flood introduces at least two drawbacks when using 
> HDFS as the checkpoint storage FileSystem: 1) a huge number of RPC requests 
> issued to the NN, which may burst its response queue; 2) a huge number of 
> files puts big pressure on the NN’s on-heap memory.
> In Flink we noticed a similar small-file flood problem before and tried to 
> resolve it by introducing ByteStreamStateHandle (FLINK-2808), but this 
> solution has a limitation: if we configure the threshold too low there will 
> still be too many small files, while if it is too high the JM will eventually 
> OOM. It therefore can hardly resolve the issue when using 
> RocksDBStateBackend with the incremental snapshot strategy.
> We propose a new OutputStream called 
> FileSegmentCheckpointStateOutputStream (FSCSOS) to fix the problem. FSCSOS 
> will reuse the same underlying distributed file until its size exceeds a 
> preset threshold.

[jira] [Created] (FLINK-11937) Resolve small file problem in RocksDB incremental checkpoint

2019-03-16 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11937:
-

 Summary: Resolve small file problem in RocksDB incremental 
checkpoint
 Key: FLINK-11937
 URL: https://issues.apache.org/jira/browse/FLINK-11937
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Checkpointing
Affects Versions: 1.7.2
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Currently, when incremental checkpointing is enabled in RocksDBStateBackend, a 
separate file will be generated on DFS for each sst file. This may cause a 
“file flood” when running intensive workloads (many jobs with high parallelism) 
in a big cluster. According to our observation in Alibaba production, such a 
file flood introduces at least two drawbacks when using HDFS as the checkpoint 
storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may 
burst its response queue; 2) a huge number of files puts big pressure on the 
NN’s on-heap memory.

In Flink we noticed a similar small-file flood problem before and tried to 
resolve it by introducing ByteStreamStateHandle (FLINK-2818), but this solution 
has a limitation: if we configure the threshold too low there will still be too 
many small files, while if it is too high the JM will eventually OOM. It 
therefore can hardly resolve the issue when using RocksDBStateBackend with the 
incremental snapshot strategy.

We propose a new OutputStream called 
FileSegmentCheckpointStateOutputStream (FSCSOS) to fix the problem. FSCSOS will 
reuse the same underlying distributed file until its size exceeds a preset 
threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, 
secondly resolve the specific storage amplification issue of FSCSOS, and lastly 
add an option to reuse FSCSOS across multiple checkpoints to further reduce the 
DFS file number.

For more details, please refer to the attached design doc.

[1] 
[https://www.slideshare.net/dataArtisans/stephan-ewen-experiences-running-flink-at-very-large-scale]
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11904) Improve MemoryStateBackendTest by using JUnit's Parameterized

2019-03-13 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11904:
-

 Summary: Improve MemoryStateBackendTest by using JUnit's 
Parameterized
 Key: FLINK-11904
 URL: https://issues.apache.org/jira/browse/FLINK-11904
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Currently, there are two classes, {{MemoryStateBackendTest}} and 
{{AsyncMemoryStateBackendTest}}; the only difference is whether async mode is 
used.

We can improve this by using JUnit's Parameterized, as sketched below.
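
A minimal sketch of the JUnit 4 {{Parameterized}} pattern (the test class and 
body are illustrative, not the actual backend tests): every {{@Test}} runs once 
per parameter row, so the async/sync split no longer needs a subclass.
{code:java}
import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class StateBackendModeTest {

    // One row per mode; each @Test below executes once per row.
    @Parameterized.Parameters(name = "async = {0}")
    public static Collection<Object[]> asyncModes() {
        return Arrays.asList(new Object[][] {{true}, {false}});
    }

    private final boolean useAsyncMode;

    public StateBackendModeTest(boolean useAsyncMode) {
        this.useAsyncMode = useAsyncMode;
    }

    @Test
    public void runsOnceForEachMode() {
        // A real test would create the state backend with useAsyncMode here.
        assertTrue(useAsyncMode || !useAsyncMode);
    }
}
{code}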



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11903) Improve FileStateBackendTest by using JUnit's Parameterized

2019-03-13 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11903:
-

 Summary: Improve FileStateBackendTest by using JUnit's 
Parameterized
 Key: FLINK-11903
 URL: https://issues.apache.org/jira/browse/FLINK-11903
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Currently, there is a test base class called {{FileStateBackendTest}} and a 
subclass {{AsyncFileStateBackendTest}}; the only difference is whether to use 
async mode.

We can improve the test code by using JUnit's Parameterized, following the same 
pattern sketched above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis

2019-03-07 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11850:
-

Assignee: Congxian Qiu(klion26)  (was: Till Rohrmann)

> ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
> 
>
> Key: FLINK-11850
> URL: https://issues.apache.org/jira/browse/FLINK-11850
> Project: Flink
>  Issue Type: Test
>  Components: Runtime / Coordination, Tests
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Critical
>  Labels: test-stability
>
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest
> 08:20:01.694 [ERROR] 
> testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest)
>  Time elapsed: 0.076 s <<< ERROR!
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203)
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128)
>  
> Travis links: https://travis-ci.org/apache/flink/jobs/502960186



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis

2019-03-07 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786680#comment-16786680
 ] 

Congxian Qiu(klion26) commented on FLINK-11850:
---

Sorry for the reassign, I just hit the wrong hotkey.

> ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
> 
>
> Key: FLINK-11850
> URL: https://issues.apache.org/jira/browse/FLINK-11850
> Project: Flink
>  Issue Type: Test
>  Components: Runtime / Coordination, Tests
>Reporter: Congxian Qiu(klion26)
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest
> 08:20:01.694 [ERROR] 
> testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest)
>  Time elapsed: 0.076 s <<< ERROR!
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203)
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128)
>  
> Travis links: https://travis-ci.org/apache/flink/jobs/502960186



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis

2019-03-07 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11850:
-

Assignee: Till Rohrmann  (was: Congxian Qiu(klion26))

> ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
> 
>
> Key: FLINK-11850
> URL: https://issues.apache.org/jira/browse/FLINK-11850
> Project: Flink
>  Issue Type: Test
>  Components: Runtime / Coordination, Tests
>Reporter: Congxian Qiu(klion26)
>Assignee: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest
> 08:20:01.694 [ERROR] 
> testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest)
>  Time elapsed: 0.076 s <<< ERROR!
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203)
>  at 
> org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128)
>  
> Travis links: https://travis-ci.org/apache/flink/jobs/502960186



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis

2019-03-07 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11850:
-

 Summary: ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData 
fails on Travis
 Key: FLINK-11850
 URL: https://issues.apache.org/jira/browse/FLINK-11850
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Congxian Qiu(klion26)


org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest
08:20:01.694 [ERROR] 
testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest)
 Time elapsed: 0.076 s <<< ERROR!
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00
 at 
org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203)
 at 
org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128)
 
Travis links: https://travis-ci.org/apache/flink/jobs/502960186



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11825) Resolve name clash of StateTTL TimeCharacteristic class

2019-03-06 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785553#comment-16785553
 ] 

Congxian Qiu(klion26) commented on FLINK-11825:
---

Thank you [~azagrebin], I'll provide a patch soon, as you suggested.

> Resolve name clash of StateTTL TimeCharacteristic class
> ---
>
> Key: FLINK-11825
> URL: https://issues.apache.org/jira/browse/FLINK-11825
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.7.2
>Reporter: Fabian Hueske
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> The StateTTL feature introduced the class 
> {{org.apache.flink.api.common.state.TimeCharacteristic}}, which clashes with 
> {{org.apache.flink.streaming.api.TimeCharacteristic}}. 
> This is a problem for two reasons:
> 1. Users get confused because they mistakenly import 
> {{org.apache.flink.api.common.state.TimeCharacteristic}}.
> 2. When using the StateTTL feature, users need to spell out the package name 
> for {{org.apache.flink.api.common.state.TimeCharacteristic}} because the 
> other class is most likely already imported.
> Since {{org.apache.flink.streaming.api.TimeCharacteristic}} is one of the 
> most used classes of the DataStream API, we should make sure that users can 
> use it without import problems.
> These errors are hard to spot and confusing for many users. 
> I see two ways to resolve the issue:
> 1. drop {{org.apache.flink.api.common.state.TimeCharacteristic}} and use 
> {{org.apache.flink.streaming.api.TimeCharacteristic}}, throwing an exception 
> if an incorrect characteristic is used.
> 2. rename the class {{org.apache.flink.api.common.state.TimeCharacteristic}} 
> to some other name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11825) Resolve name clash of StateTTL TimeCharacteristic class

2019-03-05 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785282#comment-16785282
 ] 

Congxian Qiu(klion26) commented on FLINK-11825:
---

[~fhueske] If you do not mind, I could help implement this.

I think the first solution is better because we can maintain TimeCharacteristic 
in one place, and we may support other TimeCharacteristics for TTL state in the 
future.
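
For reference, a small sketch of the clash against the 1.7-era API: once the 
DataStream enum is imported, the StateTTL enum can only be reached through its 
fully qualified name.
{code:java}
// The DataStream API enum, almost always already imported:
import org.apache.flink.streaming.api.TimeCharacteristic;

public class TimeCharacteristicClash {
    public static void main(String[] args) {
        TimeCharacteristic streamTime = TimeCharacteristic.ProcessingTime;
        // The StateTTL enum clashes with the import above, so the fully
        // qualified name must be spelled out:
        org.apache.flink.api.common.state.TimeCharacteristic ttlTime =
            org.apache.flink.api.common.state.TimeCharacteristic.ProcessingTime;
        System.out.println(streamTime + " / " + ttlTime);
    }
}
{code}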

> Resolve name clash of StateTTL TimeCharacteristic class
> ---
>
> Key: FLINK-11825
> URL: https://issues.apache.org/jira/browse/FLINK-11825
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Affects Versions: 1.7.2
>Reporter: Fabian Hueske
>Priority: Major
>
> The StateTTL feature introduced the class 
> {{org.apache.flink.api.common.state.TimeCharacteristic}}, which clashes with 
> {{org.apache.flink.streaming.api.TimeCharacteristic}}. 
> This is a problem for two reasons:
> 1. Users get confused because they mistakenly import 
> {{org.apache.flink.api.common.state.TimeCharacteristic}}.
> 2. When using the StateTTL feature, users need to spell out the package name 
> for {{org.apache.flink.api.common.state.TimeCharacteristic}} because the 
> other class is most likely already imported.
> Since {{org.apache.flink.streaming.api.TimeCharacteristic}} is one of the 
> most used classes of the DataStream API, we should make sure that users can 
> use it without import problems.
> These errors are hard to spot and confusing for many users. 
> I see two ways to resolve the issue:
> 1. drop {{org.apache.flink.api.common.state.TimeCharacteristic}} and use 
> {{org.apache.flink.streaming.api.TimeCharacteristic}}, throwing an exception 
> if an incorrect characteristic is used.
> 2. rename the class {{org.apache.flink.api.common.state.TimeCharacteristic}} 
> to some other name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11797) Implement empty unit tests in CheckpointCoordinatorMasterHooksTest

2019-03-02 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11797:
--
Description: 
Currently, there are some empty tests (no implementation) such as 
{{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
{{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
{{testTypeIncompatibleWithSerializerOnStore}} and 
{{testTypeIncompatibleWithHookOnRestore}} in class 
{{CheckpointCoordinatorMasterHooksTest}} [1].

If implementing them makes sense, I'll provide a patch.

[1] 
[https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]

  was:
Currently, there are some empty tests (no implementation) such as 
{{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
{{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
{{testTypeIncompatibleWithSerializerOnStore}} and 
{{testTypeIncompatibleWithHookOnRestore}} in class 
{{CheckpointCoordinatorMasterHooksTest}} [1].

If the implementation of them makes sense, I'll provide a patch.

[1] 
[https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]


> Implement empty unit tests in CheckpointCoordinatorMasterHooksTest
> --
>
> Key: FLINK-11797
> URL: https://issues.apache.org/jira/browse/FLINK-11797
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently, there are some empty tests (no implementation) such as 
> {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
> {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
> {{testTypeIncompatibleWithSerializerOnStore}} and 
> {{testTypeIncompatibleWithHookOnRestore}} in class 
> {{CheckpointCoordinatorMasterHooksTest}} [1].
> If implementing them makes sense, I'll provide a patch.
> [1] 
> [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11797) Implement empty unit tests in CheckpointCoordinatorMasterHooksTest

2019-03-02 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11797:
--
Summary: Implement empty unit tests in CheckpointCoordinatorMasterHooksTest 
 (was: Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest)

> Implement empty unit tests in CheckpointCoordinatorMasterHooksTest
> --
>
> Key: FLINK-11797
> URL: https://issues.apache.org/jira/browse/FLINK-11797
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently, there are some empty tests (no implementation) such as 
> {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
> {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
> {{testTypeIncompatibleWithSerializerOnStore}} and 
> {{testTypeIncompatibleWithHookOnRestore}} in class 
> {{CheckpointCoordinatorMasterHooksTest}} [1].
> If the implementation of them makes sense, I'll provide a patch.
> [1] 
> [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest

2019-03-02 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11797:
--
Description: 
Currently, there are some empty tests (no implementation) such as 
{{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
{{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
{{testTypeIncompatibleWithSerializerOnStore}} and 
{{testTypeIncompatibleWithHookOnRestore}} in class 
{{CheckpointCoordinatorMasterHooksTest}} [1].

If the implementation of them makes sense, I'll provide a patch.

[1] 
[https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]

  was:
Currently, there are some empty tests (no implementation) such as 
{{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
{{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
{{testTypeIncompatibleWithSerializerOnStore}} and 
{{testTypeIncompatibleWithHookOnRestore}} in class 
{{CheckpointCoordinatorMasterHooksTest}} [1].

If the implementation of them makes sense, I'll provide a patch.

[1] 
https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393


> Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest
> ---
>
> Key: FLINK-11797
> URL: https://issues.apache.org/jira/browse/FLINK-11797
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently, there are some empty tests (no implementation) such as 
> {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
> {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
> {{testTypeIncompatibleWithSerializerOnStore}} and 
> {{testTypeIncompatibleWithHookOnRestore}} in class 
> {{CheckpointCoordinatorMasterHooksTest}} [1].
> If the implementation of them makes sense, I'll provide a patch.
> [1] 
> [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest

2019-03-02 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) updated FLINK-11797:
--
Affects Version/s: 1.8.0

> Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest
> ---
>
> Key: FLINK-11797
> URL: https://issues.apache.org/jira/browse/FLINK-11797
> Project: Flink
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 1.8.0
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>
> Currently, there are some empty tests (no implementation) such as 
> {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
> {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
> {{testTypeIncompatibleWithSerializerOnStore}} and 
> {{testTypeIncompatibleWithHookOnRestore}} in class 
> {{CheckpointCoordinatorMasterHooksTest}} [1].
> If the implementation of them makes sense, I'll provide a patch.
> [1] 
> https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest

2019-03-02 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11797:
-

 Summary: Implementation empty unit tests in 
CheckpointCoordinatorMasterHooksTest
 Key: FLINK-11797
 URL: https://issues.apache.org/jira/browse/FLINK-11797
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


Currently, there are some empty tests (no implementation) such as 
{{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, 
{{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, 
{{testTypeIncompatibleWithSerializerOnStore}} and 
{{testTypeIncompatibleWithHookOnRestore}} in class 
{{CheckpointCoordinatorMasterHooksTest}} [1].

If the implementation of them makes sense, I'll provide a patch.

[1] 
https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-11774) IllegalArgumentException in HeapPriorityQueueSet

2019-02-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780426#comment-16780426
 ] 

Congxian Qiu(klion26) edited comment on FLINK-11774 at 2/28/19 11:46 AM:
-

Running the given code on the master branch also reproduces the problem sometimes.


was (Author: klion26):
Running the given code on the master branch also reproduces the problem; I'll 
start investigating this.

> IllegalArgumentException in HeapPriorityQueueSet
> 
>
> Key: FLINK-11774
> URL: https://issues.apache.org/jira/browse/FLINK-11774
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.7.2
> Environment: Can reproduce on the following configurations:
>  
> OS: macOS 10.14.3
> Java: 1.8.0_202
>  
> OS: CentOS 7.2.1511
> Java: 1.8.0_102
>Reporter: Kirill Vainer
>Priority: Major
> Attachments: flink-bug-dist.zip, flink-bug-src.zip
>
>
> Hi,
> I encountered the following exception:
> {code}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> at 
> org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647)
> at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510)
> at flink.bug.App.main(App.java:21)
> Caused by: java.lang.IllegalArgumentException
> at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.globalKeyGroupToLocalIndex(HeapPriorityQueueSet.java:158)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForKeyGroup(HeapPriorityQueueSet.java:147)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForElement(HeapPriorityQueueSet.java:154)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:121)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:49)
> at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.registerProcessingTimeTimer(InternalTimerServiceImpl.java:197)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.registerProcessingTimeTimer(WindowOperator.java:876)
> at 
> org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:36)
> at 
> org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:28)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onElement(WindowOperator.java:895)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:396)
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Code that reproduces the problem:
> {code:java}
> package flink.bug;
>
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.streaming.api.functions.sink.SinkFunction;
> import org.apache.flink.streaming.api.windowing.time.Time;
>
> public class App {
>     public static void main(String[] args) throws Exception {
>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>         env.setParallelism(2);
>         env.fromElements(1, 2)
>             .map(Aggregate::new)
>             .keyBy(Aggregate::getKey)
>             .timeWindow(Time.seconds(2))
>             .reduce(Aggregate::reduce)
>             .addSink(new CollectSink());
>         env.execute();
>     }
>
>     private static class Aggregate {
>         private Key key = new Key();
>
>         public Aggregate(long number) {
>         }
>
>         public static Aggregate reduce(Aggregate a, Aggregate b) {
>             return new Aggregate(0);
>         }
>
>         public Key getKey() {
>             return key;
>         }
>     }
>
>     public static class Key {
>     }
>
>     private static class CollectSink implements SinkFunction {

[jira] [Commented] (FLINK-11774) IllegalArgumentException in HeapPriorityQueueSet

2019-02-28 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780426#comment-16780426
 ] 

Congxian Qiu(klion26) commented on FLINK-11774:
---

Running the given code on the master branch also reproduces the problem; I'll 
start to investigate this.

> IllegalArgumentException in HeapPriorityQueueSet
> 
>
> Key: FLINK-11774
> URL: https://issues.apache.org/jira/browse/FLINK-11774
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.7.2
> Environment: Can reproduce on the following configurations:
>  
> OS: macOS 10.14.3
> Java: 1.8.0_202
>  
> OS: CentOS 7.2.1511
> Java: 1.8.0_102
>Reporter: Kirill Vainer
>Priority: Major
> Attachments: flink-bug-dist.zip, flink-bug-src.zip
>
>
> Hi,
> I encountered the following exception:
> {code}
> Exception in thread "main" 
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> at 
> org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> at 
> org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647)
> at 
> org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123)
> at 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510)
> at flink.bug.App.main(App.java:21)
> Caused by: java.lang.IllegalArgumentException
> at 
> org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.globalKeyGroupToLocalIndex(HeapPriorityQueueSet.java:158)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForKeyGroup(HeapPriorityQueueSet.java:147)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForElement(HeapPriorityQueueSet.java:154)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:121)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:49)
> at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.registerProcessingTimeTimer(InternalTimerServiceImpl.java:197)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.registerProcessingTimeTimer(WindowOperator.java:876)
> at 
> org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:36)
> at 
> org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:28)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onElement(WindowOperator.java:895)
> at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:396)
> at 
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
> at 
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Code that reproduces the problem:
> {code:java}
> package flink.bug;
>
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.streaming.api.functions.sink.SinkFunction;
> import org.apache.flink.streaming.api.windowing.time.Time;
>
> public class App {
>     public static void main(String[] args) throws Exception {
>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>         env.setParallelism(2);
>         env.fromElements(1, 2)
>             .map(Aggregate::new)
>             .keyBy(Aggregate::getKey)
>             .timeWindow(Time.seconds(2))
>             .reduce(Aggregate::reduce)
>             .addSink(new CollectSink());
>         env.execute();
>     }
>
>     private static class Aggregate {
>         private Key key = new Key();
>
>         public Aggregate(long number) {
>         }
>
>         public static Aggregate reduce(Aggregate a, Aggregate b) {
>             return new Aggregate(0);
>         }
>
>         public Key getKey() {
>             return key;
>         }
>     }
>
>     public static class Key {
>     }
>
>     private static class CollectSink implements SinkFunction {
>         private static final long serialVersionUID = 1;
>
>         @SuppressWarnings("rawtypes")
>         @Override
>         public void invoke(Object value) {
>             // sink body truncated in the archive; an empty body suffices for the reproducer
>         }
>     }
> }
> {code}
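
Worth noting when reading the reproducer: {{Key}} overrides neither 
{{hashCode()}} nor {{equals()}}. One plausible explanation, assuming the usual 
keyBy contract, is that the identity hash code changes when records are 
serialized across the keyBy shuffle and deserialized on the receiving subtask, 
so the receiver computes a key group it does not own. A hedged sketch of the 
conventional fix (the {{id}} field is illustrative):

{code:java}
import java.io.Serializable;
import java.util.Objects;

// Hypothetical replacement for the reproducer's Key class: keys used with
// keyBy need a deterministic hashCode/equals that survives serialization
// round-trips, otherwise key-group assignment can disagree between the
// sending and the receiving subtask.
public class Key implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String id;

    public Key(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && Objects.equals(((Key) o).id, id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id); // stable across JVMs and serialization
    }
}
{code}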

[jira] [Assigned] (FLINK-11141) Key generation for RocksDBMapState can theoretically be ambiguous

2019-02-27 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) reassigned FLINK-11141:
-

Assignee: (was: Congxian Qiu(klion26))

> Key generation for RocksDBMapState can theoretically be ambiguous
> -
>
> Key: FLINK-11141
> URL: https://issues.apache.org/jira/browse/FLINK-11141
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.5.5, 1.6.2, 1.7.0
>Reporter: Stefan Richter
>Priority: Critical
>
> {{RocksDBMapState}} stores values in RocksDB under a composite key built 
> from the serialized bytes of {{key-group-id|key|namespace|user-key}}. In 
> this composition, key, namespace, and user-key can each have a fixed-size 
> or a variable-size serialization format. With at least two variable-size 
> formats, the composition can be ambiguous, because the same composite bytes 
> can be split in more than one way, e.g.:
> abcd <-> efg
> abc <-> defg
> Our code takes care of this for all other states, where composite keys only 
> consist of key and namespace, by checking for two variable-size formats and 
> appending the serialized length to each byte sequence.
> However, for map state the user-key is included neither in the check for 
> potential ambiguity nor in the appending of the size. This means that, in 
> theory, some combinations can produce colliding composite keys in RocksDB. 
> What is required is to include the user-key serializer in the check and 
> append the length there as well.
> Please note that this cannot simply be changed, because it has implications 
> for backwards compatibility and requires some form of migration for the 
> state keys on restore.
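
To make the ambiguity and the length-appending fix concrete, a small sketch 
(simplified; the real key layout and serializers live in the RocksDB state 
backend, and the helper below is illustrative):

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class CompositeKeySketch {

    // Naive composition: "abcd"+"efg" and "abc"+"defg" collide.
    static byte[] composeAmbiguous(byte[] a, byte[] b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(a);
        bytes.write(b);
        return bytes.toByteArray();
    }

    // Appending each variable-size part's length restores uniqueness:
    // the boundary between the parts can be recovered from the suffix.
    static byte[] composeWithLengths(byte[] a, byte[] b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(a);
        out.writeInt(a.length);
        out.write(b);
        out.writeInt(b.length);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] k1 = composeAmbiguous(bytes("abcd"), bytes("efg"));
        byte[] k2 = composeAmbiguous(bytes("abc"), bytes("defg"));
        System.out.println(java.util.Arrays.equals(k1, k2)); // true: collision

        byte[] k3 = composeWithLengths(bytes("abcd"), bytes("efg"));
        byte[] k4 = composeWithLengths(bytes("abc"), bytes("defg"));
        System.out.println(java.util.Arrays.equals(k3, k4)); // false: disambiguated
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
{code}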



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-11483) Improve StreamOperatorSnapshotRestoreTest with Parameterized

2019-02-26 Thread Congxian Qiu(klion26) (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Congxian Qiu(klion26) resolved FLINK-11483.
---
Resolution: Fixed

> Improve StreamOperatorSnapshotRestoreTest with Parameterized
> 
>
> Key: FLINK-11483
> URL: https://issues.apache.org/jira/browse/FLINK-11483
> Project: Flink
>  Issue Type: Test
>  Components: State Backends, Checkpointing, Tests
>Reporter: Congxian Qiu(klion26)
>Assignee: Congxian Qiu(klion26)
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the current implementation, we test {{StreamOperatorSnapshot}} with 
> three state backends, {{File}}, {{RocksDB_FULL}} and 
> {{RocksDB_Incremental}}, each in a separate class; we could improve this 
> with Parameterized.
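
For reference, a minimal sketch (illustrative names, not the merged test code) 
of how JUnit 4's Parameterized runner collapses the three per-backend classes 
into one:

{code:java}
import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class StreamOperatorSnapshotRestoreParameterizedTest {

    enum StateBackendType { FILE, ROCKSDB_FULL, ROCKSDB_INCREMENTAL }

    @Parameters(name = "backend = {0}")
    public static Collection<StateBackendType> backends() {
        return Arrays.asList(StateBackendType.values());
    }

    private final StateBackendType backend;

    public StreamOperatorSnapshotRestoreParameterizedTest(StateBackendType backend) {
        this.backend = backend;
    }

    @Test
    public void testSnapshotAndRestore() throws Exception {
        // build the operator with the state backend selected by `backend`,
        // take a snapshot, restore from it, and assert on the restored state
    }
}
{code}

Each test method then runs once per backend, and the backend appears in the 
test name, which keeps failure reports as readable as the separate classes.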



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11738) Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher15e85f5d-a55d-4773-8197-f0db5658f55b#1335897563]] after [10000 ms]. Se

2019-02-24 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776596#comment-16776596
 ] 

Congxian Qiu(klion26) commented on FLINK-11738:
---

Hi [~thinktothings], I think FLINK-11690 has solved this problem; you can get 
the latest code from the master branch. If the problem has been resolved, 
could you please close this issue?

> Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher15e85f5d-a55d-4773-8197-f0db5658f55b#1335897563]]
>  after [10000 ms]. Sender[null] sent
> --
>
> Key: FLINK-11738
> URL: https://issues.apache.org/jira/browse/FLINK-11738
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.7.2
> Environment: flink 1.7.2 client
> !image-2019-02-25-10-57-20-106.png!
> !image-2019-02-25-10-57-32-876.png!
> !image-2019-02-25-10-57-39-753.png!
>Reporter: thinktothings
>Priority: Major
> Attachments: image-2019-02-25-13-11-13-723.png
>
>
> The akka.ask.timeout of 10 seconds is hard-coded for this MiniCluster 
> environment. Can it not be changed?
> -
> org.apache.flink.runtime.minicluster.MiniCluster
> {code:java}
> /**
>  * Creates a new Flink mini cluster based on the given configuration.
>  *
>  * @param miniClusterConfiguration The configuration for the mini cluster
>  */
> public MiniCluster(MiniClusterConfiguration miniClusterConfiguration) {
>     this.miniClusterConfiguration = checkNotNull(miniClusterConfiguration, "config may not be null");
>     this.rpcTimeout = Time.seconds(10L);
>     this.terminationFuture = CompletableFuture.completedFuture(null);
>     running = false;
> }
> {code}
> -
>   !image-2019-02-25-13-11-13-723.png!
>  
>  
> -
>  
> {code}
> package com.opensourceteams.module.bigdata.flink.example.stream.worldcount.nc
>
> import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
> import org.apache.flink.streaming.api.windowing.time.Time
>
> /**
>  * Feed input data with: nc -lk 1234
>  */
> object SocketWindowWordCount {
>
>   def main(args: Array[String]): Unit = {
>
>     val port = 1234
>     // get the execution environment
>     val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
>
>     // get input data by connecting to the socket
>     val dataStream = env.socketTextStream("localhost", port, '\n')
>
>     import org.apache.flink.streaming.api.scala._
>     val textResult = dataStream.flatMap( w => w.split("\\s") ).map( w => WordWithCount(w, 1))
>       .keyBy("word")
>       /**
>        * Refresh every 5 seconds, which amounts to restarting the count.
>        * Benefit: there is no need to keep aggregating over all data seen
>        * so far; only the increment within the window is needed, which
>        * reduces the data volume.
>        */
>       .timeWindow(Time.seconds(5))
>       .sum("count")
>
>     textResult.print().setParallelism(1)
>
>     if (args == null || args.size == 0) {
>       env.execute("default job")
>     } else {
>       env.execute(args(0))
>     }
>
>     println("done")
>   }
>
>   // Data type for words with count
>   case class WordWithCount(word: String, count: Long)
> }
> {code}
>  
> -
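
For completeness, a hedged sketch of raising the timeout through the 
configuration. The {{akka.ask.timeout}} key and 
{{createLocalEnvironment(parallelism, configuration)}} exist in Flink, but 
whether the MiniCluster honors the value depends on running a version that 
includes FLINK-11690, per the comment above:

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalEnvWithLongerAskTimeout {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // raise the RPC ask timeout beyond the default 10 s
        config.setString("akka.ask.timeout", "60 s");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(2, config);

        env.fromElements(1, 2, 3).print();
        env.execute("ask-timeout sketch");
    }
}
{code}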



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable

2019-02-20 Thread Congxian Qiu(klion26) (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773755#comment-16773755
 ] 

Congxian Qiu(klion26) commented on FLINK-10481:
---

Another instance: https://api.travis-ci.org/v3/job/496339787/log.txt

> Wordcount end-to-end test in docker env unstable
> 
>
> Key: FLINK-10481
> URL: https://issues.apache.org/jira/browse/FLINK-10481
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.7.0
>Reporter: Till Rohrmann
>Assignee: Dawid Wysakowicz
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.6.3, 1.7.0
>
>
> The {{Wordcount end-to-end test in docker env}} fails sometimes on Travis 
> with the following problem:
> {code}
> Status: Downloaded newer image for java:8-jre-alpine
>  ---> fdc893b19a14
> Step 2/16 : RUN apk add --no-cache bash snappy
>  ---> [Warning] IPv4 forwarding is disabled. Networking will not work.
>  ---> Running in 4329ebcd8a77
> fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz
> WARNING: Ignoring 
> http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz: 
> temporary error (try again later)
> fetch 
> http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz
> WARNING: Ignoring 
> http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz: 
> temporary error (try again later)
> ERROR: unsatisfiable constraints:
>   bash (missing):
> required by: world[bash]
>   snappy (missing):
> required by: world[snappy]
> The command '/bin/sh -c apk add --no-cache bash snappy' returned a non-zero 
> code: 2
> {code}
> https://api.travis-ci.org/v3/job/434909395/log.txt
> It seems as if it is related to 
> https://github.com/gliderlabs/docker-alpine/issues/264 and 
> https://github.com/gliderlabs/docker-alpine/issues/279.
> We might want to switch to a different base image to avoid these problems in 
> the future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11704) Improve AbstractCheckpointStateOutputStreamTestBase with Parameterized

2019-02-20 Thread Congxian Qiu(klion26) (JIRA)
Congxian Qiu(klion26) created FLINK-11704:
-

 Summary: Improve AbstractCheckpointStateOutputStreamTestBase with 
Parameterized
 Key: FLINK-11704
 URL: https://issues.apache.org/jira/browse/FLINK-11704
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Reporter: Congxian Qiu(klion26)
Assignee: Congxian Qiu(klion26)


In the current version, there is an 
{{AbstractCheckpointStateOutputStreamTestBase}} and two implementation 
classes, {{FileBasedStateOutputStreamTest}} and 
{{FsCheckpointMetadataOutputStreamTest}}; the only differences are that they 
return a different {{FSDataOutputStream}} and the specified 
{{FileStateHandle}}. We can improve this by using JUnit's Parameterized.
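
As in FLINK-11483, a sketch (illustrative names, not the real test code) of 
folding the two subclasses into one Parameterized test, here with the stream 
factory itself as the parameter:

{code:java}
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.Callable;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class CheckpointStateOutputStreamParameterizedTest {

    @Parameters(name = "{0}")
    public static Collection<Object[]> streamFactories() {
        return Arrays.asList(new Object[][] {
            {"FileBasedStateOutputStream",
                (Callable<AutoCloseable>) () -> null /* new FileBasedStateOutputStream(...) */},
            {"FsCheckpointMetadataOutputStream",
                (Callable<AutoCloseable>) () -> null /* new FsCheckpointMetadataOutputStream(...) */},
        });
    }

    private final String name;
    private final Callable<AutoCloseable> streamFactory;

    public CheckpointStateOutputStreamParameterizedTest(String name, Callable<AutoCloseable> streamFactory) {
        this.name = name;
        this.streamFactory = streamFactory;
    }

    @Test
    public void testWriteAndClose() throws Exception {
        // create the stream via streamFactory.call(), write some bytes,
        // close it, then assert on the produced file / FileStateHandle
    }
}
{code}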



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

