[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint coordinator specific log messages
[ https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852658#comment-16852658 ] Congxian Qiu(klion26) commented on FLINK-12683: --- In my opinion, the JM log of the current version (1.8) is sufficient (we can already find the task location in the JM log), so I don't think we need this patch. > Provide task manager's location information for checkpoint coordinator > specific log messages > > > Key: FLINK-12683 > URL: https://issues.apache.org/jira/browse/FLINK-12683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, the {{AcknowledgeCheckpoint}} does not contain the task manager's > location information. When a task's snapshot task sends an ack message to the > coordinator, we can only log this message: > {code:java} > Received late message for now expired checkpoint attempt 6035 from > ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f. > {code} > Sometimes we need to get this sub task's location information to do the > further debug work, e.g. stack trace dump. But, without the location > information, It will not help to quickly locate the problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message
[ https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852572#comment-16852572 ] Congxian Qiu(klion26) commented on FLINK-12683: --- Please have a look at FLINK-1165.:) > Provide task manager's location information for checkpoint acknowledge > message to improve the checkpoint log message > > > Key: FLINK-12683 > URL: https://issues.apache.org/jira/browse/FLINK-12683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Currently, the {{AcknowledgeCheckpoint}} does not contain the task manager's > location information. When a task's snapshot task sends an ack message to the > coordinator, we can only log this message: > {code:java} > Received late message for now expired checkpoint attempt 6035 from > ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f. > {code} > Sometimes we need to get this sub task's location information to do the > further debug work, e.g. stack trace dump. But, without the location > information, It will not help to quickly locate the problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message
[ https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852572#comment-16852572 ] Congxian Qiu(klion26) edited comment on FLINK-12683 at 5/31/19 2:03 AM: Please have a look at FLINK-11165.:) was (Author: klion26): Please have a look at FLINK-1165.:) > Provide task manager's location information for checkpoint acknowledge > message to improve the checkpoint log message > > > Key: FLINK-12683 > URL: https://issues.apache.org/jira/browse/FLINK-12683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Currently, the {{AcknowledgeCheckpoint}} does not contain the task manager's > location information. When a task's snapshot task sends an ack message to the > coordinator, we can only log this message: > {code:java} > Received late message for now expired checkpoint attempt 6035 from > ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f. > {code} > Sometimes we need to get this sub task's location information to do the > further debug work, e.g. stack trace dump. But, without the location > information, It will not help to quickly locate the problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12683) Provide task manager's location information for checkpoint acknowledge message to improve the checkpoint log message
[ https://issues.apache.org/jira/browse/FLINK-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851694#comment-16851694 ] Congxian Qiu(klion26) commented on FLINK-12683: --- I think we can currently find the host and container ID by searching for the taskExecutionId in the JM log. > Provide task manager's location information for checkpoint acknowledge > message to improve the checkpoint log message > > > Key: FLINK-12683 > URL: https://issues.apache.org/jira/browse/FLINK-12683 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: vinoyang >Assignee: vinoyang >Priority: Major > > Currently, the {{AcknowledgeCheckpoint}} does not contain the task manager's > location information. When a task's snapshot task sends an ack message to the > coordinator, we can only log this message: > {code:java} > Received late message for now expired checkpoint attempt 6035 from > ccd88d08bf82245f3466c9480fb5687a of job 775ef8ff0159b071da7804925bbd362f. > {code} > Sometimes we need to get this sub task's location information to do the > further debug work, e.g. stack trace dump. But, without the location > information, It will not help to quickly locate the problem. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11947) Support MapState value schema evolution for RocksDB
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851503#comment-16851503 ] Congxian Qiu(klion26) commented on FLINK-11947: --- Hi [~cre...@gmail.com], I've filed a PR for this: [https://github.com/apache/flink/pull/8565] > Support MapState value schema evolution for RocksDB > --- > > Key: FLINK-11947 > URL: https://issues.apache.org/jira/browse/FLINK-11947 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Runtime / State Backends >Reporter: Tzu-Li (Gordon) Tai >Assignee: Congxian Qiu(klion26) >Priority: Critical > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, we do not attempt to perform state schema evolution if the key or > value's schema of a user {{MapState}} has changed when using {{RocksDB}}: > https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542 > This was disallowed in the initial support for state schema evolution because > the way we did state evolution in the RocksDB state backend was simply > overwriting values. > For {{MapState}} key evolution, only overwriting RocksDB values does not > work, since RocksDB entries for {{MapState}} uses a composite key containing > the map state key. This means that when evolving {{MapState}} in this case > with an evolved key schema, we will have new entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11947) Support MapState key / value schema evolution for RocksDB
[ https://issues.apache.org/jira/browse/FLINK-11947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850497#comment-16850497 ] Congxian Qiu(klion26) commented on FLINK-11947: --- [~cresny] I'm interested in this issue and would like to work on a fix. I'm glad [~tzulitai] will shepherd the effort. > Support MapState key / value schema evolution for RocksDB > - > > Key: FLINK-11947 > URL: https://issues.apache.org/jira/browse/FLINK-11947 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System, Runtime / State Backends >Reporter: Tzu-Li (Gordon) Tai >Priority: Critical > > Currently, we do not attempt to perform state schema evolution if the key or > value's schema of a user {{MapState}} has changed when using {{RocksDB}}: > https://github.com/apache/flink/blob/953a5ffcbdae4115f7d525f310723cf8770779df/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L542 > This was disallowed in the initial support for state schema evolution because > the way we did state evolution in the RocksDB state backend was simply > overwriting values. > For {{MapState}} key evolution, only overwriting RocksDB values does not > work, since RocksDB entries for {{MapState}} uses a composite key containing > the map state key. This means that when evolving {{MapState}} in this case > with an evolved key schema, we will have new entries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
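To illustrate why overwriting values alone cannot handle MapState key evolution, here is a toy simulation using plain Java maps and made-up serializer formats (this is illustrative only, not Flink or RocksDB code): each map entry lives under a composite key that embeds the serialized user key, so an evolved key format yields a different composite key and the pre-evolution entry survives as stale state.

```java
import java.util.TreeMap;

// Toy model of why value-overwriting is not enough for MapState key evolution.
// RocksDB stores one entry per map key, under a composite key that embeds the
// serialized user key. The serializer formats below are purely illustrative.
public class MapStateEvolutionSketch {
    // "Old" key serializer: one wire format for the user key.
    static String oldCompositeKey(String userKey) { return "ks|" + userKey; }

    // "Evolved" key serializer: a different wire format for the same user key.
    static String newCompositeKey(String userKey) { return "ks|v2|" + userKey; }

    public static void main(String[] args) {
        TreeMap<String, String> rocksDb = new TreeMap<>(); // stands in for a RocksDB column family
        rocksDb.put(oldCompositeKey("a"), "1"); // entry written before schema evolution

        // A naive migration that only writes values under the *new* composite key:
        rocksDb.put(newCompositeKey("a"), "1");

        // The old entry is still present: the store now holds two physical entries
        // for one logical map key, i.e. stale state instead of an in-place upgrade.
        System.out.println(rocksDb.size()); // 2, not 1
    }
}
```

This is why the evolved key schema produces "new entries" as the description says: the old composite keys are never rewritten or deleted.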
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850490#comment-16850490 ] Congxian Qiu(klion26) commented on FLINK-12296: --- Thanks for the reminder [~sunjincheng121] > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
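To make the failure mode concrete, here is a minimal, self-contained sketch of the quoted path logic (the class and method names are illustrative, not the actual Flink implementation): because the subtask directory is derived only from the job ID, vertex ID, subtask index, and checkpoint ID, two stateful operators chained into the same subtask resolve to the identical local directory, so one operator's state files can silently overwrite the other's.

```java
import java.nio.file.Paths;

// Simplified model of the LocalRecoveryDirectoryProviderImpl path logic quoted
// above; names are illustrative, not the real Flink classes.
public class LocalDirCollisionSketch {
    static String subtaskSpecificCheckpointDir(String jobId, String vertexId,
                                               int subtaskIndex, long checkpointId) {
        // No operator identifier appears in the path: every operator chained into
        // this subtask gets the same directory for the same checkpoint.
        return Paths.get("jid_" + jobId,
                         "vtx_" + vertexId + "_sti_" + subtaskIndex,
                         "chk_" + checkpointId).toString();
    }

    public static void main(String[] args) {
        // Two stateful operators chained into one task share jobId, vertexId, and
        // subtask index, so their local-state directories collide.
        String dirOpA = subtaskSpecificCheckpointDir("job1", "vtx1", 0, 1);
        String dirOpB = subtaskSpecificCheckpointDir("job1", "vtx1", 0, 1);
        System.out.println(dirOpA.equals(dirOpB)); // true: same directory for both operators
    }
}
```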
[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16850320#comment-16850320 ] Congxian Qiu(klion26) edited comment on FLINK-12296 at 5/29/19 2:42 AM: Since the commits have been merged in master, release-1.8, release-1.7 and release-1.6(find the commits' info in the previous comments), resolve this issue. was (Author: klion26): Since the commits have been merged in master, release-1.8, release-1.7 and release-1.6(find the commits' info in the last comments), resolve this issue. > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) resolved FLINK-12296. --- Resolution: Fixed Since the commits have been merged in master, release-1.8, release-1.7 and release-1.6(find the commits' info in the last comments), resolve this issue. > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint
[ https://issues.apache.org/jira/browse/FLINK-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849589#comment-16849589 ] Congxian Qiu(klion26) commented on FLINK-12619: --- [~aljoscha] thanks for your reply. For this issue, I plan to take the following steps (most of them will reuse the code of FLINK-11458): * add the required functions and RPCs in JM and TM: ** {{CheckpointCoordinator#triggerSynchronousCheckpoint}} (aligned with {{triggerSynchronousSavepoint}}) ** {{SchedulerNG#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}}) ** a new {{CheckpointType}} named {{SYNC_CHECKPOINT}} (aligned with {{SYNC_SAVEPOINT}}) ** {{JobMaster#stopWithCheckpoint}} (aligned with {{stopWithSavepoint}}) ** allow a sync checkpoint to be triggered (currently only a sync savepoint is supported) ** the needed tests * expose this in the CLI ** will add an option that takes no parameter (it will reuse the preconfigured checkpoint directory), much like {{CliFrontendParser.STOP_WITH_SAVEPOINT}} * add a REST API for this ** will add an endpoint, restful gateway, trigger handler, request body and so on (like FLINK-11458) What do you think? > Support TERMINATE/SUSPEND Job with Checkpoint > - > > Key: FLINK-12619 > URL: https://issues.apache.org/jira/browse/FLINK-12619 > Project: Flink > Issue Type: New Feature > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Inspired by the idea of FLINK-11458, we propose to support terminating/suspending > a job with a checkpoint. This improvement cooperates with the incremental and > external checkpoint features: if checkpoints are retained and this feature is > configured, we will trigger a checkpoint before the job stops. It could > accelerate job recovery a lot since: > 1. No source rewinding is required any more. > 2. It's much faster than taking a savepoint when incremental checkpointing is > enabled.
> Please note that conceptually savepoints are different from checkpoints, in a > similar way that backups are different from recovery logs in traditional > database systems. So we suggest using this feature only for job recovery, > while sticking with FLINK-11458 for the > upgrading/cross-cluster-job-migration/state-backend-switch cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12619) Support TERMINATE/SUSPEND Job with Checkpoint
Congxian Qiu(klion26) created FLINK-12619: - Summary: Support TERMINATE/SUSPEND Job with Checkpoint Key: FLINK-12619 URL: https://issues.apache.org/jira/browse/FLINK-12619 Project: Flink Issue Type: New Feature Components: Runtime / State Backends Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) Inspired by the idea of FLINK-11458, we propose to support terminating/suspending a job with a checkpoint. This improvement cooperates with the incremental and external checkpoint features: if checkpoints are retained and this feature is configured, we will trigger a checkpoint before the job stops. It could accelerate job recovery a lot since: 1. No source rewinding is required any more. 2. It's much faster than taking a savepoint when incremental checkpointing is enabled. Please note that conceptually savepoints are different from checkpoints, in a similar way that backups are different from recovery logs in traditional database systems. So we suggest using this feature only for job recovery, while sticking with FLINK-11458 for the upgrading/cross-cluster-job-migration/state-backend-switch cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12607) Introduce a REST API that returns the maxParallelism of a job
[ https://issues.apache.org/jira/browse/FLINK-12607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847179#comment-16847179 ] Congxian Qiu(klion26) commented on FLINK-12607: --- Thanks for filing the issue. I like this improvement; it would let people find this value easily. Until it lands, I think we can calculate it using the formula in the doc [1]. [1][https://ci.apache.org/projects/flink/flink-docs-stable/ops/production_ready.html#set-maximum-parallelism-for-operators-explicitly] > Introduce a REST API that returns the maxParallelism of a job > - > > Key: FLINK-12607 > URL: https://issues.apache.org/jira/browse/FLINK-12607 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.6.3 >Reporter: Akshay Kanfade >Priority: Minor > > Today, Flink does not offer any way to get the maxParallelism for a job and > its operators through any of the REST APIs. Since the internal state > already tracks maxParallelism for a job and its operators, we should expose > this via the REST APIs so that application developers can get more insight into > the current state. > There can be two approaches to how we can do this - > Approach 1 : > Modify the existing REST API response model to additionally expose a new > field 'maxParallelism'. Some of the REST APIs that would be affected by this: > |h5. */jobs/:jobid/vertices/:vertexid*| > |h5. */jobs/:jobid*| > > Approach 2 : > Create a new REST API that would only return maxParallelism for a job and > its operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
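For reference, the default-max-parallelism formula from that doc page can be sketched as below: roughly operatorParallelism * 1.5, rounded up to the next power of two, clamped to the range [128, 32768]. This is my reading of the documented default; the class and method names here are illustrative, so treat Flink's real {{KeyGroupRangeAssignment}} code as authoritative.

```java
// Sketch of the documented default max-parallelism computation
// (operatorParallelism * 1.5 rounded up to a power of two, clamped to [128, 32768]).
// Illustrative names; not the actual Flink source.
public class DefaultMaxParallelism {
    // Smallest power of two >= x (for x >= 1).
    static int roundUpToPowerOfTwo(int x) {
        return x > 1 ? Integer.highestOneBit(x - 1) << 1 : 1;
    }

    static int computeDefaultMaxParallelism(int operatorParallelism) {
        // operatorParallelism + operatorParallelism / 2 approximates * 1.5 in integer math.
        return Math.min(
            Math.max(roundUpToPowerOfTwo(operatorParallelism + (operatorParallelism / 2)), 128),
            32768);
    }

    public static void main(String[] args) {
        System.out.println(computeDefaultMaxParallelism(10));   // small jobs clamp to the lower bound: 128
        System.out.println(computeDefaultMaxParallelism(1000)); // 1500 rounds up to 2048
    }
}
```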
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846382#comment-16846382 ] Congxian Qiu(klion26) commented on FLINK-12296: --- merged * master ee60846dc588b1a832a497ff9522d7a3a282c350 * release-1.8 531d727f9b32c310d8d63b253019b8cc4a23a3eb * release-1.7 1ce2efd7a38d091fc004a8dba034ece0bcc42385 * release-1.6 0dda6fe9dff4f667b110cda39bfe9738ba615b24 > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 20m > Remaining Estimate: 0h > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-12536) Make BufferOrEventSequence#getNext() non-blocking
[ https://issues.apache.org/jira/browse/FLINK-12536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-12536: - Assignee: Congxian Qiu(klion26) > Make BufferOrEventSequence#getNext() non-blocking > - > > Key: FLINK-12536 > URL: https://issues.apache.org/jira/browse/FLINK-12536 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Network >Affects Versions: 1.9.0 >Reporter: Piotr Nowojski >Assignee: Congxian Qiu(klion26) >Priority: Major > > Currently it is non-blocking in case of credit-based flow control (default), > however for {{SpilledBufferOrEventSequence}} it is blocking on reading from > file. We might want to consider reimplementing it to be non blocking with > {{CompletableFuture isAvailable()}} method. > > Otherwise we will block mailbox processing for the duration of reading from > file - for example we will block processing time timers and potentially in > the future network flushes. > > This is not a high priority change, since it affects non-default > configuration option AND at the moment only processing time timers are > planned to be moved to the mailbox for 1.9. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11634) Translate "State Backends" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845702#comment-16845702 ] Congxian Qiu(klion26) commented on FLINK-11634: --- [~zxhe] I can't find your ID to assign the issue to; it can probably only be assigned to you once you become a contributor. If you really want to contribute to this issue, maybe you could file a PR directly? You'll become a contributor after the PR has been merged. > Translate "State Backends" page into Chinese > > > Key: FLINK-11634 > URL: https://issues.apache.org/jira/browse/FLINK-11634 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/dev/stream/state/state_backends.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11560) Translate "Flink Applications" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845437#comment-16845437 ] Congxian Qiu(klion26) commented on FLINK-11560: --- thanks for your contribution. I'll review this today [~Brian Zhou] > Translate "Flink Applications" page into Chinese > > > Key: FLINK-11560 > URL: https://issues.apache.org/jira/browse/FLINK-11560 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Project Website >Reporter: Jark Wu >Assignee: Zhou Yumin >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Translate "Flink Applications" page into Chinese. > The markdown file is located in: flink-web/flink-applications.zh.md > The url link is: https://flink.apache.org/zh/flink-applications.html > Please adjust the links in the page to Chinese pages when translating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11637) Translate "Checkpoints" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844658#comment-16844658 ] Congxian Qiu(klion26) commented on FLINK-11637: --- [~guanghui] If there is no assignee yet, and you want to contribute to the issue, I think it's ok to assign it to yourself. > Translate "Checkpoints" page into Chinese > - > > Key: FLINK-11637 > URL: https://issues.apache.org/jira/browse/FLINK-11637 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Assignee: guanghui.rong >Priority: Major > Fix For: 1.9.0 > > > doc locates in flink/docs/ops/state/checkpoints.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-12400) NullpointerException using SimpleStringSchema with Kafka
[ https://issues.apache.org/jira/browse/FLINK-12400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833187#comment-16833187 ] Congxian Qiu(klion26) edited comment on FLINK-12400 at 5/5/19 1:52 AM: --- [~PierreZ] thanks for filing this issue. I think there is already an issue that aims to fix this; please have a look at https://issues.apache.org/jira/browse/FLINK-11820 And there is a PR for FLINK-11820: [https://github.com/apache/flink/pull/7987] was (Author: klion26): [~PierreZ] thanks for filing this issue, I think there is already an issue want to fix this, please have a look at https://issues.apache.org/jira/browse/FLINK-11820 > NullpointerException using SimpleStringSchema with Kafka > > > Key: FLINK-12400 > URL: https://issues.apache.org/jira/browse/FLINK-12400 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System >Affects Versions: 1.7.2, 1.8.0 > Environment: Flink 1.7.2 job on 1.8 cluster > Kafka 0.10 with a topic in log-compaction >Reporter: Pierre Zemb >Assignee: Pierre Zemb >Priority: Minor > > Hi! > Yesterday, we saw a strange behavior with our Flink job and Kafka. We are > consuming a Kafka topic setup in > [log-compaction|https://kafka.apache.org/documentation/#compaction] mode. As > such, sending a message with a null payload acts like a tombstone. > We are consuming Kafka like this: > {code:java} > new FlinkKafkaConsumer010<> ("topic", new SimpleStringSchema(), > this.kafkaProperties) > {code} > When we sent the message, job failed because of a NullPointerException > [here|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/serialization/SimpleStringSchema.java#L75]. > `byte[] message` was null, causing the NPE. > We forked the class and added a basic nullable check, returning null if so. > It fixed our issue. > Should we add it to the main class? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12400) NullpointerException using SimpleStringSchema with Kafka
[ https://issues.apache.org/jira/browse/FLINK-12400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833187#comment-16833187 ] Congxian Qiu(klion26) commented on FLINK-12400: --- [~PierreZ] thanks for filing this issue. I think there is already an issue that aims to fix this; please have a look at https://issues.apache.org/jira/browse/FLINK-11820 > NullpointerException using SimpleStringSchema with Kafka > > > Key: FLINK-12400 > URL: https://issues.apache.org/jira/browse/FLINK-12400 > Project: Flink > Issue Type: Improvement > Components: API / Type Serialization System >Affects Versions: 1.7.2, 1.8.0 > Environment: Flink 1.7.2 job on 1.8 cluster > Kafka 0.10 with a topic in log-compaction >Reporter: Pierre Zemb >Assignee: Pierre Zemb >Priority: Minor > > Hi! > Yesterday, we saw a strange behavior with our Flink job and Kafka. We are > consuming a Kafka topic setup in > [log-compaction|https://kafka.apache.org/documentation/#compaction] mode. As > such, sending a message with a null payload acts like a tombstone. > We are consuming Kafka like this: > {code:java} > new FlinkKafkaConsumer010<> ("topic", new SimpleStringSchema(), > this.kafkaProperties) > {code} > When we sent the message, job failed because of a NullPointerException > [here|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/serialization/SimpleStringSchema.java#L75]. > `byte[] message` was null, causing the NPE. > We forked the class and added a basic nullable check, returning null if so. > It fixed our issue. > Should we add it to the main class? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
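The workaround the reporter describes (a null check before constructing the String) could look roughly like the sketch below. This is a hypothetical illustration, not the actual patch from FLINK-11820, and it leaves out the {{DeserializationSchema}} interface plumbing.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch of the workaround described above: a String deserializer that
// tolerates null payloads (Kafka log-compaction tombstones) instead of throwing
// a NullPointerException. Illustrative only; omits Flink's DeserializationSchema
// interface and serialization plumbing.
public class NullTolerantStringDeserializer {
    public static String deserialize(byte[] message) {
        // SimpleStringSchema calls new String(message, ...), which NPEs on null;
        // here a null payload simply maps to null (the caller must handle it).
        return message == null ? null : new String(message, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(deserialize("hello".getBytes(StandardCharsets.UTF_8))); // hello
        System.out.println(deserialize(null)); // null, instead of an NPE
    }
}
```

One design question with this approach is that downstream operators must then be prepared to see null records, which is why an upstream fix in Flink itself (as discussed in FLINK-11820) is preferable to everyone forking the schema.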
[jira] [Commented] (FLINK-11637) Translate "Checkpoints" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832987#comment-16832987 ] Congxian Qiu(klion26) commented on FLINK-11637: --- [~lsy] I'm not working on this now, if you want to contribute, please feel free to assign it to yourself. > Translate "Checkpoints" page into Chinese > - > > Key: FLINK-11637 > URL: https://issues.apache.org/jira/browse/FLINK-11637 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/ops/state/checkpoints.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827806#comment-16827806 ] Congxian Qiu(klion26) edited comment on FLINK-12296 at 4/28/19 4:44 AM: After discussing offline with [~srichter], I will change the directory to the following {{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/operatorIdentifier}} was (Author: klion26): After discussing offline with [~srichter], I will change the directory to the following {{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/rocksdb_operatorIdentifier}} > Data loss silently in RocksDBStateBackend when more than one operator (has states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 10m > Remaining Estimate: 0h > > As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. > Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
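To make the collision concrete, here is a small sketch (a hypothetical demo class mirroring the quoted LocalRecoveryDirectoryProviderImpl logic, not Flink's actual API): because the path is built only from the job id, vertex id, subtask index, and checkpoint id, two chained stateful operators resolve to the same directory:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the quoted path logic: there is no operator-specific component,
// so every stateful operator in a chained task shares one directory.
public class LocalRecoveryPathDemo {
    static Path subtaskSpecificCheckpointDirectory(
            String jobID, String jobVertexID, int subtaskIndex, long checkpointId) {
        return Paths.get("jid_" + jobID,
                "vtx_" + jobVertexID + "_sti_" + subtaskIndex,
                "chk_" + checkpointId);
    }

    public static void main(String[] args) {
        // Two chained operators: same vertex id and subtask index, hence the
        // same local state directory -- their files overwrite each other.
        Path opA = subtaskSpecificCheckpointDirectory("job1", "v42", 0, 1);
        Path opB = subtaskSpecificCheckpointDirectory("job1", "v42", 0, 1);
        System.out.println(opA.equals(opB)); // true
    }
}
```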
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827806#comment-16827806 ] Congxian Qiu(klion26) commented on FLINK-12296: --- After discussing offline with [~srichter], I will change the directory to the following {{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/rocksdb_operatorIdentifier}} > Data loss silently in RocksDBStateBackend when more than one operator (has states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.3, 1.9.0, 1.8.1 > > Time Spent: 10m > Remaining Estimate: 0h > > As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. > Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11635) Translate "Checkpointing" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825674#comment-16825674 ] Congxian Qiu(klion26) commented on FLINK-11635: --- [~pengyang] Please feel free to take it. You can assign it to yourself if you want. > Translate "Checkpointing" page into Chinese > --- > > Key: FLINK-11635 > URL: https://issues.apache.org/jira/browse/FLINK-11635 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > The doc is located in flink/docs/dev/stream/state/checkpointing.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825199#comment-16825199 ] Congxian Qiu(klion26) commented on FLINK-12296: --- [~srichter] I first implemented it as you suggested[1], but in {{TaskLocalStateStoreImpl#dispose()}} I can't get the {{operatorId}}, which is needed to find the data path (I added a {{UUID}} as a temporary placeholder in the patch), so I proposed the path as {{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1*_operator_id*/(state)}} [1] https://github.com/klion26/flink/blob/9efd8907dd08a0f50ff81e04b028cb87945da0ca/flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java#L258 > Data loss silently in RocksDBStateBackend when more than one operator (has states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Fix For: 1.7.3, 1.9.0, 1.8.1 > > > As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. > Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825034#comment-16825034 ] Congxian Qiu(klion26) commented on FLINK-12296: --- [~srichter] from what [~ningshi] has posted on the mailing list[1], the two states are user state and timer state {noformat} There are two stateful operators in the chain, one is a CoBroadcastWithKeyedOperator, the other one is a StreamMapper. The CoBroadcastWithKeyedOperator creates timer states in RocksDB, the latter doesn’t. Because of the checkpoint directory collision bug, we always end up saving the states for CoBroadcastWithKeyedOperator.{noformat} Currently, I want to add the {{operatorId}} to the path, so the path will look like {{../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1_operator_id/(state)}} The reason to use {{operatorId}} instead of {{operatorIdentifier}} is that we need to find the local data path for a specific checkpoint ({{TaskLocalStateStoreImpl#discardLocalStateForCheckpoint(long, TaskStateSnapshot)}}). I will change it as follows:
{code:java}
// LocalRecoveryDirectoryProviderImpl.java
@Override
public File subtaskSpecificCheckpointDirectory(long checkpointId, String operatorIdentifier) {
    return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(operatorIdentifier, checkpointId));
}

@VisibleForTesting
String checkpointDirString(String operatorIdentifier, long checkpointId) {
    return "opid_" + OperatorSubtaskDescriptionText.getOperatorIdByOperatorIdentifier(operatorIdentifier) + "_chk_" + checkpointId;
}

// OperatorSubtaskDescriptionText.java
public static String getOperatorIdByOperatorIdentifier(String operatorIdentifier) {
    ...
}
{code}
What do you think about this? 
[1][http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2imv4i2nd.wl-nings...@gmail.com%3E] > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Blocker > Fix For: 1.7.3, 1.9.0, 1.8.1 > > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
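The proposal above can be sketched as follows. This is only an illustration under the naming discussed in this thread (the opid_ prefix and the demo class are assumptions, not the final Flink API): adding an operator-specific component to the per-checkpoint directory gives each chained operator its own local-state path:

```java
import java.nio.file.Paths;

// Sketch of the proposed scheme: prefix the per-checkpoint directory with the
// operator id so chained operators in one task no longer collide.
public class ProposedCheckpointDir {
    static String checkpointDirString(String operatorId, long checkpointId) {
        return "opid_" + operatorId + "_chk_" + checkpointId;
    }

    static String subtaskSpecificDir(String jobVertexID, int subtaskIndex,
                                     String operatorId, long checkpointId) {
        return Paths.get("vtx_" + jobVertexID + "_sti_" + subtaskIndex,
                checkpointDirString(operatorId, checkpointId)).toString();
    }

    public static void main(String[] args) {
        // Same vertex and subtask, different operators -> distinct directories.
        String a = subtaskSpecificDir("v42", 0, "op_a", 1);
        String b = subtaskSpecificDir("v42", 0, "op_b", 1);
        System.out.println(a.equals(b)); // false
    }
}
```

Keeping the checkpoint id in the directory name is what would let disposal code locate all operator directories belonging to one checkpoint without knowing the operator ids up front.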
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Description: As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. The path generation logic is below: {code:java} // LocalRecoveryDirectoryProviderImpl.java @Override public File subtaskSpecificCheckpointDirectory(long checkpointId) { return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(checkpointId)); } @VisibleForTesting String subtaskDirString() { return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + subtaskIndex).toString(); } @VisibleForTesting String checkpointDirString(long checkpointId) { return "chk_" + checkpointId; } {code} [1] [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] was: As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. 
The path generation logic is below: {code:java} // LocalRecoveryDirectoryProviderImpl.java @Override public File subtaskSpecificCheckpointDirectory(long checkpointId) { return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(checkpointId)); } @VisibleForTesting String subtaskDirString() { return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + subtaskIndex).toString(); } @VisibleForTesting String checkpointDirString(long checkpointId) { return "chk_" + checkpointId; } {code} [1] [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Critical > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11636) Translate "State Schema Evolution" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824762#comment-16824762 ] Congxian Qiu(klion26) commented on FLINK-11636: --- [~yangfei] I'm not working on it now; please assign it to yourself if you want. > Translate "State Schema Evolution" into Chinese > --- > > Key: FLINK-11636 > URL: https://issues.apache.org/jira/browse/FLINK-11636 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > The doc is located in flink/docs/dev/stream/state/schema_evolution.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Affects Version/s: 1.6.3 1.6.4 1.7.2 1.8.0 > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.6.3, 1.6.4, 1.7.2, 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Critical > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Priority: Critical (was: Major) > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Critical > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823998#comment-16823998 ] Congxian Qiu(klion26) edited comment on FLINK-12296 at 4/23/19 11:46 AM: - I think this is a long-existing issue since release-1.5; the related issue is [FLINK-8360|https://issues.apache.org/jira/browse/FLINK-8360] was (Author: klion26): I think this is a long-existing issue since release-1.5. > Data loss silently in RocksDBStateBackend when more than one operator (has states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. > Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. 
> > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823998#comment-16823998 ] Congxian Qiu(klion26) commented on FLINK-12296: --- I think this is a long-existing issue since release-1.5. > Data loss silently in RocksDBStateBackend when more than one operator (has states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. > Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { > return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { > return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { > return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Summary: Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task (was: Data loss silently in RocksDBStateBackend when more than one operator chained in a single task ) > Data loss silently in RocksDBStateBackend when more than one operator(has > states) chained in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Description: As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states: we'll encounter a silent data loss problem. Currently, the local directory we use is like below ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state); if more than one operator is chained in a single task and all the operators have states, then all the operators will share the same local directory (because the vertex_id is the same), and this will lead to data loss. The path generation logic is below: {code:java} // LocalRecoveryDirectoryProviderImpl.java @Override public File subtaskSpecificCheckpointDirectory(long checkpointId) { return new File(subtaskBaseDirectory(checkpointId), checkpointDirString(checkpointId)); } @VisibleForTesting String subtaskDirString() { return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + subtaskIndex).toString(); } @VisibleForTesting String checkpointDirString(long checkpointId) { return "chk_" + checkpointId; } {code} [1] [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] was: As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states; data will be lost silently. 
[1] https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7 > Data loss silently in RocksDBStateBackend when more than one operator chained > in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, we'll encounter > data loss silently problem. > Currently, the local directory we used is like below > ../local_state_root_1/allocation_id/job_id/vertex_id_subtask_idx/chk_1/(state), > > if more than one operator chained in a single task, and all the operators > have states, then all the operators will share the same local > directory(because the vertext_id is the same), this will lead a data loss > problem. > > The path generation logic is below: > {code:java} > // LocalRecoveryDirectoryProviderImpl.java > @Override > public File subtaskSpecificCheckpointDirectory(long checkpointId) { >return new File(subtaskBaseDirectory(checkpointId), > checkpointDirString(checkpointId)); > } > @VisibleForTesting > String subtaskDirString() { >return Paths.get("jid_" + jobID, "vtx_" + jobVertexID + "_sti_" + > subtaskIndex).toString(); > } > @VisibleForTesting > String checkpointDirString(long checkpointId) { >return "chk_" + checkpointId; > } > {code} > [1] > [https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task
[ https://issues.apache.org/jira/browse/FLINK-12296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-12296: -- Issue Type: Bug (was: Test) > Data loss silently in RocksDBStateBackend when more than one operator chained > in a single task > --- > > Key: FLINK-12296 > URL: https://issues.apache.org/jira/browse/FLINK-12296 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > As the mail list said[1], there may be a problem when more than one operator > chained in a single task, and all the operators have states, this will be > data loss silently. > > [1] > https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12296) Data loss silently in RocksDBStateBackend when more than one operator chained in a single task
Congxian Qiu(klion26) created FLINK-12296: - Summary: Data loss silently in RocksDBStateBackend when more than one operator chained in a single task Key: FLINK-12296 URL: https://issues.apache.org/jira/browse/FLINK-12296 Project: Flink Issue Type: Test Components: Runtime / State Backends Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) As the mailing list said[1], there may be a problem when more than one operator is chained in a single task and all the operators have states; data will be lost silently. [1] https://app.smartmailcloud.com/web-share/MDkE4DArUT2eoSv86xq772I1HDgMNTVhLEmsnbQ7 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11633) Translate "Working with State" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11633: - Assignee: Congxian Qiu(klion26) > Translate "Working with State" into Chinese > --- > > Key: FLINK-11633 > URL: https://issues.apache.org/jira/browse/FLINK-11633 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Doc locates in flink/doc/dev/state/state.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-12281) SortDistinctAggregateITCase>DistinctAggregateITCaseBase.testSomeColumnsBothInDistinctAggAndGroupBy Failed
Congxian Qiu(klion26) created FLINK-12281: - Summary: SortDistinctAggregateITCase>DistinctAggregateITCaseBase.testSomeColumnsBothInDistinctAggAndGroupBy Failed Key: FLINK-12281 URL: https://issues.apache.org/jira/browse/FLINK-12281 Project: Flink Issue Type: Test Components: Tests Reporter: Congxian Qiu(klion26) 01:07:26.060 [INFO] Running org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase 01:08:38.157 [ERROR] Tests run: 23, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 72.093 s <<< FAILURE! - in org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase 01:08:38.157 [ERROR] testSomeColumnsBothInDistinctAggAndGroupBy(org.apache.flink.table.runtime.batch.sql.agg.SortDistinctAggregateITCase) Time elapsed: 5.972 s <<< ERROR! org.apache.flink.runtime.client.JobExecutionException: Job execution failed. Caused by: java.lang.RuntimeException: org.apache.flink.runtime.memory.MemoryAllocationException: Could not allocate 64 pages. Only 60 pages are remaining. Caused by: org.apache.flink.runtime.memory.MemoryAllocationException: Could not allocate 64 pages. Only 60 pages are remaining. [https://travis-ci.org/apache/flink/jobs/522602981] [https://travis-ci.org/apache/flink/jobs/522603433] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11635) Translate "Checkpointing" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11635: - Assignee: (was: Congxian Qiu(klion26)) > Translate "Checkpointing" page into Chinese > --- > > Key: FLINK-11635 > URL: https://issues.apache.org/jira/browse/FLINK-11635 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/dev/stream/state/checkpointing.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11636) Translate "State Schema Evolution" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11636: - Assignee: (was: Congxian Qiu(klion26)) > Translate "State Schema Evolution" into Chinese > --- > > Key: FLINK-11636 > URL: https://issues.apache.org/jira/browse/FLINK-11636 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/dev/stream/state/schema_evolution.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11638) Translate "Savepoints" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11638: - Assignee: (was: Congxian Qiu(klion26)) > Translate "Savepoints" page into Chinese > > > Key: FLINK-11638 > URL: https://issues.apache.org/jira/browse/FLINK-11638 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/ops/state/savepoints.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11634) Translate "State Backends" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11634: - Assignee: (was: Congxian Qiu(klion26)) > Translate "State Backends" page into Chinese > > > Key: FLINK-11634 > URL: https://issues.apache.org/jira/browse/FLINK-11634 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/dev/stream/state/state_backends.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11637) Translate "Checkpoints" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11637: - Assignee: (was: Congxian Qiu(klion26)) > Translate "Checkpoints" page into Chinese > - > > Key: FLINK-11637 > URL: https://issues.apache.org/jira/browse/FLINK-11637 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > doc locates in flink/docs/ops/state/checkpoints.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11633) Translate "Working with State" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11633: - Assignee: (was: Congxian Qiu(klion26)) > Translate "Working with State" into Chinese > --- > > Key: FLINK-11633 > URL: https://issues.apache.org/jira/browse/FLINK-11633 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Documentation >Reporter: Congxian Qiu(klion26) >Priority: Major > > Doc locates in flink/doc/dev/state/state.md -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11561) Translate "Flink Architecture" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822475#comment-16822475 ] Congxian Qiu(klion26) commented on FLINK-11561: --- [~jark] thanks for the reminder. I do not have time to translate this page, so I've released the ticket. [~linjie] If you still have time, please feel free to take it. > Translate "Flink Architecture" page into Chinese > > > Key: FLINK-11561 > URL: https://issues.apache.org/jira/browse/FLINK-11561 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Project Website >Reporter: Jark Wu >Priority: Major > > Translate "Flink Architecture" page into Chinese. > The markdown file is located in: flink-web/flink-architecture.zh.md > The url link is: https://flink.apache.org/zh/flink-architecture.html > Please adjust the links in the page to Chinese pages when translating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11561) Translate "Flink Architecture" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11561: - Assignee: (was: Congxian Qiu(klion26)) > Translate "Flink Architecture" page into Chinese > > > Key: FLINK-11561 > URL: https://issues.apache.org/jira/browse/FLINK-11561 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation, Project Website >Reporter: Jark Wu >Priority: Major > > Translate "Flink Architecture" page into Chinese. > The markdown file is located in: flink-web/flink-architecture.zh.md > The url link is: https://flink.apache.org/zh/flink-architecture.html > Please adjust the links in the page to Chinese pages when translating. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11948) When kafka sink parallelism
[ https://issues.apache.org/jira/browse/FLINK-11948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794775#comment-16794775 ] Congxian Qiu(klion26) commented on FLINK-11948: --- As the doc said, {noformat} To avoid such an unbalanced partitioning, use a round-robin kafka partitioner (note that this will * cause a lot of network connections between all the Flink instances and all the Kafka brokers).{noformat} Maybe we could have a round-robin partitioner here? > When kafka sink parallelism unbalance > - > > Key: FLINK-11948 > URL: https://issues.apache.org/jira/browse/FLINK-11948 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.6.3, 1.6.4, 1.7.2 >Reporter: qi quan >Priority: Major > > The default FlinkFixedPartitioner returns partitions[subtaskId % > partitions.length]. When the kafka sink parallelism is lower than the number > of kafka partitions, only the first few kafka partitions will receive data. > I think it needs to be improved here. > {code:java} > @Override > public int partition(T record, byte[] key, byte[] value, String > targetTopic, int[] partitions) { > Preconditions.checkArgument( > partitions != null && partitions.length > 0, > "Partitions of the target topic is empty."); > return partitions[parallelInstanceId % partitions.length]; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
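The round-robin alternative suggested in the comment could look roughly like the following. This is a hedged sketch with hypothetical names, not the actual Flink FlinkKafkaPartitioner API: each sink subtask cycles through all partitions instead of being pinned to a single one.

```java
// Sketch of a round-robin partitioner: unlike the fixed
// subtask-to-partition mapping quoted above, every partition receives
// data even when the sink parallelism is below the partition count.
public class RoundRobinPartitionerSketch {
    private int nextIndex = -1;

    /** Returns the next partition in round-robin order. */
    public int partition(int[] partitions) {
        nextIndex = (nextIndex + 1) % partitions.length;
        return partitions[nextIndex];
    }
}
```

The trade-off quoted from the documentation still applies: with this scheme each sink subtask ends up holding connections to many more Kafka brokers than with the fixed mapping.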
[jira] [Updated] (FLINK-11939) Adjust components that need be modified for support FSCSOS
[ https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11939: -- Summary: Adjust components that need be modified for support FSCSOS (was: Adjust components that need be modified for support FS) > Adjust components that need be modified for support FSCSOS > -- > > Key: FLINK-11939 > URL: https://issues.apache.org/jira/browse/FLINK-11939 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > In this sub-task, we'll adjust all the components that need to be modified, > such as {{SharedStateRegistry, }}{{PlaceholderStreamStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11939) Adjust components that need be modified to support FSCSOS
[ https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11939: -- Summary: Adjust components that need be modified to support FSCSOS (was: Adjust components that need be modified for support FSCSOS) > Adjust components that need be modified to support FSCSOS > - > > Key: FLINK-11939 > URL: https://issues.apache.org/jira/browse/FLINK-11939 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > In this sub-task, we'll adjust all the components that need to be modified, > such as {{SharedStateRegistry, }}{{PlaceholderStreamStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11938) Introduce all the new components to support FSCSOS
[ https://issues.apache.org/jira/browse/FLINK-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11938: -- Summary: Introduce all the new components to support FSCSOS (was: Introduce all the new components) > Introduce all the new components to support FSCSOS > -- > > Key: FLINK-11938 > URL: https://issues.apache.org/jira/browse/FLINK-11938 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > In this sub-task, we'll introduce all the new components, such as > {{FileSegmentCheckpointStreamFactory}}, > {{FileSegmentCheckpointStateOutputStream}}, > {{FileSegmentCheckpointLocationStorage}}, {{FileSegmentStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11939) Adjust components that need be modified for support FS
[ https://issues.apache.org/jira/browse/FLINK-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11939: -- Summary: Adjust components that need be modified for support FS (was: Adjust components that need be modified) > Adjust components that need be modified for support FS > -- > > Key: FLINK-11939 > URL: https://issues.apache.org/jira/browse/FLINK-11939 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Checkpointing >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > In this sub-task, we'll adjust all the components that need to be modified, > such as {{SharedStateRegistry, }}{{PlaceholderStreamStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11941) Reuse single FileSegmentCheckpointStateOutputStream for multiple checkpoints
Congxian Qiu(klion26) created FLINK-11941: - Summary: Reuse single FileSegmentCheckpointStateOutputStream for multiple checkpoints Key: FLINK-11941 URL: https://issues.apache.org/jira/browse/FLINK-11941 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) After the previous sub-tasks have been resolved, we'll solve the small file problem by writing all states of one checkpoint into a single physical file. In this sub-task, we'll try to share the underlying physical file across multiple checkpoints. This is configurable and disabled by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11940) Reduce storage amplification for FSCSOS
Congxian Qiu(klion26) created FLINK-11940: - Summary: Reduce storage amplification for FSCSOS Key: FLINK-11940 URL: https://issues.apache.org/jira/browse/FLINK-11940 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) In this sub-task, we'll solve the storage amplification problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11939) Adjust components that need be modified
Congxian Qiu(klion26) created FLINK-11939: - Summary: Adjust components that need be modified Key: FLINK-11939 URL: https://issues.apache.org/jira/browse/FLINK-11939 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) In this sub-task, we'll adjust all the components that need to be modified, such as {{SharedStateRegistry, }}{{PlaceholderStreamStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11938) Introduce all the new components
Congxian Qiu(klion26) created FLINK-11938: - Summary: Introduce all the new components Key: FLINK-11938 URL: https://issues.apache.org/jira/browse/FLINK-11938 Project: Flink Issue Type: Sub-task Components: Runtime / Checkpointing Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) In this sub-task, we'll introduce all the new components, such as {{FileSegmentCheckpointStreamFactory}}, {{FileSegmentCheckpointStateOutputStream}}, {{FileSegmentCheckpointLocationStorage}}, {{FileSegmentStateHandle}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11937) Resolve small file problem in RocksDB incremental checkpoint
[ https://issues.apache.org/jira/browse/FLINK-11937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11937: -- Description: Currently when incremental checkpoint is enabled in RocksDBStateBackend a separate file will be generated on DFS for each sst file. This may cause “file flood” when running an intensive workload (many jobs with high parallelism) in a big cluster. According to our observation in Alibaba production, such a file flood introduces at least two drawbacks when using HDFS as the checkpoint storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may burst its response queue; 2) a huge number of files causes big pressure on the NN’s on-heap memory. In Flink we previously noticed a similar small file flood problem and tried to resolve it by introducing ByteStreamStateHandle(FLINK-2808), but this solution has the limitation that if we configure the threshold too low there will still be too many small files, while if too high the JM will finally OOM, so it can hardly resolve the issue when using RocksDBStateBackend with the incremental snapshot strategy. We propose a new OutputStream called FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS will reuse the same underlying distributed file until its size exceeds a preset threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, secondly resolve the specific storage amplification issue on FSCSOS, and lastly add an option to reuse FSCSOS across multiple checkpoints to further reduce the DFS file number. For more details please refer to the attached design doc. was: Currently when incremental checkpoint is enabled in RocksDBStateBackend a separate file will be generated on DFS for each sst file. This may cause “file flood” when running intensive workload (many jobs with high parallelism) in big cluster. 
According to our observation in Alibaba production, such a file flood introduces at least two drawbacks when using HDFS as the checkpoint storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may burst its response queue; 2) a huge number of files causes big pressure on the NN’s on-heap memory. In Flink we previously noticed a similar small file flood problem and tried to resolve it by introducing ByteStreamStateHandle(FLINK-2818), but this solution has the limitation that if we configure the threshold too low there will still be too many small files, while if too high the JM will finally OOM, so it can hardly resolve the issue when using RocksDBStateBackend with the incremental snapshot strategy. We propose a new OutputStream called FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS will reuse the same underlying distributed file until its size exceeds a preset threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, secondly resolve the specific storage amplification issue on FSCSOS, and lastly add an option to reuse FSCSOS across multiple checkpoints to further reduce the DFS file number. For more details please refer to the attached design doc. [1] [https://www.slideshare.net/dataArtisans/stephan-ewen-experiences-running-flink-at-very-large-scale] > Resolve small file problem in RocksDB incremental checkpoint > > > Key: FLINK-11937 > URL: https://issues.apache.org/jira/browse/FLINK-11937 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing >Affects Versions: 1.7.2 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Currently when incremental checkpoint is enabled in RocksDBStateBackend a > separate file will be generated on DFS for each sst file. This may cause > “file flood” when running an intensive workload (many jobs with high > parallelism) in a big cluster. 
According to our observation in Alibaba > production, such a file flood introduces at least two drawbacks when using HDFS > as the checkpoint storage FileSystem: 1) a huge number of RPC requests issued to > the NN, which may burst its response queue; 2) a huge number of files causes big > pressure on the NN’s on-heap memory. > In Flink we previously noticed a similar small file flood problem and tried to > resolve it by introducing ByteStreamStateHandle(FLINK-2808), but this > solution has the limitation that if we configure the threshold too low there > will still be too many small files, while if too high the JM will finally > OOM, so it can hardly resolve the issue when using RocksDBStateBackend > with the incremental snapshot strategy. > We propose a new OutputStream called > FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS > will reuse the same underlying distributed file until its size exceeds a > preset
[jira] [Created] (FLINK-11937) Resolve small file problem in RocksDB incremental checkpoint
Congxian Qiu(klion26) created FLINK-11937: - Summary: Resolve small file problem in RocksDB incremental checkpoint Key: FLINK-11937 URL: https://issues.apache.org/jira/browse/FLINK-11937 Project: Flink Issue Type: New Feature Components: Runtime / Checkpointing Affects Versions: 1.7.2 Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) Currently when incremental checkpoint is enabled in RocksDBStateBackend a separate file will be generated on DFS for each sst file. This may cause “file flood” when running an intensive workload (many jobs with high parallelism) in a big cluster. According to our observation in Alibaba production, such a file flood introduces at least two drawbacks when using HDFS as the checkpoint storage FileSystem: 1) a huge number of RPC requests issued to the NN, which may burst its response queue; 2) a huge number of files causes big pressure on the NN’s on-heap memory. In Flink we previously noticed a similar small file flood problem and tried to resolve it by introducing ByteStreamStateHandle(FLINK-2818), but this solution has the limitation that if we configure the threshold too low there will still be too many small files, while if too high the JM will finally OOM, so it can hardly resolve the issue when using RocksDBStateBackend with the incremental snapshot strategy. We propose a new OutputStream called FileSegmentCheckpointStateOutputStream(FSCSOS) to fix the problem. FSCSOS will reuse the same underlying distributed file until its size exceeds a preset threshold. We plan to complete the work in 3 steps: firstly introduce FSCSOS, secondly resolve the specific storage amplification issue on FSCSOS, and lastly add an option to reuse FSCSOS across multiple checkpoints to further reduce the DFS file number. For more details please refer to the attached design doc. [1] [https://www.slideshare.net/dataArtisans/stephan-ewen-experiences-running-flink-at-very-large-scale] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
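The core idea of the proposed FSCSOS can be illustrated with a deliberately simplified sketch (hypothetical names; an in-memory buffer stands in for one DFS file): many logical state writes append to the same underlying file, and each write is identified by its (offset, length) segment rather than by a file of its own.

```java
import java.io.ByteArrayOutputStream;

// Simplified illustration of the file-segment idea: one shared "file"
// (here an in-memory buffer), with each state write returning the
// segment it occupies instead of creating a new physical file.
public class FileSegmentSketch {
    private final ByteArrayOutputStream sharedFile = new ByteArrayOutputStream();

    /** Appends one state blob and returns its {offset, length} segment. */
    public int[] writeState(byte[] state) {
        int offset = sharedFile.size();
        sharedFile.write(state, 0, state.length);
        return new int[] {offset, state.length};
    }

    public int fileSize() {
        return sharedFile.size();
    }
}
```

In the actual proposal the stream would roll over to a new physical file once the preset size threshold is exceeded, which bounds the storage amplification that the second sub-task addresses.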
[jira] [Created] (FLINK-11904) Improve MemoryStateBackendTest by using JUnit's Parameterized
Congxian Qiu(klion26) created FLINK-11904: - Summary: Improve MemoryStateBackendTest by using JUnit's Parameterized Key: FLINK-11904 URL: https://issues.apache.org/jira/browse/FLINK-11904 Project: Flink Issue Type: Test Components: Tests Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) Currently, there are two classes {{MemoryStateBackendTest}} and {{AsyncMemoryStateBackendTest}}; the only difference is whether async mode is used. We can improve this by using JUnit's Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11903) Improve FileStateBackendTest by using JUnit's Parameterized
Congxian Qiu(klion26) created FLINK-11903: - Summary: Improve FileStateBackendTest by using JUnit's Parameterized Key: FLINK-11903 URL: https://issues.apache.org/jira/browse/FLINK-11903 Project: Flink Issue Type: Test Components: Tests Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) Currently, there is a test base class called {{FileStateBackendTest}} and a subclass {{AsyncFileStateBackendTest}}; the only difference is whether async mode is used. We can improve the test code by using JUnit's Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
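The refactoring proposed in these two tickets follows the pattern of JUnit 4's Parameterized runner (@RunWith(Parameterized.class) with an @Parameters factory). A dependency-free sketch of the same idea, with hypothetical names, runs one shared test body once per mode instead of duplicating the class per mode:

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of the Parameterized idea: the async flag is a
// parameter of one shared test body rather than the reason for a
// duplicated Async* subclass.
public class BackendModeSketch {
    static String runBackendTest(boolean asyncMode) {
        // Stand-in for the shared snapshot test body.
        return asyncMode ? "async:passed" : "sync:passed";
    }

    public static void main(String[] args) {
        List<String> results = new ArrayList<>();
        for (boolean mode : new boolean[] {false, true}) {
            results.add(runBackendTest(mode));
        }
        System.out.println(results); // prints "[sync:passed, async:passed]"
    }
}
```

With JUnit's Parameterized runner, the framework performs the loop above itself, reporting each mode as a separately named test case.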
[jira] [Assigned] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11850: - Assignee: Congxian Qiu(klion26) (was: Till Rohrmann) > ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis > > > Key: FLINK-11850 > URL: https://issues.apache.org/jira/browse/FLINK-11850 > Project: Flink > Issue Type: Test > Components: Runtime / Coordination, Tests >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Critical > Labels: test-stability > > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest > 08:20:01.694 [ERROR] > testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest) > Time elapsed: 0.076 s <<< ERROR! > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00 > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203) > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128) > > Travis links: https://travis-ci.org/apache/flink/jobs/502960186 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786680#comment-16786680 ] Congxian Qiu(klion26) commented on FLINK-11850: --- Sorry for the reassign, I just hit the wrong hotkey. > ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis > > > Key: FLINK-11850 > URL: https://issues.apache.org/jira/browse/FLINK-11850 > Project: Flink > Issue Type: Test > Components: Runtime / Coordination, Tests >Reporter: Congxian Qiu(klion26) >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest > 08:20:01.694 [ERROR] > testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest) > Time elapsed: 0.076 s <<< ERROR! > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00 > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203) > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128) > > Travis links: https://travis-ci.org/apache/flink/jobs/502960186 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11850: - Assignee: Till Rohrmann (was: Congxian Qiu(klion26)) > ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis > > > Key: FLINK-11850 > URL: https://issues.apache.org/jira/browse/FLINK-11850 > Project: Flink > Issue Type: Test > Components: Runtime / Coordination, Tests >Reporter: Congxian Qiu(klion26) >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest > 08:20:01.694 [ERROR] > testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest) > Time elapsed: 0.076 s <<< ERROR! > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for > /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00 > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203) > at > org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128) > > Travis links: https://travis-ci.org/apache/flink/jobs/502960186 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11850) ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis
Congxian Qiu(klion26) created FLINK-11850: - Summary: ZooKeeperHaServicesTest#testSimpleCloseAndCleanupAllData fails on Travis Key: FLINK-11850 URL: https://issues.apache.org/jira/browse/FLINK-11850 Project: Flink Issue Type: Test Components: Tests Reporter: Congxian Qiu(klion26) org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest 08:20:01.694 [ERROR] testSimpleCloseAndCleanupAllData(org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest) Time elapsed: 0.076 s <<< ERROR! org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /foo/bar/flink/default/leaderlatch/resource_manager_lock/_c_477d0124-92f3-4069-98aa-a71b8243250c-latch-00 at org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.runCleanupTest(ZooKeeperHaServicesTest.java:203) at org.apache.flink.runtime.highavailability.zookeeper.ZooKeeperHaServicesTest.testSimpleCloseAndCleanupAllData(ZooKeeperHaServicesTest.java:128) Travis links: https://travis-ci.org/apache/flink/jobs/502960186 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11825) Resolve name clash of StateTTL TimeCharacteristic class
[ https://issues.apache.org/jira/browse/FLINK-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785553#comment-16785553 ] Congxian Qiu(klion26) commented on FLINK-11825: --- Thank you [~azagrebin], I'll provide a patch soon, as you suggested. > Resolve name clash of StateTTL TimeCharacteristic class > --- > > Key: FLINK-11825 > URL: https://issues.apache.org/jira/browse/FLINK-11825 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.7.2 >Reporter: Fabian Hueske >Assignee: Congxian Qiu(klion26) >Priority: Major > > The StateTTL feature introduced the class > \{{org.apache.flink.api.common.state.TimeCharacteristic}} which clashes with > \{{org.apache.flink.streaming.api.TimeCharacteristic}}. > This is a problem for two reasons: > 1. Users get confused because they mistakenly import > \{{org.apache.flink.api.common.state.TimeCharacteristic}}. > 2. When using the StateTTL feature, users need to spell out the package name > for \{{org.apache.flink.api.common.state.TimeCharacteristic}} because the > other class is most likely already imported. > Since \{{org.apache.flink.streaming.api.TimeCharacteristic}} is one of the > most used classes of the DataStream API, we should make sure that users can > use it without import problems. > These errors are hard to spot and confusing for many users. > I see two ways to resolve the issue: > 1. drop \{{org.apache.flink.api.common.state.TimeCharacteristic}} and use > \{{org.apache.flink.streaming.api.TimeCharacteristic}}, throwing an exception > if an incorrect characteristic is used. > 2. rename the class \{{org.apache.flink.api.common.state.TimeCharacteristic}} > to some other name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11825) Resolve name clash of StateTTL TimeCharacteristic class
[ https://issues.apache.org/jira/browse/FLINK-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785282#comment-16785282 ] Congxian Qiu(klion26) commented on FLINK-11825: --- [~fhueske] If you do not mind, I could help to implement this. I think the first solution is better because we can maintain TimeCharacteristic in one place, and we may support other TimeCharacteristics for TTL state in the future. > Resolve name clash of StateTTL TimeCharacteristic class > --- > > Key: FLINK-11825 > URL: https://issues.apache.org/jira/browse/FLINK-11825 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.7.2 >Reporter: Fabian Hueske >Priority: Major > > The StateTTL feature introduced the class > \{{org.apache.flink.api.common.state.TimeCharacteristic}} which clashes with > \{{org.apache.flink.streaming.api.TimeCharacteristic}}. > This is a problem for two reasons: > 1. Users get confused because they mistakenly import > \{{org.apache.flink.api.common.state.TimeCharacteristic}}. > 2. When using the StateTTL feature, users need to spell out the package name > for \{{org.apache.flink.api.common.state.TimeCharacteristic}} because the > other class is most likely already imported. > Since \{{org.apache.flink.streaming.api.TimeCharacteristic}} is one of the > most used classes of the DataStream API, we should make sure that users can > use it without import problems. > These errors are hard to spot and confusing for many users. > I see two ways to resolve the issue: > 1. drop \{{org.apache.flink.api.common.state.TimeCharacteristic}} and use > \{{org.apache.flink.streaming.api.TimeCharacteristic}}, throwing an exception > if an incorrect characteristic is used. > 2. rename the class \{{org.apache.flink.api.common.state.TimeCharacteristic}} > to some other name. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11797) Implement empty unit tests in CheckpointCoordinatorMasterHooksTest
[ https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11797: -- Description: Currently, there are some empty tests (no implementation) such as {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, {{testTypeIncompatibleWithSerializerOnStore}} and {{testTypeIncompatibleWithHookOnRestore}} in class {{CheckpointCoordinatorMasterHooksTest}}[1]. If implementing them makes sense, I'll provide a patch. [1] [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] was: Current, there are some empty tests(no implementation) such as {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, }}{{testTypeIncompatibleWithSerializerOnStore and }}{{testTypeIncompatibleWithHookOnRestore in class }}{{CheckpointCoordinatorMasterHooksTest[1].}} If implementation them make sense, I'll give the patch. 
[1] [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] > Implement empty unit tests in CheckpointCoordinatorMasterHooksTest > -- > > Key: FLINK-11797 > URL: https://issues.apache.org/jira/browse/FLINK-11797 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Currently, there are some empty tests (no implementation) such as > {{testSerializationFailsOnTrigger}}, {{testHookCallFailsOnTrigger}}, > {{testDeserializationFailsOnRestore}}, {{testHookCallFailsOnRestore}}, > {{testTypeIncompatibleWithSerializerOnStore}} and > {{testTypeIncompatibleWithHookOnRestore}} in class > {{CheckpointCoordinatorMasterHooksTest}}[1]. > If implementing them makes sense, I'll provide a patch. > [1] > [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11797) Implement empty unit tests in CheckpointCoordinatorMasterHooksTest
[ https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11797: -- Summary: Implement empty unit tests in CheckpointCoordinatorMasterHooksTest (was: Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest) > Implement empty unit tests in CheckpointCoordinatorMasterHooksTest > -- > > Key: FLINK-11797 > URL: https://issues.apache.org/jira/browse/FLINK-11797 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Current, there are some empty tests(no implementation) such as > {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, > }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, > }}{{testTypeIncompatibleWithSerializerOnStore and > }}{{testTypeIncompatibleWithHookOnRestore in class > }}{{CheckpointCoordinatorMasterHooksTest[1].}} > If implementation them make sense, I'll give the patch. > [1] > [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest
[ https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11797: -- Description: Current, there are some empty tests(no implementation) such as {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, }}{{testTypeIncompatibleWithSerializerOnStore and }}{{testTypeIncompatibleWithHookOnRestore in class }}{{CheckpointCoordinatorMasterHooksTest[1].}} If implementation them make sense, I'll give the patch. [1] [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] was: Current, there are some empty tests(no implementation) such as {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, }}{{testTypeIncompatibleWithSerializerOnStore and }}{{testTypeIncompatibleWithHookOnRestore in class }}{{CheckpointCoordinatorMasterHooksTest[1].}}{{}} If implementation them make sense, I'll give the patch. 
[1] https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393 > Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest > --- > > Key: FLINK-11797 > URL: https://issues.apache.org/jira/browse/FLINK-11797 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Current, there are some empty tests(no implementation) such as > {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, > }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, > }}{{testTypeIncompatibleWithSerializerOnStore and > }}{{testTypeIncompatibleWithHookOnRestore in class > }}{{CheckpointCoordinatorMasterHooksTest[1].}} > If implementation them make sense, I'll give the patch. > [1] > [https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest
[ https://issues.apache.org/jira/browse/FLINK-11797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) updated FLINK-11797: -- Affects Version/s: 1.8.0 > Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest > --- > > Key: FLINK-11797 > URL: https://issues.apache.org/jira/browse/FLINK-11797 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.8.0 >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > > Current, there are some empty tests(no implementation) such as > {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, > }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, > }}{{testTypeIncompatibleWithSerializerOnStore and > }}{{testTypeIncompatibleWithHookOnRestore in class > }}{{CheckpointCoordinatorMasterHooksTest[1].}}{{}} > If implementation them make sense, I'll give the patch. > [1] > https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11797) Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest
Congxian Qiu(klion26) created FLINK-11797: - Summary: Implementation empty unit tests in CheckpointCoordinatorMasterHooksTest Key: FLINK-11797 URL: https://issues.apache.org/jira/browse/FLINK-11797 Project: Flink Issue Type: Test Components: Tests Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) Current, there are some empty tests(no implementation) such as {{testSerializationFailsOnTrigger, }}{{testHookCallFailsOnTrigger, }}{{testDeserializationFailsOnRestore}}{{testHookCallFailsOnRestore, }}{{testTypeIncompatibleWithSerializerOnStore and }}{{testTypeIncompatibleWithHookOnRestore in class }}{{CheckpointCoordinatorMasterHooksTest[1].}}{{}} If implementation them make sense, I'll give the patch. [1] https://github.com/apache/flink/blob/e3248d844c728c714857c5d69520f08ddf4e4c85/flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorMasterHooksTest.java#L393 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-11774) IllegalArgumentException in HeapPriorityQueueSet
[ https://issues.apache.org/jira/browse/FLINK-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780426#comment-16780426 ] Congxian Qiu(klion26) edited comment on FLINK-11774 at 2/28/19 11:46 AM: - Running the given code on master branch also has the problem sometimes was (Author: klion26): Running the given code on master branch also has the problem, I'll start to investigate this. > IllegalArgumentException in HeapPriorityQueueSet > > > Key: FLINK-11774 > URL: https://issues.apache.org/jira/browse/FLINK-11774 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.7.2 > Environment: Can reproduce on the following configurations: > > OS: macOS 10.14.3 > Java: 1.8.0_202 > > OS: CentOS 7.2.1511 > Java: 1.8.0_102 >Reporter: Kirill Vainer >Priority: Major > Attachments: flink-bug-dist.zip, flink-bug-src.zip > > > Hi, > I encountered the following exception: > {code} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
> at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647) > at > org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510) > at flink.bug.App.main(App.java:21) > Caused by: java.lang.IllegalArgumentException > at > org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.globalKeyGroupToLocalIndex(HeapPriorityQueueSet.java:158) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForKeyGroup(HeapPriorityQueueSet.java:147) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForElement(HeapPriorityQueueSet.java:154) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:121) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:49) > at > org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.registerProcessingTimeTimer(InternalTimerServiceImpl.java:197) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.registerProcessingTimeTimer(WindowOperator.java:876) > at > org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:36) > at > org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:28) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onElement(WindowOperator.java:895) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:396) > at > 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:748) > {code} > Code that reproduces the problem: > {code:java} > package flink.bug; > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; > import org.apache.flink.streaming.api.functions.sink.SinkFunction; > import org.apache.flink.streaming.api.windowing.time.Time; > public class App { > public static void main(String[] args) throws Exception { > StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.setParallelism(2); > env.fromElements(1, 2) > .map(Aggregate::new) > .keyBy(Aggregate::getKey) > .timeWindow(Time.seconds(2)) > .reduce(Aggregate::reduce) > .addSink(new CollectSink()); > env.execute(); > } > private static class Aggregate { > private Key key = new Key(); > public Aggregate(long number) { > } > public static Aggregate reduce(Aggregate a, Aggregate b) { > return new Aggregate(0); > } > public Key getKey() { > return key; > } > } > public static class Key { > } > private static class CollectSink implements SinkFunction {
[jira] [Commented] (FLINK-11774) IllegalArgumentException in HeapPriorityQueueSet
[ https://issues.apache.org/jira/browse/FLINK-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780426#comment-16780426 ] Congxian Qiu(klion26) commented on FLINK-11774: --- Running the given code on master branch also has the problem, I'll start to investigate this. > IllegalArgumentException in HeapPriorityQueueSet > > > Key: FLINK-11774 > URL: https://issues.apache.org/jira/browse/FLINK-11774 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.7.2 > Environment: Can reproduce on the following configurations: > > OS: macOS 10.14.3 > Java: 1.8.0_202 > > OS: CentOS 7.2.1511 > Java: 1.8.0_102 >Reporter: Kirill Vainer >Priority: Major > Attachments: flink-bug-dist.zip, flink-bug-src.zip > > > Hi, > I encountered the following exception: > {code} > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:647) > at > org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:123) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1510) > at flink.bug.App.main(App.java:21) > Caused by: java.lang.IllegalArgumentException > at > org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:123) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.globalKeyGroupToLocalIndex(HeapPriorityQueueSet.java:158) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForKeyGroup(HeapPriorityQueueSet.java:147) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.getDedupMapForElement(HeapPriorityQueueSet.java:154) > at > org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:121) > at > 
org.apache.flink.runtime.state.heap.HeapPriorityQueueSet.add(HeapPriorityQueueSet.java:49) > at > org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.registerProcessingTimeTimer(InternalTimerServiceImpl.java:197) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.registerProcessingTimeTimer(WindowOperator.java:876) > at > org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:36) > at > org.apache.flink.streaming.api.windowing.triggers.ProcessingTimeTrigger.onElement(ProcessingTimeTrigger.java:28) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onElement(WindowOperator.java:895) > at > org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:396) > at > org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704) > at java.lang.Thread.run(Thread.java:748) > {code} > Code that reproduces the problem: > {code:java} > package flink.bug; > import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; > import org.apache.flink.streaming.api.functions.sink.SinkFunction; > import org.apache.flink.streaming.api.windowing.time.Time; > public class App { > public static void main(String[] args) throws Exception { > StreamExecutionEnvironment env = > StreamExecutionEnvironment.getExecutionEnvironment(); > env.setParallelism(2); > env.fromElements(1, 2) > .map(Aggregate::new) > .keyBy(Aggregate::getKey) > .timeWindow(Time.seconds(2)) > .reduce(Aggregate::reduce) > .addSink(new CollectSink()); > env.execute(); > } > private static class Aggregate { > private Key key = new Key(); > public 
Aggregate(long number) { > } > public static Aggregate reduce(Aggregate a, Aggregate b) { > return new Aggregate(0); > } > public Key getKey() { > return key; > } > } > public static class Key { > } > private static class CollectSink implements SinkFunction { > private static final long serialVersionUID = 1; > @SuppressWarnings("rawtypes") > @Override > public void
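The reproducer above (truncated in this archive) keys the stream by a {{Key}} class that overrides neither {{equals()}} nor {{hashCode()}}. Flink derives the key group from the key's hash, so an identity-based hash code that differs between instances (for example, before and after serialization) can produce an out-of-range local key-group index, which is consistent with the {{IllegalArgumentException}} in {{globalKeyGroupToLocalIndex}}. A self-contained sketch of the failure mode, with a simplified stand-in for the key-group computation (the names below are ours, not Flink's):

```java
// Sketch: why keyBy() keys need stable equals()/hashCode().
// keyGroup() is a simplified stand-in for Flink's key-group assignment,
// not the real implementation.
public class Main {
    static class BadKey { }  // no equals/hashCode, as in the bug report

    static class GoodKey {
        @Override public boolean equals(Object o) { return o instanceof GoodKey; }
        @Override public int hashCode() { return 1; }  // stable across instances
    }

    // Hypothetical simplified key-group assignment: hash modulo max parallelism.
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    public static void main(String[] args) {
        // Two logically equal GoodKey instances always land in the same group.
        System.out.println(keyGroup(new GoodKey(), 128) == keyGroup(new GoodKey(), 128)); // prints true
        // Two BadKey instances use identity hash codes, so their groups
        // usually differ, mis-routing records that were meant to be equal keys.
        System.out.println(keyGroup(new BadKey(), 128));
        System.out.println(keyGroup(new BadKey(), 128));
    }
}
```

Defining value-based {{equals()}}/{{hashCode()}} on the key class avoids this class of problem regardless of where the framework bug lies.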
[jira] [Assigned] (FLINK-11141) Key generation for RocksDBMapState can theoretically be ambiguous
[ https://issues.apache.org/jira/browse/FLINK-11141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) reassigned FLINK-11141: - Assignee: (was: Congxian Qiu(klion26)) > Key generation for RocksDBMapState can theoretically be ambiguous > - > > Key: FLINK-11141 > URL: https://issues.apache.org/jira/browse/FLINK-11141 > Project: Flink > Issue Type: Bug > Components: Runtime / State Backends >Affects Versions: 1.5.5, 1.6.2, 1.7.0 >Reporter: Stefan Richter >Priority: Critical > > RocksDBMap state stores values in RocksDB under a composite key from the > serialized bytes of {{key-group-id|key|namespace|user-key}}. In this > composition, key, namespace, and user-key can either have fixed sized or > variable sized serialization formats. In cases of at least 2 variable > formats, ambiguity can be possible, e.g.: > abcd <-> efg > abc <-> defg > Our code takes care of this for all other states, where composite keys only > consist of key and namespace by checking for 2x variable size and appending > the serialized length to each byte sequence. > However, for map state there is no inclusion of the user-key in the check for > potential ambiguity, as well as for appending the size. This means that, in > theory, some combinations can produce colliding composite keys in RocksDB. > What is required is to include the user-key serializer in the check and > append the length there as well. > Please notice that this cannot be simply changed because it has implications > for backwards compatibility and requires some form of migration for the state > keys on restore. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
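The ambiguity described in FLINK-11141 and its standard fix can be shown concretely: concatenating two variable-length byte sequences without any length information makes ("abcd", "efg") and ("abc", "defg") produce identical composite bytes, while prefixing each part with its length keeps them distinct. This is a minimal sketch of the principle, not Flink's actual key serialization code:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Sketch: ambiguity of naive concatenation vs length-prefixed composition
// for variable-length key parts (illustrative only; not Flink's serializer).
public class Main {
    // Naive composite key: just the two parts back to back.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Length-prefixed composite key: each part preceded by its length
    // (single-byte lengths suffice for this sketch).
    static byte[] lengthPrefixed(byte[] a, byte[] b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(a.length); out.write(a, 0, a.length);
        out.write(b.length); out.write(b, 0, b.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] x1 = "abcd".getBytes(), y1 = "efg".getBytes();
        byte[] x2 = "abc".getBytes(),  y2 = "defg".getBytes();
        // Both naive composites are the bytes of "abcdefg" -> collision.
        System.out.println(Arrays.equals(concat(x1, y1), concat(x2, y2)));                 // prints true
        // With length prefixes the composites differ -> no collision.
        System.out.println(Arrays.equals(lengthPrefixed(x1, y1), lengthPrefixed(x2, y2))); // prints false
    }
}
```

As the issue notes, applying this to the user-key part of RocksDBMapState keys is the easy half; migrating existing state written in the old format is the hard half.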
[jira] [Resolved] (FLINK-11483) Improve StreamOperatorSnapshotRestoreTest with Parameterized
[ https://issues.apache.org/jira/browse/FLINK-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Congxian Qiu(klion26) resolved FLINK-11483. --- Resolution: Fixed > Improve StreamOperatorSnapshotRestoreTest with Parameterized > > > Key: FLINK-11483 > URL: https://issues.apache.org/jira/browse/FLINK-11483 > Project: Flink > Issue Type: Test > Components: State Backends, Checkpointing, Tests >Reporter: Congxian Qiu(klion26) >Assignee: Congxian Qiu(klion26) >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > In the current implementation, we test {{StreamOperatorSnapshot}} with three > state backends: {{File}}, {{RocksDB_FULL}} and {{RocksDB_Incremental}}, each in a > separate class; we could improve this with Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
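The improvement resolved above is the standard parameterized-test pattern: one test body run once per state backend instead of three near-identical classes. In the actual change this would use JUnit's {{@RunWith(Parameterized.class)}}; the sketch below uses a plain loop so it stays self-contained, and the test body is a hypothetical placeholder:

```java
// Sketch of the Parameterized idea: one shared test body, one run per
// backend. Backend names come from the issue description; the test body
// here is a placeholder, not the real StreamOperatorSnapshotRestoreTest.
public class Main {
    enum Backend { FILE, ROCKSDB_FULL, ROCKSDB_INCREMENTAL }

    // In JUnit this body would be the @Test method; the backend would be
    // injected via the @Parameterized.Parameters constructor argument.
    static String runSnapshotRestoreTest(Backend backend) {
        // ... shared snapshot/restore assertions would go here ...
        return "passed with " + backend;
    }

    public static void main(String[] args) {
        for (Backend b : Backend.values()) {  // one execution per parameter
            System.out.println(runSnapshotRestoreTest(b));
        }
    }
}
```

The same pattern applies to FLINK-11704 below, where two output-stream test classes differ only in the stream and state handle they construct.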
[jira] [Commented] (FLINK-11738) Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher15e85f5d-a55d-4773-8197-f0db5658f55b#1335897563]] after [10000 ms]. Se
[ https://issues.apache.org/jira/browse/FLINK-11738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776596#comment-16776596 ] Congxian Qiu(klion26) commented on FLINK-11738: --- Hi [~thinktothings], I think FLINK-11690 has solved this problem; you can get the latest code from the master branch. If the problem has been resolved, could you please close this issue? > Caused by: akka.pattern.AskTimeoutException: Ask timed out on > [Actor[akka://flink/user/dispatcher15e85f5d-a55d-4773-8197-f0db5658f55b#1335897563]] > after [10000 ms]. Sender[null] sent > -- > > Key: FLINK-11738 > URL: https://issues.apache.org/jira/browse/FLINK-11738 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.7.2 > Environment: flink 1.7.2 client > !image-2019-02-25-10-57-20-106.png! > > !image-2019-02-25-10-57-32-876.png! > > > !image-2019-02-25-10-57-39-753.png! > > > > > >Reporter: thinktothings >Priority: Major > Attachments: image-2019-02-25-13-11-13-723.png > > > The akka.ask.timeout of 10 seconds is hard-coded in this MiniCluster environment; can it not be changed? > - > org.apache.flink.runtime.minicluster.MiniCluster > /** > * Creates a new Flink mini cluster based on the given configuration. > * > * @param miniClusterConfiguration The configuration for the mini cluster > */ > public MiniCluster(MiniClusterConfiguration miniClusterConfiguration) { > this.miniClusterConfiguration = checkNotNull(miniClusterConfiguration, > "config may not be null"); this.rpcTimeout = Time.seconds(10L); > this.terminationFuture = CompletableFuture.completedFuture(null); running = > false; } > - > !image-2019-02-25-13-11-13-723.png! 
> > > - > > package com.opensourceteams.module.bigdata.flink.example.stream.worldcount.nc > import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment > import org.apache.flink.streaming.api.windowing.time.Time > /** > * nc -lk 1234 to enter input data > */ > object SocketWindowWordCount { > def main(args: Array[String]): Unit = { > val port = 1234 > // get the execution environment > val env: StreamExecutionEnvironment = > StreamExecutionEnvironment.getExecutionEnvironment > // get input data by connecting to the socket > val dataStream = env.socketTextStream("localhost", port, '\n') > import org.apache.flink.streaming.api.scala._ > val textResult = dataStream.flatMap( w => w.split("\\s") ).map( w => WordWithCount(w,1)) > .keyBy("word") > /** > * Refresh every 5 seconds, which is equivalent to restarting the count. > * The benefit is that we do not need to keep aggregating over all the data; > * only the incremental data within the given interval is needed, which reduces the data size. > */ > .timeWindow(Time.seconds(5)) > .sum("count" ) > textResult.print().setParallelism(1) > if(args == null || args.size ==0) > { env.execute("default job") } > else > { env.execute(args(0)) } > println("done") > } > // Data type for words with count > case class WordWithCount(word: String, count: Long) > } > > - -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-10481) Wordcount end-to-end test in docker env unstable
[ https://issues.apache.org/jira/browse/FLINK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773755#comment-16773755 ] Congxian Qiu(klion26) commented on FLINK-10481: --- another instance: https://api.travis-ci.org/v3/job/496339787/log.txt > Wordcount end-to-end test in docker env unstable > > > Key: FLINK-10481 > URL: https://issues.apache.org/jira/browse/FLINK-10481 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.7.0 >Reporter: Till Rohrmann >Assignee: Dawid Wysakowicz >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.6.3, 1.7.0 > > > The {{Wordcount end-to-end test in docker env}} fails sometimes on Travis > with the following problem: > {code} > Status: Downloaded newer image for java:8-jre-alpine > ---> fdc893b19a14 > Step 2/16 : RUN apk add --no-cache bash snappy > ---> [Warning] IPv4 forwarding is disabled. Networking will not work. > ---> Running in 4329ebcd8a77 > fetch http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz > WARNING: Ignoring > http://dl-cdn.alpinelinux.org/alpine/v3.4/main/x86_64/APKINDEX.tar.gz: > temporary error (try again later) > fetch > http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz > WARNING: Ignoring > http://dl-cdn.alpinelinux.org/alpine/v3.4/community/x86_64/APKINDEX.tar.gz: > temporary error (try again later) > ERROR: unsatisfiable constraints: > bash (missing): > required by: world[bash] > snappy (missing): > required by: world[snappy] > The command '/bin/sh -c apk add --no-cache bash snappy' returned a non-zero > code: 2 > {code} > https://api.travis-ci.org/v3/job/434909395/log.txt > It seems as if it is related to > https://github.com/gliderlabs/docker-alpine/issues/264 and > https://github.com/gliderlabs/docker-alpine/issues/279. > We might want to switch to a different base image to avoid these problems in > the future. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11704) Improve AbstractCheckpointStateOutputStreamTestBase with Parameterized
Congxian Qiu(klion26) created FLINK-11704: - Summary: Improve AbstractCheckpointStateOutputStreamTestBase with Parameterized Key: FLINK-11704 URL: https://issues.apache.org/jira/browse/FLINK-11704 Project: Flink Issue Type: Improvement Components: Tests Reporter: Congxian Qiu(klion26) Assignee: Congxian Qiu(klion26) In the current version, there is an {{AbstractCheckpointStateOutputStreamTestBase}} and two implementation classes, {{FileBasedStateOutputStreamTest}} and {{FsCheckpointMetadataOutputStreamTest}}; the differences are that they return different {{FSDataOutputStream}} implementations and the specified {{FileStateHandle}}. We can improve this by using JUnit's Parameterized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)