[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412814#comment-16412814 ]

Steven Zhen Wu commented on FLINK-9061:
---------------------------------------

[~jgrier] Yes, we want to contribute this back. We can probably partner with you to get the change upstream.

> S3 checkpoint data not partitioned well -- causes errors and poor performance
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-9061
>                 URL: https://issues.apache.org/jira/browse/FLINK-9061
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystem, State Backends, Checkpointing
>    Affects Versions: 1.4.2
>            Reporter: Jamie Grier
>            Priority: Critical
>
> I think we need to modify the way we write checkpoints to S3 for high-scale
> jobs (those with many total tasks). The issue is that we are writing all the
> checkpoint data under a common key prefix. This is the worst-case scenario
> for S3 performance, since the key is used as a partition key.
>
> In the worst case, checkpoints fail with a 500 status code coming back from
> S3 and an internal error type of TooBusyException.
>
> One possible solution would be to add a hook in the Flink filesystem code
> that allows me to "rewrite" paths. For example, say I have the checkpoint
> directory set to:
>
> s3://bucket/flink/checkpoints
>
> I would hook that and rewrite the path to:
>
> s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original
> path.
>
> This would distribute the checkpoint write load around the S3 cluster evenly.
>
> For reference:
> https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/
>
> Has anyone else hit this issue? Any other ideas for solutions? This is a
> pretty serious problem for people trying to checkpoint to S3.
>
> -Jamie

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412737#comment-16412737 ]

Jamie Grier commented on FLINK-9061:
------------------------------------

[~stevenz3wu] Did you contribute those changes back to Flink? I think this will be affecting others as well. Would you consider contributing them back to the project? Otherwise I will do this, but if you already have something working we might as well use it or base changes on it.

> S3 checkpoint data not partitioned well -- causes errors and poor performance
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-9061
>                 URL: https://issues.apache.org/jira/browse/FLINK-9061
>          Components: FileSystem, State Backends, Checkpointing
>    Affects Versions: 1.4.2
>            Reporter: Jamie Grier
>            Priority: Critical
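The [HASH] rewrite proposed in the issue can be sketched as follows. This is a minimal illustration, not Flink's actual filesystem hook: `EntropyPath`, `rewrite`, and `shortHash` are hypothetical names, and MD5 with four hex characters is an arbitrary choice of hash and prefix width.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Sketch: inject a hash of the original key as the first key component,
 * so checkpoint writes fan out across S3's key-range partitions instead
 * of hammering one common prefix.
 */
public final class EntropyPath {

    /** Rewrites "s3://bucket/key..." to "s3://bucket/HASH/key...". */
    static String rewrite(String path) {
        int schemeEnd = path.indexOf("://") + 3;
        int bucketEnd = path.indexOf('/', schemeEnd);
        String bucket = path.substring(0, bucketEnd);   // "s3://bucket"
        String key = path.substring(bucketEnd + 1);     // "flink/checkpoints"
        return bucket + "/" + shortHash(key) + "/" + key;
    }

    /** Deterministic short hash: first two digest bytes as 4 hex chars. */
    static String shortHash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%02x%02x", d[0], d[1]);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("s3://bucket/flink/checkpoints"));
    }
}
```

Because the prefix is derived deterministically from the original key, any reader that knows the configured checkpoint directory can recompute the prefix and locate the files; four hex characters already spread the load over up to 65536 key ranges.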
[jira] [Updated] (FLINK-9069) Fix some double semicolons to single semicolons, and update checkstyle
[ https://issues.apache.org/jira/browse/FLINK-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Park updated FLINK-9069:
------------------------------
    Description:
As I was reading through the source code, I noticed that there were some double semicolons ";;". These should be fixed. Finally, the tools/maven/checkstyle.xml should be updated accordingly.

  was:
As I was reading through the source code, I noticed that there were some double semi-colons ";;". These should be fixed. Finally, the tools/maven/checkstyle.xml should be updated accordingly.

> Fix some double semicolons to single semicolons, and update checkstyle
> ----------------------------------------------------------------------
>
>                 Key: FLINK-9069
>                 URL: https://issues.apache.org/jira/browse/FLINK-9069
>             Project: Flink
>          Issue Type: Task
>          Components: Checkstyle
>            Reporter: Jacob Park
>            Priority: Trivial
>              Labels: starter
>
> As I was reading through the source code, I noticed that there were some
> double semicolons ";;".
> These should be fixed.
> Finally, the tools/maven/checkstyle.xml should be updated accordingly.
[jira] [Updated] (FLINK-9069) Fix some double semicolons to single semicolons, and update checkstyle
[ https://issues.apache.org/jira/browse/FLINK-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacob Park updated FLINK-9069:
------------------------------
    Summary: Fix some double semicolons to single semicolons, and update checkstyle  (was: Fix some double semi-colons to single semi-colons, and update checkstyle)

> Fix some double semicolons to single semicolons, and update checkstyle
> ----------------------------------------------------------------------
>
>                 Key: FLINK-9069
>                 URL: https://issues.apache.org/jira/browse/FLINK-9069
>            Reporter: Jacob Park
>            Priority: Trivial
>              Labels: starter
[jira] [Commented] (FLINK-9069) Fix some double semi-colons to single semi-colons, and update checkstyle
[ https://issues.apache.org/jira/browse/FLINK-9069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412690#comment-16412690 ]

Jacob Park commented on FLINK-9069:
-----------------------------------

Can I be assigned to this ticket? I want to use it to learn the contribution process for Apache Flink.

> Fix some double semi-colons to single semi-colons, and update checkstyle
> ------------------------------------------------------------------------
>
>                 Key: FLINK-9069
>                 URL: https://issues.apache.org/jira/browse/FLINK-9069
>            Reporter: Jacob Park
>            Priority: Trivial
>              Labels: starter
[jira] [Created] (FLINK-9069) Fix some double semi-colons to single semi-colons, and update checkstyle
Jacob Park created FLINK-9069:
---------------------------------

             Summary: Fix some double semi-colons to single semi-colons, and update checkstyle
                 Key: FLINK-9069
                 URL: https://issues.apache.org/jira/browse/FLINK-9069
             Project: Flink
          Issue Type: Task
          Components: Checkstyle
            Reporter: Jacob Park

As I was reading through the source code, I noticed that there were some double semi-colons ";;". These should be fixed. Finally, the tools/maven/checkstyle.xml should be updated accordingly.
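For the checkstyle part of the ticket, either of the following rules would flag a stray ";;" automatically. This is a sketch only; the exact placement inside tools/maven/checkstyle.xml is an assumption, not something stated in the issue.

```xml
<!-- Option 1 (child of the TreeWalker module): flags empty statements;
     a ";;" after a statement produces one. -->
<module name="EmptyStatement"/>

<!-- Option 2 (child of the Checker module): plain text match for ";;". -->
<module name="RegexpSingleline">
  <property name="format" value=";;"/>
  <property name="message" value="Double semicolon."/>
</module>
```

`EmptyStatement` works at the AST level and so ignores ";;" inside strings and comments, which makes it the less noisy of the two options.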
[GitHub] flink pull request #5760: [hotfix] [doc] fix maven version in building flink
GitHub user bowenli86 opened a pull request:

    https://github.com/apache/flink/pull/5760

    [hotfix] [doc] fix maven version in building flink

    ## What is the purpose of the change

    The maven version in `start/building` is inconsistent. Make it consistent by changing the maven version to 3.0.4.

    ## Brief change log

    Changed the maven version in `start/building` to 3.0.4.

    ## Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    ## Does this pull request potentially affect one of the following parts:

    none

    ## Documentation

    none

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bowenli86/flink hotfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5760.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #5760

----
commit 8a17ccedf33bed267f4dcfeef185cc589fb70fe1
Author: Bowen Li
Date:   2018-03-24T15:11:16Z

    [hotfix] fix maven version in building flink

---
[GitHub] flink issue #5704: [FLINK-8852] [sql-client] Add FLIP-6 support to SQL Clien...
Github user liurenjie1024 commented on the issue:

    https://github.com/apache/flink/pull/5704

    LGTM

---
[jira] [Commented] (FLINK-8852) SQL Client does not work with new FLIP-6 mode
[ https://issues.apache.org/jira/browse/FLINK-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412633#comment-16412633 ]

ASF GitHub Bot commented on FLINK-8852:
---------------------------------------

Github user liurenjie1024 commented on the issue:

    https://github.com/apache/flink/pull/5704

    LGTM

> SQL Client does not work with new FLIP-6 mode
> ---------------------------------------------
>
>                 Key: FLINK-8852
>                 URL: https://issues.apache.org/jira/browse/FLINK-8852
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: Fabian Hueske
>            Assignee: Timo Walther
>            Priority: Blocker
>             Fix For: 1.5.0
>
> The SQL Client does not submit queries to a local Flink cluster that runs in
> FLIP-6 mode. It doesn't throw an exception either.
> Job submission works if the legacy Flink cluster mode is used (`mode: old`).
[jira] [Commented] (FLINK-9067) End-to-end test: Stream SQL query with failure and retry
[ https://issues.apache.org/jira/browse/FLINK-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412546#comment-16412546 ]

ASF GitHub Bot commented on FLINK-9067:
---------------------------------------

Github user zhangminglei commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5759#discussion_r176906809

    --- Diff: flink-end-to-end-tests/test-scripts/test_streaming_sql.sh ---
    @@ -0,0 +1,43 @@
    +#!/usr/bin/env bash
    +
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +
    +
    +source "$(dirname "$0")"/common.sh
    +
    +TEST_PROGRAM_JAR=$TEST_INFRA_DIR/../../flink-end-to-end-tests/flink-stream-sql-test/target/StreamSQLTestProgram.jar
    +
    +# copy flink-table jar into lib folder
    +cp $FLINK_DIR/opt/flink-table*jar $FLINK_DIR/lib
    --- End diff --

    Oh. I think it is because you start taskmanager.sh, so you have to copy that. Not sure about it.

> End-to-end test: Stream SQL query with failure and retry
> --------------------------------------------------------
>
>                 Key: FLINK-9067
>                 URL: https://issues.apache.org/jira/browse/FLINK-9067
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL, Tests
>    Affects Versions: 1.5.0
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>            Priority: Major
>
> Implement a test job that runs a streaming SQL query with a temporary
> failure and recovery.
[GitHub] flink pull request #5759: [FLINK-9067] [e2eTests] Add StreamSQLTestProgram a...
Github user zhangminglei commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5759#discussion_r176906809

    --- Diff: flink-end-to-end-tests/test-scripts/test_streaming_sql.sh ---
    @@ -0,0 +1,43 @@
    +#!/usr/bin/env bash
    +
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +
    +
    +source "$(dirname "$0")"/common.sh
    +
    +TEST_PROGRAM_JAR=$TEST_INFRA_DIR/../../flink-end-to-end-tests/flink-stream-sql-test/target/StreamSQLTestProgram.jar
    +
    +# copy flink-table jar into lib folder
    +cp $FLINK_DIR/opt/flink-table*jar $FLINK_DIR/lib
    --- End diff --

    Oh. I think it is because you start taskmanager.sh, so you have to copy that. Not sure about it.

---
[jira] [Commented] (FLINK-9067) End-to-end test: Stream SQL query with failure and retry
[ https://issues.apache.org/jira/browse/FLINK-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412543#comment-16412543 ]

ASF GitHub Bot commented on FLINK-9067:
---------------------------------------

Github user zhangminglei commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5759#discussion_r176906704

    --- Diff: flink-end-to-end-tests/test-scripts/test_streaming_sql.sh ---
    @@ -0,0 +1,43 @@
    +#!/usr/bin/env bash
    +
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +
    +
    +source "$(dirname "$0")"/common.sh
    +
    +TEST_PROGRAM_JAR=$TEST_INFRA_DIR/../../flink-end-to-end-tests/flink-stream-sql-test/target/StreamSQLTestProgram.jar
    +
    +# copy flink-table jar into lib folder
    +cp $FLINK_DIR/opt/flink-table*jar $FLINK_DIR/lib
    --- End diff --

    Hi, @fhueske, I have some small questions here. I can see that there are some jar files in the example folder (```$FLINK_DIR/example```), some jar files in the ```$FLINK_DIR/lib``` folder, and other jar files in the ```$FLINK_DIR/opt``` folder. Why did you copy this jar file from the opt folder to the lib folder instead of putting it in ```$FLINK_DIR/example```? Thanks.

> End-to-end test: Stream SQL query with failure and retry
> --------------------------------------------------------
>
>                 Key: FLINK-9067
>                 URL: https://issues.apache.org/jira/browse/FLINK-9067
>            Reporter: Fabian Hueske
>            Assignee: Fabian Hueske
>            Priority: Major
[GitHub] flink pull request #5759: [FLINK-9067] [e2eTests] Add StreamSQLTestProgram a...
Github user zhangminglei commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5759#discussion_r176906704

    --- Diff: flink-end-to-end-tests/test-scripts/test_streaming_sql.sh ---
    @@ -0,0 +1,43 @@
    +#!/usr/bin/env bash
    +
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements.  See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership.  The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License.  You may obtain a copy of the License at
    +#
    +#     http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +
    +
    +source "$(dirname "$0")"/common.sh
    +
    +TEST_PROGRAM_JAR=$TEST_INFRA_DIR/../../flink-end-to-end-tests/flink-stream-sql-test/target/StreamSQLTestProgram.jar
    +
    +# copy flink-table jar into lib folder
    +cp $FLINK_DIR/opt/flink-table*jar $FLINK_DIR/lib
    --- End diff --

    Hi, @fhueske, I have some small questions here. I can see that there are some jar files in the example folder (```$FLINK_DIR/example```), some jar files in the ```$FLINK_DIR/lib``` folder, and other jar files in the ```$FLINK_DIR/opt``` folder. Why did you copy this jar file from the opt folder to the lib folder instead of putting it in ```$FLINK_DIR/example```? Thanks.

---
[jira] [Commented] (FLINK-9060) Deleting state using KeyedStateBackend.getKeys() throws Exception
[ https://issues.apache.org/jira/browse/FLINK-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412480#comment-16412480 ]

ASF GitHub Bot commented on FLINK-9060:
---------------------------------------

Github user sihuazhou commented on the issue:

    https://github.com/apache/flink/pull/5751

    @kl0u Got it! Addressing...

> Deleting state using KeyedStateBackend.getKeys() throws Exception
> -----------------------------------------------------------------
>
>                 Key: FLINK-9060
>                 URL: https://issues.apache.org/jira/browse/FLINK-9060
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Aljoscha Krettek
>            Assignee: Sihua Zhou
>            Priority: Blocker
>             Fix For: 1.5.0
>
> Adding this test to {{StateBackendTestBase}} showcases the problem:
> {code}
> @Test
> public void testConcurrentModificationWithGetKeys() throws Exception {
> 	AbstractKeyedStateBackend<Integer> backend =
> 		createKeyedBackend(IntSerializer.INSTANCE);
>
> 	try {
> 		ListStateDescriptor<String> listStateDescriptor =
> 			new ListStateDescriptor<>("foo", StringSerializer.INSTANCE);
>
> 		backend.setCurrentKey(1);
> 		backend
> 			.getPartitionedState(VoidNamespace.INSTANCE,
> 				VoidNamespaceSerializer.INSTANCE, listStateDescriptor)
> 			.add("Hello");
>
> 		backend.setCurrentKey(2);
> 		backend
> 			.getPartitionedState(VoidNamespace.INSTANCE,
> 				VoidNamespaceSerializer.INSTANCE, listStateDescriptor)
> 			.add("Ciao");
>
> 		Stream<Integer> keys = backend
> 			.getKeys(listStateDescriptor.getName(), VoidNamespace.INSTANCE);
>
> 		keys.forEach((key) -> {
> 			backend.setCurrentKey(key);
> 			try {
> 				backend
> 					.getPartitionedState(
> 						VoidNamespace.INSTANCE,
> 						VoidNamespaceSerializer.INSTANCE,
> 						listStateDescriptor)
> 					.clear();
> 			} catch (Exception e) {
> 				e.printStackTrace();
> 			}
> 		});
> 	} finally {
> 		IOUtils.closeQuietly(backend);
> 		backend.dispose();
> 	}
> }
> {code}
> This should work because one of the use cases of {{getKeys()}} and
> {{applyToAllKeys()}} is to do stuff for every key, which includes deleting
> them.
[GitHub] flink issue #5751: [FLINK-9060][state] Deleting state using KeyedStateBacken...
Github user sihuazhou commented on the issue:

    https://github.com/apache/flink/pull/5751

    @kl0u Got it! Addressing...

---
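The failure mode behind FLINK-9060 is the classic iterate-while-mutating hazard: the key stream iterates over the backend's live key set while `clear()` removes entries from it. A minimal sketch of the usual workaround (snapshot the keys before mutating), with a plain `HashMap` standing in for the keyed state backend; all names here are illustrative, not Flink API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SafeDeleteSketch {
    public static void main(String[] args) {
        // A plain map stands in for the keyed state backend.
        Map<Integer, String> state = new HashMap<>();
        state.put(1, "Hello");
        state.put(2, "Ciao");

        // Unsafe: iterating the live key set while removing entries is the
        // same iterate-while-mutating pattern that makes the test above fail.
        // state.keySet().forEach(state::remove); // ConcurrentModificationException

        // Safe: snapshot the keys first, then mutate.
        List<Integer> keys = new ArrayList<>(state.keySet());
        for (Integer key : keys) {
            state.remove(key); // analogous to setCurrentKey(key) + clear()
        }
        System.out.println(state.isEmpty()); // prints "true"
    }
}
```

Materializing the keys costs O(number of keys) extra memory, which is why a backend-side fix (an iterator that tolerates deletion, as the linked PR pursues) is preferable for large state.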