[jira] [Comment Edited] (FLINK-15533) Writing DataStream as text file fails due to output path already exists
[ https://issues.apache.org/jira/browse/FLINK-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012633#comment-17012633 ]

Kostas Kloudas edited comment on FLINK-15533 at 1/10/20 9:58 AM:
-----------------------------------------------------------------

Hi [~lirui],

I tried it on the {{master}} (without my patch) with Yarn and HDFS and job submission from the command line and I cannot reproduce it.

My job is:
{code:java}
StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1, 2, 3));
dataStream.writeAsText("hdfs://" + args[0] + ":9000/tmp/output");
streamEnv.execute();
{code}
and the command in the CLI: {{./bin/flink run ./MY_JOB.jar HOSTNAME}}

Also I tried it with providing parallelism using the {{-p}} option and it still works.

Could you provide some more details so that I can reproduce it?

was (Author: kkl0u):
Hi [~lirui],

I tried it on the {{master}} (without my patch) with Yarn and HDFS and job submission from the command line and I cannot reproduce it.

My job is:
{code:java}
StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1, 2, 3));
dataStream.writeAsText("hdfs://" + args[0] + ":9000/tmp/output");
streamEnv.execute();
{code}
and the command in the CLI: {{./bin/flink run ./examples/streaming/MY_JOB.jar HOSTNAME}}

Also I tried it with providing parallelism using the {{-p}} option and it still works.

Could you provide some more details so that I can reproduce it?

> Writing DataStream as text file fails due to output path already exists
> -----------------------------------------------------------------------
>
> Key: FLINK-15533
> URL: https://issues.apache.org/jira/browse/FLINK-15533
> Project: Flink
> Issue Type: Bug
> Components: Client / Job Submission
> Affects Versions: 1.10.0
> Reporter: Rui Li
> Assignee: Kostas Kloudas
> Priority: Blocker
> Fix For: 1.10.0
>
>
> The following program reproduces the issue.
> {code}
> Configuration configuration = GlobalConfiguration.loadConfiguration();
> configuration.set(DeploymentOptions.TARGET, RemoteExecutor.NAME);
> StreamExecutionEnvironment streamEnv = new StreamExecutionEnvironment(configuration);
> DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1,2,3));
> dataStream.writeAsText("hdfs://localhost:8020/tmp/output");
> streamEnv.execute();
> {code}
> The job will fail with the following error, even though the output path doesn't exist before job submission:
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): /tmp/output already exists as a directory
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Comment Edited] (FLINK-15533) Writing DataStream as text file fails due to output path already exists
[ https://issues.apache.org/jira/browse/FLINK-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012633#comment-17012633 ]

Kostas Kloudas edited comment on FLINK-15533 at 1/10/20 9:45 AM:
-----------------------------------------------------------------

Hi [~lirui],

I tried it on the {{master}} (without my patch) with Yarn and HDFS and job submission from the command line and I cannot reproduce it.

My job is:
{code:java}
StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1, 2, 3));
dataStream.writeAsText("hdfs://" + args[0] + ":9000/tmp/output");
streamEnv.execute();
{code}
and the command in the CLI: {{./bin/flink run ./examples/streaming/MY_JOB.jar HOSTNAME}}

Also I tried it with providing parallelism using the {{-p}} option and it still works.

Could you provide some more details so that I can reproduce it?

was (Author: kkl0u):
Hi [~lirui],

I tried it on the {{master}} (without my patch) with Yarn and HDFS and job submission from the command line and I cannot reproduce it.

My job is:
{{
StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1, 2, 3));
dataStream.writeAsText("hdfs://" + args[0] + ":9000/tmp/output");
streamEnv.execute();
}}
and the command in the CLI: {{./bin/flink run ./examples/streaming/MY_JOB.jar HOSTNAME}}

Also I tried it with providing parallelism using the {{-p}} option and it still works.

Could you provide some more details so that I can reproduce it?
[jira] [Comment Edited] (FLINK-15533) Writing DataStream as text file fails due to output path already exists
[ https://issues.apache.org/jira/browse/FLINK-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011573#comment-17011573 ]

Zili Chen edited comment on FLINK-15533 at 1/9/20 9:01 AM:
-----------------------------------------------------------

Could you attach the running job DAG on JIRA? It sounds weird. If you delete the file/directory first and then submit the job with {{parallelism == 1}}, it should work, and there should be only one execution rather than a first execution followed by others; on the other hand, if you set {{parallelism > 1}}, the directory is expected to exist.

was (Author: tison):
Could you attach the running job DAG on JIRA? It sounds wired if you delete the file/directory at first and then submit the job with {{parallelism == 1}}. Also it should be the case the directory exists if you set {{parallelism > 1}}.

> Writing DataStream as text file fails due to output path already exists
> -----------------------------------------------------------------------
>
> Key: FLINK-15533
> URL: https://issues.apache.org/jira/browse/FLINK-15533
> Project: Flink
> Issue Type: Bug
> Reporter: Rui Li
> Priority: Major
>
> The following program reproduces the issue.
> {code}
> Configuration configuration = GlobalConfiguration.loadConfiguration();
> configuration.set(DeploymentOptions.TARGET, RemoteExecutor.NAME);
> StreamExecutionEnvironment streamEnv = new StreamExecutionEnvironment(configuration);
> DataStream<Integer> dataStream = streamEnv.fromCollection(Arrays.asList(1,2,3));
> dataStream.writeAsText("hdfs://localhost:8020/tmp/output");
> streamEnv.execute();
> {code}
> The job will fail with the following error, even though the output path doesn't exist before job submission:
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): /tmp/output already exists as a directory
> {noformat}
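
The parallelism point in Zili Chen's comment can be illustrated with a small standalone model. This is only a sketch and not Flink code: the class {{OutputLayout}} and the method {{pathsFor}} are hypothetical names, and the per-subtask file naming (1..N) is an assumption made for illustration. It shows why the same output path is expected to be a plain file when {{parallelism == 1}} and a directory when {{parallelism > 1}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of where each parallel subtask of a text sink writes. */
public class OutputLayout {

    /**
     * Returns the path written by each subtask.
     * Assumption: with parallelism 1 the sink writes one file at basePath itself;
     * with parallelism > 1 basePath becomes a directory holding one file per
     * subtask, named 1..N here for illustration.
     */
    static List<String> pathsFor(String basePath, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<String> paths = new ArrayList<>();
        if (parallelism == 1) {
            // Single subtask: the output path is the file.
            paths.add(basePath);
            return paths;
        }
        for (int subtask = 0; subtask < parallelism; subtask++) {
            // Several subtasks: the output path is a directory.
            paths.add(basePath + "/" + (subtask + 1));
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(pathsFor("/tmp/output", 1)); // [/tmp/output]
        System.out.println(pathsFor("/tmp/output", 3)); // [/tmp/output/1, /tmp/output/2, /tmp/output/3]
    }
}
```

Under this model, a job that effectively runs the sink with parallelism 1 but finds {{/tmp/output}} already present as a directory would hit exactly the kind of "already exists as a directory" conflict reported in the issue, which is why the effective parallelism of the submitted job matters for reproducing it.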
[jira] [Comment Edited] (FLINK-15533) Writing DataStream as text file fails due to output path already exists
[ https://issues.apache.org/jira/browse/FLINK-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011546#comment-17011546 ]

Zili Chen edited comment on FLINK-15533 at 1/9/20 8:32 AM:
-----------------------------------------------------------

[~lzljs3620320] but how? I don't see a clear reason since that method translates the default value placeholder to an effective default value.

was (Author: tison):
[~lzljs3620320] but how? I don't see a clear reason since that method fallback the default value placeholder to an effective default value.