[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593478#comment-16593478 ]

ASF GitHub Bot commented on FLINK-10218:

link3280 closed pull request #6616: [FLINK-10218] Allow writing DataSet without explicit path parameter
URL: https://github.com/apache/flink/pull/6616

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java b/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java
index 3dd4f6a8216..976a3c65f8c 100644
--- a/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java
+++ b/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java
@@ -1727,6 +1727,21 @@ public void printToErr() throws Exception {
 		return output(new PrintingOutputFormat(sinkIdentifier, true));
 	}
 
+	/**
+	 * Writes a DataSet using a {@link FileOutputFormat} to a specified location.
+	 * This method adds a data sink to the program.
+	 *
+	 * @param outputFormat The FileOutputFormat to write the DataSet.
+	 * @return The DataSink that writes the DataSet.
+	 *
+	 * @see FileOutputFormat
+	 */
+	public DataSink write(FileOutputFormat outputFormat) {
+		Preconditions.checkNotNull(outputFormat, "Output format must not be null.");
+		Preconditions.checkNotNull(outputFormat.getOutputFilePath(), "File path must not be null.");
+		return output(outputFormat);
+	}
+
 	/**
 	 * Writes a DataSet using a {@link FileOutputFormat} to a specified location.
 	 * This method adds a data sink to the program.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Allow writing DataSet without explicit path parameter
> -----------------------------------------------------
>
> Key: FLINK-10218
> URL: https://issues.apache.org/jira/browse/FLINK-10218
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Affects Versions: 1.6.0
> Reporter: Paul Lin
> Priority: Minor
> Labels: pull-request-available
>
> Currently, DataSet API has two overloaded `write` methods for using
> FileOutputFormat as output format, and both require a path parameter, but the
> output path could already be set in the FileOutputFormat object. What's more,
> the subclasses of FileOutputFormat mostly don't have default constructors and
> required a path parameter too, so users have to set output path twice in the
> code, like:
> {code:java}
> String output = "hdfs:///tmp/";
> dataset.write(new TextOutputFormat<>(new Path(output)), output);
> {code}
> So I propose to add another write helper method that requires no path
> parameter. May someone assign this issue to me?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
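Editor's sketch of the overload proposed in the diff above. It is a minimal, self-contained illustration and not Flink code: `SketchFileOutputFormat` and `SketchDataSet` are hypothetical stand-ins for `FileOutputFormat` and `DataSet`, `Objects.requireNonNull` stands in for Flink's `Preconditions.checkNotNull`, and the string return value replaces the real `DataSink`. It only shows the behaviour the two null checks are meant to give: a format that already carries a path is accepted, and a format without one is rejected before any sink is created.

```java
import java.util.Objects;

class ProposedWriteSketch {

    // Hypothetical stand-in for Flink's FileOutputFormat: it only stores a path.
    static class SketchFileOutputFormat {
        private final String outputFilePath;

        SketchFileOutputFormat(String outputFilePath) {
            this.outputFilePath = outputFilePath;
        }

        String getOutputFilePath() {
            return outputFilePath;
        }
    }

    // Hypothetical stand-in for DataSet, reduced to the method under discussion.
    static class SketchDataSet {
        // Same checks as the proposed write(FileOutputFormat): validate the format
        // and its path, then register the sink (modelled here as a plain string).
        String write(SketchFileOutputFormat outputFormat) {
            Objects.requireNonNull(outputFormat, "Output format must not be null.");
            Objects.requireNonNull(outputFormat.getOutputFilePath(), "File path must not be null.");
            return "sink:" + outputFormat.getOutputFilePath();
        }
    }

    public static void main(String[] args) {
        SketchDataSet dataSet = new SketchDataSet();

        // The path is supplied once, through the format's constructor.
        System.out.println(dataSet.write(new SketchFileOutputFormat("hdfs:///tmp/")));

        // A format without a path fails fast with the message from the diff.
        try {
            dataSet.write(new SketchFileOutputFormat(null));
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is that the path check moves from "caller passes a second argument" to "format must already know its path", which is what removes the duplication described in the issue.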
[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593477#comment-16593477 ]

ASF GitHub Bot commented on FLINK-10218:

link3280 commented on issue #6616: [FLINK-10218] Allow writing DataSet without explicit path parameter
URL: https://github.com/apache/flink/pull/6616#issuecomment-416189573

@yanghua @zentol OK, I will close this PR. Thanks for your time.
[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593285#comment-16593285 ]

ASF GitHub Bot commented on FLINK-10218:

zentol commented on a change in pull request #6616: [FLINK-10218] Allow writing DataSet without explicit path parameter
URL: https://github.com/apache/flink/pull/6616#discussion_r212897718

## File path: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ##
@@ -1727,6 +1727,21 @@ public void printToErr() throws Exception {
 		return output(new PrintingOutputFormat(sinkIdentifier, true));
 	}
 
+	/**
+	 * Writes a DataSet using a {@link FileOutputFormat} to a specified location.
+	 * This method adds a data sink to the program.
+	 *
+	 * @param outputFormat The FileOutputFormat to write the DataSet.
+	 * @return The DataSink that writes the DataSet.
+	 *
+	 * @see FileOutputFormat
+	 */
+	public DataSink write(FileOutputFormat outputFormat) {
+		Preconditions.checkNotNull(outputFormat, "Output format must not be null.");
+		Preconditions.checkNotNull(outputFormat.getOutputFilePath(), "File path must not be null.");
+		return output(outputFormat);

Review comment:
this right here is already a viable alternative for users, hence I would reject this PR.
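Editor's sketch of the reviewer's point, under stated assumptions. The classes below are hypothetical stand-ins (`Format` for a `FileOutputFormat` subclass, `Sink` for `DataSink`, `DataSet` for Flink's `DataSet`), and the body of the two-argument `write` is an assumed shape (set the path on the format, then delegate to `output`), not a copy of Flink's source. The sketch shows why calling `output` directly already avoids passing the path twice when the format was constructed with one.

```java
class OutputAlternativeSketch {

    // Hypothetical stand-in for a FileOutputFormat subclass such as TextOutputFormat.
    static class Format {
        private String path;

        Format(String path) {
            this.path = path;
        }

        void setOutputFilePath(String path) {
            this.path = path;
        }

        String getOutputFilePath() {
            return path;
        }
    }

    // Hypothetical stand-in for DataSink: it records which format it writes with.
    static class Sink {
        final Format format;

        Sink(Format format) {
            this.format = format;
        }
    }

    // Hypothetical stand-in for DataSet.
    static class DataSet {
        // Generic sink registration, analogous to DataSet#output.
        Sink output(Format format) {
            return new Sink(format);
        }

        // Assumed shape of the existing two-argument write: set the path, then delegate.
        Sink write(Format format, String filePath) {
            format.setOutputFilePath(filePath);
            return output(format);
        }
    }

    public static void main(String[] args) {
        String output = "hdfs:///tmp/";
        DataSet dataSet = new DataSet();

        Sink viaWrite = dataSet.write(new Format(output), output); // path given twice
        Sink viaOutput = dataSet.output(new Format(output));       // path given once

        // Both sinks end up writing to the same location.
        System.out.println(
            viaWrite.format.getOutputFilePath().equals(viaOutput.format.getOutputFilePath()));
    }
}
```

Unlike the proposed overload, a direct `output` call performs no null check on the path, so a format without a path would only fail later.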
[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593225#comment-16593225 ]

ASF GitHub Bot commented on FLINK-10218:

yanghua commented on issue #6616: [FLINK-10218] Allow writing DataSet without explicit path parameter
URL: https://github.com/apache/flink/pull/6616#issuecomment-416125382

@link3280 Thanks for your contribution. It would be a good idea to add some tests for this PR.
[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593154#comment-16593154 ]

ASF GitHub Bot commented on FLINK-10218:

link3280 opened a new pull request #6616: [FLINK-10218] Allow writing DataSet without explicit path parameter
URL: https://github.com/apache/flink/pull/6616

## What is the purpose of the change

Add a file output helper method, which requires only a FileOutputFormat parameter, to the DataSet API. This avoids specifying the output path twice, since the path can be read from the FileOutputFormat.

## Brief change log

  - *Add a file output helper method, which requires only a FileOutputFormat parameter, to the DataSet API.*

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not applicable)
[jira] [Commented] (FLINK-10218) Allow writing DataSet without explicit path parameter
[ https://issues.apache.org/jira/browse/FLINK-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593135#comment-16593135 ]

vinoyang commented on FLINK-10218:

[~Paul Lin] You may not have contributor permissions at the moment. Pinging [~till.rohrmann] and [~Zentol].