[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1056

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/1056#issuecomment-134982439

    This looks good. I'm merging this to the release branch. We have an issue to add tests for the output formats, so it's fine that this does not include a test yet.
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/1056#issuecomment-134982359

    LGTM. +1 for merging.
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1056#discussion_r37969330

    --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/hadoop/mapred/HadoopOutputFormat.scala ---
    @@ -18,11 +18,17 @@
     package org.apache.flink.api.scala.hadoop.mapred
     import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormatBase
    -import org.apache.hadoop.mapred.{JobConf, OutputFormat}
    +import org.apache.hadoop.mapred.{OutputCommitter, JobConf, OutputFormat}
     class HadoopOutputFormat[K, V](mapredOutputFormat: OutputFormat[K, V], job: JobConf)
       extends HadoopOutputFormatBase[K, V, (K, V)](mapredOutputFormat, job) {
    +  def this(mapredOutputFormat: OutputFormat[K, V], outputCommitterClass: Class[OutputCommitter],
    +    job: JobConf) {
    --- End diff --

    Fixed. I'd propose to add this to the Scala checkstyle, if we want to enforce it.
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1056#discussion_r37968466

    --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/hadoop/mapred/HadoopOutputFormat.scala ---
    @@ -18,11 +18,17 @@
     package org.apache.flink.api.scala.hadoop.mapred
     import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormatBase
    -import org.apache.hadoop.mapred.{JobConf, OutputFormat}
    +import org.apache.hadoop.mapred.{OutputCommitter, JobConf, OutputFormat}
     class HadoopOutputFormat[K, V](mapredOutputFormat: OutputFormat[K, V], job: JobConf)
       extends HadoopOutputFormatBase[K, V, (K, V)](mapredOutputFormat, job) {
    +  def this(mapredOutputFormat: OutputFormat[K, V], outputCommitterClass: Class[OutputCommitter],
    +    job: JobConf) {
    --- End diff --

    Can we use multi-line parameter lists as in other Scala files?
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/1056

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

    Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.

    - In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overridden and returns a `FileOutputCommitter` by default.
    - In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.

    This PR uses the respective methods to obtain the correct `OutputCommitter`.

    Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if no custom committer is implemented or set by the user.

    I also added convenience constructors to the `mapred` wrappers that set the `OutputCommitter` in the `JobConf`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1056

commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske
Date: 2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.
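
The `mapred` lookup described in this pull request can be illustrated with a minimal, dependency-free Java sketch. It is not the code from the PR: `Committer`, `FileCommitter`, `CustomCommitter`, `Conf`, and `Wrapper` are hypothetical stand-ins for Hadoop's `OutputCommitter`, `FileOutputCommitter`, a user-defined committer, `JobConf`, and Flink's wrapper, respectively. The sketch only shows the selection rule: ask the configuration for the committer, fall back to the file-based default when nothing custom is set, and let a convenience constructor record a custom committer class in the configuration.

```java
// Stand-in types only: nothing here is a Hadoop or Flink class.
class CommitterSelectionSketch {

    /** Hypothetical stand-in for Hadoop's OutputCommitter. */
    interface Committer {
        String name();
    }

    /** Plays the role of the default FileOutputCommitter. */
    static class FileCommitter implements Committer {
        public String name() { return "file"; }
    }

    /** Plays the role of a user-supplied committer. */
    static class CustomCommitter implements Committer {
        public String name() { return "custom"; }
    }

    /** Hypothetical stand-in for JobConf: stores an optional committer class. */
    static class Conf {
        private Class<? extends Committer> committerClass; // null means "nothing custom set"

        void setOutputCommitter(Class<? extends Committer> committerClass) {
            this.committerClass = committerClass;
        }

        /** Returns the configured committer, or the file-based default. */
        Committer getOutputCommitter() {
            Class<? extends Committer> chosen =
                committerClass != null ? committerClass : FileCommitter.class;
            try {
                return chosen.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + chosen, e);
            }
        }
    }

    /** Hypothetical stand-in for the wrapper around a mapred OutputFormat. */
    static class Wrapper {
        private final Conf conf;

        Wrapper(Conf conf) {
            this.conf = conf;
        }

        /** Convenience constructor: records the committer class in the configuration. */
        Wrapper(Class<? extends Committer> committerClass, Conf conf) {
            this(conf);
            conf.setOutputCommitter(committerClass);
        }

        /** Asks the configuration instead of hard-coding the file-based committer. */
        Committer committer() {
            return conf.getOutputCommitter();
        }
    }

    public static void main(String[] args) {
        // No custom committer set: the default is used.
        System.out.println(new Wrapper(new Conf()).committer().name());
        // Custom committer passed through the convenience constructor.
        System.out.println(new Wrapper(CustomCommitter.class, new Conf()).committer().name());
    }
}
```

Because the default branch still yields the file-based committer, a caller that never sets a committer sees the same behavior as before, which is the compatibility argument made in the pull request description.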