[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-17 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/19497 I guess one aspect of `saveAsNewAPIHadoopFile` is that it calls ` jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)`, and `Configuration.set(String key, String value)`

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 Thank you @mridulm. I regret that I raised this here, causing confusion. Let's talk more in another place. I will cc you (and @jiangxb1987) when I happened to file up a JIRA or see similar

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 @HyukjinKwon Thanks for clarifying. The way I look at it is: `saveAsHadoopFile` is explicitly referring to `Output the RDD to any Hadoop-supported file system` in its description (and

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 I agree this was focussed more on the regression introduced and it should be good enough already, and I am talking about a different thing for behaviour change. Let me organise my idea

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 Currently, I meant `saveAsNewAPIHadoopFile` comparing to `saveAsHadoopFile`. ``` saveAsNewAPIHadoopFile[...]("") // succeeds ``` ``` saveAsHadoopFile[...]("") //

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 `saveAsNewAPIHadoopFile ` simply delegates to `saveAsNewAPIHadoopDataset` (with some options set), right ? The behavior would be similar ? Do you mean `saveAsHadoopDataset` instead ? I

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 I support this PR itself of course. I have no problem with this. I meant a separate (soft) question about `saveAsNewAPIHadoopFile` (not `saveAsNewAPIHadoopDataset`) to validate `path`

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 To clarify, we can look at changing the behavior (if required) in future - but that should be an explicit design choice informed by hadoop committer design. Until then, we should look to

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 @HyukjinKwon My intention was to preserve earlier behavior. Particularly for non-path based committer's, the `path` variable and its use/processing is not relevant, it makes more sense to ignore

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 @mridulm, BTW, WDYT about disallowing: ``` .saveAsNewAPIHadoopFile[...]("") .saveAsNewAPIHadoopFile[...]("::invalid:::") ``` within the APIs? If i tested this

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 Thanks for the reviews everyone ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 Thx for taking a deeper look @HyukjinKwon, much appreciated ! I will wait for @jiangxb1987 to also opine before committing - I want to make sure we are not adding incorrect behavior; given that

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 @mridulm, I just checked thought the related changes and checked the tests pass on branch-2.1. Seems this PR will actually also allow the cases below: ```scala

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82754/ Test PASSed. ---

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)** for PR 19497 at commit

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 Let me take a look with few tests and be back. Also I think I should cc @jiangxb1987 too. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)** for PR 19497 at commit

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19497 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19497 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82753/ Test FAILed. ---

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82753/testReport)** for PR 19497 at commit

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19497 **[Test build #82753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82753/testReport)** for PR 19497 at commit

[GitHub] spark issue #19497: [SPARK-21549][CORE] Respect OutputFormats with no/invali...

2017-10-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19497 +CC @HyukjinKwon, @steveloughran Sorry for messing up PR #19487 The only change in this PR is to use `::invalid::` instead of `test:` in the test to address @steveloughran's comment.