Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/19497
I guess one aspect of `saveAsNewAPIHadoopFile` is that it calls `
jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)`, and
`Configuration.set(String key, String value)`
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
Thank you @mridulm. I regret that I raised this here, causing confusion.
Let's talk more in another place. I will cc you (and @jiangxb1987) when I
happened to file up a JIRA or see similar
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
@HyukjinKwon Thanks for clarifying.
The way I look at it is:
`saveAsHadoopFile` is explicitly referring to `Output the RDD to any
Hadoop-supported file system` in its description (and
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
I agree this was focussed more on the regression introduced and it should
be good enough already, and I am talking about a different thing for behaviour
change.
Let me organise my idea
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
Currently, I meant `saveAsNewAPIHadoopFile` comparing to `saveAsHadoopFile`.
```
saveAsNewAPIHadoopFile[...]("") // succeeds
```
```
saveAsHadoopFile[...]("") //
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
`saveAsNewAPIHadoopFile ` simply delegates to `saveAsNewAPIHadoopDataset`
(with some options set), right ? The behavior would be similar ?
Do you mean `saveAsHadoopDataset` instead ?
I
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
I support this PR itself of course. I have no problem with this.
I meant a separate (soft) question about `saveAsNewAPIHadoopFile` (not
`saveAsNewAPIHadoopDataset`) to validate `path`
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
To clarify, we can look at changing the behavior (if required) in future -
but that should be an explicit design choice informed by hadoop committer
design. Until then, we should look to
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
@HyukjinKwon My intention was to preserve earlier behavior.
Particularly for non-path based committer's, the `path` variable and its
use/processing is not relevant, it makes more sense to ignore
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
@mridulm, BTW, WDYT about disallowing:
```
.saveAsNewAPIHadoopFile[...]("")
.saveAsNewAPIHadoopFile[...]("::invalid:::")
```
within the APIs? If i tested this
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
Thanks for the reviews everyone !
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
Thx for taking a deeper look @HyukjinKwon, much appreciated !
I will wait for @jiangxb1987 to also opine before committing - I want to
make sure we are not adding incorrect behavior; given that
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
@mridulm, I just checked thought the related changes and checked the tests
pass on branch-2.1.
Seems this PR will actually also allow the cases below:
```scala
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19497
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19497
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82754/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19497
**[Test build #82754 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)**
for PR 19497 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
Let me take a look with few tests and be back. Also I think I should cc
@jiangxb1987 too.
---
-
To unsubscribe, e-mail:
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19497
**[Test build #82754 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82754/testReport)**
for PR 19497 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19497
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19497
Merged build finished. Test FAILed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19497
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82753/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19497
**[Test build #82753 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82753/testReport)**
for PR 19497 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19497
**[Test build #82753 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82753/testReport)**
for PR 19497 at commit
Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/19497
+CC @HyukjinKwon, @steveloughran
Sorry for messing up PR #19487
The only change in this PR is to use `::invalid::` instead of `test:` in
the test to address @steveloughran's comment.
24 matches
Mail list logo