github-actions[bot] closed pull request #43936: [SPARK-46034][CORE]
SparkContext add file should also copy file to local root path
URL: https://github.com/apache/spark/pull/43936
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub
github-actions[bot] commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1984826472
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
HyukjinKwon commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1829030542
@AngersZh can you fix the PR description, and explain how this issue
happens specifically in YARN cluster mode? I still can't fully follow
where the issue is.
junyi1313 commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1829023594
> The root cause is the Spark driver downloads the file to its `driverTempPath`, but
doesn't download it to the container's execution root path. So in yarn cluster mode, if
we need to use the file in
tgravescs commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1828063684
I have not used addFiles on YARN in a long time, so I can't speak to whether it
got broken. Generally speaking it's not recommended, and users should pass
files on submission. Whatever
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1827485987
I don't know why we added a `driverTmpDir`; removing `driverTmpDir` can also
resolve this issue.
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1827474881
> Could you please update the PR description? It looks outdated and
inaccurate, which makes it hard for me to catch up.
>
> Could you also provide the output for `LIST FILE` both before and
yaooqinn commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1827051972
Could you please update the PR description? It looks outdated and inaccurate,
which makes it hard for me to catch up.
Could you also provide the output for `LIST FILE`? I guess we shall use its
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1827039250
gentle ping @yaooqinn @cloud-fan @HyukjinKwon @tgravescs Could you take a
look?
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1825190497
> Can we fix `SparkFiles.get` at driver side when Yarn cluster is used?
`SparkFiles.get()` doesn't have a problem.
The problem is that users use a relative path to access the added file.
HyukjinKwon commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1825009537
Can we fix `SparkFiles.get` at driver side when Yarn cluster is used?
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1824131573
Any more suggestions? cc @HyukjinKwon @cloud-fan
AngersZh commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401427353
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1822478004
```
Fetching
hdfs://R2/projects/search_algo/hdfs/dev/typhoon.bo/uploader/ego_config/feature_map.txt
to
```
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1822401850
The root cause is the Spark driver downloads the file to its `driverTempPath`, but
doesn't download it to the container's execution root path.
So in yarn cluster mode, if we need to use the file
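The symptom described above can be sketched with plain Java NIO (this is an illustration of the reported behavior, not Spark's actual implementation; `driverTmpDir` here is a hypothetical stand-in for the driver's private download directory):

```scala
import java.nio.file.{Files, Paths}

// Stand-in for the driver's private temp download dir (assumption, not Spark code).
val driverTmpDir = Files.createTempDirectory("spark-driver-tmp")
val downloaded = driverTmpDir.resolve("feature_map.txt")
Files.write(downloaded, "feature1\t1\n".getBytes("UTF-8"))

// A SparkFiles.get-style lookup, rooted at the driver temp dir, finds the file.
val viaSparkFilesGet = Files.exists(downloaded)

// A relative path resolves against the process working directory (the YARN
// container's execution root), where nothing was copied, so the lookup misses.
val viaRelativePath = Files.exists(Paths.get("feature_map.txt"))

println(s"via SparkFiles.get-style path: $viaSparkFilesGet")
println(s"via relative path: $viaRelativePath")
```

If the driver also copied the downloaded file into the container's execution root, which is what this PR proposes, the relative-path lookup would succeed as well.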
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1822369346
> cc @mridulm or @tgravescs have you ever seen such things before?:
`SparkContext.addFiles` adds a file with a temporary name, and you cannot get
it with the original name from
HyukjinKwon commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1822356253
cc @mridulm have you ever seen such things before?
AngersZh commented on PR #43936:
URL: https://github.com/apache/spark/pull/43936#issuecomment-1822339813
I think I made a mistake.
On the driver side, the file was downloaded and copied to the path `driverTempPath`.
HyukjinKwon commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401465887
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
logInfo(s"Added
AngersZh commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401448409
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1836,7 +1836,7 @@ class SparkContext(config: SparkConf) extends Logging {
val uriToUse =
AngersZh commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401447578
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
AngersZh commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401447858
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1836,7 +1836,7 @@ class SparkContext(config: SparkConf) extends Logging {
val uriToUse =
AngersZh commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401447419
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
HyukjinKwon commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401445511
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1822,7 +1822,7 @@ class SparkContext(config: SparkConf) extends Logging {
logInfo(s"Added
HyukjinKwon commented on code in PR #43936:
URL: https://github.com/apache/spark/pull/43936#discussion_r1401443129
core/src/main/scala/org/apache/spark/SparkContext.scala:
@@ -1836,7 +1836,7 @@ class SparkContext(config: SparkConf) extends Logging {
val uriToUse =