Re: [spark on yarn] spark on yarn without DFS

2019-05-23 Thread Achilleus 003
This is interesting. Would really appreciate it if you could share what exactly you changed in *core-site.xml* and *yarn-site.xml*.

On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta wrote:
> just wondering what is the advantage of doing this?
>
> Regards
> Gourav Sengupta
>
> On Wed, May 22,

Re: [spark on yarn] spark on yarn without DFS

2019-05-22 Thread Gourav Sengupta
Just wondering, what is the advantage of doing this?

Regards
Gourav Sengupta

On Wed, May 22, 2019 at 3:01 AM Huizhe Wang wrote:
> Hi Hari,
> Thanks :) I tried to do it as you said. It works ;)
>
> On Mon, May 20, 2019 at 3:54 PM Hariharan wrote:
>> Hi Huizhe,
>>
>> You can set the "fs.defaultFS" field in

Re: [spark on yarn] spark on yarn without DFS

2019-05-21 Thread Huizhe Wang
Hi Hari,
Thanks :) I tried to do it as you said. It works ;)

On Mon, May 20, 2019 at 3:54 PM Hariharan wrote:
> Hi Huizhe,
>
> You can set the "fs.defaultFS" field in core-site.xml to some path on S3.
> That way your Spark job will use S3 for all operations that need HDFS.
> Intermediate data will still be

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread JB Data31
There is a kind of check in *yarn-site.xml*: the *yarn.nodemanager.remote-app-log-dir* property, here set to */var/yarn/logs*. Using *hdfs://<host>:9000* as *fs.defaultFS* in *core-site.xml*, you have to run *hdfs dfs -mkdir /var/yarn/logs*. Using *s3://* as *fs.defaultFS*... Take care of the *.dir* properties in
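A minimal sketch of the *yarn-site.xml* property JB Data31 is describing (the */var/yarn/logs* path comes from the message above; adjust it to your own layout):

    <!-- yarn-site.xml: where NodeManagers aggregate application logs.
         The path is resolved against fs.defaultFS, so it must exist on
         whatever filesystem that points to. -->
    <property>
      <name>yarn.nodemanager.remote-app-log-dir</name>
      <value>/var/yarn/logs</value>
    </property>

Hence the *hdfs dfs -mkdir /var/yarn/logs* step when the default FS is HDFS; with an S3 default FS the directory would need to exist on S3 instead.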

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
Hi Huizhe,

You can set the "fs.defaultFS" field in core-site.xml to some path on S3. That way your Spark job will use S3 for all operations that need HDFS. Intermediate data will still be stored on local disk, though.

Thanks,
Hari

On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote:
> While
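A minimal sketch of that *core-site.xml* change, assuming a hypothetical bucket named *my-bucket* (the s3a filesystem comes from the hadoop-aws module, which must be on the classpath of both YARN and Spark):

    <!-- core-site.xml: make s3a the default filesystem instead of HDFS -->
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://my-bucket</value>
    </property>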

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Abdeali Kothari
While Spark can read from S3 directly in EMR, I believe it still needs HDFS to perform shuffles and to write intermediate data to disk when doing jobs (i.e. when the in-memory data needs to spill over to disk). For these operations, Spark does need a distributed file system - You could use
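(For what it's worth, shuffle and spill files actually land on node-local disks, as Hariharan notes above. On YARN those directories are the NodeManager's local dirs; a sketch with an assumed path of */mnt/yarn/local*:)

    <!-- yarn-site.xml: local directories that executors use for
         shuffle and spill files; comma-separate multiple disks -->
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/mnt/yarn/local</value>
    </property>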

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Jeff Zhang
I am afraid not, because YARN needs a DFS.

On Mon, May 20, 2019 at 9:50 AM Huizhe Wang wrote:
> Hi,
>
> I want to use Spark on YARN without HDFS. I store my resources in AWS and
> use s3a to get them. However, when I used stop-dfs.sh it stopped the
> NameNode and DataNode, and I got an error when using yarn-cluster mode.

[spark on yarn] spark on yarn without DFS

2019-05-19 Thread Huizhe Wang
Hi,

I want to use Spark on YARN without HDFS. I store my resources in AWS and use s3a to get them. However, when I used stop-dfs.sh it stopped the NameNode and DataNode, and I got an error when using yarn-cluster mode. Can I use YARN without starting DFS, and if so, how do I use this mode?

Yours,
Jane
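For anyone reproducing this setup: reading from S3 through s3a typically also needs credentials configured, e.g. in *core-site.xml*. A sketch with placeholder values (these are the standard hadoop-aws property names; on EC2, instance profiles or a credential provider chain avoid putting keys in the file):

    <!-- core-site.xml: static s3a credentials (placeholders) -->
    <property>
      <name>fs.s3a.access.key</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>YOUR_SECRET_KEY</value>
    </property>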