Re: Spark-YARN | Scheduling of containers

2019-05-19 Thread Akshay Bhardwaj
Hi All, Just floating this email again. Grateful for any suggestions. Akshay Bhardwaj +91-97111-33849 On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj <akshay.bhardwaj1...@gmail.com> wrote: > Hi All, > > I am running Spark 2.3 on YARN using HDP 2.6 > > I am running a Spark job using dynamic

Re: spark 2.4.3 build fails using java 8 and scala 2.11 with NumberFormatException: Not a version: 9

2019-05-19 Thread Bulldog20630405
After blowing away my m2 repo cache, I was able to build just fine... I don't know why, but now it works :-) On Sun, May 19, 2019 at 10:22 PM Bulldog20630405 wrote: > I am trying to build spark 2.4.3 with the following env: > - fedora 29 - 1.8.0_202 - spark 2.4.3 - scala
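For reference, a minimal sketch of the fix described above (the path is Maven's default local repository; adjust if yours is custom, and note that clearing the cache forces a full re-download):

    # Clear the local Maven artifact cache, then rebuild from the Spark
    # source root; a corrupted cached artifact can surface as odd parse
    # errors such as "NumberFormatException: Not a version: 9".
    rm -rf ~/.m2/repository
    ./build/mvn -Pyarn -DskipTests clean package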

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Abdeali Kothari
While Spark can read from S3 directly in EMR, I believe it still needs HDFS to perform shuffles and to write intermediate data to disk when running jobs (i.e. when the in-memory data needs to spill over to disk). For these operations, Spark does need a distributed file system - You could use
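To illustrate the split described above, a hedged sketch (the bucket, paths, and credentials are hypothetical placeholders; the property names are standard Spark/Hadoop s3a configs): input is read via s3a, while shuffle and spill files land on the NodeManagers' local disks (yarn.nodemanager.local-dirs), not in HDFS or S3.

    # Hedged sketch, not a drop-in command: names marked ... are placeholders.
    spark-submit \
      --master yarn \
      --conf spark.hadoop.fs.s3a.access.key=... \
      --conf spark.hadoop.fs.s3a.secret.key=... \
      my_job.py s3a://my-bucket/input/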

Re: [spark on yarn] spark on yarn without DFS

2019-05-19 Thread Jeff Zhang
I am afraid not, because YARN needs DFS. Huizhe Wang wrote on Mon, May 20, 2019 at 9:50 AM: > Hi, > > I want to use Spark on YARN without HDFS. I store my resources in AWS and > use s3a to get them. However, when I used stop-dfs.sh it stopped the Namenode and > DataNode. I got an error when using yarn cluster mode.
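One (untested) way to loosen the HDFS dependency is to point Spark's YARN staging directory at S3; spark.yarn.stagingDir is a standard Spark-on-YARN config, but whether the rest of YARN (log aggregation, recovery state) tolerates a missing DFS depends on the cluster setup. The bucket below is hypothetical:

    # Hedged sketch: stage application jars in S3 instead of HDFS.
    spark-submit \
      --master yarn --deploy-mode cluster \
      --conf spark.yarn.stagingDir=s3a://my-bucket/spark-staging \
      ...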

spark 2.4.3 build fails using java 8 and scala 2.11 with NumberFormatException: Not a version: 9

2019-05-19 Thread Bulldog20630405
I am trying to build spark 2.4.3 with the following env: - fedora 29 - JDK 1.8.0_202 - spark 2.4.3 - scala 2.11.12 - maven 3.5.4 - hadoop 2.6.5 According to the documentation this can be done with the following commands: export TERM=xterm-color ./build/mvn -Pyarn -DskipTests
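The preview truncates the build command; a hedged reconstruction consistent with the environment listed above and with the Spark 2.4 "Building Spark" docs (the hadoop profile and version flags are an assumption based on the hadoop 2.6.5 mentioned, not taken from the original mail):

    export TERM=xterm-color
    # Assumed completion of the truncated command above.
    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.5 -DskipTests clean package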

[spark on yarn] spark on yarn without DFS

2019-05-19 Thread Huizhe Wang
Hi, I want to use Spark on YARN without HDFS. I store my resources in AWS and use s3a to get them. However, when I used stop-dfs.sh it stopped the Namenode and DataNode, and I got an error when using yarn cluster mode. Can I use YARN without starting DFS, and how could I use this mode? Yours, Jane

Re: Access to live data of cached dataFrame

2019-05-19 Thread Tomas Bartalos
I'm trying to re-read, however I'm getting cached data (which is a bit confusing). For the re-read I'm issuing: spark.read.format("delta").load("/data").groupBy(col("event_hour")).count The cache seems to be global, influencing new dataframes as well. So the question is how I should re-read without
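A minimal Scala sketch of one way to get fresh data, assuming the goal is to drop the cached entries before re-reading (clearCache is a standard Spark catalog API; the path and grouping column match the post):

    import org.apache.spark.sql.functions.col

    // Drop every cached table/DataFrame, then re-read from storage so the
    // counts reflect data written after the original cache was built.
    spark.catalog.clearCache()
    val fresh = spark.read.format("delta").load("/data")
      .groupBy(col("event_hour")).count()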

Spark-YARN | Scheduling of containers

2019-05-19 Thread Akshay Bhardwaj
Hi All, I am running Spark 2.3 on YARN using HDP 2.6. I am running a Spark job using dynamic resource allocation on YARN, with a minimum of 2 executors and a maximum of 6. My job reads data from parquet files which are present in S3 buckets and stores some enriched data to Cassandra. My question is, how does
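For context, a hedged sketch of the setup described above (the property names are the standard Spark 2.3 dynamic-allocation settings, and the external shuffle service is required for dynamic allocation on YARN; the values match the post, everything else is elided):

    spark-submit \
      --master yarn \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.dynamicAllocation.minExecutors=2 \
      --conf spark.dynamicAllocation.maxExecutors=6 \
      --conf spark.shuffle.service.enabled=true \
      ...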