Re: Spark-on-Yarn ClassNotFound Exception

2022-12-18 Thread Hariharan
ting "export > CLASSPATH" - you can double check that your jar looks to be included > correctly there. If it is I think you have a really "interesting" issue on > your hands! > > - scrypso > > On Wed, Dec 14, 2022, 05:17 Hariharan wrote: > >> Hi

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
...t up? (Wild guess, no idea if that works or how hard it would be.) On Tue, Dec 13, 2022, 17:29 Hariharan wrote: Thanks for the response, scrypso! I will try adding the extraClassPath option. Meanwhile, please find the full stack t...
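For reference, the extraClassPath option mentioned here is an ordinary Spark conf; a hedged sketch of the submission, with placeholder paths and class names (on YARN, jars shipped with --jars land in the container's working directory, so a bare file name is sometimes what is needed there):

    # Sketch only; /path/to/my/jar and com.example.MyApp are placeholders.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.driver.extraClassPath=/path/to/my/jar \
        --conf spark.executor.extraClassPath=/path/to/my/jar \
        --class com.example.MyApp /path/to/my/jar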

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
...mmediately tell how your error might arise, unless there is some timing issue with the Spark and Hadoop setup. Can you share the full stacktrace of the ClassNotFound exception? That might tell us when Hadoop is looking up this class. Good luck! - scrypso

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Forgot to mention it above, but just to add: the error is coming from the driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no luck. Thanks! On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote: Hello folks, I have a Spark app with a custom imple...
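A related knob, not tried in this thread (so treat it as an assumption rather than the fix): Spark can be told to prefer classes from the user jar over its own copies.

    # Hypothetical alternative; both are standard Spark properties, but the
    # thread does not confirm they resolve this particular error.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.driver.userClassPathFirst=true \
        --conf spark.executor.userClassPathFirst=true \
        --class com.example.MyApp /path/to/my/jar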

Spark-on-Yarn ClassNotFound Exception

2022-12-12 Thread Hariharan
Hello folks, I have a Spark app with a custom implementation of *fs.s3a.s3.client.factory.impl*, which is packaged into the same jar. The output of *jar tf* shows the class is present: *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class*. However, when I run my Spark app with spark-submit in cluster mode, it...
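A sketch of how such a factory is typically wired in at submit time, using the property named above and the class name from the jar listing (the application jar and main class are placeholders):

    # Sketch: spark.hadoop.* forwards the setting into the Hadoop configuration
    # that the S3A filesystem reads; the class echoes aws/utils/MyS3ClientFactory.class.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.hadoop.fs.s3a.s3.client.factory.impl=aws.utils.MyS3ClientFactory \
        --class com.example.MyApp my-app.jar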

Re: Automated setup of a multi-node cluster for Apache Spark

2021-04-10 Thread Hariharan
.../blob/master/flintrock/core.py#L47>. Thanks, Hariharan. On Sat, Apr 10, 2021 at 9:00 AM Dhruv Kumar wrote: Hello, I am new to Apache Spark and am looking for some close guidance or collaboration for my Spark project, which has the following main components: 1. Wr...
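For later readers, a minimal sketch of a Flintrock launch; every value below is a placeholder, and the flags should be checked against flintrock --help for the installed version:

    # Hypothetical example: launches a small Spark cluster on EC2.
    flintrock launch my-cluster \
        --num-slaves 2 \
        --spark-version 3.1.1 \
        --ec2-key-name my-key \
        --ec2-identity-file ~/.ssh/my-key.pem \
        --ec2-ami ami-0123456789abcdef0 \
        --ec2-user ec2-user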

Re: Spark performance over S3

2021-04-07 Thread Hariharan
...ons from Cloudera <https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/s3-performance.html> for optimal use of S3A. Thanks, Hariharan. On Wed, Apr 7, 2021 at 12:15 AM Tzahi File wrote: Hi All, We have a Spark cluster on AWS EC2 that has 60 x i3.4xlarge...
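A few of the S3A knobs commonly cited in such tuning guides, as a sketch; the values are illustrative rather than recommendations, and fs.s3a.fast.upload is already the default (and deprecated as a switch) on newer Hadoop:

    # Illustrative only; values are placeholders to show the spark.hadoop.* form.
    spark-submit \
        --conf spark.hadoop.fs.s3a.connection.maximum=100 \
        --conf spark.hadoop.fs.s3a.fast.upload=true \
        --conf spark.hadoop.fs.s3a.experimental.input.fadvise=random \
        --class com.example.MyApp my-app.jar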

Re: Write pyspark dataframe into kms encrypted s3 bucket

2020-10-15 Thread Hariharan
...rowing the error. ~ Hariharan. On Thu, Oct 15, 2020 at 8:56 PM Devi P V wrote: hadoop_conf.set("fs.s3a.multipart.size", 104857600L) - .set only allows string values; it's throwing invalid syntax. I tried the following also, but the issue is not fixed: hadoop_conf.setLo...

Re: Write pyspark dataframe into kms encrypted s3 bucket

2020-10-15 Thread Hariharan
fs.s3a.multipart.size needs to be a long value, not a string, so you will need to use hadoop_conf.set("fs.s3a.multipart.size", 104857600L). ~ Hariharan. On Thu, Oct 15, 2020 at 6:32 PM Devi P V wrote: Hi All, I am trying to write a PySpark dataframe into KMS en...
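A short PySpark sketch of two forms that avoid the syntax problem reported in the follow-up: the trailing L suffix is Python 2 syntax and is rejected by Python 3, while Hadoop's Configuration accepts either a long set via setLong or a numeric string:

    # Sketch; assumes an existing SparkSession named `spark`. _jsc is a private
    # API but is the common way to reach the Hadoop configuration from PySpark.
    hadoop_conf = spark._jsc.hadoopConfiguration()
    hadoop_conf.setLong("fs.s3a.multipart.size", 104857600)  # plain int, no L suffix
    # or equivalently, since Hadoop parses numeric conf values from strings:
    hadoop_conf.set("fs.s3a.multipart.size", "104857600")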

Re: Can't get Spark to interface with S3A Filesystem with correct credentials

2020-03-04 Thread Hariharan
...=org.apache.hadoop.fs.s3a.S3A. Hadoop 2.8 and above would have these set by default. Thanks, Hariharan. On Thu, Mar 5, 2020 at 2:41 AM Devin Boyer wrote: Hello, I'm attempting to run Spark within a Docker container with the hope of eventually running Spark on Kubernetes. Nearly...
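The truncated line above appears to be the standard S3A registration; for reference, a sketch of the pair of properties usually involved, which, per the message, Hadoop 2.8+ already sets by default:

    <!-- Sketch of the usual core-site.xml entries; defaults on Hadoop 2.8+. -->
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
      <name>fs.AbstractFileSystem.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3A</value>
    </property>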

Re: Spark-YARN | Scheduling of containers

2019-05-20 Thread Hariharan
Akshay Bhardwaj, +91-97111-33849. On Mon, May 20, 2019 at 1:29 PM Hariharan wrote: Hi Akshay, I believe HDP uses the capacity scheduler by default. In the capacity scheduler, assignment of multiple containers on the same node is d...

Re: Spark-YARN | Scheduling of containers

2019-05-20 Thread Hariharan
Hi Akshay, I believe HDP uses the capacity scheduler by default. In the capacity scheduler, assignment of multiple containers on the same node is determined by the option yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled, which is true by default. If you would like YARN to...
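Presumably the advice continues with disabling that option; a sketch of the change, assuming a Hadoop version in which the property exists:

    <!-- Sketch: limit the capacity scheduler to one container assignment per
         node heartbeat. Goes in capacity-scheduler.xml. -->
    <property>
      <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
      <value>false</value>
    </property>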

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
Hi Huizhe, You can set the "fs.defaultFS" field in core-site.xml to some path on S3. That way your Spark job will use S3 for all operations that need HDFS. Intermediate data will still be stored on local disk, though. Thanks, Hari. On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote: While...
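A sketch of the core-site.xml change described, with a placeholder bucket; S3A credentials must also be configured for this to work:

    <!-- Sketch; s3a://my-bucket/ is a placeholder. -->
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://my-bucket/</value>
    </property>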

Re: Spark Profiler

2019-03-29 Thread Hariharan
Hi Jack, You can try sparklens (https://github.com/qubole/sparklens). I think it won't give details at as low a level as you're looking for, but it can help you identify and remove performance bottlenecks. ~ Hariharan. On Fri, Mar 29, 2019 at 12:01 AM bo yang wrote: Yeah, these opti...
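For reference, sparklens is typically attached at submit time per its README; the version/Scala suffix below is a placeholder that should match the Spark build, and newer Spark may also need --repositories https://repos.spark-packages.org:

    # Sketch based on the sparklens README; coordinates and names are placeholders.
    spark-submit \
        --packages qubole:sparklens:0.3.2-s_2.11 \
        --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
        --class com.example.MyApp my-app.jar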