Re: Spark-on-Yarn ClassNotFound Exception

2022-12-18 Thread Hariharan
ting "export > CLASSPATH" - you can double check that your jar looks to be included > correctly there. If it is I think you have a really "interesting" issue on > your hands! > > - scrypso > > On Wed, Dec 14, 2022, 05:17 Hariharan wrote: > >> Hi

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
...t up? (Wild guess, no idea if that works or how hard it would be.) On Tue, Dec 13, 2022, 17:29 Hariharan wrote: Thanks for the response, scrypso! I will try adding the extraClassPath option. Meanwhile, please find the full stack t...
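For reference, the extraClassPath option mentioned here is an ordinary Spark conf; a hedged sketch of the submission, with placeholder paths and class names (on YARN, jars shipped with --jars land in the container's working directory, so a bare file name is sometimes what is needed there):

    # Sketch only; /path/to/my/jar and com.example.MyApp are placeholders.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.driver.extraClassPath=/path/to/my/jar \
        --conf spark.executor.extraClassPath=/path/to/my/jar \
        --class com.example.MyApp /path/to/my/jar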

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
...mmediately tell how your error might arise, unless there is some timing issue with the Spark and Hadoop setup. Can you share the full stacktrace of the ClassNotFound exception? That might tell us when Hadoop is looking up this class. Good luck! - scrypso

Re: Spark-on-Yarn ClassNotFound Exception

2022-12-13 Thread Hariharan
Forgot to mention it above, but just to add: the error is coming from the driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no luck. Thanks! On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote: Hello folks, I have a Spark app with a custom imple...
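A related knob, not tried in this thread (so treat it as an assumption rather than the fix): Spark can be told to prefer classes from the user jar over its own copies.

    # Hypothetical alternative; both are standard Spark properties, but the
    # thread does not confirm they resolve this particular error.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.driver.userClassPathFirst=true \
        --conf spark.executor.userClassPathFirst=true \
        --class com.example.MyApp /path/to/my/jar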

Spark-on-Yarn ClassNotFound Exception

2022-12-12 Thread Hariharan
Hello folks, I have a Spark app with a custom implementation of *fs.s3a.s3.client.factory.impl*, which is packaged into the same jar. The output of *jar tf* shows the class is present: *2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class*. However, when I run my Spark app with spark-submit in cluster mode, it...
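A sketch of how such a factory is typically wired in at submit time, using the property named above and the class name from the jar listing (the application jar and main class are placeholders):

    # Sketch: spark.hadoop.* forwards the setting into the Hadoop configuration
    # that the S3A filesystem reads; the class echoes aws/utils/MyS3ClientFactory.class.
    spark-submit \
        --master yarn --deploy-mode cluster \
        --conf spark.hadoop.fs.s3a.s3.client.factory.impl=aws.utils.MyS3ClientFactory \
        --class com.example.MyApp my-app.jar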

Re: Automated setup of a multi-node cluster for Apache Spark

2021-04-10 Thread Hariharan
.../blob/master/flintrock/core.py#L47>. Thanks, Hariharan. On Sat, Apr 10, 2021 at 9:00 AM Dhruv Kumar wrote: Hello, I am new to Apache Spark and am looking for some close guidance or collaboration for my Spark project, which has the following main components: 1. Wr...
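For later readers, a minimal sketch of a Flintrock launch; every value below is a placeholder, and the flags should be checked against flintrock --help for the installed version:

    # Hypothetical example: launches a small Spark cluster on EC2.
    flintrock launch my-cluster \
        --num-slaves 2 \
        --spark-version 3.1.1 \
        --ec2-key-name my-key \
        --ec2-identity-file ~/.ssh/my-key.pem \
        --ec2-ami ami-0123456789abcdef0 \
        --ec2-user ec2-user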

Re: Spark performance over S3

2021-04-07 Thread Hariharan
...ons from Cloudera <https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/s3-performance.html> for optimal use of S3A. Thanks, Hariharan. On Wed, Apr 7, 2021 at 12:15 AM Tzahi File wrote: Hi All, We have a Spark cluster on AWS EC2 that has 60 x i3.4xlarge...
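A few of the S3A knobs commonly cited in such tuning guides, as a sketch; the values are illustrative rather than recommendations, and fs.s3a.fast.upload is already the default (and deprecated as a switch) on newer Hadoop:

    # Illustrative only; values are placeholders to show the spark.hadoop.* form.
    spark-submit \
        --conf spark.hadoop.fs.s3a.connection.maximum=100 \
        --conf spark.hadoop.fs.s3a.fast.upload=true \
        --conf spark.hadoop.fs.s3a.experimental.input.fadvise=random \
        --class com.example.MyApp my-app.jar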

Re: Write pyspark dataframe into kms encrypted s3 bucket

2020-10-15 Thread Hariharan
...rowing the error. ~ Hariharan. On Thu, Oct 15, 2020 at 8:56 PM Devi P V wrote: hadoop_conf.set("fs.s3a.multipart.size", 104857600L) - .set only allows string values; it's throwing invalid syntax. I tried the following also, but the issue is not fixed: hadoop_conf.setLo...

Re: Write pyspark dataframe into kms encrypted s3 bucket

2020-10-15 Thread Hariharan
fs.s3a.multipart.size needs to be a long value, not a string, so you will need to use hadoop_conf.set("fs.s3a.multipart.size", 104857600L). ~ Hariharan. On Thu, Oct 15, 2020 at 6:32 PM Devi P V wrote: Hi All, I am trying to write a PySpark dataframe into KMS en...
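A short PySpark sketch of two forms that avoid the syntax problem reported in the follow-up: the trailing L suffix is Python 2 syntax and is rejected by Python 3, while Hadoop's Configuration accepts either a long set via setLong or a numeric string:

    # Sketch; assumes an existing SparkSession named `spark`. _jsc is a private
    # API but is the common way to reach the Hadoop configuration from PySpark.
    hadoop_conf = spark._jsc.hadoopConfiguration()
    hadoop_conf.setLong("fs.s3a.multipart.size", 104857600)  # plain int, no L suffix
    # or equivalently, since Hadoop parses numeric conf values from strings:
    hadoop_conf.set("fs.s3a.multipart.size", "104857600")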

Re: Can't get Spark to interface with S3A Filesystem with correct credentials

2020-03-04 Thread Hariharan
...=org.apache.hadoop.fs.s3a.S3A. Hadoop 2.8 and above would have these set by default. Thanks, Hariharan. On Thu, Mar 5, 2020 at 2:41 AM Devin Boyer wrote: Hello, I'm attempting to run Spark within a Docker container with the hope of eventually running Spark on Kubernetes. Nearly...
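The truncated line above appears to be the standard S3A registration; for reference, a sketch of the pair of properties usually involved, which, per the message, Hadoop 2.8+ already sets by default:

    <!-- Sketch of the usual core-site.xml entries; defaults on Hadoop 2.8+. -->
    <property>
      <name>fs.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
      <name>fs.AbstractFileSystem.s3a.impl</name>
      <value>org.apache.hadoop.fs.s3a.S3A</value>
    </property>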

Re: Spark-YARN | Scheduling of containers

2019-05-20 Thread Hariharan
Akshay Bhardwaj, +91-97111-33849. On Mon, May 20, 2019 at 1:29 PM Hariharan wrote: Hi Akshay, I believe HDP uses the capacity scheduler by default. In the capacity scheduler, assignment of multiple containers on the same node is d...

Re: Spark-YARN | Scheduling of containers

2019-05-20 Thread Hariharan
Hi Akshay, I believe HDP uses the capacity scheduler by default. In the capacity scheduler, assignment of multiple containers on the same node is determined by the option yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled, which is true by default. If you would like YARN to...
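Presumably the advice continues with disabling that option; a sketch of the change, assuming a Hadoop version in which the property exists:

    <!-- Sketch: limit the capacity scheduler to one container assignment per
         node heartbeat. Goes in capacity-scheduler.xml. -->
    <property>
      <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
      <value>false</value>
    </property>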

Re: [spark on yarn] spark on yarn without DFS

2019-05-20 Thread Hariharan
Hi Huizhe, You can set the "fs.defaultFS" field in core-site.xml to some path on S3. That way your Spark job will use S3 for all operations that need HDFS. Intermediate data will still be stored on local disk, though. Thanks, Hari. On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote: While...
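A sketch of the core-site.xml change described, with a placeholder bucket; S3A credentials must also be configured for this to work:

    <!-- Sketch; s3a://my-bucket/ is a placeholder. -->
    <property>
      <name>fs.defaultFS</name>
      <value>s3a://my-bucket/</value>
    </property>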

Re: Spark Profiler

2019-03-29 Thread Hariharan
Hi Jack, You can try sparklens (https://github.com/qubole/sparklens). I think it won't give details at as low a level as you're looking for, but it can help you identify and remove performance bottlenecks. ~ Hariharan. On Fri, Mar 29, 2019 at 12:01 AM bo yang wrote: Yeah, these opti...
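For reference, sparklens is typically attached at submit time per its README; the version/Scala suffix below is a placeholder that should match the Spark build, and newer Spark may also need --repositories https://repos.spark-packages.org:

    # Sketch based on the sparklens README; coordinates and names are placeholders.
    spark-submit \
        --packages qubole:sparklens:0.3.2-s_2.11 \
        --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
        --class com.example.MyApp my-app.jar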