Re: RDD blocks on Spark Driver

2017-02-28 Thread Prithish
This is the command I am running: spark-submit --deploy-mode cluster --master yarn --class com.myorg.myApp s3://my-bucket/myapp-0.1.jar On Wed, Mar 1, 2017 at 12:22 AM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Prithish, > > It would be helpful for you to share the spark

Re: Custom log4j.properties on AWS EMR

2017-02-28 Thread Prithish
Thanks for your response Jonathan. Yes, this works. I also added another way of achieving this to the Stackoverflow post. Thanks for the help. On Tue, Feb 28, 2017 at 11:58 PM, Jonathan Kelly <jonathaka...@gmail.com> wrote: > Prithish, > > I saw you posted this on SO, so I respo

Re: Custom log4j.properties on AWS EMR

2017-02-26 Thread Prithish
Dlog4j.configuration=/log4j-debugging.properties (maybe also try > without the "/") > > > On 26 Feb 2017, at 16:31, Prithish <prith...@gmail.com> wrote: > > Hoping someone can answer this. > > I am unable to override and use a Custom log4j.properties on Amazo

Custom log4j.properties on AWS EMR

2017-02-26 Thread Prithish
Hoping someone can answer this. I am unable to override and use a custom log4j.properties on Amazon EMR. I am running Spark on EMR (YARN) and have tried all of the combinations below in spark-submit to try to use the custom log4j. In Client mode --driver-java-options
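For context, a commonly used sketch for this on YARN (file names, paths, and the app jar below are placeholders, not taken from the thread): ship the custom file with --files and point the driver and executor JVMs at it, or in client mode point the locally started driver JVM at a local path.

```shell
# Sketch only: overriding log4j on EMR/YARN. All paths and names are placeholders.
# Cluster mode: ship log4j.properties into each container's working directory.
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --files ./log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyApp \
  ./myapp.jar

# Client mode: the driver JVM starts locally, so point it at a local file: URL.
spark-submit \
  --deploy-mode client \
  --master yarn \
  --driver-java-options "-Dlog4j.configuration=file:/home/hadoop/log4j.properties" \
  --class com.example.MyApp \
  ./myapp.jar
```

The split matters because in cluster mode the driver runs inside a YARN container, so a path on the submitting machine is not visible to it; --files stages the file next to each JVM.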

Re: RDD blocks on Spark Driver

2017-02-26 Thread Prithish
which are local, standalone, YARN > and Mesos. Also, "blocks" is relative to HDFS, "partitions" > is relative to Spark. > > liangyihuai > > ---Original--- > *From:* "Jacek Laskowski "<ja...@japila.pl> > *Date:* 2017/2/25 02:45:20 > *To:*

RDD blocks on Spark Driver

2017-02-22 Thread prithish
Hello, I had a question. When I look at the Executors tab in the Spark UI, I notice that some RDD blocks are assigned to the driver as well. Can someone please tell me why? Thanks for the help.

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
at's the schema interpreted by spark? > A compression logic of the spark caching depends on column types. > > // maropu > > > On Wed, Nov 16, 2016 at 5:26 PM, Prithish <prith...@gmail.com> wrote: > >> Thanks for your response. >> >> I did some more tests an

Re: AVRO File size when caching in-memory

2016-11-16 Thread Prithish
be others here have more info to share. > > > > Regards, > > Shreya > > > > Sent from my Windows 10 phone > > > > *From: *Prithish <prith...@gmail.com> > *Sent: *Tuesday, November 15, 2016 11:04 PM > *To: *Shreya Agarwal <shrey...@microsoft.

Re: AVRO File size when caching in-memory

2016-11-15 Thread Prithish
Anyone? On Tue, Nov 15, 2016 at 10:45 AM, Prithish <prith...@gmail.com> wrote: > I am using 2.0.1 and databricks avro library 3.0.1. I am running this on > the latest AWS EMR release. > > On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke <jornfra...@gmail.com> wrote: > >

Re: AVRO File size when caching in-memory

2016-11-14 Thread Prithish
I am using Spark 2.0.1 and the Databricks Avro library 3.0.1. I am running this on the latest AWS EMR release. On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke <jornfra...@gmail.com> wrote: > spark version? Are you using tungsten? > > > On 14 Nov 2016, at 10:05, Prithish <prit

AVRO File size when caching in-memory

2016-11-14 Thread Prithish
Can someone please explain why this happens? When I read a 600kb AVRO file and cache it in memory (using cacheTable), it shows up as 11mb (Storage tab in the Spark UI). I have tried this with different file sizes, and the in-memory size is always proportionally larger. I thought Spark compresses when using
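The gap the thread asks about is largely an encoding effect: Avro on disk is compactly encoded and compressed, while cached rows are held in a decompressed, deserialized form. A minimal, Spark-free illustration of on-disk versus raw size using gzip:

```shell
# Not Spark-specific: a small illustration of why a compactly encoded file
# is much larger once held uncompressed. Repetitive data compresses well,
# so the compressed "on-disk" size badly understates the raw size.
head -c 1048576 /dev/zero > raw.bin     # 1 MiB of highly repetitive data
gzip -c raw.bin > raw.bin.gz            # compact "on-disk" form
raw_size=$(wc -c < raw.bin)
gz_size=$(wc -c < raw.bin.gz)
echo "raw: ${raw_size} bytes, compressed: ${gz_size} bytes"
```

The ratio depends entirely on how compressible the data is, which is why different files show different (but consistently large) blow-up factors when cached.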

Re: Reading AVRO from S3 - No parallelism

2016-10-27 Thread prithish
> How big are your avro files?We collapse many small files into a single > partition to eliminate scheduler overhead.If you need explicit > parallelism you can also repartition. > > > > On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.
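The reply above notes that Spark collapses many small files into a single read partition; two real Spark 2.x settings control how aggressively that packing happens. A sketch only — the values, class name, jar, and bucket path are placeholders, not from the thread:

```shell
# Sketch: lower the file-packing thresholds so many small Avro files on S3
# are spread across more read partitions. Values are illustrative only.
spark-submit \
  --deploy-mode cluster \
  --master yarn \
  --packages com.databricks:spark-avro_2.11:3.0.1 \
  --conf spark.sql.files.maxPartitionBytes=33554432 \
  --conf spark.sql.files.openCostInBytes=8388608 \
  --class com.example.MyAvroJob \
  ./my-avro-job.jar s3://example-bucket/avro-input/
# Alternatively, call .repartition(n) on the DataFrame after reading, as the
# reply suggests, to force explicit parallelism downstream.
```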

Reading AVRO from S3 - No parallelism

2016-10-27 Thread Prithish
I am trying to read a bunch of AVRO files from an S3 folder using Spark 2.0. No matter how many executors I use or what configuration changes I make, the cluster doesn't seem to use all the executors. I am using the com.databricks.spark.avro library from Databricks to read the AVRO. However, if I

Question about In-Memory size (cache / cacheTable)

2016-10-26 Thread Prithish
Hello, I am trying to understand how the in-memory size changes in these situations. Specifically, why is the in-memory size much higher for Avro and Parquet? Are there any optimizations necessary to reduce this? Used cacheTable on each of these: AVRO File (600kb) - In-memory size was 12mb Parquet