This is the command I am running:

spark-submit --deploy-mode cluster --master yarn \
  --class com.myorg.myApp \
  s3://my-bucket/myapp-0.1.jar
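For cluster mode on EMR/YARN, one commonly suggested approach is to ship the properties file to every container with --files and point both driver and executors at it via the extraJavaOptions confs. This is a sketch only; the local path /home/hadoop/log4j.properties is an assumption, and the jar/class are reused from the command above:

```shell
# Sketch: ship a custom log4j.properties to driver and executors (cluster mode).
# /home/hadoop/log4j.properties is an assumed path on the EMR master node.
spark-submit --master yarn --deploy-mode cluster \
  --files /home/hadoop/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.myorg.myApp \
  s3://my-bucket/myapp-0.1.jar
```

Because --files copies the file into each container's working directory, the -Dlog4j.configuration value can be the bare file name rather than an absolute path.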
On Wed, Mar 1, 2017 at 12:22 AM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> Prithish,
>
> It would be helpful for you to share the spark
Thanks for your response, Jonathan. Yes, this works. I also added another
way of achieving this to the Stack Overflow post. Thanks for the help.
On Tue, Feb 28, 2017 at 11:58 PM, Jonathan Kelly <jonathaka...@gmail.com>
wrote:
> Prithish,
>
> I saw you posted this on SO, so I respo
> -Dlog4j.configuration=/log4j-debugging.properties (maybe also try
> without the "/")
>
>
> On 26 Feb 2017, at 16:31, Prithish <prith...@gmail.com> wrote:
>
> Hoping someone can answer this.
>
> I am unable to override and use a custom log4j.properties on Amazon EMR.
Hoping someone can answer this.
I am unable to override and use a custom log4j.properties on Amazon EMR. I
am running Spark on EMR (YARN) and have tried all of the combinations below
in spark-submit to use the custom log4j.
In Client mode
--driver-java-options
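For client mode, a minimal sketch of pointing the driver at a custom log4j file via --driver-java-options; the file path, jar, and class are assumptions carried over from earlier in the thread, not confirmed by it:

```shell
# Sketch: client mode, so the driver runs on the submitting host and can
# read the properties file directly from the local filesystem (assumed path).
spark-submit --master yarn --deploy-mode client \
  --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/log4j.properties" \
  --class com.myorg.myApp \
  s3://my-bucket/myapp-0.1.jar
```

Note the file:// scheme: log4j 1.x treats the value of log4j.configuration as a URL, so a bare absolute path may not resolve as expected.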
> which are local, standalone, YARN
> and Mesos. Also, "blocks" is relative to HDFS, "partitions"
> is relative to Spark.
>
> liangyihuai
>
> ---Original---
> *From:* "Jacek Laskowski" <ja...@japila.pl>
> *Date:* 2017/2/25 02:45:20
> *To:*
Hello,
I have a question. When I look at the Executors tab in the Spark UI, I notice
that some RDD blocks are assigned to the driver as well. Can someone please
tell me why?
Thanks for the help.
> What's the schema interpreted by Spark?
> The compression logic of Spark caching depends on column types.
>
> // maropu
>
>
> On Wed, Nov 16, 2016 at 5:26 PM, Prithish <prith...@gmail.com> wrote:
>
>> Thanks for your response.
>>
>> I did some more tests an
> Maybe others here have more info to share.
>
> Regards,
> Shreya
>
> Sent from my Windows 10 phone
>
> *From:* Prithish <prith...@gmail.com>
> *Sent:* Tuesday, November 15, 2016 11:04 PM
> *To:* Shreya Agarwal <shrey...@microsoft.
Anyone?
On Tue, Nov 15, 2016 at 10:45 AM, Prithish <prith...@gmail.com> wrote:
> I am using 2.0.1 and the Databricks Avro library 3.0.1. I am running this
> on the latest AWS EMR release.
>
> On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>
I am using 2.0.1 and the Databricks Avro library 3.0.1. I am running this on
the latest AWS EMR release.
On Mon, Nov 14, 2016 at 3:06 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> Spark version? Are you using Tungsten?
>
> > On 14 Nov 2016, at 10:05, Prithish <prit
Can someone please explain why this happens?
When I read a 600 KB Avro file and cache it in memory (using cacheTable),
it shows up as 11 MB (Storage tab in the Spark UI). I have tried this with
different file sizes, and the in-memory size is always proportionate. I
thought Spark compresses when using
> How big are your avro files? We collapse many small files into a single
> partition to eliminate scheduler overhead. If you need explicit
> parallelism you can also repartition.
>
>
>
> On Thu, Oct 27, 2016 at 5:19 AM, Prithish <prith...@gmail.
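Following the repartition advice above, an explicit repartition can be tried from the EMR master node. This is a sketch only: the bucket path and partition count are assumptions, and the spark-avro package coordinate matches the 3.0.1 version mentioned later in the thread:

```shell
# Sketch: pipe a one-line job into spark-shell to force 64 partitions
# (s3://my-bucket/avro/ and the count of 64 are assumed, not from the thread).
echo 'spark.read.format("com.databricks.spark.avro").load("s3://my-bucket/avro/").repartition(64).count()' \
  | spark-shell --packages com.databricks:spark-avro_2.11:3.0.1
```

If the input is a few small files collapsed into one partition, the repartition introduces a shuffle but lets all executors participate in downstream stages.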
I am trying to read a bunch of Avro files from an S3 folder using Spark 2.0.
No matter how many executors I use or what configuration changes I make,
the cluster doesn't seem to use all of the executors. I am using the
com.databricks.spark.avro library from Databricks to read the Avro files.
However, if I
Hello,
I am trying to understand how the in-memory size changes in these
situations. Specifically, why is the in-memory size much higher for Avro and
Parquet? Are there any optimizations necessary to reduce this?
I used cacheTable on each of these:
Avro file (600 KB) - in-memory size was 12 MB
Parquet
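On the size question, it may be worth confirming that the columnar cache compression settings are on. The sketch below is a hedged illustration, not a confirmed fix: the two spark.sql.inMemoryColumnarStorage confs exist in Spark 2.x (compressed defaults to true), while the jar and class names are reused from earlier in the thread. Note that Avro and Parquet are heavily encoded on disk, so some inflation when decoded into the in-memory columnar format is expected even with cache compression enabled.

```shell
# Sketch: make the in-memory columnar cache settings explicit.
# compressed=true enables per-column compression codecs; batchSize controls
# the row-batch unit of compression (10000 is the documented default).
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.sql.inMemoryColumnarStorage.compressed=true \
  --conf spark.sql.inMemoryColumnarStorage.batchSize=10000 \
  --class com.myorg.myApp \
  s3://my-bucket/myapp-0.1.jar
```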