Hi all,
I'm seeing some curious behavior that I'm having a hard time interpreting. I
have a job which does a "groupByKey" and results in 300 tasks. 299 of them
run at NODE_LOCAL locality; 1 task runs at PROCESS_LOCAL locality.
The 1 task that runs at PROCESS_LOCAL locality gets about 10x as much
Thanks Takeshi, that's exactly what I was looking for.
On Fri, Feb 5, 2016 at 12:32 PM, Takeshi Yamamuro <linguin@gmail.com>
wrote:
> How about using `spark.jars` to send jars into a cluster?
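A hedged sketch of that suggestion (the jar path is a placeholder; `--jars` on `spark-submit` is the CLI equivalent):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.jars takes a comma-separated list of jars that Spark ships to the
// driver and executors; the path below is hypothetical.
val conf = new SparkConf()
  .setAppName("custom-metrics-app")
  .set("spark.jars", "/path/to/custom-sink.jar")
val sc = new SparkContext(conf)
```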
>
> On Sat, Feb 6, 2016 at 12:00 AM, Matt K <matvey1...@gmail.com> wrote:
> reports metrics of each Executor?
>
> Thanks
>
> On 3 February 2016 at 15:56, Matt K <matvey1...@gmail.com> wrote:
>
>> Thanks for sharing Yiannis, looks very promising!
>>
>> Do you know if I can package a custom class with my application, or does
>> it
Hi guys,
I'm looking to create a custom sink based on Spark's Metrics System:
https://github.com/apache/spark/blob/9f603fce78fcc997926e9a72dec44d48cbc396fc/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
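For what it's worth, a minimal custom sink might look like the sketch below. It assumes Spark 1.x internals: the `Sink` trait is `private[spark]`, so the class has to live under the `org.apache.spark` package, and the constructor must match the signature that `MetricsSystem` instantiates reflectively.

```scala
package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}
import org.apache.spark.SecurityManager

// Sketch of a custom sink that dumps metrics to stdout via the Dropwizard
// ConsoleReporter. MyConsoleSink and the "period" property name are
// illustrative; the constructor arity is what MetricsSystem expects in 1.x.
class MyConsoleSink(val property: Properties,
                    val registry: MetricRegistry,
                    securityMgr: SecurityManager) extends Sink {

  // Poll period comes from metrics.properties, e.g. *.sink.console.period=10
  private val period = property.getProperty("period", "10").toInt

  private val reporter = ConsoleReporter.forRegistry(registry)
    .convertRatesTo(TimeUnit.SECONDS)
    .convertDurationsTo(TimeUnit.MILLISECONDS)
    .build()

  override def start(): Unit = reporter.start(period, TimeUnit.SECONDS)
  override def stop(): Unit = reporter.stop()
  override def report(): Unit = reporter.report()
}
```

It would then be wired up in `metrics.properties` with something like `*.sink.myconsole.class=org.apache.spark.metrics.sink.MyConsoleSink`, and the jar containing it has to be on the classpath of every process (driver, master, executors) whose metrics you want.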
If I want to collect metrics from the Driver, Master, and Executor nodes,
> deeper look at it: https://github.com/ibm-research-ireland/sparkoscope
>
> Thanks,
> Yiannis
>
> On 3 February 2016 at 13:32, Matt K <matvey1...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I'm looking to create a custom sink bas
onCols: _*)
> .mode(saveMode)
> .save(targetPath)
>
> In 1.5, we've disabled schema merging by default.
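For context, the truncated write above presumably looks something like the following sketch against the 1.5 DataFrame writer API (`df`, `partitionCols`, `saveMode`, and `targetPath` are assumed names inferred from the snippet):

```scala
// Partitioned Parquet write; partitionCols produces Hive-style
// key=value directories under targetPath.
df.write
  .partitionBy(partitionCols: _*)
  .mode(saveMode)
  .parquet(targetPath)

// Since 1.5, schema merging is off by default (spark.sql.parquet.mergeSchema);
// it can be re-enabled per read when files were written with evolving schemas:
val merged = sqlContext.read
  .option("mergeSchema", "true")
  .parquet(targetPath)
```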
>
> Cheng
>
>
> On 12/11/15 5:33 AM, Matt K wrote:
>
> Hi all,
>
> I have a process that's continuously saving data as Parquet w
Just want to add - I'm looking to partition the resulting Parquet files by
customer-id, which is why I'm looking to extract the customer-id from the
path.
On Tue, Sep 1, 2015 at 7:00 PM, Matt K <matvey1...@gmail.com> wrote:
> Hi all,
>
> TL;DR - is there a way to extract the s
Hi all,
TL;DR - is there a way to extract the source path from an RDD via the Scala
API?
I have sequence files on S3 that look something like this:
s3://data/customer=123/...
s3://data/customer=456/...
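One hedged way to get at the source path from the Scala API is to drop down to `HadoopRDD`, whose `mapPartitionsWithInputSplit` (a `@DeveloperApi`) exposes the `FileSplit` for each partition. The key/value types below are assumptions about the sequence files' contents:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, SequenceFileInputFormat}
import org.apache.spark.rdd.HadoopRDD

// Assumed key/value types; substitute whatever the sequence files hold.
val rdd = sc.hadoopFile(
    "s3://data/*",
    classOf[SequenceFileInputFormat[LongWritable, Text]],
    classOf[LongWritable],
    classOf[Text])
  .asInstanceOf[HadoopRDD[LongWritable, Text]]

val withCustomer = rdd.mapPartitionsWithInputSplit(
  (split: InputSplit, iter: Iterator[(LongWritable, Text)]) => {
    val path = split.asInstanceOf[FileSplit].getPath.toString
    // e.g. ".../customer=123/part-00000" yields Some("123")
    val customerId = "customer=([^/]+)".r.findFirstMatchIn(path).map(_.group(1))
    // Copy Writable contents out: Hadoop record readers reuse the objects.
    iter.map { case (_, v) => (customerId, v.toString) }
  },
  preservesPartitioning = false)
```

Since the layout is already Hive-style (`customer=...`), the recovered id can be carried through as a column and handed straight to `partitionBy("customer")` on the Parquet write.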
I am using Spark DataFrames to convert these sequence files to Parquet. As
part of the