Re: UDAFs for sketching Dataset columns with T-Digests

2017-07-06 Thread Sam Bessalah
This is interesting and very useful. Thanks. On Thu, Jul 6, 2017 at 2:33 AM, Erik Erlandson wrote: > After my talk on T-Digests in Spark at Spark Summit East, there were some > requests for a UDAF-based interface for working with Datasets. I'm > pleased to announce that I

Re: Spark join and large temp files

2016-08-09 Thread Sam Bessalah
Have you tried broadcasting your small table in order to perform your join? joined = bigDF.join(broadcast(smallDF), ...) On Tue, Aug 9, 2016 at 3:29 PM, Ashic Mahtab wrote: > Hi Deepak, > No...not really. Upping the disk size is a solution, but more expensive as > you

Re: hdfs-ha on mesos - odd bug

2015-09-14 Thread Sam Bessalah
I don't know about the broken URL. But are you running HDFS as a Mesos framework? If so, is it using mesos-dns? Then you should resolve the namenode via hdfs:/// On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett wrote: > I'm hitting an odd issue with running spark on

Re: Spark (Streaming?) holding on to Mesos resources

2015-01-27 Thread Sam Bessalah
Hi Gerard, isn't this the same issue as this? https://issues.apache.org/jira/browse/MESOS-1688 On Mon, Jan 26, 2015 at 9:17 PM, Gerard Maas gerard.m...@gmail.com wrote: Hi, We are observing with certain regularity that our Spark jobs, as Mesos framework, are hoarding resources and not

Re: Spark is slow

2014-04-21 Thread Sam Bessalah
Why don't you start by explaining what kind of operation you're running on Spark that's faster than Hadoop MapReduce. Maybe we could start there. And yes, this mailing list is very busy since many people are getting into Spark, so it's hard to answer everyone. On 21 Apr 2014 20:23, Joe L selme...@yahoo.com

Re: [ann] Spark-NYC Meetup

2014-04-21 Thread Sam Bessalah
Sounds great François. On 21 Apr 2014 22:31, François Le Lay f...@spotify.com wrote: Hi everyone, This is a quick email to announce the creation of a Spark-NYC Meetup. We have 2 upcoming events, one at PlaceIQ, another at Spotify where Reynold Xin (Databricks) and Christopher Johnson