Re: coalesce ending up very unbalanced - but why?

2016-12-14 Thread Adrian Bridgett
14/12/2016 13:58, Dirceu Semighini Filho wrote: Hi Adrian, Which kind of partitioning are you using? Have you already tried to coalesce it to a prime number? 2016-12-14 11:56 GMT-02:00 Adrian Bridgett <adr...@opensignal.com>: I realise that coale

coalesce ending up very unbalanced - but why?

2016-12-14 Thread Adrian Bridgett
I realise that coalesce() isn't guaranteed to be balanced and adding a repartition() does indeed fix this (at the cost of a large shuffle). I'm trying to understand _why_ it's so uneven (hopefully it helps someone else too). This is using spark v2.0.2 (pyspark). Essentially we're just
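To put a number on "very unbalanced", one option is to compare the largest partition with the mean partition size. The sketch below is plain Python and only does that arithmetic. It assumes the per-partition record counts have already been collected, for example with `df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()`, which counts rows without materialising each partition the way `glom()` would. The `partition_skew` helper is hypothetical and is not part of Spark.

```python
from statistics import mean


def partition_skew(sizes):
    """Return (largest, mean, largest/mean) for a list of per-partition record counts."""
    if not sizes:
        raise ValueError("need at least one partition size")
    avg = mean(sizes)
    biggest = max(sizes)
    # A mean of 0 means every partition is empty, which counts as perfectly balanced.
    ratio = biggest / avg if avg else 1.0
    return biggest, avg, ratio


if __name__ == "__main__":
    # Hypothetical counts, e.g. from the mapPartitions call described above.
    print(partition_skew([10, 10, 10, 10]))  # balanced: ratio 1.0
    print(partition_skew([1, 1, 1, 37]))     # one partition holds most rows
```

A ratio near 1.0 means the coalesced partitions are even. A large ratio after `coalesce()` but not after `repartition()` would confirm the skew described above.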

Re: mesos in spark 2.0.1 - must call stop() otherwise app hangs

2016-10-05 Thread Adrian Bridgett
Fab thanks all - I'll ensure we fix our code :-) On 05/10/2016 18:10, Sean Owen wrote: Being discussed as we speak at https://issues.apache.org/jira/browse/SPARK-17707 Calling stop() is definitely the right thing to do and always has been (see examples), but, may be possible to get rid of

Re: Issue with rogue data in csv file used in Spark application

2016-09-27 Thread Adrian Bridgett
We use the spark-csv (a successor of which is built in to spark 2.0) for this. It doesn't cause crashes, failed parsing is logged. We run on Mesos so I have to pull back all the logs from all the executors and search for failed lines (so that we can ensure that the failure rate isn't too
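After pulling the executor logs back from Mesos, checking that the failure rate "isn't too" high can be scripted. This is a minimal sketch under stated assumptions. The regex assumes the CSV reader logs each rejected row with a message containing "malformed line", which is a hypothetical marker you should match to your reader's actual log text. The `failure_rate` and `check_threshold` helper names and the 0.1% default threshold are also illustrative.

```python
import re

# Hypothetical marker for a rejected row; adjust to what your CSV reader actually logs.
FAILED_LINE = re.compile(r"malformed line", re.IGNORECASE)


def failure_rate(log_lines, total_records):
    """Count log lines that match FAILED_LINE; return (failures, failures / total_records)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    failures = sum(1 for line in log_lines if FAILED_LINE.search(line))
    return failures, failures / total_records


def check_threshold(log_lines, total_records, max_rate=0.001):
    """Return True if the failure rate is at or below max_rate (an assumed default of 0.1%)."""
    _, rate = failure_rate(log_lines, total_records)
    return rate <= max_rate


if __name__ == "__main__":
    # Hypothetical log lines gathered from several executors.
    logs = [
        "INFO executor started",
        "WARN Dropping malformed line: a,b,,",
        "WARN Dropping malformed line: x",
    ]
    print(failure_rate(logs, 1000), check_threshold(logs, 1000))
```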

Re: very high maxresults setting (no collect())

2016-09-22 Thread Adrian Bridgett
Hi Michael, No spark upgrade, we've been changing some of our data pipelines so the data volumes have probably been getting a bit larger. Just in the last few weeks we've seen quite a few jobs needing a larger maxResultSize. Some jobs have gone from "fine with 1GB default" to 3GB.

very high maxresults setting (no collect())

2016-09-19 Thread Adrian Bridgett
Hi, We've recently started seeing a huge increase in spark.driver.maxResultSize - we are starting to set it at 3GB (and increase our driver memory a lot to 12GB or so). This is on v1.6.1 with Mesos scheduler. All the docs I can see say that this is to do with .collect() being called on a

2.0.1/2.1.x release dates

2016-08-18 Thread Adrian Bridgett
Just wondering if there were any rumoured release dates for either of the above. I'm seeing some odd hangs with 2.0.0 and mesos (and I know that the mesos integration has had a bit of updating in 2.1.x). Looking at JIRA, there's no suggested release date and issues seem to be added to a

coalesce serialising earlier work

2016-08-09 Thread Adrian Bridgett
ant the df to be calculated in parallel and then this is _then_ coalesced before being written. (It may be that the -getmerge approach will still be faster) df.coalesce(100).coalesce(1).write. doesn't look very likely to help! Adrian -- *Adrian Bridgett*

odd python.PythonRunner Times values?

2016-05-23 Thread Adrian Bridgett
I'm seeing output like this on our mesos spark slaves: 16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1137, boot = -590, init = 593, finish = 1134 16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1652, boot = -446, init = 481, finish = 1617 This seems to be coming from
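The two sample lines are internally consistent. In each one, boot + init + finish equals total (-590 + 593 + 1134 = 1137 and -446 + 481 + 1617 = 1652). The negative boot value therefore appears to be offset by init rather than being a corrupted number. The sketch below scans logs in bulk for such lines. The regex assumes the exact "Times: total = ..., boot = ..., init = ..., finish = ..." format shown above, and the `parse_times` helper is hypothetical.

```python
import re

# Matches the timing fields of a PythonRunner "Times:" log line; values may be negative.
TIMES_RE = re.compile(
    r"Times: total = (-?\d+), boot = (-?\d+), init = (-?\d+), finish = (-?\d+)"
)


def parse_times(line):
    """Return a dict of the four timing fields, or None if the line has no Times: entry."""
    m = TIMES_RE.search(line)
    if m is None:
        return None
    total, boot, init, finish = (int(g) for g in m.groups())
    return {"total": total, "boot": boot, "init": init, "finish": finish}


if __name__ == "__main__":
    samples = [
        "16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1137, boot = -590, init = 593, finish = 1134",
        "16/05/23 11:44:04 INFO python.PythonRunner: Times: total = 1652, boot = -446, init = 481, finish = 1617",
    ]
    for line in samples:
        t = parse_times(line)
        # Check whether the phases add up to the total and whether boot is negative.
        print(t, t["boot"] + t["init"] + t["finish"] == t["total"], t["boot"] < 0)
```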

Re: Worker's BlockManager Folder not getting cleared

2016-01-26 Thread Adrian Bridgett
t how to get rid of this and help on understanding this behaviour. Thanks !!! Abhi -- *Adrian Bridgett* | Sysadmin Engineer, OpenSignal <http://www.opensignal.com> _ Office: 3rd Floor, The Angel Office, 2 Angel Square, London, EC1V 1NY Pho

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
I thought the Driver used). Anyhow I'll do more testing and then raise a JIRA. Adrian

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
to be the core issue. On 29/12/2015 21:17, Ted Yu wrote: Have you searched log for 'f02cb67a-3519-4655-b23a-edc0dd082bf1-S1/4' ? In the snippet you posted, I don't see registration of this Executor. Cheers On Tue, Dec 29, 2015 at 12:43 PM, Adrian Bridgett <adr...@opensignal.com>

Re: Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-30 Thread Adrian Bridgett
To wrap this up, it's the shuffle manager sending the FIN, so setting spark.shuffle.io.connectionTimeout to 3600s is the only workaround right now. SPARK-12583 raised. Adrian

Executor deregistered after 2mins (mesos, 1.6.0-rc4)

2015-12-29 Thread Adrian Bridgett
ry setting that on the shuffle service): spark.network.timeout 180s spark.shuffle.io.connectionTimeout 240s Adrian

Re: default parallelism and mesos executors

2015-12-15 Thread Adrian Bridgett
Thanks Iulian, I'll retest with 1.6.x once it's released (probably won't have enough spare time to test with the RC). On 11/12/2015 15:00, Iulian Dragoș wrote: On Wed, Dec 9, 2015 at 4:29 PM, Adrian Bridgett <adr...@opensignal.com> wrote:

default parallelism and mesos executors

2015-12-09 Thread Adrian Bridgett
.ec2.internal:41194/user/Executor#-1021429650]) with ID 20151117-115458-164233482-5050-24333-S22/5 15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1) .... >&

default parallelism and mesos executors

2015-12-02 Thread Adrian Bridgett
115458-164233482-5050-24333-S22/5 15/12/02 14:34:15 INFO spark.ExecutorAllocationManager: New executor 20151117-115458-164233482-5050-24333-S22/5 has registered (new total is 1) >>> print (sc.defaultParallelism) 42

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
via hdfs:/// On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett <adr...@opensignal.com> wrote: I'm hitting an odd issue with running spark on mesos together with HA-HDFS, with an even odder workaround. In particular I get an error that i

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
(or my spark config). http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Configuration_details On 15/09/2015 10:24, Steve Loughran wrote: On 15 Sep 2015, at 08:55, Adrian Bridgett <adr...@opensignal.com> wrote: Hi Sam, in short, no

hdfs-ha on mesos - odd bug

2015-09-14 Thread Adrian Bridgett
2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu sending #0 15/09/14 13:47:18 DEBUG Client: IPC Client (2055067800) connection to mesos-1.example.com/10.1.200.165:8020 from ubuntu got value #0 15/09/14 13:47:18 DEBUG ProtobufRpcEngine: Call: getBlockLocations to

JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
I'm trying to run spark (1.4.1) on top of mesos (0.23). I've followed the instructions (uploaded spark tarball to HDFS, set executor uri in both places etc) and yet on the slaves it's failing to launch even the SparkPi example with a JNI error. It does run with a local master. A day of

Re: JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
spark15.tgz to spark-1.5.0-bin-os1.tgz... Success!!! The same trick with 1.4 doesn't work, but now that I have something that does I can make progress. Hopefully this helps someone else :-) Adrian On 09/09/2015 16:59, Adrian Bridgett wrote: I'm trying to run spark (1.4.1) on top of mesos (0.23

Re: JNI issues with mesos

2015-09-09 Thread Adrian Bridgett
, Sep 9, 2015 at 8:18 AM, Adrian Bridgett <adr...@opensignal.com> wrote: 5mins later... Trying 1.5 with a fairly plain build: ./make-distribution.sh --tgz --name os1 -Phadoop-2.6 and on my first attempt stderr showed: I0909 15:16:49.