problem detecting FAILED state with SparkLauncher and SparkAppHandle

2017-01-12 Thread adam kramer
Hi All - I'm having an issue with detecting a failed Spark application state when using the startApplication method and SparkAppHandle with the SparkLauncher in Spark 2.0.1. Previously I had used a Java Process and waitFor to detect failure via a non-zero exit code, which worked. But when

Re: Spark 2.0 with Hadoop 3.0?

2016-10-30 Thread adam kramer
anything from trunk to get it working.
On Sat, Oct 29, 2016 at 6:08 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 27 Oct 2016, at 23:04, adam kramer <ada...@gmail.com> wrote:
>
> Is the version of Spark built for Hadoop 2.7 and later only for 2.x
> releas

Spark 2.0 with Hadoop 3.0?

2016-10-27 Thread adam kramer
Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases? Is there any reason why Hadoop 3.0 is a non-starter for use with Spark 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which would resolve our driver dependency issues. Thanks, Adam

Re: Equivalent Parquet File Repartitioning Benefits for Join/Shuffle?

2016-10-20 Thread adam kramer
on the bucketed columns?
On Tue, Oct 18, 2016 at 10:59 PM, adam kramer <ada...@gmail.com> wrote:
> Hello All,
>
> I’m trying to improve join efficiency within (self-join) and across
> data sets loaded from different parquet files primarily due to a
> multi-stage data

Equivalent Parquet File Repartitioning Benefits for Join/Shuffle?

2016-10-19 Thread adam kramer
Hello All, I’m trying to improve join efficiency within (self-join) and across data sets loaded from different parquet files primarily due to a multi-stage data ingestion environment. Are there specific benefits to shuffling efficiency (e.g. no network transmission) if the parquet files are