Hi All -
I'm having an issue with detecting a failed Spark application state when
using the startApplication method and SparkAppHandle with the SparkLauncher
in Spark 2.0.1.
Previously I had used a Java Process and waitFor, checking for a non-zero
exit code to detect failure, which worked. But when
anything from trunk to get it
working.
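For reference, the listener-based pattern I'm attempting looks roughly like the sketch below (the jar path, main class, and master are hypothetical placeholders; the `SparkAppHandle.Listener`, `startApplication`, and `State.isFinal()` APIs are from the Spark 2.x launcher package):

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

import java.util.concurrent.CountDownLatch;

public class LaunchAndWatch {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        // Hypothetical app resource / class / master -- adjust to your deployment.
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/app.jar")
            .setMainClass("com.example.MyApp")
            .setMaster("yarn")
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    // Release the latch once the app reaches a terminal state.
                    if (h.getState().isFinal()) {
                        done.countDown();
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle h) {
                    // No-op; app id / info updates are not needed here.
                }
            });

        done.await();
        if (handle.getState() == SparkAppHandle.State.FAILED) {
            System.err.println("Spark application failed");
        }
    }
}
```

This replaces the old Process/waitFor exit-code check with an explicit check for `SparkAppHandle.State.FAILED`; whether the handle actually reports FAILED (rather than LOST or FINISHED) is exactly the behavior in question here.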
On Sat, Oct 29, 2016 at 6:08 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
> On 27 Oct 2016, at 23:04, adam kramer <ada...@gmail.com> wrote:
>
Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases?
Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
2.0? The version of the aws-sdk in Hadoop 3.0 actually works for DynamoDB,
which would resolve our driver dependency issues.
Thanks,
Adam
On Tue, Oct 18, 2016 at 10:59 PM, adam kramer <ada...@gmail.com> wrote:
Hello All,
I’m trying to improve join efficiency both within a data set (self-join)
and across data sets loaded from different parquet files, primarily
because of a multi-stage data ingestion environment.
Are there specific benefits to shuffling efficiency (e.g. no network
transmission) if the parquet files are
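The approach this question points toward can be sketched with Spark's bucketed-table support: if both sides of a join are written with `bucketBy` on the join key, with the same bucket count, Spark can plan the join without a shuffle exchange. A minimal sketch follows; the input path, table name, join column `id`, and bucket count are all hypothetical, and note that in Spark 2.x `bucketBy` only works with `saveAsTable` (a managed table), not a plain `save` to a path:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BucketedWrite {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("bucketed-write")
            .enableHiveSupport()
            .getOrCreate();

        // Hypothetical input from one stage of the ingestion pipeline.
        Dataset<Row> df = spark.read().parquet("/data/ingest/stage1");

        // Bucket (and optionally sort) by the join key. Writing every
        // stage's output with the same bucket count and column is what
        // lets later joins skip the shuffle.
        df.write()
          .bucketBy(16, "id")
          .sortBy("id")
          .format("parquet")
          .saveAsTable("stage1_bucketed");

        spark.stop();
    }
}
```

With both tables bucketed this way, a subsequent `spark.table("stage1_bucketed").join(spark.table("stage2_bucketed"), "id")` can avoid the network transmission the question asks about, though the bucket metadata lives in the table catalog rather than in the parquet files themselves.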