[General Question] [Hadoop + Spark at scale] Spark Rack Awareness ?

2015-07-18 Thread Mike Frampton
I wanted to ask a general question about Hadoop/YARN and Apache Spark integration. I know that Hadoop on a physical cluster has rack awareness, i.e. it attempts to minimise network traffic by saving replicated blocks within a rack. I wondered whether, when Spark is configured to use

[Spark 1.3.1] Spark HiveQL - CDH 5.3 Hive 0.13 UDF's

2015-06-26 Thread Mike Frampton
Hi, I have a five-node CDH 5.3 cluster running on CentOS 6.5. I also have a separate install of Spark 1.3.1. (The CDH 5.3 install has Spark 1.2, but I wanted a newer version.) I managed to write some Scala-based code using a Hive Context to connect to Hive and create/populate tables etc.

[Spark 1.3.1 SQL] Using Hive

2015-06-21 Thread Mike Frampton
Hi, is it true that if I want to use Spark SQL (for Spark 1.3.1) against Apache Hive, I need to build Spark from source? I'm using CDH 5.3 on CentOS Linux 6.5, which uses Hive 0.13.0 (I think). Cheers, Mike F

spark stream twitter question ..

2015-06-13 Thread Mike Frampton
Hi, I have a question about Twitter stream processing in Spark 1.3.1. The code sample below just opens up a Twitter stream, uses auth keys, splits out hashtags and creates a temp table. However, when I try to compile it using sbt (CentOS 6.5), I get the error [error]

Spark sql and csv data processing question

2015-05-15 Thread Mike Frampton
Hi, I'm getting the following error when trying to process a CSV-based data file. Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 10.0 failed 4 times, most recent failure: Lost task 1.3 in stage 10.0 (TID 262,