Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
use yarn :) "spark-submit --master yarn"

On Sun, Jan 15, 2017 at 7:55 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> So what was the answer?
>
> Sent from my Verizon, Samsung Galaxy smartphone
>
> ---- Original message
> From: Andre
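For the archive: a fuller EMR invocation might look like the sketch below. Everything beyond --master yarn is illustrative rather than taken from the thread, and my_job.py is a hypothetical script name.

    # EMR clusters ship a YARN-configured Spark, so yarn is the natural master.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 10 \
      my_job.py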

Re: Running Spark on EMR

2017-01-15 Thread Andrew Holway
Darn. I didn't respond to the list. Sorry.

On Sun, Jan 15, 2017 at 5:29 PM, Marco Mistroni wrote:
> Thanks Neil. I followed the original suggestion from Andrew and everything is
> working fine now
> kr
>
> On Sun, Jan 15, 2017 at 4:27 PM, Neil Jonkers

python environments with "local" and "yarn-client" - Boto failing on HDP2.5

2016-11-29 Thread Andrew Holway
Hey, I am making some calls with Boto3 in my PySpark application. It works fine in master=local mode, but when I switch to master=yarn I get "NoCredentialsError: Unable to locate credentials", which is a bit annoying as I cannot work out why! I have been running this application fine on Mesos
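A likely explanation, though not confirmed in the thread: in local mode boto3 runs on the driver and finds credentials in ~/.aws/credentials or instance metadata, while under YARN the calls execute on executor nodes that may have no credential source at all. A minimal sketch of one workaround, assuming the driver does have credentials and shipping them in the closure is acceptable (bucket and key names are illustrative):

    import boto3
    from pyspark import SparkContext

    sc = SparkContext(appName="boto3-on-yarn")

    # Resolve credentials once on the driver; plain strings pickle cleanly.
    creds = boto3.Session().get_credentials().get_frozen_credentials()
    access_key, secret_key = creds.access_key, creds.secret_key

    def head_objects(pairs):
        # Build the client inside the partition so it is constructed on the
        # executor, using the credentials shipped from the driver.
        s3 = boto3.client("s3",
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key)
        for bucket, key in pairs:
            yield s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

    sizes = sc.parallelize([("my-bucket", "some/key")]).mapPartitions(head_objects)
    print(sizes.collect())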

Re: createDataFrame causing a strange error.

2016-11-29 Thread Andrew Holway
gType)
>   .add("timezone", StringType).add("day", StringType)
>   .add("minute", StringType)
>
> val jsonContentWithSchema = sqlContext.jsonRDD(jsonRdd, schema)
> println(s"- And the Json withSchema ha
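For reference, the PySpark equivalent of the quoted suggestion (build an explicit schema, then apply it while reading the JSON) might look like this sketch; the field names are taken from the quoted Scala, and json_rdd is an assumed RDD of JSON strings:

    from pyspark.sql.types import StringType, StructType

    schema = (StructType()
              .add("timezone", StringType())
              .add("day", StringType())
              .add("minute", StringType()))

    df = sqlContext.read.schema(schema).json(json_rdd)
    df.printSchema()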

Re: createDataFrame causing a strange error.

2016-11-28 Thread Andrew Holway
; extra complexity which you don't need
>
> If you send a snippet of your json content, then everyone on the list can
> run the code and try to reproduce
>
> hth
>
> Marco
>
> On 27 Nov 2016 7:33 pm, "Andrew Holway" <andrew.hol...@otternetworks.de

Re: createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
.protocol.Py4JError: An error occurred while calling o33.__getnewargs__. Trace:
py4j.Py4JException: Method __getnewargs__([]) does not exist
  at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
  at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
  at py4j.Ga

createDataFrame causing a strange error.

2016-11-27 Thread Andrew Holway
Hi, Can anyone tell me what is causing this error?

Spark 2.0.0
Python 2.7.5

df = sqlContext.createDataFrame(foo, schema)

https://gist.github.com/mooperd/368e3453c29694c8b2c038d6b7b4413a

Traceback (most recent call last):
  File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py",
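A common cause of this particular Py4J error, offered as a hypothesis since the gist is not reproduced in the archive: the SparkContext or SQLContext gets captured in a closure that Spark tries to pickle and ship to the executors. The Py4J-backed context objects cannot be pickled, which surfaces as "Method __getnewargs__([]) does not exist". A minimal sketch of the offending pattern and a safe alternative (raw_records and parse_record are hypothetical):

    # BROKEN: referencing sqlContext (or sc) inside a transformation forces
    # Spark to pickle it, raising the __getnewargs__ Py4JError.
    # rdd.map(lambda x: sqlContext.createDataFrame([x], schema))

    # SAFE: do the distributed work first, then call createDataFrame once
    # on the driver.
    rows = sc.parallelize(raw_records).map(parse_record)
    df = sqlContext.createDataFrame(rows, schema)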

javac - No such file or directory

2016-11-09 Thread Andrew Holway
I'm getting this error trying to build Spark on CentOS 7. It is not googling very well:

[error] (tags/compile:compileIncremental) java.io.IOException: Cannot run program "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64/bin/javac" (in directory "/home/spark/spark"): error=2, No such
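The usual culprit on CentOS, offered as a likely fix rather than one confirmed in the thread: the java-1.8.0-openjdk package contains only the JRE, so $JAVA_HOME/bin/javac does not exist; the compiler ships in the -devel package.

    # Install the JDK (which provides javac) alongside the JRE.
    sudo yum install -y java-1.8.0-openjdk-devel

    # Point JAVA_HOME at the JDK before rebuilding; the exact symlink
    # path can vary between CentOS installs.
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk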

Re: Save a spark RDD to disk

2016-11-08 Thread Andrew Holway
That's around 750 MB/s, which seems quite respectable even in this day and age! How many and what kind of disks do you have attached to your nodes? What are you expecting?

On Tue, Nov 8, 2016 at 11:08 PM, Elf Of Lothlorein wrote:
> Hi
> I am trying to save a RDD to disk and
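For anyone landing here from a search, the basic save paths in PySpark look like the sketch below; this is generic, since the original poster's code is truncated in the archive, and the output paths are illustrative:

    # Plain text: one part-file per partition under the output directory.
    rdd.saveAsTextFile("hdfs:///tmp/rdd-out")

    # Pickled objects: round-trips arbitrary Python values.
    rdd.saveAsPickleFile("hdfs:///tmp/rdd-pickled")

    # For DataFrames, a columnar format is usually the better choice.
    df.write.parquet("hdfs:///tmp/df-out")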

Live data visualisations with Spark

2016-11-08 Thread Andrew Holway
. Is this something that could be accomplished with Shiny Server, for instance? Thanks, Andrew Holway

Re: sanboxing spark executors

2016-11-04 Thread Andrew Holway
I think running it on a Mesos cluster could give you better control over this kind of thing.

On Fri, Nov 4, 2016 at 7:41 AM, blazespinnaker wrote:
> Is there a good method / discussion / documentation on how to sandbox a
> Spark executor? Assume the code is

Re: Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Sorry: Spark 2.0.0

On Tue, Nov 1, 2016 at 10:04 AM, Andrew Holway <andrew.hol...@otternetworks.de> wrote:
> Hello,
>
> I've been getting pretty serious with DC/OS which I guess could be
> described as a somewhat polished distribution of Mesos. I'm not sure ho

Python - Spark Cassandra Connector on DC/OS

2016-11-01 Thread Andrew Holway
Hello, I've been getting pretty serious with DC/OS, which I guess could be described as a somewhat polished distribution of Mesos. I'm not sure how relevant DC/OS is to this problem. I am using this PySpark program to test the Cassandra connection: http://bit.ly/2eWAfxm (github) I can see that the
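For anyone hitting the same wall: the usual way to make the connector visible to a PySpark job is --packages. The coordinates below are illustrative for the Spark 2.0 / Scala 2.11 era and should be checked against the connector's compatibility matrix; test_cassandra.py is a hypothetical script name.

    spark-submit \
      --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0 \
      --conf spark.cassandra.connection.host=cassandra-host \
      test_cassandra.py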

ERROR SparkContext: Error initializing SparkContext.

2016-05-09 Thread Andrew Holway
Hi, I am having a hard time getting to the bottom of this problem. I'm really not sure where to start with it. Everything works fine in local mode. Cheers, Andrew

[testing@instance-16826 ~]$ /opt/mapr/spark/spark-1.5.2/bin/spark-submit --num-executors 21 --executor-cores 5 --master yarn-client

[OT] Apache Spark Jobs in Kochi, India

2016-02-11 Thread Andrew Holway
Hello, I'm not sure how appropriate job postings are to a user group. We're getting deep into Spark and are looking for some talent in our Kochi office.

http://bit.ly/Spark-Eng - Apache Spark Engineer / Architect - Kochi
http://bit.ly/Spark-Dev - Lead Apache Spark Developer - Kochi

Sorry for

Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
I'm managing to read data via JDBC using the following, but I can't work out how to write something back to the database.

df <- read.df(sqlContext, source="jdbc",
              url="jdbc:mysql://hostname:3306?user=user&password=pass",
              dbtable="database.table")

Does this functionality exist in 1.5.2? Thanks,
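Whether SparkR 1.5.2 exposes a JDBC write at all is the open question here, so for comparison, this is what the write side looks like in PySpark of the same era (DataFrameWriter.jdbc, available since Spark 1.4); the URL, table, and credentials simply mirror the placeholders in the question:

    # PySpark sketch of the desired write path.
    df.write.jdbc(
        url="jdbc:mysql://hostname:3306/database",
        table="table",
        mode="append",
        properties={"user": "user", "password": "pass"},
    )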

Re: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
> df <- read.df(sqlContext, source="jdbc",
>               url="jdbc:mysql://hostname:3306?user=user&password=pass",
>               dbtable="database.table")

I got a bit further but am now getting the following error. This error is being thrown without the database being touched. I tested this by making the database

Concatenating tables

2016-01-23 Thread Andrew Holway
Is there a data frame operation to do this?

+---------+
| A B C D |
+---------+
| 1 2 3 4 |
| 5 6 7 8 |
+---------+

+---------+
| A B C D |
+---------+
| 3 5 6 8 |
| 0 0 0 0 |
+---------+

+---------+
| A B C D |
+---------+
| 8 8 8 8 |
| 1 1 1 1 |
+---------+

Concatenated together to make this.
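Stacking rows like that is a union; in the DataFrame API of the Spark 1.x era (current when this was posted) that is unionAll, assuming df1, df2, and df3 are DataFrames with identical schemas:

    # Row-wise concatenation of same-schema DataFrames.
    # (unionAll was renamed to union() in Spark 2.0.)
    combined = df1.unionAll(df2).unionAll(df3)
    combined.show()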

Date / time stuff with spark.

2016-01-21 Thread Andrew Holway
Hello, I am importing this data from HDFS into a data frame with sqlContext.read.json().

{"a": 42, "a": 56, "Id": "621368e2f829f230", "smunkId": "CKm26sDMucoCFReRGwodbHAAgw", "popsicleRange": "17610", "time": "2016-01-20T23:59:53+00:00"}

I want to do some date/time operations on this json data
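One way to make that ISO-8601 "time" field usable, sketched against the Spark 1.5/1.6 API that was current at the time; the HDFS path is illustrative:

    from pyspark.sql.functions import dayofmonth, hour, unix_timestamp

    df = sqlContext.read.json("hdfs:///path/to/data")

    # Parse the string into a real timestamp column; after that the
    # built-in date/time functions (hour, dayofmonth, ...) apply.
    with_ts = df.withColumn(
        "ts",
        unix_timestamp("time", "yyyy-MM-dd'T'HH:mm:ssXXX").cast("timestamp"))
    with_ts.select(hour("ts"), dayofmonth("ts")).show()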

Re: Date / time stuff with spark.

2016-01-21 Thread Andrew Holway
P.S. We are working with Python.

On Thu, Jan 21, 2016 at 8:24 PM, Andrew Holway <andrew.hol...@otternetworks.de> wrote:
> Hello,
>
> I am importing this data from HDFS into a data frame with
> sqlContext.read.json().
>
> {"a": 42, "a": 56,