Kryo fails to serialise output

2015-07-03 Thread Dominik Hübner
I have a rather simple Avro schema to serialize Tweets (message, username, timestamp). Kryo and Twitter chill are used to do so. For my dev environment the Spark context is configured as below:
    val conf: SparkConf = new SparkConf()
    conf.setAppName("kryo_test")
    conf.setMaster("local[4]")

Exchanging data between pyspark and scala

2014-09-03 Thread Dominik Hübner
Hey, I am about to implement a Spark app that will need to use both PySpark and Spark on Scala. Data should be read from AWS S3 (compressed CSV files) and must be pre-processed by an existing Python codebase. However, our final goal is to make those datasets available for Spark apps

python dependencies loaded but not on PYTHONPATH

2014-08-05 Thread Dominik Hübner
Hey, I just tried to submit a task to my Spark cluster using the following command:
    ./spark/bin/spark-submit --py-files file:///root/abc.zip --master spark://xxx.xxx.xxx.xxx:7077 test.py
It seems like the dependency I’ve added gets loaded:
    14/08/05 23:07:00 INFO spark.SparkContext: Added file
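
The spark-submit help text describes --py-files as a list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps. As a rough illustration of what that means for a zipped dependency, the sketch below uses only the Python standard library (no Spark) to show that a module inside a zip becomes importable once the zip's path is in sys.path. The module name demo_dep, the archive name deps.zip, and all helper names are hypothetical and not taken from the thread.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_dependency_zip(directory, module_name="demo_dep"):
    """Create a zip with one top-level module, similar in shape to a --py-files archive.

    The module name and its contents are made up for this illustration.
    """
    zip_path = os.path.join(directory, "deps.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr(module_name + ".py", "VALUE = 42\n")
    return zip_path


def is_on_path(zip_path):
    """Return True if the archive itself is an entry of sys.path."""
    target = os.path.abspath(zip_path)
    return target in [os.path.abspath(entry) for entry in sys.path]


def import_from_zip(zip_path, module_name="demo_dep"):
    """Put the archive on sys.path if it is missing, then import the module from it."""
    if not is_on_path(zip_path):
        sys.path.insert(0, zip_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    archive_path = build_dependency_zip(workdir)
    print("on sys.path before import:", is_on_path(archive_path))
    module = import_from_zip(archive_path)
    print("imported from:", module.__file__)
    print("on sys.path after import:", is_on_path(archive_path))
```

A check like is_on_path, run inside the submitted script, is one way to tell whether an archive that was reported as added is also visible to the Python import system.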