Re: How to add jars to standalone pyspark program

2015-05-06 Thread mj
Thank you for your response; however, I'm afraid I still can't get it to work. This is my code:

jar_path = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'
spark_config = SparkConf().setMaster('local').setAppName('data_frame_test').set("spark.jars", jar_path)
sc = SparkContext(conf
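One likely reason a .set("spark.jars", ...) made from Python doesn't take effect is that the driver JVM is already running by the time the Python-side conf is applied. A common workaround in Spark 1.x is to supply the jar at launch through the PYSPARK_SUBMIT_ARGS environment variable; a minimal sketch, reusing the jar path from the post (newer 1.x releases expect the trailing pyspark-shell token):

import os
from pyspark import SparkConf, SparkContext

# Must be set before SparkContext is constructed, since it is read
# when the driver JVM is launched.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar pyspark-shell'

conf = SparkConf().setMaster('local').setAppName('data_frame_test')
sc = SparkContext(conf=conf)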

Re: How to add jars to standalone pyspark program

2015-05-06 Thread mj
I've worked around this by dropping the jars into a directory (spark_jars) and then creating a spark-defaults.conf file in conf containing this:

spark.driver.extraClassPath   /home/mj/apps/spark_jars/*
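Once the jar is on the driver classpath this way, the package can be used from PySpark. A hedged sketch of reading a CSV with spark-csv's 1.x API under Spark 1.3+ (the file path and options are illustrative, not from the post):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
# spark-csv 1.x is invoked through the generic load() entry point.
df = sqlContext.load(source="com.databricks.spark.csv", header="true", path="cars.csv")
df.show()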

How to add jars to standalone pyspark program

2015-04-28 Thread mj
Hi, I'm trying to figure out how to use a third-party jar inside a Python program which I'm running via PyCharm in order to debug it. I am normally able to run Spark code in Python such as this:

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc =
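For debugging under PyCharm, one approach consistent with Spark 1.x is to put the jar on SPARK_CLASSPATH (deprecated in 1.x but still honored, with a warning) before the SparkContext is created, either in the script itself or in PyCharm's run-configuration environment variables. A sketch, with an illustrative jar path:

import os
# Must be set before SparkContext launches the JVM.
os.environ['SPARK_CLASSPATH'] = '/home/mj/apps/spark_jars/spark-csv_2.11-1.0.3.jar'

from pyspark import SparkConf, SparkContext
spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)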

Change ivy cache for spark on Windows

2015-04-27 Thread mj
Hi, I'm having trouble using the --packages option for spark-shell.cmd. I have to use Windows at work and have been issued a username with a space in it, which means that when I use the --packages option it fails with this message:

Exception in thread "main" java.net.URISyntaxException: Illegal
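The failure is presumably because the default Ivy cache lives under the user's home directory, which contains the space. A sketch of relocating the cache with the spark.jars.ivy setting, which controls the directory --packages uses for Ivy (the target path and package coordinates are illustrative, and the setting assumes a release that ships --packages support):

spark-shell.cmd --conf spark.jars.ivy=C:\spark_ivy --packages com.databricks:spark-csv_2.11:1.0.3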

pyspark 1.1.1 on windows saveAsTextFile - NullPointerException

2014-12-18 Thread mj
Hi, I'm trying to use pyspark to save a simple rdd to a text file (code below), but it keeps throwing an error.

- Python Code -
items = ["Hello", "world"]
items2 = sc.parallelize(items)
items2.coalesce(1).saveAsTextFile('c:/tmp/python_out.csv')
- Error
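On Windows, a NullPointerException from saveAsTextFile is very often the classic missing-winutils problem: Spark's Hadoop libraries need winutils.exe, with HADOOP_HOME pointing at its parent directory. A sketch of that fix, assuming this diagnosis applies here (C:/hadoop is illustrative and must contain bin/winutils.exe):

import os
# Must be set before the SparkContext is created.
os.environ['HADOOP_HOME'] = 'C:/hadoop'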

Pyspark 1.1.1 error with large number of records - serializer.dump_stream(func(split_index, iterator), outfile)

2014-12-16 Thread mj
PySpark via PyCharm, and the information for my environment is:
OS: Windows 7
Python version: 2.7.9
Spark version: 1.1.1
Java version: 1.8
I've also included the .py file I am using. I'd appreciate any help you can give me, MJ.
ERROR MESSAGE C
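The serializer.dump_stream line is the top frame of almost any PySpark worker traceback, so the real cause is in the truncated part of the message. One hedged thing worth trying with serializer errors on large record counts in PySpark of this era is disabling batched serialization; batchSize is a real SparkContext parameter, though whether it helps here is a guess:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('local').setAppName('large_records_test')
# batchSize=1 disables batching of Python objects into single Java objects.
sc = SparkContext(conf=conf, batchSize=1)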

Re: Appending an incrental value to each RDD record

2014-12-16 Thread mj
You could try using zipWithIndex (links below to API docs). For example, in Python:

items = ['a', 'b', 'c']
items2 = sc.parallelize(items)
print(items2.first())
items3 = items2.map(lambda x: (x, x + "!"))
print(items3.first())
items4 = items3.zipWithIndex()
print(items4.first())
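For reference, the three prints should give 'a', then ('a', 'a!'), then (('a', 'a!'), 0), since zipWithIndex pairs each element with its index. If the index is wanted as the key instead, a follow-up map can swap the tuple (an illustrative addition, not part of the original reply):

items5 = items4.map(lambda pair: (pair[1], pair[0]))
print(items5.first())  # (0, ('a', 'a!'))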