Re: Spark job fails on cluster but works fine on a single machine

2015-02-20 Thread Pavel Velikhov
I definitely delete the file on the right HDFS; I only have one HDFS instance. The problem seems to be in the CassandraRDD - reading always fails in some way when run on the cluster, but single-machine reads are okay. On Feb 20, 2015, at 4:20 AM, Ilya Ganelin ilgan...@gmail.com wrote: The
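A minimal sketch of the kind of read being discussed, assuming the DataStax spark-cassandra-connector is on the classpath; "cassandra-host", "keyspace", and "table" are placeholders. On a cluster, spark.cassandra.connection.host must be reachable from every executor, not just the driver:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    // "cassandra-host" is a placeholder; it must resolve from every worker node.
    val conf = new SparkConf()
      .setAppName("cassandra-read-check")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Force a full read so any executor-side connectivity problem surfaces.
    val n = sc.cassandraTable("keyspace", "table").count()
    println(s"rows readable from the cluster: $n")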

Re: Spark job fails on cluster but works fine on a single machine

2015-02-19 Thread Pavel Velikhov
Yeah, I do manually delete the files, but it still fails with this error. On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya ilya.gane...@capitalone.com wrote: When writing to hdfs Spark will not overwrite existing files or directories. You must either manually delete these or use Java's Hadoop FileSystem class to remove them.

Re: Spark job fails on cluster but works fine on a single machine

2015-02-19 Thread Pavel Velikhov
On Feb 19, 2015, at 7:29 PM, Pavel Velikhov pavel.velik...@icloud.com wrote: I have a simple Spark job that goes out to Cassandra, runs a pipe and stores results: val sc = new SparkContext(conf) val rdd = sc.cassandraTable("keyspace", "table") .map(r => r.getInt("column") + "\t" +
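A hedged reconstruction of the truncated snippet; the keyspace, table, column names, script path, and output path are placeholders, since the original values are cut off in the archive:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    val conf = new SparkConf().setAppName("cassandra-pipe-job")
    val sc = new SparkContext(conf)

    val rdd = sc.cassandraTable("keyspace", "table")
      .map(r => r.getInt("column") + "\t" + r.getString("other_column")) // build tab-separated lines
      .pipe("/path/to/script.sh")                                        // run each partition through an external script
    rdd.saveAsTextFile("hdfs:///path/to/output")                         // fails if the path already exists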

RE: Spark job fails on cluster but works fine on a single machine

2015-02-19 Thread Ganelin, Ilya
When writing to hdfs Spark will not overwrite existing files or directories. You must either manually delete these or use Java's Hadoop FileSystem class to remove them. -Original Message- From: Pavel Velikhov
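A minimal sketch of that cleanup, using Hadoop's FileSystem API to remove the output directory before saving; sc and rdd are assumed to be the SparkContext and RDD from the job above, and the output path is a placeholder:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val outputPath = new Path("hdfs:///path/to/output")       // placeholder
    val fs = FileSystem.get(sc.hadoopConfiguration)            // picks up the cluster's fs.defaultFS
    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true)                              // recursive delete of the old output directory
    }
    rdd.saveAsTextFile(outputPath.toString)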

Re: Spark job fails on cluster but works fine on a single machine

2015-02-19 Thread Ilya Ganelin
The stupid question: are you deleting the file from HDFS on the right node? On Thu, Feb 19, 2015 at 11:31 AM Pavel Velikhov pavel.velik...@gmail.com wrote: Yeah, I do manually delete the files, but it still fails with this error. On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya
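A quick way to rule out the wrong-filesystem case, assuming an existing SparkContext sc and a placeholder path: print which filesystem the output path actually resolves to and whether it still exists before the job writes:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val p = new Path("hdfs:///path/to/output")                 // placeholder
    val fs = p.getFileSystem(sc.hadoopConfiguration)
    println(s"path resolves to ${fs.getUri}, exists = ${fs.exists(p)}")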