Spark on Gordon

2015-01-31 Thread Deep Pradhan
Hi All, the Gordon SC has Spark installed on it. Has anyone tried to run Spark jobs on Gordon? Thank you

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Ted Yu
Looking at https://github.com/apache/spark/pull/1222/files , the following change may have caused what Stephen described: + if (!fileSystem.isDirectory(new Path(logBaseDir))) { When there is no scheme associated with logBaseDir, a local path should be assumed. On Fri, Jan 30, 2015 at 8:37 AM,
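A minimal sketch of why the scheme matters, using Hadoop's FileSystem API (the directory name is illustrative): a scheme-less path is resolved against fs.defaultFS, which on a cluster is typically HDFS rather than the local disk.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    val conf = new Configuration()
    val logBaseDir = new Path("/tmp/spark-events")  // no scheme: ambiguous
    val fs = logBaseDir.getFileSystem(conf)         // resolved via fs.defaultFS
    // On a cluster this prints e.g. hdfs://namenode/tmp/spark-events, so the
    // isDirectory() check above runs against HDFS, not the local filesystem.
    println(fs.makeQualified(logBaseDir))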

Re: Error when running spark in debug mode

2015-01-31 Thread Arush Kharbanda
Hi Ankur, it's running fine for me on Spark 1.1 with changes to the log4j properties file. Thanks, Arush. On Fri, Jan 30, 2015 at 9:49 PM, Ankur Srivastava ankur.srivast...@gmail.com wrote: Hi Arush, I have configured log4j by updating the file log4j.properties in the SPARK_HOME/conf folder. If it
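For reference, a minimal log4j.properties sketch that turns on debug output (this mirrors the template Spark ships, with the root level raised to DEBUG):

    # SPARK_HOME/conf/log4j.properties
    log4j.rootCategory=DEBUG, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n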

Re: Error when running spark in debug mode

2015-01-31 Thread Arush Kharbanda
Can you share your log4j file? On Sat, Jan 31, 2015 at 1:35 PM, Arush Kharbanda ar...@sigmoidanalytics.com wrote: Hi Ankur, it's running fine for me on Spark 1.1 with changes to the log4j properties file. Thanks, Arush. On Fri, Jan 30, 2015 at 9:49 PM, Ankur Srivastava

Re: Spark streaming - tracking/deleting processed files

2015-01-31 Thread Akhil Das
This might not be a straightforward approach, but one way would be to use *PairRDDFunctions*; there are a few methods to access the partitions and the filenames from the partitions. Once you have the filename, you can delete it after your operations. Not sure if Spark updated the
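A sketch of one way to do this, assuming the input was read with sc.hadoopFile so that each partition is backed by a FileSplit (the input path is illustrative, and mapPartitionsWithInputSplit is a developer API):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
    import org.apache.spark.rdd.HadoopRDD

    val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/incoming")
    // Tag each record with the name of the file its partition was read from.
    val withFile = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit { (split: InputSplit, iter) =>
        val file = split.asInstanceOf[FileSplit].getPath.toString
        iter.map { case (_, line) => (file, line.toString) }
      }
    // After processing, delete the files that were seen.
    withFile.map(_._1).distinct().collect().foreach { f =>
      FileSystem.get(new Configuration()).delete(new Path(f), false)
    }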

Re: Spark streaming - tracking/deleting processed files

2015-01-31 Thread Arush Kharbanda
Hi Ganterm, that's expected if you look at the documentation for textFileStream: it creates an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text, and input format as TextInputFormat). Files must be written to
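A minimal usage sketch (the directory and batch interval are illustrative); note that only files appearing in the directory after the stream starts are picked up, and each file is read exactly once:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(30))
    // Files must be moved or renamed atomically into the monitored directory.
    val lines = ssc.textFileStream("hdfs:///incoming")
    lines.print()
    ssc.start()
    ssc.awaitTermination()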

Re: Error Compiling

2015-01-31 Thread Sean Owen
I assume you're trying to sum *by key* within each window. The _ + _ operation applies to integers, but here you're telling it to sum (String, Int) pairs, which isn't defined. Use reduceByKeyAndWindow instead. On Sat, Jan 31, 2015 at 12:00 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: Hi Guys,
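A sketch of the suggested fix, assuming a DStream[(String, Int)] named pairs (the window and slide durations are illustrative):

    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.StreamingContext._  // PairDStreamFunctions on Spark 1.x

    // Sum the Int values per key over a 60-second window, sliding every 10 seconds.
    val windowedCounts = pairs.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b, Seconds(60), Seconds(10))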

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Stephen Haberman
Looking at https://github.com/apache/spark/pull/1222/files , the following change may have caused what Stephen described: + if (!fileSystem.isDirectory(new Path(logBaseDir))) { When there is no scheme associated with logBaseDir, a local path should be assumed. Yes, that looks right. In

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Ted Yu
Understood. However, the previous default was a local directory; now the user has to specify the file:// scheme. Maybe add a release note to SPARK-2261? Cheers. On Sat, Jan 31, 2015 at 8:40 AM, Sean Owen so...@cloudera.com wrote: This might have been on purpose, since the goal is to make this
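For anyone hitting this, a spark-defaults.conf sketch that makes the scheme explicit (the path is illustrative):

    spark.eventLog.enabled  true
    # A bare /tmp/spark-events would now resolve against the default
    # (e.g. HDFS) filesystem, so qualify local directories explicitly.
    spark.eventLog.dir      file:///tmp/spark-events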

Re: spark-shell working in scala-2.11 (breaking change?)

2015-01-31 Thread Sean Owen
This might have been on purpose, since the goal is to make this HDFS-friendly, and of course still allow local directories. With no scheme, a path is ambiguous. On Sat, Jan 31, 2015 at 4:18 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at https://github.com/apache/spark/pull/1222/files , the

spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Koert Kuipers
I have a simple Spark app that I run with spark-submit on YARN. It runs fine and shows up with finalStatus=SUCCEEDED in the ResourceManager logs. However, in the NodeManager logs I see this: 2015-01-31 18:30:48,195 INFO

Re: spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Ted Yu
Can you look inside the RM log to see if you can find a clue there? Can you pastebin the part of the RM log from around the time your job ran? What Hadoop version are you using? Thanks. On Sat, Jan 31, 2015 at 11:24 AM, Koert Kuipers ko...@tresata.com wrote: I have a simple Spark app that I run with

Re: spark on yarn succeeds but exit code 1 in logs

2015-01-31 Thread Koert Kuipers
It is CDH 5.3 with the Spark that ships with it. I went through the RM logs line by line and found the exit code in there: container_1422728945460_0001_01_29 Container Transitioned from NEW to RESERVED 2015-01-31 18:30:49,633 INFO

Re: [hive context] Unable to query array once saved as parquet

2015-01-31 Thread Ayoub
Hello, as asked, I just filed this JIRA issue: https://issues.apache.org/jira/browse/SPARK-5508. I will add another similar code example, which leads to a "GenericRow cannot be cast to org.apache.spark.sql.catalyst.expressions.SpecificMutableRow" exception. Best, Ayoub. 2015-01-31 4:05 GMT+01:00
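As a rough illustration of the failing pattern (Spark 1.2-era API; the case class, path, and table names are hypothetical, not taken from the JIRA):

    import org.apache.spark.sql.hive.HiveContext

    case class Event(id: Int, tags: Seq[String])

    val hc = new HiveContext(sc)
    import hc.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

    sc.parallelize(Seq(Event(1, Seq("a", "b"))))
      .saveAsParquetFile("/tmp/events.parquet")
    hc.parquetFile("/tmp/events.parquet").registerTempTable("events")
    // Querying the array column is where the cast exception surfaces.
    hc.sql("SELECT tags[0] FROM events").collect()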

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
Hi Skanda, how do you set up your SPARK_CLASSPATH? I added the following line to my SPARK_HOME/conf/spark-env.sh and still got the same error: export SPARK_CLASSPATH=${SPARK_CLASSPATH}:/etc/hive/conf -- Original -- From: Skanda
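For completeness, the two usual ways to get the Hive configuration onto the classpath (paths assume a CDH layout; the class and jar names are hypothetical):

    # In SPARK_HOME/conf/spark-env.sh
    export SPARK_CLASSPATH="$SPARK_CLASSPATH:/etc/hive/conf"

    # Or ship hive-site.xml explicitly at submit time
    spark-submit \
      --driver-class-path /etc/hive/conf \
      --files /etc/hive/conf/hive-site.xml \
      --class com.example.MyApp myapp.jar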

Re: RE: Can't access remote Hive table from spark

2015-01-31 Thread guxiaobo1982
The following line does not work either: export SPARK_CLASSPATH=/etc/hive/conf -- Original -- From: guxiaobo1982 guxiaobo1...@qq.com; Send time: Sunday, Feb 1, 2015 2:15 PM; To: Skanda Prasad skanda.ganapa...@gmail.com; user@spark.apache.org;