Iterating over values by Key

2015-07-28 Thread gulyasm
I have K/V pairs where V is an Iterable (from a previous groupBy). I use the
Java API.
What I want is to iterate over the values of each key and set a
previousElementId attribute on every element, that is, the id of the
previous element in the sorted list.
I try to do this with mapValues. I copy the Iterable into an array, sort
it there, and iterate over the values, setting the attribute and saving
the reference for the next element. After that, I return the array.
Is this the best approach, or am I missing something?
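The per-key logic described above can be sketched as plain Java, independent of
Spark (the Event type, its timestamp sort key, and the field names are
hypothetical stand-ins for the real records; in Spark this would be the body of
the function passed to mapValues):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical element type standing in for the poster's records.
class Event {
    final String id;
    final long timestamp;          // assumed sort key
    String previousElementId;      // filled in after sorting

    Event(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

public class LinkPrevious {
    // Copy the Iterable into a list, sort it, then link each element
    // to the id of the element that precedes it in sorted order.
    static List<Event> linkPrevious(Iterable<Event> values) {
        List<Event> sorted = new ArrayList<>();
        for (Event e : values) {
            sorted.add(e);
        }
        sorted.sort(Comparator.comparingLong((Event e) -> e.timestamp));
        String prevId = null;      // first element has no predecessor
        for (Event e : sorted) {
            e.previousElementId = prevId;
            prevId = e.id;
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Event> out = linkPrevious(
                Arrays.asList(new Event("b", 2), new Event("a", 1)));
        System.out.println(out.get(1).previousElementId); // prints "a"
    }
}
```

Copying the Iterable into a list before sorting is needed in any case, since
the Iterable produced by groupByKey cannot be sorted in place.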

Spark version: 1.4.1
Java version: 1.8

Thanks in advance. 
Mate



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Iterating-over-values-by-Key-tp24029.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



NullPointerException in ApplicationMaster

2015-02-25 Thread gulyasm
Hi all,

I am trying to run a Spark Java application on EMR, but I keep getting a
NullPointerException from the ApplicationMaster (Spark version on
EMR: 1.2). The stacktrace is below. I also tried to run the
application on the Hortonworks Sandbox (2.2) with Spark 1.2, following the
blog post (http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/)
from Hortonworks, but that failed too, with the same exception. I run
over YARN (master: yarn-cluster). I also tried to run the Hortonworks
sample application on the virtual machine, but that failed with the very
same exception, as did setting the Spark home in SparkConf. What am I
missing?

The stacktrace and the log:
15/02/25 11:38:41 INFO SecurityManager: Changing view acls to: root
15/02/25 11:38:41 INFO SecurityManager: Changing modify acls to: root
15/02/25 11:38:41 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); users
with modify permissions: Set(root)
15/02/25 11:38:42 INFO Slf4jLogger: Slf4jLogger started
15/02/25 11:38:42 INFO Remoting: Starting remoting
15/02/25 11:38:42 INFO Remoting: Remoting started; listening on
addresses :[akka.tcp://sparkdri...@sandbox.hortonworks.com:53937]
15/02/25 11:38:42 INFO Utils: Successfully started service
'sparkDriver' on port 53937.
15/02/25 11:38:42 INFO SparkEnv: Registering MapOutputTracker
15/02/25 11:38:42 INFO SparkEnv: Registering BlockManagerMaster
15/02/25 11:38:42 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20150225113842-788f
15/02/25 11:38:42 INFO MemoryStore: MemoryStore started with capacity 265.4
MB
15/02/25 11:38:42 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where
applicable
15/02/25 11:38:42 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-973069b3-aafd-4f1d-b18c-9e0a5d0efcaa
15/02/25 11:38:42 INFO HttpServer: Starting HTTP Server
15/02/25 11:38:43 INFO Utils: Successfully started service 'HTTP file
server' on port 39199.
15/02/25 11:38:43 INFO Utils: Successfully started service 'SparkUI' on port
4040.
15/02/25 11:38:43 INFO SparkUI: Started SparkUI at
http://sandbox.hortonworks.com:4040
15/02/25 11:38:43 INFO SparkContext: Added JAR
file:/root/logprocessor-1.0-SNAPSHOT-jar-with-dependencies.jar at
http://192.168.100.37:39199/jars/logprocessor-1.0-SNAPSHOT-jar-with-dependencies.jar
with timestamp 1424864323482
15/02/25 11:38:43 INFO YarnClusterScheduler: Created YarnClusterScheduler
Exception in thread "main" java.lang.NullPointerException
at
org.apache.spark.deploy.yarn.ApplicationMaster$.getAttempId(ApplicationMaster.scala:524)
at
org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.start(YarnClusterSchedulerBackend.scala:34)
at
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:337)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:75)
at hu.enbritely.logprocessor.Logprocessor.main(Logprocessor.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:360)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:76)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties


One of the programs I try to run:

public static void main(String[] argv) {
  SparkConf conf = new SparkConf();
  JavaSparkContext spark = new JavaSparkContext("yarn-cluster",
      "Spark logprocessing", conf);
  JavaRDD<String> file = spark.textFile("hdfs://spark-output");
  file.saveAsTextFile("hdfs://output");
  spark.stop();
}
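One thing worth checking (an assumption, not a confirmed diagnosis): in
yarn-cluster mode the driver runs inside the YARN ApplicationMaster, which
reads its attempt id from the YARN container environment, so constructing a
SparkContext with master "yarn-cluster" from a plain JVM is not supported and
can surface in Spark 1.2 as exactly this NullPointerException (later versions
report a clearer error). The usual pattern is to leave the master out of the
code and launch through spark-submit; a sketch using the class and jar names
that appear in the stacktrace and log above:

```shell
spark-submit \
  --class hu.enbritely.logprocessor.Logprocessor \
  --master yarn-cluster \
  logprocessor-1.0-SNAPSHOT-jar-with-dependencies.jar
```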

Thank you for your assistance!
Mate Gulyas



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-in-ApplicationMaster-tp21804.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
