Partitioning a libsvm format file

2014-08-10 Thread ayandas84
Hi,

I am using Spark with Scala to train a distributed SVM. For training I use
files in LIBSVM format. I want to split a file into a fixed number of
partitions, with each partition holding an equal number of data points
(assume the number of data points in the file is exactly divisible by the
number of partitions). How can I do this in Spark/Scala?

Please help. Thanks.
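
A minimal sketch of one way to do this, assuming Spark 1.x with MLlib on the
classpath and an existing SparkContext sc; the input path and the partition
count k are illustrative assumptions, not from the post. A plain repartition
does not guarantee exactly equal partition sizes, so this version indexes
every point and routes it by index:

import org.apache.spark.Partitioner
import org.apache.spark.SparkContext._
import org.apache.spark.mllib.util.MLUtils

val k = 4  // desired number of partitions (assumed to divide the point count)
val data = MLUtils.loadLibSVMFile(sc, "data/train.libsvm")  // hypothetical path

// zipWithIndex gives each point a stable global index; dividing the index
// by the per-partition quota yields the target partition id.
val total = data.count()
val quota = total / k

val equalParts = data.zipWithIndex()
  .map { case (point, idx) => ((idx / quota).toInt, point) }
  .partitionBy(new Partitioner {
    override def numPartitions: Int = k
    override def getPartition(key: Any): Int = key.asInstanceOf[Int]
  })
  .values  // drop the routing key, keeping an RDD[LabeledPoint]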




Problem in running mosek in spark cluster - java.lang.UnsatisfiedLinkError: no mosekjava7_0 in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)

2014-09-09 Thread ayandas84
We have a small Apache Spark cluster of 6 machines. We are trying to solve a
distributed problem that requires solving an optimization problem on each
machine during a Spark map operation.

We decided to use Mosek as the solver and obtained an academic license for
this purpose. Mosek works fine on a single system. However, when we package a
jar file, include mosek.jar in the library, and run the jar on the cluster as
a Spark job, it fails with:

java.lang.UnsatisfiedLinkError: no mosekjava7_0 in java.library.path

Does this problem have anything to do with the license? We have set the
necessary path variables in the profile of the user on the master machine,
but we are not sure what changes need to be made on the other machines in the
cluster.

We would be greatly obliged if you could suggest a solution and help us out.
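
For what it is worth, an UnsatisfiedLinkError generally means the JVM cannot
find the native library on java.library.path; it is usually unrelated to
licensing. A quick diagnostic, sketched below assuming an existing
SparkContext sc, is to print java.library.path as seen by each executor JVM,
to check whether the Mosek native directory is visible on every machine and
not just the master:

// Run a handful of tasks across the cluster and report what each JVM sees.
val paths = sc.parallelize(1 to 12, 12).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  (host, System.getProperty("java.library.path"))
}.collect()

paths.distinct.foreach { case (host, libPath) =>
  println(s"$host -> $libPath")
}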




How to set java.library.path in a spark cluster

2014-09-09 Thread ayandas84
Hi,

I am working on a 3-machine Cloudera cluster. Whenever I submit a Spark job
as a jar file with a native dependency on Mosek, it fails with the following
error.

java.lang.UnsatisfiedLinkError: no mosekjava7_0 in java.library.path

How should I set java.library.path? When I print the JVM options,
java.library.path is empty:

-Djava.library.path= -Xms512m -Xmx512m

I added the following lines to the spark-env.sh file, but it did not help.
The directory on that path contains both mosek.jar and libmosek_7.0.so.

export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib:/home/chanda/mosek/7/tools/platform/linux64x86/bin
export SPARK_MASTER_OPTS='-Djava.library.path=/home/chanda/mosek/7/tools/platform/linux64x86/bin'
export SPARK_WORKER_OPTS='/home/chanda/mosek/7/tools/platform/linux64x86/bin'
export SPARK_HISTORY_OPTS='/home/chanda/mosek/7/tools/platform/linux64x86/bin'
export SPARK_DAEMON_JAVA_OPTS='/home/chanda/mosek/7/tools/platform/linux64x86/bin'

Please help.
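
Note that as written above, SPARK_WORKER_OPTS, SPARK_HISTORY_OPTS, and
SPARK_DAEMON_JAVA_OPTS are set to bare paths rather than JVM options, so the
worker JVMs never receive a -Djava.library.path flag at all. A corrected
sketch, assuming the Mosek directory exists at the same location on every
machine in the cluster:

export SPARK_MASTER_OPTS='-Djava.library.path=/home/chanda/mosek/7/tools/platform/linux64x86/bin'
export SPARK_WORKER_OPTS='-Djava.library.path=/home/chanda/mosek/7/tools/platform/linux64x86/bin'

Spark of this era also exposes spark.executor.extraLibraryPath and
spark.driver.extraLibraryPath in spark-defaults.conf as an alternative. Either
way, the library must be present at that path on every worker, and
spark-env.sh changes only take effect after the workers are restarted.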





Re: Kryo deserialisation error

2014-09-12 Thread ayandas84
Hi,

I am also facing the same problem. Has anyone found a solution yet?

The class name in the exception is just a garbled string of characters.

Please help.


Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Exception while deserializing and fetching task: com.esotericsoftware.kryo.KryoException: Unable to find class:

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv;

 "$&(*,.02468:<>@BDFHJLNPRTVXZ^`bdfhlnprtvD^bjlnpv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv:

 "$&(*,.02468:<>@BDFHJNPRTVXZ\`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv8@p=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtv=

 "$&(*,.02468:<>@BDFHJLNPRTVXZ\^`bdfhjlnprtvxz|~
Serialization trace:
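
A commonly suggested mitigation for Kryo "Unable to find class" failures,
offered here only as a sketch and not a confirmed fix for this thread, is to
register the serialized classes explicitly so that Kryo writes compact class
ids instead of class-name strings that can arrive corrupted. The registrator
name and the registered class below are illustrative assumptions:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
import org.apache.spark.mllib.linalg.SparseVector

// Registers every class that tasks ship over the wire.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[SparseVector])  // example; register your own classes
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")  // fully qualified name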





java.lang.OutOfMemoryError: Java heap space during reduce operation

2014-10-20 Thread ayandas84
Hi,

*In a reduce operation I am trying to accumulate a list of SparseVectors.
The code is given below:*

val wNode = trainingData.reduce { (node1: Node, node2: Node) =>
  val merged = new Node(num1, num2)
  merged.WList ++= node1.WList
  merged.WList ++= node2.WList
  merged
}

where WList is a list of SparseVectors. The average size of a SparseVector is
21000, and the number of elements in the final list at the end of the reduce
operation varies between 20 and 100.

*However, at runtime I get the following error messages from some of the
executor machines.*
14/10/20 22:38:41 INFO BlockManagerInfo: Added taskresult_30 in memory on cse-hadoop-113:34602 (size: 789.0 MB, free: 22.2 GB)
14/10/20 22:38:41 INFO TaskSetManager: Starting task 1.0:12 as TID 34 on executor 6: cse-hadoop-113 (PROCESS_LOCAL)
14/10/20 22:38:41 INFO TaskSetManager: Serialized task 1.0:12 as 2170 bytes in 2 ms
14/10/20 22:38:41 INFO SendingConnection: Initiating connection to [cse-hadoop-113/192.168.0.113:34602]
14/10/20 22:38:41 INFO SendingConnection: Connected to [cse-hadoop-113/192.168.0.113:34602], 1 messages pending
14/10/20 22:38:41 INFO ConnectionManager: Accepted connection from [cse-hadoop-113/192.168.0.113]
Exception in thread "pool-5-thread-3" java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
        at org.apache.spark.network.Message$.create(Message.scala:88)
        at org.apache.spark.network.ReceivingConnection$Inbox.org$apache$spark$network$ReceivingConnection$Inbox$$createNewMessage$1(Connection.scala:438)
        at org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
        at org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.network.ReceivingConnection$Inbox.getChunk(Connection.scala:448)
        at org.apache.spark.network.ReceivingConnection.read(Connection.scala:525)
        at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
*Please help.*
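
If the goal of the reduce is only to concatenate every node's WList, one
lighter-weight pattern is sketched below, assuming the Node class and
trainingData: RDD[Node] from the post, and assuming the vectors are
org.apache.spark.mllib.linalg.SparseVector. It ships each vector once instead
of re-serializing ever-larger merged Node objects at every reduce step; those
growing partial results are what produce the 789 MB task results seen above:

import org.apache.spark.mllib.linalg.SparseVector

// Ship each vector once; the concatenation happens on the driver.
val allVectors: Array[SparseVector] = trainingData
  .flatMap(node => node.WList)
  .collect()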


