Re: Feasibility Project - Text Processing and Category Classification

2015-08-28 Thread Ritesh Kumar Singh
Load the text file as an RDD, something like this:

  val file = sc.textFile("/path/to/file")


After this you can manipulate the RDD to filter the lines the way you want:

  val a1 = file.filter( line => line.contains("[ERROR]") )
  val a2 = file.filter( line => line.contains("[WARN]") )
  val a3 = file.filter( line => line.contains("[INFO]") )


You can view the lines using the println method like this:

  a1.foreach(println)


You can also count the number of such lines using the count function like
this:

  val b1 = file.filter( line => line.contains("[ERROR]") ).count()
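
A minimal end-to-end sketch of the steps above (the file path is a placeholder), bucketing lines by category and counting each bucket:

  val file = sc.textFile("/path/to/file")   // placeholder path

  // count the lines belonging to each category tag
  val categories = Seq("[ERROR]", "[WARN]", "[INFO]")
  val countsByCategory = categories.map { tag =>
    tag -> file.filter(line => line.contains(tag)).count()
  }

  countsByCategory.foreach { case (tag, n) => println(s"$tag: $n") }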


Regards,


Ritesh Kumar Singh, https://riteshtoday.wordpress.com/


Re: Unsupported major.minor version 51.0

2015-08-11 Thread Ritesh Kumar Singh
Can you please post the output of the following commands? (Class file version
51.0 corresponds to Java 7, so this usually points to a Java version mismatch
between compile time and runtime.)

java -version

javac -version


Local spark jars not being detected

2015-06-20 Thread Ritesh Kumar Singh
Hi,

I'm using IntelliJ ide for my spark project.
I've compiled Spark 1.3.0 for Scala 2.11.4 and here's one of the
compiled jars installed in my .m2 folder:

~/.m2/repository/org/apache/spark/spark-core_2.11/1.3.0/spark-core_2.11-1.3.0.jar

But when I add this dependency in my pom file for the project :

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_$(scala.version)</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>

I'm getting "Dependency 'org.apache.spark:spark-core_$(scala.version):1.3.0'
not found".
Why is this happening and what's the workaround?


Re: Local spark jars not being detected

2015-06-20 Thread Ritesh Kumar Singh
Yes, finally solved. It was there in front of my eyes the whole time.

Thanks a lot Pete.


Error using spark 1.3.0 with maven

2015-06-15 Thread Ritesh Kumar Singh
Hi,

I'm getting this error while running Spark as a Java project using Maven:

15/06/15 17:11:38 INFO SparkContext: Running Spark version 1.3.0
15/06/15 17:11:38 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/06/15 17:11:38 INFO SecurityManager: Changing view acls to: root
15/06/15 17:11:38 INFO SecurityManager: Changing modify acls to: root
15/06/15 17:11:38 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(root); users
with modify permissions: Set(root)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No
configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:270)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at Test.main(Test.java:9)


==

My Test.java file contains the following:

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;

public class Test {
  public static void main(String[] args) {
    String logFile = "/code/data.txt";
    SparkConf conf = new
        SparkConf().setMaster("local[4]").setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile);

    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
  }
}


==

My pom file contains the following :

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>dexample</groupId>
  <artifactId>sparktest</artifactId>
  <name>Testing spark with maven</name>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.3.0</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <finalName>sparktest</finalName>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>Test</mainClass>
              <classpathPrefix>dependency-jars/</classpathPrefix>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.3</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>attached</goal>
            </goals>
            <phase>package</phase>
            <configuration>
              <finalName>sparktest</finalName>
              <descriptorRefs>

akka configuration not found

2015-06-15 Thread Ritesh Kumar Singh
Hi,

Though my project has nothing to do with akka, I'm getting this error :

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No
configuration setting found for key 'akka.version'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:151)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:206)
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:168)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:504)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at
org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:55)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at
org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1832)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1823)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:57)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:223)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:270)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at Test.main(Test.java:9)

There is no reference to akka anywhere in the code / pom file.
Any fixes?

Thanks,
Ritesh


Re: Can't build Spark 1.3

2015-06-02 Thread Ritesh Kumar Singh
It did hang for me too, due to high RAM consumption during the build. I had to free a
lot of RAM and add swap space just to get it built on my 3rd attempt.
Everything else looks fine. You can download the prebuilt versions from the
Spark homepage to save yourself from all this trouble.

Thanks,
Ritesh


Re: Official Docker container for Spark

2015-05-22 Thread Ritesh Kumar Singh
Use this:
sequenceiq/docker

Here's a link to their github repo:
docker-spark https://github.com/sequenceiq/docker-spark


They have repos for other big data tools too, which are again really nice.
It's being maintained properly by their devs and


Re: Overlapping classes warnings

2015-04-09 Thread Ritesh Kumar Singh
Though the warnings can be ignored, they add up in the log files while
compiling other projects too. And there are a lot of those warnings. Any
workaround? How do we modify the pom.xml file to exclude these unnecessary
dependencies?

On Fri, Apr 10, 2015 at 2:29 AM, Sean Owen so...@cloudera.com wrote:

 Generally, you can ignore these things. They mean some artifacts
 packaged other artifacts, and so two copies show up when all the JAR
 contents are merged.

 But here you do show a small dependency convergence problem; beanutils
 1.7 is present but beanutils-core 1.8 is too even though these should
 be harmonized. I imagine one could be excluded; I imagine we could
 harmonize the version manually. In practice, I also imagine it doesn't
 cause any problem but feel free to propose a fix along those lines.

 On Thu, Apr 9, 2015 at 4:54 PM, Ritesh Kumar Singh
 riteshoneinamill...@gmail.com wrote:
  Hi,
 
  During compilation I get a lot of these:
 
  [WARNING] kryo-2.21.jar, reflectasm-1.07-shaded.jar define
   23 overlapping classes:
 
  [WARNING] commons-beanutils-1.7.0.jar, commons-beanutils-core-1.8.0.jar
  define
   82 overlapping classes:
 
  [WARNING] commons-beanutils-1.7.0.jar, commons-collections-3.2.1.jar,
  commons-beanutils-core-1.8.0.jar define 10 overlapping classes:
 
 
  And a lot of others. How do I fix these?



Re: Overlapping classes warnings

2015-04-09 Thread Ritesh Kumar Singh
I found this JIRA, https://jira.codehaus.org/browse/MSHADE-128, when
googling for fixes. I wonder if it can fix anything here.
But anyway, thanks for the help :)

On Fri, Apr 10, 2015 at 2:46 AM, Sean Owen so...@cloudera.com wrote:

 I agree, but as I say, most are out of the control of Spark. They
 aren't because of unnecessary dependencies.

 On Thu, Apr 9, 2015 at 5:14 PM, Ritesh Kumar Singh
 riteshoneinamill...@gmail.com wrote:
  Though the warnings can be ignored, they add up in the log files while
  compiling other projects too. And there are a lot of those warnings. Any
  workaround? How do we modify the pom.xml file to exclude these
 unnecessary
  dependencies?
 
  On Fri, Apr 10, 2015 at 2:29 AM, Sean Owen so...@cloudera.com wrote:
 
  Generally, you can ignore these things. They mean some artifacts
  packaged other artifacts, and so two copies show up when all the JAR
  contents are merged.
 
  But here you do show a small dependency convergence problem; beanutils
  1.7 is present but beanutils-core 1.8 is too even though these should
  be harmonized. I imagine one could be excluded; I imagine we could
  harmonize the version manually. In practice, I also imagine it doesn't
  cause any problem but feel free to propose a fix along those lines.
 
  On Thu, Apr 9, 2015 at 4:54 PM, Ritesh Kumar Singh
  riteshoneinamill...@gmail.com wrote:
   Hi,
  
   During compilation I get a lot of these:
  
   [WARNING] kryo-2.21.jar, reflectasm-1.07-shaded.jar define
     23 overlapping classes:
  
   [WARNING] commons-beanutils-1.7.0.jar,
 commons-beanutils-core-1.8.0.jar
   define
     82 overlapping classes:
  
   [WARNING] commons-beanutils-1.7.0.jar, commons-collections-3.2.1.jar,
    commons-beanutils-core-1.8.0.jar define 10 overlapping classes:
  
  
   And a lot of others. How do I fix these?
 
 



Migrating from Spark 0.8.0 to Spark 1.3.0

2015-04-03 Thread Ritesh Kumar Singh
Hi,

Are there any tutorials that explain all the changes between Spark
0.8.0 and Spark 1.3.0, and how should we approach this migration?


Re: Is there any Sparse Matrix implementation in Spark/MLib?

2015-02-27 Thread Ritesh Kumar Singh
Try using Breeze (the Scala linear algebra library). A small sketch is below.
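
A minimal sketch (local Breeze only, not Spark-specific; names and sizes are made up for illustration):

  import breeze.linalg.CSCMatrix

  // CSCMatrix stores only the non-zero entries
  val builder = new CSCMatrix.Builder[Double](1000, 1000)
  builder.add(0, 3, 1.5)
  builder.add(42, 999, -2.0)
  val sparse: CSCMatrix[Double] = builder.result()

  println(sparse(0, 3))       // 1.5
  println(sparse.activeSize)  // number of stored non-zeros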

On Fri, Feb 27, 2015 at 5:56 PM, shahab shahab.mok...@gmail.com wrote:

 Thanks a lot Vijay, let me see how it performs.

 Best
 Shahab


 On Friday, February 27, 2015, Vijay Saraswat vi...@saraswat.org wrote:

 Available in GML --

 http://x10-lang.org/x10-community/applications/global-matrix-library.html

 We are exploring how to make it available within Spark. Any ideas would
 be much appreciated.

 On 2/27/15 7:01 AM, shahab wrote:

 Hi,

 I just wonder if there is any Sparse Matrix implementation available  in
 Spark, so it can be used in spark application?

 best,
 /Shahab



 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Mllib error

2014-12-10 Thread Ritesh Kumar Singh
How did you build your Spark 1.1.1?
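
Not knowing the build setup, but if it is sbt, one common cause of "object mllib is not a member of package org.apache.spark" is a missing spark-mllib artifact; a minimal sketch of the dependency block, with versions assumed:

  // build.sbt sketch (assumed versions): mllib ships as a separate artifact,
  // so it has to be declared alongside spark-core.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"  % "1.1.1",
    "org.apache.spark" %% "spark-mllib" % "1.1.1"
  )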

On Wed, Dec 10, 2014 at 10:41 AM, amin mohebbi aminn_...@yahoo.com.invalid
wrote:

 I'm trying to build a very simple scala standalone app using the Mllib,
 but I get the following error when trying to bulid the program:

 Object mllib is not a member of package org.apache.spark


 please note I just migrated from 1.0.2 to 1.1.1



 Best Regards

 ...

 Amin Mohebbi

 PhD candidate in Software Engineering
  at university of Malaysia

 Tel : +60 18 2040 017



 E-Mail : tp025...@ex.apiit.edu.my

   amin_...@me.com



Re: Install Apache Spark on a Cluster

2014-12-08 Thread Ritesh Kumar Singh
On a rough note:

Step 1: Install Hadoop 2.x on all the machines in the cluster
Step 2: Check that the Hadoop cluster is working
Step 3: Set up Apache Spark as given on the documentation page for the
cluster.
Check the status of the cluster on the master UI

As it is a data mining project, configure Hive too.
You can use Spark SQL or AMPLab Shark as the data store.

On Mon, Dec 8, 2014 at 11:01 PM, riginos samarasrigi...@gmail.com wrote:

 My thesis is related to big data mining and I have a cluster in the
 laboratory of my university. My task is to install apache spark on it and
 use it for extraction purposes. Is there any understandable guidance on how
 to do this ?



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Install-Apache-Spark-on-a-Cluster-tp20580.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Ritesh Kumar Singh
For converting an Array or any List to an RDD, we can try using:

  sc.parallelize(groupedScore)   // or whatever the name of the list variable is
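
And if the goal is to keep the result as an RDD without collecting to the driver at all, a rough sketch (assuming the StudentScore case class from the question below) is to do the per-key top-N inside a transformation:

  import org.apache.spark.rdd.RDD

  case class StudentScore(age: Int, num: Int, score: Int, name: Int)

  // top n scores within each age, returned as an RDD rather than an Array
  def topNPerAge(scores: RDD[StudentScore], n: Int): RDD[StudentScore] =
    scores
      .groupBy(_.age)
      .flatMap { case (_, ss) => ss.toSeq.sortBy(-_.score).take(n) }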

On Mon, Dec 1, 2014 at 8:14 PM, Xuefeng Wu ben...@gmail.com wrote:

 Hi, I have a problem. It is easy in Scala code, but I cannot take the top
 N from an RDD as an RDD.


 There are 1 Student Score, ask take top 10 age, and then take top 10
 from each age, the result is 100 records.

 The Scala code is here, but how can I do it with RDDs? *RDD.take returns an
 Array, but I want another RDD.*

 example Scala code:

 import scala.util.Random

 case class StudentScore(age: Int, num: Int, score: Int, name: Int)

 val scores = for {
   i <- 1 to 1
 } yield {
   StudentScore(Random.nextInt(100), Random.nextInt(100), Random.nextInt(), 
 Random.nextInt())
 }


 def takeTop(scores: Seq[StudentScore], byKey: StudentScore => Int): Seq[(Int, 
 Seq[StudentScore])] = {
   val groupedScore = scores.groupBy(byKey)
 .map{case (_, _scores) =>
  (_scores.foldLeft(0)((acc, v) => acc + v.score), _scores)}.toSeq
   groupedScore.sortBy(_._1).take(10)
 }

 val topScores = for {
   (_, ageScores) <- takeTop(scores, _.age)
   (_, numScores) <- takeTop(ageScores, _.num)
 } yield {
   numScores
 }

 topScores.size


 --

 ~Yours, Xuefeng Wu/吴雪峰  敬上




Re: Setting network variables in spark-shell

2014-11-30 Thread Ritesh Kumar Singh
Spark configuration settings can be found here
http://spark.apache.org/docs/latest/configuration.html

Hope it helps :)
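
To double-check what the running shell actually picked up, a small sketch from inside spark-shell (assuming the SparkContext exposes getConf):

  // inspect the configuration the shell is really using
  sc.getConf.getOption("spark.executor.memory").foreach(println)
  sc.getConf.getOption("spark.akka.frameSize").foreach(println)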

On Sun, Nov 30, 2014 at 9:55 PM, Brian Dolan buddha_...@yahoo.com.invalid
wrote:

 Howdy Folks,

 What is the correct syntax in 1.0.0 to set networking variables in spark
 shell?  Specifically, I'd like to set the spark.akka.frameSize

 I'm attempting this:

 spark-shell -Dspark.akka.frameSize=1 --executor-memory 4g


 Only to get this within the session:

 System.getProperty("spark.executor.memory")
 res0: String = 4g
 System.getProperty("spark.akka.frameSize")
 res1: String = null


 I don't believe I am violating protocol, but I have also posted this to
 SO:
 http://stackoverflow.com/questions/27215288/how-do-i-set-spark-akka-framesize-in-spark-shell

 ~~
 May All Your Sequences Converge






Re: spark-shell giving me error of unread block data

2014-11-19 Thread Ritesh Kumar Singh
As Marcelo mentioned, the issue occurs mostly when incompatible classes are
used by the executors or the driver. Check whether the output shows up in
spark-shell. If yes, then most probably there is some issue with your
configuration files. It would be helpful if you could paste the contents of
the config files you edited.

On Thu, Nov 20, 2014 at 5:45 AM, Anson Abraham anson.abra...@gmail.com
wrote:

 Sorry meant cdh 5.2 w/ spark 1.1.

 On Wed, Nov 19, 2014, 17:41 Anson Abraham anson.abra...@gmail.com wrote:

 yeah CDH distribution (1.1).

 On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin van...@cloudera.com
 wrote:

 On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham anson.abra...@gmail.com
 wrote:
  yeah but in this case i'm not building any files.  just deployed out
 config
  files in CDH5.2 and initiated a spark-shell to just read and output a
 file.

 In that case it is a little bit weird. Just to be sure, you are using
 CDH's version of Spark, not trying to run an Apache Spark release on
 top of CDH, right? (If that's the case, then we could probably move
 this conversation to cdh-us...@cloudera.org, since it would be
 CDH-specific.)


  On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com
 wrote:
 
  Hi Anson,
 
  We've seen this error when incompatible classes are used in the driver
  and executors (e.g., same class name, but the classes are different
  and thus the serialized data is different). This can happen for
  example if you're including some 3rd party libraries in your app's
  jar, or changing the driver/executor class paths to include these
  conflicting libraries.
 
  Can you clarify whether any of the above apply to your case?
 
  (For example, one easy way to trigger this is to add the
  spark-examples jar shipped with CDH5.2 in the classpath of your
  driver. That's one of the reasons I filed SPARK-4048, but I digress.)
 
 
  On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham 
 anson.abra...@gmail.com
  wrote:
   I'm essentially loading a file and saving output to another
 location:
  
    val source = sc.textFile("/tmp/testfile.txt")
    source.saveAsTextFile("/tmp/testsparkoutput")
  
   when i do so, i'm hitting this error:
   14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at
   <console>:15
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
   stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
   (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException:
   unread block data
   java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
   java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
   org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
   org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   java.lang.Thread.run(Thread.java:744)
   Driver stacktrace:
   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
   at scala.Option.foreach(Option.scala:236)
   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at 

Re: Spark On Yarn Issue: Initial job has not accepted any resources

2014-11-18 Thread Ritesh Kumar Singh
Not sure how to solve this, but spotted these lines in the logs:

14/11/18 14:28:23 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_02

14/11/18 14:28:38 INFO YarnAllocationHandler: Container marked as
*failed*: container_1415961020140_0325_01_03

And the lines following them say it's trying to allocate containers with 1408 MB
of memory each but it's failing to do so. You might want to look into that.


On Tue, Nov 18, 2014 at 1:23 PM, LinCharlie lin_q...@outlook.com wrote:

 Hi All:
 I was submitting a spark_program.jar to `spark on yarn cluster` on a
 driver machine with yarn-client mode. Here is the spark-submit command I
 used:

 ./spark-submit --master yarn-client --class
 com.charlie.spark.grax.OldFollowersExample --queue dt_spark
 ~/script/spark-flume-test-0.1-SNAPSHOT-hadoop2.0.0-mr1-cdh4.2.1.jar

 The queue `dt_spark` was free, and the program was submitted successfully
 and is running on the cluster.  But on the console, it showed repeatedly that:

 14/11/18 15:11:48 WARN YarnClientClusterScheduler: Initial job has not
 accepted any resources; check your cluster UI to ensure that workers are
 registered and have sufficient memory

 Checked the cluster UI logs, I find no errors:

 SLF4J: Class path contains multiple SLF4J bindings.
 SLF4J: Found binding in 
 [jar:file:/home/disk5/yarn/usercache/linqili/filecache/6957209742046754908/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: Found binding in 
 [jar:file:/home/hadoop/hadoop-2.0.0-cdh4.2.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
 SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
 explanation.
 SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
 14/11/18 14:28:16 INFO SecurityManager: Changing view acls to: hadoop,linqili
 14/11/18 14:28:16 INFO SecurityManager: SecurityManager: authentication 
 disabled; ui acls disabled; users with view permissions: Set(hadoop, linqili)
 14/11/18 14:28:17 INFO Slf4jLogger: Slf4jLogger started
 14/11/18 14:28:17 INFO Remoting: Starting remoting
 14/11/18 14:28:17 INFO Remoting: Remoting started; listening on addresses 
 :[akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
 14/11/18 14:28:17 INFO Remoting: Remoting now listens on addresses: 
 [akka.tcp://sparkyar...@longzhou-hdp3.lz.dscc:37187]
 14/11/18 14:28:17 INFO ExecutorLauncher: ApplicationAttemptId: 
 appattempt_1415961020140_0325_01
 14/11/18 14:28:17 INFO ExecutorLauncher: Connecting to ResourceManager at 
 longzhou-hdpnn.lz.dscc/192.168.19.107:12032
 14/11/18 14:28:17 INFO ExecutorLauncher: Registering the ApplicationMaster
 14/11/18 14:28:18 INFO ExecutorLauncher: Waiting for spark driver to be 
 reachable.
 14/11/18 14:28:18 INFO ExecutorLauncher: Master now available: 
 192.168.59.90:36691
 14/11/18 14:28:18 INFO ExecutorLauncher: Listen to driver: 
 akka.tcp://spark@192.168.59.90:36691/user/CoarseGrainedScheduler
 14/11/18 
 http://spark@192.168.59.90:36691/user/CoarseGrainedScheduler14/11/18 
 14:28:18 INFO ExecutorLauncher: Allocating 1 executors.
 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor 
 containers with 1408 of memory each.
 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
 containers: 1, priority = 1 , capability : memory: 1408)
 14/11/18 14:28:18 INFO YarnAllocationHandler: Allocating 1 executor 
 containers with 1408 of memory each.
 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
 containers: 1, priority = 1 , capability : memory: 1408)
 14/11/18 14:28:18 INFO RackResolver: Resolved longzhou-hdp3.lz.dscc to /rack1
 14/11/18 14:28:18 INFO YarnAllocationHandler: launching container on 
 container_1415961020140_0325_01_02 host longzhou-hdp3.lz.dscc
 14/11/18 14:28:18 INFO ExecutorRunnable: Starting Executor Container
 14/11/18 14:28:18 INFO ExecutorRunnable: Connecting to ContainerManager at 
 longzhou-hdp3.lz.dscc:12040
 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up ContainerLaunchContext
 14/11/18 14:28:18 INFO ExecutorRunnable: Preparing Local resources
 14/11/18 14:28:18 INFO ExecutorLauncher: All executors have launched.
 14/11/18 14:28:18 INFO ExecutorLauncher: Started progress reporter thread - 
 sleep time : 5000
 14/11/18 14:28:18 INFO YarnAllocationHandler: ResourceRequest (host : *, num 
 containers: 0, priority = 1 , capability : memory: 1408)
 14/11/18 14:28:18 INFO ExecutorRunnable: Prepared Local resources 
 Map(__spark__.jar - resource {, scheme: hdfs, host: 
 longzhou-hdpnn.lz.dscc, port: 11000, file: 
 /user/linqili/.sparkStaging/application_1415961020140_0325/spark-assembly-1.0.2-hadoop2.0.0-cdh4.2.1.jar,
  }, size: 134859131, timestamp: 1416292093988, type: FILE, visibility: 
 PRIVATE, )
 14/11/18 14:28:18 INFO ExecutorRunnable: Setting up executor with commands: 
 List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', 
 -Xms1024m -Xmx1024m , 
 

Re: spark-shell giving me error of unread block data

2014-11-18 Thread Ritesh Kumar Singh
It can be a serialization issue.
This happens when there are different versions installed on the same system.
What do you mean by "the first time you installed and tested it out"?

On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham anson.abra...@gmail.com
wrote:

 I'm essentially loading a file and saving output to another location:

 val source = sc.textFile("/tmp/testfile.txt")
 source.saveAsTextFile("/tmp/testsparkoutput")

 when i do so, i'm hitting this error:
 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at
 <console>:15
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException:
 unread block data

 java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)

 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)

 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)

 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)

 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
 Driver stacktrace:
 at org.apache.spark.scheduler.DAGScheduler.org
 $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
 at scala.Option.foreach(Option.scala:236)
 at
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
 at
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
 at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


 Cant figure out what the issue is.  I'm running in CDH5.2 w/ version of
 spark being 1.1.  The file i'm loading is literally just 7 MB.  I thought
 it was jar files mismatch, but i did a compare and see they're all
 identical.  But seeing as how they were all installed through CDH parcels,
 not sure how there would be version mismatch on the nodes and master.  Oh
 yeah 1 master node w/ 2 worker nodes and running in standalone not through
 yarn.  So as a just in case, i copied the jars from the master to the 2
 worker nodes as just in case, and still same issue.
 Weird thing is, first time i installed and tested it out, it worked, but
 now it doesn't.

 Any help here would be greatly appreciated.



RandomGenerator class not found exception

2014-11-17 Thread Ritesh Kumar Singh
My sbt file for the project includes this:

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"  % "1.1.0",
  "org.apache.spark"  %% "spark-mllib" % "1.1.0",
  "org.apache.commons" % "commons-math3" % "3.3"
)
=

Still I am getting this error:

java.lang.NoClassDefFoundError:
org/apache/commons/math3/random/RandomGenerator

=

The jar at location: ~/.m2/repository/org/apache/commons/commons-math3/3.3
contains the random generator class:

 $ jar tvf commons-math3-3.3.jar | grep RandomGenerator
org/apache/commons/math3/random/RandomGenerator.class
org/apache/commons/math3/random/UniformRandomGenerator.class
org/apache/commons/math3/random/SynchronizedRandomGenerator.class
org/apache/commons/math3/random/AbstractRandomGenerator.class
org/apache/commons/math3/random/RandomGeneratorFactory$1.class
org/apache/commons/math3/random/RandomGeneratorFactory.class
org/apache/commons/math3/random/StableRandomGenerator.class
org/apache/commons/math3/random/NormalizedRandomGenerator.class
org/apache/commons/math3/random/JDKRandomGenerator.class
org/apache/commons/math3/random/GaussianRandomGenerator.class


Please help
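
If the class resolves at compile time but the NoClassDefFoundError shows up when the job runs, a likely cause is that commons-math3 is not shipped to the executors. A minimal sketch of one way to ship it explicitly (the path is hypothetical; building an assembly/fat jar is the other usual route):

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("random-generator-test")
    // hypothetical local path to the jar that should travel with the job
    .setJars(Seq("/home/user/.m2/repository/org/apache/commons/commons-math3/3.3/commons-math3-3.3.jar"))
  val sc = new SparkContext(conf)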


Re: Returning breeze.linalg.DenseMatrix from method

2014-11-17 Thread Ritesh Kumar Singh
Yeah, it works.

Although when I try to define a var of type DenseMatrix, like this:

var mat1: DenseMatrix[Double]

It gives an error saying we need to initialise the matrix mat1 at the time
of declaration.
Had to initialise it as :
var mat1: DenseMatrix[Double] = DenseMatrix.zeros[Double](1,1)

Anyways, it works now
Thanks for helping :)
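
Putting both points together, a small compilable sketch (the method body is hypothetical):

  import breeze.linalg.DenseMatrix

  def func(str: String): DenseMatrix[Double] = {
    // hypothetical body: build a matrix whose size depends on the input
    DenseMatrix.zeros[Double](str.length max 1, str.length max 1)
  }

  var mat1: DenseMatrix[Double] = DenseMatrix.zeros[Double](1, 1)
  mat1 = func("hello")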

On Mon, Nov 17, 2014 at 4:56 PM, tribhuvan...@gmail.com 
tribhuvan...@gmail.com wrote:

 This should fix it --

 def func(str: String): DenseMatrix[Double] = {
 ...
 ...
 }

 So, why is this required?
 Think of it like this -- If you hadn't explicitly mentioned Double, it
 might have been that the calling function expected a
 DenseMatrix[SomeOtherType], and performed a SomeOtherType-specific
 operation which may have not been supported by the returned
 DenseMatrix[Double]. (I'm also assuming that SomeOtherType has no subtype
 relations with Double).

 On 17 November 2014 00:14, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:

 Hi,

 I have a method that returns DenseMatrix:
 def func(str: String): DenseMatrix = {
 ...
 ...
 }

 But I keep getting this error:
 *class DenseMatrix takes type parameters*

 I tried this too:
 def func(str: String): DenseMatrix(Int, Int, Array[Double]) = {
 ...
 ...
 }
 But this gives me this error:
 *'=' expected but '(' found*

 Any possible fixes?




 --
 *Tribhuvanesh Orekondy*



Returning breeze.linalg.DenseMatrix from method

2014-11-16 Thread Ritesh Kumar Singh
Hi,

I have a method that returns DenseMatrix:
def func(str: String): DenseMatrix = {
...
...
}

But I keep getting this error:
*class DenseMatrix takes type parameters*

I tried this too:
def func(str: String): DenseMatrix(Int, Int, Array[Double]) = {
...
...
}
But this gives me this error:
*'=' expected but '(' found*

Any possible fixes?


Re: Fwd: Executor Lost Failure

2014-11-11 Thread Ritesh Kumar Singh
Yes... found the output on web UI of the slave.

Thanks :)
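
For reference, a small sketch of the driver-side alternative Ankur describes below: bring a sample of the data back to the driver before printing (the path is a placeholder):

  val lines = sc.textFile("/path/to/file")

  lines.take(10).foreach(println)   // take() returns an Array to the driver, so this prints locally
  println(lines.count())            // count() also returns its result to the driver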

On Tue, Nov 11, 2014 at 2:48 AM, Ankur Dave ankurd...@gmail.com wrote:

 At 2014-11-10 22:53:49 +0530, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:
  Tasks are now getting submitted, but many tasks don't happen.
  Like, after opening the spark-shell, I load a text file from disk and try
  printing its contents as:
 
 sc.textFile("/path/to/file").foreach(println)
 
  It does not give me any output.

 That's because foreach launches tasks on the slaves. When each task tries
 to print its lines, they go to the stdout file on the slave rather than to
 your console at the driver. You should see the file's contents in each of
 the slaves' stdout files in the web UI.

 This only happens when running on a cluster. In local mode, all the tasks
 are running locally and can output to the driver, so foreach(println) is
 more useful.

 Ankur



Re: disable log4j for spark-shell

2014-11-11 Thread Ritesh Kumar Singh
go to your spark home and then into the conf/ directory and then edit the
log4j.properties file i.e. :

gedit $SPARK_HOME/conf/log4j.properties

and set root logger to:
   log4j.rootCategory=WARN, console

You don't need to rebuild Spark for the changes to take place. Whenever you
open spark-shell, it looks into the conf directory by default and loads
all the properties. A programmatic alternative is sketched below.
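
A minimal sketch of the programmatic route, equivalent to the Java snippet quoted below:

  import org.apache.log4j.{Level, Logger}

  // silence INFO logging for the current JVM only
  Logger.getRootLogger.setLevel(Level.WARN)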

Thanks

On Tue, Nov 11, 2014 at 6:34 AM, lordjoe lordjoe2...@gmail.com wrote:

 public static void main(String[] args) throws Exception {
  System.out.println("Set Log to Warn");
 Logger rootLogger = Logger.getRootLogger();
 rootLogger.setLevel(Level.WARN);
 ...
  works for me




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/disable-log4j-for-spark-shell-tp11278p18535.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Fwd: disable log4j for spark-shell

2014-11-11 Thread Ritesh Kumar Singh
-- Forwarded message --
From: Ritesh Kumar Singh riteshoneinamill...@gmail.com
Date: Tue, Nov 11, 2014 at 2:18 PM
Subject: Re: disable log4j for spark-shell
To: lordjoe lordjoe2...@gmail.com
Cc: u...@spark.incubator.apache.org


go to your spark home and then into the conf/ directory and then edit the
log4j.properties file i.e. :

gedit $SPARK_HOME/conf/log4j.properties

and set root logger to:
   log4j.rootCategory=WARN, console

You don't need to rebuild Spark for the changes to take place. Whenever you
open spark-shell, it looks into the conf directory by default and loads
all the properties.

Thanks

On Tue, Nov 11, 2014 at 6:34 AM, lordjoe lordjoe2...@gmail.com wrote:

 public static void main(String[] args) throws Exception {
  System.out.println("Set Log to Warn");
 Logger rootLogger = Logger.getRootLogger();
 rootLogger.setLevel(Level.WARN);
 ...
  works for me




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/disable-log4j-for-spark-shell-tp11278p18535.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: save as file

2014-11-11 Thread Ritesh Kumar Singh
We have RDD.saveAsTextFile and RDD.saveAsObjectFile for saving the output
to any location specified. The parameter to provide is the path of the
storage location; the number of output part files follows the number of
partitions of the RDD.

For giving an HDFS path we use the following format:
/user/user-name/directory-to-store/
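
A small sketch of one way to get a JSON file out (no extra JSON library assumed; the record fields and HDFS path are hypothetical): map each record to a JSON string, then save the resulting RDD of strings to the HDFS path.

  val records = sc.parallelize(Seq(("a", 1), ("b", 2)))

  // one JSON object per line
  val asJson = records.map { case (name, value) =>
    s"""{"name":"$name","value":$value}"""
  }

  asJson.saveAsTextFile("hdfs://namenode:8020/user/username/json-output")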

On Tue, Nov 11, 2014 at 6:28 PM, Naveen Kumar Pokala 
npok...@spcapitaliq.com wrote:

 Hi,



 I am spark 1.1.0. I need a help regarding saving rdd in a JSON file?



 How to do that? And how to mentions hdfs path in the program.





 -Naveen







Re: How to kill a Spark job running in cluster mode ?

2014-11-11 Thread Ritesh Kumar Singh
There is a property:
   spark.ui.killEnabled
which needs to be set to true for killing applications directly from the web UI.
Check the link:
http://spark.apache.org/docs/latest/configuration.html#spark-ui

Thanks

On Tue, Nov 11, 2014 at 7:42 PM, Sonal Goyal sonalgoy...@gmail.com wrote:

 The web interface has a kill link. You can try using that.

 Best Regards,
 Sonal
 Founder, Nube Technologies http://www.nubetech.co

 http://in.linkedin.com/in/sonalgoyal



 On Tue, Nov 11, 2014 at 7:28 PM, Tao Xiao xiaotao.cs@gmail.com
 wrote:

 I'm using Spark 1.0.0 and I'd like to kill a job running in cluster mode,
 which means the driver is not running on local node.

 So how can I kill such a job? Is there a command like hadoop job -kill
 job-id which kills a running MapReduce job ?

 Thanks





Re: Spark-submit and Windows / Linux mixed network

2014-11-11 Thread Ritesh Kumar Singh
Never tried this setup, but just guessing:

What's the output when you submit this jar: \\shares\publish\Spark\app1\
someJar.jar
using spark-submit.cmd?


Removing INFO logs

2014-11-10 Thread Ritesh Kumar Singh
How can I remove all the INFO logs that appear on the console when I submit
an application using spark-submit?


Re: Removing INFO logs

2014-11-10 Thread Ritesh Kumar Singh
It works.

Thanks

On Mon, Nov 10, 2014 at 6:32 PM, YANG Fan idd...@gmail.com wrote:

 Hi,

 In conf/log4j.properties, change the following

 log4j.rootCategory=INFO, console

 to
  log4j.rootCategory=WARN, console

 This works for me.

 Best,
 Fan

 On Mon, Nov 10, 2014 at 8:21 PM, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:

 How can I remove all the INFO logs that appear on the console when I
 submit an application using spark-submit?





Re: Executor Lost Failure

2014-11-10 Thread Ritesh Kumar Singh
On Mon, Nov 10, 2014 at 10:52 PM, Ritesh Kumar Singh 
riteshoneinamill...@gmail.com wrote:

 Tasks are now getting submitted, but some operations don't seem to do anything.
 Like, after opening the spark-shell, I load a text file from disk and try
 printing its contents as:

 sc.textFile("/path/to/file").foreach(println)

 It does not give me any output. While running this:

 sc.textFile("/path/to/file").count

 gives me the right number of lines in the text file.
 Not sure what the error is. But here is the output on the console for the
 print case:

 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with
 curMem=709528, maxMem=463837593
 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in
 memory (estimated size 210.2 KB, free 441.5 MB)
 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with
 curMem=924758, maxMem=463837593
 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as
 bytes in memory (estimated size 16.8 KB, free 441.5 MB)
 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in
 memory on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
 broadcast_6_piece0
 14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
 14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at console:13
 14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at console:13)
 with 2 output partitions (allowLocal=false)
 14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at
 console:13)
 14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
 14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
 14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt
 MappedRDD[7] at textFile at console:13), which has no missing parents
 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with
 curMem=941997, maxMem=463837593
 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in
 memory (estimated size 2.4 KB, free 441.4 MB)
 14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with
 curMem=944501, maxMem=463837593
 14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as
 bytes in memory (estimated size 1602.0 B, free 441.4 MB)
 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in
 memory on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
 14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
 broadcast_7_piece0
 14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage
 3 (Desktop/mnd.txt MappedRDD[7] at textFile at console:13)
 14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
 14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
 6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
 14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
 7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in
 memory on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
 14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in
 memory on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
 14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
 6) in 308 ms on gonephishing.local (1/2)
 14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at console:13)
 finished in 0.321 s
 14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
 7) in 315 ms on gonephishing.local (2/2)
 14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at
 console:13, took 0.376602079 s
 14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
 have all completed, from pool

 ===



 On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 ​Try adding the following configurations also, might work.

  spark.rdd.compress true

   spark.storage.memoryFraction 1
   spark.core.connection.ack.wait.timeout 600
   spark.akka.frameSize 50

 Thanks
 Best Regards

 On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:

 Hi,

 I am trying to submit my application using spark-submit, using following
 spark-default.conf params:

 spark.master spark://master-ip:7077
 spark.eventLog.enabled   true
 spark.serializer
 org.apache.spark.serializer.KryoSerializer
 spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
 -Dnumbers="one two three"

 ===
 But every time I am getting this error:

 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
 remote Akka client disassociated
 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID
 1, aa.local): ExecutorLostFailure (executor lost)
 14/11/10 18:39:17

Fwd: Executor Lost Failure

2014-11-10 Thread Ritesh Kumar Singh
-- Forwarded message --
From: Ritesh Kumar Singh riteshoneinamill...@gmail.com
Date: Mon, Nov 10, 2014 at 10:52 PM
Subject: Re: Executor Lost Failure
To: Akhil Das ak...@sigmoidanalytics.com


Tasks are now getting submitted, but some operations don't seem to do anything.
Like, after opening the spark-shell, I load a text file from disk and try
printing its contents as:

sc.textFile("/path/to/file").foreach(println)

It does not give me any output. While running this:

sc.textFile("/path/to/file").count

gives me the right number of lines in the text file.
Not sure what the error is. But here is the output on the console for the
print case:

14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(215230) called with
curMem=709528, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 210.2 KB, free 441.5 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(17239) called with
curMem=924758, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_6_piece0 stored as
bytes in memory (estimated size 16.8 KB, free 441.5 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on gonephishing.local:42648 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
broadcast_6_piece0
14/11/10 22:48:02 INFO FileInputFormat: Total input paths to process : 1
14/11/10 22:48:02 INFO SparkContext: Starting job: foreach at console:13
14/11/10 22:48:02 INFO DAGScheduler: Got job 3 (foreach at console:13)
with 2 output partitions (allowLocal=false)
14/11/10 22:48:02 INFO DAGScheduler: Final stage: Stage 3(foreach at
console:13)
14/11/10 22:48:02 INFO DAGScheduler: Parents of final stage: List()
14/11/10 22:48:02 INFO DAGScheduler: Missing parents: List()
14/11/10 22:48:02 INFO DAGScheduler: Submitting Stage 3 (Desktop/mnd.txt
MappedRDD[7] at textFile at console:13), which has no missing parents
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(2504) called with
curMem=941997, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 2.4 KB, free 441.4 MB)
14/11/10 22:48:02 INFO MemoryStore: ensureFreeSpace(1602) called with
curMem=944501, maxMem=463837593
14/11/10 22:48:02 INFO MemoryStore: Block broadcast_7_piece0 stored as
bytes in memory (estimated size 1602.0 B, free 441.4 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on gonephishing.local:42648 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerMaster: Updated info of block
broadcast_7_piece0
14/11/10 22:48:02 INFO DAGScheduler: Submitting 2 missing tasks from Stage
3 (Desktop/mnd.txt MappedRDD[7] at textFile at console:13)
14/11/10 22:48:02 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/11/10 22:48:02 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
6, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
7, gonephishing.local, PROCESS_LOCAL, 1216 bytes)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on gonephishing.local:48857 (size: 1602.0 B, free: 442.3 MB)
14/11/10 22:48:02 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on gonephishing.local:48857 (size: 16.8 KB, free: 442.3 MB)
14/11/10 22:48:02 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
6) in 308 ms on gonephishing.local (1/2)
14/11/10 22:48:02 INFO DAGScheduler: Stage 3 (foreach at console:13)
finished in 0.321 s
14/11/10 22:48:02 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
7) in 315 ms on gonephishing.local (2/2)
14/11/10 22:48:02 INFO SparkContext: Job finished: foreach at console:13,
took 0.376602079 s
14/11/10 22:48:02 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool

===



On Mon, Nov 10, 2014 at 8:01 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 ​Try adding the following configurations also, might work.

  spark.rdd.compress true

   spark.storage.memoryFraction 1
   spark.core.connection.ack.wait.timeout 600
   spark.akka.frameSize 50

 Thanks
 Best Regards

 On Mon, Nov 10, 2014 at 6:51 PM, Ritesh Kumar Singh 
 riteshoneinamill...@gmail.com wrote:

 Hi,

 I am trying to submit my application using spark-submit, using following
 spark-default.conf params:

 spark.master spark://master-ip:7077
 spark.eventLog.enabled   true
 spark.serializer
 org.apache.spark.serializer.KryoSerializer
 spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
 -Dnumbers="one two three"

 ===
 But every time I am getting this error:

 14/11/10 18:39:17 ERROR TaskSchedulerImpl: Lost executor 1 on aa.local:
 remote Akka client disassociated
 14/11/10 18:39:17 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1,
 aa.local