Fwd: Container preempted by scheduler - Spark job error

2016-06-02 Thread Prabeesh K.
Hi Ted

We use Hadoop 2.6 and Spark 1.3.1. I have also attached the error file to this
mail; please have a look at it.

Thanks

On Thu, Jun 2, 2016 at 11:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you show the error in a bit more detail?
>
> Which release of Hadoop / Spark are you using?
>
> Is the CapacityScheduler being used?
>
> Thanks
>
> On Thu, Jun 2, 2016 at 1:32 AM, Prabeesh K. <prabsma...@gmail.com> wrote:
>
>> Hi, I am using the command below to run a Spark job and I get an error
>> like "Container preempted by scheduler".
>>
>> I am not sure whether it is related to incorrect memory settings:
>>
>> nohup ~/spark1.3/bin/spark-submit \
>>   --num-executors 50 \
>>   --master yarn \
>>   --deploy-mode cluster \
>>   --queue adhoc \
>>   --driver-memory 18G \
>>   --executor-memory 12G \
>>   --class main.ru..bigdata.externalchurn.Main \
>>   --conf "spark.task.maxFailures=100" \
>>   --conf "spark.yarn.max.executor.failures=1" \
>>   --conf "spark.executor.cores=1" \
>>   --conf "spark.akka.frameSize=50" \
>>   --conf "spark.storage.memoryFraction=0.5" \
>>   --conf "spark.driver.maxResultSize=10G" \
>>   ~/external-flow/externalChurn-1.0-SNAPSHOT-shaded.jar \
>>   prepareTraining=true \
>>   prepareTrainingMNP=true \
>>   prepareMap=false \
>>   bouldozerMode=true \
>>   &> ~/external-flow/run.log &
>> echo "STARTED"
>> tail -f ~/external-flow/run.log
>>
>> Thanks,
>>
>>
>>
>>
>>
>


spark-error
Description: Binary data


Container preempted by scheduler - Spark job error

2016-06-02 Thread Prabeesh K.
Hi, I am using the command below to run a Spark job and I get an error like
"Container preempted by scheduler".

I am not sure whether it is related to incorrect memory settings:

nohup ~/spark1.3/bin/spark-submit \
  --num-executors 50 \
  --master yarn \
  --deploy-mode cluster \
  --queue adhoc \
  --driver-memory 18G \
  --executor-memory 12G \
  --class main.ru..bigdata.externalchurn.Main \
  --conf "spark.task.maxFailures=100" \
  --conf "spark.yarn.max.executor.failures=1" \
  --conf "spark.executor.cores=1" \
  --conf "spark.akka.frameSize=50" \
  --conf "spark.storage.memoryFraction=0.5" \
  --conf "spark.driver.maxResultSize=10G" \
  ~/external-flow/externalChurn-1.0-SNAPSHOT-shaded.jar \
  prepareTraining=true \
  prepareTrainingMNP=true \
  prepareMap=false \
  bouldozerMode=true \
  &> ~/external-flow/run.log &
echo "STARTED"
tail -f ~/external-flow/run.log
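
For reference, a minimal sketch (not taken from the thread) showing the same memory
and retry settings applied programmatically via SparkConf, which can help rule out a
typo in the shell command; the app name here is a guess based on the jar name. Note
that "Container preempted by scheduler" is usually a YARN queue/preemption decision
rather than a Spark memory problem:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("externalChurn")
  .set("spark.executor.memory", "12g")
  .set("spark.driver.memory", "18g")           // no effect here in cluster mode; keep it on the spark-submit line
  .set("spark.executor.cores", "1")
  .set("spark.task.maxFailures", "100")
  .set("spark.storage.memoryFraction", "0.5")
  .set("spark.driver.maxResultSize", "10g")

val sc = new SparkContext(conf)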

Thanks,


Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread Prabeesh K.
Refer to this post:
http://blog.prabeeshk.com/blog/2015/06/19/pyspark-notebook-with-docker/

Spark + Jupyter + Docker

On 18 August 2015 at 21:29, Jerry Lam chiling...@gmail.com wrote:

 Hi Guru,

 Thanks! Great to hear that someone tried it in production. How do you like
 it so far?

 Best Regards,

 Jerry


 On Tue, Aug 18, 2015 at 11:38 AM, Guru Medasani gdm...@gmail.com wrote:

 Hi Jerry,

 Yes. I’ve seen customers using this in production for data science work.
 I’m currently using this for one of my projects on a cluster as well.

 Also, here is a blog that describes how to configure this.


 http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/


 Guru Medasani
 gdm...@gmail.com



 On Aug 18, 2015, at 8:35 AM, Jerry Lam chiling...@gmail.com wrote:

 Hi spark users and developers,

 Did anyone have IPython Notebook (Jupyter) deployed in production that
 uses Spark as the computational engine?

 I know Databricks Cloud provides similar features with deeper integration
 with Spark. However, Databricks Cloud has to be hosted by Databricks, so that
 is not an option for us.

 Other solutions (e.g. Zeppelin) seem to reinvent the wheel that IPython
 already offered years ago. It would be great if someone could explain the
 reasoning behind this to me.

 Best Regards,

 Jerry






Re: Packaging Java + Python library

2015-04-13 Thread prabeesh k
Refer to this post:
http://blog.prabeeshk.com/blog/2015/04/07/self-contained-pyspark-application/

On 13 April 2015 at 17:41, Punya Biswal pbis...@palantir.com wrote:

 Dear Spark users,

 My team is working on a small library that builds on PySpark and is
 organized like PySpark as well -- it has a JVM component (that runs in the
 Spark driver and executor) and a Python component (that runs in the PySpark
 driver and executor processes). What's a good approach for packaging such a
 library?

 Some ideas we've considered:

- Package up the JVM component as a Jar and the Python component as a
binary egg. This is reasonable but it means that there are two separate
artifacts that people have to manage and keep in sync.
- Include Python files in the Jar and add it to the PYTHONPATH. This
follows the example of the Spark assembly jar, but deviates from the Python
community's standards. (A minimal sbt sketch of this option follows below.)
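
 A minimal sbt sketch of that second option (assuming sbt 0.13 syntax and that the
 Python sources live under a python/ directory in the project root; both are
 assumptions, not details from the original message):

 // build.sbt: bundle the Python files into the jar as plain resources
 unmanagedResourceDirectories in Compile += baseDirectory.value / "python"

 At runtime the jar then has to be added to PYTHONPATH, which mirrors how the Spark
 assembly jar exposes pyspark.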

 We'd really appreciate hearing experiences from other people who have
 built libraries on top of PySpark.

 Punya



Re: How to learn Spark ?

2015-04-02 Thread prabeesh k
You can also refer to this blog: http://blog.prabeeshk.com/blog/archives/

On 2 April 2015 at 12:19, Star Guo st...@ceph.me wrote:

 Hi, all



 I am new here. Could you give me some suggestions on how to learn Spark?
 Thanks.



 Best Regards,

 Star Guo



Re: Beginner in Spark

2015-02-10 Thread prabeesh k
Refer to this blog
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
for a step-by-step installation of Spark on Ubuntu.
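
Once installed, a quick sanity check in spark-shell (a standard word-count sketch,
not taken from the blog post; the input path is a placeholder):

val lines  = sc.textFile("/path/to/some/textfile.txt")
val counts = lines.flatMap(_.split("\\s+")).map(word => (word, 1)).reduceByKey(_ + _)
counts.take(10).foreach(println)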

On 7 February 2015 at 03:12, Matei Zaharia matei.zaha...@gmail.com wrote:

 You don't need HDFS or virtual machines to run Spark. You can just
 download it, unzip it and run it on your laptop. See
 http://spark.apache.org/docs/latest/index.html.

 Matei


 On Feb 6, 2015, at 2:58 PM, David Fallside falls...@us.ibm.com wrote:

 King, consider trying the Spark Kernel
 (https://github.com/ibm-et/spark-kernel), which will install Spark etc. and
 provide you with a Spark/Scala notebook in which you can develop your
 algorithm. The Vagrant installation described in
 https://github.com/ibm-et/spark-kernel/wiki/Vagrant-Development-Environment
 will have you quickly up and running on a single machine without having to
 manage the details of the system installations. There is a Docker version,
 https://github.com/ibm-et/spark-kernel/wiki/Using-the-Docker-Container-for-the-Spark-Kernel,
 if you prefer Docker.
 Regards,
 David


 King sami kgsam...@gmail.com wrote on 02/06/2015 08:09:39 AM:

  From: King sami kgsam...@gmail.com
  To: user@spark.apache.org
  Date: 02/06/2015 08:11 AM
  Subject: Beginner in Spark
 
  Hi,
 
   I'm new to Spark and I'd like to install Spark with Scala. The aim is
   to build a data processing system for door events.
  
   The first step is to install Spark, Scala, HDFS and the other required tools.
   The second is to build the algorithm program in Scala which can process
   a file of my data logs (events).
  
   Could you please help me to install the required tools (Spark,
   Scala, HDFS) and tell me how I can execute my program on the input
   file.
 
  Best regards,





Re: Kestrel and Spark Stream

2014-11-18 Thread prabeesh k
You can refer to the following link:
https://github.com/prabeesh/Spark-Kestrel

On Tue, Nov 18, 2014 at 3:51 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You can implement a custom receiver
 (http://spark.apache.org/docs/latest/streaming-custom-receivers.html) to
 connect to Kestrel and use it. I think someone has already tried it, though I am
 not sure whether it is working. Here's the link:
 https://github.com/prabeesh/Spark-Kestrel/blob/master/streaming/src/main/scala/spark/streaming/dstream/KestrelInputDStream.scala
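
 For orientation, a skeleton of what such a custom receiver looks like with the
 Receiver API from the linked guide (a sketch only; the Kestrel client call and the
 host/port/queue values are hypothetical placeholders, and the linked
 KestrelInputDStream may predate this API):

 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.receiver.Receiver

 class KestrelReceiver(host: String, port: Int, queue: String)
   extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

   def onStart(): Unit = {
     // Poll the queue on a background thread and push items to Spark.
     new Thread("Kestrel Receiver") {
       override def run(): Unit = receive()
     }.start()
   }

   def onStop(): Unit = {
     // Close the Kestrel connection here.
   }

   private def receive(): Unit = {
     while (!isStopped) {
       fetchFromKestrel().foreach(item => store(item))  // store() hands each item to Spark Streaming
     }
   }

   // Stub standing in for a real Kestrel client call.
   private def fetchFromKestrel(): Option[String] = None
 }

 // Usage: ssc.receiverStream(new KestrelReceiver("localhost", 22133, "events"))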

 Thanks
 Best Regards

 On Tue, Nov 18, 2014 at 4:23 PM, Eduardo Alfaia e.costaalf...@unibs.it
 wrote:

 Hi guys,
 Has anyone already tried doing this work?

 Thanks

 Privacy notice: http://www.unibs.it/node/8155





Re: Unable to run a Standalone job

2014-06-05 Thread prabeesh k
Try the sbt clean command before building the app,

or delete the .ivy2 and .sbt folders (not a good method). Then try to rebuild
the project.


On Thu, Jun 5, 2014 at 11:45 AM, Sean Owen so...@cloudera.com wrote:

 I think this is SPARK-1949 again: https://github.com/apache/spark/pull/906
 I think this change fixed this issue for a few people using the SBT
 build, worth committing?

 On Thu, Jun 5, 2014 at 6:40 AM, Shrikar archak shrika...@gmail.com
 wrote:
  Hi All,
  Now that Spark version 1.0.0 is released, there should not be any problem
  with the local jars.
  Shrikars-MacBook-Pro:SimpleJob shrikar$ cat simple.sbt
  name := "Simple Project"
 
  version := "1.0"
 
  scalaVersion := "2.10.4"
 
  libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0",
                              "org.apache.spark" %% "spark-streaming" % "1.0.0")
 
  resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
 
  I am still having this issue
  [error] (run-main) java.lang.NoClassDefFoundError:
  javax/servlet/http/HttpServletResponse
  java.lang.NoClassDefFoundError: javax/servlet/http/HttpServletResponse
  at org.apache.spark.HttpServer.start(HttpServer.scala:54)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.createServer(HttpBroadcast.scala:156)
  at
 
 org.apache.spark.broadcast.HttpBroadcast$.initialize(HttpBroadcast.scala:127)
  at
 
 org.apache.spark.broadcast.HttpBroadcastFactory.initialize(HttpBroadcastFactory.scala:31)
  at
 
 org.apache.spark.broadcast.BroadcastManager.initialize(BroadcastManager.scala:48)
  at
 
 org.apache.spark.broadcast.BroadcastManager.init(BroadcastManager.scala:35)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:218)
  at org.apache.spark.SparkContext.init(SparkContext.scala:202)
 
  Any help would be greatly appreciated.
 
  Thanks,
  Shrikar
 
 
  On Fri, May 23, 2014 at 3:58 PM, Shrikar archak shrika...@gmail.com
 wrote:
 
  Still the same error no change
 
  Thanks,
  Shrikar
 
 
  On Fri, May 23, 2014 at 2:38 PM, Jacek Laskowski ja...@japila.pl
 wrote:
 
  Hi Shrikar,
 
  How did you build Spark 1.0.0-SNAPSHOT on your machine? My
  understanding is that `sbt publishLocal` is not enough and you really
  need `sbt assembly` instead. Give it a try and report back.
 
   As to your build.sbt, upgrade Scala to 2.10.4 and use only
   "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"; that will pull
   down spark-core as a transitive dep. The resolver for the Akka Repository is
   not needed. Your build.sbt should really look as follows:
 
   name := "Simple Project"
  
   version := "1.0"
  
   scalaVersion := "2.10.4"
  
   libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT"
 
  Jacek
 
  On Thu, May 22, 2014 at 11:27 PM, Shrikar archak shrika...@gmail.com
  wrote:
   Hi All,
  
    I am trying to run the network word count example as a separate standalone
    job and am running into some issues.
  
   Environment:
   1) Mac Mavericks
   2) Latest spark repo from Github.
  
  
   I have a structure like this
  
   Shrikars-MacBook-Pro:SimpleJob shrikar$ find .
   .
   ./simple.sbt
   ./src
   ./src/main
   ./src/main/scala
   ./src/main/scala/NetworkWordCount.scala
   ./src/main/scala/SimpleApp.scala.bk
  
  
    simple.sbt
    name := "Simple Project"
   
    version := "1.0"
   
    scalaVersion := "2.10.3"
   
    libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "1.0.0-SNAPSHOT",
                                "org.apache.spark" %% "spark-streaming" % "1.0.0-SNAPSHOT")
   
    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
  
  
    I am able to run the SimpleApp which is mentioned in the docs, but when I
    try to run the NetworkWordCount app I get an error like this. Am I missing
    something?
  
   [info] Running com.shrikar.sparkapps.NetworkWordCount
   14/05/22 14:26:47 INFO spark.SecurityManager: Changing view acls to:
   shrikar
   14/05/22 14:26:47 INFO spark.SecurityManager: SecurityManager:
   authentication disabled; ui acls disabled; users with view
 permissions:
   Set(shrikar)
   14/05/22 14:26:48 INFO slf4j.Slf4jLogger: Slf4jLogger started
   14/05/22 14:26:48 INFO Remoting: Starting remoting
   14/05/22 14:26:48 INFO Remoting: Remoting started; listening on
   addresses
   :[akka.tcp://spark@192.168.10.88:49963]
   14/05/22 14:26:48 INFO Remoting: Remoting now listens on addresses:
   [akka.tcp://spark@192.168.10.88:49963]
   14/05/22 14:26:48 INFO spark.SparkEnv: Registering MapOutputTracker
   14/05/22 14:26:48 INFO spark.SparkEnv: Registering BlockManagerMaster
   14/05/22 14:26:48 INFO storage.DiskBlockManager: Created local
   directory at
  
  
 /var/folders/r2/mbj08pb55n5d_9p8588xk5b0gn/T/spark-local-20140522142648-0a14
   14/05/22 14:26:48 INFO storage.MemoryStore: MemoryStore started with
   capacity 911.6 MB.
   14/05/22 14:26:48 INFO network.ConnectionManager: Bound socket to
 port
   49964
   with id = ConnectionManagerId(192.168.10.88,49964)
   14/05/22 14:26:48 INFO storage.BlockManagerMaster: Trying to register
   BlockManager
   

Re: mismatched hdfs protocol

2014-06-04 Thread prabeesh k
For building Spark against a particular version of Hadoop, refer to
http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
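
A quick way to confirm the mismatch from spark-shell (a diagnostic sketch, not from
the thread): print the Hadoop client version bundled with your Spark build. "Server
IPC version 9" corresponds to a Hadoop 2.x server, while "client version 4" points
to a Hadoop 1.x client, so a 1.x value here confirms the problem.

println(org.apache.hadoop.util.VersionInfo.getVersion)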


On Thu, Jun 5, 2014 at 8:14 AM, Koert Kuipers ko...@tresata.com wrote:

 you have to build Spark against the version of Hadoop you are using


 On Wed, Jun 4, 2014 at 10:25 PM, bluejoe2008 bluejoe2...@gmail.com
 wrote:

  hi, all
 when my spark program accessed hdfs files
 an error happened:

 Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC 
 version 9 cannot communicate with client version 4


 it seems the client was trying to connect to Hadoop 2 via an old Hadoop
 protocol

 so my question is:
 how do I specify the version of Hadoop for the connection?

 thank you!

  bluejoe

 2014-06-05
 --






Unable to execute saveAsTextFile on multi node mesos

2014-05-31 Thread prabeesh k
Hi,

Scenario: read data from HDFS, apply a Hive query to it, and write the result
back to HDFS.
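
In outline the flow looks like this (a minimal sketch; the paths are placeholders
and a simple transformation stands in for the actual Hive query, which is not shown
here):

val input  = sc.textFile("hdfs:///path/to/input")   // read from HDFS
val result = input.filter(_.nonEmpty)               // stand-in for the Hive query step
result.saveAsTextFile("hdfs:///path/to/output")     // write back to HDFS -- the failing step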

 Schema creation, querying and saveAsTextFile are working fine in the following
modes:

   - local mode
   - mesos cluster with single node
   - spark cluster with multi node

Schema creation and querying are working fine on the multi-node Mesos cluster,
but while trying to write back to HDFS using saveAsTextFile, the following
error occurs:

14/05/30 10:16:35 INFO DAGScheduler: The failed fetch was from Stage 4
(mapPartitionsWithIndex at Operator.scala:333); marking it for resubmission
14/05/30 10:16:35 INFO DAGScheduler: Executor lost:
201405291518-3644595722-5050-17933-1 (epoch 148)

Let me know your thoughts regarding this.

Regards,
prabeesh


Re: Announcing Spark 1.0.0

2014-05-30 Thread prabeesh k
Please update the http://spark.apache.org/docs/latest/  link


On Fri, May 30, 2014 at 4:03 PM, Margusja mar...@roo.ee wrote:

 Is it possible to download a pre-built package?
 http://mirror.symnds.com/software/Apache/incubator/
 spark/spark-1.0.0/spark-1.0.0-bin-hadoop2.tgz - gives me 404

 Best regards, Margus (Margusja) Roo
 +372 51 48 780
 http://margus.roo.ee
 http://ee.linkedin.com/in/margusroo
 skype: margusja
 ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)


 On 30/05/14 13:18, Christopher Nguyen wrote:

 Awesome work, Pat et al.!

 --
 Christopher T. Nguyen
 Co-founder & CEO, Adatao (http://adatao.com)
 linkedin.com/in/ctnguyen




 On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
 is a milestone release as the first in the 1.0 line of releases,
 providing API stability for Spark's core interfaces.

 Spark 1.0.0 is Spark's largest release ever, with contributions from
 117 developers. I'd like to thank everyone involved in this release -
 it was truly a community effort with fixes, features, and
 optimizations contributed from dozens of organizations.

 This release expands Spark's standard libraries, introducing a new SQL
 package (SparkSQL) which lets users integrate SQL queries into
 existing Spark workflows. MLlib, Spark's machine learning library, is
 expanded with sparse vector support and several new algorithms. The
 GraphX and Streaming libraries also introduce new features and
 optimizations. Spark's core engine adds support for secured YARN
 clusters, a unified tool for submitting Spark applications, and
 several performance and stability improvements. Finally, Spark adds
 support for Java 8 lambda syntax and improves coverage of the Java and
 Python API's.

 Those features only scratch the surface - check out the release
 notes here:
 http://spark.apache.org/releases/spark-release-1-0-0.html

 Note that since release artifacts were posted recently, certain
 mirrors may not have working downloads for a few hours.

 - Patrick






java.lang.OutOfMemoryError while running Shark on Mesos

2014-05-22 Thread prabeesh k
Hi,

I am trying to apply an inner join in Shark using 64MB and 27MB files. I am
able to run the following queries on Mesos:


   - SELECT * FROM geoLocation1

   - SELECT * FROM geoLocation1 WHERE country = 'US'


But while trying an inner join, as in

 SELECT * FROM geoLocation1 g1 INNER JOIN geoBlocks1 g2 ON (g1.locId =
g2.locId)

I get the following error:


Exception in thread main org.apache.spark.SparkException: Job aborted:
Task 1.0:7 failed 4 times (most recent failure: Exception failure:
java.lang.OutOfMemoryError: Java heap space)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
 at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
 at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
 at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Please help me to resolve this.

Thanks in advance

regards,
prabeesh


Re: Better option to use Querying in Spark

2014-05-06 Thread prabeesh k
Thank you for your prompt reply.

Regards,
prabeesh


On Tue, May 6, 2014 at 11:44 AM, Mayur Rustagi mayur.rust...@gmail.comwrote:

 All three have different use cases. If you are looking for more of a
 warehouse, you are better off with Shark.
 Spark SQL is a way to query regular data with SQL-like syntax, leveraging a
 columnar store.

 BlinkDB is an experiment, meant to integrate with Shark in the long term; it
 is not meant for production use directly.
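
 To make the Spark SQL option concrete, a rough sketch against the Spark 1.0-era
 SQLContext API (the Person case class and people.txt file are made-up examples):

 import org.apache.spark.sql.SQLContext

 case class Person(name: String, age: Int)

 val sqlContext = new SQLContext(sc)
 import sqlContext.createSchemaRDD        // implicit RDD -> SchemaRDD conversion

 val people = sc.textFile("people.txt")
   .map(_.split(","))
   .map(p => Person(p(0), p(1).trim.toInt))

 people.registerAsTable("people")         // 1.0 API; later renamed registerTempTable
 val teens = sqlContext.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
 teens.collect().foreach(println)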


 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi https://twitter.com/mayur_rustagi



 On Tue, May 6, 2014 at 11:22 AM, prabeesh k prabsma...@gmail.com wrote:

  Hi,

 I have seen three different ways to query data from Spark

1. Default SQL support(

 https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/sql/examples/HiveFromSpark.scala
)
2. Shark
3. Blink DB


 I would like to know which one is more efficient.

 Regards.
 prabeesh





Re: Compile SimpleApp.scala encountered error, please can any one help?

2014-04-12 Thread prabeesh k
Ensure there is only one SimpleApp object in your project, and also check whether
there is any copy of SimpleApp.scala.

Normally the file SimpleApp.scala is in src/main/scala or in the project root
folder.


On Sat, Apr 12, 2014 at 11:07 AM, jni2000 james...@federatedwireless.comwrote:

 Hi

  I am a new Spark user and am trying to test run it from scratch. I followed the
  documentation and was able to build the Spark package and run the spark
  shell. However, when I move on to building the standalone sample
  SimpleApp.scala, I see the following errors:

 Loading /usr/share/sbt/bin/sbt-launch-lib.bash
 [info] Set current project to Simple Project (in build
 file:/home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/)
 [info] Compiling 1 Scala source and 1 Java source to

 /home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/target/scala-2.10/classes...
 [error]

 /home/james/workplace/framework/3rd-party/spark-0.9.0/test-project/src/main/scala/SimpleApp.scala:5:
 SimpleApp is already defined as object SimpleApp
 [error] object SimpleApp {
 [error]^
 [error] one error found
 [error] (compile:compile) Compilation failed
 [error] Total time: 2 s, completed Apr 12, 2014 1:12:43 AM

 Can someone help me understand what could be wrong?

 Thanks a lot.

 James






[BLOG] For Beginners

2014-04-07 Thread prabeesh k
Hi all,

Here I am sharing a couple of blog posts for beginners about creating a standalone
Spark Streaming application and bundling the app as a single runnable jar. Take a
look and drop your comments on the blog page.

http://prabstechblog.blogspot.in/2014/04/a-standalone-spark-application-in-scala.html

http://prabstechblog.blogspot.in/2014/04/creating-single-jar-for-spark-project.html
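
For context, the usual way to get such a single runnable jar is the sbt-assembly
plugin; a minimal sketch (the plugin version is an assumption, and the blog posts
above may use a different setup):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")   // version is an assumption

// Running `sbt assembly` then produces one fat jar under target/, with Spark itself
// usually marked "provided" in build.sbt so it is not bundled into the jar.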

prabeesh