Standalone Spark, how to find a driver's final status for an application

2019-09-25 Thread Nilkanth Patel
I am setting up *Spark 2.2.0 in standalone mode* ( https://spark.apache.org/docs/latest/spark-standalone.html) and submitting spark jobs programmatically using SparkLauncher: SparkLauncher sparkAppLauncher = new SparkLauncher(userNameMap).setMaster(sparkMaster).setAppName(appName); SparkAppHandle
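The snippet above is cut off before it reaches the `SparkAppHandle`. A minimal sketch of how the handle can be polled for a terminal state is below; the master URL, paths, and class names are placeholders, not values from the original message.

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherExample {
    public static void main(String[] args) throws Exception {
        // Launch the application in-process and get a handle to it.
        SparkAppHandle handle = new SparkLauncher()
                .setMaster("spark://master-host:7077")   // placeholder
                .setAppName("my-app")                    // placeholder
                .setAppResource("/path/to/app.jar")      // placeholder
                .setMainClass("com.example.Main")        // placeholder
                .startApplication();

        // Block until the handle reaches a terminal state, then read it.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
        // Terminal states include FINISHED, FAILED, and KILLED.
        System.out.println("Final state: " + handle.getState());
    }
}
```

A `SparkAppHandle.Listener` registered via `startApplication(listener)` is the event-driven alternative to polling.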

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Mu Kong
…what YARN does when you give it a keytab. > See also: https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details > On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong <kong.mu@gmail.com> wrote: >> Hi, all! I was trying to read from a Kerberosed

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Steve Loughran
> Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client

Re: Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Saisai Shao
On Fri, Jun 23, 2017 at 5:10 PM, Mu Kong <kong.mu@gmail.com> wrote: > Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: > java.io.IOException: …

Question about standalone Spark cluster reading from Kerberosed hadoop

2017-06-23 Thread Mu Kong
Hi, all! I was trying to read from a Kerberosed hadoop cluster from a standalone spark cluster. Right now, I encountered some authentication issues with Kerberos: java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client
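As the replies in this thread note, standalone mode has no keytab distribution mechanism like YARN's. One workaround sometimes used (a sketch, not something confirmed in this thread) is to log in explicitly through the Hadoop UGI API before touching HDFS, assuming the keytab is present at the same path on every node. The principal and path below are placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Sketch only: assumes the keytab has been copied to every node at this path.
val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)
UserGroupInformation.loginUserFromKeytab(
  "user@EXAMPLE.COM",                       // placeholder principal
  "/etc/security/keytabs/user.keytab")      // placeholder keytab path
```

Ticket renewal is still the application's problem in this setup, which is one reason the thread points toward YARN for Kerberized clusters.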

Native libraries using only one core in standalone spark cluster

2016-09-26 Thread guangweiyu
Hi, I'm trying to run a spark job that uses multiple cpu cores per executor. Specifically, it runs the gemm matrix multiply routine from each partition on a large matrix that cannot be distributed. For test purposes, I have a machine with 8 cores running standalone spark. I
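A frequent cause of this symptom (not confirmed in the thread) is that the native BLAS library is built or configured for a single thread. One hedged sketch is to raise the BLAS thread count through executor environment variables; the variable names depend on which BLAS is linked (OpenBLAS shown), and the class and jar names are placeholders.

```shell
# Sketch only: variable names depend on your BLAS build (OpenBLAS shown).
spark-submit \
  --conf spark.executorEnv.OPENBLAS_NUM_THREADS=8 \
  --conf spark.executorEnv.OMP_NUM_THREADS=8 \
  --class com.example.GemmJob \
  myapp.jar
```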

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
(this is a case that happens often) > Btw did you get the NoClassDefFoundException at compile time or run time? If at run time, what is your Spark version and what are the spark library versions you used in your sbt? Are you using a Spark version pre 1.4? > kr marco

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
…are you using a Spark version pre 1.4? kr marco. On Wed, Jul 20, 2016 at 6:13 PM, Sachin Mittal <sjmit...@gmail.com> wrote: > NoClassDefFound error was for spark classes like say SparkContext. When running a standalone spark application I was not passing external jars using the --jars option.

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
NoClassDefFound error was for spark classes like say SparkContext. When running a standalone spark application I was not passing external jars using the --jars option. However I have fixed this by making a fat jar using the sbt assembly plugin. Now all the dependencies are included in that jar and I use
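The fat-jar fix described above can be sketched as two small build files. Plugin and library versions here are illustrative, not taken from the thread; pick ones matching your Spark and Scala versions.

```scala
// project/plugins.sbt — the sbt-assembly plugin mentioned above (version illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt — mark Spark itself "provided" so the fat jar bundles only your
// application's own dependencies, not Spark (the cluster already has it)
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
```

Running `sbt assembly` then produces a single `target/scala-2.10/<name>-assembly-<version>.jar` that can be passed to spark-submit without `--jars`.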

Re: Building standalone spark application via sbt

2016-07-20 Thread Marco Mistroni
Hello Sachin, pls paste the NoClassDefFound exception so we can see what's failing; also please advise how you are running your Spark app. For an extremely simple case, let's assume you have your MyFirstSparkApp packaged in your myFirstSparkApp.jar. Then all you need to do would be to kick off

Re: Building standalone spark application via sbt

2016-07-20 Thread Mich Talebzadeh
you need an uber jar file. Have you actually followed the dependencies and project sub-directory build? Check this: http://stackoverflow.com/questions/28459333/how-to-build-an-uber-jar-fat-jar-using-sbt-within-intellij-idea (of the three answers there, see the top one). I started reading the official SBT

Re: Building standalone spark application via sbt

2016-07-20 Thread Sachin Mittal
Hi, I am following the example under https://spark.apache.org/docs/latest/quick-start.html For standalone scala application. I added all my dependencies via build.sbt (one dependency is under lib folder). When I run sbt package I see the jar created under target/scala-2.10/ So compile seems to

Re: Building standalone spark application via sbt

2016-07-19 Thread Andrew Ehrlich
Yes, spark-core will depend on Hadoop and several other jars. Here’s the list of dependencies: https://github.com/apache/spark/blob/master/core/pom.xml#L35 Whether you need spark-sql depends on whether you will use the DataFrame

Building standalone spark application via sbt

2016-07-19 Thread Sachin Mittal
Hi, Can someone please guide me what all jars I need to place in my lib folder of the project to build a standalone scala application via sbt. Note I need to provide static dependencies and I cannot download the jars using libraryDependencies. So I need to provide all the jars upfront. So far I

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 9 June 2016 at 01:27, Rutuja Kulkarni <rutuja.kulkarn...@gmail.com> wrote: > Thank you for the quick response. So the workers section would list all t…

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Rutuja Kulkarni
wrote: > Thank you for the quick response. So the workers section would list all the running worker nodes in the standalone Spark cluster? I was also wondering if this is the only way to retrieve worker nodes or is there something like a Web API or CLI I could use?

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
Rutuja Kulkarni <rutuja.kulkarn...@gmail.com> wrote: > Thank you for the quick response. So the workers section would list all the running worker nodes in the standalone Spark cluster? I was also wondering if this is the only way to retrieve worker nodes or is there someth…

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Rutuja Kulkarni
Thank you for the quick response. So the workers section would list all the running worker nodes in the standalone Spark cluster? I was also wondering if this is the only way to retrieve worker nodes or is there something like a Web API or CLI I could use? Thanks. Regards, Rutuja On Wed, Jun 8

Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
http://talebzadehmich.wordpress.com On 8 June 2016 at 23:56, Rutuja Kulkarni <rutuja.kulkarn...@gmail.com> wrote: > Hello! > I'm trying to set up a standalone spark cluster and wondering how to track the status of all of its nodes. I wonder if something like the Yarn REST

[ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Rutuja Kulkarni
Hello! I'm trying to set up a standalone spark cluster and wondering how to track the status of all of its nodes. I wonder if something like the Yarn REST API or HDFS CLI exists in the Spark world that can provide the status of nodes on such a cluster. Any pointers would be greatly appreciated. -- Regards
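Besides scraping the master web UI, the standalone master also serves a JSON view of cluster state at `http://<master-host>:8080/json`. A small sketch of consuming it is below; the sample payload is trimmed and illustrative, and field names may vary across Spark versions.

```python
import json

# A trimmed example of the kind of payload the standalone master serves at
# http://<master-host>:8080/json (fields are illustrative, not exhaustive).
sample = '''
{
  "url": "spark://master-host:7077",
  "workers": [
    {"id": "worker-1", "host": "10.0.0.1", "port": 7078, "state": "ALIVE", "cores": 8},
    {"id": "worker-2", "host": "10.0.0.2", "port": 7078, "state": "DEAD",  "cores": 8}
  ]
}
'''

def alive_workers(payload):
    """Return hosts of workers the master reports as ALIVE."""
    status = json.loads(payload)
    return [w["host"] for w in status["workers"] if w["state"] == "ALIVE"]

print(alive_workers(sample))  # ['10.0.0.1']
```

In a real deployment the payload would come from `urllib.request.urlopen("http://<master-host>:8080/json")` rather than a literal string.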

python application cluster mode in standalone spark cluster

2016-05-25 Thread Jan Sourek
As the official documentation states, 'Currently only YARN supports cluster mode for Python applications.' I would like to know if work is being done or planned to support cluster mode for Python applications on standalone spark clusters?

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Takeshi Yamamuro
On Thu, Apr 14, 2016 at 6:46 PM, Alexander Pivovarov <apivova...@gmail.com> wrote: > AWS EMR includes Spark on Yarn. > Hortonworks and Cloudera platforms include Spark on Yarn as well.

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Mark Hamstra
> AWS EMR includes Spark on Yarn. > Hortonworks and Cloudera platforms include Spark on Yarn as well. > On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote:

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Alexander Pivovarov
Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote: >> Hello, is there any statistics regarding YARN vs Standalone Spark usage in production?

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Sean Owen
> AWS EMR includes Spark on Yarn. > Hortonworks and Cloudera platforms include Spark on Yarn as well. On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote: >> Hello, is there any statistics regarding YARN vs Standalone Spark usage in

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Mich Talebzadeh
arkadiusz.b...@gmail.com> wrote: >> Hello, is there any statistics regarding YARN vs Standalone Spark usage in production? I would like to choose the most supported and used tec…

Re: YARN vs Standalone Spark Usage in production

2016-04-14 Thread Alexander Pivovarov
AWS EMR includes Spark on Yarn Hortonworks and Cloudera platforms include Spark on Yarn as well On Thu, Apr 14, 2016 at 7:29 AM, Arkadiusz Bicz <arkadiusz.b...@gmail.com> wrote: > Hello, > > Is there any statistics regarding YARN vs Standalone Spark Usage in > production

YARN vs Standalone Spark Usage in production

2016-04-14 Thread Arkadiusz Bicz
Hello, Are there any statistics regarding YARN vs Standalone Spark usage in production? I would like to choose the most supported and used technology in production for our project. BR, Arkadiusz Bicz

Re: Spark jobs run extremely slow on yarn cluster compared to standalone spark

2016-02-14 Thread Yuval.Itzchakov
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-jobs-run-extremely-slow-on-yarn-cluster-compared-to-standalone-spark-tp26215p26221.html

Spark jobs run extremely slow on yarn cluster compared to standalone spark

2016-02-12 Thread pdesai
Hi there, I am doing a POC with Spark and I have noticed that if I run my job on a standalone spark installation, it finishes in a second (it's a small sample job). But when I run the same job on a spark cluster with Yarn, it takes 4-5 minutes for the same simple execution. Are there any best practices that I need

Re: Cannot connect to standalone spark cluster

2015-10-14 Thread Akhil Das
wrote: > Hi, I'm trying to run a java application that connects to a local standalone spark cluster. I start the cluster with the default configuration, using start-all.sh. When I go to the web page for the cluster, it is started ok. I can connect to this cluster wi…

Cannot connect to standalone spark cluster

2015-10-09 Thread ekraffmiller
Hi, I'm trying to run a java application that connects to a local standalone spark cluster. I start the cluster with the default configuration, using start-all.sh. When I go to the web page for the cluster, it is started ok. I can connect to this cluster with SparkR, but when I use the same

Convert Simple Kafka Consumer to standalone Spark JavaStream Consumer

2015-07-21 Thread Hafsa Asif
…Spark Kafka integration. Can anyone help me to rework this code so that I can get the same output with Spark Kafka integration? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Convert-Simple-Kafka-Consumer-to-standalone-Spark-JavaStream-Consumer-tp23930.html
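For a Spark 1.x installation like the one in this thread, the receiver-based Kafka integration looks roughly like the sketch below. The hosts, group id, and topic name are placeholders, and the `spark-streaming-kafka` artifact must be on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch of the Spark 1.x receiver-based API; all endpoints are placeholders.
val conf = new SparkConf()
  .setAppName("KafkaToSpark")
  .setMaster("spark://master-host:7077")
val ssc = new StreamingContext(conf, Seconds(5))

// createStream takes the ZooKeeper quorum, a consumer group id, and a map of
// topic name -> number of receiver threads; it yields (key, value) pairs.
val messages = KafkaUtils.createStream(
  ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))

messages.map(_._2).print()  // keep only the message values

ssc.start()
ssc.awaitTermination()
```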

Re: Convert Simple Kafka Consumer to standalone Spark JavaStream Consumer

2015-07-21 Thread Tathagata Das
http://apache-spark-user-list.1001560.n3.nabble.com/Convert-Simple-Kafka-Consumer-to-standalone-Spark-JavaStream-Consumer-tp23930.html

Re: When to use underlying data management layer versus standalone Spark?

2015-06-24 Thread Sandy Ryza
1:17 AM, commtech michael.leon...@opco.com wrote: > Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy standalone Spark versus Spark…

When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread commtech
Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy standalone Spark versus Spark on top of a more comprehensive data management layer

Re: When to use underlying data management layer versus standalone Spark?

2015-06-23 Thread canan chen
> Hi, I work at a large financial institution in New York. We're looking into Spark and trying to learn more about the deployment/use cases for real-time analytics with Spark. When would it be better to deploy standalone Spark versus Spark on top of a more comprehensive data management layer (Hadoop…

How to start Thrift JDBC server as part of standalone spark application?

2015-04-23 Thread Vladimir Grigor
Hello, I would like to export RDDs/DataFrames via a JDBC SQL interface from a standalone application for the currently stable Spark v1.3.1. I found one way of doing it, but it requires the use of the @DeveloperApi method HiveThriftServer2.startWithContext(sqlContext). Is there a better, production-level
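The @DeveloperApi route mentioned above looks roughly like this in Spark 1.3.x. It assumes an existing `SparkContext` named `sc` and a `DataFrame` named `df` that you want to expose; both are placeholders, and the API is not guaranteed stable across versions.

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Sketch for Spark 1.3.x; assumes an existing SparkContext `sc` and a
// DataFrame `df` built elsewhere in the application.
val sqlContext = new HiveContext(sc)
df.registerTempTable("my_table")  // placeholder table name

// Starts a Thrift JDBC/ODBC server sharing this application's SQLContext,
// so external JDBC clients can query my_table while the app is running.
HiveThriftServer2.startWithContext(sqlContext)
```

Clients then connect with the usual Hive JDBC driver against the Thrift server's host and port.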

distcp problems on ec2 standalone spark cluster

2015-03-09 Thread roni
I got past the "cluster not started" problem by adding Yarn to mapreduce.framework.name. But when I try to run distcp, if I use a URI with s3:// pointing to my bucket, I get "invalid path" even though the bucket exists. If I use s3n:// it just hangs. Did anyone else face anything like

Re: distcp on ec2 standalone spark cluster

2015-03-08 Thread Akhil Das
…"cluster not started" problem. I am having a problem where distcp with an s3 URI says incorrect folder path, and s3n:// hangs. Stuck for 2 days :( Thanks -R -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-spark-cluster-tp13652p21957.html

Re: distcp on ec2 standalone spark cluster

2015-03-07 Thread roni
http://apache-spark-user-list.1001560.n3.nabble.com/distcp-on-ec2-standalone-spark-cluster-tp13652p21957.html

Re: Standalone spark

2015-02-25 Thread Sean Owen
Spark and Hadoop should be listed as 'provided' dependencies in your Maven or SBT build. That still makes them available at compile time. On Wed, Feb 25, 2015 at 10:42 PM, boci boci.b...@gmail.com wrote: Hi, I have a little question. I want to develop a spark based application, but spark

Re: Standalone spark

2015-02-25 Thread boci
Thanks dude... I think I will pull up a docker container for integration tests. On Thu, Feb 26, 2015 at 12:22 AM, Sean Owen

Re: Standalone spark

2015-02-25 Thread Sean Owen
Yes, been on the books for a while ... https://issues.apache.org/jira/browse/SPARK-2356 That one just may always be a known 'gotcha' in Windows; it's kind of a Hadoop gotcha. I don't know that Spark 100% works on Windows and it isn't tested on Windows. On Wed, Feb 25, 2015 at 11:05 PM, boci

Re: Whether standalone spark support kerberos?

2015-02-05 Thread Kostas Sakellis
jande...@gmail.com wrote: > We have a standalone spark cluster for a kerberos test. But when reading from hdfs, I get the error: Can't get Master Kerberos principal for use as renewer. So does standalone spark support kerberos? Can anyone confirm it, or what have I missed? Thanks in advance

Re: Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
Hope someone helps me. Thanks. On Wed, Feb 4, 2015 at 6:14 PM, Jander g jande...@gmail.com wrote: > We have a standalone spark cluster for a kerberos test. But when reading from hdfs, I get the error: Can't get Master Kerberos principal for use as renewer. So does standalone spark…

Whether standalone spark support kerberos?

2015-02-04 Thread Jander g
We have a standalone spark cluster for a kerberos test. But when reading from hdfs, I get the error: Can't get Master Kerberos principal for use as renewer. So does standalone spark support kerberos? Can anyone confirm it, or what have I missed? Thanks in advance. -- Thanks, Jander

Standalone Spark program

2014-12-18 Thread Akshat Aranya
Hi, I am building a Spark-based service which requires initialization of a SparkContext in a main(): def main(args: Array[String]) { val conf = new SparkConf(false) .setMaster("spark://foo.example.com:7077") .setAppName("foobar") val sc = new SparkContext(conf) val rdd =

Re: Standalone Spark program

2014-12-18 Thread Akhil Das
You can build a jar of your project and add it to the sparkContext (sc.addJar("/path/to/your/project.jar")); then it will get shipped to the workers and hence no ClassNotFoundException! Thanks Best Regards On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya aara...@gmail.com wrote: Hi, I am building

Re: Standalone Spark program

2014-12-18 Thread Andrew Or
Hey Akshat, What is the class that is not found, is it a Spark class or classes that you define in your own application? If the latter, then Akhil's solution should work (alternatively you can also pass the jar through the --jars command line option in spark-submit). If it's a Spark class,
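The two fixes suggested in this thread (ship the jar at submit time, or add it from code) can be sketched as follows; the class name, master URL, and paths are placeholders.

```shell
# Option 1: ship the application jar and any extra dependencies at submit time
spark-submit \
  --class com.example.Main \
  --master spark://master-host:7077 \
  --jars /path/to/extra-dependency.jar \
  /path/to/app.jar
```

Option 2, from application code, is Akhil's `sc.addJar("/path/to/your/project.jar")`, which ships the jar to the executors after the context is up. Neither helps if the missing class is a Spark class itself; that points to a version mismatch between the driver and the cluster, as Andrew notes.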

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
~/ephemeral-hdfs/sbin/start-mapred.sh does not exist on spark-1.0.2; I restarted hdfs using ~/ephemeral-hdfs/sbin/stop-dfs.sh and ~/ephemeral-hdfs/sbin/start-dfs.sh, but still getting the same error when trying to run distcp: ERROR tools.DistCp (DistCp.java:run(126)) - Exception encountered

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Frank Austin Nothaft
Tomer, To use distcp, you need to have a Hadoop compute cluster up. start-dfs just restarts HDFS. I don’t have a Spark 1.0.2 cluster up right now, but there should be a start-mapred*.sh or start-all.sh script that will launch the Hadoop MapReduce cluster that you will need for distcp.

Re: Standalone spark cluster. Can't submit job programmatically - java.io.InvalidClassException

2014-09-08 Thread DrKhu
…the hadoop client version was 2.4. When I changed the version of the hadoop client to 1.2.1 in my app, I was able to execute spark code on the cluster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-spark-cluster-Can-t-submit-job-programmatically-java-io-InvalidClassException

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Nicholas Chammas
Tomer, Did you try start-all.sh? It worked for me the last time I tried using distcp, and it worked for this guy too: http://stackoverflow.com/a/18083790/877069. Nick. On Mon, Sep 8, 2014 at 3:28 AM, Tomer Benyamini tomer@gmail.com wrote: ~/ephemeral-hdfs/sbin/start-mapred.sh does not

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
Still no luck, even when running stop-all.sh followed by start-all.sh. On Mon, Sep 8, 2014 at 5:57 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Tomer, Did you try start-all.sh? It worked for me the last time I tried using distcp, and it worked for this guy too. Nick On Mon, Sep

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
What did you see in the log? Was there anything related to mapreduce? Can you log into your hdfs (data) node, use jps to list all java processes, and confirm whether there is a tasktracker process (or nodemanager) running alongside the datanode process? -- Ye Xianjin

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Tomer Benyamini
No tasktracker or nodemanager. This is what I see: On the master: org.apache.hadoop.yarn.server.resourcemanager.ResourceManager org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode org.apache.hadoop.hdfs.server.namenode.NameNode On the data node (slave):

Re: distcp on ec2 standalone spark cluster

2014-09-08 Thread Ye Xianjin
Well, this means you didn't start a compute cluster. Most likely a wrong value of mapreduce.jobtracker.address means the slave node cannot start the node manager. (I am not familiar with the ec2 script, so I don't know whether the slave node has a node manager installed or not.) Can

Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Tomer Benyamini
Hi, I would like to make sure I'm not exceeding the quota on the local cluster's hdfs. I have a couple of questions: 1. How do I know the quota? Here's the output of hadoop fs -count -q, which essentially does not tell me a lot: root@ip-172-31-7-49 ~]$ hadoop fs -count -q / 2147483647
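For reference, the quota commands on a Hadoop 1.x-era cluster like this one look roughly as follows; the path and size are placeholders, and `-setSpaceQuota` must run as the HDFS superuser. In `-count -q` output, the first two columns are the namespace quota and its remaining count, the next two the space quota and remaining space (2147483647 with no space quota shown means no quota is actually set).

```shell
# Show name and space quotas on a directory
hadoop fs -count -q /

# Set or clear a space quota on a directory (size suffixes k/m/g/t are accepted)
hadoop dfsadmin -setSpaceQuota 500g /user/logs
hadoop dfsadmin -clrSpaceQuota /user/logs
```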

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Ognen Duzlevski
On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should I do to increase the quota? Should I bring down the existing slaves and upgrade to ones with more storage? Is there a way to add disks to existing slaves? I'm using the default m1.large slaves set up using the spark-ec2 script. Take a

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Tomer Benyamini
Thanks! I found the hdfs ui via this port - http://[master-ip]:50070/. It shows 1 node hdfs though, although I have 4 slaves on my cluster. Any idea why? On Sun, Sep 7, 2014 at 4:29 PM, Ognen Duzlevski ognen.duzlev...@gmail.com wrote: On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should

distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
Hi, I would like to copy log files from s3 to the cluster's ephemeral-hdfs. I tried to use distcp, but I guess mapred is not running on the cluster - I'm getting the exception below. Is there a way to activate it, or is there a spark alternative to distcp? Thanks, Tomer mapreduce.Cluster

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Tomer Benyamini
I've installed a spark standalone cluster on ec2 as defined here - https://spark.apache.org/docs/latest/ec2-scripts.html. I'm not sure if mr1/2 is part of this installation. On Sun, Sep 7, 2014 at 7:25 PM, Ye Xianjin advance...@gmail.com wrote: Distcp requires a mr1(or mr2) cluster to start. Do

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Nicholas Chammas
I think you need to run start-all.sh or something similar on the EC2 cluster. MR is installed but is not running by default on EC2 clusters spun up by spark-ec2. ​ On Sun, Sep 7, 2014 at 12:33 PM, Tomer Benyamini tomer@gmail.com wrote: I've installed a spark standalone cluster on ec2 as

Re: distcp on ec2 standalone spark cluster

2014-09-07 Thread Josh Rosen
If I recall, you should be able to start Hadoop MapReduce using ~/ephemeral-hdfs/sbin/start-mapred.sh. On Sun, Sep 7, 2014 at 6:42 AM, Tomer Benyamini tomer@gmail.com wrote: Hi, I would like to copy log files from s3 to the cluster's ephemeral-hdfs. I tried to use distcp, but I guess
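Putting the thread's advice together, the sequence on a spark-ec2 cluster is roughly the sketch below; the bucket and paths are placeholders, and the s3n credentials are assumed to be set in core-site.xml (`fs.s3n.awsAccessKeyId` / `fs.s3n.awsSecretAccessKey`).

```shell
# On the spark-ec2 master: bring up the MapReduce daemons first,
# since distcp runs as a MapReduce job
~/ephemeral-hdfs/sbin/start-mapred.sh   # some installs use start-all.sh instead

# Then copy from S3 into the ephemeral HDFS (bucket and paths are placeholders)
~/ephemeral-hdfs/bin/hadoop distcp s3n://my-bucket/logs/ hdfs:///logs/
```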

can't submit my application on standalone spark cluster

2014-08-06 Thread Andres Gomez Ferrer
Hi all, My name is Andres and I'm starting to use Apache Spark. I try to submit my spark.jar to my cluster using this: spark-submit --class net.redborder.spark.RedBorderApplication --master spark://pablo02:7077 redborder-spark-selfcontained.jar But when I did it, my worker died… and my

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Akhil Das
Looks like a netty conflict there; most likely you are having multiple versions of netty jars (eg: netty-3.6.6.Final.jar, netty-3.2.2.Final.jar, netty-all-4.0.13.Final.jar). You only require 3.6.6, I believe; a quick fix would be to remove the rest of them. Thanks Best Regards On Wed, Aug 6, 2014

Re: can't submit my application on standalone spark cluster

2014-08-06 Thread Andrew Or
Hi Andres, If you're using the EC2 scripts to start your standalone cluster, you can use ~/spark-ec2/copy-dir --delete ~/spark to sync your jars across the cluster. Note that you will need to restart the Master and the Workers afterwards through sbin/stop-all.sh and sbin/start-all.sh. If you're