Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Mich Talebzadeh
Hi Rutuja,

I am not certain whether such a tool exists, but opening a JIRA may be
beneficial and would do no harm.

In the meantime you may look for a workaround. Is my understanding correct
that your need is to monitor the health of the cluster?

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 9 June 2016 at 19:45, Rutuja Kulkarni 
wrote:

>
>
> Thanks again Mich!
> If no interface such as a REST API or CLI exists for this, I would like to
> open a JIRA on exposing such a REST interface in Spark that would list all
> the worker nodes.
> Please let me know if this seems to be the right thing to do for the
> community.
>
>
> Regards,
> Rutuja Kulkarni
>
>


Re: [ Standalone Spark Cluster ] - Track node status

2016-06-09 Thread Rutuja Kulkarni
Thanks again Mich!
If no interface such as a REST API or CLI exists for this, I would like to
open a JIRA on exposing such a REST interface in Spark that would list all
the worker nodes.
Please let me know if this seems to be the right thing to do for the
community.


Regards,
Rutuja Kulkarni


On Wed, Jun 8, 2016 at 5:36 PM, Mich Talebzadeh 
wrote:

> The other way is to log in to the individual nodes and do
>
>  jps
>
> 24819 Worker
>
> And you will see the process identified as Worker.
>
> You can also use jmonitor to see what they are doing resource-wise.
>
> You can of course write a small shell script to check that the Worker
> process is up and running on every node and alert if it is down.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


-- 
Regards,
Rutuja Kulkarni


Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
The other way is to log in to the individual nodes and do

 jps

24819 Worker

And you will see the process identified as Worker.

You can also use jmonitor to see what they are doing resource-wise.

You can of course write a small shell script to check that the Worker process
is up and running on every node and alert if it is down.
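
If you prefer not to shell out, here is a rough Scala sketch of the same
liveness check; the hostnames are placeholders, and it assumes each Worker's
web UI listens on the default port 8081 (configurable via SPARK_WORKER_WEBUI_PORT):

import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object WorkerCheck {
  // Returns true if something is listening on the given host/port within the timeout.
  def isUp(host: String, port: Int = 8081, timeoutMs: Int = 2000): Boolean =
    Try {
      val s = new Socket()
      try s.connect(new InetSocketAddress(host, port), timeoutMs)
      finally s.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    val workers = if (args.nonEmpty) args.toSeq else Seq("node1", "node2")  // placeholder hostnames
    workers.foreach { h =>
      println(s"$h: " + (if (isUp(h)) "Worker web UI reachable" else "DOWN or unreachable"))
    }
  }
}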

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 9 June 2016 at 01:27, Rutuja Kulkarni 
wrote:

> Thank you for the quick response.
> So the workers section would list all the running worker nodes in the
> standalone Spark cluster?
> I was also wondering if this is the only way to retrieve worker nodes or
> is there something like a Web API or CLI I could use?
> Thanks.
>
> Regards,
> Rutuja
>


Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Rutuja Kulkarni
Thank you for the quick response.
So the workers section would list all the running worker nodes in the
standalone Spark cluster?
I was also wondering if this is the only way to retrieve worker nodes or is
there something like a Web API or CLI I could use?
Thanks.

Regards,
Rutuja

On Wed, Jun 8, 2016 at 4:02 PM, Mich Talebzadeh 
wrote:

> Check port 8080 on the node where you started start-master.sh.
>
>
>
> [image: screenshot of the Spark standalone Master web UI (port 8080), including the Workers section]
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>


-- 
Regards,
Rutuja Kulkarni


Re: [ Standalone Spark Cluster ] - Track node status

2016-06-08 Thread Mich Talebzadeh
Check port 8080 on the node where you started start-master.sh.



[image: screenshot of the Spark standalone Master web UI (port 8080), including the Workers section]
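
As a rough sketch, the same information can be polled programmatically. This
assumes the Master web UI on port 8080 also serves a JSON view at /json
(present in many Spark releases, but worth verifying on yours); the hostname
is a placeholder:

import scala.io.Source
import scala.util.{Failure, Success, Try}

object MasterStatus {
  def main(args: Array[String]): Unit = {
    val masterHost = if (args.nonEmpty) args(0) else "localhost"  // placeholder master host
    val url = s"http://$masterHost:8080/json"
    Try(Source.fromURL(url).mkString) match {
      case Success(body) =>
        // The JSON body lists the registered workers with their state, cores and memory.
        println(body)
      case Failure(e) =>
        println(s"Could not reach the Master web UI at $url: ${e.getMessage}")
    }
  }
}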

HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 8 June 2016 at 23:56, Rutuja Kulkarni 
wrote:

> Hello!
>
> I'm trying to set up a standalone Spark cluster and am wondering how to
> track the status of all of its nodes. I wonder if something like the YARN
> REST API or the HDFS CLI exists in the Spark world that can provide the
> status of nodes on such a cluster. Any pointers would be greatly appreciated.
>
> --
> Regards,
> Rutuja Kulkarni
>
>
>


Re: Standalone spark

2015-02-25 Thread Sean Owen
Spark and Hadoop should be listed as 'provided' dependencies in your
Maven or SBT build. That still makes them available at compile time.
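
For reference, a minimal sbt sketch of the 'provided' arrangement described
above; the artifact versions are illustrative placeholders, not taken from
this thread:

// build.sbt -- illustrative sketch; versions are placeholders.
// "provided" keeps the jars on the compile classpath but out of the packaged
// artifact, so the code compiles while the cluster supplies Spark and Hadoop at runtime.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.2.1" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % "1.2.1" % "provided"
)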

On Wed, Feb 25, 2015 at 10:42 PM, boci boci.b...@gmail.com wrote:
 Hi,

 I have a little question. I want to develop a Spark-based application, but
 Spark depends on the hadoop-client library. I think it's not necessary (Spark
 standalone), so I excluded it from the sbt file. The result is interesting:
 the trait where I create the SparkContext no longer compiles.

 The error:
 ...
  scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature
 in SparkContext.class refers to term mapred
 [error] in package org.apache.hadoop which is not available.
 [error] It may be completely missing from the current classpath, or the
 version on
 [error] the classpath might be incompatible with the version used when
 compiling SparkContext.class.
 ...

 I use this class for integration tests. I'm using Windows and I don't want
 to use Hadoop for integration tests. How can I solve this?

 Thanks
 Janos





Re: Standalone spark

2015-02-25 Thread boci
Thanks dude... I think I will pull up a Docker container for integration
tests.

--
Skype: boci13, Hangout: boci.b...@gmail.com

On Thu, Feb 26, 2015 at 12:22 AM, Sean Owen so...@cloudera.com wrote:

 Yes, been on the books for a while ...
 https://issues.apache.org/jira/browse/SPARK-2356
 That one just may always be a known 'gotcha' in Windows; it's kind of
 a Hadoop gotcha. I don't know that Spark 100% works on Windows and it
 isn't tested on Windows.



Re: Standalone spark

2015-02-25 Thread Sean Owen
Yes, been on the books for a while ...
https://issues.apache.org/jira/browse/SPARK-2356
That one just may always be a known 'gotcha' in Windows; it's kind of
a Hadoop gotcha. I don't know that Spark 100% works on Windows and it
isn't tested on Windows.
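
For what it's worth, a common workaround sketch (the path is a placeholder and
it assumes you have copied winutils.exe into that directory's bin folder yourself):

// Workaround sketch for the winutils.exe lookup on Windows; run before creating the SparkContext.
// Hadoop's Shell class resolves winutils.exe from the hadoop.home.dir system property
// (or the HADOOP_HOME environment variable).
object WindowsHadoopHome {
  def configure(): Unit = {
    if (System.getProperty("os.name").toLowerCase.contains("windows")) {
      System.setProperty("hadoop.home.dir", "C:\\hadoop")  // expects C:\hadoop\bin\winutils.exe
    }
  }
}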

On Wed, Feb 25, 2015 at 11:05 PM, boci boci.b...@gmail.com wrote:
 Thanks for your fast answer...
 On Windows it's not working, because Hadoop (surprise, surprise) needs
 winutils.exe. Without it nothing works, and if you don't set the Hadoop
 directory you simply get

 15/02/26 00:03:16 ERROR Shell: Failed to locate the winutils binary in the
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in
 the Hadoop binaries.

 b0c1


 --
 Skype: boci13, Hangout: boci.b...@gmail.com






Re: Standalone Spark program

2014-12-18 Thread Akhil Das
You can build a jar of your project and add it to the SparkContext
(sc.addJar("/path/to/your/project.jar")); then it will get shipped to the
workers and hence no ClassNotFoundException!
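
As a rough sketch of that idea (the jar path is a placeholder, and the master
URL and app name are reused from the original post):

import org.apache.spark.{SparkConf, SparkContext}

object ShipJarExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://foo.example.com:7077")    // master URL from the original post
      .setAppName("foobar")
      .setJars(Seq("/path/to/your/project.jar"))    // ship the application jar to the executors
    val sc = new SparkContext(conf)
    // Equivalent alternative once the context exists:
    // sc.addJar("/path/to/your/project.jar")
    val res = sc.parallelize(0 until 255).mapPartitions(it => it).take(1)
    println(s"res=${res.toSeq}")
    sc.stop()
  }
}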

Thanks
Best Regards

On Thu, Dec 18, 2014 at 10:06 PM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I am building a Spark-based service which requires initialization of a
 SparkContext in a main():

 def main(args: Array[String]) {
   val conf = new SparkConf(false)
     .setMaster("spark://foo.example.com:7077")
     .setAppName("foobar")

   val sc = new SparkContext(conf)
   val rdd = sc.parallelize(0 until 255)
   val res = rdd.mapPartitions(it => it).take(1)
   println(s"res=$res")
   sc.stop()
 }

 This code works fine via REPL, but not as a standalone program; it causes
 a ClassNotFoundException.  This has me confused about how code is shipped
 out to executors. When used via the REPL, does the mapPartitions closure,
 it => it, get sent out when the REPL statement is executed? When this code
 is run as a standalone program (not via spark-submit), is the compiled code
 expected to be present on the executor?

 Thanks,
 Akshat




Re: Standalone Spark program

2014-12-18 Thread Andrew Or
Hey Akshat,

What is the class that is not found? Is it a Spark class, or a class that
you define in your own application? If the latter, then Akhil's solution
should work (alternatively you can also pass the jar through the --jars
command line option in spark-submit).

If it's a Spark class, however, it's likely that the Spark assembly jar is
not present on the worker nodes. When you build Spark on the cluster, you
will need to rsync it to the same path on all the nodes in your cluster.
For more information, see
http://spark.apache.org/docs/latest/spark-standalone.html.

-Andrew

2014-12-18 10:29 GMT-08:00 Akhil Das ak...@sigmoidanalytics.com:

 You can build a jar of your project and add it to the SparkContext
 (sc.addJar("/path/to/your/project.jar")); then it will get shipped to the
 workers and hence no ClassNotFoundException!

 Thanks
 Best Regards




Re: Standalone spark cluster. Can't submit job programmatically - java.io.InvalidClassException

2014-09-08 Thread DrKhu
After wasting a lot of time, I've found the problem. Even though I don't use
Hadoop/HDFS in my application, the Hadoop client still matters. The problem
was the hadoop-client version: it was different from the Hadoop version Spark
was built for. Spark's Hadoop version was 1.2.1, but my application pulled in
2.4.

When I changed the hadoop-client version to 1.2.1 in my app, I was able to
execute Spark code on the cluster.
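
A sketch of the kind of dependency alignment described above, assuming an sbt
build; the Spark coordinates and version are illustrative, and 1.2.1 is the
figure reported in the post:

// build.sbt sketch -- keep hadoop-client in line with the Hadoop version your
// Spark distribution was built against (1.2.1 in this report, not 2.4).
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"    % "1.0.2",   // illustrative Spark version
  "org.apache.hadoop" %  "hadoop-client" % "1.2.1"    // matches the cluster's Hadoop build
)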



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-spark-cluster-Can-t-submit-job-programmatically-java-io-InvalidClassException-tp13456p13688.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
