Re: is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread sunil m
Hi Prateek!

You might want to have a look at spark job server:

https://github.com/spark-jobserver/spark-jobserver
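
If pulling in a separate service is not an option, Spark itself also ships the SparkLauncher API (in org.apache.spark.launcher, available since 1.4), which lets another JVM process launch spark-submit programmatically from an edge node. A minimal sketch, assuming a Spark client install on the submitting machine; the paths, class name and master URL below are placeholders:

import org.apache.spark.launcher.SparkLauncher

object SubmitFromOutside {
  def main(args: Array[String]): Unit = {
    // SparkLauncher spawns spark-submit as a child process; for a YARN master,
    // the cluster's Hadoop configuration (HADOOP_CONF_DIR) still has to be
    // visible to this process, as Ted notes below.
    val process = new SparkLauncher()
      .setSparkHome("/opt/spark")             // placeholder: local Spark client install
      .setAppResource("/path/to/my-app.jar")  // placeholder: application jar
      .setMainClass("com.example.MyApp")      // placeholder: application main class
      .setMaster("spark://master-host:7077")  // placeholder: or a YARN/Mesos master
      .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
      .launch()                               // returns a java.lang.Process in 1.5.x
    process.waitFor()
  }
}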


Warm regards,
Sunil Manikani.

On 25 March 2016 at 23:34, Ted Yu wrote:

> Do you run YARN in your production environment (and plan to run Spark jobs
> on YARN)?
>
> If that is the case, the Hadoop configuration is needed.
>
> Cheers
>
> On Fri, Mar 25, 2016 at 11:01 AM, prateek arora <
> prateek.arora...@gmail.com> wrote:
>
>> Hi
>>
>> Thanks for the information. It will definitely solve my problem.
>>
>> I have one more question: if I want to launch a Spark application in a
>> production environment, is there any way for multiple users to submit
>> their jobs without having the Hadoop configuration?
>>
>> Regards
>> Prateek
>>
>>
>> On Fri, Mar 25, 2016 at 10:50 AM, Ted Yu wrote:
>>
>>> See this thread:
>>>
>>> http://search-hadoop.com/m/q3RTtAvwgE7dEI02
>>>
>>> On Fri, Mar 25, 2016 at 10:39 AM, prateek arora <
>>> prateek.arora...@gmail.com> wrote:
>>>
 Hi

 I want to submit a Spark application from outside of the Spark cluster, so
 please provide me with information regarding this.

 Regards
 Prateek






>>>
>>
>


Error while running a job in yarn-client mode

2015-12-16 Thread sunil m
Hello Spark experts!


I am using Spark 1.5.1 and get the following exception while running
sample applications.

Any tips/hints on how to solve the error below will be of great help!


Exception in thread "main" java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
        at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
        at org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:2052)
        at org.apache.spark.api.java.JavaSparkContext.parallelizePairs(JavaSparkContext.scala:169)
        at com.slb.sis.bolt.test.samples.SparkAPIExamples.mapsFromPairsToPairs(SparkAPIExamples.java:156)
        at com.slb.sis.bolt.test.samples.SparkAPIExamples.main(SparkAPIExamples.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
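
From what I can tell, this exception is raised whenever SparkContext methods are invoked after the context has been stopped, so I am wondering whether sc.stop() (or an earlier failure that stops the context) somehow runs before the parallelizePairs call. A minimal sketch that reproduces the same message (the names are made up, and it uses the Scala API rather than the Java API shown in the trace):

import org.apache.spark.{SparkConf, SparkContext}

object StoppedContextRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("repro").setMaster("local[*]"))
    sc.parallelize(1 to 10).count()  // works: the context is still alive
    sc.stop()
    // Uncommenting the next line fails with
    // "java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext"
    // sc.parallelize(1 to 10).count()
  }
}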


Thanks in advance.

Warm regards,
Sunil M.


Logging spark output to hdfs file

2015-12-08 Thread sunil m
Hi!
I configured the log4j.properties file in the conf folder of Spark with the
following values:

log4j.appender.file.File=hdfs://
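
A full appender block of this kind would look roughly like the sketch below; the appender name, HDFS path and layout are placeholder values rather than my exact configuration:

# sketch of conf/log4j.properties (placeholder values, not the exact file)
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
# Note: org.apache.log4j.FileAppender opens this path with plain local file I/O,
# so an hdfs:// URI may not be interpreted as an HDFS location at all.
log4j.appender.file.File=hdfs://namenode:8020/logs/spark-app.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n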

I expected all log files to log output to the file in HDFS.
Instead files are created locally.

Has anybody tried logging to HDFS by configuring log4j.properties?

Warm regards,
Sunil M


Associating spark jobs with logs

2015-12-08 Thread sunil m
Hello Spark experts!

I was wondering if somebody has solved the problem which we are facing.

We want to achieve the following:

Given a Spark job ID, fetch all the logs generated by that job.

We looked at Spark Job Server; it seems to be lacking such a feature.


Any ideas or suggestions are welcome!

Thanks in advance.

Warm regards,
Sunil M.


Re: Associating spark jobs with logs

2015-12-08 Thread sunil m
Thanks for replying.
Yes, I did.
I am not seeing the application IDs for jobs submitted to YARN when I
query http://MY_HOST:18080/api/v1/applications/

When I query
http://MY_HOST:18080/api/v1/applications/application_1446812769803_0011 it
does not recognize the application ID, since it belongs to YARN.

I am looking for a feature like this, but we need to get the logs irrespective
of whether the master is YARN, Mesos, or standalone Spark.
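
For reference, this is roughly how we query the history server's REST API today; a minimal Scala sketch using only the standard library (the host, port and application ID are placeholders, and the JSON response is just printed rather than parsed):

import scala.io.Source

object ListApplications {
  def main(args: Array[String]): Unit = {
    val base = "http://MY_HOST:18080/api/v1"  // placeholder history-server address

    // Lists every application the history server knows about, as a JSON array.
    println(Source.fromURL(s"$base/applications").mkString)

    // Per-application job details live under /applications/<app-id>/jobs,
    // for any <app-id> that appears in the listing above (placeholder id here).
    println(Source.fromURL(s"$base/applications/app-20151208120000-0000/jobs").mkString)
  }
}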

Warm regards,
Sunil M.

On 9 December 2015 at 00:48, Ted Yu <yuzhih...@gmail.com> wrote:

> Have you looked at the REST API section of:
>
> https://spark.apache.org/docs/latest/monitoring.html
>
> FYI
>
> On Tue, Dec 8, 2015 at 8:57 AM, sunil m <260885smanik...@gmail.com> wrote:
>
>> Hello Spark experts!
>>
>> I was wondering if somebody has solved the problem which we are facing.
>>
>> We want to achieve the following:
>>
>> Given a Spark job ID, fetch all the logs generated by that job.
>>
>> We looked at Spark Job Server; it seems to be lacking such a feature.
>>
>>
>> Any ideas or suggestions are welcome!
>>
>> Thanks in advance.
>>
>> Warm regards,
>> Sunil M.
>>
>
>


Available options for Spark REST API

2015-12-07 Thread sunil m
Dear Spark experts!

I would like to know the best practices for invoking Spark jobs via a
REST API.

We tried out the hidden REST API mentioned here:
http://arturmkrtchyan.com/apache-spark-hidden-rest-api

It works fine in Spark standalone mode but does not seem to work
when I specify
 "spark.master" : "YARN-CLUSTER" or "mesos://...".
Did anyone encounter a similar problem?

Has anybody used:
https://github.com/spark-jobserver/spark-jobserver

If yes, please share your experience. Does it work well with both Scala and
Java classes? I have only seen a Scala example. Are there any known disadvantages
of using it?
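
For reference, the kind of Scala example I mean looks roughly like the minimal sketch below, written against the spark.jobserver.SparkJob trait as of the 0.6.x line (the object name and config key are made up); I assume a Java class could implement the same interface:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // Called by the job server before runJob; reject bad input here.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  // The job server supplies the (possibly shared) SparkContext plus the POSTed config.
  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = config.getString("input.string").split(" ").toSeq  // "input.string" is a made-up key
    sc.parallelize(words).map((_, 1)).reduceByKey(_ + _).collectAsMap()
  }
}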

Is there anything better available that is used in production environments?

Any advice is appreciated. We are using Spark 1.5.1.

Thanks in advance.

Warm regards,
Sunil M.


Queue in Spark standalone mode

2015-11-25 Thread sunil m
Hi!

I am using Spark 1.5.1 and pretty new to Spark...

Is there a way to configure queues in Spark standalone mode, like in YARN?
If yes, can someone point me to good documentation or a reference?

Sometimes I get strange behavior while running multiple jobs
simultaneously.
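
For context, the closest mechanism I have found so far is the fair scheduler with pools, which as far as I can tell only applies to jobs inside a single application (one SparkContext), not across applications the way YARN queues do. A minimal sketch, with a made-up pool name and a placeholder master:

import org.apache.spark.{SparkConf, SparkContext}

object FairPoolExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("queued-jobs")
      .setMaster("local[*]")                  // placeholder: any master works
      .set("spark.scheduler.mode", "FAIR")    // default is FIFO
      // .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // optional pool definitions
    val sc = new SparkContext(conf)

    // Jobs submitted from this thread go into the "reports" pool (placeholder name).
    sc.setLocalProperty("spark.scheduler.pool", "reports")
    sc.parallelize(1 to 1000000).count()
    sc.stop()
  }
}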

Thanks in advance.

Warm regards,
Sunil


Spark JDBCRDD query

2015-11-18 Thread sunil m
Hello Spark experts!
I am new to Spark and I have the following query.

What I am trying to do: run a Spark 1.5.1 job with local[*] on a 4-core CPU.
It pings an Oracle database and fetches 5000 records per partition into a
JdbcRDD; I increase the number of partitions by 1 for every 5000 records fetched.
I have taken care that all partitions get the same count of records.

What I expected to happen ideally: all tasks start at the same time T0,
ping the Oracle database in parallel, store their values in the JdbcRDD, and
finish in parallel at T1.

What I observed: there was one task for every partition, but tasks on the
Web UI were staggered; some were spawned or scheduled well after the first
task was scheduled.

Is there a configuration to change how many tasks can run simultaneously on
an executor core? In other words, is it possible for one core to get more
than one task running simultaneously on that core?
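
For reference, a minimal sketch of the kind of setup I described above (the connection URL, table and column names are placeholders, and the Oracle JDBC driver must be on the classpath); as I understand it, Spark runs one task per partition and local[*] executes at most one task per core at a time:

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object OracleFetch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("oracle-fetch").setMaster("local[*]"))

    val numPartitions = 8                     // one partition per 5000-row slice
    val rows = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password"),
      "SELECT id, payload FROM my_table WHERE id >= ? AND id <= ?",  // bounds bound per partition
      1L, 40000L, numPartitions,
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))

    println(rows.count())                     // each partition becomes one fetch task
    sc.stop()
  }
}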

Thanks...

Warm regards,
Sunil M.