Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-05 Thread polkosity
After changing to reuse the Spark context and cache RDDs in memory, performance
is 4 times better. We didn't expect that much of an improvement!
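
For reference, a minimal sketch of the two changes described above (master URL,
input path, and the jobs themselves are placeholders, not the poster's actual code):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

// Create the SparkContext once and keep it alive across jobs instead of
// paying the startup cost per job (master URL and app name are placeholders).
val sc = new SparkContext("spark://master:7077", "reused-context-app")

// Cache the input in memory so that later jobs on the same context
// skip re-reading and re-parsing it.
val events = sc.textFile("hdfs:///data/events").cache()

// Subsequent actions on the same context reuse the cached partitions.
val total = events.count()
val errors = events.filter(_.contains("ERROR")).count()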



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-tp2016p2340.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: trying to understand job cancellation

2014-03-05 Thread Matei Zaharia
Which version of Spark is this in, Koert? There might have been some fixes more 
recently for it.

Matei

On Mar 5, 2014, at 5:26 PM, Koert Kuipers  wrote:

> Sorry I meant to say: seems the issue is shared RDDs between a job that got 
> cancelled and a later job.
> 
> However even disregarding that I have the other issue that the active task of 
> the cancelled job hangs around forever, not doing anything
> 
> On Mar 5, 2014 7:29 PM, "Koert Kuipers"  wrote:
> yes jobs on RDDs that were not part of the cancelled job work fine.
> 
> so it seems the issue is the cached RDDs that are shared between the cancelled 
> job and the jobs after that.
> 
> 
> On Wed, Mar 5, 2014 at 7:15 PM, Koert Kuipers  wrote:
> well, the new jobs use existing RDDs that were also used in the job that got 
> killed. 
> 
> let me confirm that new jobs that use completely different RDDs do not get 
> killed.
> 
> 
> 
> On Wed, Mar 5, 2014 at 7:00 PM, Mayur Rustagi  wrote:
> Quite unlikely as jobid are given in an incremental fashion, so your future 
> jobid are not likely to be killed if your groupid is not repeated.I guess the 
> issue is something else. 
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Wed, Mar 5, 2014 at 3:50 PM, Koert Kuipers  wrote:
> i did that. my next job gets a random new group job id (a uuid). however that 
> doesnt seem to stop the job from getting sucked into the cancellation it seems
> 
> 
> On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi  wrote:
> You can randomize job groups as well. to secure yourself against termination. 
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers  wrote:
> got it. seems like i better stay away from this feature for now..
> 
> 
> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi  wrote:
> One issue is that job cancellation is posted on eventloop. So its possible 
> that subsequent jobs submitted to job queue may beat the job cancellation 
> event & hence the job cancellation event may end up closing them too.
> So there's definitely a race condition you are risking even if not running 
> into. 
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers  wrote:
> SparkContext.cancelJobGroup
> 
> 
> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi  wrote:
> How do you cancel the job. Which API do you use?
> 
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> 
> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers  wrote:
> i also noticed that jobs (with a new JobGroupId) which i run after this and 
> which use the same RDDs get very confused. i see lots of cancelled stages and 
> retries that go on forever.
> 
> 
> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers  wrote:
> i have a running job that i cancel while keeping the spark context alive.
> 
> at the time of cancellation the active stage is 14.
> 
> i see in logs:
> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job group 
> 3a25db23-2e39-4497-b7ab-b26b2a976f9c
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was cancelled
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0 
> from pool x
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
> 
> so far it all looks good. then i get a lot of messages like this:
> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update with 
> state FINISHED from TID 883 because its task set is gone
> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update with 
> state KILLED from TID 888 because its task set is gone
> 
> after this stage 14 hangs around in active stages, without any sign of 
> progress or cancellation. it just sits there forever, stuck. looking at the 
> logs of the executors confirms this. the tasks seem to be still running, but 
> nothing is happening. for example (by the time i look at this its 4:58 so 
> this task hasnt done anything in 15 mins):
> 
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
> 14/03/04 16:43:19 INFO BlockManager: Removing R

Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
Sorry I meant to say: seems the issue is shared RDDs between a job that got
cancelled and a later job.

However, even disregarding that, I have the other issue that the active task
of the cancelled job hangs around forever, not doing anything.
On Mar 5, 2014 7:29 PM, "Koert Kuipers"  wrote:

> yes jobs on RDDs that were not part of the cancelled job work fine.
>
> so it seems the issue is the cached RDDs that are shared between the
> cancelled job and the jobs after that.
>
>
> On Wed, Mar 5, 2014 at 7:15 PM, Koert Kuipers  wrote:
>
>> well, the new jobs use existing RDDs that were also used in the job that
>> got killed.
>>
>> let me confirm that new jobs that use completely different RDDs do not
>> get killed.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 7:00 PM, Mayur Rustagi wrote:
>>
>>> Quite unlikely as jobid are given in an incremental fashion, so your
>>> future jobid are not likely to be killed if your groupid is not repeated.I
>>> guess the issue is something else.
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 3:50 PM, Koert Kuipers  wrote:
>>>
 i did that. my next job gets a random new group job id (a uuid).
 however that doesnt seem to stop the job from getting sucked into the
 cancellation it seems


 On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi 
 wrote:

> You can randomize job groups as well. to secure yourself against
> termination.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers wrote:
>
>> got it. seems like i better stay away from this feature for now..
>>
>>
>> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi <
>> mayur.rust...@gmail.com> wrote:
>>
>>> One issue is that job cancellation is posted on eventloop. So its
>>> possible that subsequent jobs submitted to job queue may beat the job
>>> cancellation event & hence the job cancellation event may end up closing
>>> them too.
>>> So there's definitely a race condition you are risking even if not
>>> running into.
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers wrote:
>>>
 SparkContext.cancelJobGroup


 On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi <
 mayur.rust...@gmail.com> wrote:

> How do you cancel the job. Which API do you use?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
>  @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers 
> wrote:
>
>> i also noticed that jobs (with a new JobGroupId) which i run
>> after this use which use the same RDDs get very confused. i see lots 
>> of
>> cancelled stages and retries that go on forever.
>>
>>
>> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers 
>> wrote:
>>
>>> i have a running job that i cancel while keeping the spark
>>> context alive.
>>>
>>> at the time of cancellation the active stage is 14.
>>>
>>> i see in logs:
>>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel
>>> job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 10
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 14
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14
>>> was cancelled
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove
>>> TaskSet 14.0 from pool x
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 13
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 12
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 11
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 15
>>>
>>> so far it all looks good. then i get a lot of messages like this:
>>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring
>>> update with state FINISHED from TID 883 because its task set is gone
>>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring
>>> update with state KILLED from TID 888 because its task set is gone
>>>
>>> after this stage 14 hangs aroun

Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
yes jobs on RDDs that were not part of the cancelled job work fine.

so it seems the issue is the cached RDDs that are shared between the
cancelled job and the jobs after that.


On Wed, Mar 5, 2014 at 7:15 PM, Koert Kuipers  wrote:

> well, the new jobs use existing RDDs that were also used in the job that
> got killed.
>
> let me confirm that new jobs that use completely different RDDs do not get
> killed.
>
>
>
> On Wed, Mar 5, 2014 at 7:00 PM, Mayur Rustagi wrote:
>
>> Quite unlikely as jobid are given in an incremental fashion, so your
>> future jobid are not likely to be killed if your groupid is not repeated.I
>> guess the issue is something else.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:50 PM, Koert Kuipers  wrote:
>>
>>> i did that. my next job gets a random new group job id (a uuid). however
>>> that doesnt seem to stop the job from getting sucked into the cancellation
>>> it seems
>>>
>>>
>>> On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi 
>>> wrote:
>>>
 You can randomize job groups as well. to secure yourself against
 termination.

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi 



 On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers wrote:

> got it. seems like i better stay away from this feature for now..
>
>
> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi  > wrote:
>
>> One issue is that job cancellation is posted on eventloop. So its
>> possible that subsequent jobs submitted to job queue may beat the job
>> cancellation event & hence the job cancellation event may end up closing
>> them too.
>> So there's definitely a race condition you are risking even if not
>> running into.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers wrote:
>>
>>> SparkContext.cancelJobGroup
>>>
>>>
>>> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi <
>>> mayur.rust...@gmail.com> wrote:
>>>
 How do you cancel the job. Which API do you use?

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
  @mayur_rustagi 



 On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers wrote:

> i also noticed that jobs (with a new JobGroupId) which i run after
> this use which use the same RDDs get very confused. i see lots of 
> cancelled
> stages and retries that go on forever.
>
>
> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers 
> wrote:
>
>> i have a running job that i cancel while keeping the spark
>> context alive.
>>
>> at the time of cancellation the active stage is 14.
>>
>> i see in logs:
>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel
>> job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 10
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 14
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14
>> was cancelled
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove
>> TaskSet 14.0 from pool x
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 13
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 12
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 11
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 15
>>
>> so far it all looks good. then i get a lot of messages like this:
>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring
>> update with state FINISHED from TID 883 because its task set is gone
>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring
>> update with state KILLED from TID 888 because its task set is gone
>>
>> after this stage 14 hangs around in active stages, without any
>> sign of progress or cancellation. it just sits there forever, stuck.
>> looking at the logs of the executors confirms this. they task seem 
>> to be
>> still running, but nothing is happening. for example (by the time i 
>> look at
>> this its 4:58 so this tasks hasnt done anything in 15 mins):
>>
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for
>>

Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
well, the new jobs use existing RDDs that were also used in the job that
got killed.

let me confirm that new jobs that use completely different RDDs do not get
killed.



On Wed, Mar 5, 2014 at 7:00 PM, Mayur Rustagi wrote:

> Quite unlikely as jobid are given in an incremental fashion, so your
> future jobid are not likely to be killed if your groupid is not repeated.I
> guess the issue is something else.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 3:50 PM, Koert Kuipers  wrote:
>
>> i did that. my next job gets a random new group job id (a uuid). however
>> that doesnt seem to stop the job from getting sucked into the cancellation
>> it seems
>>
>>
>> On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi wrote:
>>
>>> You can randomize job groups as well. to secure yourself against
>>> termination.
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers  wrote:
>>>
 got it. seems like i better stay away from this feature for now..


 On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi 
 wrote:

> One issue is that job cancellation is posted on eventloop. So its
> possible that subsequent jobs submitted to job queue may beat the job
> cancellation event & hence the job cancellation event may end up closing
> them too.
> So there's definitely a race condition you are risking even if not
> running into.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers wrote:
>
>> SparkContext.cancelJobGroup
>>
>>
>> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi <
>> mayur.rust...@gmail.com> wrote:
>>
>>> How do you cancel the job. Which API do you use?
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>>  @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers wrote:
>>>
 i also noticed that jobs (with a new JobGroupId) which i run after
 this use which use the same RDDs get very confused. i see lots of 
 cancelled
 stages and retries that go on forever.


 On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers wrote:

> i have a running job that i cancel while keeping the spark context
> alive.
>
> at the time of cancellation the active stage is 14.
>
> i see in logs:
> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel
> job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 10
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 14
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
> cancelled
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove
> TaskSet 14.0 from pool x
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 13
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 12
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 11
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
> stage 15
>
> so far it all looks good. then i get a lot of messages like this:
> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring
> update with state FINISHED from TID 883 because its task set is gone
> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring
> update with state KILLED from TID 888 because its task set is gone
>
> after this stage 14 hangs around in active stages, without any
> sign of progress or cancellation. it just sits there forever, stuck.
> looking at the logs of the executors confirms this. they task seem to 
> be
> still running, but nothing is happening. for example (by the time i 
> look at
> this its 4:58 so this tasks hasnt done anything in 15 mins):
>
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943
> is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly
> to driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945
> is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly
> to driver
> 14/

Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Patrick Wendell
Hey,

Maybe I don't understand the slf4j model completely, but I think you
need to add a concrete implementation of a logger. So in your case
you'd add the logback-classic binding in place of the log4j binding at
compile time:

http://mvnrepository.com/artifact/ch.qos.logback/logback-classic/1.1.1

- Patrick

On Wed, Mar 5, 2014 at 1:52 PM, Sergey Parhomenko  wrote:
> Hi Patrick,
>
> Thanks for the patch. I tried building a patched version of
> spark-core_2.10-0.9.0-incubating.jar but the Maven build fails:
> [ERROR]
> /home/das/Work/thx/incubator-spark/core/src/main/scala/org/apache/spark/Logging.scala:22:
> object impl is not a member of package org.slf4j
> [ERROR] import org.slf4j.impl.StaticLoggerBinder
> [ERROR]  ^
> [ERROR]
> /home/das/Work/thx/incubator-spark/core/src/main/scala/org/apache/spark/Logging.scala:106:
> not found: value StaticLoggerBinder
> [ERROR] val binder = StaticLoggerBinder.getSingleton
> [ERROR]  ^
> [ERROR] two errors found
>
> The module only has compile dependency on slf4j-api, and not on
> slf4j-log4j12 or any other slf4j logging modules which provide
> org.slf4j.impl.StaticLoggerBinder. Adding slf4j-log4j12 with compile scope
> helps, and I confirm the logging is redirected to slf4j/Logback correctly
> now with the patched module. I'm not sure however if using compile scope for
> slf4j-log4j12 is a good idea.
>
> --
> Best regards,
> Sergey Parhomenko
>
>
> On 5 March 2014 20:11, Patrick Wendell  wrote:
>>
>> Hey All,
>>
>> We have a fix for this but it didn't get merged yet. I'll put it as a
>> blocker for Spark 0.9.1.
>>
>>
>> https://github.com/pwendell/incubator-spark/commit/66594e88e5be50fca073a7ef38fa62db4082b3c8
>>
>> https://spark-project.atlassian.net/browse/SPARK-1190
>>
>> Sergey if you could try compiling Spark with this batch and seeing if
>> it works that would be great.
>>
>> Thanks,
>> Patrick
>>
>>
>> On Wed, Mar 5, 2014 at 10:26 AM, Paul Brown  wrote:
>> >
>> > Hi, Sergey --
>> >
>> > Here's my recipe, implemented via Maven; YMMV if you need to do it via
>> > sbt,
>> > etc., but it should be equivalent:
>> >
>> > 1) Replace org.apache.spark.Logging trait with this:
>> > https://gist.github.com/prb/bc239b1616f5ac40b4e5 (supplied by Patrick
>> > during
>> > the discussion on the dev list)
>> > 2) Amend your POM using the fragment that's in the same gist.
>> >
>> > We build two shaded JARs from the same build, one for the driver and one
>> > for
>> > the worker; to ensure that our Logging trait is the one in use in the
>> > driver
>> > (where it matters), we exclude that same class from the Spark JAR in the
>> > shade plugin configuration.
>> >
>> > Best.
>> > -- Paul
>> >
>> >
>> > --
>> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>> >
>> >
>> > On Wed, Mar 5, 2014 at 10:02 AM, Sergey Parhomenko
>> > 
>> > wrote:
>> >>
>> >> Hi Sean,
>> >>
>> >> We're not using log4j actually, we're trying to redirect all logging to
>> >> slf4j which then uses logback as the logging implementation.
>> >>
>> >> The fix you mentioned - am I right to assume it is not part of the
>> >> latest
>> >> released Spark version (0.9.0)? If so, are there any workarounds or
>> >> advices
>> >> on how to avoid this issue in 0.9.0?
>> >>
>> >> --
>> >> Best regards,
>> >> Sergey Parhomenko
>> >>
>> >>
>> >> On 5 March 2014 14:40, Sean Owen  wrote:
>> >>>
>> >>> Yes I think that issue is fixed (Patrick you had the last eyes on it
>> >>> IIRC?)
>> >>>
>> >>> If you are using log4j, in general, do not redirect log4j to slf4j.
>> >>> Stuff using log4j is already using log4j, done.
>> >>> --
>> >>> Sean Owen | Director, Data Science | London
>> >>>
>> >>>
>> >>> On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko
>> >>> 
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I'm trying to redirect Spark logs to slf4j. Spark seem to be using
>> >>> > Log4J, so
>> >>> > I did the typical steps of forcing a Log4J-based framework to use
>> >>> > slf4j
>> >>> > -
>> >>> > manually excluded slf4j-log4j12 and log4j, and included
>> >>> > log4j-over-slf4j.
>> >>> > When doing that however Spark starts failing on initialization with:
>> >>> > java.lang.StackOverflowError
>> >>> > at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
>> >>> > at
>> >>> > java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
>> >>> > at
>> >>> >
>> >>> > java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
>> >>> > at java.lang.ThreadLocal.get(ThreadLocal.java:146)
>> >>> > at java.lang.StringCoding.deref(StringCoding.java:63)
>> >>> > at java.lang.StringCoding.encode(StringCoding.java:330)
>> >>> > at java.lang.String.getBytes(String.java:916)
>> >>> > at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>> >>> > at
>> >>> > java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>> >>> > at java.io.File.exists(File.java:813)
>> >>> > at
>> >>> > sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
>> >>> >

Re: trying to understand job cancellation

2014-03-05 Thread Mayur Rustagi
Quite unlikely, as job ids are given in an incremental fashion, so your future
job ids are not likely to be killed if your group id is not repeated. I guess
the issue is something else.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Mar 5, 2014 at 3:50 PM, Koert Kuipers  wrote:

> i did that. my next job gets a random new group job id (a uuid). however
> that doesnt seem to stop the job from getting sucked into the cancellation
> it seems
>
>
> On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi wrote:
>
>> You can randomize job groups as well. to secure yourself against
>> termination.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers  wrote:
>>
>>> got it. seems like i better stay away from this feature for now..
>>>
>>>
>>> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi 
>>> wrote:
>>>
 One issue is that job cancellation is posted on eventloop. So its
 possible that subsequent jobs submitted to job queue may beat the job
 cancellation event & hence the job cancellation event may end up closing
 them too.
 So there's definitely a race condition you are risking even if not
 running into.

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
 @mayur_rustagi 



 On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers wrote:

> SparkContext.cancelJobGroup
>
>
> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi  > wrote:
>
>> How do you cancel the job. Which API do you use?
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>>  @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers wrote:
>>
>>> i also noticed that jobs (with a new JobGroupId) which i run after
>>> this use which use the same RDDs get very confused. i see lots of 
>>> cancelled
>>> stages and retries that go on forever.
>>>
>>>
>>> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers wrote:
>>>
 i have a running job that i cancel while keeping the spark context
 alive.

 at the time of cancellation the active stage is 14.

 i see in logs:
 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel
 job group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 10
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 14
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
 cancelled
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove
 TaskSet 14.0 from pool x
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 13
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 12
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 11
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
 stage 15

 so far it all looks good. then i get a lot of messages like this:
 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring
 update with state FINISHED from TID 883 because its task set is gone
 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring
 update with state KILLED from TID 888 because its task set is gone

 after this stage 14 hangs around in active stages, without any sign
 of progress or cancellation. it just sits there forever, stuck. 
 looking at
 the logs of the executors confirms this. they task seem to be still
 running, but nothing is happening. for example (by the time i look at 
 this
 its 4:58 so this tasks hasnt done anything in 15 mins):

 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943
 is 1007
 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
 driver
 14/03/04 16:43:16 INFO Executor: Finished task ID 943
 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945
 is 1007
 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
 driver
 14/03/04 16:43:16 INFO Executor: Finished task ID 945
 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66

 not sure what to make of this. any suggestions? best, koert

>>>
>>>
>>
>

>>>
>>
>


Is there a way to control where RDD partitions physically go?

2014-03-05 Thread Yishu Lin
Let's say I have an RDD that represents users' behavior data. I can shard the
RDD into several partitions on user id with a HashPartitioner. Is there any way that
I can control which machine each partition goes to? Or how does Spark
distribute partitions onto each machine? Thanks!

Yishu
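
A minimal sketch of the HashPartitioner sharding mentioned in the question
(sample records and partition count are made up). Note that the partitioner only
decides which partition a key lands in; which executor ends up hosting each
partition is left to Spark's scheduler:

import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._   // implicit conversions for key-value RDDs

val sc = new SparkContext("local[4]", "partition-example")   // placeholder master

// Behavior data keyed by user id (made-up sample records).
val behavior = sc.parallelize(Seq(
  (42L, "click"), (7L, "view"), (42L, "purchase"), (13L, "click")))

// Shard into 8 partitions by hash of the user id; records with the same
// user id always end up in the same partition.
val sharded = behavior.partitionBy(new HashPartitioner(8))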

Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
i did that. my next job gets a random new group job id (a uuid). however
that doesnt seem to stop the job from getting sucked into the cancellation.


On Wed, Mar 5, 2014 at 6:47 PM, Mayur Rustagi wrote:

> You can randomize job groups as well. to secure yourself against
> termination.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers  wrote:
>
>> got it. seems like i better stay away from this feature for now..
>>
>>
>> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi wrote:
>>
>>> One issue is that job cancellation is posted on eventloop. So its
>>> possible that subsequent jobs submitted to job queue may beat the job
>>> cancellation event & hence the job cancellation event may end up closing
>>> them too.
>>> So there's definitely a race condition you are risking even if not
>>> running into.
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>> @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers  wrote:
>>>
 SparkContext.cancelJobGroup


 On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi 
 wrote:

> How do you cancel the job. Which API do you use?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
>  @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers wrote:
>
>> i also noticed that jobs (with a new JobGroupId) which i run after
>> this use which use the same RDDs get very confused. i see lots of 
>> cancelled
>> stages and retries that go on forever.
>>
>>
>> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers wrote:
>>
>>> i have a running job that i cancel while keeping the spark context
>>> alive.
>>>
>>> at the time of cancellation the active stage is 14.
>>>
>>> i see in logs:
>>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
>>> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 10
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 14
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
>>> cancelled
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet
>>> 14.0 from pool x
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 13
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 12
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 11
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>>> stage 15
>>>
>>> so far it all looks good. then i get a lot of messages like this:
>>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring
>>> update with state FINISHED from TID 883 because its task set is gone
>>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring
>>> update with state KILLED from TID 888 because its task set is gone
>>>
>>> after this stage 14 hangs around in active stages, without any sign
>>> of progress or cancellation. it just sits there forever, stuck. looking 
>>> at
>>> the logs of the executors confirms this. they task seem to be still
>>> running, but nothing is happening. for example (by the time i look at 
>>> this
>>> its 4:58 so this tasks hasnt done anything in 15 mins):
>>>
>>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943
>>> is 1007
>>> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
>>> driver
>>> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
>>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945
>>> is 1007
>>> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
>>> driver
>>> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
>>> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>>>
>>> not sure what to make of this. any suggestions? best, koert
>>>
>>
>>
>

>>>
>>
>


Re: trying to understand job cancellation

2014-03-05 Thread Mayur Rustagi
You can randomize job groups as well, to secure yourself against
termination.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Mar 5, 2014 at 3:42 PM, Koert Kuipers  wrote:

> got it. seems like i better stay away from this feature for now..
>
>
> On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi wrote:
>
>> One issue is that job cancellation is posted on eventloop. So its
>> possible that subsequent jobs submitted to job queue may beat the job
>> cancellation event & hence the job cancellation event may end up closing
>> them too.
>> So there's definitely a race condition you are risking even if not
>> running into.
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>> @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers  wrote:
>>
>>> SparkContext.cancelJobGroup
>>>
>>>
>>> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi 
>>> wrote:
>>>
 How do you cancel the job. Which API do you use?

 Mayur Rustagi
 Ph: +1 (760) 203 3257
 http://www.sigmoidanalytics.com
  @mayur_rustagi 



 On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers wrote:

> i also noticed that jobs (with a new JobGroupId) which i run after
> this use which use the same RDDs get very confused. i see lots of 
> cancelled
> stages and retries that go on forever.
>
>
> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers wrote:
>
>> i have a running job that i cancel while keeping the spark context
>> alive.
>>
>> at the time of cancellation the active stage is 14.
>>
>> i see in logs:
>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
>> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 10
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 14
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
>> cancelled
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet
>> 14.0 from pool x
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 13
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 12
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 11
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling
>> stage 15
>>
>> so far it all looks good. then i get a lot of messages like this:
>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state FINISHED from TID 883 because its task set is gone
>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state KILLED from TID 888 because its task set is gone
>>
>> after this stage 14 hangs around in active stages, without any sign
>> of progress or cancellation. it just sits there forever, stuck. looking 
>> at
>> the logs of the executors confirms this. they task seem to be still
>> running, but nothing is happening. for example (by the time i look at 
>> this
>> its 4:58 so this tasks hasnt done anything in 15 mins):
>>
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is
>> 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
>> driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is
>> 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
>> driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
>> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>>
>> not sure what to make of this. any suggestions? best, koert
>>
>
>

>>>
>>
>


Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
got it. seems like i better stay away from this feature for now..


On Wed, Mar 5, 2014 at 5:55 PM, Mayur Rustagi wrote:

> One issue is that job cancellation is posted on eventloop. So its possible
> that subsequent jobs submitted to job queue may beat the job cancellation
> event & hence the job cancellation event may end up closing them too.
> So there's definitely a race condition you are risking even if not running
> into.
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers  wrote:
>
>> SparkContext.cancelJobGroup
>>
>>
>> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi wrote:
>>
>>> How do you cancel the job. Which API do you use?
>>>
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoidanalytics.com
>>>  @mayur_rustagi 
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers  wrote:
>>>
 i also noticed that jobs (with a new JobGroupId) which i run after this
 use which use the same RDDs get very confused. i see lots of cancelled
 stages and retries that go on forever.


 On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers wrote:

> i have a running job that i cancel while keeping the spark context
> alive.
>
> at the time of cancellation the active stage is 14.
>
> i see in logs:
> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 10
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 14
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
> cancelled
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet
> 14.0 from pool x
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 13
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 12
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 11
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
> 15
>
> so far it all looks good. then i get a lot of messages like this:
> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
> with state FINISHED from TID 883 because its task set is gone
> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
> with state KILLED from TID 888 because its task set is gone
>
> after this stage 14 hangs around in active stages, without any sign of
> progress or cancellation. it just sits there forever, stuck. looking at 
> the
> logs of the executors confirms this. they task seem to be still running,
> but nothing is happening. for example (by the time i look at this its 4:58
> so this tasks hasnt done anything in 15 mins):
>
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is
> 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
> driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is
> 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
> driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>
> not sure what to make of this. any suggestions? best, koert
>


>>>
>>
>


Job aborted: Spark cluster looks down

2014-03-05 Thread Christian
I have deployed a Spark cluster in standalone mode with 3 machines:

node1/192.168.1.2 -> master
node2/192.168.1.3 -> worker 20 cores 12g
node3/192.168.1.4 -> worker 20 cores 12g

The web interface shows the workers correctly.

When I launch the scala job (which only requires 256m of memory) these are
the logs:

14/03/05 23:24:06 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0
with 55 tasks
14/03/05 23:24:21 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/03/05 23:24:23 INFO client.AppClient$ClientActor: Connecting to master
spark://node1:7077...
14/03/05 23:24:36 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/03/05 23:24:43 INFO client.AppClient$ClientActor: Connecting to master
spark://node1:7077...
14/03/05 23:24:51 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/03/05 23:25:03 ERROR client.AppClient$ClientActor: All masters are
unresponsive! Giving up.
14/03/05 23:25:03 ERROR cluster.SparkDeploySchedulerBackend: Spark cluster
looks dead, giving up.
14/03/05 23:25:03 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 0.0 from
pool
14/03/05 23:25:03 INFO scheduler.DAGScheduler: Failed to run
saveAsNewAPIHadoopFile at CondelCalc.scala:146
Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Spark cluster looks down
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
...

The generated logs by the master and the 2 workers are attached, but I
found something weird in the master logs:

14/03/05 23:37:43 INFO master.Master: Registering worker node1:57297 with
20 cores, 12.0 GB RAM
14/03/05 23:37:43 INFO master.Master: Registering worker node1:34188 with
20 cores, 12.0 GB RAM

It reports that the two workers are node1:57297 and node1:34188 instead of
node3 and node2 respectively.

$ cat /etc/hosts
...
192.168.1.2 node1
192.168.1.3 node2
192.168.1.4 node3
...

$ nslookup node2
Server: 192.168.1.1
Address:192.168.1.1#53

Name:   node2.cluster.local
Address: 192.168.1.3

$ nslookup node3
Server: 192.168.1.1
Address:192.168.1.1#53

Name:   node3.cluster.local
Address: 192.168.1.4

$ ssh node1 "ps aux | grep spark"
cperez   17023  1.4  0.1 4691944 154532 pts/3  Sl   23:37   0:15
/data/users/cperez/opt/jdk/bin/java -cp
:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop
-Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.master.Master --ip node1 --port 7077 --webui-port
8080

$ ssh node2 "ps aux | grep spark"
cperez   17511  2.7  0.1 4625248 156304 ?  Sl   23:37   0:07
/data/users/cperez/opt/jdk/bin/java -cp
:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop
-Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://node1:7077

$ ssh node2 "netstat -lptun | grep 17511"
tcp0  0 :::8081 :::*
 LISTEN  17511/java
tcp0  0 :::192.168.1.3:34188:::*
 LISTEN  17511/java

$ ssh node3 "ps aux | grep spark"
cperez7543  1.9  0.1 4625248 158600 ?  Sl   23:37   0:09
/data/users/cperez/opt/jdk/bin/java -cp
:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/conf:/data/users/cperez/opt/spark-0.9.0-incubating-bin-hadoop2/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop2.2.0.jar:/data/users/cperez/opt/hadoop-2.2.0/etc/hadoop
-Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://node1:7077

$ ssh node3 "netstat -lptun | grep 7543"
tcp0  0 :::8081 :::*
 LISTEN  7543/java
tcp0  0 :::192.168.1.4:57297:::*
 LISTEN  7543/java

I am completely blocked on this; any help would be greatly appreciated. Many
thanks in advance.
Christian


spark-cperez-org.apache.spark.deploy.master.Master-1-node1.out
Description: Binary data


spark-cperez-org.apache.spark.deploy.worker.Worker-1-node2.out
Description: Binary data


spark-cperez-org.apache.spark.deploy.worker.Worker-1-node3.out
Description: Binary data


Re: trying to understand job cancellation

2014-03-05 Thread Mayur Rustagi
One issue is that job cancellation is posted on eventloop. So its possible
that subsequent jobs submitted to job queue may beat the job cancellation
event & hence the job cancellation event may end up closing them too.
So there's definitely a race condition you are risking even if not running
into.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Mar 5, 2014 at 2:40 PM, Koert Kuipers  wrote:

> SparkContext.cancelJobGroup
>
>
> On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi wrote:
>
>> How do you cancel the job. Which API do you use?
>>
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoidanalytics.com
>>  @mayur_rustagi 
>>
>>
>>
>> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers  wrote:
>>
>>> i also noticed that jobs (with a new JobGroupId) which i run after this
>>> use which use the same RDDs get very confused. i see lots of cancelled
>>> stages and retries that go on forever.
>>>
>>>
>>> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers  wrote:
>>>
 i have a running job that i cancel while keeping the spark context
 alive.

 at the time of cancellation the active stage is 14.

 i see in logs:
 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
 group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 10
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 14
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
 cancelled
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet
 14.0 from pool x
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 13
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 12
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 11
 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage
 15

 so far it all looks good. then i get a lot of messages like this:
 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
 with state FINISHED from TID 883 because its task set is gone
 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
 with state KILLED from TID 888 because its task set is gone

 after this stage 14 hangs around in active stages, without any sign of
 progress or cancellation. it just sits there forever, stuck. looking at the
 logs of the executors confirms this. they task seem to be still running,
 but nothing is happening. for example (by the time i look at this its 4:58
 so this tasks hasnt done anything in 15 mins):

 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is
 1007
 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
 driver
 14/03/04 16:43:16 INFO Executor: Finished task ID 943
 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is
 1007
 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
 driver
 14/03/04 16:43:16 INFO Executor: Finished task ID 945
 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66

 not sure what to make of this. any suggestions? best, koert

>>>
>>>
>>
>


Running spark 0.9 on mesos 0.15

2014-03-05 Thread elyast
Hi,

Quick question: do I need to compile Spark against exactly the same version of
the Mesos library? Currently Spark depends on 0.13.

The problem I am facing is the following: I am running the MLlib example with SVM
and it works nicely when I use coarse-grained mode; however, when running
fine-grained mode on Mesos the task scheduling gets stuck.
(I know about the ActorNotFound exception; I applied the latest git pull request on
branch-0.9.)

The strange thing is that for simple things fine-grained mode works ok; however,
when doing graph examples or MLlib ones it just hangs (standalone job), looks like
it is waiting for something, and the Spark UI shows one task in an active stage but
not scheduled yet.

Happy to do more testing if needed
Thanks for information

Best regards
Lukasz Jastrzebski
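
For context, a minimal sketch of how the two modes mentioned above are selected
via the spark.mesos.coarse property (Mesos master URL and app name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Coarse-grained mode runs one long-lived Mesos task per node; leaving
// spark.mesos.coarse at its default of "false" gives fine-grained mode,
// where each Spark task is launched as a separate Mesos task.
val conf = new SparkConf()
  .setMaster("mesos://zk://zkhost:2181/mesos")   // placeholder Mesos master
  .setAppName("svm-example")
  .set("spark.mesos.coarse", "true")

val sc = new SparkContext(conf)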




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Running-spark-0-9-on-mesos-0-15-tp2326.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
SparkContext.cancelJobGroup
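
For reference, a minimal sketch of the group-based cancellation being discussed in
this thread (master URL and the job itself are placeholders); setJobGroup tags jobs
submitted from the calling thread, and cancelJobGroup cancels everything with that tag:

import java.util.UUID
import org.apache.spark.SparkContext

val sc = new SparkContext("local[4]", "cancellation-example")   // placeholder master

// Use a fresh group id per logical job, as suggested elsewhere in the thread.
val groupId = UUID.randomUUID().toString

// Submit the work from its own thread so another thread can cancel it.
val worker = new Thread(new Runnable {
  def run() {
    sc.setJobGroup(groupId, "long running job")
    sc.parallelize(1 to 10000000).map(_ * 2).count()   // placeholder work
  }
})
worker.start()

// Later, from the main thread, cancel every job tagged with that group id.
Thread.sleep(2000)
sc.cancelJobGroup(groupId)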


On Wed, Mar 5, 2014 at 5:32 PM, Mayur Rustagi wrote:

> How do you cancel the job. Which API do you use?
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi 
>
>
>
> On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers  wrote:
>
>> i also noticed that jobs (with a new JobGroupId) which i run after this
>> use which use the same RDDs get very confused. i see lots of cancelled
>> stages and retries that go on forever.
>>
>>
>> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers  wrote:
>>
>>> i have a running job that i cancel while keeping the spark context alive.
>>>
>>> at the time of cancellation the active stage is 14.
>>>
>>> i see in logs:
>>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
>>> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
>>> cancelled
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet
>>> 14.0 from pool x
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
>>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
>>>
>>> so far it all looks good. then i get a lot of messages like this:
>>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
>>> with state FINISHED from TID 883 because its task set is gone
>>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
>>> with state KILLED from TID 888 because its task set is gone
>>>
>>> after this stage 14 hangs around in active stages, without any sign of
>>> progress or cancellation. it just sits there forever, stuck. looking at the
>>> logs of the executors confirms this. they task seem to be still running,
>>> but nothing is happening. for example (by the time i look at this its 4:58
>>> so this tasks hasnt done anything in 15 mins):
>>>
>>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is
>>> 1007
>>> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to
>>> driver
>>> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
>>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is
>>> 1007
>>> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to
>>> driver
>>> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
>>> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>>>
>>> not sure what to make of this. any suggestions? best, koert
>>>
>>
>>
>


Re: trying to understand job cancellation

2014-03-05 Thread Mayur Rustagi
How do you cancel the job? Which API do you use?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Mar 5, 2014 at 2:29 PM, Koert Kuipers  wrote:

> i also noticed that jobs (with a new JobGroupId) which i run after this
> use which use the same RDDs get very confused. i see lots of cancelled
> stages and retries that go on forever.
>
>
> On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers  wrote:
>
>> i have a running job that i cancel while keeping the spark context alive.
>>
>> at the time of cancellation the active stage is 14.
>>
>> i see in logs:
>> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job
>> group 3a25db23-2e39-4497-b7ab-b26b2a976f9c
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
>> cancelled
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0
>> from pool x
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
>> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
>>
>> so far it all looks good. then i get a lot of messages like this:
>> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state FINISHED from TID 883 because its task set is gone
>> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update
>> with state KILLED from TID 888 because its task set is gone
>>
>> after this stage 14 hangs around in active stages, without any sign of
>> progress or cancellation. it just sits there forever, stuck. looking at the
>> logs of the executors confirms this. they task seem to be still running,
>> but nothing is happening. for example (by the time i look at this its 4:58
>> so this tasks hasnt done anything in 15 mins):
>>
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
>> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is 1007
>> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to driver
>> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
>> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>>
>> not sure what to make of this. any suggestions? best, koert
>>
>
>


Re: trying to understand job cancellation

2014-03-05 Thread Koert Kuipers
i also noticed that jobs (with a new JobGroupId) which i run after this and
which use the same RDDs get very confused. i see lots of cancelled stages
and retries that go on forever.


On Tue, Mar 4, 2014 at 5:02 PM, Koert Kuipers  wrote:

> i have a running job that i cancel while keeping the spark context alive.
>
> at the time of cancellation the active stage is 14.
>
> i see in logs:
> 2014/03/04 16:43:19 INFO scheduler.DAGScheduler: Asked to cancel job group
> 3a25db23-2e39-4497-b7ab-b26b2a976f9c
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 10
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 14
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Stage 14 was
> cancelled
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 14.0
> from pool x
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 13
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 12
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 11
> 2014/03/04 16:43:19 INFO scheduler.TaskSchedulerImpl: Cancelling stage 15
>
> so far it all looks good. then i get a lot of messages like this:
> 2014/03/04 16:43:20 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state FINISHED from TID 883 because its task set is gone
> 2014/03/04 16:43:24 INFO scheduler.TaskSchedulerImpl: Ignoring update with
> state KILLED from TID 888 because its task set is gone
>
> after this stage 14 hangs around in active stages, without any sign of
> progress or cancellation. it just sits there forever, stuck. looking at the
> logs of the executors confirms this. they task seem to be still running,
> but nothing is happening. for example (by the time i look at this its 4:58
> so this tasks hasnt done anything in 15 mins):
>
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 943 is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 943 directly to driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 943
> 14/03/04 16:43:16 INFO Executor: Serialized size of result for 945 is 1007
> 14/03/04 16:43:16 INFO Executor: Sending result for 945 directly to driver
> 14/03/04 16:43:16 INFO Executor: Finished task ID 945
> 14/03/04 16:43:19 INFO BlockManager: Removing RDD 66
>
> not sure what to make of this. any suggestions? best, koert
>


Re: PIG to SPARK

2014-03-05 Thread Mayur Rustagi
The real question is: why do you want to run a Pig script using Spark?
Are you planning to use Spark as the underlying processing engine for Pig?
That's not simple.
If you are planning to feed Pig data to Spark for further processing, then you
can write it to HDFS & trigger your Spark script.

rdd.pipe is basically similar to Hadoop streaming, allowing you to run a
script on each partition of the RDD & get output as another RDD.
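
A minimal sketch of rdd.pipe as described above (the script path is a placeholder;
the script must exist on every worker, read lines on stdin, and write lines to stdout):

import org.apache.spark.SparkContext

val sc = new SparkContext("local[4]", "pipe-example")   // placeholder master

val lines = sc.parallelize(Seq("a\t1", "b\t2", "c\t3"))

// Each partition is fed to the external command on stdin, one element per
// line; every line the command writes to stdout becomes an element of the
// resulting RDD.
val piped = lines.pipe("/path/to/script.sh")   // placeholder script path

piped.collect().foreach(println)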
Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi 



On Wed, Mar 5, 2014 at 10:29 AM, suman bharadwaj wrote:

> Hi,
>
> How can i call pig script using SPARK. Can I use rdd.pipe() here ?
>
> And can anyone share sample implementation of rdd.pipe () and if you can
> explain how rdd.pipe() works, it would really really help.
>
> Regards,
> SB
>


Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Sergey Parhomenko
Hi Patrick,

Thanks for the patch. I tried building a patched version
of spark-core_2.10-0.9.0-incubating.jar but the Maven build fails:
[ERROR]
/home/das/Work/thx/incubator-spark/core/src/main/scala/org/apache/spark/Logging.scala:22:
object impl is not a member of package org.slf4j
[ERROR] import org.slf4j.impl.StaticLoggerBinder
[ERROR]  ^
[ERROR]
/home/das/Work/thx/incubator-spark/core/src/main/scala/org/apache/spark/Logging.scala:106:
not found: value StaticLoggerBinder
[ERROR] val binder = StaticLoggerBinder.getSingleton
[ERROR]  ^
[ERROR] two errors found

The module only has a compile dependency on slf4j-api, and not
on slf4j-log4j12 or any other slf4j logging modules which
provide org.slf4j.impl.StaticLoggerBinder. Adding slf4j-log4j12 with
compile scope helps, and I can confirm the logging is now redirected to
slf4j/Logback correctly with the patched module. I'm not sure however
if using compile scope for slf4j-log4j12 is a good idea.

--
Best regards,
Sergey Parhomenko


On 5 March 2014 20:11, Patrick Wendell  wrote:

> Hey All,
>
> We have a fix for this but it didn't get merged yet. I'll put it as a
> blocker for Spark 0.9.1.
>
>
> https://github.com/pwendell/incubator-spark/commit/66594e88e5be50fca073a7ef38fa62db4082b3c8
>
> https://spark-project.atlassian.net/browse/SPARK-1190
>
> Sergey if you could try compiling Spark with this patch and seeing if
> it works that would be great.
>
> Thanks,
> Patrick
>
>
> On Wed, Mar 5, 2014 at 10:26 AM, Paul Brown  wrote:
> >
> > Hi, Sergey --
> >
> > Here's my recipe, implemented via Maven; YMMV if you need to do it via
> sbt,
> > etc., but it should be equivalent:
> >
> > 1) Replace org.apache.spark.Logging trait with this:
> > https://gist.github.com/prb/bc239b1616f5ac40b4e5 (supplied by Patrick
> during
> > the discussion on the dev list)
> > 2) Amend your POM using the fragment that's in the same gist.
> >
> > We build two shaded JARs from the same build, one for the driver and one
> for
> > the worker; to ensure that our Logging trait is the one in use in the
> driver
> > (where it matters), we exclude that same class from the Spark JAR in the
> > shade plugin configuration.
> >
> > Best.
> > -- Paul
> >
> >
> > --
> > p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
> >
> >
> > On Wed, Mar 5, 2014 at 10:02 AM, Sergey Parhomenko <
> sparhome...@gmail.com>
> > wrote:
> >>
> >> Hi Sean,
> >>
> >> We're not using log4j actually, we're trying to redirect all logging to
> >> slf4j which then uses logback as the logging implementation.
> >>
> >> The fix you mentioned - am I right to assume it is not part of the
> latest
> >> released Spark version (0.9.0)? If so, are there any workarounds or
> advices
> >> on how to avoid this issue in 0.9.0?
> >>
> >> --
> >> Best regards,
> >> Sergey Parhomenko
> >>
> >>
> >> On 5 March 2014 14:40, Sean Owen  wrote:
> >>>
> >>> Yes I think that issue is fixed (Patrick you had the last eyes on it
> >>> IIRC?)
> >>>
> >>> If you are using log4j, in general, do not redirect log4j to slf4j.
> >>> Stuff using log4j is already using log4j, done.
> >>> --
> >>> Sean Owen | Director, Data Science | London
> >>>
> >>>
> >>> On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko <
> sparhome...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > I'm trying to redirect Spark logs to slf4j. Spark seem to be using
> >>> > Log4J, so
> >>> > I did the typical steps of forcing a Log4J-based framework to use
> slf4j
> >>> > -
> >>> > manually excluded slf4j-log4j12 and log4j, and included
> >>> > log4j-over-slf4j.
> >>> > When doing that however Spark starts failing on initialization with:
> >>> > java.lang.StackOverflowError
> >>> > at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
> >>> > at
> java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
> >>> > at
> >>> > java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
> >>> > at java.lang.ThreadLocal.get(ThreadLocal.java:146)
> >>> > at java.lang.StringCoding.deref(StringCoding.java:63)
> >>> > at java.lang.StringCoding.encode(StringCoding.java:330)
> >>> > at java.lang.String.getBytes(String.java:916)
> >>> > at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> >>> > at
> java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
> >>> > at java.io.File.exists(File.java:813)
> >>> > at
> sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
> >>> > at
> >>> > sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
> >>> > at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
> >>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
> >>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
> >>> > at java.security.AccessController.doPrivileged(Native Method)
> >>> > at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
> >>> > at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
> >>> > at
> org.apach

Re: pyspark and Python virtual enviroments

2014-03-05 Thread Christian
Thanks Bryn.


On Wed, Mar 5, 2014 at 9:00 PM, Bryn Keller  wrote:

> Hi Christian,
>
> The PYSPARK_PYTHON environment variable specifies the python executable to
> use for pyspark. You can put the path to a virtualenv's python executable
> and it will work fine. Remember you have to have the same installation at
> the same path on each of your cluster nodes for pyspark to work. If you're
> creating the spark context yourself in a python application, you can use
> os.environ['PYSPARK_PYTHON'] = sys.executable before creating your spark
> context.
>
> Hope that helps,
> Bryn
>
>
> On Wed, Mar 5, 2014 at 4:54 AM, Christian  wrote:
>
>> Hello,
>>
>> I usually create different python virtual environments for different
>> projects to avoid version conflicts and skip the requirement to be root to
>> install libs.
>>
>> How can I specify to pyspark to activate a virtual environment before
>> executing the tasks ?
>>
>> Further info on virtual envs:
>> http://virtualenv.readthedocs.org/en/latest/virtualenv.html
>>
>> Thanks in advance,
>> Christian
>>
>
>


Problem with HBase external table on freshly created EMR cluster

2014-03-05 Thread phil3k
Hi!

I created an EMR cluster with Spark and HBase according to
http://aws.amazon.com/articles/4926593393724923 with --hbase flag to include
HBase. Although spark and shark both work nicely with the provided S3
examples, there is a problem with external tables pointing to the HBase
instance.

We create the following external table with shark:

CREATE EXTERNAL TABLE oh (id STRING, name STRING, title STRING) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.zookeeper.quorum" =
"172.31.13.161","hbase.zookeeper.property.clientPort"="2181",
"hbase.columns.mapping" = ":key,o:OH_Name,o:OH_Title")
TBLPROPERTIES("hbase.table.name" = "objects")

The objects table exists and has all columns as defined in the DDL.
The Zookeeper for HBase is running on the specified hostname and port.

CREATE TABLE oh_cached AS SELECT * FROM OH fails with the following error:

org.apache.spark.SparkException: Job aborted: Task 11.0:0 failed more than 4
times
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at
org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

The logfiles of the spark workers are almost empty, however, the stages
information in the spark web console reveals additional hints:

 0 4 FAILED NODE_LOCAL ip-172-31-10-246.ec2.internal 2014/03/05 13:38:20
java.lang.IllegalStateException (java.lang.IllegalStateException: unread
block data)
  
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2420)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1954)
java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1848)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1794)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:199)
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Problem-with-HBase-external-table-on-freshly-created-EMR-cluster-tp2319.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: disconnected from cluster; reconnecting gives java.net.BindException

2014-03-05 Thread Nicholas Chammas
Whoopdeedoo, after just waiting for like an hour (well, I was doing other
stuff) the process holding that address seems to have died automatically
and now I can start up pyspark without any warnings.

Would there be a faster way to go through this than just wait around for
the orphaned process to die?
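
One thing I am wondering about as a faster workaround (a sketch only, and it
assumes the Spark version in use honors spark.ui.port set through SparkConf)
would be to start the new driver on a different UI port so it does not fight
the orphaned process still bound to 4040:

from pyspark import SparkConf, SparkContext

# sketch: give the new shell/driver a different web UI port; the master URL
# and app name below are placeholders
conf = (SparkConf()
        .setMaster("local")
        .setAppName("reconnected-shell")
        .set("spark.ui.port", "4041"))

sc = SparkContext(conf=conf)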

Nick


On Wed, Mar 5, 2014 at 1:01 PM, Nicholas Chammas  wrote:

> So I was doing stuff in pyspark on a cluster in EC2. I got booted due to a
> network issue. I reconnect to the cluster and start up pyspark again. I get
> these warnings:
>
> 14/03/05 17:54:56 WARN component.AbstractLifeCycle: FAILED
> SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address
> already in use
>
> Is this Bad(tm)? Do I need to do anything? sc appears to be available as
> usual.
>
> Nick
>
>
> --
> View this message in context: disconnected from cluster; reconnecting gives java.net.BindException
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>


Re: pyspark and Python virtual enviroments

2014-03-05 Thread Bryn Keller
Hi Christian,

The PYSPARK_PYTHON environment variable specifies the python executable to
use for pyspark. You can put the path to a virtualenv's python executable
and it will work fine. Remember you have to have the same installation at
the same path on each of your cluster nodes for pyspark to work. If you're
creating the spark context yourself in a python application, you can use
os.environ['PYSPARK_PYTHON'] = sys.executable before creating your spark
context.
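
As a rough sketch of what that looks like in a driver script (the virtualenv
path and master URL below are placeholders, and the same virtualenv must
exist at that path on every node):

import os
import sys

# point pyspark at the virtualenv's interpreter before the context is created;
# /opt/envs/myproject is a made-up path used only for illustration
os.environ['PYSPARK_PYTHON'] = '/opt/envs/myproject/bin/python'
# or, if the driver itself already runs inside the virtualenv:
# os.environ['PYSPARK_PYTHON'] = sys.executable

from pyspark import SparkContext

sc = SparkContext("spark://master:7077", "VirtualenvSketch")
print(sc.parallelize(range(100)).sum())
sc.stop()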

Hope that helps,
Bryn


On Wed, Mar 5, 2014 at 4:54 AM, Christian  wrote:

> Hello,
>
> I usually create different python virtual environments for different
> projects to avoid version conflicts and skip the requirement to be root to
> install libs.
>
> How can I specify to pyspark to activate a virtual environment before
> executing the tasks ?
>
> Further info on virtual envs:
> http://virtualenv.readthedocs.org/en/latest/virtualenv.html
>
> Thanks in advance,
> Christian
>


Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Patrick Wendell
Hey All,

We have a fix for this but it didn't get merged yet. I'll put it as a
blocker for Spark 0.9.1.

https://github.com/pwendell/incubator-spark/commit/66594e88e5be50fca073a7ef38fa62db4082b3c8

https://spark-project.atlassian.net/browse/SPARK-1190

Sergey if you could try compiling Spark with this patch and seeing if
it works that would be great.

Thanks,
Patrick


On Wed, Mar 5, 2014 at 10:26 AM, Paul Brown  wrote:
>
> Hi, Sergey --
>
> Here's my recipe, implemented via Maven; YMMV if you need to do it via sbt,
> etc., but it should be equivalent:
>
> 1) Replace org.apache.spark.Logging trait with this:
> https://gist.github.com/prb/bc239b1616f5ac40b4e5 (supplied by Patrick during
> the discussion on the dev list)
> 2) Amend your POM using the fragment that's in the same gist.
>
> We build two shaded JARs from the same build, one for the driver and one for
> the worker; to ensure that our Logging trait is the one in use in the driver
> (where it matters), we exclude that same class from the Spark JAR in the
> shade plugin configuration.
>
> Best.
> -- Paul
>
>
> --
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Wed, Mar 5, 2014 at 10:02 AM, Sergey Parhomenko 
> wrote:
>>
>> Hi Sean,
>>
>> We're not using log4j actually, we're trying to redirect all logging to
>> slf4j which then uses logback as the logging implementation.
>>
>> The fix you mentioned - am I right to assume it is not part of the latest
>> released Spark version (0.9.0)? If so, are there any workarounds or advices
>> on how to avoid this issue in 0.9.0?
>>
>> --
>> Best regards,
>> Sergey Parhomenko
>>
>>
>> On 5 March 2014 14:40, Sean Owen  wrote:
>>>
>>> Yes I think that issue is fixed (Patrick you had the last eyes on it
>>> IIRC?)
>>>
>>> If you are using log4j, in general, do not redirect log4j to slf4j.
>>> Stuff using log4j is already using log4j, done.
>>> --
>>> Sean Owen | Director, Data Science | London
>>>
>>>
>>> On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko 
>>> wrote:
>>> > Hi,
>>> >
>>> > I'm trying to redirect Spark logs to slf4j. Spark seem to be using
>>> > Log4J, so
>>> > I did the typical steps of forcing a Log4J-based framework to use slf4j
>>> > -
>>> > manually excluded slf4j-log4j12 and log4j, and included
>>> > log4j-over-slf4j.
>>> > When doing that however Spark starts failing on initialization with:
>>> > java.lang.StackOverflowError
>>> > at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
>>> > at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
>>> > at
>>> > java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
>>> > at java.lang.ThreadLocal.get(ThreadLocal.java:146)
>>> > at java.lang.StringCoding.deref(StringCoding.java:63)
>>> > at java.lang.StringCoding.encode(StringCoding.java:330)
>>> > at java.lang.String.getBytes(String.java:916)
>>> > at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>>> > at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>>> > at java.io.File.exists(File.java:813)
>>> > at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
>>> > at
>>> > sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
>>> > at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
>>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
>>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
>>> > at java.security.AccessController.doPrivileged(Native Method)
>>> > at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
>>> > at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
>>> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:109)
>>> > at
>>> > org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>>> > at org.apache.spark.Logging$class.log(Logging.scala:36)
>>> > at org.apache.spark.util.Utils$.log(Utils.scala:47)
>>> > 
>>> >
>>> > There's some related work done in SPARK-1071, but it was resolved after
>>> > 0.9.0 was released. In the last comment Sean refers to a
>>> > StackOverflowError
>>> > which was discussed in the mailing list, I assume it might be a problem
>>> > similar to mine but I was not able to find that discussion.
>>> > Is anyone aware of a way to redirect Spark 0.9.0 logs to slf4j?
>>> >
>>> > --
>>> > Best regards,
>>> > Sergey Parhomenko
>>
>>
>


Re: Spark Worker crashing and Master not seeing recovered worker

2014-03-05 Thread Ognen Duzlevski

Rob,

I have seen this too. I have 16 nodes in my spark cluster and for some 
reason (after app failures) one of the workers will go offline. I will 
ssh to the machine in question and find that the java process is running 
but for some reason the master is not noticing this. I have not had the 
time to investigate (my setup is manual, 0.9 in standalone mode).


Ognen

On 3/5/14, 12:27 PM, Rob Povey wrote:

I installed Spark 0.9.0 from the CDH parcel yesterday in standalone mode on
top of a 6 node cluster running CDH4.6 on Centos.

What I'm seeing is that when jobs fail, often the worker process will crash,
it seems that the worker restarts on the node but the Master then never
utilizes the restarted worker, and it doesn't show up in the web interface.

Has anyone seen anything like this, is there an obvious workaround/fix other
than manually restarting the workers?

In the Master log I see the following repeated many times, filer being the
"lost" node. What it looks like to me is that when the worker actor is
restarted by AKKA, it gets a new ID and for whatever reason does not
register with the master.

Any ideas?

14/03/04 20:04:44 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078
14/03/04 20:04:54 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078
14/03/04 20:04:59 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078


On Filer itself I can see it's shutdown with the following exception, and I
can see that it's been restarted and is running.

14/03/04 18:37:09 INFO worker.Worker: Executor app-20140304183705-0036/0
finished with state KILLED
14/03/04 18:37:09 INFO worker.CommandUtils: Redirection to
/var/run/spark/work/app-20140304183705-0036/0/stderr closed: Bad file
descriptor
14/03/04 18:37:09 ERROR actor.OneForOneStrategy: key not found:
app-20140304183705-0036/0
java.util.NoSuchElementException: key not found: app-20140304183705-0036/0
 at scala.collection.MapLike$class.default(MapLike.scala:228)
 at scala.collection.AbstractMap.default(Map.scala:58)
 at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
 at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
 at akka.actor.ActorCell.invoke(ActorCell.scala:456)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
 at akka.dispatch.Mailbox.run(Mailbox.scala:219)
 at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
 at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/03/04 18:37:09 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkwor...@filer.maana.io:7078] ->
[akka.tcp://sparkexecu...@filer.maana.io:58331]: Error [Association failed
with [akka.tcp://sparkexecu...@filer.maana.io:58331]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkexecu...@filer.maana.io:58331]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: filer.maana.io/192.168.1.33:58331
]
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{*,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/json,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/logPage,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/log,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/metrics/json,null}
14/03/04 18:37:09 INFO worker.Worker: Starting Spark worker
filer.maana.io:7078 with 4 cores, 30.3 GB RAM
14/03/04 18:37:09 INFO worker.Worker: Spark home:
/opt/cloudera/parcels/SPARK/lib/spark
14/03/04 18:37:09 INFO server.Server: jetty-7.6.8.v20121106
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/metrics/json,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/log,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/log,null

PIG to SPARK

2014-03-05 Thread suman bharadwaj
Hi,

How can i call pig script using SPARK. Can I use rdd.pipe() here ?

And can anyone share sample implementation of rdd.pipe () and if you can
explain how rdd.pipe() works, it would really really help.

Regards,
SB


Spark Worker crashing and Master not seeing recovered worker

2014-03-05 Thread Rob Povey
I installed Spark 0.9.0 from the CDH parcel yesterday in standalone mode on
top of a 6 node cluster running CDH4.6 on Centos.

What I'm seeing is that when jobs fail, often the worker process will crash.
It seems that the worker restarts on the node, but the Master then never
utilizes the restarted worker, and it doesn't show up in the web interface.

Has anyone seen anything like this, is there an obvious workaround/fix other
than manually restarting the workers?

In the Master log I see the following repeated many times, filer being the
"lost" node. What it looks like to me is that when the worker actor is
restarted by AKKA, it gets a new ID and for whatever reason does not
register with the master.

Any ideas? 

14/03/04 20:04:44 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078
14/03/04 20:04:54 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078
14/03/04 20:04:59 WARN master.Master: Got heartbeat from unregistered worker
worker-20140304183709-filer.maana.io-7078


On Filer itself I can see it's shutdown with the following exception, and I
can see that it's been restarted and is running.

14/03/04 18:37:09 INFO worker.Worker: Executor app-20140304183705-0036/0
finished with state KILLED
14/03/04 18:37:09 INFO worker.CommandUtils: Redirection to
/var/run/spark/work/app-20140304183705-0036/0/stderr closed: Bad file
descriptor
14/03/04 18:37:09 ERROR actor.OneForOneStrategy: key not found:
app-20140304183705-0036/0
java.util.NoSuchElementException: key not found: app-20140304183705-0036/0
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64)
at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:232)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/03/04 18:37:09 ERROR remote.EndpointWriter: AssociationError
[akka.tcp://sparkwor...@filer.maana.io:7078] ->
[akka.tcp://sparkexecu...@filer.maana.io:58331]: Error [Association failed
with [akka.tcp://sparkexecu...@filer.maana.io:58331]] [
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkexecu...@filer.maana.io:58331]
Caused by:
akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2:
Connection refused: filer.maana.io/192.168.1.33:58331
]
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{*,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/json,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/logPage,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/log,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: stopped
o.e.j.s.h.ContextHandler{/metrics/json,null}
14/03/04 18:37:09 INFO worker.Worker: Starting Spark worker
filer.maana.io:7078 with 4 cores, 30.3 GB RAM
14/03/04 18:37:09 INFO worker.Worker: Spark home:
/opt/cloudera/parcels/SPARK/lib/spark
14/03/04 18:37:09 INFO server.Server: jetty-7.6.8.v20121106
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/metrics/json,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/log,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/log,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/logPage,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/json,null}
14/03/04 18:37:09 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{*,null}
14/03/04 18:37:09 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:18081
14/03/04 18:37:09 INFO ui.WorkerWebUI: Started Worker 

Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Paul Brown
Hi, Sergey --

Here's my recipe, implemented via Maven; YMMV if you need to do it via sbt,
etc., but it should be equivalent:

1) Replace org.apache.spark.Logging trait with this:
https://gist.github.com/prb/bc239b1616f5ac40b4e5 (supplied by Patrick
during the discussion on the dev list)
2) Amend your POM using the fragment that's in the same gist.

We build two shaded JARs from the same build, one for the driver and one
for the worker; to ensure that our Logging trait is the one in use in the
driver (where it matters), we exclude that same class from the Spark JAR in
the shade plugin configuration.

Best.
-- Paul


--
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Wed, Mar 5, 2014 at 10:02 AM, Sergey Parhomenko wrote:

> Hi Sean,
>
> We're not using log4j actually, we're trying to redirect all logging to
> slf4j which then uses logback as the logging implementation.
>
> The fix you mentioned - am I right to assume it is not part of the latest
> released Spark version (0.9.0)? If so, are there any workarounds or advices
> on how to avoid this issue in 0.9.0?
>
> --
> Best regards,
> Sergey Parhomenko
>
>
> On 5 March 2014 14:40, Sean Owen  wrote:
>
>> Yes I think that issue is fixed (Patrick you had the last eyes on it
>> IIRC?)
>>
>> If you are using log4j, in general, do not redirect log4j to slf4j.
>> Stuff using log4j is already using log4j, done.
>> --
>> Sean Owen | Director, Data Science | London
>>
>>
>> On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko 
>> wrote:
>> > Hi,
>> >
>> > I'm trying to redirect Spark logs to slf4j. Spark seem to be using
>> Log4J, so
>> > I did the typical steps of forcing a Log4J-based framework to use slf4j
>> -
>> > manually excluded slf4j-log4j12 and log4j, and included
>> log4j-over-slf4j.
>> > When doing that however Spark starts failing on initialization with:
>> > java.lang.StackOverflowError
>> > at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
>> > at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
>> > at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
>> > at java.lang.ThreadLocal.get(ThreadLocal.java:146)
>> > at java.lang.StringCoding.deref(StringCoding.java:63)
>> > at java.lang.StringCoding.encode(StringCoding.java:330)
>> > at java.lang.String.getBytes(String.java:916)
>> > at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
>> > at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
>> > at java.io.File.exists(File.java:813)
>> > at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
>> > at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
>> > at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
>> > at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
>> > at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
>> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:109)
>> > at
>> org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
>> > at org.apache.spark.Logging$class.log(Logging.scala:36)
>> > at org.apache.spark.util.Utils$.log(Utils.scala:47)
>> > 
>> >
>> > There's some related work done in SPARK-1071, but it was resolved after
>> > 0.9.0 was released. In the last comment Sean refers to a
>> StackOverflowError
>> > which was discussed in the mailing list, I assume it might be a problem
>> > similar to mine but I was not able to find that discussion.
>> > Is anyone aware of a way to redirect Spark 0.9.0 logs to slf4j?
>> >
>> > --
>> > Best regards,
>> > Sergey Parhomenko
>>
>
>


Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Sergey Parhomenko
Hi Sean,

We're not using log4j actually, we're trying to redirect all logging to
slf4j which then uses logback as the logging implementation.

The fix you mentioned - am I right to assume it is not part of the latest
released Spark version (0.9.0)? If so, are there any workarounds or advice
on how to avoid this issue in 0.9.0?

--
Best regards,
Sergey Parhomenko


On 5 March 2014 14:40, Sean Owen  wrote:

> Yes I think that issue is fixed (Patrick you had the last eyes on it IIRC?)
>
> If you are using log4j, in general, do not redirect log4j to slf4j.
> Stuff using log4j is already using log4j, done.
> --
> Sean Owen | Director, Data Science | London
>
>
> On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko 
> wrote:
> > Hi,
> >
> > I'm trying to redirect Spark logs to slf4j. Spark seem to be using
> Log4J, so
> > I did the typical steps of forcing a Log4J-based framework to use slf4j -
> > manually excluded slf4j-log4j12 and log4j, and included log4j-over-slf4j.
> > When doing that however Spark starts failing on initialization with:
> > java.lang.StackOverflowError
> > at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
> > at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
> > at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
> > at java.lang.ThreadLocal.get(ThreadLocal.java:146)
> > at java.lang.StringCoding.deref(StringCoding.java:63)
> > at java.lang.StringCoding.encode(StringCoding.java:330)
> > at java.lang.String.getBytes(String.java:916)
> > at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> > at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
> > at java.io.File.exists(File.java:813)
> > at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
> > at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
> > at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
> > at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
> > at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
> > at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
> > at org.apache.spark.Logging$class.initializeLogging(Logging.scala:109)
> > at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
> > at org.apache.spark.Logging$class.log(Logging.scala:36)
> > at org.apache.spark.util.Utils$.log(Utils.scala:47)
> > 
> >
> > There's some related work done in SPARK-1071, but it was resolved after
> > 0.9.0 was released. In the last comment Sean refers to a
> StackOverflowError
> > which was discussed in the mailing list, I assume it might be a problem
> > similar to mine but I was not able to find that discussion.
> > Is anyone aware of a way to redirect Spark 0.9.0 logs to slf4j?
> >
> > --
> > Best regards,
> > Sergey Parhomenko
>


disconnected from cluster; reconnecting gives java.net.BindException

2014-03-05 Thread Nicholas Chammas
So I was doing stuff in pyspark on a cluster in EC2. I got booted due to a
network issue. I reconnect to the cluster and start up pyspark again. I get
these warnings:

14/03/05 17:54:56 WARN component.AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address
already in use

Is this Bad(tm)? Do I need to do anything? sc appears to be available as usual.

Nick




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/disconnected-from-cluster-reconnecting-gives-java-net-BindException-tp2309.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python 2.7 + numpy break sortByKey()

2014-03-05 Thread Nicholas Chammas
Devs? Is this an issue for you that deserves a ticket?


On Sun, Mar 2, 2014 at 4:32 PM, Nicholas Chammas  wrote:

> So this issue appears to be related to the other Python 2.7-related issue
> I reported in this thread.
>
> Shall I open a bug in JIRA about this and include the wikistat repro?
>
> Nick
>
>
> On Sun, Mar 2, 2014 at 1:50 AM, nicholas.chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Unexpected behavior. Here's the repro:
>>
>>1. Launch an EC2 cluster with spark-ec2. 1 slave; default instance
>>type.
>>    2. Upgrade the cluster to Python 2.7 using the instructions here.
>>3. pip2.7 install numpy
>>4. Run this script in the pyspark shell:
>>
>>wikistat = sc.textFile('s3n://ACCESSKEY:SECRET@bigdatademo
>>/sample/wiki/pagecounts-20100212-05.gz')
>>wikistat = wikistat.map(lambda x: x.split(' ')).cache()
>>wikistat.map(lambda x: (x[1], int(x[3]))).map(lambda x:
>>(x[1],x[0])).sortByKey(False).take(5)
>>
>>5. You will see a long error output that includes a complaint about
>>NumPy not being installed.
>>6. Now remove the sortByKey() from that last line and rerun it.
>>
>>wikistat.map(lambda x: (x[1], int(x[3]))).map(lambda x:
>>(x[1],x[0])).take(5)
>>
>>You should see your results without issue. So it's the sortByKey()
>>that's choking.
>>7. Quit the pyspark shell and pip uninstall numpy.
>>8. Rerun the three lines from step 4. Enjoy your sorted results
>>error-free.
>>
>> Can anyone else reproduce this issue? Is it a bug? I don't see it if I
>> leave the cluster on the default Python 2.6.8.
>>
>> Installing numpy on the slave via pssh and pip2.7 (so that it's identical
>> to the master) does not fix the issue. Dunno if installing Python packages
>> everywhere is even necessary though.
>>
>> Nick
>>
>>
>> --
>> View this message in context: Python 2.7 + numpy break sortByKey()
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>


Re: Explain About Logs NetworkWordcount.scala

2014-03-05 Thread eduardocalfaia
Hi TD,
I have seen in the web UI that the result for the stage has been zero, and
that there is nothing in the GC Time field.

  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Explain-About-Logs-NetworkWordcount-scala-tp1835p2306.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Problem with HBase external table on freshly created EMR cluster

2014-03-05 Thread Philip Limbeck
Hi!

I created an EMR cluster with Spark and HBase according to
http://aws.amazon.com/articles/4926593393724923 with --hbase flag to
include HBase. Although spark and shark both work nicely with the provided
S3 examples, there is a problem with external tables pointing to the HBase
instance.

We create the following external table with shark:

CREATE EXTERNAL TABLE oh (id STRING, name STRING, title STRING) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
("hbase.zookeeper.quorum" =
"172.31.13.161","hbase.zookeeper.property.clientPort"="2181",
"hbase.columns.mapping" = ":key,o:OH_Name,o:OH_Title") TBLPROPERTIES("
hbase.table.name" = "objects")

The objects table exists and has all columns as defined in the DDL.
The Zookeeper for HBase is running on the specified hostname and port.

CREATE TABLE oh_cached AS SELECT * FROM OH fails with the following error:

org.apache.spark.SparkException: Job aborted: Task 11.0:0 failed more than
4 times
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at
org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)

The logfiles of the spark workers are almost empty, however, the stages
information in the spark web console reveals additional hints:

 0 4 FAILED NODE_LOCAL ip-172-31-10-246.ec2.internal 2014/03/05 13:38:20
java.lang.IllegalStateException (java.lang.IllegalStateException: unread
block data)

java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2420)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
java.io.ObjectInputStream.skipCustomData(ObjectInputStream.java:1954)
java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1848)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1794)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:199)
org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:50)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:724)


Re: pyspark: Importing other py-files in PYTHONPATH

2014-03-05 Thread Anders Bennehag
I just discovered that putting myLib in /usr/local/python2-7/dist-packages/
on the worker-nodes will let me import the module in a pyspark-script...

That is a solution but it would be nice if modules in PYTHONPATH were
included as well.


On Wed, Mar 5, 2014 at 1:34 PM, Anders Bennehag  wrote:

> Hi there,
>
> I am running spark 0.9.0 standalone on a cluster. The documentation
> http://spark.incubator.apache.org/docs/latest/python-programming-guide.html
> states that code-dependencies can be deployed through the pyFiles argument
> to the SparkContext.
>
> But in my case, the relevant code, lets call it myLib is already available
> in PYTHONPATH on the worker-nodes. However, when trying to access this code
> through a regular 'import myLib' in the script sent to pyspark, the
> spark-workers seem to hang in the middle of the script without any specific
> errors.
>
> If I start a regular python-shell on the workers, there is no problem
> importing myLib and accessing it.
>
> Why is this?
>
> /Anders
>
>


Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Sean Owen
Yes I think that issue is fixed (Patrick you had the last eyes on it IIRC?)

If you are using log4j, in general, do not redirect log4j to slf4j.
Stuff using log4j is already using log4j, done.
--
Sean Owen | Director, Data Science | London


On Wed, Mar 5, 2014 at 1:12 PM, Sergey Parhomenko  wrote:
> Hi,
>
> I'm trying to redirect Spark logs to slf4j. Spark seem to be using Log4J, so
> I did the typical steps of forcing a Log4J-based framework to use slf4j -
> manually excluded slf4j-log4j12 and log4j, and included log4j-over-slf4j.
> When doing that however Spark starts failing on initialization with:
> java.lang.StackOverflowError
> at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
> at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
> at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
> at java.lang.ThreadLocal.get(ThreadLocal.java:146)
> at java.lang.StringCoding.deref(StringCoding.java:63)
> at java.lang.StringCoding.encode(StringCoding.java:330)
> at java.lang.String.getBytes(String.java:916)
> at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
> at java.io.File.exists(File.java:813)
> at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
> at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
> at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
> at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
> at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
> at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
> at org.apache.spark.Logging$class.initializeLogging(Logging.scala:109)
> at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
> at org.apache.spark.Logging$class.log(Logging.scala:36)
> at org.apache.spark.util.Utils$.log(Utils.scala:47)
> 
>
> There's some related work done in SPARK-1071, but it was resolved after
> 0.9.0 was released. In the last comment Sean refers to a StackOverflowError
> which was discussed in the mailing list, I assume it might be a problem
> similar to mine but I was not able to find that discussion.
> Is anyone aware of a way to redirect Spark 0.9.0 logs to slf4j?
>
> --
> Best regards,
> Sergey Parhomenko


Unable to redirect Spark logs to slf4j

2014-03-05 Thread Sergey Parhomenko
Hi,

I'm trying to redirect Spark logs to slf4j. Spark seems to be using Log4J,
so I did the typical steps of forcing a Log4J-based framework to use slf4j
- manually excluded slf4j-log4j12 and log4j, and included log4j-over-slf4j.
When doing that however Spark starts failing on initialization with:
java.lang.StackOverflowError
 at java.lang.ThreadLocal.access$400(ThreadLocal.java:72)
at java.lang.ThreadLocal$ThreadLocalMap.getEntry(ThreadLocal.java:376)
 at java.lang.ThreadLocal$ThreadLocalMap.access$000(ThreadLocal.java:261)
at java.lang.ThreadLocal.get(ThreadLocal.java:146)
 at java.lang.StringCoding.deref(StringCoding.java:63)
at java.lang.StringCoding.encode(StringCoding.java:330)
 at java.lang.String.getBytes(String.java:916)
at java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
 at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242)
at java.io.File.exists(File.java:813)
 at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080)
at sun.misc.URLClassPath$FileLoader.findResource(URLClassPath.java:1047)
 at sun.misc.URLClassPath.findResource(URLClassPath.java:176)
at java.net.URLClassLoader$2.run(URLClassLoader.java:551)
 at java.net.URLClassLoader$2.run(URLClassLoader.java:549)
at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findResource(URLClassLoader.java:548)
at java.lang.ClassLoader.getResource(ClassLoader.java:1147)
 at org.apache.spark.Logging$class.initializeLogging(Logging.scala:109)
at org.apache.spark.Logging$class.initializeIfNecessary(Logging.scala:97)
 at org.apache.spark.Logging$class.log(Logging.scala:36)
at org.apache.spark.util.Utils$.log(Utils.scala:47)
 

There's some related work done in
SPARK-1071,
but it was resolved after 0.9.0 was released. In the last comment Sean
refers to a StackOverflowError which was discussed in the mailing list, I
assume it might be a problem similar to mine but I was not able to find
that discussion.
Is anyone aware of a way to redirect Spark 0.9.0 logs to slf4j?

--
Best regards,
Sergey Parhomenko


pyspark and Python virtual enviroments

2014-03-05 Thread Christian
Hello,

I usually create different python virtual environments for different
projects to avoid version conflicts and skip the requirement to be root to
install libs.

How can I specify to pyspark to activate a virtual environment before
executing the tasks ?

Further info on virtual envs:
http://virtualenv.readthedocs.org/en/latest/virtualenv.html

Thanks in advance,
Christian


pyspark: Importing other py-files in PYTHONPATH

2014-03-05 Thread Anders Bennehag
Hi there,

I am running spark 0.9.0 standalone on a cluster. The documentation
http://spark.incubator.apache.org/docs/latest/python-programming-guide.html
states that code-dependencies can be deployed through the pyFiles argument
to the SparkContext.

But in my case, the relevant code, let's call it myLib, is already available
in PYTHONPATH on the worker-nodes. However, when trying to access this code
through a regular 'import myLib' in the script sent to pyspark, the
spark-workers seem to hang in the middle of the script without any specific
errors.

If I start a regular python-shell on the workers, there is no problem
importing myLib and accessing it.

Why is this?

/Anders


Re: Word Count on Mesos Cluster

2014-03-05 Thread juanpedromoreno
Oh, I forgot to mention that I'm adding an uber-jar with my project to the
Spark Context (sbt assembly -> app-assembly-0.1-SNAPSHOT.jar).

Thanks,
Juan Pedro.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Word-Count-on-Mesos-Cluster-tp2299p2300.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Word Count on Mesos Cluster

2014-03-05 Thread juanpedromoreno
Hi there, 

I tried the SimpleApp WordCount example and it works perfect on local
environment. My code:

object SimpleApp {

  def main(args: Array[String]) {
val logFile = "README.md"

val conf = new SparkConf()
  .setMaster("zk://172.31.0.11:2181/mesos") 
  .setAppName("Simple App")
  .setSparkHome("/opt/spark")
  .setJars(List("app/target/scala-2.10/app-assembly-0.1-SNAPSHOT.jar"))
  .set("spark.executor.memory", "5g")
  .set("spark.cores.max", "10")
  .set("spark.executor.uri",
"http://domain.com/spark/spark-1.0.0-wr-bin-cdh4.tgz";)

val sc = new SparkContext(conf)

System.out.println("[info] Spark Context created: " + sc)
System.out.println("[info] Spark user: " + sc.sparkUser)
 
val logData = sc.textFile(logFile, 2).cache()
System.out.println("[info] Log Data: " + logData)

val numAs = logData.filter(line => line.contains("a")).count()
System.out.println("[info] numAs: " + numAs)
val numBs = logData.filter(line => line.contains("b")).count()
println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

However, if I run the spark code in cluster mode (Mesos), the framework will
encounter errors and it doesn't work. 

Log sbt run:

--
vagrant@master1:/vagrant/workspace/spark-quick-start$ sbt run
Loading /usr/share/sbt/bin/sbt-launch-lib.bash
[info] Loading project definition from
/vagrant/workspace/spark-quick-start/project
[info] Set current project to Simple Project (in build
file:/vagrant/workspace/spark-quick-start/)
[info] Running com.domain.spark.SimpleApp
14/03/05 09:53:36 WARN util.Utils: Your hostname, master1 resolves to a
loopback address: 127.0.1.1; using 172.31.1.11 instead (on interface eth1)
14/03/05 09:53:36 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
14/03/05 09:53:41 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/03/05 09:53:42 INFO Remoting: Starting remoting
14/03/05 09:53:45 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@172.31.1.11:43210]
14/03/05 09:53:45 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@172.31.1.11:43210]
14/03/05 09:53:45 INFO spark.SparkEnv: Registering BlockManagerMaster
14/03/05 09:53:45 INFO storage.DiskBlockManager: Created local directory at
/tmp/spark-local-20140305095345-a5ea
14/03/05 09:53:46 INFO storage.MemoryStore: MemoryStore started with
capacity 593.9 MB.
14/03/05 09:53:46 INFO network.ConnectionManager: Bound socket to port 50585
with id = ConnectionManagerId(172.31.1.11,50585)
14/03/05 09:53:46 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/03/05 09:53:46 INFO storage.BlockManagerMasterActor$BlockManagerInfo:
Registering block manager 172.31.1.11:50585 with 593.9 MB RAM
14/03/05 09:53:46 INFO storage.BlockManagerMaster: Registered BlockManager
14/03/05 09:53:46 INFO spark.HttpServer: Starting HTTP Server
14/03/05 09:53:47 INFO server.Server: jetty-7.6.8.v20121106
14/03/05 09:53:47 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:41819
14/03/05 09:53:47 INFO broadcast.HttpBroadcast: Broadcast server started at
http://172.31.1.11:41819
14/03/05 09:53:48 INFO spark.SparkEnv: Registering MapOutputTracker
14/03/05 09:53:48 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-fcdabf4d-33bc-4505-a9ca-a2bb2ad43da4
14/03/05 09:53:48 INFO spark.HttpServer: Starting HTTP Server
14/03/05 09:53:48 INFO server.Server: jetty-7.6.8.v20121106
14/03/05 09:53:48 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:43856
14/03/05 09:53:53 INFO server.Server: jetty-7.6.8.v20121106
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/storage,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/stages/stage,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/stages/pool,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/stages,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/environment,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/executors,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/metrics/json,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/static,null}
14/03/05 09:53:53 INFO handler.ContextHandler: started
o.e.j.s.h.ContextHandler{/,null}
14/03/05 09:53:53 INFO server.AbstractConnector: Started
SelectChannelConnector@172.31.1.11:4040
14/03/05 09:53:53 INFO ui.SparkUI: Started Spark Web UI at
http://172.31.1.11:4040
14/03/05 09:53:54 INFO spark.SparkContext: Added JAR
app/target/scala-2.10/app-assembly-0.1-SNAPSHOT.jar at
http://172.31.1.11:43856/jars/app-assembly
-0.1-SNAPSHOT.jar with timestamp 1394013234949
20