Re: Question about Scala style, explicit typing within transformation functions and anonymous val.

2016-04-17 Thread Mark Hamstra
I actually find my version of 3 more readable than the one with the `_`,
which looks too much like a partially applied function.  It's a minor
issue, though.
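
For readers following along, here is a minimal, self-contained sketch (not from the
thread; the `function` value and the plain List are placeholders) of the three
spellings being compared for item 3 — the same forms apply unchanged on an RDD:

object StyleVariants {
  def main(args: Array[String]): Unit = {
    val function: Int => Int = _ + 1
    val xs = List(1, 2, 3)

    val a = xs.map { x => function(x) } // explicit closure with a named parameter
    val b = xs.map(function(_))         // placeholder syntax; reads like partial application
    val c = xs.map(function)            // passing the function value directly

    println((a, b, c))                  // all three yield List(2, 3, 4)
  }
}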

On Sat, Apr 16, 2016 at 11:56 PM, Hyukjin Kwon  wrote:

> Hi Mark,
>
> I know, but that could harm readability. AFAIK, for this reason, it is
> not (or only rarely) used in Spark.
>
> 2016-04-17 15:54 GMT+09:00 Mark Hamstra :
>
>> FWIW, 3 should work as just `.map(function)`.
>>
>> On Sat, Apr 16, 2016 at 11:48 PM, Reynold Xin 
>> wrote:
>>
>>> Hi Hyukjin,
>>>
>>> Thanks for asking.
>>>
>>> For 1 the change is almost always better.
>>>
>>> For 2 it depends on the context. In general if the type is not obvious,
>>> it helps readability to explicitly declare them.
>>>
>>> For 3 again it depends on context.
>>>
>>>
>>> So while it is a good idea to change 1 to reflect a more consistent code
>>> base (and maybe we should codify it), it is almost always a bad idea to
>>> change 2 and 3 just for the sake of changing them.
>>>
>>>
>>>
>>> On Sat, Apr 16, 2016 at 11:06 PM, Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 First of all, I am sorry that this is relatively trivial and minor, but I
 just want to be clear on this and careful about future PRs.

 Recently, I submitted a PR (
 https://github.com/apache/spark/pull/12413) about Scala style, and it
 was merged. In this PR, I changed:

 1.

 from

 .map(item => {
   ...
 })

 to

 .map { item =>
   ...
 }



 2.
 from

 words.foreachRDD { (rdd: RDD[String], time: Time) => ...

 to

 words.foreachRDD { (rdd, time) => ...



 3.

 from

 .map { x =>
   function(x)
 }

 to

 .map(function(_))


 My question is about 2. and 3., which seem arguable (please see the
 discussion in the PR).
 I agree that I do not have to make such changes in the future, but I
 wonder whether I should revert 2. and 3.

 FYI,
 - The usage of 2. is pretty rare.
 - 3. is quite common, but the PR corrects cases like the above only when the
 val within the closure looks obviously meaningless (such as x or a) and the
 body is a single line.

 I would appreciate it if you could add some comments and opinions on this.

 Thanks!

>>>
>>>
>>
>


Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Takeshi Yamamuro
Hi,
Is it a bad idea to create a `SparkContext` with `local-cluster` mode
yourself, like '
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55
'?

// maropu
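
For reference, a minimal sketch of the pattern referenced above (this assumes a Spark
distribution is available via SPARK_HOME, as discussed later in this thread; the master
string follows the local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB] form
used in ShuffleSuite):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local-cluster[2,1,1024]") // 2 workers, 1 core each, 1024 MB per worker
  .setAppName("local-cluster-smoke-test")

val sc = new SparkContext(conf)
try {
  // Executors run in separate JVMs, so this exercises real serialization.
  assert(sc.parallelize(1 to 100, 4).map(_ * 2).sum() == 10100)
} finally {
  sc.stop()
}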

On Sun, Apr 17, 2016 at 9:47 AM, Evan Chan  wrote:

> Hey folks,
>
> I'd like to use local-cluster mode in my Spark-related projects to
> test Spark functionality in an automated way in a simulated local
> cluster. The idea is to test multi-process things in a much easier
> fashion than setting up a real cluster.   However, getting this up and
> running in a separate project (I'm using Scala 2.10 and ScalaTest) is
> nontrivial.   Does anyone have any suggestions to get up and running?
>
> This is what I've observed so far (I'm testing against 1.5.1, but
> suspect this would apply equally to 1.6.x):
>
> - One needs to have a real Spark distro and point to it using SPARK_HOME
> - SPARK_SCALA_VERSION needs to be set
> - One needs to manually inject jar paths, otherwise dependencies are
> missing.  For example, build an assembly jar of all your deps.  Java
> class directory hierarchies don't seem to work with the setJars(...).
>
> How do Spark's internal scripts make it possible to run
> local-cluster mode and set up all the class paths correctly?   And, is
> it possible to mimic this setup for external Spark projects?
>
> thanks,
> Evan
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 
---
Takeshi Yamamuro


Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Evan Chan
What I want to find out is how to run tests like Spark's with
local-cluster, just like that suite, but in your own projects.   Has
anyone done this?

On Sun, Apr 17, 2016 at 5:37 AM, Takeshi Yamamuro  wrote:
> Hi,
> Is this a bad idea to create `SparkContext` with a `local-cluster` mode by
> yourself like
> 'https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55'?
>
> // maropu
>
> On Sun, Apr 17, 2016 at 9:47 AM, Evan Chan  wrote:
>>
>> Hey folks,
>>
>> I'd like to use local-cluster mode in my Spark-related projects to
>> test Spark functionality in an automated way in a simulated local
>> cluster.The idea is to test multi-process things in a much easier
>> fashion than setting up a real cluster.   However, getting this up and
>> running in a separate project (I'm using Scala 2.10 and ScalaTest) is
>> nontrivial.   Does anyone have any suggestions to get up and running?
>>
>> This is what I've observed so far (I'm testing against 1.5.1, but
>> suspect this would apply equally to 1.6.x):
>>
>> - One needs to have a real Spark distro and point to it using SPARK_HOME
>> - SPARK_SCALA_VERSION needs to be set
>> - One needs to manually inject jar paths, otherwise dependencies are
>> missing.  For example, build an assembly jar of all your deps.  Java
>> class directory hierarchies don't seem to work with the setJars(...).
>>
>> How does Spark's internal scripts make it possible to run
>> local-cluster mode and set up all the class paths correctly?   And, is
>> it possible to mimic this setup for external Spark projects?
>>
>> thanks,
>> Evan
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>
>
>
> --
> ---
> Takeshi Yamamuro

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-17 Thread Luciano Resende
On Sat, Apr 16, 2016 at 11:12 PM, Reynold Xin  wrote:

> First, really thank you for leading the discussion.
>
> I am concerned that it'd hurt Spark more than it helps. As many others
> have pointed out, this unnecessarily creates a new tier of connectors or
> 3rd party libraries appearing to be endorsed by the Spark PMC or the ASF.
> We can alleviate this concern by not having "Spark" in the name, and the
> project proposal and documentation should label clearly that this is not
> affiliated with Spark.
>

I really thought we could use the Spark name (e.g., similar to
spark-packages), since this project is aligned with and dedicated to
curating extensions to Apache Spark; that is why we were inviting Spark PMC
members to join the new project's PMC, so that Apache Spark has the
necessary oversight and influence on the project's direction. I understand
folks have concerns about the name, so we will start looking into
alternatives unless there is some way I can address the community's
concerns around this.


>
> Also Luciano - assuming you are interested in creating a project like this
> and find a home for the connectors that were removed, I find it surprising
> that few of the initially proposed PMC members have actually contributed
> much to the connectors, and people that have contributed a lot were left
> out. I am sure that is just an oversight.
>
>
Reynold, thanks for your concern. We are not leaving anyone out; we used
the following criteria to identify the initial PMC/committer list, as
described in the first e-mail on this thread:

   - Spark committers and Apache Members can request to participate as PMC
members.
   - All active Spark committers (those who committed in the last year) will
have write access to the project (committer access).
   - Other committers can request to become committers.
   - Non-committers will be added based on meritocracy after the start of
the project.

Based on these criteria, all people who have expressed interest in joining
the project PMC have been added to it, but I don't feel comfortable adding
names to it on my own. I have also updated the list of committers, and
currently we have the following on the draft proposal:


Initial PMC


   - Luciano Resende (lresende AT apache DOT org) (Apache Member)
   - Chris Mattmann (mattmann AT apache DOT org) (Apache Member, Apache board member)
   - Steve Loughran (stevel AT apache DOT org) (Apache Member)
   - Jean-Baptiste Onofré (jbonofre AT apache DOT org) (Apache Member)
   - Marcelo Masiero Vanzin (vanzin AT apache DOT org) (Apache Spark committer)
   - Sean R. Owen (srowen AT apache DOT org) (Apache Member and Spark PMC)
   - Mridul Muralidharan (mridulm80 AT apache DOT org) (Apache Spark PMC)


Initial Committers (write access for active Spark committers who have
committed in the last year)


   - Andy Konwinski (andrew AT apache DOT org) (Apache Spark)
   - Andrew Or (andrewor14 AT apache DOT org) (Apache Spark)
   - Ankur Dave (ankurdave AT apache DOT org) (Apache Spark)
   - Davies Liu (davies AT apache DOT org) (Apache Spark)
   - DB Tsai (dbtsai AT apache DOT org) (Apache Spark)
   - Haoyuan Li (haoyuan AT apache DOT org) (Apache Spark)
   - Ram Sriharsha (harsha AT apache DOT org) (Apache Spark)
   - Herman van Hövell (hvanhovell AT apache DOT org) (Apache Spark)
   - Imran Rashid (irashid AT apache DOT org) (Apache Spark)
   - Joseph Kurata Bradley (jkbradley AT apache DOT org) (Apache Spark)
   - Josh Rosen (joshrosen AT apache DOT org) (Apache Spark)
   - Kay Ousterhout (kayousterhout AT apache DOT org) (Apache Spark)
   - Cheng Lian (lian AT apache DOT org) (Apache Spark)
   - Mark Hamstra (markhamstra AT apache DOT org) (Apache Spark)
   - Michael Armbrust (marmbrus AT apache DOT org) (Apache Spark)
   - Matei Alexandru Zaharia (matei AT apache DOT org) (Apache Spark)
   - Xiangrui Meng (meng AT apache DOT org) (Apache Spark)
   - Prashant Sharma (prashant AT apache DOT org) (Apache Spark)
   - Patrick Wendell (pwendell AT apache DOT org) (Apache Spark)
   - Reynold Xin (rxin AT apache DOT org) (Apache Spark)
   - Sanford Ryza (sandy AT apache DOT org) (Apache Spark)
   - Kousuke Saruta (sarutak AT apache DOT org) (Apache Spark)
   - Shivaram Venkataraman (shivaram AT apache DOT org) (Apache Spark)
   - Tathagata Das (tdas AT apache DOT org) (Apache Spark)
   - Thomas Graves (tgraves AT apache DOT org) (Apache Spark)
   - Wenchen Fan (wenchen AT apache DOT org) (Apache Spark)
   - Yin Huai (yhuai AT apache DOT org) (Apache Spark)
   - Shixiong Zhu (zsxwing AT apache DOT org) (Apache Spark)



BTW, it would be really good to have you on the PMC as well, along with any
others who volunteer based on the criteria above. May I add you as a PMC
member in the new project proposal?



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Jon Maurer
Take a look at spark testing base.
https://github.com/holdenk/spark-testing-base/blob/master/README.md
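
For reference, a minimal ScalaTest sketch along the lines of that README (the
SharedSparkContext trait and its package name are taken from the project's docs and
should be double-checked against the version in use):

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

class WordCountSpec extends FunSuite with SharedSparkContext {
  // SharedSparkContext provides a ready-to-use `sc` for the suite.
  test("counts words across partitions") {
    val counts = sc.parallelize(Seq("a", "b", "a"), 2)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("a") === 2)
  }
}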
On Apr 17, 2016 10:28 AM, "Evan Chan"  wrote:

> What I want to find out is how to run tests like Spark's with
> local-cluster, just like that suite, but in your own projects.   Has
> anyone done this?
>
> On Sun, Apr 17, 2016 at 5:37 AM, Takeshi Yamamuro 
> wrote:
> > Hi,
> > Is this a bad idea to create `SparkContext` with a `local-cluster` mode
> by
> > yourself like
> > '
> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55
> '?
> >
> > // maropu
> >
> > On Sun, Apr 17, 2016 at 9:47 AM, Evan Chan 
> wrote:
> >>
> >> Hey folks,
> >>
> >> I'd like to use local-cluster mode in my Spark-related projects to
> >> test Spark functionality in an automated way in a simulated local
> >> cluster.The idea is to test multi-process things in a much easier
> >> fashion than setting up a real cluster.   However, getting this up and
> >> running in a separate project (I'm using Scala 2.10 and ScalaTest) is
> >> nontrivial.   Does anyone have any suggestions to get up and running?
> >>
> >> This is what I've observed so far (I'm testing against 1.5.1, but
> >> suspect this would apply equally to 1.6.x):
> >>
> >> - One needs to have a real Spark distro and point to it using SPARK_HOME
> >> - SPARK_SCALA_VERSION needs to be set
> >> - One needs to manually inject jar paths, otherwise dependencies are
> >> missing.  For example, build an assembly jar of all your deps.  Java
> >> class directory hierarchies don't seem to work with the setJars(...).
> >>
> >> How does Spark's internal scripts make it possible to run
> >> local-cluster mode and set up all the class paths correctly?   And, is
> >> it possible to mimic this setup for external Spark projects?
> >>
> >> thanks,
> >> Evan
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
> >
> >
> >
> > --
> > ---
> > Takeshi Yamamuro
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Impact of STW GC events for the driver JVM on overall cluster

2016-04-17 Thread Rahul Tanwani
Hi Devs,

During stop-the-world GC events on the driver JVM, all the application
threads are stopped, so no new tasks will be scheduled or launched on the
executors. When a full collection is happening, the application threads may
be stopped for a long time, and if the running tasks are short, the entire
cluster will be idle until the GC is over and the application threads
resume.

Is my understanding in this regard correct?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Impact-of-STW-GC-events-for-the-driver-JVM-on-overall-cluster-tp17183.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Recent Jenkins always fails in two specific tests

2016-04-17 Thread Kazuaki Ishizaki
I have noticed that recent Jenkins builds for different pull requests always
fail in the following two tests:
"SPARK-8020: set sql conf in spark conf"
"SPARK-9757 Persist Parquet relation with decimal column"

Here are examples.
https://github.com/apache/spark/pull/11956 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56058/consoleFull
)
https://github.com/apache/spark/pull/12259 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56056/consoleFull
)
https://github.com/apache/spark/pull/12450 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56051/consoleFull
)
https://github.com/apache/spark/pull/12453 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56050/consoleFull
)
https://github.com/apache/spark/pull/12257 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56061/consoleFull
)
https://github.com/apache/spark/pull/12451 (consoleFull: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56045/consoleFull
)

I have just realized that the latest master also causes the same two 
failures at amplab Jenkins. 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/627/
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/625/

Since these seem to be related to the failures in recent pull requests, I
created two JIRA entries:
https://issues.apache.org/jira/browse/SPARK-14689
https://issues.apache.org/jira/browse/SPARK-14690

Best regards,
Kazuaki Ishizaki



Re: Impact of STW GC events for the driver JVM on overall cluster

2016-04-17 Thread Reynold Xin
Your understanding is correct. If the driver is stuck in GC, then during
that period it cannot schedule any tasks.


On Sun, Apr 17, 2016 at 10:27 AM, Rahul Tanwani 
wrote:

> Hi Devs,
>
> In case of stop the world GC events on the driver JVM, since all the
> application threads will be stopped, there won't be any new task scheduled
> /
> launched on the executors. In cases where the full collection is happening,
> the applications threads may be stopped for a long time, and if the running
> tasks are small, entire cluster will be idle till the GC is over and
> application threads are resumed.
>
> Is my understanding in this regard correct?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Impact-of-STW-GC-events-for-the-driver-JVM-on-overall-cluster-tp17183.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: Impact of STW GC events for the driver JVM on overall cluster

2016-04-17 Thread Rahul Tanwani
Does that not mean GC settings with concurrent collectors should be
preferred over parallel collectors, at least on the driver side? If so, why
not specify a concurrent collector by default when the driver JVM is
launched, unless the user explicitly overrides it?
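
For what it's worth, a user can already opt into a concurrent collector per
application. A hedged sketch follows (the flag choices are illustrative; the
driver-side options must be in place before the driver JVM starts, so they go into
spark-defaults.conf or on the spark-submit command line rather than in application
code):

// Driver-side GC flags, passed at submit time, for example:
//   spark-submit \
//     --driver-java-options "-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled" \
//     ...
// Executor-side options can also be set programmatically:
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("gc-tuning-sketch")
  .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")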



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Impact-of-STW-GC-events-for-the-driver-JVM-on-overall-cluster-tp17183p17186.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Question about Scala style, explicit typing within transformation functions and anonymous val.

2016-04-17 Thread Koert Kuipers
I find version 3 without the `_` also more readable.

On Sun, Apr 17, 2016 at 3:02 AM, Mark Hamstra 
wrote:

> I actually find my version of 3 more readable than the one with the `_`,
> which looks too much like a partially applied function.  It's a minor
> issue, though.
>
> On Sat, Apr 16, 2016 at 11:56 PM, Hyukjin Kwon 
> wrote:
>
>> Hi Mark,
>>
>> I know but that could harm readability. AFAIK, for this reason, that is
>> not (or rarely) used in Spark.
>>
>> 2016-04-17 15:54 GMT+09:00 Mark Hamstra :
>>
>>> FWIW, 3 should work as just `.map(function)`.
>>>
>>> On Sat, Apr 16, 2016 at 11:48 PM, Reynold Xin 
>>> wrote:
>>>
 Hi Hyukjin,

 Thanks for asking.

 For 1 the change is almost always better.

 For 2 it depends on the context. In general if the type is not obvious,
 it helps readability to explicitly declare them.

 For 3 again it depends on context.


 So while it is a good idea to change 1 to reflect a more consistent
 code base (and maybe we should codify it), it is almost always a bad idea
 to change 2 and 3 just for the sake of changing them.



 On Sat, Apr 16, 2016 at 11:06 PM, Hyukjin Kwon 
 wrote:

> Hi all,
>
> First of all, I am sorry that this is relatively trivial and too minor
> but I just want to be clear on this and careful for the more PRs in the
> future.
>
> Recently, I have submitted a PR (
> https://github.com/apache/spark/pull/12413) about Scala style and
> this was merged. In this PR, I changed
>
> 1.
>
> from
>
> .map(item => {
>   ...
> })
>
> to
>
> .map { item =>
>   ...
> }
>
>
>
> 2.
> from
>
> words.foreachRDD { (rdd: RDD[String], time: Time) => ...
>
> to
>
> words.foreachRDD { (rdd, time) => ...
>
>
>
> 3.
>
> from
>
> .map { x =>
>   function(x)
> }
>
> to
>
> .map(function(_))
>
>
> My question is, I think it looks 2. and 3. are arguable (please see
> the discussion in the PR).
> I agree that I might not have to change those in the future but I just
> wonder if I should revert 2. and 3..
>
> FYI,
> - The usage of 2. is pretty rare.
> - 3. is pretty a lot. but the PR corrects ones like above only when
> the val within closure looks obviously meaningless (such as x or a) and
> with only single line.
>
> I would appreciate that if you add some comments and opinions on this.
>
> Thanks!
>


>>>
>>
>


Re: Recent Jenkins always fails in specific two tests

2016-04-17 Thread Hyukjin Kwon
+1

Yeah, I am facing this problem as well:
https://github.com/apache/spark/pull/12452

I thought the failures were spurious because the tests pass locally for me.



2016-04-18 3:26 GMT+09:00 Kazuaki Ishizaki :

> I realized that recent Jenkins among different pull requests always fails
> in the following two tests
> "SPARK-8020: set sql conf in spark conf"
> "SPARK-9757 Persist Parquet relation with decimal column"
>
> Here are examples.
> https://github.com/apache/spark/pull/11956 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56058/consoleFull
> )
> https://github.com/apache/spark/pull/12259 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56056/consoleFull
> )
> https://github.com/apache/spark/pull/12450 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56051/consoleFull
> )
> https://github.com/apache/spark/pull/12453 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56050/consoleFull
> )
> https://github.com/apache/spark/pull/12257 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56061/consoleFull
> )
> https://github.com/apache/spark/pull/12451 (consoleFull:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56045/consoleFull
> )
>
> I have just realized that the latest master also causes the same two
> failures at amplab Jenkins.
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/627/
>
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/625/
>
> Since they seem to have some relationships with failures in recent pull
> requests, I created two JIRA entries.
> https://issues.apache.org/jira/browse/SPARK-14689
> https://issues.apache.org/jira/browse/SPARK-14690
>
> Best regards,
> Kazuaki Ishizaki
>


Re: Possible deadlock in registering applications in the recovery mode

2016-04-17 Thread Niranda Perera
Hi guys,

Any update on this?

Best

On Tue, Apr 12, 2016 at 12:46 PM, Niranda Perera 
wrote:

> Hi all,
>
> I have encountered a small issue in the standalone recovery mode.
>
> Let's say there was an application A running in the cluster. Due to some
> issue, the entire cluster, together with application A, goes down.
>
> Later on, the cluster comes back online, and the master goes into
> 'recovering' mode because it sees from the persistence engine that some
> apps, workers, and drivers were already in the cluster. While the recovery
> is in progress, the application comes back online, but now it has a
> different ID, let's say B.
>
> But then, as per the master's application registration logic, this
> application B will NOT be added to 'waitingApps', and the message
> "Attempted to re-register application at same address" is logged. [1]
>
>   private def registerApplication(app: ApplicationInfo): Unit = {
> val appAddress = app.driver.address
> if (addressToApp.contains(appAddress)) {
>   logInfo("Attempted to re-register application at same address: " +
> appAddress)
>   return
> }
>
>
> The problem here is that the master is trying to recover application A,
> which is no longer there. Therefore, after the recovery process, app A will
> be dropped. However, app A's successor, app B, was also omitted from the
> 'waitingApps' list because it has the same address app A had previously.
>
> This creates a deadlock in the cluster: neither app A nor app B is
> available.
>
> When the master is in RECOVERING mode, shouldn't it add all the registering
> apps to a list first and then, after recovery is completed (once the
> unsuccessful recoveries are removed), deploy the apps that are new?
>
> This would sort out this deadlock, IMO.
>
> look forward to hearing from you.
>
> best
>
> [1]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L834
>
> --
> Niranda
> @n1r44 
> +94-71-554-8430
> https://pythagoreanscript.wordpress.com/
>



-- 
Niranda
@n1r44 
+94-71-554-8430
https://pythagoreanscript.wordpress.com/
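
The buffering approach proposed in the quoted message could look roughly like the
following simplified, self-contained sketch (names such as pendingApps, AppInfo and
completeRecovery are illustrative stand-ins, not the actual Master code):

import scala.collection.mutable

object RecoveryBufferSketch {
  sealed trait State
  case object Recovering extends State
  case object Alive extends State

  case class AppInfo(id: String, driverAddress: String)

  var state: State = Recovering
  private val addressToApp = mutable.Map[String, AppInfo]()
  private val pendingApps = mutable.ArrayBuffer[AppInfo]()

  def registerApplication(app: AppInfo): Unit = {
    if (state == Recovering) {
      pendingApps += app // defer the decision until recovery ends
      return
    }
    if (addressToApp.contains(app.driverAddress)) {
      println(s"Attempted to re-register application at same address: ${app.driverAddress}")
      return
    }
    addressToApp(app.driverAddress) = app
  }

  // Analogous to the end of completeRecovery(): dead apps dropped, then replay.
  def completeRecovery(deadAddresses: Set[String]): Unit = {
    deadAddresses.foreach(addressToApp.remove)
    state = Alive
    val buffered = pendingApps.toList
    pendingApps.clear()
    buffered.foreach(registerApplication)
  }

  def main(args: Array[String]): Unit = {
    addressToApp("host-1:7077") = AppInfo("app-A", "host-1:7077") // recovered-but-dead app A
    registerApplication(AppInfo("app-B", "host-1:7077"))          // B arrives during recovery
    completeRecovery(deadAddresses = Set("host-1:7077"))          // A dropped, then B replayed
    assert(addressToApp("host-1:7077").id == "app-B")             // no deadlock: B is registered
  }
}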


Re: Recent Jenkins always fails in two specific tests

2016-04-17 Thread Marcin Tustin
Also hitting this: https://github.com/apache/spark/pull/12455.



On Sun, Apr 17, 2016 at 9:22 PM, Hyukjin Kwon  wrote:

> +1
>
> Yea, I am facing this problem as well,
> https://github.com/apache/spark/pull/12452
>
> I thought they are spurious because the tests are passed in my local.
>
>
>
> 2016-04-18 3:26 GMT+09:00 Kazuaki Ishizaki :
>
>> I realized that recent Jenkins among different pull requests always fails
>> in the following two tests
>> "SPARK-8020: set sql conf in spark conf"
>> "SPARK-9757 Persist Parquet relation with decimal column"
>>
>> Here are examples.
>> https://github.com/apache/spark/pull/11956 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56058/consoleFull
>> )
>> https://github.com/apache/spark/pull/12259 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56056/consoleFull
>> )
>> https://github.com/apache/spark/pull/12450 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56051/consoleFull
>> )
>> https://github.com/apache/spark/pull/12453 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56050/consoleFull
>> )
>> https://github.com/apache/spark/pull/12257 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56061/consoleFull
>> )
>> https://github.com/apache/spark/pull/12451 (consoleFull:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56045/consoleFull
>> )
>>
>> I have just realized that the latest master also causes the same two
>> failures at amplab Jenkins.
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/627/
>>
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/625/
>>
>> Since they seem to have some relationships with failures in recent pull
>> requests, I created two JIRA entries.
>> https://issues.apache.org/jira/browse/SPARK-14689
>> https://issues.apache.org/jira/browse/SPARK-14690
>>
>> Best regards,
>> Kazuaki Ishizaki
>>
>
>




Re: Possible deadlock in registering applications in the recovery mode

2016-04-17 Thread Reynold Xin
I haven't looked closely at this, but I think your proposal makes sense.


On Sun, Apr 17, 2016 at 6:40 PM, Niranda Perera 
wrote:

> Hi guys,
>
> Any update on this?
>
> Best
>
> On Tue, Apr 12, 2016 at 12:46 PM, Niranda Perera  > wrote:
>
>> Hi all,
>>
>> I have encountered a small issue in the standalone recovery mode.
>>
>> Let's say there was an application A running in the cluster. Due to some
>> issue, the entire cluster, together with the application A goes down.
>>
>> Then later on, cluster comes back online, and the master then goes into
>> the 'recovering' mode, because it sees some apps, workers and drivers have
>> already been in the cluster from Persistence Engine. While in the recovery
>> process, the application comes back online, but now it would have a
>> different ID, let's say B.
>>
>> But then, as per the master, application registration logic, this
>> application B will NOT be added to the 'waitingApps' with the message
>> ""Attempted to re-register application at same address". [1]
>>
>>   private def registerApplication(app: ApplicationInfo): Unit = {
>> val appAddress = app.driver.address
>> if (addressToApp.contains(appAddress)) {
>>   logInfo("Attempted to re-register application at same address: " +
>> appAddress)
>>   return
>> }
>>
>>
>> The problem here is, master is trying to recover application A, which is
>> not in there anymore. Therefore after the recovery process, app A will be
>> dropped. However app A's successor, app B was also omitted from the
>> 'waitingApps' list because it had the same address as App A previously.
>>
>> This creates a deadlock in the cluster, app A nor app B is available in
>> the cluster.
>>
>> When the master is in the RECOVERING mode, shouldn't it add all the
>> registering apps to a list first, and then after the recovery is completed
>> (once the unsuccessful recoveries are removed), deploy the apps which are
>> new?
>>
>> This would sort this deadlock IMO?
>>
>> look forward to hearing from you.
>>
>> best
>>
>> [1]
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L834
>>
>> --
>> Niranda
>> @n1r44 
>> +94-71-554-8430
>> https://pythagoreanscript.wordpress.com/
>>
>
>
>
> --
> Niranda
> @n1r44 
> +94-71-554-8430
> https://pythagoreanscript.wordpress.com/
>


Re: Using local-cluster mode for testing Spark-related projects

2016-04-17 Thread Evan Chan
Jon, thanks. I think I've figured it out, actually. It's really simple:
one needs to set spark.executor.extraClassPath to the current value of the
Java class path (the java.class.path system property), and to avoid
HiveContext, which gives errors about initializing a Derby database
multiple times.
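
For anyone hitting the same thing, a sketch of that workaround (plain SparkContext,
no HiveContext; assumes the test JVM's classpath already contains the project classes
and their dependencies):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local-cluster[2,1,1024]")
  .setAppName("external-project-test")
  // Executors are separate JVMs; hand them the same classpath as the test runner.
  .set("spark.executor.extraClassPath", sys.props("java.class.path"))

val sc = new SparkContext(conf)
try {
  assert(sc.parallelize(1 to 10).count() == 10)
} finally {
  sc.stop()
}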

On Sun, Apr 17, 2016 at 9:51 AM, Jon Maurer  wrote:
> Take a look at spark testing base.
> https://github.com/holdenk/spark-testing-base/blob/master/README.md
>
> On Apr 17, 2016 10:28 AM, "Evan Chan"  wrote:
>>
>> What I want to find out is how to run tests like Spark's with
>> local-cluster, just like that suite, but in your own projects.   Has
>> anyone done this?
>>
>> On Sun, Apr 17, 2016 at 5:37 AM, Takeshi Yamamuro 
>> wrote:
>> > Hi,
>> > Is this a bad idea to create `SparkContext` with a `local-cluster` mode
>> > by
>> > yourself like
>> >
>> > 'https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/ShuffleSuite.scala#L55'?
>> >
>> > // maropu
>> >
>> > On Sun, Apr 17, 2016 at 9:47 AM, Evan Chan 
>> > wrote:
>> >>
>> >> Hey folks,
>> >>
>> >> I'd like to use local-cluster mode in my Spark-related projects to
>> >> test Spark functionality in an automated way in a simulated local
>> >> cluster.The idea is to test multi-process things in a much easier
>> >> fashion than setting up a real cluster.   However, getting this up and
>> >> running in a separate project (I'm using Scala 2.10 and ScalaTest) is
>> >> nontrivial.   Does anyone have any suggestions to get up and running?
>> >>
>> >> This is what I've observed so far (I'm testing against 1.5.1, but
>> >> suspect this would apply equally to 1.6.x):
>> >>
>> >> - One needs to have a real Spark distro and point to it using
>> >> SPARK_HOME
>> >> - SPARK_SCALA_VERSION needs to be set
>> >> - One needs to manually inject jar paths, otherwise dependencies are
>> >> missing.  For example, build an assembly jar of all your deps.  Java
>> >> class directory hierarchies don't seem to work with the setJars(...).
>> >>
>> >> How does Spark's internal scripts make it possible to run
>> >> local-cluster mode and set up all the class paths correctly?   And, is
>> >> it possible to mimic this setup for external Spark projects?
>> >>
>> >> thanks,
>> >> Evan
>> >>
>> >> -
>> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> >> For additional commands, e-mail: dev-h...@spark.apache.org
>> >>
>> >
>> >
>> >
>> > --
>> > ---
>> > Takeshi Yamamuro
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org