Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-03-31 Thread Holden Karau
-1 (non-binding)

Python packaging doesn't seem to have quite worked out (looking at PKG-INFO,
the description is "Description: ! missing pandoc do not upload to PyPI");
ideally it would be nice to have this be a version we can upload to PyPI.
Building this on my own machine results in a longer description.

My guess is that whichever machine was used to package this is missing the
pandoc executable (or possibly the pypandoc library).
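
For reference, the failure mode above typically comes from a setup.py pattern
like the sketch below; this is only an illustration of the common pypandoc
fallback, not necessarily Spark's exact python/setup.py. If pypandoc is not
installed, or the pandoc binary is missing, the placeholder string is what ends
up as the description in PKG-INFO:

    # Sketch of the usual pypandoc fallback in a setup.py. If pandoc or
    # pypandoc is unavailable on the machine building the sdist, the
    # placeholder string below becomes the long description in PKG-INFO.
    from setuptools import setup

    try:
        import pypandoc
        # Convert the Markdown README to reST so PyPI renders it properly.
        long_description = pypandoc.convert("README.md", "rst")
    except (ImportError, OSError):
        # ImportError: pypandoc not installed; OSError: pandoc binary missing.
        long_description = "!!!!! missing pandoc do not upload to PyPI !!!!"

    setup(
        name="pyspark",                      # illustrative metadata only
        version="2.1.1",
        description="Apache Spark Python API",
        long_description=long_description,
        packages=["pyspark"],
    )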

On Fri, Mar 31, 2017 at 3:40 PM, Xiao Li  wrote:

> +1
>
> Xiao
>
> 2017-03-30 16:09 GMT-07:00 Michael Armbrust :
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.1. The vote is open until Sun, April 2nd, 2017 at 16:30 PST and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.1
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.1-rc2
>> (02b165dcc2ee5245d1293a375a31660c9d4e1fa6)
>>
>> List of JIRA tickets resolved can be found with this filter.
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc2-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1227/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc2-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
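
As one concrete (and purely illustrative) way to do this, a minimal PySpark
smoke test against the candidate might look like the following, assuming the
RC's pyspark build is installed locally; the app name and tiny workload here
are made up for the example:

    # Minimal smoke test: start a session on the release candidate,
    # run a trivial DataFrame job, and check the reported version.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-2.1.1-rc2-smoke-test").getOrCreate()
    assert spark.version.startswith("2.1.1"), spark.version

    df = spark.range(1000).selectExpr("id", "id * 2 AS doubled")
    assert df.count() == 1000
    assert df.agg({"doubled": "max"}).collect()[0][0] == 1998

    spark.stop()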
>>
>> *What should happen to JIRA tickets still targeting 2.1.1?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>>
>> *But my bug isn't fixed!??!*
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.1.0.
>>
>> *What happened to RC1?*
>>
>> There were issues with the release packaging and as a result it was skipped.
>>
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-03-31 Thread Xiao Li
+1

Xiao

2017-03-30 16:09 GMT-07:00 Michael Armbrust :

> Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Sun, April 2nd, 2017 at 16:30 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.1-rc2
> (02b165dcc2ee5245d1293a375a31660c9d4e1fa6)
>
> List of JIRA tickets resolved can be found with this filter.
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc2-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1227/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc2-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>


Re: Pull Request Made, Ignored So Far

2017-03-31 Thread Joseph Bradley
Hi John,

Thanks for pinging about this.  It does look useful, but I'll admit 3 days
isn't a long time since there are so many (hundreds) of open PRs.  I'll see
if I can take a look soon (or others should feel free to as well).  I'd
also recommend checking out the surrounding code and pinging the
contributors or committers who have worked on it to grab their attention &
early feedback.

Thanks!
Joseph

On Fri, Mar 31, 2017 at 7:37 AM, John Compitello 
wrote:

> Hi all,
>
> I’m a new Spark contributor who put in a pull request a few days ago:
> https://github.com/apache/spark/pull/17459
>
> It’s a relatively small, isolated change that should be pretty simple to
> review. It has been a big help in the main project I’m working on (
> https://github.com/hail-is/hail) so I wanted to
> contribute it back to main Spark. It’s been a few days though, and I
> haven’t had my branch cleared to run tests or any acknowledgement of it at
> all. Is there any process to ask someone to review your PR or get it
> assigned to someone? I’m afraid it’s just going to slowly sink down onto
> later and later pages in the PR list until it’s too deep for anyone to be
> expected to find otherwise.
>
> Best,
>
> John
>



-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

http://databricks.com


Pull Request Made, Ignored So Far

2017-03-31 Thread John Compitello
Hi all,

I’m a new Spark contributor who put in a pull request a few days ago: 
https://github.com/apache/spark/pull/17459 


It’s a relatively small, isolated change that should be pretty simple to 
review. It has been a big help in the main project I’m working on 
(https://github.com/hail-is/hail) so I wanted to contribute 
it back to main Spark. It’s been a few days though, and I haven’t had my branch 
cleared to run tests or any acknowledgement of it at all. Is there any process to 
ask someone to review your PR or get it assigned to someone? I’m afraid it’s 
just going to slowly sink down onto later and later pages in the PR list until 
it’s too deep for anyone to be expected to find otherwise. 

Best, 

John

Re: planning & discussion for larger scheduler changes

2017-03-31 Thread Tom Graves
Filed SPARK-20178 (Improve Scheduler fetch failures):
https://issues.apache.org/jira/browse/SPARK-20178

Tom 

On Thursday, March 30, 2017 1:21 PM, Tom Graves  
wrote:
 

If we are worried about major changes destabilizing current code (which I can
understand), the only way around that is to make them pluggable or configurable.
For major changes, making them pluggable seems cleaner from a code-clutter point
of view, but it also means you may have to make the same or similar change in
two places. We could make the interfaces better defined, but if the major
changes would require interface changes, that doesn't help. It still seems like
if we had a list of things we would like to accomplish and an idea of the rough
overall design, we could see whether defining the interfaces better or making
them pluggable would help.

There seem to be three JIRAs all related to handling fetch failures:
https://issues.apache.org/jira/browse/SPARK-20091, SPARK-14649, and SPARK-19753.
It might be nice to create one epic JIRA where we think about a design as a
whole and discuss that more. Any objections to this? If not, I'll create an epic
and link the others to it.

Tom
On Monday, March 27, 2017 9:01 PM, Kay Ousterhout  
wrote:
 

(1) I'm pretty hesitant to merge these larger changes, even if they're feature
flagged, because:

(a) For some of these changes, it's not obvious that they'll always improve
performance. E.g., for SPARK-14649, it's possible that the tasks that got
re-started (and temporarily are running in two places) are going to fail in the
first attempt (because they haven't read the missing map output yet). In that
case, not re-starting them will lead to worse performance.

(b) The scheduler already has some secret flags that aren't documented and are
used by only a few people. I'd like to avoid adding more of these (e.g., by
merging these features but having them off by default), because very few users
use them (since it's hard to learn about them), they add complexity to the
scheduler that we have to maintain, and for users who are considering using
them, they often hide advanced behavior that's hard to reason about anyway
(e.g., the point above for SPARK-14649).

(c) The worst performance problem is when jobs just hang or crash; we've seen a
few cases of that in recent bugs, and I'm worried that merging these complex
performance improvements trades better performance in a small number of cases
for the possibility of worse performance via job crashes/hangs in other cases.

Roughly, I think our standard for merging performance fixes to the scheduler
should be that the performance improvement either (a) is simple / easy to
reason about or (b) unambiguously fixes a serious performance problem. In the
case of SPARK-14649, for example, it is complex, and improves performance in
some cases but hurts it in others, so it doesn't fit either (a) or (b).

(2) I do think there are some scheduler re-factorings that would improve
testability and our ability to reason about correctness, but I think there are
some somewhat surgical, smaller things we could do in the vein of Imran's
comment about reducing shared state. Right now we have these super wide
interfaces between different components of the scheduler, and it means you have
to reason about the TSM, TSI, CGSB, and DAGScheduler to figure out whether
something works. I think we could have an effort to make each component have a
much narrower interface, so that each part hides a bunch of complexity from
other components. The most obvious place to do this in the short term is to
remove a bunch of info tracking from the DAGScheduler; I filed a JIRA for that
here. I suspect there are similar things that could be done in other parts of
the scheduler.

Tom's comments re: (2) are more about performance improvements rather than
readability / testability / debuggability, but they also seem important and it
does seem useful to have a JIRA tracking them.
-Kay
On Mon, Mar 27, 2017 at 11:06 AM, Tom Graves  wrote:

1) I think this depends on the individual JIRA, case by case. I haven't looked
in detail at SPARK-14649; it seems much larger, although more the way I think we
want to go, while SPARK-13669 seems less risky and easily configurable.

2) I don't know whether it needs an entire rewrite, but I think there need to be
some major changes made, especially in the handling of reduces and fetch
failures. We could do a much better job of not throwing away existing work and
of handling the failure cases more optimally. For this, would it make sense for
us to start with a JIRA that has a bullet list of things we would like to
improve, to get a more cohesive view and see how invasive it would really be?
Tom 

On Friday, March 24, 2017 10:41 AM, Imran Rashid  

Re: [Important for PySpark Devs]: Master now tests with Python 2.7 rather than 2.6 - please retest any Python PRs

2017-03-31 Thread Jeff Zhang
Thanks, retriggered several PySpark PRs.



Hyukjin Kwon wrote on Thursday, March 30, 2017 at 7:42 AM:

> Thank you for informing this.
>
> On 30 Mar 2017 3:52 a.m., "Holden Karau"  wrote:
>
> Hi PySpark Developers,
>
> In https://issues.apache.org/jira/browse/SPARK-19955 /
> https://github.com/apache/spark/pull/17355, as part of our continued
> Python 2.6 deprecation https://issues.apache.org/jira/browse/SPARK-15902
> & eventual removal https://issues.apache.org/jira/browse/SPARK-12661 ,
> Jenkins master will now test with Python 2.7 rather than Python 2.6. If you
> have a pending Python PR please re-run Jenkins tests prior to merge to
> avoid issues.
>
> For your local testing, *make sure you have a version of Python 2.7
> installed on your machine*; otherwise it will default to using the python
> executable, and in the future you may run into compatibility issues.
>
> Note: this only impacts master and has not been merged to other branches,
> so if you want to make fixes that are planned to be backported to 2.1,
> please continue to use 2.6-compatible Python code (and note you can always
> explicitly set a python version to be run with the --python-executables
> flag when testing locally).
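
To illustrate what "2.6-compatible" means in practice, here are a few
constructs (examples of mine, not from the thread) that require Python 2.7+,
alongside their 2.6-safe equivalents:

    # Requires Python 2.7+:
    squares = {n: n * n for n in range(5)}           # dict comprehensions
    evens = {n for n in range(10) if n % 2 == 0}     # set comprehensions
    msg = "{} tests ran on {}".format(4, "Jenkins")  # auto-numbered format fields

    # Python 2.6-compatible equivalents:
    squares = dict((n, n * n) for n in range(5))
    evens = set(n for n in range(10) if n % 2 == 0)
    msg = "{0} tests ran on {1}".format(4, "Jenkins")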
>
> Cheers,
>
> Holden :)
>
> P.S.
>
> If you run into any issues around this, please feel free (as always) to
> reach out and ping me.
>
> --
> Cell : 425-233-8271 <(425)%20233-8271>
> Twitter: https://twitter.com/holdenkarau
>
>


Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

2017-03-31 Thread Yu Wei
Got that.


Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux


From: Timothy Chen 
Sent: Friday, March 31, 2017 11:33:42 AM
To: Yu Wei
Cc: dev; us...@spark.apache.org
Subject: Re: [Spark on mesos] Spark framework not re-registered and lost after 
mesos master restarted

Hi Yu,

As mentioned earlier, currently the Spark framework will not
re-register, as failover_timeout is not set and there is no
configuration available for it yet. It's only enabled in
MesosClusterScheduler, since that is meant to be an HA framework.

We should add that configuration for users who want their Spark
frameworks to be able to fail over in case of master failover,
network disconnect, etc.
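
For context, the Mesos primitive involved is the failover_timeout field on
FrameworkInfo: a framework that registers with a non-zero timeout can
re-register with the master after a disconnect instead of being torn down. The
sketch below uses the Mesos Python bindings purely as an illustration of that
field (Spark's Mesos scheduler backends are Scala, and, as noted above, they
don't currently expose this setting for regular jobs):

    # Illustration only: how a Mesos framework opts into failover by setting
    # failover_timeout on its FrameworkInfo before registering with the master.
    from mesos.interface import mesos_pb2

    framework = mesos_pb2.FrameworkInfo()
    framework.user = ""                  # let Mesos fill in the current user
    framework.name = "example-framework"
    framework.failover_timeout = 3600.0  # seconds the master keeps the framework
                                         # around while it is disconnected
    framework.checkpoint = True          # let agents recover its tasks on restart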

Tim

On Thu, Mar 30, 2017 at 8:25 PM, Yu Wei  wrote:
> Hi Tim,
>
> I tested the scenario again with settings as below,
>
> [dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
> spark.deploy.recoveryMode  ZOOKEEPER
> spark.deploy.zookeeper.url 192.168.111.53:2181
> spark.deploy.zookeeper.dir /spark
> spark.executor.memory 512M
> spark.mesos.principal agent-dev-1
>
>
> However, the case still failed. After the master restarted, the spark
> framework did not re-register.
> From the spark framework log, it seemed that the method below in
> MesosClusterScheduler was not called:
> override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo):
> Unit
>
> Did I miss something? Any advice?
>
>
> Thanks,
>
> Jared, (韦煜)
> Software developer
> Interested in open source software, big data, Linux
>
>
>
> 
> From: Timothy Chen 
> Sent: Friday, March 31, 2017 5:13 AM
> To: Yu Wei
> Cc: us...@spark.apache.org; dev
> Subject: Re: [Spark on mesos] Spark framework not re-registered and lost
> after mesos master restarted
>
> I think failover isn't enabled on the regular Spark job framework, since we
> assume jobs are more ephemeral.
>
> It could be a good setting to add to the Spark framework to enable failover.
>
> Tim
>
> On Mar 30, 2017, at 10:18 AM, Yu Wei  wrote:
>
> Hi guys,
>
> I encountered a problem with spark on mesos.
>
> I setup mesos cluster and launched spark framework on mesos successfully.
>
> Then mesos master was killed and started again.
>
> However, the spark framework couldn't re-register again the way a mesos
> agent does. I also couldn't find any error logs.
>
> And MesosClusterDispatcher is still running there.
>
>
> I suspect this is spark framework issue.
>
> What's your opinion?
>
>
>
> Thanks,
>
> Jared, (韦煜)
> Software developer
> Interested in open source software, big data, Linux