Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread Matthew Farrellee

shane,

is there anything we should do for pull requests that failed, but for 
unrelated issues?


best,


matt

On 09/11/2014 11:29 AM, shane knapp wrote:

...and the restart is done.

On Thu, Sep 11, 2014 at 7:38 AM, shane knapp skn...@berkeley.edu wrote:


jenkins is now in quiet mode, and a restart is happening soon.

On Wed, Sep 10, 2014 at 3:44 PM, shane knapp skn...@berkeley.edu wrote:


that's kinda what we're hoping as well.  :)

On Wed, Sep 10, 2014 at 2:46 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:


I'm looking forward to this. :)

Looks like Jenkins is having trouble triggering builds for new commits
or after user requests (e.g.
https://github.com/apache/spark/pull/2339#issuecomment-55165937).
Hopefully that will be resolved tomorrow.

Nick

On Tue, Sep 9, 2014 at 5:00 PM, shane knapp skn...@berkeley.edu wrote:


since the power incident last thursday, the github pull request builder
plugin is still not really working 100%.  i found an open issue
w/jenkins[1] that could definitely be affecting us, so i will be pausing
builds early thursday morning and then restarting jenkins.  i'll send out
a reminder tomorrow, and if this causes any problems for you, please let
me know and we can work out a better time.

but, now for some good news!  yesterday morning, we racked and stacked
the systems for the new jenkins instance in the berkeley datacenter.
tomorrow i should be able to log in to them and start getting them set
up and configured.  this is a major step in getting us into a much more
'production' style environment!

anyways:  thanks for your patience, and i think we've all learned that
hard powering down your build system is a definite recipe for disaster.  :)

shane

[1] -- https://issues.jenkins-ci.org/browse/JENKINS-22509


Re: yet another jenkins restart early thursday morning -- 730am PDT (and a brief update on our new jenkins infra)

2014-09-11 Thread Matthew Farrellee
it was part of the review queue, but it looks like the runs have been 
gc'd. oh well!


best,


matt

On 09/11/2014 12:18 PM, shane knapp wrote:

you can just click on 'rebuild', if you'd like.  what project
specifically?  (i had forgotten that i'd killed
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/557/,
which i just started a rebuild on)

On Thu, Sep 11, 2014 at 9:15 AM, Matthew Farrellee m...@redhat.com wrote:

shane,

is there anything we should do for pull requests that failed, but
for unrelated issues?

best,


matt


Re: [VOTE] Release Apache Spark 1.1.0 (RC4)

2014-09-03 Thread Matthew Farrellee

+1

built from sha w/ make-distribution.sh
tested basic examples (0 data) w/ local on fedora 20 (openjdk 1.7, 
python 2.7.5)
tested detection and log processing (25GB data) w/ mesos (0.19.0) and
nfs on rhel 7 (openjdk 1.7, python 2.7.5)
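
for anyone wanting a sense of the basic-examples pass, here's a minimal
local smoke test along those lines (a hypothetical sketch, not the actual
test run; the app name, data, and assertion are illustrative):

    # hypothetical pyspark smoke test against a local master
    from pyspark import SparkContext

    sc = SparkContext("local", "rc-smoke-test")
    rdd = sc.parallelize(range(1000))
    # doubled values are multiples of 4 exactly when the original was even,
    # so 500 of the 1000 inputs survive the filter
    assert rdd.map(lambda x: x * 2).filter(lambda x: x % 4 == 0).count() == 500
    sc.stop()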


On 09/03/2014 03:24 AM, Patrick Wendell wrote:

Please vote on releasing the following candidate as Apache Spark version 1.1.0!

The tag to be voted on is v1.1.0-rc4 (commit 2f9b2bd):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=2f9b2bd7844ee8393dc9c319f4fefedf95f5e460

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-rc4/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1031/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-rc4-docs/

Please vote on releasing this package as Apache Spark 1.1.0!

The vote is open until Saturday, September 06, at 08:30 UTC and passes if
a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.1.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== Regressions fixed since RC3 ==
SPARK-3332 - Issue with tagging in EC2 scripts
SPARK-3358 - Issue with regression for m3.XX instances

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with
previous votes, so -1 votes should only occur for significant
regressions from 1.0.2. Bugs already present in 1.0.X will not block
this release.

== What default changes should I be aware of? ==
1. The default value of spark.io.compression.codec is now snappy
-- Old behavior can be restored by switching to lzf

2. PySpark now performs external spilling during aggregations.
-- Old behavior can be restored by setting spark.shuffle.spill to false.

3. PySpark uses a new heuristic for determining the parallelism of
shuffle operations.
-- Old behavior can be restored by setting
spark.default.parallelism to the number of cores in the cluster.
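
For example, a minimal PySpark sketch of restoring all three old defaults
at once (the config keys are the ones listed above; the app name and
parallelism value are illustrative):

    # Sketch: reverting to the 1.0.x behavior described above.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("legacy-defaults")             # illustrative name
            .set("spark.io.compression.codec", "lzf")  # 1. snappy -> lzf
            .set("spark.shuffle.spill", "false")       # 2. no external spilling
            .set("spark.default.parallelism", "16"))   # 3. e.g. cores in the cluster

    sc = SparkContext(conf=conf)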


Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Matthew Farrellee
that's not a bad idea. it would also break the circular dep in versions 
that results in spark X's ec2 script installing spark X-1 by default.


best,


matt

On 09/03/2014 01:17 PM, Shivaram Venkataraman wrote:

The spark-ec2 repository isn't a part of Mesos. Back in the day, Spark
used to be hosted in the Mesos github organization as well, and so we put
the scripts that were used by Spark under the same organization.

FWIW I don't think these scripts belong in the Spark repository. They are
helper scripts that set up EC2 clusters with different components like
HDFS, Spark, Tachyon, etc. Also, one of the motivations for creating this
repository was the ability to change these scripts without requiring a
new Spark release or a new AMI.

We can move the repository to a different github organization like AMPLab
if that makes sense.

Thanks
Shivaram


On Wed, Sep 3, 2014 at 10:06 AM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:


Spawned by this discussion
https://github.com/apache/spark/pull/1120#issuecomment-54305831.

See these 2 lines in spark_ec2.py:

- spark_ec2 L42:
  https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L42
- spark_ec2 L566:
  https://github.com/apache/spark/blob/6a72a36940311fcb3429bd34c8818bc7d513115c/ec2/spark_ec2.py#L566

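For reference, the two lines look roughly like this (a paraphrased sketch
of spark_ec2.py at that commit, not an exact copy -- the constant name,
branch tags, and call shape are approximations):

    # ~L42: the AMI list is resolved from the mesos/spark-ec2 repository
    AMI_PREFIX = "https://raw.github.com/mesos/spark-ec2/v2/ami-list"

    # ~L566: cluster setup clones the helper scripts from the mesos github org
    ssh(master, opts, "rm -rf spark-ec2"
        " && git clone https://github.com/mesos/spark-ec2.git -b v3")
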
Why does the spark-ec2 script depend on stuff in the Mesos repo? Should
they be moved to the Spark repo?

Nick


Re: spark-ec2 depends on stuff in the Mesos repo

2014-09-03 Thread Matthew Farrellee
oh, i see pwendell did a patch to the release branch to make the
release version == --spark-version default


best,


matt

On 09/03/2014 01:30 PM, Shivaram Venkataraman wrote:

Actually the circular dependency isn't caused by the spark-ec2 scripts
-- the scripts contain download links to many Spark versions, and you
can configure which one should be used.

Shivaram


On Wed, Sep 3, 2014 at 10:22 AM, Matthew Farrellee m...@redhat.com wrote:

that's not a bad idea. it would also break the circular dep in
versions that results in spark X's ec2 script installing spark X-1
by default.

best,


matt


Re: Ask something about spark

2014-09-03 Thread Matthew Farrellee

reynold,

would you folks be willing to put some creative commons license 
information on the site and its content?


best,


matt

On 09/02/2014 06:32 PM, Reynold Xin wrote:

I think in general that is fine. It would be great if your slides come with
proper attribution.


On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee phoenixl...@gmail.com wrote:


Hi, I am phoenixlee, a Spark programmer in Korea.

I have a good opportunity this time to teach Spark to college students
and office workers. This course will be done with the support of the
government. Can I use the material (pictures, samples, etc.) from the
Spark homepage for this course? Of course, I will include a note of
thanks and the webpage URL. It would be a good opportunity, since I
found that there are still no Spark teaching materials or education
(or community) in Korea.

Thanks.


Re: Ask something about spark

2014-09-03 Thread Matthew Farrellee

CC or Apache, it'd be helpful to have the license listed in the footer of the pages

best,


matt

On 09/03/2014 02:23 PM, Reynold Xin wrote:

I am not sure if I can just go ahead and update the website with a
creative commons license.

IIRC, ASF websites are also under the Apache 2.0 license. Might need
somebody from legal to chime in.


On Wed, Sep 3, 2014 at 11:15 AM, Matthew Farrellee m...@redhat.com wrote:

reynold,

would you folks be willing to put some creative commons license
information on the site and its content?

best,


matt


Re: Handling stale PRs

2014-08-26 Thread Matthew Farrellee

On 08/26/2014 04:57 AM, Sean Owen wrote:

On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell pwend...@gmail.com wrote:

Most other ASF projects I know just ignore these patches. I'd prefer if we


Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark
is blessed/cursed with incredible inbound load, but would love to
still see the project get this right-er than, say, Hadoop.


totally agree, this applies to patches as well as jiras. i'll add that
projects that let things simply linger are missing an opportunity to
engage their community.

spark should capitalize on its momentum to build a smoothly running
community (vs. accepting an unbounded backlog as inevitable).




The more important thing, maybe, is how we want to deal with this
culturally. And I think we need to do a better job of making sure no pull
requests go unattended (i.e. waiting for committer feedback). If patches go
stale, it should be because the user hasn't responded, not us.


Stale JIRAs are a symptom, not a problem per se. I also want to see
the backlog cleared, but automatically closing doesn't help, if the
problem is too many JIRAs and not enough committer-hours to look at
them. Some noise gets closed, but some easy or important fixes may
disappear as well.


engagement in the community really needs to go both ways. it's
reasonable for PRs that stop merging or have open comments that need
resolution by the PR author to be loudly timed out. a similar thing goes
for jiras: if there's a request for more information to resolve a bug and
that information does not appear, half of the communication is gone and a
loud time out is reasonable.

easy and important are in the eyes of the beholder. timeouts can go both
ways. a jira or pr that has been around for a period of time (say 1/3 of
the to-close timeout) should be bumped up for evaluation, hopefully
resulting in few easy or important issues falling through the cracks.


fyi, i'm periodically going through the pyspark jiras, trying to
reproduce issues, coalesce duplicates, and ask for details. i've not
been given any sort of permission to do this, and i don't have any
special position in the community to do this - in a well-functioning
community everyone should feel free to jump in and help.




Another thing is that we should, IMO, err on the side of explicitly saying
no or not yet to patches, rather than letting them linger without
attention. We do get patches where the user is well intentioned, but it is


Completely agree. The solution is partly more supply of committer time
on JIRAs. But that is detracting from the work the committers
themselves want to do. More of the solution is reducing demand by
helping people create useful, actionable, non-duplicate JIRAs from the
start. Or encouraging people to resolve existing JIRAs and shepherding
those in.


saying no/not-yet is a vitally important piece of information.



Elsewhere, I've found people reluctant to close JIRAs for fear of
offending or turning off contributors. I think the opposite is true.
There is nothing wrong with "no" or "not now", especially accompanied
with constructive feedback. Better to state for the record what is not
being looked at and why, than let people work on and open the same
JIRAs repeatedly.


well stated!



I have also found in the past that a culture of tolerating eternal
JIRAs led people to file JIRAs in order to forget about a problem --
it's in JIRA, and it's in progress, so it feels like someone else is
going to fix it later and so it can be forgotten now.


there's some value in these now-i-can-forget jiras, though i'm not
personally a fan. it can be good to keep them around and reachable by
search, but they should be clearly marked as no/not-yet or something
similar.




For what it's worth, I think these project and culture mechanics are
so important and it's my #1 concern for Spark at this stage. This
challenge exists so much more here, exactly because there is so much
potential. I'd love to help by trying to identify and close stale
JIRAs but am afraid that tagging them is just adding to the heap of
work.


+1 concern and potential!


best,


matt
