Re: Incubator PMC/Board report for Jan 2014 ([ppmc])

2013-12-31 Thread Henry Saputra
Thanks Matei, will try to solicit help for our JIRA import.

Oh, and happy new year to the Spark dev community :)

- Henry



Re: Incubator PMC/Board report for Jan 2014 ([ppmc])

2013-12-31 Thread Matei Zaharia
Yup, I’ll write it up soon.

The main thing missing for us is unfortunately still the JIRA import — let’s 
see if we can get some help with that, since I think we have at least one 
version that is importable. If you know anyone who can do this please point 
them to https://issues.apache.org/jira/browse/INFRA-6419.

Matei




Re: Disallowing null mergeCombiners

2013-12-31 Thread Reynold Xin
I added the option that doesn't require the caller to specify the
mergeCombiner closure a while ago when I wanted to disable mapSideCombine.
In virtually all use cases I know of, it is fine & easy to specify a
mergeCombiner, so I'm all for this given it simplifies the codebase.




Disallowing null mergeCombiners

2013-12-31 Thread Patrick Wendell
Hey All,

There is a small API change that we are considering for the external
sort patch. Previously we allowed mergeCombiner to be null when map
side aggregation was not enabled. This is because it wasn't necessary
in that case since mappers didn't ship pre-aggregated values to
reducers.

Because the external sort capability also relies on the mergeCombiner
function to merge partially-aggregated on-disk segments, we now need
it all the time, even if map side aggregation is not enabled. Passing a
null mergeCombiner is a fairly esoteric thing that I'm not sure anyone
other than Shark ever used, but I want to check in case anyone has
feelings about this.

The relevant code is here:

https://github.com/apache/incubator-spark/pull/303/files#diff-f70e97c099b5eac05c75288cb215e080R72

- Patrick
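
For readers following the mergeCombiner discussion above, here is a minimal
sketch in plain Python (not Spark code) of why an aggregation that spills
needs a function to merge combiners, whether or not map side aggregation is
used. All names here (create_combiner, merge_value, merge_combiners,
aggregate_with_spills, max_keys_in_memory) are hypothetical and chosen for
illustration. The sketch also assumes the "spilled" segments can stay in
memory as dicts, whereas the patch under discussion merges on-disk segments.

```python
# Hypothetical stand-ins for the three functions an aggregation is given.
# The combiner here is a (sum, count) pair, so it differs from the value type.
def create_combiner(value):
    return (value, 1)


def merge_value(combiner, value):
    total, count = combiner
    return (total + value, count + 1)


def merge_combiners(left, right):
    return (left[0] + right[0], left[1] + right[1])


def aggregate_with_spills(records, max_keys_in_memory):
    """Aggregate (key, value) records, spilling the in-memory map whenever
    a new key arrives while it already holds max_keys_in_memory keys."""
    spills = []   # stands in for on-disk segments (assumption of this sketch)
    current = {}
    for key, value in records:
        if key in current:
            current[key] = merge_value(current[key], value)
        else:
            if len(current) >= max_keys_in_memory:
                spills.append(current)
                current = {}
            current[key] = create_combiner(value)
    spills.append(current)

    # The same key may hold a partial combiner in several segments.
    # Only merge_combiners can join two partial combiners into one.
    merged = {}
    for segment in spills:
        for key, combiner in segment.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged
```

With records [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("b", 5)] and
max_keys_in_memory=2, the keys "a" and "b" each end up in two segments, and
the final pass returns {"a": (5, 2), "b": (7, 2), "c": (3, 1)}. If
merge_combiners were null, that final pass would have no way to combine the
partial results.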


Re: Incubator PMC/Board report for Jan 2014 ([ppmc])

2013-12-31 Thread Henry Saputra
Any volunteer to write up the report for Spark?

- Henry

On Wed, Dec 25, 2013 at 6:15 AM, Marvin wrote:
>
>
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache Incubator
> PMC. It is an initial reminder to give you plenty of time to prepare your
> quarterly board report.
>
> The board meeting is scheduled for Wed, 15 January 2014, 10:30:30:00 PST. The
> report for your podling will form a part of the Incubator PMC report. The
> Incubator PMC requires your report to be submitted 2 weeks before the board
> meeting, to allow sufficient time for review and submission (Wed, Jan 1st).
>
> Please submit your report with sufficient time to allow the Incubator PMC, and
> subsequently board members, to review and digest it. Again, the very latest you
> should submit your report is 2 weeks prior to the board meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
> --
>
> Your report should contain the following:
>
>  * Your project name
>  * A brief description of your project, which assumes no knowledge of the
>    project or necessarily of its field
>  * A list of the three most important issues to address in the move towards
>    graduation.
>  * Any issues that the Incubator PMC or ASF Board might wish/need to be aware
>    of
>  * How has the community developed since the last report
>  * How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
>   http://wiki.apache.org/incubator/January2014
>
> Note: This is manually populated. You may need to wait a little before this
>   page is created from a template.
>
> Mentors
> ---
> Mentors should review reports for their project(s) and sign them off on the
> Incubator wiki page. Signing off reports shows that you are following the
> project - projects that are not signed may raise alarms for the Incubator PMC.
>
> Incubator PMC
>


Re: Spark graduate project ideas

2013-12-31 Thread Reynold Xin
There was a recent discussion about academic projects on Spark.

Take a look at the replies to that email (unfortunately you have to dig
through the archive to find them):
http://mail-archives.apache.org/mod_mbox/spark-dev/201312.mbox/%3CCAHH8_ON-2y69fBfVtt6pngWtEPOZdsmvt4hZ=doe-dzsk6k...@mail.gmail.com%3E



On Wed, Dec 25, 2013 at 5:21 AM, Фёдор Короткий wrote:

> Hi,
>
> Currently I'm pursuing a master's degree in CS and I'm looking for a topic
> for my year project (in the distributed systems field), and Spark seems very
> interesting to me.
>
> Can you suggest some problems or ideas to work on?
>
> By the way, what is the status of external sorting
> (https://spark-project.atlassian.net/browse/SPARK-983)?
>