Re: [Licensing check] Spark 0.8.0-incubating RC1

2013-09-06 Thread Mattmann, Chris A (398J)
Guys, there are GitHub hooks set up by Jukka Zitting and others
in ASF infra that will monitor the ASF mirror on GitHub and then
bring pull requests back to the list as emails with links to the patches
for inclusion.

Please contact the infra folks if you have questions about getting it set
up; it may require an INFRA ticket.

HTH!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Henry Saputra 
Reply-To: "dev@spark.incubator.apache.org" 
Date: Tuesday, September 3, 2013 6:18 PM
To: "dev@spark.incubator.apache.org" 
Subject: Re: [Licensing check] Spark 0.8.0-incubating RC1

>So it looks like we need to manually resolve the GitHub pull requests.
>
>Or does GitHub automatically know that a particular merge to the ASF git
>repo is associated with a GitHub pull request?
>
>- Henry
>
>
>On Tue, Sep 3, 2013 at 1:38 PM, Matei Zaharia
>wrote:
>
>> Yup, the plan is as follows:
>>
>> - Make pull request against the mirror
>> - Code review on GitHub as usual
>> - Whoever merges it will simply merge it into the main Apache repo; when
>> this propagates, the PR will be marked as merged
>>
>> I found at least one other Apache project that did this:
>> http://wiki.apache.org/cordova/ContributorWorkflow.
>>
>> Matei
>>
>> On Sep 3, 2013, at 10:39 AM, Mark Hamstra 
>>wrote:
>>
>> > What is going to be the process for making pull requests?  Can they be
>> made
>> > against the github mirror (https://github.com/apache/incubator-spark),
>> or
>> > must we use some other way?
>> >
>> >
>> > On Tue, Sep 3, 2013 at 10:28 AM, Matei Zaharia
>>> >wrote:
>> >
>> >> Hi guys,
>> >>
>> >>> So are you planning to release 0.8 from the master branch (which is
>>at
>> >>> a106ed8... now) or from branch-0.8?
>> >>
>> >> Right now the branches are the same in terms of content (though I
>>might
>> >> not have merged the latest changes into 0.8). If we add stuff into
>> master
>> >> that we won't want in 0.8 we'll break that.
>> >>
>> >>> My recommendation is that we start to use the Incubator release
>> >> doc/guide:
>> >>>
>> >>> http://incubator.apache.org/guides/releasemanagement.html
>> >>
>> >> Cool, thanks for the pointer. I'll try to follow the steps there
>>about
>> >> signing.
>> >>
>> >>> Are we "locking" pull requests to github repo by tomorrow?
>> >>> Meaning no more push to GitHub repo for Spark.
>> >>>
>> >>> From your email seems like there will be more potential pull
>>requests
>> for
>> >>> github repo to be merged back to ASF Git repo.
>> >>
>> >> We'll probably use the GitHub repo for the last few changes in this
>> >> release and then switch. The reason is that there's a bit of work to do
>> >> before we can take pull requests against the Apache one.
>> >>
>> >> Matei
>>
>>



Re: Needs a matrix library

2013-09-06 Thread Mattmann, Chris A (398J)
BSD is compatible with ALv2 per:

http://www.apache.org/legal/3party.html#category-a


It's a Category A.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Adam Estrada 
Reply-To: "d...@sis.apache.org" 
Date: Friday, September 6, 2013 8:41 PM
To: "d...@sis.apache.org" ,
"shiva...@eecs.berkeley.edu" 
Cc: "dev@spark.incubator.apache.org" 
Subject: Re: Needs a matrix library

>+1 to jblas. It has a BSD license though so it might not be compatible
>with
>the Apache v2 license. Anyone else want to weigh in on that?
>
>Adam
>
>
>On Fri, Sep 6, 2013 at 8:26 PM, Shivaram Venkataraman <
>shiva...@eecs.berkeley.edu> wrote:
>
>> For the machine learning library that is a part of Spark 0.8 we have
>>been
>> using jblas for local matrix operations. From some limited benchmarking
>> that we did, jblas is not much slower than optimized C++ libraries.
>> 
>>http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html
>>has some more details.
>>
>> For more complex operations than addition and multiplication,
>>mahout-math
>> is a pretty good library. There was a great discussion on pros/cons of
>> different Java/Scala-based matrix libraries in
>> https://github.com/mesos/spark/pull/736
>>
>> Thanks
>> Shivaram
>>
>>
>> On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin 
>>wrote:
>>
>> > They are asking about dedicated matrix libraries.
>> >
>> > Neither GraphX nor Giraph are matrix libraries. These are systems that
>> > handle large scale graph processing, which could possibly be modeled
>>as
>> > matrix computations.  Hama looks like a BSP framework, so I am not
>>sure
>> if
>> > it has anything to do with matrix library either.
>> >
>> > For very small matrices (3x3, 4x4), the cost of going through jni to
>>do
>> > native matrix operations will likely dominate the computation itself,
>>so
>> > you are probably better off with a simple unrolled for loop in Java.
>> >
>> > I haven't looked into this myself, but I heard mahout-math is a decent
>> > library.
>> >
>> > --
>> > Reynold Xin, AMPLab, UC Berkeley
>> > http://rxin.org
>> >
>> >
>> >
>> > On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
>> > wrote:
>> >
>> > > keep forgetting this: what is graphx release roadmap?
>> > >
>> > > On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
>> > wrote:
>> > > > Would it be more logical to use GraphX ?
>> > > >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
>> > > >
>> > > > Cos
>> > > >
>> > > > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
>> > > >> Thanks Roman, I was thinking Giraph too (knew it supported graphs
>> but
>> > > >> wasn't sure it supported matrices). If Giraph supports matrices,
>>big
>> > +1.
>> > > >>
>> > > >> Cheers,
>> > > >> Chris
>> > > >>
>> > > >> 
>>++
>> > > >> Chris Mattmann, Ph.D.
>> > > >> Senior Computer Scientist
>> > > >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > > >> Office: 171-266B, Mailstop: 171-246
>> > > >> Email: chris.a.mattm...@nasa.gov
>> > > >> WWW:  http://sunset.usc.edu/~mattmann/
>> > > >> 
>>++
>> > > >> Adjunct Assistant Professor, Computer Science Department
>> > > >> University of Southern California, Los Angeles, CA 90089 USA
>> > > >> 
>>++
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> -Original Message-
>> > > >> From: Roman Shaposhnik 
>> > > >> Date: Friday, September 6, 2013 2:00 PM
>> > > >> To: 
>> > > >> Cc: "d...@sis.apache.org" 
>> > > >> Subject: Re: Needs a matrix library
>> > > >>
>> > > >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
>> > > >> > wrote:
>> > > >> >> Hey Martin,
>> > > >> >>
>> > > >> >> We may seriously consider using either Apache Hama here (which
>> will
>> > > >> >> bring in Hadoop):
>> > > >> >
>> > > >> >On that note I'd highly recommend taking a look at Apache Giraph
>> > > >> >as well: http://giraph.apache.org/
>> > > >> >
>> > > >> >Thanks,
>> > > >> >Roman.
>> > > >> >
>> > > >>
>> > >
>> >
>>



Re: Needs a matrix library

2013-09-06 Thread Adam Estrada
+1 to jblas. It has a BSD license though so it might not be compatible with
the Apache v2 license. Anyone else want to weigh in on that?

Adam


On Fri, Sep 6, 2013 at 8:26 PM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:

> For the machine learning library that is a part of Spark 0.8 we have been
> using jblas for local matrix operations. From some limited benchmarking
> that we did, jblas is not much slower than optimized C++ libraries.
> http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html
> has some more details.
>
> For more complex operations than addition and multiplication, mahout-math
> is a pretty good library. There was a great discussion on pros/cons of
> different Java/Scala-based matrix libraries in
> https://github.com/mesos/spark/pull/736
>
> Thanks
> Shivaram
>
>
> On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:
>
> > They are asking about dedicated matrix libraries.
> >
> > Neither GraphX nor Giraph are matrix libraries. These are systems that
> > handle large scale graph processing, which could possibly be modeled as
> > matrix computations.  Hama looks like a BSP framework, so I am not sure
> if
> > it has anything to do with matrix library either.
> >
> > For very small matrices (3x3, 4x4), the cost of going through jni to do
> > native matrix operations will likely dominate the computation itself, so
> > you are probably better off with a simple unrolled for loop in Java.
> >
> > I haven't looked into this myself, but I heard mahout-math is a decent
> > library.
> >
> > --
> > Reynold Xin, AMPLab, UC Berkeley
> > http://rxin.org
> >
> >
> >
> > On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
> > wrote:
> >
> > > keep forgetting this: what is graphx release roadmap?
> > >
> > > On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
> > wrote:
> > > > Would it be more logical to use GraphX ?
> > > >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
> > > >
> > > > Cos
> > > >
> > > > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
> > > >> Thanks Roman, I was thinking Giraph too (knew it supported graphs
> but
> > > >> wasn't sure it supported matrices). If Giraph supports matrices, big
> > +1.
> > > >>
> > > >> Cheers,
> > > >> Chris
> > > >>
> > > >> ++
> > > >> Chris Mattmann, Ph.D.
> > > >> Senior Computer Scientist
> > > >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > >> Office: 171-266B, Mailstop: 171-246
> > > >> Email: chris.a.mattm...@nasa.gov
> > > >> WWW:  http://sunset.usc.edu/~mattmann/
> > > >> ++
> > > >> Adjunct Assistant Professor, Computer Science Department
> > > >> University of Southern California, Los Angeles, CA 90089 USA
> > > >> ++
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> -Original Message-
> > > >> From: Roman Shaposhnik 
> > > >> Date: Friday, September 6, 2013 2:00 PM
> > > >> To: 
> > > >> Cc: "d...@sis.apache.org" 
> > > >> Subject: Re: Needs a matrix library
> > > >>
> > > >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> > > >> > wrote:
> > > >> >> Hey Martin,
> > > >> >>
> > > >> >> We may seriously consider using either Apache Hama here (which
> will
> > > >> >> bring in Hadoop):
> > > >> >
> > > >> >On that note I'd highly recommend taking a look at Apache Giraph
> > > >> >as well: http://giraph.apache.org/
> > > >> >
> > > >> >Thanks,
> > > >> >Roman.
> > > >> >
> > > >>
> > >
> >
>


Re: Needs a matrix library

2013-09-06 Thread Adam Estrada
I agree with that sentiment, Dr. Mattmann! It would be extremely cool to
see the distributed computation communities (eg. Spark and Hadoop) take
advantage of SIS. This is especially true for processing geospatial vector
data. Geospatial raster data is very splittable which makes it ideal for
this type of batch processing. Vector data is another beast all together
and I encourage folks in the aforementioned communities to think about how
to do this. I certainly have ideas and am all ears if someone would like to
chat about it!

Regards,
Adam


On Fri, Sep 6, 2013 at 9:41 PM, Mattmann, Chris A (398J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Thanks guys, just sharing a need here. SIS is a fully Java-based
> geospatial library in development at Apache, aiming to support OGC
> standards. It would be great to figure out some synergy between Spark/Shark
> and SIS.
>
> Cheers,
> Chris
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -Original Message-
> From: Dmitriy Lyubimov 
> Reply-To: "dev@spark.incubator.apache.org"  >
> Date: Friday, September 6, 2013 5:25 PM
> To: "dev@spark.incubator.apache.org" 
> Cc: "d...@sis.apache.org" 
> Subject: Re: Needs a matrix library
>
> >On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:
> >> They are asking about dedicated matrix libraries.
> >
> >Ah. I did not read the quoted email. Not sure why Chris was talking
> >about Pregel stuff, that doesn't seem what that question was about.
> >
> >>
> >> Neither GraphX nor Giraph are matrix libraries. These are systems that
> >> handle large scale graph processing, which could possibly be modeled as
> >> matrix computations.  Hama looks like a BSP framework, so I am not sure
> >>if
> >> it has anything to do with matrix library either.
> >
> >+1
> >>
> >> For very small matrices (3x3, 4x4), the cost of going through jni to do
> >> native matrix operations will likely dominate the computation itself, so
> >> you are probably better off with a simple unrolled for loop in Java.
> >
> >+1 i guess this note is about JBlas and JBlas-based derivatives like
> >Breeze
> >
> >>
> >> I haven't looked into this myself, but I heard mahout-math is a decent
> >> library.
> >
> >+1 although for such tiny things like 3x3, 4x4  our cost-based
> >optimizations are probably not going to provide any noticeable bang.
> >Mahout in-core math is mostly for uniform cost-optimized support of
> >sparse vectors along with dense.
> >
> >Also, see if this makes sense: we are leaning towards committing these
> >Scala mappings to the current Mahout trunk [1]:
> >
> >[1]
> >
> http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in
> >-core-linear.html
> >
> >-Dmitriy
> >
> >>
> >> --
> >> Reynold Xin, AMPLab, UC Berkeley
> >> http://rxin.org
> >>
> >>
> >>
> >> On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
> >>wrote:
> >>
> >>> keep forgetting this: what is graphx release roadmap?
> >>>
> >>> On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
> >>>wrote:
> >>> > Would it be more logical to use GraphX ?
> >>> >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
> >>> >
> >>> > Cos
> >>> >
> >>> > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
> >>> >> Thanks Roman, I was thinking Giraph too (knew it supported graphs
> >>>but
> >>> >> wasn't sure it supported matrices). If Giraph supports matrices,
> >>>big +1.
> >>> >>
> >>> >> Cheers,
> >>> >> Chris
> >>> >>
> >>> >> ++
> >>> >> Chris Mattmann, Ph.D.
> >>> >> Senior Computer Scientist
> >>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >>> >> Office: 171-266B, Mailstop: 171-246
> >>> >> Email: chris.a.mattm...@nasa.gov
> >>> >> WWW:  http://sunset.usc.edu/~mattmann/
> >>> >> ++
> >>> >> Adjunct Assistant Professor, Computer Science Department
> >>> >> University of Southern California, Los Angeles, CA 90089 USA
> >>> >> ++
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >> -Original Message-
> >>> >> From: Roman Shaposhnik 
> >>> >> Date: Friday, September 6, 2013 2:00 PM
> >>> >> To: 
> >>> >> Cc: "d...@sis.apache.org" 
> >>> >> Subject: Re: Needs a matrix library
> >>> >>
> >>> >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> >>> >> > wrote:
> >>> >> >> Hey Martin,
> >>> >> >>
> >>> >> >> We may seriously consider using either Apache Hama here (which
> >>> >> >> will bring in Hadoop):

Re: Needs a matrix library

2013-09-06 Thread Mattmann, Chris A (398J)
Thanks guys, just sharing a need here. SIS is a fully Java-based
geospatial library in development at Apache, aiming to support OGC
standards. It would be great to figure out some synergy between Spark/Shark
and SIS.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Dmitriy Lyubimov 
Reply-To: "dev@spark.incubator.apache.org" 
Date: Friday, September 6, 2013 5:25 PM
To: "dev@spark.incubator.apache.org" 
Cc: "d...@sis.apache.org" 
Subject: Re: Needs a matrix library

>On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:
>> They are asking about dedicated matrix libraries.
>
>Ah. I did not read the quoted email. Not sure why Chris was talking
>about Pregel stuff, that doesn't seem what that question was about.
>
>>
>> Neither GraphX nor Giraph are matrix libraries. These are systems that
>> handle large scale graph processing, which could possibly be modeled as
>> matrix computations.  Hama looks like a BSP framework, so I am not sure
>>if
>> it has anything to do with matrix library either.
>
>+1
>>
>> For very small matrices (3x3, 4x4), the cost of going through jni to do
>> native matrix operations will likely dominate the computation itself, so
>> you are probably better off with a simple unrolled for loop in Java.
>
>+1 i guess this note is about JBlas and JBlas-based derivatives like
>Breeze
>
>>
>> I haven't looked into this myself, but I heard mahout-math is a decent
>> library.
>
>+1 although for such tiny things like 3x3, 4x4  our cost-based
>optimizations are probably not going to provide any noticeable bang.
>Mahout in-core math is mostly for uniform cost-optimized support of
>sparse vectors along with dense.
>
>Also, see if this makes sense: we are leaning towards committing these
>Scala mappings to the current Mahout trunk [1]:
>
>[1] 
>http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in
>-core-linear.html
>
>-Dmitriy
>
>>
>> --
>> Reynold Xin, AMPLab, UC Berkeley
>> http://rxin.org
>>
>>
>>
>> On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
>>wrote:
>>
>>> keep forgetting this: what is graphx release roadmap?
>>>
>>> On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
>>>wrote:
>>> > Would it be more logical to use GraphX ?
>>> >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
>>> >
>>> > Cos
>>> >
>>> > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
>>> >> Thanks Roman, I was thinking Giraph too (knew it supported graphs
>>>but
>>> >> wasn't sure it supported matrices). If Giraph supports matrices,
>>>big +1.
>>> >>
>>> >> Cheers,
>>> >> Chris
>>> >>
>>> >> ++
>>> >> Chris Mattmann, Ph.D.
>>> >> Senior Computer Scientist
>>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> >> Office: 171-266B, Mailstop: 171-246
>>> >> Email: chris.a.mattm...@nasa.gov
>>> >> WWW:  http://sunset.usc.edu/~mattmann/
>>> >> ++
>>> >> Adjunct Assistant Professor, Computer Science Department
>>> >> University of Southern California, Los Angeles, CA 90089 USA
>>> >> ++
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> -Original Message-
>>> >> From: Roman Shaposhnik 
>>> >> Date: Friday, September 6, 2013 2:00 PM
>>> >> To: 
>>> >> Cc: "d...@sis.apache.org" 
>>> >> Subject: Re: Needs a matrix library
>>> >>
>>> >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
>>> >> > wrote:
>>> >> >> Hey Martin,
>>> >> >>
>>> >> >> We may seriously consider using either Apache Hama here (which
>>>will
>>> >> >> bring in Hadoop):
>>> >> >
>>> >> >On that note I'd highly recommend taking a look at Apache Giraph
>>> >> >as well: http://giraph.apache.org/
>>> >> >
>>> >> >Thanks,
>>> >> >Roman.
>>> >> >
>>> >>
>>>



Re: Needs a matrix library

2013-09-06 Thread Mattmann, Chris A (398J)
Thank you Shivaram!

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Shivaram Venkataraman 
Reply-To: "dev@spark.incubator.apache.org"
, "shiva...@eecs.berkeley.edu"

Date: Friday, September 6, 2013 5:26 PM
To: "dev@spark.incubator.apache.org" 
Cc: "d...@sis.apache.org" 
Subject: Re: Needs a matrix library

>For the machine learning library that is a part of Spark 0.8 we have been
>using jblas for local matrix operations. From some limited benchmarking
>that we did, jblas is not much slower than optimized C++ libraries.
>http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html
>has
>some more details.
>
>For more complex operations than addition and multiplication, mahout-math
>is a pretty good library. There was a great discussion on pros/cons of
>different Java/Scala-based matrix libraries in
>https://github.com/mesos/spark/pull/736
>
>Thanks
>Shivaram
>
>
>On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:
>
>> They are asking about dedicated matrix libraries.
>>
>> Neither GraphX nor Giraph are matrix libraries. These are systems that
>> handle large scale graph processing, which could possibly be modeled as
>> matrix computations.  Hama looks like a BSP framework, so I am not sure
>>if
>> it has anything to do with matrix library either.
>>
>> For very small matrices (3x3, 4x4), the cost of going through jni to do
>> native matrix operations will likely dominate the computation itself, so
>> you are probably better off with a simple unrolled for loop in Java.
>>
>> I haven't looked into this myself, but I heard mahout-math is a decent
>> library.
>>
>> --
>> Reynold Xin, AMPLab, UC Berkeley
>> http://rxin.org
>>
>>
>>
>> On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
>> wrote:
>>
>> > keep forgetting this: what is graphx release roadmap?
>> >
>> > On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
>> wrote:
>> > > Would it be more logical to use GraphX ?
>> > >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
>> > >
>> > > Cos
>> > >
>> > > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
>> > >> Thanks Roman, I was thinking Giraph too (knew it supported graphs
>>but
>> > >> wasn't sure it supported matrices). If Giraph supports matrices,
>>big
>> +1.
>> > >>
>> > >> Cheers,
>> > >> Chris
>> > >>
>> > >> ++
>> > >> Chris Mattmann, Ph.D.
>> > >> Senior Computer Scientist
>> > >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> > >> Office: 171-266B, Mailstop: 171-246
>> > >> Email: chris.a.mattm...@nasa.gov
>> > >> WWW:  http://sunset.usc.edu/~mattmann/
>> > >> ++
>> > >> Adjunct Assistant Professor, Computer Science Department
>> > >> University of Southern California, Los Angeles, CA 90089 USA
>> > >> ++
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> -Original Message-
>> > >> From: Roman Shaposhnik 
>> > >> Date: Friday, September 6, 2013 2:00 PM
>> > >> To: 
>> > >> Cc: "d...@sis.apache.org" 
>> > >> Subject: Re: Needs a matrix library
>> > >>
>> > >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
>> > >> > wrote:
>> > >> >> Hey Martin,
>> > >> >>
>> > >> >> We may seriously consider using either Apache Hama here (which
>>will
>> > >> >> bring in Hadoop):
>> > >> >
>> > >> >On that note I'd highly recommend taking a look at Apache Giraph
>> > >> >as well: http://giraph.apache.org/
>> > >> >
>> > >> >Thanks,
>> > >> >Roman.
>> > >> >
>> > >>
>> >
>>



Re: Needs a matrix library

2013-09-06 Thread Mike
Why do we have spark.util.Vector?  Should it be replaced by jblas?
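[Editor's note: spark.util.Vector is a small pure-Scala convenience class for dense vectors (element-wise arithmetic, dot products), whereas jblas targets matrix work and can delegate to native BLAS, so the overlap is only partial. A rough Java sketch of the kind of operations such a utility class covers; class and method names here are illustrative, not Spark's actual API.]

```java
// A rough sketch (illustrative names, not Spark's actual API) of the small
// dense-vector operations that a utility class like spark.util.Vector covers.
public final class DenseVector {
    private final double[] values;

    public DenseVector(double... values) {
        this.values = values.clone();
    }

    // Element-wise sum; both vectors must have the same length.
    public DenseVector plus(DenseVector other) {
        if (values.length != other.values.length)
            throw new IllegalArgumentException("length mismatch");
        double[] out = new double[values.length];
        for (int i = 0; i < out.length; i++)
            out[i] = values[i] + other.values[i];
        return new DenseVector(out);
    }

    // Inner (dot) product.
    public double dot(DenseVector other) {
        if (values.length != other.values.length)
            throw new IllegalArgumentException("length mismatch");
        double sum = 0.0;
        for (int i = 0; i < values.length; i++)
            sum += values[i] * other.values[i];
        return sum;
    }

    public double get(int i) {
        return values[i];
    }
}
```

Whether to keep such a class or standardize on jblas likely comes down to whether the extra dependency (and its native-code path) is worth it for simple vector arithmetic.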


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Konstantin Boudnik
As I mentioned in my reply back to you, the current ASF git repo is about 4 days
behind GitHub, so I believe there's going to be another merge ;)

Cos

On Fri, Sep 06, 2013 at 03:55PM, Henry Saputra wrote:
> Hi Cos,
> 
> I just replied to your pull request to bump to 0.9-SNAPSHOT versioning =)
> 
> I think new development pull requests should be made against the ASF git
> repo. The repo is open for pull requests now.
> 
> - Henry
> 
> On Fri, Sep 6, 2013 at 3:03 PM, Konstantin Boudnik  wrote:
> > Guys,
> >
> > how about switching master to 0.9-SNAPSHOT, to avoid confusion with two
> > branches producing the same version of different artifacts?
> >
> > https://github.com/mesos/spark/pull/902
> >
> > Cos
> >
> > On Thu, Sep 05, 2013 at 08:08PM, Patrick Wendell wrote:
> >> Hey All,
> >>
> >> Matei asked me to pick this up because he's travelling this week. I
> >> cut a second release candidate from the head of the 0.8 branch (on
> >> mesos/spark gitub) to address the following issues:
> >>
> >> - RC is now hosted in an apache web space
> >> - RC now includes signature
> >> - RC now includes MD5 and SHA512 digests
> >>
> >> [tgz] 
> >> http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
> >> [all files] http://people.apache.org/~pwendell/spark-rc/
> >>
> >> It would be great to get feedback on the release structure. I also
> >> changed the name to include "src" since we will be releasing both
> >> source and binary releases.
> >>
> >> I was a bit confused about how to attach my GPG key to the spark.asc
> >> file. I took the following steps.
> >>
> >> 1. Created a GPG key locally
> >> 2. Distributed the key to public key servers (gpg --send-key)
> >> 3. Added the exported key to my apache web space:
> >> http://people.apache.org/~pwendell/9E4FE3AF.asc
> >> 4. Added the key fingerprint at id.apache.org
> >> 5. Created an apache FOAF file with the key signature
> >>
> >> However, this doesn't seem sufficient to get my key on this page (at
> >> least, not yet):
> >> http://people.apache.org/keys/group/spark.asc
> >>
> >> Chris - are there other steps I missed? Is there a manual way to
> >> augment this file?
> >>
> >> - Patrick


Re: Needs a matrix library

2013-09-06 Thread Shivaram Venkataraman
For the machine learning library that is a part of Spark 0.8 we have been
using jblas for local matrix operations. From some limited benchmarking
that we did, jblas is not much slower than optimized C++ libraries.
http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html has
some more details.

For more complex operations than addition and multiplication, mahout-math
is a pretty good library. There was a great discussion on pros/cons of
different Java/Scala-based matrix libraries in
https://github.com/mesos/spark/pull/736

Thanks
Shivaram


On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:

> They are asking about dedicated matrix libraries.
>
> Neither GraphX nor Giraph are matrix libraries. These are systems that
> handle large scale graph processing, which could possibly be modeled as
> matrix computations.  Hama looks like a BSP framework, so I am not sure if
> it has anything to do with matrix library either.
>
> For very small matrices (3x3, 4x4), the cost of going through jni to do
> native matrix operations will likely dominate the computation itself, so
> you are probably better off with a simple unrolled for loop in Java.
>
> I haven't looked into this myself, but I heard mahout-math is a decent
> library.
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov 
> wrote:
>
> > keep forgetting this: what is graphx release roadmap?
> >
> > On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik 
> wrote:
> > > Would it be more logical to use GraphX ?
> > >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
> > >
> > > Cos
> > >
> > > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
> > >> Thanks Roman, I was thinking Giraph too (knew it supported graphs but
> > >> wasn't sure it supported matrices). If Giraph supports matrices, big
> +1.
> > >>
> > >> Cheers,
> > >> Chris
> > >>
> > >> ++
> > >> Chris Mattmann, Ph.D.
> > >> Senior Computer Scientist
> > >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > >> Office: 171-266B, Mailstop: 171-246
> > >> Email: chris.a.mattm...@nasa.gov
> > >> WWW:  http://sunset.usc.edu/~mattmann/
> > >> ++
> > >> Adjunct Assistant Professor, Computer Science Department
> > >> University of Southern California, Los Angeles, CA 90089 USA
> > >> ++
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> -Original Message-
> > >> From: Roman Shaposhnik 
> > >> Date: Friday, September 6, 2013 2:00 PM
> > >> To: 
> > >> Cc: "d...@sis.apache.org" 
> > >> Subject: Re: Needs a matrix library
> > >>
> > >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> > >> > wrote:
> > >> >> Hey Martin,
> > >> >>
> > >> >> We may seriously consider using either Apache Hama here (which will
> > >> >> bring in Hadoop):
> > >> >
> > >> >On that note I'd highly recommend taking a look at Apache Giraph
> > >> >as well: http://giraph.apache.org/
> > >> >
> > >> >Thanks,
> > >> >Roman.
> > >> >
> > >>
> >
>


Re: Needs a matrix library

2013-09-06 Thread Dmitriy Lyubimov
On Fri, Sep 6, 2013 at 5:09 PM, Reynold Xin  wrote:
> They are asking about dedicated matrix libraries.

Ah. I did not read the quoted email. Not sure why Chris was talking
about Pregel stuff; that doesn't seem to be what the question was about.

>
> Neither GraphX nor Giraph are matrix libraries. These are systems that
> handle large scale graph processing, which could possibly be modeled as
> matrix computations.  Hama looks like a BSP framework, so I am not sure if
> it has anything to do with matrix library either.

+1
>
> For very small matrices (3x3, 4x4), the cost of going through jni to do
> native matrix operations will likely dominate the computation itself, so
> you are probably better off with a simple unrolled for loop in Java.

+1 i guess this note is about JBlas and JBlas-based derivatives like Breeze

>
> I haven't looked into this myself, but I heard mahout-math is a decent
> library.

+1, although for tiny sizes like 3x3 or 4x4 our cost-based
optimizations are probably not going to provide any noticeable bang.
Mahout's in-core math is mostly for uniform, cost-optimized support of
sparse vectors along with dense ones.

Also, see if this makes sense: we are leaning towards committing these
Scala mappings to the current Mahout trunk [1]:

[1] 
http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html

-Dmitriy

>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov  wrote:
>
>> keep forgetting this: what is graphx release roadmap?
>>
>> On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik  wrote:
>> > Would it be more logical to use GraphX ?
>> >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
>> >
>> > Cos
>> >
>> > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
>> >> Thanks Roman, I was thinking Giraph too (knew it supported graphs but
>> >> wasn't sure it supported matrices). If Giraph supports matrices, big +1.
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> ++
>> >> Chris Mattmann, Ph.D.
>> >> Senior Computer Scientist
>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> Office: 171-266B, Mailstop: 171-246
>> >> Email: chris.a.mattm...@nasa.gov
>> >> WWW:  http://sunset.usc.edu/~mattmann/
>> >> ++
>> >> Adjunct Assistant Professor, Computer Science Department
>> >> University of Southern California, Los Angeles, CA 90089 USA
>> >> ++
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> -Original Message-
>> >> From: Roman Shaposhnik 
>> >> Date: Friday, September 6, 2013 2:00 PM
>> >> To: 
>> >> Cc: "d...@sis.apache.org" 
>> >> Subject: Re: Needs a matrix library
>> >>
>> >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
>> >> > wrote:
>> >> >> Hey Martin,
>> >> >>
>> >> >> We may seriously consider using either Apache Hama here (which will
>> >> >> bring in Hadoop):
>> >> >
>> >> >On that note I'd highly recommend taking a look at Apache Giraph
>> >> >as well: http://giraph.apache.org/
>> >> >
>> >> >Thanks,
>> >> >Roman.
>> >> >
>> >>
>>


Re: Needs a matrix library

2013-09-06 Thread Reynold Xin
They are asking about dedicated matrix libraries.

Neither GraphX nor Giraph is a matrix library. They are systems for
large-scale graph processing, which could possibly be modeled as matrix
computations. Hama looks like a BSP framework, so I am not sure it has
anything to do with a matrix library either.

For very small matrices (3x3, 4x4), the cost of going through JNI to do
native matrix operations will likely dominate the computation itself, so
you are probably better off with a simple unrolled for loop in Java.
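A minimal sketch of that point: for a fixed 3x3 size, a plain Java method with the inner (k) summation unrolled stays entirely on the JVM side, so there is no JNI crossing to amortize. The row-major `double[9]` layout and the `Mat3` class name are assumptions for illustration, not anything from Spark or Mahout.

```java
// Plain-Java 3x3 multiply: no JNI, inner summation unrolled, row-major double[9].
public class Mat3 {
    public static double[] multiply(double[] a, double[] b) {
        double[] c = new double[9];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                // k-loop unrolled by hand for the fixed size 3
                c[3 * i + j] = a[3 * i]     * b[j]
                             + a[3 * i + 1] * b[3 + j]
                             + a[3 * i + 2] * b[6 + j];
            }
        }
        return c;
    }
}
```

At this size the JIT can keep the whole working set in registers, which is the effect the JNI round trip would destroy.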

I haven't looked into this myself, but I heard mahout-math is a decent
library.

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org



On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov  wrote:

> keep forgetting this: what is graphx release roadmap?
>
> On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik  wrote:
> > Would it be more logical to use GraphX ?
> >   https://amplab.cs.berkeley.edu/publication/graphx-grades/
> >
> > Cos
> >
> > On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
> >> Thanks Roman, I was thinking Giraph too (knew it supported graphs but
> >> wasn't sure it supported matrices). If Giraph supports matrices, big +1.
> >>
> >> Cheers,
> >> Chris
> >>
> >> ++
> >> Chris Mattmann, Ph.D.
> >> Senior Computer Scientist
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 171-266B, Mailstop: 171-246
> >> Email: chris.a.mattm...@nasa.gov
> >> WWW:  http://sunset.usc.edu/~mattmann/
> >> ++
> >> Adjunct Assistant Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++
> >>
> >>
> >>
> >>
> >>
> >>
> >> -Original Message-
> >> From: Roman Shaposhnik 
> >> Date: Friday, September 6, 2013 2:00 PM
> >> To: 
> >> Cc: "d...@sis.apache.org" 
> >> Subject: Re: Needs a matrix library
> >>
> >> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> >> > wrote:
> >> >> Hey Martin,
> >> >>
> >> >> We may seriously consider using either Apache Hama here (which will
> >> >> bring in Hadoop):
> >> >
> >> >On that note I'd highly recommend taking a look at Apache Giraph
> >> >as well: http://giraph.apache.org/
> >> >
> >> >Thanks,
> >> >Roman.
> >> >
> >>
>


Re: Fair scheduler documentation

2013-09-06 Thread Matei Zaharia
Yup, expect to see a pull request soon.

Matei

On Sep 6, 2013, at 6:19 PM, Patrick Wendell  wrote:

> Matei mentioned to me that he was going to write docs for this. Matei,
> is that still your intention?
> 
> - Patrick
> 
> On Fri, Sep 6, 2013 at 2:49 PM, Evan Chan  wrote:
>> Are we ready to document the fair scheduler? This section of the
>> standalone docs seems out of date.
>> 
>> # Job Scheduling
>> 
>> The standalone cluster mode currently only supports a simple FIFO scheduler
>> across jobs.
>> However, to allow multiple concurrent jobs, you can control the maximum
>> number of resources each Spark job will acquire.
>> By default, it will acquire *all* the cores in the cluster, which only
>> makes sense if you run just a single
>> job at a time. You can cap the number of cores using
>> `System.setProperty("spark.cores.max", "10")` (for example).
>> This value must be set *before* initializing your SparkContext.
>> 
>> 
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com  |
>> 
>> 
>> 



Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Patrick Wendell
Thanks Chris - also it appears that my key has now been added to this file:

http://people.apache.org/keys/group/spark.asc

- Patrick

On Fri, Sep 6, 2013 at 1:57 PM, Mattmann, Chris A (398J)
 wrote:
> Feedback coming, sorry been swamped and only recently back from DC/DARPA
> but will reply soon (hopefully tonight).
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -Original Message-
> From: Patrick Wendell 
> Date: Friday, September 6, 2013 1:56 PM
> To: "dev@spark.incubator.apache.org" 
> Cc: jpluser , Henry Saputra
> 
> Subject: Re: Spark 0.8.0-incubating RC2
>
>>Hey Chris, Henry... do you guys have feedback here? This was based
>>largely on your feedback in the last "round" :)
>>
>>On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell 
>>wrote:
>>> Hey Evan,
>>>
>>> These are posted primarily for the purpose of having the Apache
>>> mentors look at the bundling format, they are not likely to be the
>>> exact commit we release. Matei will be merging in some doc stuff
>>> before the release, I'm pretty sure that includes your docs.
>>>
>>> - Patrick
>>>
>>> On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
 Patrick,

 I'm planning to submit documentation PR's against mesos/spark, by
tomorrow,
 is that OK?We really should update the docs.

 thanks,
 Evan



 On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell 
wrote:

> No these are posted primarily for the purpose of having the Apache
> mentors look at the bundling format, they are not likely to be the
> exact commit we release (though this RC was
> fc6fbfe7d7e9171572c898d9e90301117517e60e).
>
> On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra 
> wrote:
> > Are these RCs not getting tagged in the repository, or am I just not
> > looking in the right place?
> >
> >
> >
> > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell 
> wrote:
> >
> >> Hey All,
> >>
> >> Matei asked me to pick this up because he's travelling this week. I
> >> cut a second release candidate from the head of the 0.8 branch (on
> >> mesos/spark gitub) to address the following issues:
> >>
> >> - RC is now hosted in an apache web space
> >> - RC now includes signature
> >> - RC now includes MD5 and SHA512 digests
> >>
> >> [tgz]
> >>
>
>http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-
>RC2.tgz
> >> [all files] http://people.apache.org/~pwendell/spark-rc/
> >>
> >> It would be great to get feedback on the release structure. I also
> >> changed the name to include "src" since we will be releasing both
> >> source and binary releases.
> >>
> >> I was a bit confused about how to attach my GPG key to the
>spark.asc
> >> file. I took the following steps.
> >>
> >> 1. Greated a GPG key locally
> >> 2. Distributed the key to public key servers (gpg --send-key)
> >> 3. Add exported key to my apache web space:
> >> http://people.apache.org/~pwendell/9E4FE3AF.asc
> >> 4. Added the key fingerprint at id.apage.org
> >> 5. Create an apache FOAF file with the key signature
> >>
> >> However, this doesn't seem sufficient to get my key on this page
>(at
> >> least, not yet):
> >> http://people.apache.org/keys/group/spark.asc
> >>
> >> Chris - are there other steps I missed? Is there a manual way to
> >> augment this file?
> >>
> >> - Patrick
> >>
>



 --
 --
 Evan Chan
 Staff Engineer
 e...@ooyala.com  |

 



>


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Henry Saputra
Hi Cos,

I just replied to your pull request to bump to 0.9-SNAPSHOT versioning =)

I think new development pull requests should be made against the ASF git
repo. The repo is open for pull requests now.

- Henry

On Fri, Sep 6, 2013 at 3:03 PM, Konstantin Boudnik  wrote:
> Guys,
>
> how about switching master to 0.9-SNAPSHOT to avoid confusion with two
> branches producing same version of the different artifacts?
>
> https://github.com/mesos/spark/pull/902
>
> Cos
>
> On Thu, Sep 05, 2013 at 08:08PM, Patrick Wendell wrote:
>> Hey All,
>>
>> Matei asked me to pick this up because he's travelling this week. I
>> cut a second release candidate from the head of the 0.8 branch (on
>> mesos/spark gitub) to address the following issues:
>>
>> - RC is now hosted in an apache web space
>> - RC now includes signature
>> - RC now includes MD5 and SHA512 digests
>>
>> [tgz] 
>> http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
>> [all files] http://people.apache.org/~pwendell/spark-rc/
>>
>> It would be great to get feedback on the release structure. I also
>> changed the name to include "src" since we will be releasing both
>> source and binary releases.
>>
>> I was a bit confused about how to attach my GPG key to the spark.asc
>> file. I took the following steps.
>>
>> 1. Greated a GPG key locally
>> 2. Distributed the key to public key servers (gpg --send-key)
>> 3. Add exported key to my apache web space:
>> http://people.apache.org/~pwendell/9E4FE3AF.asc
>> 4. Added the key fingerprint at id.apage.org
>> 5. Create an apache FOAF file with the key signature
>>
>> However, this doesn't seem sufficient to get my key on this page (at
>> least, not yet):
>> http://people.apache.org/keys/group/spark.asc
>>
>> Chris - are there other steps I missed? Is there a manual way to
>> augment this file?
>>
>> - Patrick


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Patrick Wendell
Hey Chris, Henry... do you guys have feedback here? This was based
largely on your feedback in the last "round" :)

On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell  wrote:
> Hey Evan,
>
> These are posted primarily for the purpose of having the Apache
> mentors look at the bundling format, they are not likely to be the
> exact commit we release. Matei will be merging in some doc stuff
> before the release, I'm pretty sure that includes your docs.
>
> - Patrick
>
> On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
>> Patrick,
>>
>> I'm planning to submit documentation PRs against mesos/spark by tomorrow,
>> is that OK? We really should update the docs.
>>
>> thanks,
>> Evan
>>
>>
>>
>> On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell  wrote:
>>
>>> No these are posted primarily for the purpose of having the Apache
>>> mentors look at the bundling format, they are not likely to be the
>>> exact commit we release (though this RC was
>>> fc6fbfe7d7e9171572c898d9e90301117517e60e).
>>>
>>> On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra 
>>> wrote:
>>> > Are these RCs not getting tagged in the repository, or am I just not
>>> > looking in the right place?
>>> >
>>> >
>>> >
>>> > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell 
>>> wrote:
>>> >
>>> >> Hey All,
>>> >>
>>> >> Matei asked me to pick this up because he's travelling this week. I
>>> >> cut a second release candidate from the head of the 0.8 branch (on
>>> >> mesos/spark gitub) to address the following issues:
>>> >>
>>> >> - RC is now hosted in an apache web space
>>> >> - RC now includes signature
>>> >> - RC now includes MD5 and SHA512 digests
>>> >>
>>> >> [tgz]
>>> >>
>>> http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
>>> >> [all files] http://people.apache.org/~pwendell/spark-rc/
>>> >>
>>> >> It would be great to get feedback on the release structure. I also
>>> >> changed the name to include "src" since we will be releasing both
>>> >> source and binary releases.
>>> >>
>>> >> I was a bit confused about how to attach my GPG key to the spark.asc
>>> >> file. I took the following steps.
>>> >>
>>> >> 1. Greated a GPG key locally
>>> >> 2. Distributed the key to public key servers (gpg --send-key)
>>> >> 3. Add exported key to my apache web space:
>>> >> http://people.apache.org/~pwendell/9E4FE3AF.asc
>>> >> 4. Added the key fingerprint at id.apage.org
>>> >> 5. Create an apache FOAF file with the key signature
>>> >>
>>> >> However, this doesn't seem sufficient to get my key on this page (at
>>> >> least, not yet):
>>> >> http://people.apache.org/keys/group/spark.asc
>>> >>
>>> >> Chris - are there other steps I missed? Is there a manual way to
>>> >> augment this file?
>>> >>
>>> >> - Patrick
>>> >>
>>>
>>
>>
>>
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> e...@ooyala.com  |
>>
>> 
>> 


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Mattmann, Chris A (398J)
Feedback coming; sorry, I've been swamped and only recently back from
DC/DARPA, but I will reply soon (hopefully tonight).

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Patrick Wendell 
Date: Friday, September 6, 2013 1:56 PM
To: "dev@spark.incubator.apache.org" 
Cc: jpluser , Henry Saputra

Subject: Re: Spark 0.8.0-incubating RC2

>Hey Chris, Henry... do you guys have feedback here? This was based
>largely on your feedback in the last "round" :)
>
>On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell 
>wrote:
>> Hey Evan,
>>
>> These are posted primarily for the purpose of having the Apache
>> mentors look at the bundling format, they are not likely to be the
>> exact commit we release. Matei will be merging in some doc stuff
>> before the release, I'm pretty sure that includes your docs.
>>
>> - Patrick
>>
>> On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
>>> Patrick,
>>>
>>> I'm planning to submit documentation PR's against mesos/spark, by
>>>tomorrow,
>>> is that OK?We really should update the docs.
>>>
>>> thanks,
>>> Evan
>>>
>>>
>>>
>>> On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell 
>>>wrote:
>>>
 No these are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release (though this RC was
 fc6fbfe7d7e9171572c898d9e90301117517e60e).

 On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra 
 wrote:
 > Are these RCs not getting tagged in the repository, or am I just not
 > looking in the right place?
 >
 >
 >
 > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell 
 wrote:
 >
 >> Hey All,
 >>
 >> Matei asked me to pick this up because he's travelling this week. I
 >> cut a second release candidate from the head of the 0.8 branch (on
 >> mesos/spark gitub) to address the following issues:
 >>
 >> - RC is now hosted in an apache web space
 >> - RC now includes signature
 >> - RC now includes MD5 and SHA512 digests
 >>
 >> [tgz]
 >>
 
http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-
RC2.tgz
 >> [all files] http://people.apache.org/~pwendell/spark-rc/
 >>
 >> It would be great to get feedback on the release structure. I also
 >> changed the name to include "src" since we will be releasing both
 >> source and binary releases.
 >>
 >> I was a bit confused about how to attach my GPG key to the
spark.asc
 >> file. I took the following steps.
 >>
 >> 1. Greated a GPG key locally
 >> 2. Distributed the key to public key servers (gpg --send-key)
 >> 3. Add exported key to my apache web space:
 >> http://people.apache.org/~pwendell/9E4FE3AF.asc
 >> 4. Added the key fingerprint at id.apage.org
 >> 5. Create an apache FOAF file with the key signature
 >>
 >> However, this doesn't seem sufficient to get my key on this page
(at
 >> least, not yet):
 >> http://people.apache.org/keys/group/spark.asc
 >>
 >> Chris - are there other steps I missed? Is there a manual way to
 >> augment this file?
 >>
 >> - Patrick
 >>

>>>
>>>
>>>
>>> --
>>> --
>>> Evan Chan
>>> Staff Engineer
>>> e...@ooyala.com  |
>>>
>>> 
>>> 
>>>
>>>



Re: Needs a matrix library

2013-09-06 Thread Dmitriy Lyubimov
keep forgetting this: what is graphx release roadmap?

On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik  wrote:
> Would it be more logical to use GraphX ?
>   https://amplab.cs.berkeley.edu/publication/graphx-grades/
>
> Cos
>
> On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
>> Thanks Roman, I was thinking Giraph too (knew it supported graphs but
>> wasn't sure it supported matrices). If Giraph supports matrices, big +1.
>>
>> Cheers,
>> Chris
>>
>> ++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Roman Shaposhnik 
>> Date: Friday, September 6, 2013 2:00 PM
>> To: 
>> Cc: "d...@sis.apache.org" 
>> Subject: Re: Needs a matrix library
>>
>> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
>> > wrote:
>> >> Hey Martin,
>> >>
>> >> We may seriously consider using either Apache Hama here (which will
>> >> bring in Hadoop):
>> >
>> >On that note I'd highly recommend taking a look at Apache Giraph
>> >as well: http://giraph.apache.org/
>> >
>> >Thanks,
>> >Roman.
>> >
>>


Re: Needs a matrix library

2013-09-06 Thread Konstantin Boudnik
Would it be more logical to use GraphX ?
  https://amplab.cs.berkeley.edu/publication/graphx-grades/

Cos

On Fri, Sep 06, 2013 at 09:13PM, Mattmann, Chris A (398J) wrote:
> Thanks Roman, I was thinking Giraph too (knew it supported graphs but
> wasn't sure it supported matrices). If Giraph supports matrices, big +1.
> 
> Cheers,
> Chris
> 
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
> 
> 
> 
> 
> 
> 
> -Original Message-
> From: Roman Shaposhnik 
> Date: Friday, September 6, 2013 2:00 PM
> To: 
> Cc: "d...@sis.apache.org" 
> Subject: Re: Needs a matrix library
> 
> >On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> > wrote:
> >> Hey Martin,
> >>
> >> We may seriously consider using either Apache Hama here (which will
> >> bring in Hadoop):
> >
> >On that note I'd highly recommend taking a look at Apache Giraph
> >as well: http://giraph.apache.org/
> >
> >Thanks,
> >Roman.
> >
> 


Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Konstantin Boudnik
Guys,

how about switching master to 0.9-SNAPSHOT to avoid confusion with two
branches producing the same version of different artifacts?

https://github.com/mesos/spark/pull/902

Cos

On Thu, Sep 05, 2013 at 08:08PM, Patrick Wendell wrote:
> Hey All,
> 
> Matei asked me to pick this up because he's travelling this week. I
> cut a second release candidate from the head of the 0.8 branch (on
> mesos/spark GitHub) to address the following issues:
> 
> - RC is now hosted in an apache web space
> - RC now includes signature
> - RC now includes MD5 and SHA512 digests
> 
> [tgz] 
> http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
> [all files] http://people.apache.org/~pwendell/spark-rc/
> 
> It would be great to get feedback on the release structure. I also
> changed the name to include "src" since we will be releasing both
> source and binary releases.
> 
> I was a bit confused about how to attach my GPG key to the spark.asc
> file. I took the following steps.
> 
> 1. Created a GPG key locally
> 2. Distributed the key to public key servers (gpg --send-key)
> 3. Added the exported key to my Apache web space:
> http://people.apache.org/~pwendell/9E4FE3AF.asc
> 4. Added the key fingerprint at id.apache.org
> 5. Created an Apache FOAF file with the key signature
> 
> However, this doesn't seem sufficient to get my key on this page (at
> least, not yet):
> http://people.apache.org/keys/group/spark.asc
> 
> Chris - are there other steps I missed? Is there a manual way to
> augment this file?
> 
> - Patrick


RDD placement

2013-09-06 Thread hilfi alkaff
Hi,

From my understanding, RDDs are placed on the same machine that computed
the transformation. My question is: if the available bandwidth between
machines is highly variable, will this significantly impact performance?

-- 
~Hilfi Alkaff~


Re: Fair scheduler documentation

2013-09-06 Thread Patrick Wendell
Matei mentioned to me that he was going to write docs for this. Matei,
is that still your intention?

- Patrick

On Fri, Sep 6, 2013 at 2:49 PM, Evan Chan  wrote:
> Are we ready to document the fair scheduler? This section of the
> standalone docs seems out of date.
>
> # Job Scheduling
>
> The standalone cluster mode currently only supports a simple FIFO scheduler
> across jobs.
> However, to allow multiple concurrent jobs, you can control the maximum
> number of resources each Spark job will acquire.
> By default, it will acquire *all* the cores in the cluster, which only
> makes sense if you run just a single
> job at a time. You can cap the number of cores using
> `System.setProperty("spark.cores.max", "10")` (for example).
> This value must be set *before* initializing your SparkContext.
>
>
> --
> --
> Evan Chan
> Staff Engineer
> e...@ooyala.com  |
>
> 
> 


Fair scheduler documentation

2013-09-06 Thread Evan Chan
Are we ready to document the fair scheduler? This section of the
standalone docs seems out of date.

# Job Scheduling

The standalone cluster mode currently only supports a simple FIFO scheduler
across jobs.
However, to allow multiple concurrent jobs, you can control the maximum
number of resources each Spark job will acquire.
By default, it will acquire *all* the cores in the cluster, which only
makes sense if you run just a single
job at a time. You can cap the number of cores using
`System.setProperty("spark.cores.max", "10")` (for example).
This value must be set *before* initializing your SparkContext.
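To make the ordering constraint from the quoted docs concrete, here is a minimal sketch: the property must already be set when the context starts, since that is when the scheduler reads it. The master URL and app name are placeholders, and the SparkContext line is commented out so the snippet stands alone without a Spark dependency.

```java
// Sketch: cap a standalone-mode job's cores by setting spark.cores.max
// *before* the context is created (the scheduler reads it at startup).
public class CoresCap {
    public static void main(String[] args) {
        System.setProperty("spark.cores.max", "10");  // cap this job at 10 cores
        // In a real program the context would be created here, e.g.:
        // JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "app");
        System.out.println(System.getProperty("spark.cores.max"));
    }
}
```

Setting the property after the context exists has no effect on that job, which is why the docs stress "before".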


-- 
--
Evan Chan
Staff Engineer
e...@ooyala.com  |





Re: Needs a matrix library

2013-09-06 Thread Mattmann, Chris A (398J)
Thanks Roman, I was thinking Giraph too (knew it supported graphs but
wasn't sure it supported matrices). If Giraph supports matrices, big +1.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Roman Shaposhnik 
Date: Friday, September 6, 2013 2:00 PM
To: 
Cc: "d...@sis.apache.org" 
Subject: Re: Needs a matrix library

>On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
> wrote:
>> Hey Martin,
>>
>> We may seriously consider using either Apache Hama here (which will
>> bring in Hadoop):
>
>On that note I'd highly recommend taking a look at Apache Giraph
>as well: http://giraph.apache.org/
>
>Thanks,
>Roman.
>



Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Mattmann, Chris A (398J)
Awesome, I was going to tell you it might take a sec to sync. Woot.

OK more tonight..

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Patrick Wendell 
Date: Friday, September 6, 2013 2:14 PM
To: jpluser 
Cc: "dev@spark.incubator.apache.org" ,
Henry Saputra 
Subject: Re: Spark 0.8.0-incubating RC2

>Thanks Chris - also it appears that my key has now been added to this
>file:
>
>http://people.apache.org/keys/group/spark.asc
>
>- Patrick
>
>On Fri, Sep 6, 2013 at 1:57 PM, Mattmann, Chris A (398J)
> wrote:
>> Feedback coming, sorry been swamped and only recently back from DC/DARPA
>> but will reply soon (hopefully tonight).
>>
>> ++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Patrick Wendell 
>> Date: Friday, September 6, 2013 1:56 PM
>> To: "dev@spark.incubator.apache.org" 
>> Cc: jpluser , Henry Saputra
>> 
>> Subject: Re: Spark 0.8.0-incubating RC2
>>
>>>Hey Chris, Henry... do you guys have feedback here? This was based
>>>largely on your feedback in the last "round" :)
>>>
>>>On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell 
>>>wrote:
 Hey Evan,

 These are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release. Matei will be merging in some doc stuff
 before the release, I'm pretty sure that includes your docs.

 - Patrick

 On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
> Patrick,
>
> I'm planning to submit documentation PR's against mesos/spark, by
>tomorrow,
> is that OK?We really should update the docs.
>
> thanks,
> Evan
>
>
>
> On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell 
>wrote:
>
>> No these are posted primarily for the purpose of having the Apache
>> mentors look at the bundling format, they are not likely to be the
>> exact commit we release (though this RC was
>> fc6fbfe7d7e9171572c898d9e90301117517e60e).
>>
>> On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra
>>
>> wrote:
>> > Are these RCs not getting tagged in the repository, or am I just
>>not
>> > looking in the right place?
>> >
>> >
>> >
>> > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell
>>
>> wrote:
>> >
>> >> Hey All,
>> >>
>> >> Matei asked me to pick this up because he's travelling this
>>week. I
>> >> cut a second release candidate from the head of the 0.8 branch
>>(on
>> >> mesos/spark gitub) to address the following issues:
>> >>
>> >> - RC is now hosted in an apache web space
>> >> - RC now includes signature
>> >> - RC now includes MD5 and SHA512 digests
>> >>
>> >> [tgz]
>> >>
>>
>>http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubatin
>>g-
>>RC2.tgz
>> >> [all files] http://people.apache.org/~pwendell/spark-rc/
>> >>
>> >> It would be great to get feedback on the release structure. I
>>also
>> >> changed the name to include "src" since we will be releasing both
>> >> source and binary releases.
>> >>
>> >> I was a bit confused about how to attach my GPG key to the
>>spark.asc
>> >> file. I took the following steps.
>> >>
>> >> 1. Greated a GPG key locally
>> >> 2. Distributed the key to public key servers (gpg --send-key)
>> >> 3. Add exported key to my apache web space:
>> >> http://people.apache.org/~pwendell/9E4FE3AF.asc
>> >> 4. Added the key fingerprint at id.apage.org
>> >> 5. Create an apache FOAF file with the key signature
>> >>
>> >> However, this doesn't seem sufficient to get my key on this page
>>(at
>> >> least, not yet):
>> >> http://people.apache.org/keys/group/spark.asc
>> >>
>> >> Chris - are there other steps I missed? Is there a manual way to
>> >> augment this file?
>> >>
>> >> - Patric

Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Henry Saputra
+1 to Chris' comment =)

On Fri, Sep 6, 2013 at 2:15 PM, Mattmann, Chris A (398J)
 wrote:
> Awesome was going to tell you it might take a sec to sync. Woot.
>
> OK more tonight..
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
>
>
> -Original Message-
> From: Patrick Wendell 
> Date: Friday, September 6, 2013 2:14 PM
> To: jpluser 
> Cc: "dev@spark.incubator.apache.org" ,
> Henry Saputra 
> Subject: Re: Spark 0.8.0-incubating RC2
>
>>Thanks Chris - also it appears that my key has now been added to this
>>file:
>>
>>http://people.apache.org/keys/group/spark.asc
>>
>>- Patrick
>>
>>On Fri, Sep 6, 2013 at 1:57 PM, Mattmann, Chris A (398J)
>> wrote:
>>> Feedback coming, sorry been swamped and only recently back from DC/DARPA
>>> but will reply soon (hopefully tonight).
>>>
>>> ++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:  http://sunset.usc.edu/~mattmann/
>>> ++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++
>>>
>>>
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: Patrick Wendell 
>>> Date: Friday, September 6, 2013 1:56 PM
>>> To: "dev@spark.incubator.apache.org" 
>>> Cc: jpluser , Henry Saputra
>>> 
>>> Subject: Re: Spark 0.8.0-incubating RC2
>>>
Hey Chris, Henry... do you guys have feedback here? This was based
largely on your feedback in the last "round" :)

On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell 
wrote:
> Hey Evan,
>
> These are posted primarily for the purpose of having the Apache
> mentors look at the bundling format, they are not likely to be the
> exact commit we release. Matei will be merging in some doc stuff
> before the release, I'm pretty sure that includes your docs.
>
> - Patrick
>
> On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
>> Patrick,
>>
>> I'm planning to submit documentation PR's against mesos/spark, by
>>tomorrow,
>> is that OK?We really should update the docs.
>>
>> thanks,
>> Evan
>>
>>
>>
>> On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell 
>>wrote:
>>
>>> No these are posted primarily for the purpose of having the Apache
>>> mentors look at the bundling format, they are not likely to be the
>>> exact commit we release (though this RC was
>>> fc6fbfe7d7e9171572c898d9e90301117517e60e).
>>>
>>> On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra
>>>
>>> wrote:
>>> > Are these RCs not getting tagged in the repository, or am I just
>>>not
>>> > looking in the right place?
>>> >
>>> >
>>> >
>>> > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell
>>>
>>> wrote:
>>> >
>>> >> Hey All,
>>> >>
>>> >> Matei asked me to pick this up because he's travelling this
>>>week. I
>>> >> cut a second release candidate from the head of the 0.8 branch
>>>(on
>>> >> mesos/spark gitub) to address the following issues:
>>> >>
>>> >> - RC is now hosted in an apache web space
>>> >> - RC now includes signature
>>> >> - RC now includes MD5 and SHA512 digests
>>> >>
>>> >> [tgz]
>>> >>
>>>
>>>http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubatin
>>>g-
>>>RC2.tgz
>>> >> [all files] http://people.apache.org/~pwendell/spark-rc/
>>> >>
>>> >> It would be great to get feedback on the release structure. I
>>>also
>>> >> changed the name to include "src" since we will be releasing both
>>> >> source and binary releases.
>>> >>
>>> >> I was a bit confused about how to attach my GPG key to the
>>>spark.asc
>>> >> file. I took the following steps.
>>> >>
>>> >> 1. Greated a GPG key locally
>>> >> 2. Distributed the key to public key servers (gpg --send-key)
>>> >> 3. Add exported key to my apache web space:
>>> >> http://people.apache.org/~pwendell/9E4FE3AF.asc
>>> >> 4. Added the key fingerprint at id.apage.org
>>> >> 5. Create an apache FOAF file with the key signature
>>> >>
>>> >> However, this doesn't seem sufficient to

Re: Spark 0.8.0-incubating RC2

2013-09-06 Thread Henry Saputra
Hi Patrick, yeah, feedback is coming. A bit swamped going into the
weekend, sorry about it =(

And yeah the key should be added to groups later when LDAP sync happen.

- Henry

On Fri, Sep 6, 2013 at 1:56 PM, Patrick Wendell  wrote:
> Hey Chris, Henry... do you guys have feedback here? This was based
> largely on your feedback in the last "round" :)
>
> On Thu, Sep 5, 2013 at 9:58 PM, Patrick Wendell  wrote:
>> Hey Evan,
>>
>> These are posted primarily for the purpose of having the Apache
>> mentors look at the bundling format, they are not likely to be the
>> exact commit we release. Matei will be merging in some doc stuff
>> before the release, I'm pretty sure that includes your docs.
>>
>> - Patrick
>>
>> On Thu, Sep 5, 2013 at 9:25 PM, Evan Chan  wrote:
>>> Patrick,
>>>
>>> I'm planning to submit documentation PR's against mesos/spark, by tomorrow,
>>> is that OK? We really should update the docs.
>>>
>>> thanks,
>>> Evan
>>>
>>>
>>>
>>> On Thu, Sep 5, 2013 at 9:20 PM, Patrick Wendell  wrote:
>>>
 No these are posted primarily for the purpose of having the Apache
 mentors look at the bundling format, they are not likely to be the
 exact commit we release (though this RC was
 fc6fbfe7d7e9171572c898d9e90301117517e60e).

 On Thu, Sep 5, 2013 at 9:14 PM, Mark Hamstra 
 wrote:
 > Are these RCs not getting tagged in the repository, or am I just not
 > looking in the right place?
 >
 >
 >
 > On Thu, Sep 5, 2013 at 8:08 PM, Patrick Wendell 
 wrote:
 >
 >> Hey All,
 >>
 >> Matei asked me to pick this up because he's travelling this week. I
 >> cut a second release candidate from the head of the 0.8 branch (on
 >> mesos/spark github) to address the following issues:
 >>
 >> - RC is now hosted in an apache web space
 >> - RC now includes signature
 >> - RC now includes MD5 and SHA512 digests
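As an aside for anyone verifying the release candidate, digests like the ones mentioned above can be computed with a few lines of standard-library Python (a sketch for checking downloads, not the actual release tooling):

```python
import hashlib

def release_digests(data: bytes) -> dict:
    """Compute the MD5 and SHA512 hex digests of a release artifact's bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha512": hashlib.sha512(data).hexdigest(),
    }

# For a real tarball you would read the file first, e.g.:
#   digests = release_digests(open("spark-0.8.0-src-incubating-RC2.tgz", "rb").read())
```

Comparing these values against the posted `.md5` and `.sha512` files confirms the download was not corrupted.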
 >>
 >> [tgz]
 >>
 http://people.apache.org/~pwendell/spark-rc/spark-0.8.0-src-incubating-RC2.tgz
 >> [all files] http://people.apache.org/~pwendell/spark-rc/
 >>
 >> It would be great to get feedback on the release structure. I also
 >> changed the name to include "src" since we will be releasing both
 >> source and binary releases.
 >>
 >> I was a bit confused about how to attach my GPG key to the spark.asc
 >> file. I took the following steps.
 >>
 >> 1. Created a GPG key locally
 >> 2. Distributed the key to public key servers (gpg --send-key)
 >> 3. Add exported key to my apache web space:
 >> http://people.apache.org/~pwendell/9E4FE3AF.asc
 >> 4. Added the key fingerprint at id.apache.org
 >> 5. Create an apache FOAF file with the key signature
 >>
 >> However, this doesn't seem sufficient to get my key on this page (at
 >> least, not yet):
 >> http://people.apache.org/keys/group/spark.asc
 >>
 >> Chris - are there other steps I missed? Is there a manual way to
 >> augment this file?
 >>
 >> - Patrick
 >>

>>>
>>>
>>>
>>> --
>>> --
>>> Evan Chan
>>> Staff Engineer
>>> e...@ooyala.com  |
>>>
>>> 
>>> 


Re: Needs a matrix library

2013-09-06 Thread Roman Shaposhnik
On Fri, Sep 6, 2013 at 1:33 PM, Mattmann, Chris A (398J)
 wrote:
> Hey Martin,
>
> We may seriously consider using either Apache Hama here (which will
> bring in Hadoop):

On that note I'd highly recommend taking a look at Apache Giraph
as well: http://giraph.apache.org/

Thanks,
Roman.


Re: Needs a matrix library

2013-09-06 Thread Mattmann, Chris A (398J)
Hey Martin,

We may seriously consider using either Apache Hama here (which will
bring in Hadoop):

http://hama.apache.org/

Or alternatively, think about some Apache Spark based library:

http://spark.incubator.apache.org/

http://stackoverflow.com/questions/18453359/scala-spark-matrix-operations


It will refer you to MLBase, which we could base this on.

I'm CC'ing the Apache Spark list here to connect the dots.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Martin Desruisseaux 
Organization: Geomatys
Reply-To: "d...@sis.apache.org" 
Date: Friday, September 6, 2013 1:29 PM
To: Apache SIS 
Subject: Needs a matrix library

>Hello all
>
>I have hit a point in Apache SIS where I need some matrix
>implementations. Additions and multiplications are easy to implement,
>but matrix inversions are more difficult. In Geotk I was using the
>"vecmath" package (a derivative of legacy Sun Java3D library). However
>vecmath is licensed under LGPL 2 [1], which in my understanding can not
>work with Apache projects.
>
>Would anyone recommend a small library implementing Matrix objects
>with basic operations? Note that I'm not looking for a full linear
>algebra package. In particular, I would like a library with optimized
>matrix implementation for the 3x3 and 4x4 cases - matrices of those
>sizes will occur very often, and dedicated implementations make a real
>difference both in performance and accuracy.
>
> Martin
>
>[1] 
>https://java.net/projects/vecmath/sources/svn/content/trunk/LICENSE.txt
>
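For the small fixed-size case Martin mentions, a dedicated 3x3 inversion by cofactor expansion is short enough to sketch directly (illustrative only — not SIS, vecmath, or any library's actual implementation):

```python
def invert_3x3(m):
    """Invert a 3x3 matrix (list of row lists) via the cofactor/adjugate method.

    Specialized small-matrix code like this avoids the overhead of a general
    LU-decomposition path, which is why dedicated 3x3/4x4 implementations
    can win on both performance and accuracy.
    """
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    # Cofactors of the first column give the determinant by expansion.
    A = e * i - f * h
    B = -(d * i - f * g)
    C = d * h - e * g
    det = a * A + b * B + c * C
    if det == 0:
        raise ZeroDivisionError("singular matrix")
    # Remaining cofactors; the adjugate is the transposed cofactor matrix.
    D = -(b * i - c * h)
    E = a * i - c * g
    F = -(a * h - b * g)
    G = b * f - c * e
    H = -(a * f - c * d)
    I = a * e - b * d
    return [[A / det, D / det, G / det],
            [B / det, E / det, H / det],
            [C / det, F / det, I / det]]
```

A 4x4 version follows the same pattern with 16 cofactors, or can fall back to blockwise expansion.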



Re: Apache account

2013-09-06 Thread Andrew Hart

Hi Nick,

I searched through the foundation mail archives and see that your 
ICLA request was marked as received and accepted on July 01. However, 
for some reason, it appears your information was not added to the list 
of filed ICLAs, and you do not yet show up under the "Unlisted CLAs" 
section of http://people.apache.org/committer-index.html.


I will investigate further to see what might have happened.

Best,
Andrew.



On 09/05/2013 11:17 PM, Nick Pentreath wrote:

Hi

I submitted my license agreement and account name request a while back, but
still haven't received any correspondence. Just wondering what I need to do
in order to follow this up?

Thanks
Nick





Re: Incubator PMC/Board report for Sep 2013 ([ppmc])

2013-09-06 Thread Suresh Marru
Thanks Matei, I signed off as well.

Suresh

On Sep 4, 2013, at 12:52 PM, Matei Zaharia  wrote:

> Hi guys,
> 
> I've written a draft update at 
> https://wiki.apache.org/incubator/September2013#preview. Let me know how it 
> looks.
> 
> Matei
> 
> On Sep 1, 2013, at 6:43 AM, Marvin  wrote:
> 
>> 
>> 
>> Dear podling,
>> 
>> This email was sent by an automated system on behalf of the Apache Incubator 
>> PMC.
>> It is an initial reminder to give you plenty of time to prepare your 
>> quarterly
>> board report.
>> 
>> The board meeting is scheduled for Wed, 18 September 2013, 10:30:00 PST. 
>> The report 
>> The report 
>> for your podling will form a part of the Incubator PMC report. The Incubator 
>> PMC 
>> requires your report to be submitted 2 weeks before the board meeting, to 
>> allow 
>> sufficient time for review and submission (Wed, Sep 4th).
>> 
>> Please submit your report with sufficient time to allow the incubator PMC, 
>> and 
>> subsequently board members to review and digest. Again, the very latest you 
>> should submit your report is 2 weeks prior to the board meeting.
>> 
>> Thanks,
>> 
>> The Apache Incubator PMC
>> 
>> Submitting your Report
>> --
>> 
>> Your report should contain the following:
>> 
>> * Your project name
>> * A brief description of your project, which assumes no knowledge of the 
>> project
>>  or necessarily of its field
>> * A list of the three most important issues to address in the move towards 
>>  graduation.
>> * Any issues that the Incubator PMC or ASF Board might wish/need to be aware 
>> of
>> * How has the community developed since the last report
>> * How has the project developed since the last report.
>> 
>> This should be appended to the Incubator Wiki page at:
>> 
>> http://wiki.apache.org/incubator/September2013
>> 
>> Note: This is manually populated. You may need to wait a little before this 
>> page is
>> created from a template.
>> 
>> Mentors
>> ---
>> Mentors should review reports for their project(s) and sign them off on the 
>> Incubator wiki page. Signing off reports shows that you are following the 
>> project - projects that are not signed may raise alarms for the Incubator 
>> PMC.
>> 
>> Incubator PMC
>> 
> 



Re: off-heap RDDs

2013-09-06 Thread Haoyuan Li
That will be great!

Haoyuan


On Thu, Sep 5, 2013 at 9:28 PM, Evan Chan  wrote:

> Haoyuan,
>
> Thanks, that sounds great, exactly what we are looking for.
>
> We might be interested in integrating Tachyon with CFS (Cassandra File
> System, the Cassandra-based implementation of HDFS).
>
> -Evan
>
>
>
> On Sat, Aug 31, 2013 at 3:33 PM, Haoyuan Li  wrote:
>
> > Evan,
> >
> > If I understand you correctly, you want to avoid network I/O as much as
> > possible by caching the data on the node having the data on disk.
> Actually,
> > what I meant client caching would automatically do this. For example,
> > suppose you have a cluster of machines, nothing cached in memory yet.
> Then
> > a spark application runs on it. Spark asks Tachyon where data X is. Since
> > nothing is in memory yet, Tachyon would return disk locations for the
> first
> > time. Then Spark program will try to take advantage of disk data
> locality,
> > and load the data X in HDFS node N into the off-heap memory of node N. In
> > the future, when Spark asks Tachyon the location of X, Tachyon will
> return
> > node N. There is no network I/O involved in the whole process. Let me
> know
> > if I misunderstood something.
> >
> > Haoyuan
> >
> >
> > On Fri, Aug 30, 2013 at 10:00 AM, Evan Chan  wrote:
> >
> > > Hey guys,
> > >
> > > I would also prefer to strengthen and get behind Tachyon, rather than
> > > implement a separate solution (though I guess if it's not offiically
> > > supported, then nobody will ask questions).  But it's more that
> off-heap
> > > memory is difficult, so it's better to focus efforts on one project, is
> > my
> > > feeling.
> > >
> > > Haoyuan,
> > >
> > > Tachyon brings cached HDFS data to the local client.  Have we thought
> > about
> > > the opposite approach, which might be more efficient?
> > >  - Load the data in HDFS node N into the off-heap memory of node N
> > >  - in Spark, inform the framework (maybe via RDD partition/location
> info)
> > > of where the data is, that it is located in node N
> > >  - bring the computation to node N
> > >
> > > This avoids network IO and may be much more efficient for many types of
> > > applications.   I know this would be a big win for us.
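The placement idea in Evan's proposal — ask the storage layer where each partition lives, then run the task on that node — can be sketched generically (hypothetical names; this is not the Spark or Tachyon API):

```python
def place_tasks(partitions, block_locations, nodes):
    """Assign each partition to a node that already caches it, if any.

    block_locations maps a partition id to the set of nodes holding a copy;
    partitions with no cached copy fall back to the first available node.
    Picking a caching node means the task reads locally, with no network IO.
    """
    assignment = {}
    for p in partitions:
        local = sorted(block_locations.get(p, set()))
        assignment[p] = local[0] if local else nodes[0]
    return assignment
```

This is the same contract Haoyuan describes: the first lookup returns disk locations, and once node N has loaded partition X off-heap, later lookups return N and the computation follows the data.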
> > >
> > > -Evan
> > >
> > >
> > > On Wed, Aug 28, 2013 at 1:37 AM, Haoyuan Li 
> > wrote:
> > >
> > > > No problem. Like reading/writing data from/to off-heap bytebuffer,
> > when a
> > > > program reads/writes data from/to Tachyon, Spark/Shark needs to do
> > > ser/de.
> > > > Efficient ser/de will help on performance a lot as people pointed
> out.
> > > One
> > > > solution is that the application can do primitive operations directly
> > on
> > > > ByteBuffer, like how Shark is handling it now. Most related code is
> > > located
> > > > at "
> > > >
> > >
> >
> https://github.com/amplab/shark/tree/master/src/main/scala/shark/memstore2
> > > > "
> > > > and "
> > > >
> > > >
> > >
> >
> https://github.com/amplab/shark/tree/master/src/tachyon_enabled/scala/shark/tachyon
> > > > ".
> > > >
> > > > Haoyuan
> > > >
> > > >
> > > > On Wed, Aug 28, 2013 at 1:21 AM, Imran Rashid 
> > > > wrote:
> > > >
> > > > > Thanks Haoyuan.  It seems like we should try out Tachyon, sounds
> like
> > > > > it is what we are looking for.
> > > > >
> > > > > On Wed, Aug 28, 2013 at 8:18 AM, Haoyuan Li 
> > > > wrote:
> > > > > > Response inline.
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 27, 2013 at 1:37 AM, Imran Rashid <
> > im...@therashids.com>
> > > > > wrote:
> > > > > >
> > > > > >> Thanks for all the great comments & discussion.  Let me expand a
> > bit
> > > > > >> on our use case, and then I'm gonna combine responses to various
> > > > > >> questions.
> > > > > >>
> > > > > >> In general, when we use spark, we have some really big RDDs that
> > use
> > > > > >> up a lot of memory (10s of GB per node) that are really our
> "core"
> > > > > >> data sets.  We tend to start up a spark application, immediately
> > > load
> > > > > >> all those data sets, and just leave them loaded for the lifetime
> > of
> > > > > >> that process.  We definitely create a lot of other RDDs along
> the
> > > way,
> > > > > >> and lots of intermediate objects that we'd like to go through
> > normal
> > > > > >> garbage collection.  But those all require much less memory,
> maybe
> > > > > >> 1/10th of the big RDDs that we just keep around.  I know this
> is a
> > > bit
> > > > > >> of a special case, but it seems like it probably isn't that
> > > different
> > > > > >> from a lot of use cases.
> > > > > >>
> > > > > >> Reynold Xin wrote:
> > > > > >> > This is especially attractive if the application can read
> > directly
> > > > > from
> > > > > >> a byte
> > > > > >> > buffer without generic serialization (like Shark).
> > > > > >>
> > > > > >> interesting -- can you explain how this works in Shark?  do you
> > have
> > > > > >> some general way of storing data in byte buffers that avoids
> > > > > >> serialization?  Or do you mean that if the user is effectively
> >

Re: Apache account

2013-09-06 Thread Reynold Xin
Copying Chris on this one.


--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org



On Fri, Sep 6, 2013 at 2:17 PM, Nick Pentreath wrote:

> Hi
>
> I submitted my license agreement and account name request a while back, but
> still haven't received any correspondence. Just wondering what I need to do
> in order to follow this up?
>
> Thanks
> Nick
>