Re: [VOTE] Release Apache Spark 0.8.0-incubating (rc4)

2013-12-17 Thread Chris Mattmann
Hi Guys,

+1 from me (binding):

SIGS pass, CHECKSUMS pass:

[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-hadoop1
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  131M  100  131M    0     0  1754k      0  0:01:16  0:01:16 --:--:-- 1165k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6965      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   129  100   129    0     0   1839      0 --:--:-- --:--:-- --:--:--  3583
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-hadoop2
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  215M  100  215M    0     0  1815k      0  0:02:01  0:02:01 --:--:-- 1826k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6831      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   129  100   129    0     0   1819      0 --:--:-- --:--:-- --:--:--  3583
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-cdh
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-cdh4
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  136M  100  136M    0     0  1757k      0  0:01:19  0:01:19 --:--:-- 1502k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6892      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   123  100   123    0     0   1702      0 --:--:-- --:--:-- --:--:--  3514
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/stage_apache_rc spark 0.8.1-incubating
http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4565k  100 4565k    0     0  1636k      0  0:00:02  0:00:02 --:--:-- 1656k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6949      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77  100    77    0     0   1109      0 --:--:-- --:--:-- --:--:--  2200
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann%
$HOME/bin/verify_gpg_sigs
Verifying Signature for file spark-0.8.1-incubating-bin-cdh4.tgz.asc
gpg: Signature made Tue Dec 10 15:03:24 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 5AA9 0E72 812F F246 7904  277D 548F 5FEE 9E4F E3AF
Verifying Signature for file spark-0.8.1-incubating-bin-hadoop1.tgz.asc
gpg: Signature made Tue Dec 10 14:58:15 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 5AA9 0E72 812F F246 7904  277D 548F 5FEE 9E4F E3AF
Verifying Signature for file spark-0.8.1-incubating-bin-hadoop2.tgz.asc
gpg: Signature made Tue Dec 10 15:09:16 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg: 
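
A minimal sketch of the checksum half of a check like the one above, in Python. This is not the contents of the stage_apache_rc or verify_gpg_sigs helper scripts, which the message does not show; the function names and the choice of SHA-512 as the default algorithm are assumptions made for illustration, and the algorithm should be set to whatever the release's checksum file actually uses. Signature verification is a separate step handled by gpg, as in the log above.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    The default algorithm is an assumption; pass the one the
    release's checksum file was generated with.
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large tarballs are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(artifact, expected_hex, algorithm="sha512"):
    """Compare an artifact's digest with a published hex digest.

    Whitespace and letter case in the published value are ignored,
    since checksum files are not always formatted the same way.
    """
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(artifact, algorithm) == normalized
```

A hypothetical call would be checksum_matches("spark-0.8.1-incubating.tgz", published_hex), where published_hex is the hex string copied out of the checksum file staged next to the artifact.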

Re: [VOTE] Release Apache Spark 0.8.0-incubating (rc4)

2013-12-17 Thread Roman Shaposhnik
+1 (binding) from me as well.

That said, I'd expect the issues identified around
jar inclusion to be blocking for 0.9 (do we have
a blocker JIRA filed?). There are also a few issues
around the build, but I need to spend some time and
file JIRAs for them myself. Will do in time for 0.9.

Thanks,
Roman.

On Tue, Dec 17, 2013 at 9:15 AM, Chris Mattmann mattm...@apache.org wrote:
 Hi Guys,

 +1 from me (binding):

 SIGS pass, CHECKSUMS pass:


Spark development for undergraduate project

2013-12-17 Thread Matthew Cheah
Hi everyone,

During my most recent internship, I worked extensively with Apache Spark,
integrating it into a company's data analytics platform. I've now become
interested in contributing to Apache Spark.

I'm returning to undergraduate studies in January, and there is an academic
course that is simply a standalone software engineering project. I was
thinking that some contribution to Apache Spark would satisfy my curiosity,
help me continue to support the company I interned at, and give me the
academic credits I need to graduate, all at the same time. It seems like too
good an opportunity to pass up.

With that in mind, I have the following questions:

   1. At this point, is there any self-contained project that I could work
   on within Spark? Ideally, I would work on it independently, in about a
   three-month time frame. This time also needs to accommodate ramping up on
   the Spark codebase and adjusting to the Scala programming language and
   its paradigms. The company I worked at primarily used the Java APIs. The
   output needs to be a technical report describing the project requirements
   and the design process I followed to engineer a solution that meets them.
   In particular, it cannot just be a series of haphazard patches.
   2. How can I get started with contributing to Spark?
   3. Is there a high-level UML or some other design specification for the
   Spark architecture?

Thanks! I hope to be of some help =)

-Matt Cheah


Re: Spark development for undergraduate project

2013-12-17 Thread Christopher Nguyen
Matt, some suggestions.

If you're interested in the machine-learning layer, perhaps you could look
into helping to harmonize our (Adatao) dataframe representation with
MLlib's, and with base RDDs for that matter. It requires someone to spend
some dedicated time looking into the trade-offs between generalizability and
performance, etc. It's something our groups have talked about doing but
haven't been able to invest the resources in.

Separately, neural nets/deep learning is an area of emerging interest to
look into with Spark. It may drive some alternate optimization patterns for
Spark, e.g., sub-cluster communication. If interested, I can connect you to
some deep learning folks at UoT (not too far from you) and Google. Matei
may also have some interest in this.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen



On Tue, Dec 17, 2013 at 10:43 AM, Matthew Cheah mcch...@uwaterloo.ca wrote:

 Hi everyone,

 During my most recent internship, I worked extensively with Apache Spark,
 integrating it into a company's data analytics platform. I've now become
 interested in contributing to Apache Spark.

 I'm returning to undergraduate studies in January and there is an academic
 course which is simply a standalone software engineering project. I was
 thinking that some contribution to Apache Spark would satisfy my curiosity,
 help continue support the company I interned at, and give me academic
 credits required to graduate, all at the same time. It seems like too good
 an opportunity to pass up.

 With that in mind, I have the following questions:

    1. At this point, is there any self-contained project that I could work
    on within Spark? Ideally, I would work on it independently, in about a
    three month time frame. This time also needs to accommodate ramping up on
    the Spark codebase and adjusting to the Scala programming language and
    paradigms. The company I worked at primarily used the Java APIs. The output
    needs to be a technical report describing the project requirements, and the
    design process I took to engineer the solution for the requirements. In
    particular, it cannot just be a series of haphazard patches.
    2. How can I get started with contributing to Spark?
    3. Is there a high-level UML or some other design specification for the
    Spark architecture?

 Thanks! I hope to be of some help =)

 -Matt Cheah



[RESULT] [VOTE] Release Apache Spark 0.8.1-incubating (rc4)

2013-12-17 Thread Patrick Wendell
The vote is now closed. This vote passes with 4 IPMC +1's and no 0 or -1 votes.

+1 (4 Total)
Marvin Humphrey*
Henry Saputra*
Chris Mattmann*
Roman Shaposhnik*

0 (0 Total)

-1 (0 Total)

* = Binding Vote

Thanks to everyone who helped vet this release.

- Patrick

