Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
Hi Guys,

+1 from me (binding):

SIGS pass, CHECKSUMS pass:

[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-hadoop1 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  131M  100  131M    0     0  1754k      0  0:01:16  0:01:16 --:--:-- 1165k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6965      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   129  100   129    0     0   1839      0 --:--:-- --:--:-- --:--:--  3583
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-hadoop2 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  215M  100  215M    0     0  1815k      0  0:02:01  0:02:01 --:--:-- 1826k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6831      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   129  100   129    0     0   1819      0 --:--:-- --:--:-- --:--:--  3583
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-cdh http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/stage_apache_rc spark 0.8.1-incubating-bin-cdh4 http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  136M  100  136M    0     0  1757k      0  0:01:19  0:01:19 --:--:-- 1502k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6892      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   123  100   123    0     0   1702      0 --:--:-- --:--:-- --:--:--  3514
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/stage_apache_rc spark 0.8.1-incubating http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4565k  100 4565k    0     0  1636k      0  0:00:02  0:00:02 --:--:-- 1656k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   490  100   490    0     0   6949      0 --:--:-- --:--:-- --:--:-- 13611
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    77  100    77    0     0   1109      0 --:--:-- --:--:-- --:--:--  2200
[chipotle:~/tmp/apache-spark-0.8.1-incubating-rc4] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file spark-0.8.1-incubating-bin-cdh4.tgz.asc
gpg: Signature made Tue Dec 10 15:03:24 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 5AA9 0E72 812F F246 7904 277D 548F 5FEE 9E4F E3AF
Verifying Signature for file spark-0.8.1-incubating-bin-hadoop1.tgz.asc
gpg: Signature made Tue Dec 10 14:58:15 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 5AA9 0E72 812F F246 7904 277D 548F 5FEE 9E4F E3AF
Verifying Signature for file spark-0.8.1-incubating-bin-hadoop2.tgz.asc
gpg: Signature made Tue Dec 10 15:09:16 2013 PST using RSA key ID 9E4FE3AF
gpg: Good signature from Patrick Wendell pwend...@gmail.com
gpg: WARNING: This key is not certified with a trusted signature!
gpg:
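For readers who want to reproduce the CHECKSUMS part of this check without the `stage_apache_rc` and `verify_gpg_sigs` helper scripts (they are Chris's own and are not shown in the thread), here is a minimal Python sketch of a checksum comparison. The function names `file_digest` and `matches_checksum` are hypothetical, and the default algorithm is an assumption: use whichever digest the release's checksum file was actually published with. This sketch does not verify GPG signatures.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large release tarballs
    do not have to fit in memory. `algorithm` is any name
    accepted by hashlib.new (the default here is an assumption).
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex digest.

    Whitespace and letter case in `expected_hex` are ignored,
    because published digests are often grouped into blocks
    or written in upper case.
    """
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```

A caller would extract the hex digest from the downloaded checksum file and pass it as `expected_hex` together with the path of the downloaded tarball; a `False` result means the artifact should not be trusted.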
Re: [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
+1 (binding) from me as well.

That said, I'd expect the issues identified around jar inclusion to be blocking for 0.9 (do we have a blocker JIRA filed?). There are also a few issues around the build, but I need to spend time and file JIRAs myself. Will do in time for 0.9.

Thanks,
Roman.

On Tue, Dec 17, 2013 at 9:15 AM, Chris Mattmann mattm...@apache.org wrote:
> Hi Guys, +1 from me (binding): SIGS pass, CHECKSUMS pass:
> [...]
Spark development for undergraduate project
Hi everyone,

During my most recent internship, I worked extensively with Apache Spark, integrating it into a company's data analytics platform. I've now become interested in contributing to Apache Spark.

I'm returning to undergraduate studies in January, and there is an academic course which is simply a standalone software engineering project. I was thinking that some contribution to Apache Spark would satisfy my curiosity, help me continue to support the company I interned at, and give me the academic credits required to graduate, all at the same time. It seems like too good an opportunity to pass up.

With that in mind, I have the following questions:

1. At this point, is there any self-contained project that I could work on within Spark? Ideally, I would work on it independently, in about a three-month time frame. This time also needs to accommodate ramping up on the Spark codebase and adjusting to the Scala programming language and paradigms. The company I worked at primarily used the Java APIs. The output needs to be a technical report describing the project requirements and the design process I took to engineer the solution for the requirements. In particular, it cannot just be a series of haphazard patches.
2. How can I get started with contributing to Spark?
3. Is there a high-level UML or some other design specification for the Spark architecture?

Thanks! I hope to be of some help =)

-Matt Cheah
Re: Spark development for undergraduate project
Matt, some suggestions.

If you're interested in the machine-learning layer, perhaps you could look into helping to harmonize our (Adatao) dataframe representation with MLlib's, and with base RDDs for that matter. It requires someone to spend some dedicated time looking into the trade-offs between generalizability and performance, etc. It's something our groups have talked about doing but haven't been able to invest the resources to do.

Separately, neural nets/deep learning is an area of emerging interest to look into with Spark. It may drive some alternate optimization patterns for Spark, e.g., sub-cluster communication. If interested, I can connect you to some deep learning folks at UoT (not too far from you) and Google. Matei may also have some interest in this.

--
Christopher T. Nguyen
Co-founder & CEO, Adatao
http://adatao.com
linkedin.com/in/ctnguyen

On Tue, Dec 17, 2013 at 10:43 AM, Matthew Cheah mcch...@uwaterloo.ca wrote:
> Hi everyone, During my most recent internship, I worked extensively with Apache Spark, integrating it into a company's data analytics platform.
> [...]
[RESULT] [VOTE] Release Apache Spark 0.8.1-incubating (rc4)
The vote is now closed. This vote passes with 4 IPMC +1's and no 0 or -1 votes.

+1 (4 Total)
Marvin Humphrey*
Henry Saputra*
Chris Mattmann*
Roman Shaposhnik*

0 (0 Total)

-1 (0 Total)

* = Binding Vote

Thanks to everyone who helped vet this release.

- Patrick
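As a side note on how a result like this is reached: as I understand ASF release policy, a release vote passes by majority approval, meaning at least three binding +1 votes and more binding +1 votes than binding -1 votes. The Python sketch below encodes that rule. The `Vote` type and `release_vote_passes` function are hypothetical names used only for illustration, and treating all four listed votes as binding is an assumption based on the "4 IPMC +1's" wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True for votes that count toward the result


def release_vote_passes(votes):
    """Apply the majority-approval rule to a list of votes.

    Only binding votes are counted: the vote passes with at least
    three binding +1s and more binding +1s than binding -1s.
    """
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= 3 and plus > minus


# The tally reported in this thread (binding status is an assumption).
rc4_votes = [
    Vote("Marvin Humphrey", 1, True),
    Vote("Henry Saputra", 1, True),
    Vote("Chris Mattmann", 1, True),
    Vote("Roman Shaposhnik", 1, True),
]
```

Calling `release_vote_passes(rc4_votes)` applies the rule to the four +1 votes listed in the result message.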