Re: [VOTE] Graduation of Apache Spark
+1

"the dreamers of the day are dangerous men, for they may act their dream with open eyes, and make it possible"

On Thu, Jan 30, 2014 at 1:26 PM, Xia Zhu xia@gmail.com wrote:
+1

On Wed, Jan 29, 2014 at 11:28 PM, Heiko Braun ike.br...@googlemail.com wrote:
+1

On 30.01.2014 at 08:15, Stevo Slavić ssla...@gmail.com wrote:
+1

On Thu, Jan 30, 2014 at 2:09 AM, Jason Dai jason@gmail.com wrote:
+1

On Tue, Jan 28, 2014 at 2:43 AM, 冯俊峰 junfeng.f...@gmail.com wrote:
+1

On 2014-01-26 4:50 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Hi guys,

Discussion has proceeded positively, so I'm calling a community VOTE for the graduation of Apache Spark (incubating) into a top-level project. If this VOTE is successful, I'll call an Incubator PMC VOTE in 72 hours, and if that is successful, we'll submit the project graduation resolution below into the board agenda for the next Apache board meeting.

So far, I've heard the following VOTEs (implied) during the DISCUSS thread. If you see your name, there is no need to VOTE again and I'll carry your VOTE through as below. If you want to change your VOTE, or I got it wrong, let me know and we'll change it.

+1
Matei Zaharia
Reynold Xin
Tathagata Das
Sean McNamara
Patrick Wendell
Mark Hamstra
Chris Mattmann *
Tom Graves
Henry Saputra *
Andy Konwinski
Josh Rosen
Mosharaf Chowdhury
Mridul Muralidharan
Nick Pentreath
Andrew Xia
Haoyuan Li

* - indicates IPMC

Anyone else interested, please VOTE to graduate Apache Spark from the Incubator. I'll try to close the VOTE on Wednesday and then start the Incubator PMC VOTE on gene...@incubator.apache.org.

[ ] +1 Graduate Apache Spark (incubating) from the Incubator per the resolution below.
[ ] +0 Don't care.
[ ] -1 Don't graduate Apache Spark (incubating) from the Incubator because..
Thanks guys! The resolution is included below.

---

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to fast and flexible large-scale data analysis on clusters.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Spark Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Spark Project be and hereby is responsible for the creation and maintenance of software related to efficient cluster management, resource isolation and sharing across distributed applications; and be it further

RESOLVED, that the office of Vice President, Apache Spark be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Spark Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Spark Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Spark Project:

* Mosharaf Chowdhury mosha...@apache.org
* Jason Dai jason...@apache.org
* Tathagata Das t...@eecs.berkeley.edu
* Ankur Dave ankurd...@gmail.com
* Aaron Davidson aarondavid...@berkeley.edu
* Thomas Dudziak to...@apache.org
* Robert Evans bo...@apache.org
* Thomas Graves tgra...@apache.org
* Andy Konwinski and...@apache.org
* Stephen Haberman steph...@apache.org
* Mark Hamstra markhams...@apache.org
* Shane Huang shane_hu...@apache.org
* Ryan LeCompte ryanlecom...@apache.org
* Haoyuan Li haoy...@apache.org
* Sean McNamara mcnam...@apache.org
* Mridul Muralidharan mrid...@yahoo-inc.com
* Kay Ousterhout k...@eecs.berkeley.edu
* Nick Pentreath mln...@apache.org
* Imran Rashid im...@quantifind.com
* Charles Reiss wog...@apache.org
* Josh Rosen joshro...@apache.org
* Prashant Sharma prash...@apache.org
* Ram Sriharsha harsh...@yahoo-inc.com
* Shivaram Venkataraman shiva...@apache.org
* Patrick Wendell pwend...@apache.org
* Andrew Xia xiajunl...@gmail.com
* Reynold Xin r...@apache.org
* Matei Zaharia ma...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be appointed to the office of Vice President, Apache Spark, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws
Source code JavaNetworkWordcount
Hi Guys,

I'm not a very good Java programmer, so could anybody help me with this piece of code from JavaNetworkWordCount:

    JavaPairDStream<String, Integer> wordCounts = words.map(
        new PairFunction<String, String, Integer>() {
          @Override
          public Tuple2<String, Integer> call(String s) throws Exception {
            return new Tuple2<String, Integer>(s, 1);
          }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
          @Override
          public Integer call(Integer i1, Integer i2) throws Exception {
            return i1 + i2;
          }
        });

    JavaPairDStream<String, Integer> counts = wordCounts.reduceByKeyAndWindow(
        new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer i1, Integer i2) { return i1 + i2; }
        },
        new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer i1, Integer i2) { return i1 - i2; }
        },
        new Duration(60 * 5 * 1000), new Duration(1 * 1000));

I would like to find a way of counting and then summing, to get a total of the words counted in a single file, for example a book in txt format such as Don Quixote. The counts function gives me the result for each word found, not the total number of words in the file.

Tathagata has sent me a piece of Scala code. Thanks Tathagata for your attention to my posts, I am very thankful:

    yourDStream.foreachRDD(rdd => {
      // Get and print first n elements
      val firstN = rdd.take(n)
      println("First N elements = " + firstN)
      // Count the number of elements in each batch
      println("RDD has " + rdd.count() + " elements")
    })

    yourDStream.count.print()

Could anybody help me? Thanks Guys

--
NOTICE ON THE PROCESSING OF PERSONAL DATA: The data used to send this message are processed by the Università degli Studi di Brescia exclusively for institutional purposes. More detailed information, including the rights of the data subject, can be found in the general notice and in the information published on the University website in the Privacy section. The content of this message is intended solely for the person to whom it is addressed and may contain legally protected confidential information. Reproduction, distribution, and use without the recipient's authorization are prohibited. If you have received this message in error, please delete it.
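The reduceByKeyAndWindow call above takes both a reduce function (i1 + i2) and an inverse function (i1 - i2) so that Spark Streaming can update the window incrementally rather than recounting the whole window on every slide. A minimal plain-Java sketch of that add/subtract bookkeeping, with no Spark dependency (the class and method names here are invented for illustration):

```java
import java.util.ArrayDeque;

// Incremental sliding-window count, mimicking what reduceByKeyAndWindow
// does with a reduce function and its inverse: when the window slides,
// add the newest batch's count and subtract the batch that fell out.
class SlidingWindowCount {
    private final ArrayDeque<Long> batches = new ArrayDeque<>();
    private final int windowBatches; // window length divided by slide interval
    private long total = 0;

    SlidingWindowCount(int windowBatches) {
        this.windowBatches = windowBatches;
    }

    // Called once per slide with the newest batch's element count;
    // returns the count over the current window.
    long slide(long newBatchCount) {
        batches.addLast(newBatchCount);
        total += newBatchCount;              // reduce: i1 + i2
        if (batches.size() > windowBatches) {
            total -= batches.removeFirst();  // inverse reduce: i1 - i2
        }
        return total;
    }
}
```

With a 5-minute window and a 1-second slide as in the snippet above, this saves recounting 300 batches on every slide at the cost of remembering one number per batch.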
ApacheCon
I might have missed it earlier, but is anybody planning to present at ApacheCon? I think it's in Denver this year, April 7-9. Thinking of submitting a talk about how we use Spark and Cassandra. -Evan -- -- Evan Chan Staff Engineer e...@ooyala.com |
Re: ApacheCon
There is one proposal on Spark: http://events.linuxfoundation.org/cfp/cfp-list?page=1#overlay=cfp/proposals/1461
rough date for spark summit 2014
I know this is still a few months off and folks are rushing toward the 0.9 release, but do the devs have a rough date for Spark Summit 2014? It looks like it'll be in summer, but is it Jun / Jul / Aug / Sep? Even "late summer" would help. Summer being a popular vacation time, a few months' advance notice would be greatly appreciated (read: I missed the last summit due to a pre-scheduled vacation and would hate to miss this one :) Thanks, Ameet
Re: Source code JavaNetworkWordcount
Let me first ask for a few clarifications.

1. If you just want to count the words in a single text file like Don Quixote (that is, not for a stream of data), you should use only Spark. If you are not super-comfortable with Java, then I strongly recommend using the Scala API or pyspark. Scala may be a little trickier to learn if you have absolutely no prior exposure, but it is worth it. The frequency count would look like this:

    val sc = new SparkContext(...)
    val linesInFile = sc.textFile(path_to_file)
    val words = linesInFile.flatMap(line => line.split(" "))
    val frequencies = words.map(word => (word, 1L)).reduceByKey(_ + _)
    println("Word frequencies = " + frequencies.collect()) // collect is costly if the file is large

2. Let me assume that you want to read a stream of text over the network and then print the count of the total number of words to a file. Note that this is the total number of words, not the frequency of each word. The Java version would be something like this:

    JavaDStream<Long> totalCounts = words.count();
    totalCounts.foreachRDD(new Function2<JavaRDD<Long>, Time, Void>() {
      @Override
      public Void call(JavaRDD<Long> countRDD, Time time) throws Exception {
        Long totalCount = countRDD.first();
        // print to screen
        System.out.println(totalCount);
        // append count to file ...
        return null;
      }
    });

This counts how many words have been received in each batch. The Scala version is much simpler to read:

    words.count().foreachRDD(rdd => {
      val totalCount = rdd.first()
      // print to screen
      println(totalCount)
      // append count to file ...
    })

Hope this helps! I apologize if the code doesn't compile; I didn't test it for syntax and stuff.
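To see the distinction TD draws between per-word frequencies and a single total count without needing a Spark cluster, here is a plain-Java sketch of the same map-then-reduceByKey logic on one machine (the class and method names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Single-machine sketch of the logic Spark distributes:
// map each word to (word, 1), then reduce by key with (+).
class WordCountSketch {
    // Per-word frequencies, like words.map(w => (w, 1)).reduceByKey(_ + _)
    static Map<String, Long> frequencies(String text) {
        Map<String, Long> freq = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                freq.merge(word, 1L, Long::sum);
            }
        }
        return freq;
    }

    // Total number of words, like words.count(): a single number,
    // which is what the original question was after.
    static long totalWords(String text) {
        long total = 0;
        for (long c : frequencies(text).values()) {
            total += c;
        }
        return total;
    }
}
```

For "to be or not to be", frequencies() returns a map with "to" -> 2, "be" -> 2, "or" -> 1, "not" -> 1, while totalWords() returns 6; the asker wants the latter shape of answer.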
TD

On Thu, Jan 30, 2014 at 8:12 AM, Eduardo Costa Alfaia e.costaalf...@unibs.it wrote: ...
Re: ApacheCon
I believe Cos was planning to submit one about Spark and Shark in real prod, similar to what he did for the Spark Summit. But more talks are better =)

- Henry
Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)
I just tried the EC2 scripts as part of this rc5, and it *looks* like they did not set up this version properly. Is that in scope for this rc? Brian -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-0-9-0-incubating-rc5-tp318p421.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: ApacheCon
Yes, I did a couple of days ago. And I will tweak it to be more technical than the Spark Summit version, because I hope the audience will be more development-oriented. I agree that the more the merrier, though!

Cos