Re: [VOTE] Graduation of Apache Spark

2014-01-30 Thread Manoj Awasthi
+1


the dreamers of the day are dangerous men, for they may act their dream
with open eyes, and make it possible


On Thu, Jan 30, 2014 at 1:26 PM, Xia Zhu xia@gmail.com wrote:

 +1


 On Wed, Jan 29, 2014 at 11:28 PM, Heiko Braun ike.br...@googlemail.com wrote:

 
 
  +1
 
 
   On 30.01.2014 at 08:15, Stevo Slavić ssla...@gmail.com wrote:
  
   +1
  
  
   On Thu, Jan 30, 2014 at 2:09 AM, Jason Dai jason@gmail.com wrote:
  
   +1
  
  
   On Tue, Jan 28, 2014 at 2:43 AM, 冯俊峰 junfeng.f...@gmail.com wrote:
  
   +1
   On 2014-01-26 4:50 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
  
   Hi guys,
  
   Discussion has proceeded positively, so I'm calling for a community VOTE
   for the graduation of Apache Spark (incubating) into a top level project.
   If this VOTE is successful, then I'll call an Incubator PMC VOTE in 72
   hours, and if that is successful, we’ll submit the project graduation
   resolution below into the board agenda for the next Apache board meeting.

   So far, I've heard the following VOTEs (implied) during the DISCUSS
   thread. If you see your name there is no need to VOTE again and I’ll carry
   through the VOTE as below. If you want to change your VOTE, or I got it
   wrong, let me know and we'll change it.
  
   +1
  
   Matei Zaharia
   Reynold Xin
   Tathagata Das
   Sean McNamara
   Patrick Wendell
   Mark Hamstra
   Chris Mattmann *
   Tom Graves
   Henry Saputra *
   Andy Konwinski
   Josh Rosen
   Mosharaf Chowdhury
   Mridul Muralidharan
   Nick Pentreath
   Andrew Xia
   Haoyuan Li
  
   * - indicates IPMC
  
   Anyone else interested, please VOTE to graduate Apache Spark from the
   Incubator. I'll try and close the VOTE on Wednesday and then start the
   Incubator PMC VOTE on gene...@incubator.apache.org.
  
   [ ] +1 Graduate Apache Spark (incubating) from the Incubator per the
   resolution below.
   [ ] +0 Don't care.
   [ ] -1 Don't graduate Apache Spark (incubating) from the Incubator
   because..
  
   Thanks guys! The resolution is included below.
  
   ---
  
   WHEREAS, the Board of Directors deems it to be in the best
   interests of the Foundation and consistent with the
   Foundation's purpose to establish a Project Management
   Committee charged with the creation and maintenance of
   open-source software, for distribution at no charge to the
   public, related to fast and flexible large-scale data analysis
   on clusters.
  
   NOW, THEREFORE, BE IT RESOLVED, that a Project Management
   Committee (PMC), to be known as the Apache Spark Project, be
   and hereby is established pursuant to Bylaws of the Foundation;
   and be it further
  
   RESOLVED, that the Apache Spark Project be and hereby is
   responsible for the creation and maintenance of software
   related to efficient cluster management, resource isolation
   and sharing across distributed applications; and be it further

   RESOLVED, that the office of Vice President, Apache Spark be
   and hereby is created, the person holding such office to serve
   at the direction of the Board of Directors as the chair of the
   Apache Spark Project, and to have primary responsibility for
   management of the projects within the scope of responsibility
   of the Apache Spark Project; and be it further

   RESOLVED, that the persons listed immediately below be and
   hereby are appointed to serve as the initial members of the
   Apache Spark Project:
  
   * Mosharaf Chowdhury mosha...@apache.org
   * Jason Dai jason...@apache.org
   * Tathagata Das t...@eecs.berkeley.edu
   * Ankur Dave ankurd...@gmail.com
   * Aaron Davidson aarondavid...@berkeley.edu
   * Thomas Dudziak to...@apache.org
   * Robert Evans bo...@apache.org
   * Thomas Graves tgra...@apache.org
   * Andy Konwinski and...@apache.org
   * Stephen Haberman steph...@apache.org
   * Mark Hamstra markhams...@apache.org
   * Shane Huang shane_hu...@apache.org
   * Ryan LeCompte ryanlecom...@apache.org
   * Haoyuan Li haoy...@apache.org
   * Sean McNamara mcnam...@apache.org
   * Mridul Muralidharan mrid...@yahoo-inc.com
   * Kay Ousterhout k...@eecs.berkeley.edu
   * Nick Pentreath mln...@apache.org
   * Imran Rashid im...@quantifind.com
   * Charles Reiss wog...@apache.org
   * Josh Rosen joshro...@apache.org
   * Prashant Sharma prash...@apache.org
   * Ram Sriharsha harsh...@yahoo-inc.com
   * Shivaram Venkataraman shiva...@apache.org
   * Patrick Wendell pwend...@apache.org
   * Andrew Xia xiajunl...@gmail.com
   * Reynold Xin r...@apache.org
   * Matei Zaharia ma...@apache.org
  
   NOW, THEREFORE, BE IT FURTHER RESOLVED, that Matei Zaharia be
   appointed to the office of Vice President, Apache Spark, to
   serve in accordance with and subject to the direction of the
   Board of Directors and the Bylaws 

Source code JavaNetworkWordcount

2014-01-30 Thread Eduardo Costa Alfaia
Hi Guys,

I'm not a very good Java programmer, so could anybody help me with this
piece of code from JavaNetworkWordCount:

JavaPairDStream<String, Integer> wordCounts = words.map(
    new PairFunction<String, String, Integer>() {
      @Override
      public Tuple2<String, Integer> call(String s) throws Exception {
        return new Tuple2<String, Integer>(s, 1);
      }
    }).reduceByKey(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer i1, Integer i2) throws Exception {
        return i1 + i2;
      }
    });

JavaPairDStream<String, Integer> counts =
    wordCounts.reduceByKeyAndWindow(
        new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer i1, Integer i2) { return i1 + i2; }
        },
        new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer i1, Integer i2) { return i1 - i2; }
        },
        new Duration(60 * 5 * 1000),
        new Duration(1 * 1000)
    );
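
For readers new to the windowed call above: the second function passed to
reduceByKeyAndWindow acts as an inverse of the first, so the windowed value
can be updated incrementally by adding the batch that enters the window and
subtracting the batch that leaves it. The following is a minimal plain-Java
sketch of that idea for a single key, with no Spark dependency. The class
name SlidingWindowSum, the method slide, and the use of a number of batches
as the window length are assumptions made for illustration; this is not
Spark's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: keeps the sum of the most recent `windowLength` batch counts.
final class SlidingWindowSum {
    private final int windowLength;                     // number of batches the window covers
    private final Deque<Integer> batches = new ArrayDeque<>();
    private int sum = 0;

    SlidingWindowSum(int windowLength) {
        if (windowLength <= 0) {
            throw new IllegalArgumentException("windowLength must be positive");
        }
        this.windowLength = windowLength;
    }

    // Adds the newest batch count, drops the oldest one once the window is full,
    // and returns the sum over the current window.
    int slide(int newBatchCount) {
        batches.addLast(newBatchCount);
        sum += newBatchCount;                           // plays the role of i1 + i2
        if (batches.size() > windowLength) {
            sum -= batches.removeFirst();               // plays the role of i1 - i2
        }
        return sum;
    }

    public static void main(String[] args) {
        SlidingWindowSum window = new SlidingWindowSum(3);
        for (int count : new int[] {1, 2, 3, 4}) {
            System.out.println("windowed sum = " + window.slide(count));
        }
    }
}
```

With a window of 3 batches and per-batch counts 1, 2, 3, 4, the windowed sums
are 1, 3, 6 and 9: the last step adds 4 and subtracts the 1 that fell out of
the window.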

I would like to find a way of counting the words and then summing the counts
to get the total number of words in a single file, for example a book in .txt
format such as Don Quixote. The counts function gives me the result for each
word found, not the total number of words in the file.
Tathagata has sent me a piece of Scala code. Thanks, Tathagata, for your
attention to my posts; I am very thankful.

  yourDStream.foreachRDD(rdd => {

    // Get and print first n elements
    val firstN = rdd.take(n)
    println("First N elements = " + firstN.mkString(", "))

    // Count the number of elements in each batch
    println("RDD has " + rdd.count() + " elements")

  })

  yourDStream.count().print()

Could anybody help me?


Thanks Guys

-- 
NOTICE ON THE PROCESSING OF PERSONAL DATA

The data used to send this message are processed by the Università degli
Studi di Brescia exclusively for institutional purposes. More detailed
information, including on the rights of the data subject, can be found in
the general privacy notice and in the information published in the Privacy
section of the University's website.

The content of this message is intended solely for the persons to whom it
is addressed and may contain information whose confidentiality is protected
by law. Its reproduction, dissemination and use without the authorization
of the recipient are prohibited. If you have received this message in
error, please delete it.


ApacheCon

2014-01-30 Thread Evan Chan
I might have missed it earlier, but is anybody planning to present at
ApacheCon?  I think it's in Denver this year, April 7-9.

Thinking of submitting a talk about how we use Spark and Cassandra.

-Evan


--
Evan Chan
Staff Engineer
e...@ooyala.com


Re: ApacheCon

2014-01-30 Thread Ted Yu
There is one proposal on Spark:
http://events.linuxfoundation.org/cfp/cfp-list?page=1#overlay=cfp/proposals/1461




rough date for spark summit 2014

2014-01-30 Thread Ameet Kini
I know this is still a few months off and folks are rushing towards 0.9
release, but do the devs have a rough date for Spark Summit 2014? Looks
like it'll be in summer, but is it Jun / July / Aug / Sep ? Even
late-summer would help.

Summer being a popular vacation time, a few months advance notice would be
greatly appreciated (read: I missed last summit due to a pre-scheduled
vacation and would hate to miss this one :)

Thanks,
Ameet


Re: Source code JavaNetworkWordcount

2014-01-30 Thread Tathagata Das
Let me first ask for a few clarifications.

1. If you just want to count the words in a single text file like Don
Quixote (that is, not for a stream of data), you should use only Spark,
without Spark Streaming. If you are not super-comfortable with Java, then I
strongly recommend using the Scala API or PySpark. Scala may be a little
trickier to learn if you are completely new to it, but it is worth it. In
Scala, the program to count the frequency of words in a text file would look
like this.

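import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._  // implicit conversions that provide reduceByKey

val sc = new SparkContext(...)
val linesInFile = sc.textFile(path_to_file)
val words = linesInFile.flatMap(line => line.split(" "))
val frequencies = words.map(word => (word, 1L)).reduceByKey(_ + _)
// collect is costly if the file is large
println("Word frequencies = " + frequencies.collect().mkString(", "))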
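
The difference between per-word frequencies and a single total can be
sketched without Spark. The following plain-Java example (JDK only) is an
illustration under assumptions: the class WordTotals and its methods are
hypothetical names, and words are split on whitespace. Summing the per-word
frequencies gives the total number of words, which is the figure asked for
in the original question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-JDK illustration (no Spark); class and method names are hypothetical.
final class WordTotals {

    // Split on runs of whitespace and drop empty tokens.
    static List<String> words(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    // Per-word frequency: the analogue of mapping each word to (word, 1)
    // and then reducing by key with addition.
    static Map<String, Long> frequencies(String text) {
        return words(text).stream()
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    // Total number of words: the sum of all per-word frequencies.
    static long totalWords(String text) {
        return frequencies(text).values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        String text = "the quick fox and the lazy dog and the cat";
        System.out.println("Word frequencies = " + frequencies(text));
        System.out.println("Total words = " + totalWords(text));
    }
}
```

For the sample sentence in main, "the" occurs 3 times, "and" occurs twice,
and the total is 10 words.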


2. Let me assume that you want to read a stream of text over the network
and then print the total number of words into a file. Note that this is
the total number of words and not the frequency of each word. The Java
version would be something like this.

JavaDStream<Long> totalCounts = words.count();

totalCounts.foreachRDD(new Function2<JavaRDD<Long>, Time, Void>() {
  @Override
  public Void call(JavaRDD<Long> rdd, Time time) throws Exception {
    // read the count for this batch from the RDD passed in
    Long totalCount = rdd.first();

    // print to screen
    System.out.println(totalCount);

    // append count to file
    ...
    return null;
  }
});

This counts how many words have been received in each batch. The Scala
version would be much simpler to read.

words.count().foreachRDD(rdd => {
  val totalCount = rdd.first()

  // print to screen
  println(totalCount)

  // append count to file
  ...
})
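
Because each batch yields its own count, a grand total over the whole stream
needs an accumulator that lives outside the per-batch step. A minimal
plain-Java sketch of that bookkeeping, with no Spark dependency, follows.
RunningWordCount and its methods are hypothetical names, and each inner list
stands in for one batch of received words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-JDK illustration (no Spark); each inner list stands in for one batch of words.
final class RunningWordCount {

    // Per-batch counts: one number per batch, like the per-batch output of a count on a stream.
    static List<Long> perBatchCounts(List<List<String>> batches) {
        List<Long> counts = new ArrayList<>();
        for (List<String> batch : batches) {
            counts.add((long) batch.size());
        }
        return counts;
    }

    // Grand total: add each batch's count to an accumulator kept outside the per-batch step.
    static long grandTotal(List<List<String>> batches) {
        long total = 0L;
        for (long count : perBatchCounts(batches)) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("in", "a", "village"),
                Arrays.asList("of", "La", "Mancha"),
                new ArrayList<String>());
        System.out.println("Per-batch counts = " + perBatchCounts(batches));
        System.out.println("Grand total = " + grandTotal(batches));
    }
}
```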

Hope this helps! I apologize if the code doesn't compile; I didn't test for
syntax and stuff.

TD






Re: ApacheCon

2014-01-30 Thread Henry Saputra
I believe Cos was planning to submit one about Spark and Shark in real
prod. Similar to what he did for Spark summit.

But more talks are better =)

- Henry



Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-30 Thread bkrouse
I just tried the EC2 scripts as a part of this rc5, and it *looks* like they
did not set up this version properly.  Is that in scope for this rc?

Brian





Re: ApacheCon

2014-01-30 Thread Konstantin Boudnik
Yes, I did a couple of days ago. And I will tweak it to be more technical than
@spark-summit, because I hope the audience will be more development-oriented.

I agree that the more the merrier, though!
  Cos
