Did you want to mention the Parquet talks at the Hadoop Summit in June? Otherwise this looks good to me.
On Sunday, April 26, 2015, Sally Khudairi <[email protected]> wrote: > Hi everyone --I haven't received any other feedback, so I think we're all > set to announce tomorrow. > I'd like to issue the press release at 7AM ET. I'll confirm when we're > live. > If there are any showstoppers, please let me know ASAP. > Thanks so much, Sally > > From: Sally Khudairi <[email protected]> > To: Sally Khudairi <[email protected]>; Daniel Weeks <[email protected]>; "[email protected]" <[email protected]> > Cc: Chris Aniszczyk <[email protected]>; Ryan Blue <[email protected]>; "[email protected]" <[email protected]>; "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]> > Sent: Friday, 24 April 2015, 16:17 > Subject: FINAL CALL: Apache Parquet TLP announcement [was Re: Graduation > blog post?] > > Hello again, everyone --below is the latest draft. > > Please review and forward any changes/additions no later than 5PM ET on > Sunday in order for us to announce on Monday morning. I was aiming to go > live by 7AM ET if that works for you. > > Kindly confirm. 
> > Thanks in advance, > Sally > > = = = > > DRAFT :: NOT FOR DISTRIBUTION > > The Apache Software Foundation Announces Apache™ Parquet™ as a Top-Level > Project > > Open Source storage format for the Apache™ Hadoop® ecosystem in use at > Cloudera, NASA, Netflix, Stripe and Twitter, among other organizations > > Forest Hill, MD –27 April 2015– The Apache Software Foundation (ASF), the > all-volunteer developers, stewards, and incubators of more than 350 Open > Source projects and initiatives, announced today that Apache™ Parquet™ has > graduated from the Apache Incubator to become a Top-Level Project (TLP), > signifying that the project's community and products have been > well-governed under the ASF's meritocratic process and principles. > > "The incubation process at Apache has been fantastic and really the last > step of making Parquet a community driven standard fully integrated within > the greater Hadoop ecosystem," said Julien Le Dem, Vice President of Apache > Parquet. > > Apache Parquet is an Open Source columnar storage format for the Apache™ > Hadoop® ecosystem, built to work across programming languages and much more: > > > - processing frameworks (MapReduce, Apache Spark, Scalding, Cascading, > Crunch, Kite) > - data models (Apache Avro, Apache Thrift, Protocol Buffers, POJOs) > - query engines (Apache Hive, Impala, HAWQ, Apache Drill, Apache Tajo, > Apache Pig, Presto, Apache Spark SQL) > > "At Twitter, Parquet has helped us scale our big data usage by in some > cases reducing storage requirements by one third on large datasets as well > as scan and deserialization time. This translated into hardware savings as > well as reduced latency for accessing the data. Furthermore, Parquet being > integrated with so many tools creates opportunities and flexibility > regarding query engines," said Chris Aniszczyk, Head of Open Source at > Twitter. 
"Finally, it's just fantastic to see it graduate to a top-level > project and we look forward to further collaborating with the Apache > Parquet community to continually improve performance." > > "Parquet’s integration with other object models, like Avro and Thrift, has > been a key feature for our customers," said Ryan Blue, Software Engineer at > Cloudera. "They can take advantage of columnar storage without changing the > classes they already use in their production applications." > > "At Netflix, Parquet is the primary storage format for data warehousing. > More than 7 petabytes of our 10+ Petabyte warehouse is Parquet formatted > data that we query across a wide range of tools including Apache Hive, > Apache Pig, Apache Spark, PigPen, Presto, and native MapReduce. The > performance benefit of columnar projection and statistics is a game changer > for our big data platform," said Daniel Weeks, Software Engineer at > Netflix. "We look forward to working with the Apache community to advance > the state of big data storage with Parquet and are excited to see the > project graduate to full Apache status." > > "Stripe's data warehouse has been built on Parquet from the beginning," > said Avi Bryant, Engineering Manager at Stripe. "Every aspect of our > pipeline, from data import to machine learning to adhoc SQL analysis, uses > Apache Parquet as the common interchange format." > > "I was extremely happy to see Parquet arrive as an Incubator project," > said Chris Mattmann, Apache Parquet Incubator Mentor, and Chief Architect, > Instrument and Science Data Systems Section at NASA Jet Propulsion > Laboratory. "After talking with some in its community there was a real > match with this columnar data format technology and its community with the > way that we do things here at the ASF. Parquet has had an exemplar > Incubation, and the project has big things ahead of it. 
I am encouraging my > Data Science Team at NASA to evaluate it for data representation especially > as it relates to our science holdings in Earth, planetary and space > sciences, and astrophysics." > > The Apache Parquet project welcomes contributions and community > participation through mailing lists, face-to-face MeetUps, and user events. > For more information, visit http://parquet.apache.org/community/ > > Availability and Oversight > Apache Parquet software is released under the Apache License v2.0 and is > overseen by a self-selected team of active contributors to the project. A > Project Management Committee (PMC) guides the Project's day-to-day > operations, including community development and product releases. For > downloads, documentation, and ways to become involved with Apache Parquet, > visit http://parquet.apache.org/ and https://twitter.com/ApacheParquet > > About the Apache Incubator > The Apache Incubator is the entry path for projects and codebases wishing > to become part of the efforts at The Apache Software Foundation. All code > donations from external organizations and existing external projects > wishing to join the ASF enter through the Incubator to: 1) ensure all > donations are in accordance with the ASF legal standards; and 2) develop > new communities that adhere to our guiding principles. Incubation is > required of all newly accepted projects until a further review indicates > that the infrastructure, communications, and decision making process have > stabilized in a manner consistent with other successful ASF projects. While > incubation status is not necessarily a reflection of the completeness or > stability of the code, it does indicate that the project has yet to be > fully endorsed by the ASF. For more information, visit > http://incubator.apache.org/. 
> > About The Apache Software Foundation (ASF) > Established in 1999, the all-volunteer Foundation oversees more than 350 > leading Open Source projects, including Apache HTTP Server --the world's > most popular Web server software. Through the ASF's meritocratic process > known as "The Apache Way," more than 500 individual Members and 4,500 > Committers successfully collaborate to develop freely available > enterprise-grade software, benefiting millions of users worldwide: > thousands of software solutions are distributed under the Apache License; > and the community actively participates in ASF mailing lists, mentoring > initiatives, and ApacheCon, the Foundation's official user conference, > trainings, and expo. The ASF is a US 501(c)(3) charitable organization, > funded by individual donations and corporate sponsors including Bloomberg, > Budget Direct, Cerner, Citrix, Cloudera, Comcast, Facebook, Google, > Hortonworks, HP, IBM, InMotion Hosting, iSigma, Matt Mullenweg, Microsoft, > Pivotal, Produban, WANdisco, and Yahoo. For more information, visit > http://www.apache.org/ or follow @TheASF on Twitter. > > © The Apache Software Foundation. "Apache", "Avro", "Apache Avro", > "Drill", "Apache Drill", "Hadoop", "Apache Hadoop", "Parquet", "Apache > Parquet", "Pig", "Apache Pig", "Spark", "Apache Spark", "Tajo", "Apache > Tajo", "Thrift", "Apache Thrift", and "ApacheCon" are registered trademarks > or trademarks of the Apache Software Foundation in the United States and/or > other countries. All other brands and trademarks are the property of their > respective owners. 
> > # # # > > [MEDIA CONTACT:SALLY] > ________________________________ > > > From: Sally Khudairi <[email protected]> > To: Sally Khudairi <[email protected]>; Daniel Weeks <[email protected]>; "[email protected]" <[email protected]> > Cc: Chris Aniszczyk <[email protected]>; Ryan Blue <[email protected]>; "[email protected]" <[email protected]>; "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]> > Sent: Friday, 24 April 2015, 13:56 > Subject: Re: Graduation blog post? > > > > Done. > > ALL: can you please let me know if there are any events that Parquet will > be at? Presenting? Hosting? etc. > > Thank you! > > -Sally > > > > > > ________________________________ > From: Sally Khudairi <[email protected]> > To: Daniel Weeks <[email protected]>; "[email protected]" <[email protected]> > Cc: Chris Aniszczyk <[email protected]>; Ryan Blue <[email protected]>; "[email protected]" <[email protected]>; "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]> > Sent: Friday, 24 April 2015, 13:40 > Subject: Re: Graduation blog post? > > > > Of course --I'll fix that now! > > Sorry about that, Daniel. 
> > -Sally > > > > > > > ________________________________ > From: Daniel Weeks <[email protected]> > To: [email protected]; Sally Khudairi <[email protected]> > Cc: Chris Aniszczyk <[email protected]>; Ryan Blue <[email protected]>; "[email protected]" <[email protected]>; "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]> > Sent: Friday, 24 April 2015, 13:38 > Subject: Re: Graduation blog post? > > > > Sally, > > Just wanted to comment that my last name is misspelled in the Netflix > testimonial. Can someone fix that? (it's Weeks, not Week) > > Thanks, > Dan > > > > > On Fri, Apr 24, 2015 at 10:23 AM, Sally Khudairi > <[email protected]> wrote: > > Hi everyone --there's been the addition of a quote from Stripe: > > > >"Stripe's data warehouse has been built on Parquet from the beginning," > said Avi Bryant, Engineering Manager at Stripe. "Every aspect of our > pipeline, from data import to machine learning to adhoc SQL analysis, uses > Apache Parquet as the common interchange format." > > > > > >--please note that I added "Apache" to "Parquet" in the second sentence. > Stripe has also been added to the sub-head. > > > >Are we waiting for quotes from anyone else? If not, I can add a closing > sentence and forward the final copy later today. 
> > > >Thanks so much, > >Sally > > > > > > > >----- Original Message ----- > > > >From: Sally Khudairi <[email protected]> > >To: Chris Aniszczyk <[email protected]>; "[email protected]" <[email protected]> > >Cc: Ryan Blue <[email protected]>; "[email protected]" <[email protected]>; "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]> > >Sent: Thursday, 23 April 2015, 15:25 > >Subject: Re: Graduation blog post? > > > >Hello everyone --below is the draft thus far. > > > > > >I was aiming to announce on Monday by 7AM ET, but noticed that we're > waiting for additional quotes. > > > >Also, should we get a closing quote from Julien? Perhaps something that > invites additional community participation? > > > >Please let me know your thoughts. > > > >Thanks so much, > >Sally > > > >= = = > > > >The Apache Software Foundation Announces Apache™ Parquet™ as a Top-Level > Project > > > >Open Source storage format for the Apache™ Hadoop® ecosystem in use at > Cloudera, NASA, Netflix, and Twitter, among other organizations > > > >Forest Hill, MD –27 April 2015– The Apache Software Foundation (ASF), the > all-volunteer developers, stewards, and incubators of more than 350 Open > Source projects and initiatives, announced today that Apache™ Parquet™ has > graduated from the Apache Incubator to become a Top-Level Project (TLP), > signifying that the project's community and products have been > well-governed under the ASF's meritocratic process and principles. > > > >"The incubation process at Apache has been fantastic and really the last > step of making Parquet a community driven standard fully integrated within > the greater Hadoop ecosystem." said Julien Le Dem, Vice President of Apache > Parquet. 
> > > >Apache Parquet is an Open Source columnar storage format for the Apache™ > Hadoop® ecosystem, built to work across programming languages and much more: > >- processing frameworks (MapReduce, Apache Spark, Scalding, Cascading, > Crunch, Kite) > >- data models (Apache Avro, Apache Thrift, Protocol Buffers, POJOs) > >- query engines (Apache Hive, Impala, HAWQ, Apache Drill, Apache Tajo, > Apache Pig, Presto, Apache Spark SQL) > > > >"At Twitter, Parquet has helped us scale our big data usage by in some > cases reducing storage requirements by one third on large datasets as well > as scan and deserialization time. This translated into hardware savings as > well as reduced latency for accessing the data. Furthermore, Parquet being > integrated with so many tools creates opportunities and flexibility > regarding query engines," said Chris Aniszczyk, Head of Open Source at > Twitter. "Finally, it's just fantastic to see it graduate to a top-level > project and we look forward to further collaborating with the Apache > Parquet community to continually improve performance." > > > >"Parquet’s integration with other object models, like Avro and Thrift, > has been a key feature for our customers," said Ryan Blue, Software > Engineer at Cloudera. "They can take advantage of columnar storage without > changing the classes they already use in their production applications." > > > >"At Netflix, Parquet is the primary storage format for data warehousing. > More than 7 petabytes of our 10+ Petabyte warehouse is Parquet formatted > data that we query across a wide range of tools including Apache Hive, > Apache Pig, Apache Spark, PigPen, Presto, and native MapReduce. The > performance benefit of columnar projection and statistics is a game changer > for our big data platform," said Daniel Week, Software Engineer at Netflix. 
> "We look forward to working with the Apache community to advance the state > of big data storage with Parquet and are excited to see the project > graduate to full Apache status." > > > >"I was extremely happy to see Parquet arrive as an Incubator project," > said Chris Mattmann, Apache Parquet Incubator Mentor, and Chief Architect, > Instrument and Science Data Systems Section at NASA Jet Propulsion > Laboratory. "After talking with some in its community there was a real > match with > >this columnar data format technology and its community with the way that > we do things here at the ASF. Parquet has had an exemplar Incubation, and > the project has big things ahead of it. I am encouraging my Data Science > Team at NASA to evaluate it for data representation especially > >as it relates to our science holdings in Earth, planetary and space > sciences, and astrophysics." > > > > > >Stripe? @cra reached out to Avi, said he would get something by Monday > >Criteo? > > > >@@CLOSING QUOTE FROM JULIEN? > > > >Availability and Oversight > >Apache Parquet software is released under the Apache License v2.0 and is > overseen by a self-selected team of active contributors to the project. A > Project Management Committee (PMC) guides the Project's day-to-day > operations, including community development and product releases. For > downloads, documentation, and ways to become involved with Apache Parquet, > visit http://parquet.apache.org/ and https://twitter.com/ApacheParquet > > > >About the Apache Incubator > >The Apache Incubator is the entry path for projects and codebases wishing > to become part of the efforts at The Apache Software Foundation. All code > donations from external organizations and existing external projects > wishing to join the ASF enter through the Incubator to: 1) ensure all > donations are in accordance with the ASF legal standards; and 2) develop > new communities that adhere to our guiding principles. 
Incubation is > required of all newly accepted projects until a further review indicates > that the infrastructure, communications, and decision making process have > stabilized in a manner consistent with other successful ASF projects. While > incubation status is not necessarily a reflection of the completeness or > stability of the code, it does indicate that the project has yet to be > fully endorsed by the ASF. For more information, visit > http://incubator.apache.org/. > > > >About The Apache Software Foundation (ASF) > >Established in 1999, the all-volunteer Foundation oversees more than 350 > leading Open Source projects, including Apache HTTP Server --the world's > most popular Web server software. Through the ASF's meritocratic process > known as "The Apache Way," more than 500 individual Members and 4,500 > Committers successfully collaborate to develop freely available > enterprise-grade software, benefiting millions of users worldwide: > thousands of software solutions are distributed under the Apache License; > and the community actively participates in ASF mailing lists, mentoring > initiatives, and ApacheCon, the Foundation's official user conference, > trainings, and expo. The ASF is a US 501(c)(3) charitable organization, > funded by individual donations and corporate sponsors including Bloomberg, > Budget Direct, Cerner, Citrix, Cloudera, Comcast, Facebook, Google, > Hortonworks, HP, IBM, InMotion Hosting, iSigma, Matt Mullenweg, Microsoft, > Pivotal, Produban, WANdisco, and Yahoo. For more information, visit > http://www.apache.org/ or follow @TheASF on Twitter. > > > >© The Apache Software Foundation. 
"Apache", "Avro", "Apache Avro", > "Drill", "Apache Drill", "Hadoop", "Apache Hadoop", "Parquet", "Apache > Parquet", "Pig", "Apache Pig", "Spark", "Apache Spark", "Tajo", "Apache > Tajo", "Thrift", "Apache Thrift", and "ApacheCon" are registered trademarks > or trademarks of the Apache Software Foundation in the United States and/or > other countries. All other brands and trademarks are the property of their > respective owners. > > > ># # # > > > > > >________________________________ > > > >From: Chris Aniszczyk <[email protected] <javascript:;>> > >To: "[email protected] <javascript:;>" < > [email protected] <javascript:;>> > >Cc: Sally Khudairi <[email protected] <javascript:;>>; Ryan Blue < > [email protected] <javascript:;>>; "[email protected] <javascript:;>" < > [email protected] <javascript:;>>; "Mattmann, Chris A (3980)" < > [email protected] <javascript:;>>; "[email protected] > <javascript:;>" <[email protected] <javascript:;>> > >Sent: Wednesday, 22 April 2015, 14:51 > >Subject: Re: Graduation blog post? > > > > > > > >Thanks Daniel, I added your quote. > > > > > > > > > >On Wed, Apr 22, 2015 at 12:14 PM, Daniel Weeks <[email protected]> > wrote: > > > >Netflix Testimonial: > >> > >>At Netflix, Parquet is the primary storage format for data warehousing. > >>More than 7 petabytes of our 10+ Petabyte warehouse is Parquet formatted > >>data that we query across a wide range of tools including Apache Hive, > >>Apache Pig, Apache Spark, PigPen, Presto, and native MapReduce. The > >>performance benefit of columnar projection and statistics is a game > changer > >>for our big data platform. We look forward to working with the Apache > >>community to advance the state of big data storage with Parquet and are > >>excited to see the project graduate to full Apache status. 
> >> > >>Daniel Weeks > >>Engineering Manager - Big Data Compute > >>Netflix > >> > >> > >>On Wed, Apr 22, 2015 at 9:36 AM, Sally Khudairi < > >>[email protected]> wrote: > >> > >>> Thanks for the draft thus far, Ryan. > >>> Can we please include at least one more industry testimonial? > >>> Also, if you can please provide edit access to my account at > >>> [email protected], that would be great. > >>> Thanks in advance for this! > >>> -Sally > >>> > >>> > >>> From: Ryan Blue <[email protected]> > >>> To: [email protected]; Sally Khudairi <[email protected]> > >>> Cc: "Mattmann, Chris A (3980)" <[email protected]>; "[email protected]" <[email protected]>; "[email protected]" <[email protected]> > >>> Sent: Monday, 20 April 2015, 15:48 > >>> Subject: Re: Graduation blog post? > >>> > >>> On 04/20/2015 12:36 PM, Jake Farrell wrote: > >>> > Hey Sally > >>> > i've got root@ karma and will take care of the infra side of things > for > >>> > us once the board has successfully voted on our resolution > >>> > > >>> > -Jake > >>> > >>> Thanks, Jake! I've already sent an e-mail to Infra, but I'll follow up > >>> with this news so they don't worry about it. > >>> > >>> rb > >>> > >>> > >>> -- > >>> Ryan Blue > >>> Software Engineer > >>> Cloudera, Inc. > >>> > >>> > >>> > >>> > >> > > > > > >-- > > > >Cheers, > > > >Chris Aniszczyk > >http://aniszczyk.org > >+1 512 961 6719 > > > >
