Re: [VOTE] Release Apache Hivemall (Incubating) v0.5.0 Release Candidate 3

2018-02-28 Thread Reynold Xin
+1


On Mon, Feb 26, 2018 at 5:20 PM, Makoto Yui  wrote:

> Hi,
>
> I created a docker image [1,2] to verify release artifacts.
>
> Just run the following commands to verify a release artifact.
>
> # 1). run a docker image
> $ docker run -it hivemall/verify:v0.5.0-rc3
>
> # 2). download Hivemall release artifacts, build, and run unit tests.
> $ ./build_from_src.sh
>
> We need votes from IPMC members.
>
> Thanks,
> Makoto
>
> [1] https://hub.docker.com/r/hivemall/verify/
> [2] https://github.com/myui/hivemall-dockerfiles
>
> 2018-02-20 14:19 GMT+09:00 Makoto Yui :
> > Hi all,
> >
> > The Apache Hivemall community has approved a proposal to release
> > Apache Hivemall v0.5.0, based on v0.5.0 Release Candidate 3.
> >
> > We now kindly request that the Incubator PMC members review and vote
> > on this incubator release candidate.
> >
> > The PPMC vote thread is located here:
> > https://www.mail-archive.com/dev@hivemall.incubator.apache.org/msg00462.html
> > (vote)
> > https://www.mail-archive.com/dev@hivemall.incubator.apache.org/msg00468.html
> > (vote result)
> >
> > Links to various release artifacts are given below.
> >
> > - The source tarball, including signatures, digests, ChangeLog, etc.:
> >   https://dist.apache.org/repos/dist/dev/incubator/hivemall/0.5.0-incubating-rc3/
> > - Sources for the release:
> >   https://dist.apache.org/repos/dist/dev/incubator/hivemall/0.5.0-incubating-rc3/hivemall-0.5.0-incubating-source-release.zip
> > - Git tag for the release:
> >   https://git-wip-us.apache.org/repos/asf?p=incubator-hivemall.git;a=shortlog;h=refs/tags/v0.5.0-rc3
> > - The Nexus Staging URL:
> >   https://repository.apache.org/content/repositories/orgapachehivemall-1003/
> > - KEYS file for verification:
> >   https://dist.apache.org/repos/dist/dev/incubator/hivemall/KEYS
> > - For information about the contents of this release, see:
> >   https://dist.apache.org/repos/dist/dev/incubator/hivemall/0.5.0-incubating-rc3/ChangeLog.html
> >
> > The artifact verification how-to can be found at
> > http://hivemall.incubator.apache.org/verify_artifacts.html
> >
> > Please vote accordingly:
> >
> > [ ] +1  approve (Release this package as Apache Hivemall 0.5.0-incubating)
> > [ ] -1  disapprove (and reason why)
> >
> > The vote will be open for at least 72 hours.
> >
> > Regards,
> > Makoto
> > on behalf of Apache Hivemall PPMC
> >
> > --
> > Makoto YUI 
> > Research Engineer, Treasure Data, Inc.
> > http://myui.github.io/
>
>
>
> --
> Makoto YUI 
> Research Engineer, Treasure Data, Inc.
> http://myui.github.io/
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>
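
The Docker workflow quoted above downloads the release artifacts, builds
them, and runs the unit tests. One part of any such verification, checking a
downloaded artifact against its published digest, can be sketched in Python
as follows. This is a minimal sketch and not the contents of
build_from_src.sh: the function names file_digest and
matches_published_digest are hypothetical, SHA-512 is only an assumed
default, and the published value is assumed to contain nothing but the hex
digest.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large release archives are not
        # loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published_digest(path, published, algorithm="sha512"):
    """Compare a local file against a published hex digest string.

    Assumption: `published` holds only the hex digest. Whitespace and
    uppercase hex are normalized before comparing.
    """
    expected = "".join(published.split()).lower()
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

A matching digest only shows that the download is intact; checking the
signature against the KEYS file is a separate step.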


Re: [VOTE] Accept OpenWhisk into the Apache Incubator

2016-11-17 Thread Reynold Xin
+1


On Thu, Nov 17, 2016 at 7:22 AM, Sam Ruby  wrote:

> Now that the discussion thread on the OpenWhisk Proposal has died
> down, please take a moment to vote on accepting OpenWhisk into the
> Apache Incubator.
>
> The ASF voting rules are described at:
> http://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote
> for which only Incubator PMC member votes are binding.
>
> Votes from other people are also welcome as an indication of people's
> enthusiasm (or lack thereof).
>
> Please do not use this VOTE thread for discussions.
> If needed, start a new thread instead.
>
> This vote will run for at least 72 hours. Please VOTE as follows
> [] +1 Accept OpenWhisk into the Apache Incubator
> [] +0 Abstain.
> [] -1 Do not accept OpenWhisk into the Apache Incubator because ...
>
> The proposal is listed below, but you can also access it on the wiki:
> https://wiki.apache.org/incubator/OpenWhiskProposal
>
> - Sam Ruby
>
> = OpenWhisk Proposal =
>
> OpenWhisk is an open source, distributed Serverless computing platform
> able to execute application logic (Actions) in response to events
> (Triggers) from external sources (Feeds) or HTTP requests governed by
> conditional logic (Rules). It provides a programming environment
> supported by a REST API-based Command Line Interface (CLI) along with
> tooling to support packaging and catalog services.
>
> Champion: Sam Ruby, IBM
>
> Mentors:
>  * Felix Meschberger, Adobe
>  * Isabel Drost-Fromm, Elasticsearch GmbH
>  * Sergio Fernández, Redlink GmbH
>
> == Background ==
>
> Serverless computing is the evolutionary next stage in Cloud computing
> carrying further the abstraction offered to software developers using
> Container-based operating system virtualization. The Serverless
> paradigm enables programmers to just “write” functional code and not
> worry about having to configure any aspect of a server needed for
> execution. Such Serverless functions are single-purpose and stateless;
> they respond to event-driven data sources and can be scaled on demand.
>
> The OpenWhisk project offers a truly open, highly scalable, performant
> distributed Serverless platform leveraging other open technologies
> along with a robust programming model, catalog of service and event
> provider integrations and developer tooling.
> Specifically, every architectural component service of the OpenWhisk
> platform (e.g., Controller, Invokers, Messaging, Router, Catalog, API
> Gateway, etc.) is designed to be run and scaled as a Docker
> container. In addition, OpenWhisk uniquely leverages aspects of Docker
> engine to manage, load balance and scale supported OpenWhisk runtime
> environments (e.g., JavaScript, Python, Swift, Java, etc.), that run
> Serverless functional code within Invoker compute instances, using
> Docker containers.
>
> OpenWhisk's containerized design tenets not only allow it to be
> hosted in various IaaS and PaaS Cloud platforms that support Docker
> containers, but also achieve the high expectation of the Serverless
> computing experience by masking all aspects of traditional resource
> specification and configuration from the end user, simplifying and
> accelerating Cloud application development.
> In order to enable HTTP requests as a source of events, and thus the
> creation of Serverless microservices that expose REST APIs, OpenWhisk
> includes an API Gateway that performs tasks like security, request
> routing, throttling, and logging.
>
> == Rationale ==
>
> Serverless computing is in the very early stages of the technology
> adoption curve and has great promise in enabling new paradigms in
> event-driven application development, but current implementation
> efforts are fractured as most are tied to specific Cloud platforms and
> services. Having an open implementation of a Serverless platform, such
> as OpenWhisk, available and governed by an open community like Apache
> could accelerate growth of this technology, as well as encourage
> dialog and interoperability.
>
> Having the ASF accept and incubate OpenWhisk would provide a clear
> signal to developers interested in Serverless and its future that they
> are welcome to participate and contribute in its development, growth
> and governance.
>
> In addition, there are numerous projects already at the ASF that would
> provide a natural fit to the API-centric, event-driven programming
> model that OpenWhisk sees as integral to a Serverless future. In fact,
> any project that includes a service that can produce or consume
> actionable events could become an integration point with
> OpenWhisk-enabled functions. Apache projects that manage programming
> languages and (micro) service runtimes could become part of the
> OpenWhisk set of supported runtime environments for functions. Device
> and API gateways would provide natural event sources that could
> utilize OpenWhisk functions to process, store and analyze vast 

Re: [DISCUSS] China Contribution. (was: RocketMQ Incubation Proposal)

2016-11-13 Thread Reynold Xin
Hi Niclas,

The thing about archiving is a great point and I agree with you that it is
important to have archives that survive technology disruptions, and
mailing lists are unparalleled there. The main thing I see here is that we
would want to be inclusive and bring discussions back to archives, either
through automatic means or manual means. It is not always an argument to
reject "newer tech". For any technology we choose, we need to be extremely
careful with data lock-in and mitigate the risks when the technology
disappears.

Also absolutely agree that there is a big difference between dev@ and user@.


Jeff - I understand why it was a shock to you when I mentioned "wechat" and
why you would draw a parallel to snapchat.

I personally don't get why people would use wechat for serious business,
since it is painful to type on a mobile phone, but it is very common in
China. Dozens of Apache projects have wechat groups (mostly by users of
those projects and not by PMCs or committers, with some projects having
multiple groups with thousands of users).



On Sun, Nov 13, 2016 at 2:32 PM, Jeff Genender <jgenen...@savoirtech.com>
wrote:

>
> > On Nov 13, 2016, at 2:57 PM, Reynold Xin <r...@apache.org> wrote:
> >
> > "a better global way to A) communicate across a medium that everyone
> > uses daily B) archive to search and come back to"
> >
> > How would we even validate or decide that? For discussions like this it
> > is very easy to fall into confirmation bias.
>
> How?   Dunno… maybe about 17 years of historical data from Apache?  This
> one is pretty easy to “confirm”, no?
>
> >
> > I use mailing lists all the time since it is the Apache Way, but I also
> > admit there are potentially better ways for other projects. People that are
> > used to mailing lists might think mailing lists are the best thing in the
> > world, but the reality is that the majority of the developers in this world,
> > outside a few core open source projects, have never used mailing lists. If
> > we talk to the QQ/Wechat/web-based-forum generation in China and force them
> > to use mailing lists, they might comply because it is the Apache Way, but
> > they will also develop the sentiment that the ASF refuses to change and
> > adapt to newer technologies.
>
> Wechat?  Really?  Let's throw in Snapchat too while we are at it so there
> is no footprint for that discussion.  Seriously?  Reynold, is this really
> coming from you of all people?
>
> This project wanted to come to Apache, right?  Did they (or other Chinese
> projects) not look at the way things are done and all of a sudden have an
> issue with it?  I’m just sayin’… there shouldn’t be surprises here, right?
>
> Jeff
>
>
> >
> > And to be honest, while I think mailing lists are great for simple
> > voting and information dissemination, there are obvious downsides of
> > mailing lists too. That's why a lot of projects also augment mailing lists
> > via video discussions, google docs for commenting, wiki, etc.
> >
> > In reality, there are also legal reasons why we use mailing lists, and
> > those are not as well known. We should document those and make them more
> > visible too.
> >
> >
> >
> > On Sun, Nov 13, 2016 at 12:25 PM, Jeff Genender <jgenen...@apache.org> wrote:
> > > On Nov 13, 2016, at 11:33 AM, Gunnar Tapper <tapper.gun...@gmail.com> wrote:
> > > As mentioned, the Apache Way is that "everything happens on the
> > > mailing lists." As a matter of fact, key parts of being in the incubator
> > > are to learn how to operate per the Apache Way and to build communities.
> > > We even include statistics about mailing list engagement as an indicator
> > > of community building.
> > >
> >
> > Gunnar, I’m going to give you a big -1 to this.
> >
> > Unless you can come up with a better global way to A) communicate across
> > a medium that everyone uses daily B) archive to search and come back to, I
> > am in full disagreement.  Since I have been with Apache (about 14 years), I
> > have yet to find a better medium than the lists, and it's always been a
> > known fact that ultimately, any non-mail list discussions that result in
> > some form of a decision are brought to the mail lists for global discussion.
> >
> > Our mail lists are indexed by Google and others.  It's easy to find what
> > one looks for.
> >
> > Jeff
> >
> >
>
>


Re: [DISCUSS] China Contribution. (was: RocketMQ Incubation Proposal)

2016-11-13 Thread Reynold Xin
"a better global way to A) communicate across a medium that everyone uses
daily B) archive to search and come back to"

How would we even validate or decide that? For discussions like this it is
very easy to fall into confirmation bias.

I use mailing lists all the time since it is the Apache Way, but I also
admit there are potentially better ways for other projects. People that are
used to mailing lists might think mailing lists are the best thing in the
world, but the reality is that the majority of the developers in this world,
outside a few core open source projects, have never used mailing lists. If
we talk to the QQ/Wechat/web-based-forum generation in China and force them
to use mailing lists, they might comply because it is the Apache Way, but
they will also develop the sentiment that the ASF refuses to change and
adapt to newer technologies.

And to be honest, while I think mailing lists are great for simple voting
and information dissemination, there are obvious downsides of mailing lists
too. That's why a lot of projects also augment mailing lists via video
discussions, google docs for commenting, wiki, etc.

In reality, there are also legal reasons why we use mailing lists, and
those are not as well known. We should document those and make them more
visible too.



On Sun, Nov 13, 2016 at 12:25 PM, Jeff Genender 
wrote:

> > On Nov 13, 2016, at 11:33 AM, Gunnar Tapper wrote:
> > As mentioned, the Apache Way is that "everything happens on the mailing
> > lists." As a matter of fact, key parts of being in the incubator are to
> > learn how to operate per the Apache Way and to build communities. We even
> > include statistics about mailing list engagement as an indicator of
> > community building.
> >
>
> Gunnar, I’m going to give you a big -1 to this.
>
> Unless you can come up with a better global way to A) communicate across a
> medium that everyone uses daily B) archive to search and come back to, I am
> in full disagreement.  Since I have been with Apache (about 14 years), I
> have yet to find a better medium than the lists, and it's always been a
> known fact that ultimately, any non-mail list discussions that result in
> some form of a decision are brought to the mail lists for global discussion.
>
> Our mail lists are indexed by Google and others.  It's easy to find what
> one looks for.
>
> Jeff
>


Re: [DISCUSS] China Contribution. (was: RocketMQ Incubation Proposal)

2016-11-10 Thread Reynold Xin
Background: I have no tie to RocketMQ. I didn't even know about it until
today and I don't know any of the people associated with the project. I am
Chinese but living in the US. I'm purely playing devil's advocate about a
meta-point here and don't know if it applies to RocketMQ or not.

I definitely agree with Jeff's point that "my thoughts about community
would be getting as many people and users involved as possible".

That said, for a project started in China, it is unclear that switching the
primary development language from Chinese to English would help with
accomplishing that goal. While lowering the bar for non-Chinese speakers to
participate, it will limit the efficacy of its original developers and
raise the bar for more Chinese developers, who are the more natural,
immediate expansion targets for the community.

If we as a community want to enforce the usage of English as the standard,
we should just explicitly say that.

I'd avoid using the argument that English will bring more users, as it is
not defensible and risks being interpreted as western arrogance. After all,
three out of the six largest Internet companies (by market cap) are
currently in mainland China, and they all have enormous numbers of daily
active users even though they primarily target Chinese users.


On Thu, Nov 10, 2016 at 11:14 PM, Jeff Genender <jgenen...@apache.org>
wrote:

> I would think that English is generally used because it's the most
> international language, not because it's the most used in the world.  Thus
> it helps cross borders for communication.  At the end of the day, I think
> you need to look at your community and ask if you want it to cross borders
> or not.  Do you want worldwide contribution (and adoption)?  I can tell you
> that I glean a lot of information from the mail lists when I run into
> problems or issues using Apache software.  If the discussions are in
> Chinese, you may miss a lot of people who can be a part of the discussion
> from outside of China.  I think you really need to think about who you want
> your users to be and how you want your product adopted.
>
> In addition, this is an incubated project.  AFAICT, the champion doesn’t
> speak Chinese, and I am wild-guessing maybe 2 of the mentors do.  This
> means the other mentors may have a difficult time steering the project when
> they are needed.  It makes it difficult for the champion to assess any
> problems without having someone notify him of a translated issue.  In the
> unlikely event that the project requires input from the incubation PMC or
> the board, for that matter, it would be very difficult to get a proper
> insight into the issues without having solid knowledge of the language.
>
> I personally don’t know of any rule or regulation that locks down a
> language and perhaps a board member can chime in on that.  But my .02 is
> that if I were bringing a project to Apache, my thoughts about community
> would be getting as many people and users involved as possible.  If you
> don’t use a more cross-border/international language, then I believe that
> you may ultimately be hindering your project beyond your borders.  I think
> that would be a shame.  OTOH, maybe your desire is to keep RocketMQ a
> Chinese piece of software.  I guess that is ok too… but I would be
> interested in why.
>
> Just my usual .02.
>
> Jeff
>
> > On Nov 10, 2016, at 11:53 PM, Tom Barber <t...@spicule.co.uk> wrote:
> >
> > I believe I saw something the other day where someone was talking about
> > diverse languages on mailing lists. Personally I think it's okay but
> > obviously it decreases the chance of participation of others.
> >
> > Of course the old saying "if it wasn't discussed on the list it never
> > happened" didn't mention the language.
> >
> > Thought must be taken for jira and code comments as well. How would
> > non-Chinese-speaking people follow development?
> >
> >
> > On 11 Nov 2016 06:45, "Reynold Xin" <r...@apache.org> wrote:
> > Adding members@
> >
> > On Thu, Nov 10, 2016 at 10:40 PM, Reynold Xin <r...@apache.org> wrote:
> >
> > > To play devil's advocate: is it OK for Apache projects that consist
> > > primarily of Chinese developers to communicate in Chinese? Or put it
> > > differently -- is it a requirement that all communications must be in
> > > English?
> > >
> > > I can see an inclusiveness argument for having to use English, as
> > > English is one of the most common languages. However, many talented
> > > software developers in China don't have a sufficient level of
> > > proficiency when it comes to English, as the penetration rate of
> > > English in China is much
Re: [DISCUSS] China Contribution. (was: RocketMQ Incubation Proposal)

2016-11-10 Thread Reynold Xin
Adding members@

On Thu, Nov 10, 2016 at 10:40 PM, Reynold Xin <r...@apache.org> wrote:

> To play devil's advocate: is it OK for Apache projects that consist
> primarily of Chinese developers to communicate in Chinese? Or put it
> differently -- is it a requirement that all communications must be in
> English?
>
> I can see an inclusiveness argument for having to use English, as English
> is one of the most common languages. However, many talented software
> developers in China don't have a sufficient level of proficiency when it
> comes to English, as the penetration rate of English in China is much lower
> than in other countries. It is as hard for Chinese speakers to learn English
> as for English speakers to learn Chinese.
>
> One can certainly argue forcing everybody to use English will also exclude
> those Chinese developers, and from the perspective of the number of native
> speakers, Mandarin (a Chinese dialect) outnumbers English 3 to 1 according
> to Wikipedia.
>
> Similar argument also applies to Japanese, and many other countries,
> except the number of Chinese speakers is much larger.
>
>
>
>
> On Thu, Nov 10, 2016 at 10:18 PM, Luke Han <luke...@apache.org> wrote:
>
>> Hi Gunnar,
>>
>> I don't think your point is right. One community's problem (maybe not
>> real, but just referring to what you mentioned) could NOT represent all
>> contributions from China, or from any other territory in the world.
>>
>> This will mislead people into ignoring contributions from Chinese
>> developers and LABEL such contributors and committers. Following your
>> pattern, there would be tons of "issues" to describe, like Russian
>> Contribution, German Contribution, Canadian Contribution, or others.
>> That's not the right way.
>>
>> Yes, Chinese people are not native English speakers, but they are
>> contributing a great deal to most of the ASF projects and to other
>> foundations' projects, and are deeply involved in many discussions,
>> development efforts, decisions, and more.
>>
>> Let's try to talk with some data. Here's a summary of the last 31 days of
>> mailing list activity from lists.apache.org [1]:
>>
>> Project    | Emails | Topics | Participants
>> HBase      |    610 |    406 |          100
>> Spark      |    412 |     88 |          124
>> Kylin      |    294 |    144 |           61
>> CarbonData |    852 |    250 |          116
>> HAWQ       |    284 |    109 |           57
>> Trafodion  |     87 |     20 |           25
>>
>> There are many Chinese people participating in these projects. You could
>> check each one and see how Chinese people are discussing within the
>> mailing lists.
>>
>> It's really not easy for Chinese people: they have to find a way to access
>> Gmail or other services since there's the GFW, they are not native English
>> speakers, and they have limited experience with open source, especially
>> the Apache Way. But they are willing to contribute, willing to participate
>> in the global community, and try their best to learn and follow the Apache
>> Way. We should have patience with those newcomers.
>>
>> One thing I'm doing now is trying to let more people know about our
>> journey and our experience of how to follow the Apache Way and how we
>> overcame such challenges, through conferences, events, meetups, blogs,
>> books, and so on, and also helping many potential projects that are
>> interested in joining the Apache family.
>>
>> I would like to suggest changing this topic to something like "Help
>> Trafodion community", which will help to focus on the real issue and your
>> concern (does the Trafodion PMC know about this concern?). I'm very happy
>> to help: I can share with you many articles, session recordings, and other
>> material about open source, and could even try to do some face-to-face
>> discussion if necessary :-)
>>
>>
>> [1] https://lists.apache.org
>>
>> On Fri, Nov 11, 2016 at 3:00 AM, Gunnar Tapper <tapper.gun...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Using the RocketMQ proposal to start a larger discussion.
>> >
>> > Apache Trafodion is another project that has a lot of contribution from
>> > China.
>> >
>> > One of the struggles I've seen is that the contributors aren't that
>> > active on email. Rather, they prefer to use a forum on QQ communicating in
>> > Chinese.

Re: [DISCUSS] China Contribution. (was: RocketMQ Incubation Proposal)

2016-11-10 Thread Reynold Xin
To play devil's advocate: is it OK for Apache projects that consist
primarily of Chinese developers to communicate in Chinese? Or put it
differently -- is it a requirement that all communications must be in
English?

I can see an inclusiveness argument for having to use English, as English
is one of the most common languages. However, many talented software
developers in China don't have a sufficient level of proficiency when it
comes to English, as the penetration rate of English in China is much lower
than in other countries. It is as hard for Chinese speakers to learn English
as for English speakers to learn Chinese.

One can certainly argue that forcing everybody to use English will also exclude
those Chinese developers, and from the perspective of the number of native
speakers, Mandarin (a Chinese dialect) outnumbers English 3 to 1 according
to Wikipedia.

A similar argument also applies to Japanese, and many other countries, except
the number of Chinese speakers is much larger.




On Thu, Nov 10, 2016 at 10:18 PM, Luke Han  wrote:

> Hi Gunnar,
>
> I don't think your point is right. One community's problem (maybe not
> real, but just referring to what you mentioned) could NOT represent all
> contributions from China, or from any other territory in the world.
>
> This will mislead people into ignoring contributions from Chinese
> developers and LABEL such contributors and committers. Following your
> pattern, there would be tons of "issues" to describe, like Russian
> Contribution, German Contribution, Canadian Contribution, or others.
> That's not the right way.
>
> Yes, Chinese people are not native English speakers, but they are
> contributing a great deal to most of the ASF projects and to other
> foundations' projects, and are deeply involved in many discussions,
> development efforts, decisions, and more.
>
> Let's try to talk with some data. Here's a summary of the last 31 days of
> mailing list activity from lists.apache.org [1]:
>
> Project    | Emails | Topics | Participants
> HBase      |    610 |    406 |          100
> Spark      |    412 |     88 |          124
> Kylin      |    294 |    144 |           61
> CarbonData |    852 |    250 |          116
> HAWQ       |    284 |    109 |           57
> Trafodion  |     87 |     20 |           25
>
> There are many Chinese people participating in these projects. You could
> check each one and see how Chinese people are discussing within the
> mailing lists.
>
> It's really not easy for Chinese people: they have to find a way to access
> Gmail or other services since there's the GFW, they are not native English
> speakers, and they have limited experience with open source, especially
> the Apache Way. But they are willing to contribute, willing to participate
> in the global community, and try their best to learn and follow the Apache
> Way. We should have patience with those newcomers.
>
> One thing I'm doing now is trying to let more people know about our
> journey and our experience of how to follow the Apache Way and how we
> overcame such challenges, through conferences, events, meetups, blogs,
> books, and so on, and also helping many potential projects that are
> interested in joining the Apache family.
>
> I would like to suggest changing this topic to something like "Help
> Trafodion community", which will help to focus on the real issue and your
> concern (does the Trafodion PMC know about this concern?). I'm very happy
> to help: I can share with you many articles, session recordings, and other
> material about open source, and could even try to do some face-to-face
> discussion if necessary :-)
>
>
> [1] https://lists.apache.org
>
> On Fri, Nov 11, 2016 at 3:00 AM, Gunnar Tapper 
> wrote:
>
> > Hi,
> >
> > Using the RocketMQ proposal to start a larger discussion.
> >
> > Apache Trafodion is another project that has a lot of contribution from
> > China.
> >
> > One of the struggles I've seen is that the contributors aren't that
> > active on email. Rather, they prefer to use a forum on QQ communicating in
> > Chinese.
> >
> > I'm currently the release manager and I must admit that it's hard not
> > being able to see all discussions. Several of us are trying to encourage
> > questions etc. via the email lists but users just prefer Chinese forums.
> >
> > I suspect that Apache will see more of this behavior moving forward,
> > especially as other proposals come in. So, I'm hoping that members in
> > China can help advise on what can be done to address communication issues
> > like this.
> > Thanks,
> >
> > Gunnar
> >
> > On Nov 5, 2016 12:21 PM, "Ross Gardler" 
> > wrote:
> >
> > Some folks may remember my state of the feather session a couple of years
> > ago when I called for more awareness of the ASF's role in open source
> > beyond English-speaking countries. This was prompted by a fact-finding
> > trip to China.
> >
> > RocketMQ and the team behind it was one of the projects I talked to. 

Re: [VOTE] Accept Hivemall into the Apache Incubator

2016-09-03 Thread Reynold Xin
s no conflict in their target runtimes.
>
> === An Excessive Fascination with the Apache Brand ===
>
> Our interest for this incubation is attracting more contributors,
> building a strong community with open governance, and increasing the
> visibility of Hivemall in the market/community. We will be sensitive
> to inadvertent abuse of the Apache brand for any commercial use and
> will work with the Incubator PMC and project mentors to ensure the
> brand policies are respected.
>
> == Documentation ==
>
> Information on Hivemall can be found at:
> https://github.com/myui/hivemall/wiki
>
> == Initial Source ==
>
> We released the initial version of Hivemall in 2013 at
> https://github.com/myui/hivemall and introduced Hivemall at the Hadoop
> Summit 2014.
>
> == Source and Intellectual Property Submission Plan ==
>
> We know of no legal encumbrance to the transfer of the source to Apache. We
> are going to obtain Contributor License Agreements (CLAs) for all property
> of Hivemall.
>
> Also, we plan to obtain a signed Software Grant Agreement (SGA) from AIST.
>
> == External Dependencies ==
>
> Hivemall depends on the following third party libraries:
>
> Core module:
>  * netty (The MIT License)
>  * smile (Apache License v2.0)
>  * org.tukaani.xz (Public Domain)
>  * xgboost (Apache License v2.0)
>  * hadoop (Apache License v2.0)
>  * hive (Apache License v2.0)
>  * log4j (Apache License v2.0)
>  * guava (Apache License v2.0)
>  * lucene-analyzers-kuromoji (Apache License v2.0)
>  * junit (Eclipse Public License v1.0)
>  * mockito (The MIT License)
>  * powermock (Apache License v2.0)
>  * kryo (BSD License)
>
> Hivemall on Spark:
>  * spark (Apache License v2.0)
>  * commons-cli  (Apache License v2.0)
>  * commons-logging (Apache License v2.0)
>  * commons-compress (Apache License v2.0)
>  * scala-library (BSD License)
>  * scalatest (Apache License v2.0)
>  * xerial-core (Apache License v2.0)
>
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
>
> N/A
>
> == Required resources ==
>
> === Mailing lists ===
>
>  * priv...@hivemall.incubator.apache.org  (with moderated subscriptions)
>  * comm...@hivemall.incubator.apache.org
>  * d...@hivemall.incubator.apache.org
>  * u...@hivemall.incubator.apache.org
>
> === Git Repository ===
>
> https://git-wip-us.apache.org/repos/asf/incubator-hivemall.git
>
> === JIRA assistance ===
>
> JIRA project Hivemall (HIVEMALL)
>
> == Initial Committers ==
>
>  * Makoto Yui (m...@treasure-data.com)
>  * Takeshi Yamamuro (yamamuro.tak...@lab.ntt.co.jp)
>  * Daniel Dai (da...@hortonworks.com)
>  * Tsuyoshi Ozawa (ozawa.tsuyo...@lab.ntt.co.jp)
>  * Kai Sasaki (sas...@treasure-data.com)
>
> == Affiliations ==
>
> === Treasure Data ===
>  * Makoto Yui
>  * Kai Sasaki
>
> === NTT ===
>  * Takeshi Yamamuro
>  * Tsuyoshi Ozawa Apache Hadoop PMC member
>
> === Hortonworks ===
>  * Daniel Dai (ASF member) Apache Pig PMC member
>
> == Sponsors ==
>
> === Champion ===
>  * Roman Shaposhnik (Pivotal, ASF member, IPMC member) Apache
> Bigtop/Incubator PMC member
>
> === Nominated Mentors ===
>
>  * Reynold Xin (Databricks, ASF member) Apache Spark PMC member
>  * Markus Weimer (Microsoft, ASF member) Apache REEF PMC member
>  * Xiangrui Meng (Databricks, ASF member) Apache Spark PMC member
>
> === Sponsoring Entity ===
>
> We are requesting the Incubator to sponsor this project.
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)

2016-06-10 Thread Reynold Xin
+1



On Mon, Jun 6, 2016 at 2:27 PM, Justin Mclean 
wrote:

> Hi,
>
> +1 binding
>
> I checked:
> - names contain incubating
> - signatures good
> - DISCLAIMER exists
> - LICENSE and NOTICE correct
> - There are no unexpected binaries in the source release
> - All files have apache headers
> - Can compile from source
>
> Minor issue: there’s no need to list the copyright for abego software or
> ANTLR software in the NOTICE file, as both are BSD licensed. [1] Please fix
> this for the next release.
>
> Thanks,
> Justin
>
> 1. http://www.apache.org/dev/licensing-howto.html#permissive-deps
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>
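
The mechanical parts of the checklist above (the artifact name contains
"incubating"; DISCLAIMER, LICENSE and NOTICE are present) can be sketched in a
few lines. This is only an illustration, not the procedure used in the review:
the function name and the archive listing it takes are hypothetical, and the
signature, header and build checks are out of scope.

```python
from typing import Dict, List

# Files the checklist expects at least once in a source release.
REQUIRED_FILES = ("DISCLAIMER", "LICENSE", "NOTICE")


def check_source_release(artifact_name: str, member_paths: List[str]) -> Dict[str, bool]:
    """Run the mechanical checklist items on an archive listing.

    `artifact_name` and `member_paths` are hypothetical inputs: the file name
    of the release archive and the paths of the entries it contains.
    """
    # Reduce each entry to its base name, e.g. "pkg/LICENSE" -> "LICENSE".
    base_names = {path.rstrip("/").rsplit("/", 1)[-1] for path in member_paths}

    results = {"name contains incubating": "incubating" in artifact_name.lower()}
    for required in REQUIRED_FILES:
        # Accept an optional extension such as LICENSE.txt.
        results[f"{required} exists"] = any(
            name == required or name.startswith(required + ".")
            for name in base_names
        )
    return results
```

Each entry in the returned dictionary is True or False, so a reviewer can see
at a glance which item still needs a manual look.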


Re: [VOTE] Accept PredictionIO into the Apache Incubator

2016-05-24 Thread Reynold Xin
+1 (binding)


On Mon, May 23, 2016 at 3:22 PM, Andrew Purtell  wrote:

> Since discussion on the matter of PredictionIO has died down, I would like
> to call a VOTE
> on accepting PredictionIO into the Apache Incubator.
>
> Proposal: https://wiki.apache.org/incubator/PredictionIO
>
> [ ] +1 Accept PredictionIO into the Apache Incubator
> [ ] +0 Abstain
> [ ] -1 Do not accept PredictionIO into the Apache Incubator, because ...
>
> This vote will be open for at least 72 hours.
>
> My vote is +1 (binding)
>
> --
>
> PredictionIO Proposal
>
> Abstract
>
> PredictionIO is an open source Machine Learning Server built on top of
> state-of-the-art open source stack, that enables developers to manage and
> deploy production-ready predictive services for various kinds of machine
> learning tasks.
>
> Proposal
>
> The PredictionIO platform consists of the following components:
>
>  * PredictionIO framework - provides the machine learning stack for
>    building, evaluating and deploying engines with machine learning
>    algorithms. It uses Apache Spark for processing.
>
>  * Event Server - the machine learning analytics layer for unifying
>    events from multiple platforms. It can use Apache HBase or any JDBC
>    backends as its data store.
>
> The PredictionIO community also maintains a Template Gallery, a place to
> publish and download (free or proprietary) engine templates for different
> types of machine learning applications, which is a complementary part of
> the project. At this point we exclude the Template Gallery from the
> proposal, as it has a separate set of contributors and we’re not familiar
> with an Apache approved mechanism to maintain such a gallery.
>
> Background
>
> PredictionIO was started with a mission to democratize and bring machine
> learning to the masses.
>
> Machine learning has traditionally been a luxury for big companies like
> Google, Facebook, and Netflix. There are ML libraries and tools lying
> around the internet, but putting them all together as a production-ready
> infrastructure is a very resource-intensive task that is barely within
> reach of individuals or small businesses.
>
> PredictionIO is a production-ready, full stack machine learning system that
> allows organizations of any scale to quickly deploy machine learning
> capabilities. It comes with official and community-contributed machine
> learning engine templates that are easy to customize.
>
> Rationale
>
> As the usage of and number of contributors to PredictionIO have grown
> and become more diverse, we have sought an independent framework for the
> project to keep thriving. We believe the Apache foundation is a great fit. Joining
> Apache would ensure that tried and true processes and procedures are in
> place for the growing number of organizations interested in contributing
> to PredictionIO. PredictionIO is also a good fit for the Apache foundation.
> PredictionIO was built on top of several Apache projects (HBase, Spark,
> Hadoop). We are familiar with the Apache process and believe that the
> democratic and meritocratic nature of the foundation aligns with the
> project goals.
>
> Initial Goals
>
> The initial milestones will be to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is accomplished,
> we plan for incremental development and releases that follow the Apache
> guidelines, as well as growing our developer and user communities.
>
> Current Status
>
> PredictionIO has undergone nine minor releases and many patches.
> PredictionIO is being used in production by Salesforce.com as well as many
> other organizations and apps. The PredictionIO codebase is currently
> hosted at GitHub, which will form the basis of the Apache git repository.
>
> Meritocracy
>
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. We intend to invite additional developers
> to participate. We will encourage and monitor community participation so
> that privileges can be extended to those that contribute.
>
> Community
>
> Acceptance into the Apache foundation would bolster the already strong
> user and developer community around PredictionIO. That community includes
> many contributors from various other companies, and an active mailing list
> composed of hundreds of users.
>
> Core Developers
>
> The core developers of our project are listed in our contributors and
> initial PPMC below. Though many are employed at Salesforce.com, there are
> also engineers from ActionML, and independent developers.
>
> Alignment
>
> The ASF is the natural choice to host the PredictionIO project as its goal
> is democratizing Machine Learning by making it more easily accessible to
> every user/developer. PredictionIO is built on top of several top level
> Apache projects as outlined above.
>
> Known Risks
>
> Orphaned Products
>
> PredictionIO has a solid and growing community. It is deployed on
> 

Re: [VOTE] Graduate Zeppelin from the Incubator

2016-04-17 Thread Reynold Xin
+1


On Sat, Apr 16, 2016 at 2:01 AM, moon soo Lee  wrote:

> Hi,
>
> Apache Zeppelin started incubating about a year and 4 months ago
> (2014-12-23) and the members of the community think that it is ready to
> graduate from the incubator to be a TLP.
>
> Since its inception, the Zeppelin community has made 3 releases, recruited 4
> PPMC members and resolved 500+ issues [1] with 90+ contributors [2]. Now, the
> community is very open, active and continuously growing.
>
> The Apache Zeppelin community has discussed and voted on graduation to
> top level project.
> The vote passed with 22 +1 votes (9 binding) and no 0 or -1 votes.
>
> Incubation Status:
> http://incubator.apache.org/projects/zeppelin.html
> Maturity Assessment:
>
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Apache+Zeppelin+Project+Maturity+Model
> Discussion:
> https://s.apache.org/gLi0
> https://s.apache.org/GhqY (continue)
> Vote:
> https://s.apache.org/7hCK
> Result:
> https://s.apache.org/1rJD
>
> Please vote on the resolution pasted below to graduate Apache Zeppelin
> from the incubator to top level project.
>
> [ ] +1 Graduate Apache Zeppelin from the Incubator.
> [ ] +0 Don't care.
> [ ] -1 Don't graduate Apache Zeppelin from the Incubator because
>
> This vote will be open for at least 72 hours.
> Many thanks to our mentors and everyone else for the support,
>
> [1] https://s.apache.org/eswD
> [2] https://s.apache.org/gi3o
>
> Apache Zeppelin top-level project resolution:
> 
>
> WHEREAS, the Board of Directors deems it to be in the best
> interests of the Foundation and consistent with the
> Foundation's purpose to establish a Project Management
> Committee charged with the creation and maintenance of
> open-source software, for distribution at no charge to
> the public, related to a collaborative data analytics and
> visualization tool for general-purpose data processing systems.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> Committee (PMC), to be known as the "Apache Zeppelin Project",
> be and hereby is established pursuant to Bylaws of the
> Foundation; and be it further
>
> RESOLVED, that the Apache Zeppelin Project be and hereby is
> responsible for the creation and maintenance of software
> related to a collaborative data analytics and
> visualization tool for general-purpose data processing systems; and be it
> further
>
> RESOLVED, that the office of "Vice President, Apache Zeppelin" be
> and hereby is created, the person holding such office to
> serve at the direction of the Board of Directors as the chair
> of the Apache Zeppelin Project, and to have primary responsibility
> for management of the projects within the scope of
> responsibility of the Apache Zeppelin Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and
> hereby are appointed to serve as the initial members of the
> Apache Zeppelin Project:
>
> * Alexander Bezzubov 
> * Anthony Corbacho 
> * Damien Corneau 
> * Felix Cheung 
> * Jongyoul Lee 
> * Kevin Sangwoo Kim 
> * Lee Moon Soo 
> * Mina Lee 
> * Prabhjyot Singh 
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Lee Moon Soo
> be appointed to the office of Vice President, Apache Zeppelin, to
> serve in accordance with and subject to the direction of the
> Board of Directors and the Bylaws of the Foundation until
> death, resignation, retirement, removal or disqualification,
> or until a successor is appointed; and be it further
>
> RESOLVED, that the initial Apache Zeppelin PMC be and hereby is
> tasked with the creation of a set of bylaws intended to
> encourage open development and increased participation in the
> Apache Zeppelin Project; and be it further
>
> RESOLVED, that the Apache Zeppelin Project be and hereby
> is tasked with the migration and rationalization of the Apache
> Incubator Zeppelin podling; and be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache
> Incubator Zeppelin podling encumbered upon the Apache Incubator
> Project are hereafter discharged.
>


Re: [VOTE] Accept Gearpump into the Apache Incubator

2016-03-01 Thread Reynold Xin
am
> > processing, they have fundamentally different architectures. In
> > particular, Gearpump adopts the micro-service model, building on the
> > Akka framework, for concurrency, isolation and error handling, which we
> > believe is a future trend for building distributed software. We look
> > forward to collaboration with other Apache communities.
> >
> > ==== An Excessive Fascination with the Apache Brand ====
> > The ASF has a strong brand; we appreciate that fact and will protect the
> > brand. Gearpump is an existing open source project with many committers
> > and years of effort. The reasons to join Apache are outlined in the
> > Rationale section above.
> >
> > === Documentation ===
> > Information on Gearpump can be found at:
> > Gearpump website: http://gearpump.io
> > Codebase: https://github.com/gearpump/gearpump
> >
> > === Initial Source and Intellectual Property Submission Plan ===
> > The Gearpump codebase is currently hosted on Github:
> > https://github.com/gearpump/gearpump. We will use this codebase to
> > migrate to the Apache foundation. The Gearpump source code is licensed
> > under Apache License Version 2.0 and will be kept that way. All
> > contributions on the project will be licensed directly to the Apache
> > foundation through signed Individual Contributor License Agreements or
> > Corporate Contributor License Agreements.
> >
> > === External Dependencies ===
> > All of Gearpump dependencies are distributed under Apache compatible
> > licenses.
> >
> > Gearpump leverages Akka, which has Apache 2.0 licensing for current and
> > planned versions:
> >
> > http://doc.akka.io/docs/akka/2.3.12/project/licenses.html#Licenses_for_Dependency_Libraries
> >
> > === Cryptography ===
> > Gearpump does not include or utilize cryptographic code.
> >
> > === Required Resources ===
> > We request that the following resources be created for the project to use:
> >
> > ==== Mailing lists ====
> >
> > gearpump-priv...@incubator.apache.org (with moderated subscriptions)
> > gearpump-dev
> > gearpump-user
> > gearpump-commits
> >
> > ==== Git repository ====
> > Git is the preferred source control system: git://git.apache.org/gearpump
> >
> > ==== Documentation ====
> > https://gearpump.incubator.apache.org/docs/
> >
> > ==== JIRA instance ====
> > JIRA Gearpump (GEARPUMP)
> > https://issues.apache.org/jira/browse/gearpump
> >
> > === Initial Committers ===
> > * Xiang Zhong 
> >
> > * Tianlun Zhang 
> >
> > * Qian Xu 
> >
> > * Huafeng Wang 
> >
> > * Kam Kasravi 
> >
> > * Weihua Jiang 
> >
> > * Tomasz Targonski 
> >
> > * Karol Brejna 
> >
> > * Gang Wang 
> >
> > * Mark Chmarny 
> >
> > * Xinglang Wang 
> >
> > * Lan Wang 
> >
> > * Jianzhong Chen 
> >
> > * Xuefu Zhang 
> >
> > * Rui Li 
> >
> > === Affiliations ===
> > * Xiang Zhong –  Intel
> >
> > * Tianlun Zhang –  Intel
> >
> > * Qian Xu –  Intel
> >
> > * Huafeng Wang –  Intel
> >
> > * Kam Kasravi –  Intel
> >
> > * Weihua Jiang –  Intel
> >
> > * Tomasz Targonski – Intel
> >
> > * Karol Brejna – Intel
> >
> > * Mark Chmarny – Intel
> >
> > * Gang Wang – Intel
> >
> > * Xinglang Wang  – Ebay
> >
> > * Lan Wang – Huawei
> >
> > * Jianzhong Chen – Cloudera
> >
> > * Xuefu Zhang – Cloudera
> >
> > * Rui Li  – Intel
> >
> > === Sponsors ===
> >
> > ==== Champion ====
> > Andrew Purtell 
> >
> > ==== Nominated Mentors ====
> > * Andrew Purtell 
> >
> > * Jarek Jarcec Cecho 
> >
> > * Todd Lipcon 
> >
> > * Xuefu Zhang 
> >
> > * Reynold Xin 
> >
> > ==== Sponsoring Entity ====
> > Apache Incubator PMC
> >
>


Re: [VOTE] Apache SystemML 0.9.0-incubating (RC3)

2016-02-10 Thread Reynold Xin
+1

(some issues as pointed out above, but we can fix those in the next release)


On Sat, Feb 6, 2016 at 11:29 AM, Luciano Resende 
wrote:

> Let me reiterate my +1 (binding) vote here...
>
> Anyone else willing to review the release?
>
> On Mon, Feb 1, 2016 at 2:57 PM, Luciano Resende 
> wrote:
>
> > Please vote to approve the release of the following candidate as Apache
> > SystemML version 0.9.0!
> >
> > The PPMC vote thread:
> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00267.html
> >
> > And the result:
> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00279.html
> >
> > The tag to be voted on is v0.9.0-rc3
> > (49528085a9b2ea0babade040db821c8158a57ab5):
> > https://github.com/apache/incubator-systemml/tree/49528085a9b2ea0babade040db821c8158a57ab5
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://repository.apache.org/content/repositories/orgapachesystemml-1003/
> >
> > The distribution and rat report is also available at:
> >
> > http://people.apache.org/~lresende/systemml/0.9.0-rc3/
> >
> > The vote is open for at least 72 hours and passes if a majority of at
> > least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache SystemML 0.9.0
> > [ ] -1 Do not release this package because ...
> >
> >
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
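
The passing rule quoted above ("a majority of at least 3 +1 PMC votes") can be
written down as a small predicate. This is a sketch under stated assumptions:
the function name and parameters are hypothetical, only binding votes are
counted, and "majority" is read as more binding +1 votes than binding -1 votes.

```python
def release_vote_passes(
    binding_plus_ones: int,
    binding_minus_ones: int,
    minimum_plus_ones: int = 3,
) -> bool:
    """Return True when a release vote meets the rule quoted in the thread.

    Assumption: non-binding votes and +0/-0 votes are ignored, and a
    majority means strictly more binding +1 votes than binding -1 votes.
    """
    has_enough_plus_ones = binding_plus_ones >= minimum_plus_ones
    has_majority = binding_plus_ones > binding_minus_ones
    return has_enough_plus_ones and has_majority
```

Under this reading, three binding +1 votes with no -1 votes pass, while two
binding +1 votes do not, however many non-binding +1 votes accompany them.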


Re: [VOTE] Accept Torii into Apache Incubator

2015-11-30 Thread Reynold Xin
t Simple (MIT)
>> * Spring Framework Core (Apache v2)
>> * Play (Apache v2)
>> * SLF4J (MIT)
>> * Scala
>> * Scalatest (Apache v2)
>> * Scalactic (Apache v2)
>> * Mockito (MIT)
>> 
>> == Required Resources ==
>> 
>> === Mailing lists ===
>> 
>> * priv...@torii.incubator.apache.org (with moderated subscriptions)
>> * comm...@torii.incubator.apache.org
>> * d...@torii.incubator.apache.org
>> 
>> === Git Repository ===
>> 
>> * https://git-wip-us.apache.org/repos/asf/incubator-torii.git
>> 
>> === Issue Tracking ===
>> 
>> * A JIRA issue tracker: https://issues.apache.org/jira/browse/TORII
>> 
>> == Initial Committers ==
>> 
>> * Leugim Bustelo (lbustelo AT us DOT ibm DOT com)
>> * Jakob Odersky (odersky AT us DOT ibm DOT com)
>> * Luciano Resende (lresende AT apache DOT org)
>> * Robert Senkbeil (rcsenkbe AT us DOT ibm DOT com)
>> * Corey Stubbs (cstubbs AT us DOT ibm DOT com)
>> * Miao Wang (wangmiao AT us DOT ibm DOT com)
>> * Sean Welleck (swelleck AT us DOT ibm DOT com)
>> 
>> === Affiliations ===
>> All of the initial committers are employed by IBM.
>> 
>> == Sponsors ==
>> 
>> === Champion ===
>> * Sam Ruby (rubys AT apache DOT org)
>> 
>> === Nominated Mentors ===
>> * Luciano Resende (lresende AT apache DOT org)
>> * Reynold Xin (rxin AT apache DOT org)
>> * Hitesh Shah (hitesh AT apache DOT org)
>> * Julien Le Dem (julien AT apache DOT org)
>> 
>> === Sponsoring Entity ===
>> 
>> We would like to propose the Apache Incubator to sponsor this project.
>> 
>> 
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
> 
> 
> 
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Kudu into the Apache Incubator

2015-11-24 Thread Reynold Xin
+1


On Tue, Nov 24, 2015 at 11:32 AM, Todd Lipcon  wrote:

> Hi all,
>
> Discussion on the [DISCUSS] thread seems to have wound down, so I'd like to
> call a VOTE on acceptance of Kudu into the ASF Incubator. The proposal is
> pasted below and also available on the wiki at:
> https://wiki.apache.org/incubator/KuduProposal
>
> The proposal is unchanged since the original version, except for the
> addition of Carl Steinbach as a Mentor.
>
> Please cast your votes:
>
> [] +1, accept Kudu into the Incubator
> [] +/-0, positive/negative non-counted expression of feelings
> [] -1, do not accept Kudu into the incubator (please state reasoning)
>
> Given the US holiday this week, I imagine many folks are traveling or
> otherwise offline. So, let's run the vote for a full week rather than the
> traditional 72 hours. Unless the IPMC objects to the extended voting
> period, the vote will close on Tues, Dec 1st at noon PST.
>
> Thanks
> -Todd
> -
>
> = Kudu Proposal =
>
> == Abstract ==
>
> Kudu is a distributed columnar storage engine built for the Apache Hadoop
> ecosystem.
>
> == Proposal ==
>
> Kudu is an open source storage engine for structured data which supports
> low-latency random access together with efficient analytical access
> patterns. Kudu distributes data using horizontal partitioning and
> replicates each partition using Raft consensus, providing low
> mean-time-to-recovery and low tail latencies. Kudu is designed within the
> context of the Apache Hadoop ecosystem and supports many integrations with
> other data analytics projects both inside and outside of the Apache
> Software Foundation.
>
>
>
> We propose to incubate Kudu as a project of the Apache Software Foundation.
>
> == Background ==
>
> In recent years, explosive growth in the amount of data being generated and
> captured by enterprises has resulted in the rapid adoption of open source
> technology which is able to store massive data sets at scale and at low
> cost. In particular, the Apache Hadoop ecosystem has become a focal point
> for such “big data” workloads, because many traditional open source
> database systems have lagged in offering a scalable alternative.
>
>
>
> Structured storage in the Hadoop ecosystem has typically been achieved in
> two ways: for static data sets, data is typically stored on Apache HDFS
> using binary data formats such as Apache Avro or Apache Parquet. However,
> neither HDFS nor these formats has any provision for updating individual
> records, or for efficient random access. Mutable data sets are typically
> stored in semi-structured stores such as Apache HBase or Apache Cassandra.
> These systems allow for low-latency record-level reads and writes, but lag
> far behind the static file formats in terms of sequential read throughput
> for applications such as SQL-based analytics or machine learning.
>
>
>
> Kudu is a new storage system designed and implemented from the ground up to
> fill this gap between high-throughput sequential-access storage systems
> such as HDFS and low-latency random-access systems such as HBase or
> Cassandra. While these existing systems continue to hold advantages in some
> situations, Kudu offers a “happy medium” alternative that can dramatically
> simplify the architecture of many common workloads. In particular, Kudu
> offers a simple API for row-level inserts, updates, and deletes, while
> providing table scans at throughputs similar to Parquet, a commonly-used
> columnar format for static data.
>
>
>
> More information on Kudu can be found at the existing open source project
> website: http://getkudu.io and in particular in the Kudu white-paper PDF:
> http://getkudu.io/kudu.pdf from which the above was excerpted.
>
> == Rationale ==
>
> As described above, Kudu fills an important gap in the open source storage
> ecosystem. After our initial open source project release in September 2015,
> we have seen a great amount of interest across a diverse set of users and
> companies. We believe that, as a storage system, it is critical to build an
> equally diverse set of contributors in the development community. Our
> experiences as committers and PMC members on other Apache projects have
> taught us the value of diverse communities in ensuring both longevity and
> high quality for such foundational systems.
>
> == Initial Goals ==
>
>  * Move the existing codebase, website, documentation, and mailing lists to
> Apache-hosted infrastructure
>  * Work with the infrastructure team to implement and approve our code
> review, build, and testing workflows in the context of the ASF
>  * Incremental development and releases per Apache guidelines
>
> == Current Status ==
>
> === Releases ===
>
> Kudu has undergone one public release, tagged here
> https://github.com/cloudera/kudu/tree/kudu0.5.0-release
>
> This initial release was not performed in the typical ASF fashion -- no
> source tarball was released, but rather only convenience binaries made
> available in 

Re: RTC vs CTR (was: Concerning Sentry...)

2015-11-22 Thread Reynold Xin
Most non-trivial software projects I have worked on (paid or unpaid) have an
RTC culture. I cannot speak for every single project, but in the ones that
I'm closely involved with that use RTC, it is simply part of the culture and
a recognition that mandatory code review improves code quality. (We can
debate about this in a separate thread, since this is not what this thread
is about.)


I don't think we should elevate everything to "Apache Way", "trust", or
"community building". RTC vs CTR is not about:

1. Apache Way

Given that the ASF doesn't mandate RTC, CTR, or something in between, and
different TLPs already follow different approaches, I don't think any mentor
or the incubator should force their view upon incubating projects.

2. Trust

It's just part of a project's process and culture. Greg brought up that RTC
is an indication of lack of trust and committers are just treated as normal
contributors: "What I haven't seen is an explanation why a committer must
be treated the same as a drive-by. Both are subject to requiring
'permission' to make even the simplest of changes under RTC."

Committers are required to use JIRA, github, and follow many other
processes that "drive-by" contributors also follow. I don't see why "code
review" is different from filing JIRA tickets. In most RTC projects,
committers do have more rights -- a committer can review somebody else's
patch and commit it.

3. Community building

Lots of successful open source projects, both inside and outside ASF,
employ RTC. As Todd mentioned, almost all the top 10 most starred (on
github) projects use some form of RTC, so it is hard for me to believe that
RTC would hinder community building. Of course, one can always argue that
if those projects had employed CTR, maybe they would've been even more
popular. But then we get into an area where we just have to agree to
disagree.



On Sun, Nov 22, 2015 at 9:37 PM, Todd Lipcon  wrote:

> On Sun, Nov 22, 2015 at 12:18 PM, Konstantin Boudnik 
> wrote:
>
> >  >
> > > > The question is not to decide if C-T-R is The Apache Way over R-T-C.
> > > > The question is whether a project entering incubation with a selected
> > > > R-T-C mode is likely to exit incubation for the simple reason it will
> > > > be very hard for this project to grow its community due to this
> > > > choice. It's like starting a 100m race with a 20kg backpack on your
> > > > shoulder...
> > > >
> > >
> > > If you have any statistics that show this to be the case, I'd be very
> > > interested. RTC is the norm in basically every Apache project I've
> > > been a part of, many of which have thriving communities and are
> > > generally regarded as successful software projects.
> >
> > Do you have any statistics on that, Todd? Would be very interesting to
> > see, indeed.
> >
> >
> I don't have incubator stats... nor do I have a good way to measure "most
> active" or "most successful" projects in the ASF (seems that itself could
> be a 'centithread'-worthy discussion). But a potential proxy could be the
> number of stars on github:
>
> https://github.com/search?utf8=%E2%9C%93&q=user%3Aapache&type=Repositories&ref=searchresults
>  (sort by number of stars)
>
> Of the top ten:
>
> Spark: RTC via github pull request
> Storm: RTC (https://storm.apache.org/documentation/BYLAWS.html see "Code
> Change")
> Cassandra: RTC (based on my skimming the commit log which has "Reviewed by"
> quite often)
> CouchDB: RTC (http://couchdb.apache.org/bylaws.html see "RTC" section)
> Kafka: RTC (based on "Reviewed by" showing up in recent commit logs)
> Thrift: CTR
> Mesos: RTC (based on reviewboard links in most of the recent commits)
> Zookeeper: RTC (based on personal experience and comments above in this
> thread)
> Cordova: CTR (based on
> https://github.com/apache/cordova-coho/blob/master/docs/committer-workflow.md)
> Hadoop: RTC (based on personal experience)
>
> Briefly looking through the #11 through #30 projects, I also see a
> substantial number which operate on RTC (and others whose policy I don't
> know).
>
> So, I don't think there's much evidence that RTC prevents a project from
> becoming successful in the eyes of the developer community. Also worth
> noting that several of these are relatively new TLPs (i.e. within the last
> ~3 years) whereas others are quite old but still active and successful.
>
> -Todd
>
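
For readers who want the top-ten list above as a count, the tally can be
reproduced with a few lines. The dictionary simply transcribes the policies as
stated in the quoted message; the names REVIEW_POLICY and tally_policies are
hypothetical, and nothing here checks those policies independently.

```python
from collections import Counter
from typing import Dict

# Review policy per project, transcribed from the quoted top-ten list.
REVIEW_POLICY: Dict[str, str] = {
    "Spark": "RTC",
    "Storm": "RTC",
    "Cassandra": "RTC",
    "CouchDB": "RTC",
    "Kafka": "RTC",
    "Thrift": "CTR",
    "Mesos": "RTC",
    "Zookeeper": "RTC",
    "Cordova": "CTR",
    "Hadoop": "RTC",
}


def tally_policies(policies: Dict[str, str]) -> Counter:
    """Count how many projects follow each review policy."""
    return Counter(policies.values())
```

Calling tally_policies(REVIEW_POLICY) gives 8 projects on RTC and 2 on CTR for
the list as quoted.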


Re: [DISCUSS] Spark-Kernel Incubator Proposal

2015-11-13 Thread Reynold Xin
I'm happy to mentor the incubation if you are still looking for mentors.

I'd also like to second Matei that spark-kernel as a name is fairly
confusing. It only makes sense to refer to these things as kernels from the
IPython notebook's point of view. Outside of that context, it sounds like
it is the spark-core module, which this obviously isn't.



On Fri, Nov 13, 2015 at 2:28 PM, P. Taylor Goetz  wrote:

> Thanks for the reference Alex. It answers my question regarding the path
> you chose.
>
> -Taylor
>
> > On Nov 13, 2015, at 12:13 AM, Alexander Bezzubov 
> wrote:
> >
> > Hi,
> >
> > it looks pretty interesting, especially the part about integration with
> > Zeppelin as another Scala interpreter implementation.
> >
> > AFAIK there was a discussion on including Spark-Kernel in Spark core
> > (https://issues.apache.org/jira/browse/SPARK-4605), but I am not sure
> > about the possibility of it becoming a sub-project.
> >
> > It would be interesting to know, as it indeed looks very aligned with
> > Apache Spark.
> >
> > --
> > Alex
> >
> >> On Fri, Nov 13, 2015 at 10:05 AM, P. Taylor Goetz 
> wrote:
> >>
> >> Just a quick (or maybe not :) ) question...
> >>
> >> Given the tight coupling to the Apache Spark project, were there any
> >> considerations or discussions with the Spark community regarding
> >> including the Spark-Kernel functionality outright in Spark, or the
> >> possibility of becoming a subproject?
> >>
> >> I'm just curious. I don't think an answer one way or another would
> >> necessarily block incubation.
> >>
> >> -Taylor
> >>
> >>> On Nov 12, 2015, at 7:17 PM, da...@fallside.com wrote:
> >>>
> >>> Hello, we would like to start a discussion on accepting the
> >>> Spark-Kernel, a mechanism for applications to interactively and
> >>> remotely access Apache Spark, into the Apache Incubator.
> >>>
> >>> The proposal is available online at
> >>> https://wiki.apache.org/incubator/SparkKernelProposal, and it is
> >>> appended to this email.
> >>>
> >>> We are looking for additional mentors to help with this project, and we
> >>> would much appreciate your guidance and advice.
> >>>
> >>> Thank-you in advance,
> >>> David Fallside
> >>>
> >>>
> >>>
> >>> = Spark-Kernel Proposal =
> >>>
> >>> == Abstract ==
> >>> Spark-Kernel provides applications with a mechanism to interactively
> >>> and remotely access Apache Spark.
> >>>
> >>> == Proposal ==
> >>> The Spark-Kernel enables interactive applications to access Apache
> >>> Spark clusters. More specifically:
> >>> * Applications can send code-snippets and libraries for execution by
> >>> Spark
> >>> * Applications can be deployed separately from Spark clusters and
> >>> communicate with the Spark-Kernel using the provided Spark-Kernel
> >>> client
> >>> * Execution results and streaming data can be sent back to calling
> >>> applications
> >>> * Applications no longer have to be network connected to the workers
> >>> on a Spark cluster because the Spark-Kernel acts as each application’s
> >>> proxy
> >>> * Work has started on enabling Spark-Kernel to support languages in
> >>> addition to Scala, namely Python (with PySpark), R (with SparkR), and
> >>> SQL (with SparkSQL)
> >>>
> >>> == Background & Rationale ==
> >>> Apache Spark provides applications with a fast and general purpose
> >>> distributed computing engine that supports static and streaming data,
> >>> tabular and graph representations of data, and an extensive set of
> >>> machine learning libraries. Consequently, a wide variety of
> >>> applications will be written for Spark, and there will be interactive
> >>> applications that require relatively frequent function evaluations, and
> >>> batch-oriented applications that require one-shot or only occasional
> >>> evaluation.
> >>>
> >>> Apache Spark provides two mechanisms for applications to connect with
> >>> Spark. The primary mechanism launches applications on Spark clusters
> >>> using spark-submit
> >>> (http://spark.apache.org/docs/latest/submitting-applications.html);
> >>> this requires developers to bundle their application code plus any
> >>> dependencies into JAR files, and then submit them to Spark. A second
> >>> mechanism is an ODBC/JDBC API
> >>> (http://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine)
> >>> which enables applications to issue SQL queries against SparkSQL.
> >>>
> >>> Our experience when developing interactive applications, such as
> >>> analytic applications and Jupyter Notebooks, to run against Spark was
> >>> that the spark-submit mechanism was overly cumbersome and slow
> >>> (requiring JAR creation and forking processes to run spark-submit), and
> >>> the SQL interface was too limiting and did not offer easy access to
> >>> components other than SparkSQL, such as streaming. The most promising
> >>> mechanism provided by Apache Spark was the command-line shell
> >>> (
> >>
> 

request to join ipmc

2015-10-26 Thread Reynold Xin
Hi,

I am an Apache member and would like to join the IPMC. I'm a Spark
committer & PMC member, and have also contributed to various other projects
including Hive, Hadoop, etc. Let me know what else you need from me.

Thanks!


Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-21 Thread Reynold Xin
+1
On Sep 21, 2013 8:03 AM, Matei Zaharia matei.zaha...@gmail.com wrote:

 +1

 Matei

 On Sep 20, 2013, at 1:56 PM, Patrick Wendell pwend...@gmail.com wrote:

  Please vote on releasing the following candidate as Apache Spark
  (incubating) version 0.8.0. This will be the first incubator release for
  Spark in Apache.
 
  The tag to be voted on is v0.8.0-incubating (commit 3b85a85):
 
 https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=commit;h=3b85a8558da2c87873c85f227a189e45bf16b65d
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/files/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/pwendell.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-059/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~pwendell/spark-0.8.0-incubating-rc6/docs/
 
  A vote on this release has passed within the Spark PPMC [1] including
  +1 votes from our IPMC mentors (Chris Mattmann and Henry Saputra).
 
  Please vote on releasing this package as Apache Spark 0.8.0-incubating!
 
  The vote is open until Monday, September 23rd at 21:00 UTC and passes if
  a majority of at least 3 +1 IPMC votes are cast.
 
  [ ] +1 Release this package as Apache Spark 0.8.0-incubating
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.incubator.apache.org/
 
 
  [1]
 http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201309.mbox/%3CCABPQxsvS14wfiABj32b_%2BgtLafmDog%3DcbWjn7v4FoqG5g-a7mQ%40mail.gmail.com%3E
 
  - Patrick
 
  -
  To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
  For additional commands, e-mail: general-h...@incubator.apache.org
 






Re: [PROPOSAL] Apache Spark for the Incubator

2013-05-31 Thread Reynold Xin
Spark is an execution framework, but it also provides some high-level
APIs that make it much easier to do data analytics.

For example, to run grep-like queries:

val docs = sparkContext.textFile("hdfs://...")
docs.filter(doc => doc.contains("Berkeley")).count

Another example, a word count (using the Scala API):

val docs = sparkContext.textFile("hdfs://...")
val counts = docs.flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

The high-level APIs resemble many relational operators, including
aggregations, group-bys, and joins.
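
The relational flavor can be sketched with the pair operations reduceByKey
and join. This is only a minimal illustration, assuming a recent Spark
release (SparkConf, the org.apache.spark package, Spark 1.3 or later for the
implicit pair operations) running in local mode. The object name, method
name, records, and app name are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RelationalStyleOps {
  // Sums amounts per user, then joins each total with that user's city.
  def totalsWithCity(sc: SparkContext): Array[(String, (Int, String))] = {
    // Hypothetical (user, amount) and (user, city) records.
    val orders = sc.parallelize(Seq(("alice", 10), ("bob", 5), ("alice", 7)))
    val cities = sc.parallelize(Seq(("alice", "Berkeley"), ("bob", "Oakland")))

    // Aggregation, roughly SELECT user, SUM(amount) ... GROUP BY user.
    val totals = orders.reduceByKey(_ + _)

    // Inner join on the key, roughly totals JOIN cities USING (user).
    val joined = totals.join(cities)

    // Bring the small result back to the driver, sorted by user.
    joined.collect().sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Local mode so the sketch runs without a cluster.
    val conf = new SparkConf().setMaster("local[2]").setAppName("relational-style-ops")
    val sc = new SparkContext(conf)
    try {
      totalsWithCity(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With the sample records above, alice's two orders are summed to 17 before
the join, so each user appears once in the result alongside a city.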

Shark uses Spark as the execution engine but provides a Hive-compatible SQL
interface. This proposal, however, is only about moving Spark to the ASF
Incubator, not Shark.

--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org


On Fri, May 31, 2013 at 1:03 PM, Henry Saputra henry.sapu...@gmail.com wrote:

 I believe it is more of a framework, but you can take a look at Shark, which
 uses Spark to do data warehousing and supports Hive queries
 (http://shark.cs.berkeley.edu)

 - Henry

 On Friday, May 31, 2013, Chen, Pei wrote:

  +1 (non-binding)
  This seems like a really interesting project.
  Q- Is Spark just a framework/API or does it also have some tools
  implemented for data analytics?
  --Pei
 
   -Original Message-
   From: Mattmann, Chris A (398J) [mailto:chris.a.mattm...@jpl.nasa.gov]
   Sent: Friday, May 31, 2013 2:04 PM
   To: general@incubator.apache.org
   Subject: [PROPOSAL] Apache Spark for the Incubator
  
   Hi Folks,
  
   I'm pleased to bring you a proposal to the Apache Incubator for the
   Apache Spark project: https://wiki.apache.org/incubator/SparkProposal
  
   The work originates from the Berkeley AMPLab and through a number of
   industry participants, and other institutions. Spark is a framework for
   large-scale data analysis on clusters, with a particular focus on low
   latency operations. The source code is written in Scala, and provides a
   number of APIs and bindings in various programming languages.
  
   The proposal text is copied to the bottom of this email. I'm going to
   leave this thread open for the next week for discussion. Once it's died
   down, I'll call an official VOTE.
  
   Suresh, Ross G. -- heads up -- this project may be of interest to you
   both and would welcome you guys as additional mentors. We currently have
   3 mentors committed to the project, but would love to have more. People
   interested in contributing should declare their interest here on the
   general@incubator thread and those potential contributors will be
   discussed by the incoming Spark community.
  
   Questions -- let's hear em'! :)
  
   Cheers,
   Chris
   (Champion, incoming Apache Spark)
  
   === Abstract ===
   Spark is an open source system for large-scale data analysis on
   clusters.
  
   === Proposal ===
   Spark is an open source system for fast and flexible large-scale data
   analysis. Spark provides a general purpose runtime that supports
   low-latency execution in several forms. These include interactive
   exploration of very large datasets, near real-time stream processing,
   and ad-hoc SQL analytics (through higher layer extensions). Spark
   interfaces with HDFS, HBase, Cassandra and several other storage layers,
   and exposes APIs in Scala, Java and Python.
  
   === Background ===
   Spark started as a U.C. Berkeley research project, designed to
   efficiently run machine learning algorithms on large datasets. Over
   time, it has evolved into a general computing engine as outlined above.
   Spark's developer community has also grown to include additional
   institutions, such as universities, research labs, and corporations.
   Funding has been provided by various institutions including the U.S.
   National Science Foundation, DARPA, and a number of industry sponsors.
   See: https://amplab.cs.berkeley.edu/sponsors/ for full details.
  
   === Rationale ===
   As the number of contributors to Spark has grown, we have sought a
   long-term home for the project, and we believe the Apache foundation
   would be a great fit. Spark is a natural fit for the Apache foundation:
   Spark already interoperates with several existing Apache projects (HDFS,
   HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark team is
   familiar with the Apache process and subscribes to the Apache mission -
   the team includes multiple Apache committers already. Finally, joining
   Apache will help coordinate the development effort of the growing number
   of organizations which contribute to Spark.
  
   == Initial Goals ==
   The initial goals will most likely be to move the existing codebase to
   Apache and integrate with the Apache development process. Furthermore,
   we plan for incremental development, and releases along with the Apache
   guidelines.
  
   === Current Status ===
   == Meritocracy ==
   The Spark