Re: Any objections to git hosting for Incubator projects?
Also in S4, everyone agrees. It was discussed on the dev mailing list.

-leo

On Mon, Dec 5, 2011 at 2:02 PM, Ross Gardler rgard...@opendirective.com wrote:

Sent from my mobile device, please forgive errors and brevity.

On Dec 5, 2011 10:21 AM, Mark Struberg strub...@yahoo.de wrote:

+1 for DeltaSpike

I think the other requests over at asf-infra also did come from Mentors (as far as I have seen).

Correct for Callback. My proposal links to the dev list thread in which all mentors agree to help.

Ross

LieGrue,
strub

- Original Message -
From: Bertrand Delacretaz bdelacre...@apache.org
To: general@incubator.apache.org; Joe Schaefer joe_schae...@yahoo.com
Cc:
Sent: Monday, December 5, 2011 9:57 AM
Subject: Re: Any objections to git hosting for Incubator projects?

Hi Joe,

On Sat, Dec 3, 2011 at 6:46 PM, Joe Schaefer joe_schae...@yahoo.com wrote:

So earlier this week infrastructure put out an RFP regarding early adoption of git hosting at the ASF and 3 Incubator projects have responded: callback, s4, and deltaspike...

Very cool.

Unless there are formal objections to such submissions infrastructure will evaluate their proposals just as if they came from the IPMC itself

I'm ok as long as you have evidence (via messages or votes on public mailing lists) that those podlings' mentors support those requests.

-Bertrand

- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org

--
Leo Neumeyer (@leoneu)
Re: Git to SVN
cross-posting to incubator in case someone there also had to go through the git to svn migration and can share their experience.

Is there a process we can follow to put s4 in the beta git program? I know this can create additional work for the infra folks, but it seems we may already have a beta system; perhaps we can just try it and help improve it. We all benefit.

-leo

On Tue, Nov 8, 2011 at 11:48 PM, Bruce Robbins bruce.robbins...@gmail.com wrote:

I am the fellow trying to convert the S4 git repository to SVN with history preserved. I gave the following a whirl:

http://code.google.com/p/support/wiki/ImportingFromGit
http://sandrotosi.blogspot.com/2010/02/migrate-git-repo-to-svn-one.html
http://blog.paulbetts.org/index.php/2007/12/02/moving-git-repositories-to-svntfs/

The first two basically had the same issue: they required me to manually resolve all merges that had been performed in the history (and we have many). The third one just produced a series of error messages.

I have not tried the perl script yet. I will contact Jukka Zitting.

However, I agree with Jeremy: it would be great if we didn't have to preserve history from git to svn, and then sometime later from svn back to git.

On Tue, Nov 8, 2011 at 6:08 AM, Jeremy Thomerson jer...@thomersonfamily.com wrote:

On Tue, Nov 8, 2011 at 12:26 AM, Paul Davis paul.joseph.da...@gmail.com wrote:

Since so many projects seem to be migrating from svn to git, is there any way we can continue using git for this project? (I mean use git only, not as a mirror.)

While I can't comment on best practices for Git-to-SVN moving, I'd just like to chime in to note that "so many projects" is not a good description of the current state of Git support at the ASF. Currently there is a single project using it in a limited manner as a test for wider support. There is no proposed timetable for if/when Git support will be expanded, so I would proceed as if it doesn't exist.

Is it possible that this project could be added as a beta tester of Git? It really doesn't make sense to make them migrate git-svn-git. If the Git experiment fails, then they do a single migration. If it succeeds, no migration necessary.

Jeremy Thomerson

--
Leo Neumeyer (@leoneu)
Re: Git to SVN
I'm curious if you also discussed using a hosted service like GitHub for the projects. It seems to me that it would save us a lot of resources and time to take advantage of their free accounts for open source projects, and they seem to be doing a pretty good job. Perhaps there are concerns about relying on a third party, but for an organization powered by volunteers, using a free service like this could be a great benefit. Perhaps there are good reasons why this cannot be done, but I was wondering if it was discussed at all.

-leo

On Wed, Nov 9, 2011 at 10:46 AM, Christian Grobmeier grobme...@gmail.com wrote:

On Wed, Nov 9, 2011 at 7:38 PM, Jeremy Thomerson jer...@thomersonfamily.com wrote:

On Wed, Nov 9, 2011 at 1:30 PM, Christian Grobmeier grobme...@gmail.com wrote:

In addition you can ask on wave-dev. They have finally decided to lose history and do an initial checkin because it was not easily possible to convert.

Why are we doing this on any project when we potentially have a git solution around the corner? Why not add these projects to the git experiment and then only take this drastic action if the git experiment fails?

This discussion was already held a while ago. There are many questions to solve first.

Cheers,
Christian

Jeremy Thomerson

--
http://www.grobmeier.de

--
Leo Neumeyer (@leoneu)
Re: [VOTE] S4 to join the Incubator
with ZooKeeper. He has been an active contributor to Hadoop.
* Flavio Junqueira has a background in distributed computing. He is a committer of ZooKeeper, a ZooKeeper PMC member, and a committer of BookKeeper;
* Matthieu Morel has an extensive background in distributed systems; he likes theory and loves to implement things. He has been the main designer and implementor of S4 checkpointing.
* Anish Nair has been the project’s main customer. With his background in natural language processing and algorithms he developed the applications that drove the S4 design, including processing of social feeds and real-time recommendation engines.
* Leo Neumeyer has a background in signal processing and statistical modeling but has been advocating clean simple software design throughout his career. At Yahoo! he conceived and championed the S4 project as a solution to improve monetization in search advertising.
* Bruce Robbins has been the main S4 developer, taking the concept from idea to releases. Bruce's engineering experience ranges from programming Mainframe computers to assembly code.

=== Alignment ===

S4 brings stream processing capabilities that complement Hadoop's batch processing capabilities.

== Known Risks ==

=== Orphaned Products ===

S4 has been used in production at Yahoo! and is being evaluated by other organizations. The developers have continued to support the project on their own time. We believe that adoption will increase significantly as more tools and documentation become available. As the project evolves, we may see new ideas that we may want to adopt or, if it makes sense and is practical, we may want to merge two or more open source projects. We believe that there is a clear need to have a well supported open source stream processing platform and therefore, there is low risk of the project becoming orphaned. However, we are open to combining projects in order to have fewer projects with a more active community. Ultimately, this will be decided by the design ideas, the implementation quality, and the adoption.

=== Inexperience with Open Source ===

The S4 code was open sourced by Yahoo! under the Apache 2.0 license. One committer of the S4 project, Flavio Junqueira, is intimately familiar with the Apache model for open-source development and is experienced with working with new contributors. Flavio is both a committer and a PMC member for ZooKeeper. The other developers have had experience as contributors in other open-source projects. Most of the original S4 developers continue to be committers.

=== Homogeneous Developers ===

The initial set of committers for S4 represents four different companies: A9, LinkedIn, Quantbench, and Yahoo!. This set is diverse enough for a starting project.

=== Reliance on Salaried Developers ===

Some committers are contributing as part of their jobs, but as we move to a more diverse set of developers we expect a good mix of salaried and volunteer time.

=== Relationships with Other Apache Projects ===

S4 relies on the following Apache projects:
* BCEL (bytecode generation library)
* commons cli (command line interface)
* commons logging (needed by some other dependency)
* log4j
* commons jexl (expression processing)
* zookeeper
* Maven and its usual plug-ins (build time only)

Compared to existing projects, S4 complements existing functionality in a few ways summarized below:
* Flume: S4 processes streams in a distributed fashion and enables applications to form arbitrary graphs of processing elements. Flume focuses on accumulating streams of logs in a centralized repository for batch processing;
* Kafka: Kafka is a pub/sub messaging layer that interposes generation of events and processing, while S4 itself forwards events and processes them in a stream fashion.
* Hadoop: Hadoop focuses on batch processing of large data sets, while S4 is a platform for stream processing of events. We would like to implement extensions that enable processing in both platforms with the same code.

=== An Excessive Fascination with the Apache Brand ===

The project has already received a significant amount of attention and so far has been associated with Yahoo!. We would like, however, to foster the development of a community around S4 that evolves independently of the interests of a single company. Given the reliance of S4 on some Apache projects and the principles promoted by the foundation, we find it a suitable home for the project.

== Documentation ==

* S4 Website: http://s4.io
* S4 documentation: http://docs.s4.io/
* S4 Forum: http://groups.google.com/group/s4-project/topics
* S4 Mailing list (with archives): http://groups.google.com/group/s4-project

== Source and Intellectual Property Submission Plan ==

The S4 source code is already licensed under Apache Software License 2.0
Re: [VOTE] S4 to join the Incubator
use Hadoop by segmenting the input stream into data batches. This solution is not efficient, results in high latency, and introduces unnecessary complexity. The S4 design is primarily driven by large scale applications for data mining and machine learning in a production environment. We think that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware. S4 enables application programmers to focus more on the application and less on the infrastructure. S4 also provides a consistent graph oriented programming model that, if widely adopted, will facilitate sharing of basic components across developers.

== Initial Goals ==

The basic S4 infrastructure is complete and can be used in real-world applications. However, many additional components need to be developed and improved. Some areas we hope to focus on in Apache:
* Add a reliable communication protocol option to the communication layer for low bandwidth control messages that require guaranteed delivery.
* Higher-performance serialization and inter-node communication.
* Functionality to save the state of PEs at runtime transparently and restore it at startup.
* Intelligent load shedding strategies.
* Dynamic load balancing to make it possible to add and remove nodes from the cluster without data loss.
* Dynamic application loading and unloading.
* Migration to a pure object-oriented design that takes advantage of Java static typing using Generics in the framework code. (Keep it simple for the application developer.)
* Eliminate string identifiers and XML configuration.
* Adopt JSR 330 (Dependency Injection for Java).
* Add real-time query support.
* Add a cluster management system.

Clearly this is a long list, but it sets the high level roadmap for the project.

== Current Status ==

The project has been under development at Yahoo! since late 2008, and it was open sourced in October 2010. Since then we have received patches from developers, started a discussion forum, and improved the documentation.

=== Meritocracy ===

The S4 project was initially developed at Yahoo! Labs, a research-oriented organization that values original ideas and individual contributions. The design evolved in a bottom up fashion, where decisions were driven by the application and the long-term viability and flexibility of the platform. Once the project became open-source it continued to be managed by those who were actively doing the work.

=== Community ===

S4 is currently in use internally at Yahoo!, and since it was released as an open source project it has received positive feedback and contributions from developers.

=== Core Developers ===

S4 developers span a few companies and work on a voluntary basis. We expect to have developers from other organizations joining the team in the next few months, especially if S4 joins the Apache Incubator project. Being an Apache Incubator project is likely to attract the attention of more talented developers. One interesting aspect of the current group of developers is the diverse background:
* Kishore Gopalakrishna was the main developer of the communication layer and the integration with ZooKeeper. He has been an active contributor to Hadoop.
* Flavio Junqueira has a background in distributed computing. He is a committer of ZooKeeper, a ZooKeeper PMC member, and a committer of BookKeeper;
* Matthieu Morel has an extensive background in distributed systems; he likes theory and loves to implement things. He has been the main designer and implementor of S4 checkpointing.
* Anish Nair has been the project’s main customer. With his background in natural language processing and algorithms he developed the applications that drove the S4 design, including processing of social feeds and real-time recommendation engines.
* Leo Neumeyer has a background in signal processing and statistical modeling but has been advocating clean simple software design throughout his career. At Yahoo! he conceived and championed the S4 project as a solution to improve monetization in search advertising.
* Bruce Robbins has been the main S4 developer, taking the concept from idea to releases. Bruce's engineering experience ranges from programming Mainframe computers to assembly code.

=== Alignment ===

S4 brings stream processing capabilities that complement Hadoop's batch processing capabilities.

== Known Risks ==

=== Orphaned Products ===

S4 has been used in production at Yahoo! and is being evaluated by other organizations. The developers have continued to support the project on their own time. We believe that adoption will increase significantly as more tools and documentation become available. As the project evolves, we may see new ideas that we may want to adopt or, if it makes sense and is practical, we may want to merge two or more open source projects. We believe
Re: [PROPOSAL] S4 for the Apache Incubator
Phil and all,

Great discussion and so happy you want to join the team. No need to apologize!!

My feeling is that if someone wants to join the project as a contributor and has technical merit, he or she will become a committer pretty quickly. I think that having a minimal protocol is useful to make sure people get to know each other and the project. In fact, the current policy seems good to me:

http://incubator.apache.org/guides/participation.html

I love the DO-ocracy concept and it seems to be the best way to become a committer.

So I propose that those who are interested and can volunteer some time start thinking about how to contribute. If the project is accepted we will discuss the details on the mailing list.

Thanks again!

-leo

On Sep 15, 2011, at 5:54 PM, Phillip Rhodes wrote:

On Thu, Sep 15, 2011 at 5:34 PM, Flavio Junqueira f...@s4.io wrote:

I have read the guide to participation: http://incubator.apache.org/guides/participation.html and I understand from there that people shouldn't simply jump in as an initial committer without a short introduction and without acknowledgment from the proposer.

Since I was one of those people, let me issue a mea culpa here. Despite having read the participation guidelines (more than once even) I apparently slipped into a bit of a conditioned response, from observed behavior. For better or worse, it has become not uncommon (in my experience) to see people simply jump in and add themselves. In retrospect, yes, it probably is a bit rude, and I apologize for my part in this. I suppose it's just what Roy said in 2006: everybody saw a certain process appearing to happen, assumed it was policy and didn't give it any further thought. Guess I'm guilty of that.

Our expectation when we submitted the proposal was that the initial set of committers would comprise the people who have initially contributed to get the current code to this stage, and we were not expecting arbitrary requests to join the initial list of committers.

While jumping in is - as we've already established - in bad taste, I *think* that (most|any|some) projects entering incubation should expect such requests. Part of the focus of the incubator, as I've understood it, is to promote sufficient diversity in the community and the team, that no one block of people can kill the project by dropping out or whatever. Having new initial committers that have no outstanding connection to the project is one way to achieve that. In this regard, the incubation period is radically different from other times in the project lifecycle. Or, again, that has been my understanding. Then again, maybe it only appears that way because some projects make it a point to appeal to people *to* join in as initial committers.

Of course, as a potential Apache project (now potentially incubator, but looking forward to being TLP in the future), we are ready to work towards building a community, which includes granting the status of committer to contributors. However, we'd like new committers to earn their status by showing commitment to the community and demonstrating technical merit.

Absolutely, and entering the incubator is the only time - AFAIK - that projects here tend to take a slightly different stance. It's all about seeding the initial pool before the project gets underway. That said, I'm not sure projects are required to accept additional initial committers beyond what the proposer suggests.

For my own part, I'll just say that I'm excited about S4, very happy to volunteer to help, and if you guys want me, I'm in. If not, take me off the list and it'll all be cool. FSM knows, I have plenty of stuff to keep me occupied already. ;-)

As far as introduction goes... Well, I founded Fogbeam Labs, started the ScrewPile project to develop an OSS suite of Enterprise Knowledge Management software. I've been a professional software engineer for the past 12-13 years, working mostly in Java, but some C, C++, Python and Groovy as well. If anyone wants to know more about me, just ask, or see: https://plus.google.com/u/1/114301088526097505896/about

Cheers,
Phil
[PROPOSAL] S4 for the Apache Incubator
Dear all,

I would like to propose S4 to be an Apache Incubator project. S4 is a distributed streaming platform written in Java. Here is a link to the proposal in the Incubator wiki:

http://wiki.apache.org/incubator/S4Proposal

Thanks,

Leo Neumeyer
http://s4.io
http://twitter.com/leoneu