Yep, will send a result shortly. Lewis, after that, can you help me get the podling bootstrap tasks started?
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Reply-To: "general@incubator.apache.org" <general@incubator.apache.org>
Date: Friday, February 12, 2016 at 11:31 AM
To: "general@incubator.apache.org" <general@incubator.apache.org>
Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling

>Hi Chris,
>Is it time to close out this VOTE and bring Joshua on board?
>Lewis
>
>On Wed, Feb 3, 2016 at 4:01 PM, <general-digest-h...@incubator.apache.org>
>wrote:
>
>> From: Danese Cooper <dan...@gmail.com>
>> To: "general@incubator.apache.org" <general@incubator.apache.org>
>> Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> Date: Wed, 3 Feb 2016 07:43:11 -0800
>> Subject: Re: [VOTE] Accept Joshua as an Apache Incubator Podling
>>
>> +1 (binding) Accept Joshua as an Apache Incubator podling.
>>
>> D
>>
>> > On Jan 30, 2016, at 12:00 PM, Mattmann, Chris A (3980)
>> > <chris.a.mattm...@jpl.nasa.gov> wrote:
>> >
>> > Hi Everyone,
>> >
>> > OK the discussion is now completed. Please VOTE to accept Joshua
>> > into the Apache Incubator. I’ll leave the VOTE open for at least
>> > the next 72 hours, with hopes to close it next Friday the 5th of
>> > February, 2016.
>> >
>> > [ ] +1 Accept Joshua as an Apache Incubator podling.
>> > [ ] +0 Abstain.
>> > [ ] -1 Don’t accept Joshua as an Apache Incubator podling because..
>> >
>> > Of course, I am +1 on this.
>> > Please note VOTEs from Incubator PMC
>> > members are binding but all are welcome to VOTE!
>> >
>> > Cheers,
>> > Chris
>> >
>> > -----Original Message-----
>> > From: jpluser <chris.a.mattm...@jpl.nasa.gov>
>> > Date: Tuesday, January 12, 2016 at 10:56 PM
>> > To: "general@incubator.apache.org" <general@incubator.apache.org>
>> > Cc: "p...@cs.jhu.edu" <p...@cs.jhu.edu>
>> > Subject: [DISCUSS] Apache Joshua Incubator Proposal - Machine Translation
>> > Toolkit
>> >
>> >> Hi Everyone,
>> >>
>> >> Please find attached for your viewing pleasure a proposed new project,
>> >> Apache Joshua, a statistical machine translation toolkit. The proposal
>> >> is in wiki draft form at:
>> >> https://wiki.apache.org/incubator/JoshuaProposal
>> >>
>> >> Proposal text is copied below. I’ll leave the discussion open for a week
>> >> and we are interested in folks who would like to be initial committers
>> >> and mentors. Please discuss here on the thread.
>> >>
>> >> Thanks!
>> >>
>> >> Cheers,
>> >> Chris (Champion)
>> >>
>> >> ---
>> >>
>> >> = Joshua Proposal =
>> >>
>> >> == Abstract ==
>> >> [[joshua-decoder.org|Joshua]] is an open-source statistical machine
>> >> translation toolkit.
>> >> It includes a Java-based decoder for translating with
>> >> phrase-based, hierarchical, and syntax-based translation models, a
>> >> Hadoop-based grammar extractor (Thrax), and an extensive set of tools and
>> >> scripts for training and evaluating new models from parallel text.
>> >>
>> >> == Proposal ==
>> >> Joshua is a state of the art statistical machine translation system that
>> >> provides a number of features:
>> >>
>> >> * Support for the two main paradigms in statistical machine translation:
>> >>   phrase-based and hierarchical / syntactic.
>> >> * A sparse feature API that makes it easy to add new feature templates
>> >>   supporting millions of features
>> >> * Native implementations of many tuners (MERT, MIRA, PRO, and AdaGrad)
>> >> * Support for lattice decoding, allowing upstream NLP tools to expose
>> >>   their hypothesis space to the MT system
>> >> * An efficient representation for models, allowing for quick loading of
>> >>   multi-gigabyte model files
>> >> * Fast decoding speed (on par with Moses and mtplz)
>> >> * Language packs: precompiled models that allow the decoder to be run as
>> >>   a black box
>> >> * Thrax, a Hadoop-based tool for learning translation models from
>> >>   parallel text
>> >> * A suite of tools for constructing new models for any language pair for
>> >>   which sufficient training data exists
>> >>
>> >> == Background and Rationale ==
>> >> A number of factors make this a good time for an Apache project focused on
>> >> machine translation (MT): the quality of MT output (for many language
>> >> pairs); the average computing resources available on computers, relative
>> >> to the needs of MT systems; and the availability of a number of
>> >> high-quality toolkits, together with a large base of researchers working
>> >> on them.
>> >>
>> >> Over the past decade, machine translation (MT; the automatic translation
>> >> of one human language to another) has become a reality.
>> >> The research into statistical approaches to translation that began in
>> >> the early nineties, together with the availability of large amounts of
>> >> training data, and better computing infrastructure, have all come
>> >> together to produce translation results that are “good enough” for a
>> >> large set of language pairs and use cases. Free services like
>> >> [[https://www.bing.com/translator|Bing Translator]] and
>> >> [[https://translate.google.com|Google Translate]] have made these services
>> >> available to the average person through direct interfaces and through
>> >> tools like browser plugins, and sites across the world with higher
>> >> translation needs use them to translate their pages automatically.
>> >>
>> >> MT does not require the infrastructure of large corporations in order to
>> >> produce feasible output. Machine translation can be resource-intensive,
>> >> but need not be prohibitively so. Disk and memory usage are mostly a
>> >> matter of model size, which for most language pairs is a few gigabytes at
>> >> most, at which size models can provide coverage on the order of tens or
>> >> even hundreds of thousands of words in the input and output languages. The
>> >> computational complexity of the algorithms used to search for translations
>> >> of new sentences is typically linear in the number of words in the input
>> >> sentence, making it possible to run a translation engine on a personal
>> >> computer.
>> >>
>> >> The research community has produced many different open source translation
>> >> projects for a range of programming languages and under a variety of
>> >> licenses. These projects include the core “decoder”, which takes a model
>> >> and uses it to translate new sentences between the language pair the model
>> >> was defined for.
>> >> They also typically include a large set of tools that
>> >> enable new models to be built from large sets of example translations
>> >> (“parallel data”) and monolingual texts. These toolkits are usually built
>> >> to support the agendas of the (largely) academic researchers that build
>> >> them: the repeated cycle of building new models, tuning model parameters
>> >> against development data, and evaluating them against held-out test data,
>> >> using standard metrics for testing the quality of MT output.
>> >>
>> >> Together, these three factors (the quality of machine translation output,
>> >> the feasibility of translating on standard computers, and the availability
>> >> of tools to build models) make it reasonable for the end users to use MT as
>> >> a black-box service, and to run it on their personal machine.
>> >>
>> >> These factors make it a good time for an organization with the status of
>> >> the Apache Foundation to host a machine translation project.
>> >>
>> >> == Current Status ==
>> >> Joshua was originally ported from David Chiang’s Python implementation of
>> >> Hiero by Zhifei Li, while he was a Ph.D. student at Johns Hopkins
>> >> University. The current version is maintained by Matt Post at Johns
>> >> Hopkins’ Human Language Technology Center of Excellence. Joshua has made
>> >> many releases with a list of over 20 source code tags. The last release of
>> >> Joshua was 6.0.5 on November 5th, 2015.
>> >>
>> >> == Meritocracy ==
>> >> The current developers are familiar with meritocratic open source
>> >> development at Apache. Apache was chosen specifically because we want to
>> >> encourage this style of development for the project.
>> >>
>> >> == Community ==
>> >> Joshua is used widely across the world. Perhaps its biggest (known)
>> >> research / industrial user is the Amazon research group in Berlin. Another
>> >> user is the US Army Research Lab.
>> >> No formal census has been undertaken,
>> >> but posts to the Joshua technical support mailing list, along with the
>> >> occasional contributions, suggest small research and academic communities
>> >> spread across the world, many of them in India.
>> >>
>> >> During incubation, we will explicitly seek to increase our usage across
>> >> the board, including academic research, industry, and other end users
>> >> interested in statistical machine translation.
>> >>
>> >> == Core Developers ==
>> >> The current set of core developers is fairly small, having fallen with the
>> >> graduation from Johns Hopkins of some core student participants. However,
>> >> Joshua is used fairly widely, as mentioned above, and there remains a
>> >> commitment from the principal researcher at Johns Hopkins to continue to
>> >> use and develop it. Joshua has seen a number of new community members
>> >> become interested recently due to a potential for its projected use in a
>> >> number of ongoing DARPA projects such as XDATA and Memex.
>> >>
>> >> == Alignment ==
>> >> Joshua is currently Copyright (c) 2015, Johns Hopkins University All
>> >> rights reserved and licensed under BSD 2-clause license. It would of
>> >> course be the intention to relicense this code under AL2.0 which would
>> >> permit expanded and increased use of the software within Apache projects.
>> >> There is currently an ongoing effort within the Apache Tika community to
>> >> utilize Joshua within Tika’s Translate API, see
>> >> [[https://issues.apache.org/jira/browse/TIKA-1343|TIKA-1343]].
>> >>
>> >> == Known Risks ==
>> >>
>> >> === Orphaned products ===
>> >> At the moment, regular contributions are made by a single contributor, the
>> >> lead maintainer.
>> >> He (Matt Post) plans to continue development for the next
>> >> few years, but it is still a single point of failure, since the graduate
>> >> students who worked on the project have moved on to jobs, mostly in
>> >> industry. However, our goal is to help that process by growing the
>> >> community in Apache, and at least in growing the community with users and
>> >> participants from NASA JPL.
>> >>
>> >> === Inexperience with Open Source ===
>> >> The team both at Johns Hopkins and NASA JPL have experience with many OSS
>> >> software projects at Apache and elsewhere. We understand "how it works"
>> >> here at the foundation.
>> >>
>> >> == Relationships with Other Apache Products ==
>> >> Joshua includes dependencies on Hadoop, and also is included as a plugin in
>> >> Apache Tika. We are also interested in coordinating with other projects
>> >> including Spark, and other projects needing MT services for language
>> >> translation.
>> >>
>> >> == Developers ==
>> >> Joshua only has one regular developer who is employed by Johns Hopkins
>> >> University. NASA JPL (Mattmann and McGibbney) have been contributing
>> >> lately including a Brew formula and other contributions to the project
>> >> through the DARPA XDATA and Memex programs.
>> >>
>> >> == Documentation ==
>> >> Documentation and publications related to Joshua can be found at
>> >> joshua-decoder.org. The source for the Joshua documentation is currently
>> >> hosted on Github at
>> >> https://github.com/joshua-decoder/joshua-decoder.github.com
>> >>
>> >> == Initial Source ==
>> >> Current source resides at Github: github.com/joshua-decoder/joshua (the
>> >> main decoder and toolkit) and github.com/joshua-decoder/thrax (the grammar
>> >> extraction tool).
>> >>
>> >> == External Dependencies ==
>> >> Joshua has a number of external dependencies.
>> >> Only BerkeleyLM (Apache 2.0)
>> >> and KenLM (LGPL 2.1) are run-time decoder dependencies (one of which is
>> >> needed for translating sentences with pre-built models). The rest are
>> >> dependencies for the build system and pipeline, used for constructing and
>> >> training new models from parallel text.
>> >>
>> >> Apache projects:
>> >> * Ant
>> >> * Hadoop
>> >> * Commons
>> >> * Maven
>> >> * Ivy
>> >>
>> >> There are also a number of other open-source projects with various
>> >> licenses that the project depends on both dynamically (runtime), and
>> >> statically.
>> >>
>> >> === GNU GPL 2 ===
>> >> * Berkeley Aligner: https://code.google.com/p/berkeleyaligner/
>> >>
>> >> === LGPL 2.1 ===
>> >> * KenLM: github.com/kpu/kenlm
>> >>
>> >> === Apache 2.0 ===
>> >> * BerkeleyLM: https://code.google.com/p/berkeleylm/
>> >>
>> >> === GNU GPL ===
>> >> * GIZA++: http://www.statmt.org/moses/giza/GIZA++.html
>> >>
>> >> == Required Resources ==
>> >> * Mailing Lists
>> >>   * priv...@joshua.incubator.apache.org
>> >>   * d...@joshua.incubator.apache.org
>> >>   * comm...@joshua.incubator.apache.org
>> >>
>> >> * Git Repos
>> >>   * https://git-wip-us.apache.org/repos/asf/joshua.git
>> >>
>> >> * Issue Tracking
>> >>   * JIRA Joshua (JOSHUA)
>> >>
>> >> * Continuous Integration
>> >>   * Jenkins builds on https://builds.apache.org/
>> >>
>> >> * Web
>> >>   * http://joshua.incubator.apache.org/
>> >>   * wiki at http://cwiki.apache.org
>> >>
>> >> == Initial Committers ==
>> >> The following is a list of the planned initial Apache committers (the
>> >> active subset of the committers for the current repository on Github).
>> >>
>> >> * Matt Post (p...@cs.jhu.edu)
>> >> * Lewis John McGibbney (lewi...@apache.org)
>> >> * Chris Mattmann (mattm...@apache.org)
>> >>
>> >> == Affiliations ==
>> >>
>> >> * Johns Hopkins University
>> >>   * Matt Post
>> >>
>> >> * NASA JPL
>> >>   * Chris Mattmann
>> >>   * Lewis John McGibbney
>> >>
>> >> == Sponsors ==
>> >> === Champion ===
>> >> * Chris Mattmann (NASA/JPL)
>> >>
>> >> === Nominated Mentors ===
>> >> * Paul Ramirez
>> >> * Lewis John McGibbney
>> >> * Chris Mattmann
>> >>
>> >> == Sponsoring Entity ==
>> >> The Apache Incubator