Re: [DISCUSS] Graduation (was Re: Path to TLP)

Chris Mattmann Tue, 26 Sep 2017 09:57:50 -0700

Thanks Matt. My feeling is that, if you are willing, we should make you the
chair of the project. It is really an administrative role: it requires a
willingness to submit a board report monthly for the first 3 months, and
quarterly after that. This is to recognize your contributions and merit to
the project, which will never expire. Even if you are not actively
developing, I think you would make a great chair.


Apache Joshua works, has a release, and has a good community around it, with
people like Lewis, Tommaso, and others, so I think it would withstand even
your departure from development. It could also make a good academic/learning
tool, and we could focus on getting new GSoC projects to add in the NeuralMT
stuff.

If you are OK with that I think we should proceed. Let me know and thanks.

Cheers,
Chris




On 9/25/17, 11:24 PM, "Matt Post" <p...@cs.jhu.edu> wrote:

    Hi everyone,
    
    I think now is as good a time as any to mention my feelings about 
Joshua. You may have noticed that I haven't done much active development over 
the past year; you likely also know that the reason is that the research 
community has shifted entirely from work on statistical models to work on 
neural machine translation. On the research side, neural models now 
consistently outperform phrase-based systems on BLEU score on language pairs 
where there is enough data (roughly, around 15 million words of training), and 
work there has injected a lot of new life into a field that many had felt was 
starting to stagnate. From a production standpoint, neural systems are also a 
big win: the models do best with a GPU and take some time to train, but the 
architecture and pipeline are simpler, and the resulting models are 
constant-sized and on the order of a few gigabytes at most, instead of scaling 
with training data into the tens of gigabytes, as statistical systems do. 
Test-time inference can also be run fairly efficiently on CPUs where throughput 
demands are low enough. All commercial systems are now neural or are quickly 
moving in that direction, including relatively surprising places like Systran, 
which until recently was the world's best-known rule-based system. As 
GPUs become more ubiquitous and cheap, this situation is only going to get 
better, even for the end user. There is little doubt that neural MT has 
supplanted statistical approaches to machine translation, across both academic 
research and industry. And it is still in its relative infancy, with lots of 
interesting research problems and engineering issues to investigate and resolve.
    
    It's somewhat sad for me because I've been working on or with Joshua for 
almost seven years, but I also find my feelings here interesting in contrast to 
a previous time I've felt tugged away from Joshua. As many of you know, Philipp 
Koehn joined JHU a few years ago, which brought some tension to JHU with 
respect to collaborating on research. There was pressure for me to switch. 
Moses had a much bigger development community and was much more feature rich, 
but despite this, I was reluctant to let go of Joshua, for a number of reasons. 
Java is nicer to work with than C++ (and not really that much slower); our code 
is better written, IMO; jar files are easier to distribute than C++ in compiled 
or source form; and, of course, I had much more familiarity with the codebase, 
not to mention something of a personal stake in Joshua. But with neural MT, I 
have none of these reservations. It's nice, for one, to have the Moses/Joshua 
tension resolved (sometimes, ignoring a problem does make it go away!), but for 
all the reasons I listed in the opening paragraph, NMT is now the clear way to 
go. And the bottom line for me is that I can no longer justify spending time on 
Joshua during my working hours, and with a young family and other interests 
that I want to pursue, I don't have time for it outside of work. I am happy to 
still linger on the project, but am unlikely to be much of an active 
participant unless I'm explicitly asked for something.
    
    As I've written before here, I think there may still be some role for 
statistical systems, and therefore, for Joshua. In low-resource situations, 
StatMT may still be the right approach overall, or even simply the best way to 
quickly build up a working system. There is some promise I think in deploying 
models easily on older hardware that people have, and perhaps getting people to 
help contribute translations and translation memories that could be used to 
build and improve systems. There are surely more good ideas in this space in 
the vein of providing a good tool to users. 
    
    It's been a great experience for me working with the Apache community on 
Joshua. I am grateful to Chris for convincing us to make Joshua an Apache 
incubator project, which put a lot of new life into the project. Lewis has been 
a lot of help throughout in smoothing over the transition; Tommaso has 
repeatedly helped with tasks large and small; and that is just three of you. 
It's too bad therefore that the timing just didn't work out, but neural MT 
ascended very rapidly. I know there are other members here who are also 
thinking along these lines. At the same time, I hope my departure from active 
development doesn’t mean the end of the project for those of you who wish to 
keep working on it. 
    
    Sincerely,
    matt
    
    
    > On 25 Sep 2017, at 23:10, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:
    > 
    > I would also think we're ready for graduation.
    > My only concern relates to how many of the current committers are willing
    > to keep contributing to the project, basically whether we have a PMC that
    > is big enough for graduation.
    > 
    > Regards,
    > Tommaso
    > 
    > 
    > On Sat, 23 Sep 2017 at 01:21, Chris Mattmann <mattm...@apache.org> wrote:
    > 
    >> Tom, glad you raised this issue. IMO, Joshua is ready for TLP.
    >> 
    >> We’ve:
    >> 
    >> 1. Added new PPMC/committers
    >> 2. Made a release
    >> 3. Been friendly and cordial and welcoming on the lists
    >> 4. Vetted the software
    >> 5. Produced some decent, emerging docs
    >> 
    >> Graduation time…Thoughts?
    >> 
    >> Cheers,
    >> Chris
    >> 
    >> P.S. Subject line change to officially turn this into a [DISCUSS] and
    >> hopefully
    >> a [VOTE]
    >> 
    >> 
    >> 
    >> On 9/22/17, 4:19 PM, "Tom Barber" <t...@spicule.co.uk> wrote:
    >> 
    >>    So I've not checked against the checklist on the podling page yet, but
    >> what
    >>    do people feel is missing from Joshua prior to graduation?
    >> 
    >>    I'd like to see some non-mentors ship a release so we know we've got
    >>    the docs right, but of course it doesn't have to be a major release.
    >>    Similarly, was all the licensing stuff resolved, etc.?
    >> 
    >>    I'm curious, as it's not a very fast-paced project and it feels like
    >>    projects like Joshua could sit in the incubator for years without
    >>    causing much trouble but also not graduating. I'm not in any great
    >>    rush, but what do people feel about it?
    >> 
    >>    Tom
    >> 
    >> 
    >> 
    >>
