Thanks Matt. My feeling is that, if you are willing, we should make you the
chair of the project, which is really an administrative role. It requires a
willingness to submit a board report monthly at first, and then quarterly
after 3 months. This is to recognize your contributions and merit to the
project, which will never expire. Even if you are not actively developing,
I think you would make a great chair.
Apache Joshua works, has a release, and has a good enough community around
it, with people like Lewis, Tommaso, and others, that I think it would
withstand even your departure from development. It could also make a good
academic/learning tool, and could be something we focus on for getting new
GSoC projects to add in the NeuralMT stuff.
If you are OK with that I think we should proceed. Let me know and thanks.
Cheers,
Chris
On 9/25/17, 11:24 PM, "Matt Post" <p...@cs.jhu.edu> wrote:
Hi everyone,
I think now is as good a time as any to mention my feelings about
Joshua. You may have noticed that I haven't done much active development
over the past year; you likely also know that the reason is that the
research community has shifted entirely from work on statistical models to
work on neural machine translation. On the research side, neural models now
consistently outperform phrase-based systems on BLEU score on language
pairs where there is enough data (roughly, around 15 million words of
training), and work there has injected a lot of new life into a field that
many had felt was starting to stagnate. From a production standpoint,
neural systems are also a big win: the models do best with a GPU and take
some time to train, but the architecture and pipeline are simpler, and the
resulting models are constant-sized and on the order of a few gigabytes at
most, instead of scaling with training data into the tens of gigabytes, as
statistical systems do. Test-time inference can also be run fairly
efficiently on CPUs where throughput demands are low enough. All commercial
systems are now neural or are quickly moving in that direction, including
relatively surprising places like Systran, which until recently was the
world's best-known rule-based system. As GPUs become more ubiquitous
and cheap, this situation is only going to get better, even for the end
user. There is little doubt that neural MT has supplanted statistical
approaches to machine translation, across both academic research and
industry. And it is still in its relative infancy, with lots of interesting
research problems and engineering issues to investigate and resolve.
It's somewhat sad for me because I've been working on or with Joshua
for almost seven years, but I also find my feelings here interesting in
contrast to a previous time I've felt tugged away from Joshua. As many of
you know, Philipp Koehn joined JHU a few years ago, which brought some
tension to JHU with respect to collaborating on research. There was
pressure for me to switch. Moses had a much bigger development community
and was much more feature rich, but despite this, I was reluctant to let go
of Joshua, for a number of reasons. Java is nicer to work with than C++
(and not really that much slower); our code is better written, IMO; jar
files are easier to distribute than C++ in compiled or source form; and, of
course, I had much more familiarity with the codebase, not to mention
something of a personal stake in Joshua. But with neural MT, I have none of
these reservations. It's nice, for one, to have the Moses/Joshua tension
resolved (sometimes, ignoring a problem does make it go away!), but for all
the reasons I listed in the opening paragraph, NMT is now the clear way to
go. And the bottom line for me is that I can no longer justify spending
time on Joshua during my working hours, and with a young family and other
interests that I want to pursue, I don't have time for it outside of work.
I am happy to still linger on the project, but am unlikely to be much of an
active participant unless I'm explicitly asked for something.
As I've written before here, I think there may still be some role for
statistical systems, and therefore, for Joshua. In low-resource situations,
StatMT may still be the right approach overall, or even simply the best way
to quickly build up a working system. There is some promise I think in
deploying models easily on older hardware that people have, and perhaps
getting people to help contribute translations and translation memories that
could be used to build and improve systems. There are surely more good
ideas in this space in the vein of providing a good tool to users.
It's been a great experience for me working with the Apache community
on Joshua. I am grateful to Chris for convincing us to make Joshua an
Apache incubator project, which put a lot of new life into the project.
Lewis has been a lot of help throughout, helping smooth over the transition;
Tommaso has repeatedly helped with tasks large and small; and that is just
three of you. It's too bad therefore that the timing just didn't work out,
but neural MT ascended very rapidly. I know there are other members here
who are also thinking along these lines. At the same time, I hope my
departure from active development doesn’t mean the end of the project for
those of you who wish to keep working on it.
Sincerely,
matt
On 25 Sept. 2017, at 23:10, Tommaso Teofili <tommaso.teof...@gmail.com>
wrote:
I would also think we're ready for graduation.
My only concern relates to how many of the current committers are willing
to keep contributing to the project; basically, whether we have a PMC that
is big enough for graduation.
Regards,
Tommaso
On Sat, 23 Sep 2017 at 01:21, Chris Mattmann <mattm...@apache.org> wrote:
Tom, glad you raised this issue. IMO, Joshua is ready for TLP.
We’ve:
1. Added new PPMC/committers
2. Made a release
3. Been friendly and cordial and welcoming on the lists
4. Vetted the software
5. Put together some decent, emerging docs
Graduation time…Thoughts?
Cheers,
Chris
P.S. Subject line change to officially turn this into a [DISCUSS] and
hopefully a [VOTE]
On 9/22/17, 4:19 PM, "Tom Barber" <t...@spicule.co.uk> wrote:
So I've not checked against the checklist on the podling page yet, but
what do people feel is missing from Joshua prior to graduation?
I'd like to see some non-mentors ship a release so we know we've got the
docs right, but of course it doesn't have to be a major release.
Similarly, was all the licensing stuff resolved, etc.?
I'm curious, as it's not a very fast-paced project, and it feels like
projects like Joshua could sit in the incubator for years without causing
much trouble but also without graduating. I'm not in any great rush, but
what do people feel about it?
Tom