Daniel Naber <daniel.na...@languagetool.org> wrote:

> Hi,
>
> the year is slowly coming to an end, so I thought I'd try to summarize what
> we've achieved this year and how we can move LT forward in the future. In
> 2015, we...
>
> * made three releases so far (2.9, 3.0, 3.1), another one is planned
> * more than doubled the number of visits to languagetool.org (January:
> 156,000, November: 326,000)
> * released a Chrome extension with more than 1,500 users now
> * added support for ngram models to detect confusion of (mostly) homophones
> (English, German)
> * did several things I forgot to list here

Good progress!

> * added and improved many language-specific rules. Specifically, 14
> languages are maintained if you define this as "had at least ten commits in
> its grammar.xml and disambiguation.xml files this year". However, this also
> means 17 languages are not maintained.
>
> This last point of unmaintained languages highlights what I think is an
> important issue: In the last three years, we increased our number of users
> by a factor of 10. At the same time, the number of commits and people who
> regularly contribute didn't grow at all (see attachment). Many languages are
> not maintained, and even those that are often only have a single
> contributor. If that contributor becomes inactive, finding a new one seems
> almost impossible. If we continue like this, LT will some day end up with
> very few languages that are actually maintained. As there doesn't seem to be
> any correlation between number of users and number of regular contributors,
> user growth won't help us.
>
> I have no solution for this problem, but some ideas I'd like to get feedback
> on:
>
> (1) Clean up: throw out all unmaintained languages that also have less than
> 100 rules. This way users don't get the false impression that their language
> is supported when it actually isn't. It might also create some motivation to
> contribute when users notice that "their" language is being thrown out.

I'm against it for several reasons:
* some unmaintained languages may still have good rules. It would be
  a shame to discard that work.
* having a few rules in unmaintained languages may help to find new
  contributors, although I understand that finding new contributors
  is hard.
* the number of rules is a useful metric, but it does not say how good
  and useful those rules are. Of course, assessing the quality of
  support for a language is subjective, and it can only be done by
  someone who knows the language well enough, so there is no simple
  way to decide how good LT is in each language.
* support for many languages is a tick in a box and one of the
  differentiating features of LT compared to other grammar checkers,
  even if quality varies between languages.

Clearly indicating that support for a language is currently unmaintained
is the best we can do in my opinion, and we already do that.

> (2) Grow the contributor community: somehow find new contributors to revive
> the unmaintained languages and find contributors to support the maintainers
> of languages The thing is: I have no idea how
> to do this. For example, we have a text on languagetool.org saying we're
> looking for help with marketing. This text has been shown to more than
> 40,000 visitors and the effect so far has been zero (actually four people
> have contacted me, but three of those have already disappeared). What is
> holding people back from becoming a regular contributor?

I'm guessing that there are few contributors because of the
learning curve.  I remember looking at LT and being interested, but it
took me a while before I actually started to contribute. It's not that
hard in the end, but contributors have to be motivated enough to
understand how to contribute.

Documentation for developers has improved, but improving it
further may be effective for finding new contributors.


> (3) Crowdsourcing: give up on finding qualified contributors, instead
> develop tools that allow contribution via very, very simple means, like
> clicking on correct and incorrect sentences. It's not clear how well this
> could work. It might be combined with (4).

I'm against it.  It would put quantity over quality.
Crowdsourcing can be useful for collecting ideas for new rules, for example,
but I think that someone with LT experience has to validate those ideas and
transform them into concrete and robust rules.

Having a web page to report false alarms and ideas for new rules
can be useful.  Ideas for new rules are more important, as it's not
always easy for a rule maintainer to think of new rules, whereas it's
easy to find false positives by just checking many texts.

I have not added many new rules in Esperanto or Breton for example,
partly because I ran out of ideas for new rules.  For French, I had more
ideas, because there can be so many ways to make mistakes when
writing in French.

> (4) Statistics: give up on finding qualified contributors and find errors
> using ngram statistics etc. With statistics, finding errors is
> language-independent. Quality might be worse than with hand-written rules,
> but for languages that are not maintained anyway there are often hardly
> hand-written rules. Of course, everybody could still contribute manually
> written rules and maybe revive language support that way.

Ngrams are complementary to XML rules, but they are not a replacement.
To begin with, most users won't download large files.  So ngrams are
mostly useful on the server only, I think.
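
As a rough illustration of how ngram statistics can flag confused words,
here is a toy Python sketch. It is not LT's actual implementation: the
trigram counts, the confusion pair, the threshold and the function name
`suggest` are all invented for the example. A real model needs counts from
a very large corpus, which is why the data files are so big.

```python
from collections import Counter

# Toy trigram counts (made up); a real model is built from a large corpus.
TRIGRAM_COUNTS = Counter({
    ("over", "there", "is"): 120,
    ("over", "their", "is"): 1,
    ("lost", "their", "keys"): 80,
})

# Hypothetical confusion set: words that are easily mixed up.
CONFUSION_PAIRS = {"there": "their", "their": "there"}


def suggest(tokens, min_ratio=10.0):
    """Return (index, word, alternative) for each word whose confusion
    alternative is much more frequent in the same trigram context."""
    findings = []
    # Only words with both a left and a right neighbour are checked.
    for i in range(1, len(tokens) - 1):
        word = tokens[i]
        alt = CONFUSION_PAIRS.get(word)
        if alt is None:
            continue
        left, right = tokens[i - 1], tokens[i + 1]
        # Add-one smoothing so unseen trigrams never give a zero count.
        count_word = TRIGRAM_COUNTS[(left, word, right)] + 1
        count_alt = TRIGRAM_COUNTS[(left, alt, right)] + 1
        if count_alt / count_word >= min_ratio:
            findings.append((i, word, alt))
    return findings


if __name__ == "__main__":
    print(suggest("she lost there keys".split()))
    print(suggest("over there is".split()))
```

The sketch only looks at one trigram around each candidate word and needs
no language-specific rule, but its quality depends entirely on the counts.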

> (5) Business: develop a business model and pay people for working on LT.
> This is difficult, developing a business is a full-time job on its own. Even
> if it worked, it would only cover very few mainstream languages.

I don't think this can realistically work.

> These are the options I can think of that go beyond "let's just keep going".
> Yes, we could just keep going - for some languages, LT is in good health.
> But to be a sustainable project in the long term, I think we need either
> more than one contributor per language or we need a technological approach
> that works without a maintainer per language.
>
> Please, everybody, let me know what you think and what ideas you have about
> the future of LanguageTool.
>
> Regards
>  Daniel

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel