Bug#403619: languagetool -- rule-based language checker

2012-05-13 Thread Daniel Naber

This is an overview of LanguageTool's runtime dependencies.

Lib exists in Debian in the version LT needs it:

  libcommons-lang-java 2.4
  libcommons-logging-java 1.1.1
  libcommons-validator-java 1.3.1

Lib exists but is not up-to-date (I checked 'unstable'):

  libsegment-java 1.3.5, LT needs 1.3.0 and LT 1.8 will need 1.3.8
  libjwordsplitter-java 3.0, LT needs 3.3
  libmorfologik-stemming-java 1.2.2, LT needs morfologik-fsa-1.5.2 and
    morfologik-stemming-1.5.2 (the lib has been split up)
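
As a rough illustration of the check behind the list above, the following
sketch compares the version packaged in Debian with the version LT needs.
It is only a sketch: the helper names are made up for this example, it
handles plain dotted numeric versions only, and it does not reproduce
Debian's real version ordering (epochs, revisions, tildes). For libsegment
it uses the requirement of the upcoming LT 1.8.

```python
# Versions copied from the list above:
# (package, version in Debian unstable, version LT needs).
DEPENDENCIES = [
    ("libsegment-java", "1.3.5", "1.3.8"),  # requirement of LT 1.8
    ("libjwordsplitter-java", "3.0", "3.3"),
    ("libmorfologik-stemming-java", "1.2.2", "1.5.2"),
]


def parse_version(text):
    """Turn a dotted version such as '1.3.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def outdated(deps):
    """Return the packages whose Debian version is older than required."""
    return [
        name
        for name, have, need in deps
        if parse_version(have) < parse_version(need)
    ]


if __name__ == "__main__":
    for name in outdated(DEPENDENCIES):
        print("needs update:", name)
```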

Libs that I did not find in Debian and that we require:

  tika-core-0.9.jar from http://tika.apache.org/, Apache License 2.0

Libs that I did not find in Debian but that are only required for Chinese so 
I think we could do without for now:

  ictclas4j-1.0.jar from http://code.google.com/p/ictclas4j/,
    Apache License 2.0
  CJFtransform_v1.0.1_bin.jar from http://code.google.com/p/cjftransform/,
    Apache License 2.0

The internal dictionaries we use are huge when saved as text files (e.g. 
200MB for German alone). We therefore compress them into a finite-state 
automaton using the morfologik-stemming project, which compresses about 
ten times better than bzip2 (tested with the German dictionary). We 
describe how to dump the dictionaries to plain text at the URL that Marcin 
has posted.
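
To give an idea of why an automaton stores a word list so compactly, here
is a small sketch. It is not morfologik's code or API, and the six-word
list is only an illustration. It builds a trie (shared prefixes) and then
counts the states left after merging nodes that accept the same set of
word endings (shared suffixes), which is what a minimal automaton does.

```python
def new_node():
    return {"final": False, "edges": {}}


def build_trie(words):
    """Build a trie: words with a common prefix share the same path."""
    root = new_node()
    for word in words:
        node = root
        for ch in word:
            node = node["edges"].setdefault(ch, new_node())
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count all nodes of the trie, including the root."""
    return 1 + sum(count_trie_nodes(c) for c in node["edges"].values())


def count_minimal_states(root):
    """Count states after merging nodes that accept the same endings."""
    registry = {}  # node signature -> state id

    def canonical(node):
        # Two nodes are equivalent if they agree on finality and lead,
        # via the same letters, to equivalent children.
        edges = tuple(
            sorted((ch, canonical(c)) for ch, c in node["edges"].items())
        )
        signature = (node["final"], edges)
        return registry.setdefault(signature, len(registry))

    canonical(root)
    return len(registry)


WORDS = ["machen", "machst", "macht", "lachen", "lachst", "lacht"]

if __name__ == "__main__":
    trie = build_trie(WORDS)
    print("characters:", sum(len(w) for w in WORDS))
    print("trie nodes:", count_trie_nodes(trie))
    print("automaton states:", count_minimal_states(trie))
```

For these six words, 34 characters become 19 trie nodes and 8 automaton
states. An inflected dictionary has the same kind of repeated stems and
endings on a much larger scale.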

The question is, what can we do now to help the process of getting LT into 
Debian?

Regards
 Daniel

-- 
http://www.danielnaber.de






Bug#193888: Outstanding ITP - openoffice.org-thesaurus-de

2004-04-14 Thread Daniel Naber
On Tuesday 13 April 2004 19:19, Rene Engelhard wrote:

> the ooo_export.php looks like being admin
> stuff of which I have no permissions or ways to use.

You could simply remove the test for the admin user. However, the script 
would still work only on the database, not on the dump. I don't see any 
way to change this easily. The script would need to be rewritten completely.

Regards
 Daniel

-- 
http://www.danielnaber.de



Bug#193888: Outstanding ITP - openoffice.org-thesaurus-de

2004-04-13 Thread Daniel Naber
On Tuesday 13 April 2004 12:24, Rene Engelhard wrote:

> As I did ask for those scripts (a.k.a part of the source in the GPL
> sense) I didn't get it. (No offense intended, this is just a report.)

Not sure what you mean; I replied to your mail on 2003-09-16. The export 
script is a web page written in PHP. It exports to two text files. These 
are then fed into an awk script, which produces the binary files.

Both scripts (ooo_export.php and Parse_Thes.awk) are available from CVS:
http://cvs.sourceforge.net/viewcvs.py/openthesaurus/openthesaurus/www/admin/

The PHP script works on the database, not on the dump, so it's not so easy 
to set up a make-like task that builds the files.

Regards
 Daniel 

-- 
http://www.danielnaber.de