Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Dominique, thanks for taking the trouble to test it, etc.

From my POV, the upshot of the discussion *so far* is that we should not
split the grammar files, even though some of them are getting quite
large. Correct me if I'm wrong. For me (on a cheap six-year-old
computer), there is no problem today (but I still wonder where all of
this will lead us in the near future).

The largest grammar files are around 1.2 MB AFAIK; I guess
this is still in a range you can call normal today.

I still think file size is, generally speaking, an issue in a project
like this, with all those languages to consider, but as far as I'm
concerned, I consider it a non-issue pending further notice.

--Jan

Dominique Pellé wrote:
> Jan Schreiber
> 
>> I know that ridiculously huge file is a bit of a problem
> 
> Are they? grammar.xml files are not that big.
> 
> A text editor opens the biggest grammar.xml in a blink on
> my 5-year-old laptop.
> 
> To make it easier to navigate when editing, I define folds in Vim
> with a modeline (see comment at bottom of rules/fr/grammar.xml
> or rules/eo/grammar.xml). I'm pretty sure Emacs could do that too.
> I find that automatic folding helps to have a global overview of the
> large grammar file.
> 
> I removed all indentation as an experiment for de/grammar.xml
> and the saving in size is negligible (2.7%).


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Marcin:
> Classifying some of the words semantically might be really useful for 
> some rules.

Indeed, I could not agree more. The most difficult part would be coming
up with the semantic categories in a way that is not completely ad hoc.
Everyone who has ever used a public library is probably aware of the
fact that categorizing things in a comprehensible way is anything but
trivial. Our own categories in the grammar files are cases in point.

We could use the categories from Daniel's OpenThesaurus as a starting
point for German, but they are quite unsatisfactory. Maybe the "tag
clouds" used in some public bookmarking systems such as the former
del.icio.us can be used as a source, at least as a source of inspiration. --J




Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] trunk/JLanguageTool/src/rules/de/grammar.xml

2012-05-16 Thread Daniel Naber
On Monday, 14 May 2012, Ruud Baars wrote:

> Don't bother converting the Dutch xml.
> I have already manually done that.
> 
> Have to find the time to download the snapshot and get it tested.

Just send it when you're ready - I have applied the automatic conversion to 
Dutch for now so I can remove support for the old mark_from/mark_to.

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: [Languagetool] Any advice?

2012-05-16 Thread Marcin Miłkowski
On 2012-05-16 22:28, gulp21 wrote:
>> As much as I hate passing the buck, I'm afraid writing such a rule is
>> beyond my (pretty much non-existent) Java skills.
>
> I planned to write a Java rule for it, but I'm rather busy at the
> moment. Unless somebody is quicker than me, or has a better idea, I'll
> start working on it in a few weeks.
>

Reusing AtD (After the Deadline) code might be a good idea. Their code is
under the LGPL, as far as I remember, but the project seems to be stalled.
And they have a clear procedure for training the language model, AFAIK.

Regards
Marcin



Re: [Languagetool] Apache OpenOffice 3.4 and LT

2012-05-16 Thread Daniel Naber
On Saturday, 12 May 2012, Daniel Naber wrote:

> Has anybody tried LT with the recently released Apache OpenOffice.org?
> It works for me but the freeze-on-startup problem is worse than ever,
> it freezes 45 seconds for me (compared to 4 seconds with the latest
> LibreOffice).

It turns out the problem is Java 7. I have added a hint on our homepage 
that people should use Java 6 for now. That's very unfortunate, as Java 7 
has just been released for end users and I couldn't even find an official 
location at Oracle where you can still download Java 6 without having a 
user account. LibreOffice 3.5.4 will hopefully solve most of this problem, I 
will add a notice that people should use that as soon as it's released (in 
a few weeks).

This issue was already discussed here without a real solution:
http://lists.freedesktop.org/archives/libreoffice/2011-October/019388.html

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: [Languagetool] Any advice?

2012-05-16 Thread gulp21
> As much as I hate passing the buck, I'm afraid writing such a rule is
> beyond my (pretty much non-existent) Java skills.

I planned to write a Java rule for it, but I'm rather busy at the
moment. Unless somebody is quicker than me, or has a better idea, I'll
start working on it in a few weeks.

Regards
Markus



Re: [Languagetool] Any advice?

2012-05-16 Thread Jan Schreiber
This is excellent news! It pretty much answers a question I was going to
ask on this list about automating part of the rule-creation process.

With the recently updated
http://languagetool.svn.sourceforge.net/viewvc/languagetool/trunk/JLanguageTool/src/resource/de/words-similar.txt
and the Wikipedia data that we use for rule testing and such I hope it
becomes doable for German.

Marcin Miłkowski wrote:
> Hi all,
> 
> Actually, word confusion is one area where a lot of experiments have been
> made. I also ran an experiment with a Brill tagger, and it worked really
> well for English. It should be easy for Dutch as well:
> 
> http://marcinmilkowski.pl/downloads/automating_rules_full.pdf
> 
> Unfortunately, the process does not produce full-blown LT rules, just
> Brill tagger rules, but all you need is a big clean corpus without any
> mistakes and a list of confusions. The rest is pretty much automatic, and
> the quality is pretty high.
> 
> It was on our Google Summer of Code list exactly for this reason - 
> making this process automatic seems very easy, as there is a Java 
> version of a Brill tagger that could be used, and we could fairly easily 
> convert the rules to our formalism.
> 
> Another option would be to use the statistical modeling the way it is 
> used in After the Deadline. I'm not sure how good it is in such things, 
> as it never really impressed me with high number of raised alarms.
> 
> Regards
> Marcin
> 
> On 2012-05-16 20:58, Juan Martorell wrote:
>> The problem you set out is entirely semantic and common to all
>> languages. There is no way to distinguish the two verbs other than
>> semantically. That introduces a new category for comparison, perhaps
>> category trees, and IMHO that would exceed the scope of the project.
>>
>> A possible shortcut is introducing semantic categories as mock POS and
>> checking their compatibility within the rules as if it were a common
>> agreement. I discourage this because it denormalizes the tagger dictionary.
>>
>> I therefore recommend the brute-force approach, provided that the chance
>> of committing such mistakes justifies the investment.
>>
>> However, I'd rather focus on lightening the software than on swelling it
>> with new features. The lighter, faster and easier to use, the more
>> successful.
>>
>> Best regards,
>> Juan
>>
>> 2012/5/16 R.J. Baars <mailto:r.j.ba...@xs4all.nl>
>>
>> There is quite a bit of word confusion going on in Dutch. An example:
>>
>> geplant (planted) versus gepland (planned).
>>
>> This is not a grammatical issue, but actually using the wrong word,
>> thereby altering the intention of the sentence.
>>
>> Neither is wrong. Both are very common. Nevertheless, a warning is of
>> added value. What I need is suppression of lots of unnecessary warnings.
>>
>> I could add exceptions for the warning on 'geplant' for every sentence
>> that contains either plant, tree, shrub, etc.
>> And exceptions on the warning on 'gepland' for every sentence
>> containing:
>> project, activity, planning etc.
>>
>> But would it be possible to create a 'context' from the sentence and
>> check whether the word is likely in the context?
>>
>> From the large corpus we built, it would be possible to determine the
>> 'likely context words' for any confusing word.
>>
>> Has anyone ever thought about a way to implement this kind of check
>> in LT?
>>
>> Any thoughts to do it within existing functionality? Is Dutch the only
>> language having a confusion issue like this?
>>
>> Ruud

Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Marcin Miłkowski
On 2012-05-16 20:10, Jan Schreiber wrote:

> BTW, it should be possible to store at least those entities outside the
> file itself, but I don't know how. --Jan

Well, I had a look and it seems that you are using some of the entities 
to define fairly long regular expressions (disjunctions). This slows 
down LT quite substantially (I profiled some rules in the Polish XML 
file). I had such long lists for Polish reflexive verbs, and I decided 
to add a new POS tag for that, and it made processing much faster.

But my solution was a hack that can be made more general. We do not need
to include such new classifications in the normal tagger file: as our
taggers can be used instead of all such disjunctive regular expressions, 
you could also simply include lists of adjectives referring to languages 
(sprachadj) in a dedicated semantic tagger file. This might be read by a 
manual tagger or a morfologik-stemming tagger (which will definitely 
work faster). We could, in principle, add a new attribute - a "semantic 
classification tag" - to XML that would be differentiated from a normal 
POS tag, and use our existing tagger infrastructure to support this new 
feature.
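
To make the contrast concrete, here is a minimal sketch in Python (not LT's Java, and not LT's actual tagger API). The word list and both helper names are assumptions for illustration; only the tag name "sprachadj" comes from the message above. It shows the same membership test done once with a long disjunctive regular expression and once with a dictionary that plays the role of the proposed semantic tagger file.

```python
import re

# Placeholder word list; "sprachadj" is the entity name mentioned above,
# the words themselves are illustrative examples.
LANGUAGE_ADJECTIVES = ["deutsch", "englisch", "französisch", "polnisch"]

# Approach 1: one long disjunctive regular expression, as an XML entity
# expanded inside a rule would produce.
DISJUNCTION = re.compile("|".join(re.escape(w) for w in LANGUAGE_ADJECTIVES))

# Approach 2: a hypothetical semantic tagger, here just a dictionary from
# word form to a set of semantic tags, kept apart from the POS dictionary.
SEMANTIC_TAGS = {word: {"sprachadj"} for word in LANGUAGE_ADJECTIVES}


def matches_disjunction(token):
    """True if the whole token matches one alternative of the regex."""
    return DISJUNCTION.fullmatch(token) is not None


def has_semantic_tag(token, tag):
    """True if the hypothetical semantic tagger assigns `tag` to `token`."""
    return tag in SEMANTIC_TAGS.get(token, set())
```

Both functions answer the same question for a token. The dictionary lookup does not have to try the alternatives one by one, which is the kind of saving described above, though how much it helps in LT itself would need profiling.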

I planned to use some parts of the Polish Wordnet for some rules, and 
only recently it was made available under a BSD-like license. 
Classifying some of the words semantically might be really useful for 
some rules.

Regards
Marcin



Re: [Languagetool] Any advice?

2012-05-16 Thread Jan Schreiber
gulp21:
> As there are many rules of that type, I would suggest that a general
> WrongWordInContext Java rule be created, because having many XML rules
> which differ only in the list of words seems absurd.

I'm pretty sure that would help a lot, especially since Juan pointed out
that the need for disambiguation is common to all languages. Ideally a
Java rule accompanied by a tab-separated list per language.
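
One possible shape for such a per-language list, sketched in Python for brevity. The four-column layout (word, its context words, the confusable word, its context words) and the function name `load_confusion_list` are assumptions for illustration, not an existing LT file format. The sample row reuses the English glosses from Ruud's message as placeholder context words.

```python
import csv
import io

# Hypothetical file content: one confusion pair per line, four
# tab-separated columns, context words separated by commas.
SAMPLE = (
    "# word_a\tcontext_a\tword_b\tcontext_b\n"
    "geplant\tplant,tree,shrub\tgepland\tproject,activity,planning\n"
)


def split_words(field):
    """Turn 'a, b,c' into {'a', 'b', 'c'}, ignoring empty entries."""
    return {w.strip() for w in field.split(",") if w.strip()}


def load_confusion_list(stream):
    """Parse the hypothetical tab-separated format into a list of dicts."""
    pairs = []
    for row in csv.reader(stream, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(row) != 4:
            raise ValueError(f"expected 4 tab-separated fields: {row!r}")
        word_a, context_a, word_b, context_b = row
        pairs.append({
            "word_a": word_a, "context_a": split_words(context_a),
            "word_b": word_b, "context_b": split_words(context_b),
        })
    return pairs


pairs = load_confusion_list(io.StringIO(SAMPLE))
```

With a layout like this, adding a new confusion pair would mean adding one line to the language's list rather than writing another XML rule.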

As much as I hate passing the buck, I'm afraid writing such a rule is
beyond my (pretty much non-existent) Java skills. --Jan



Re: [Languagetool] Any advice?

2012-05-16 Thread Marcin Miłkowski
Hi all,

Actually, word confusion is one area where a lot of experiments have been
made. I also ran an experiment with a Brill tagger, and it worked really
well for English. It should be easy for Dutch as well:

http://marcinmilkowski.pl/downloads/automating_rules_full.pdf

Unfortunately, the process does not produce full-blown LT rules, just
Brill tagger rules, but all you need is a big clean corpus without any
mistakes and a list of confusions. The rest is pretty much automatic, and
the quality is pretty high.

It was on our Google Summer of Code list exactly for this reason - 
making this process automatic seems very easy, as there is a Java 
version of a Brill tagger that could be used, and we could fairly easily 
convert the rules to our formalism.

Another option would be to use the statistical modeling the way it is 
used in After the Deadline. I'm not sure how good it is in such things, 
as it never really impressed me with high number of raised alarms.

Regards
Marcin

On 2012-05-16 20:58, Juan Martorell wrote:
> The problem you set out is entirely semantic and common to all
> languages. There is no way to distinguish the two verbs other than
> semantically. That introduces a new category for comparison, perhaps
> category trees, and IMHO that would exceed the scope of the project.
>
> A possible shortcut is introducing semantic categories as mock POS and
> checking their compatibility within the rules as if it were a common
> agreement. I discourage this because it denormalizes the tagger dictionary.
>
> I therefore recommend the brute-force approach, provided that the chance
> of committing such mistakes justifies the investment.
>
> However, I'd rather focus on lightening the software than on swelling it
> with new features. The lighter, faster and easier to use, the more
> successful.
>
> Best regards,
> Juan
>
> 2012/5/16 R.J. Baars <mailto:r.j.ba...@xs4all.nl>
>
> There is quite a bit of word confusion going on in Dutch. An example:
>
> geplant (planted) versus gepland (planned).
>
> This is not a grammatical issue, but actually using the wrong word,
> thereby altering the intention of the sentence.
>
> Neither is wrong. Both are very common. Nevertheless, a warning is of
> added value. What I need is suppression of lots of unnecessary warnings.
>
> I could add exceptions for the warning on 'geplant' for every sentence
> that contains either plant, tree, shrub, etc.
> And exceptions on the warning on 'gepland' for every sentence
> containing:
> project, activity, planning etc.
>
> But would it be possible to create a 'context' from the sentence and
> check whether the word is likely in the context?
>
> From the large corpus we built, it would be possible to determine the
> 'likely context words' for any confusing word.
>
> Has anyone ever thought about a way to implement this kind of check
> in LT?
>
> Any thoughts to do it within existing functionality? Is Dutch the only
> language having a confusion issue like this?
>
> Ruud




Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Dominique Pellé
Jan Schreiber

> I know that ridiculously huge file is a bit of a problem

Are they? grammar.xml files are not that big.

A text editor opens the biggest grammar.xml in a blink on
my 5-year-old laptop.

To make it easier to navigate when editing, I define folds in Vim
with a modeline (see comment at bottom of rules/fr/grammar.xml
or rules/eo/grammar.xml). I'm pretty sure Emacs could do that too.
I find that automatic folding helps to have a global overview of the
large grammar file.

I removed all indentation as an experiment for de/grammar.xml
and the saving in size is negligible (2.7%).
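
A measurement of this kind can be reproduced along the following lines. This is a sketch in Python, not the procedure actually used for the figure above; the function name is made up, and sizes are counted in UTF-8 bytes.

```python
def indentation_saving(text):
    """Fraction of the UTF-8 size saved by removing leading spaces and tabs."""
    original = len(text.encode("utf-8"))
    if original == 0:
        return 0.0
    # Strip only leading whitespace on each line; content is left untouched.
    stripped = "\n".join(line.lstrip(" \t") for line in text.split("\n"))
    return 1 - len(stripped.encode("utf-8")) / original


# Tiny made-up fragment, only to show the call.
SAMPLE = "<rules>\n  <rule>\n    <token>x</token>\n  </rule>\n</rules>\n"
saving = indentation_saving(SAMPLE)
```

For a real file one would read it first, for example with `open(path, encoding="utf-8").read()`, where `path` points at the grammar file to measure. The tiny sample is far more indentation-heavy than a real grammar file, so its result says nothing about actual savings.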

Regards
-- Dominique



Re: [Languagetool] Any advice?

2012-05-16 Thread Juan Martorell
The problem you set out is entirely semantic and common to all languages.
There is no way to distinguish the two verbs other than semantically. That
introduces a new category for comparison, perhaps category trees, and IMHO
that would exceed the scope of the project.

A possible shortcut is introducing semantic categories as mock POS and
checking their compatibility within the rules as if it were a common
agreement. I discourage this because it denormalizes the tagger dictionary.

I therefore recommend the brute-force approach, provided that the chance
of committing such mistakes justifies the investment.

However, I'd rather focus on lightening the software than on swelling it
with new features. The lighter, faster and easier to use, the more
successful.

Best regards,
Juan

2012/5/16 R.J. Baars 

> There is quite a bit of word confusion going on in Dutch. An example:
>
> geplant (planted) versus gepland (planned).
>
> This is not a grammatical issue, but actually using the wrong word,
> thereby altering the intention of the sentence.
>
> Neither is wrong. Both are very common. Nevertheless, a warning is of
> added value. What I need is suppression of lots of unnecessary warnings.
>
> I could add exceptions for the warning on 'geplant' for every sentence
> that contains either plant, tree, shrub, etc.
> And exceptions on the warning on 'gepland' for every sentence containing:
> project, activity, planning etc.
>
> But would it be possible to create a 'context' from the sentence and
> check whether the word is likely in the context?
>
> From the large corpus we built, it would be possible to determine the
> 'likely context words' for any confusing word.
>
> Has anyone ever thought about a way to implement this kind of check in LT?
>
> Any thoughts to do it within existing functionality? Is Dutch the only
> language having a confusion issue like this?
>
> Ruud


Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Second thoughts: OTOH, the earlier we do it, the less work it will be. I
definitely agree that a file size of more than 1 MB is not very good.

I wrote:
> Daniel Naber wrote:
>> But what about splitting up that file into its categories? We could have 
>> 5-10 smaller files rather than one large one. The current one gets difficult 
>> to handle for some editors and other tools.
>>
> Ugh. :-/ I would like to avoid that as long as possible. I know that
> ridiculously huge file is a bit of a problem, but splitting it without
> breaking anything (entities etc.) would be a pain in the ... I mean, a
> lot of work.
> 
> BTW, it should be possible to store at least those entities outside the
> file itself, but I don't know how. --Jan




Re: [Languagetool] Any advice?

2012-05-16 Thread gulp21
There are some German rules which detect a word which is used in the 
wrong context, e.g. "Miene" (facial expression) and "Mine" (mine, lead). 
There is a list of words which are often used with "Miene" (verziehen, 
aufsetzen, gekränkt etc.), and words which are often used with "Mine" 
(explodieren, unterirdisch, Stift etc.).  The rule checks whether 
"Miene" appears together with a word of the "Mine"-list and whether 
there is no word of the "Miene"-list (and the same for "Mine").
So in your case, you could check whether "geplant" is used together with 
project, activity, or planning, and the words plant, tree, or shrub do 
not appear.
As there are many rules of that type, I would suggest that a general
WrongWordInContext Java rule be created, because having many XML rules
which differ only in the list of words seems absurd.
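
The check described above can be sketched as follows. This is Python rather than the Java such a rule would need in LT, and all names (`CONFUSION_PAIRS`, `check_sentence`) are made up for illustration. The word lists are only the examples given in this message, and the sketch compares exact word forms, whereas a real rule would presumably match lemmas or inflected forms.

```python
import re

# Hypothetical data structure; the words are the examples from this thread.
CONFUSION_PAIRS = [
    {
        "word_a": "Miene",
        "context_a": {"verziehen", "aufsetzen", "gekränkt"},
        "word_b": "Mine",
        "context_b": {"explodieren", "unterirdisch", "Stift"},
    },
]


def check_sentence(sentence, pairs=CONFUSION_PAIRS):
    """Return (found_word, suggested_word) for each suspicious use.

    A word is flagged when the sentence contains context words of its
    confusable partner and none of its own context words.
    """
    tokens = set(re.findall(r"\w+", sentence))
    warnings = []
    for pair in pairs:
        candidates = (
            (pair["word_a"], pair["context_a"], pair["context_b"], pair["word_b"]),
            (pair["word_b"], pair["context_b"], pair["context_a"], pair["word_a"]),
        )
        for word, own_context, other_context, suggestion in candidates:
            if word in tokens and tokens & other_context and not tokens & own_context:
                warnings.append((word, suggestion))
    return warnings
```

Because the logic is the same for every pair, only the data would differ between "Miene"/"Mine" and "geplant"/"gepland", which is the argument for one general rule plus word lists.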

Regards
Markus


On 16.05.2012 15:29, R.J. Baars wrote:
> There is quite a bit of word confusion going on in Dutch. An example:
>
> geplant (planted) versus gepland (planned).
>
> This is not a grammatical issue, but actually using the wrong word,
> thereby altering the intention of the sentence.
>
> Neither is wrong. Both are very common. Nevertheless, a warning is of
> added value. What I need is suppression of lots of unnecessary warnings.
>
> I could add exceptions for the warning on 'geplant' for every sentence
> that contains either plant, tree, shrub, etc.
> And exceptions on the warning on 'gepland' for every sentence containing:
> project, activity, planning etc.
>
> But would it be possible to create a 'context' from the sentence and
> check whether the word is likely in the context?
>
> From the large corpus we built, it would be possible to determine the
> 'likely context words' for any confusing word.
>
> Has anyone ever thought about a way to implement this kind of check in LT?
>
> Any thoughts to do it within existing functionality? Is Dutch the only
> language having a confusion issue like this?
>
> Ruud



Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Daniel Naber wrote:
> Feel free to do that, although some new spaces might be re-introduced as I 
> cannot set up my IDE for spaces/tabs on a per-project basis.

Then let's forget that. If there is one thing on earth that I can't
stand it's a mixture of spaces and tabs. It visually messes up
indentation unless we all happen to have the same tab settings in our
editors.

> But what about splitting up that file into its categories? We could have 
> 5-10 smaller files rather than one large one. The current one gets difficult 
> to handle for some editors and other tools.
> 
Ugh. :-/ I would like to avoid that as long as possible. I know that
ridiculously huge file is a bit of a problem, but splitting it without
breaking anything (entities etc.) would be a pain in the ... I mean, a
lot of work.

BTW, it should be possible to store at least those entities outside the
file itself, but I don't know how. --Jan



Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Daniel Naber
On Wednesday, 16 May 2012, Jan Schreiber wrote:

> One tiny thing is still bugging me: Since the file is so long, the
> change in indentation (four spaces rather than two) results in a
> noticeable increase of the file size. We could avoid this by using tabs
> instead of spaces for indentation, that would mean one character instead
> of four. This isn't much, but it will probably save us a few hundred
> kB. I would be able to do this quite easily. Any objections?

Feel free to do that, although some new spaces might be re-introduced as I 
cannot set up my IDE for spaces/tabs on a per-project basis.

But what about splitting up that file into its categories? We could have 
5-10 smaller files rather than one large one. The current one gets difficult 
to handle for some editors and other tools.

Regards
 Daniel

-- 
http://www.danielnaber.de



[Languagetool] Any advice?

2012-05-16 Thread R.J. Baars
There is quite a bit of word confusion going on in Dutch. An example:

geplant (planted) versus gepland (planned).

This is not a grammatical issue, but actually using the wrong word,
thereby altering the intention of the sentence.

Neither is wrong. Both are very common. Nevertheless, a warning is of
added value. What I need is suppression of lots of unnecessary warnings.

I could add exceptions for the warning on 'geplant' for every sentence
that contains either plant, tree, shrub, etc.
And exceptions on the warning on 'gepland' for every sentence containing:
project, activity, planning etc.

But would it be possible to create a 'context' from the sentence and
check whether the word is likely in the context?

From the large corpus we built, it would be possible to determine the
'likely context words' for any confusing word.

Has anyone ever thought about a way to implement this kind of check in LT?

Any thoughts to do it within existing functionality? Is Dutch the only
language having a confusion issue like this?
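
As a rough illustration of deriving 'likely context words' from a corpus, here is a minimal Python sketch based on plain co-occurrence counts. It is not an existing LT feature; the function name, the `min_count` threshold and the toy corpus are all made up, and a real corpus would call for stop-word filtering and a proper association measure instead of raw counts.

```python
import re
from collections import Counter


def context_words(corpus_sentences, target, other, min_count=2):
    """Words seen with `target` at least `min_count` times, never with `other`."""
    with_target, with_other = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if target in tokens:
            with_target.update(tokens - {target})
        if other in tokens:
            with_other.update(tokens - {other})
    return {
        word for word, count in with_target.items()
        if count >= min_count and with_other[word] == 0
    }


# Toy corpus, invented for this example only.
TOY_CORPUS = [
    "de boom is geplant in de tuin",
    "de struik is geplant naast de boom",
    "het project is gepland voor mei",
    "de planning van het project is gepland",
]
likely_for_geplant = context_words(TOY_CORPUS, "geplant", "gepland")
```

On the toy corpus this keeps "boom" for 'geplant' and drops words such as "is" that occur with both forms. Run the other way round it also keeps the article "het", which shows why raw counts alone are too crude.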

Ruud




Re: [Languagetool] [LanguageTool] SF.net SVN: languagetool:[6896] ...

2012-05-16 Thread Jan Schreiber
Daniel Naber wrote:
> I tried another conversion, please let me know if this is okay now.
> 
> Regards
>  Daniel
> 


Everything seems okay now, thanks. I made a few trivial cosmetic changes
to the German grammar file though.

One tiny thing is still bugging me: Since the file is so long, the
change in indentation (four spaces rather than two) results in a
noticeable increase of the file size. We could avoid this by using tabs
instead of spaces for indentation, that would mean one character instead
of four. This isn't much, but it will probably save us a few hundred
kB. I would be able to do this quite easily. Any objections? --Jan
