On 2012-05-16 20:10, Jan Schreiber wrote:

> BTW, it should be possible to store at least those entities outside the
> file itself, but I don't know how. --Jan

Well, I had a look and it seems that you are using some of the entities 
to define fairly long regular expressions (disjunctions). This slows 
down LT quite substantially (I profiled some rules in the Polish XML 
file). I had such long lists for Polish reflexive verbs, and I decided 
to add a new POS tag for that, and it made processing much faster.

But my solution was a hack that can be made more general. We do not need 
to include such new classifications in the normal tagger file. Since our 
taggers can be used instead of all such disjunctive regular expressions, 
you could simply put the lists of adjectives referring to languages 
(sprachadj) in a dedicated semantic tagger file. This could be read by a 
manual tagger or a morfologik-stemming tagger (which will definitely 
work faster). We could, in principle, add a new attribute, a "semantic 
classification tag", to the XML that would be distinct from a normal 
POS tag, and use our existing tagger infrastructure to support this new 
feature.
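
To make the contrast concrete, here is a minimal sketch in Python (not 
LT code). It assumes a tiny made-up word list and hypothetical helper 
names (matches_by_regex, semantic_tags, matches_by_tag); none of these 
exist in LT. It only illustrates replacing a long regex disjunction 
with a dictionary lookup that returns a semantic tag.

```python
import re

# Hypothetical sample data: a few German adjectives referring to languages.
# A real list would be much longer and live in its own tagger file.
LANGUAGE_ADJECTIVES = ["deutsch", "englisch", "französisch", "polnisch"]

# Approach 1: entity-style disjunction, i.e. one long alternation
# that the regex engine has to evaluate for every token.
DISJUNCTION = re.compile("|".join(re.escape(w) for w in LANGUAGE_ADJECTIVES))


def matches_by_regex(token: str) -> bool:
    """Return True if the whole token matches one of the alternatives."""
    return DISJUNCTION.fullmatch(token) is not None


# Approach 2: a "semantic tagger" reduced to a dictionary that maps
# a word form to its semantic tags (assumed tag name: "sprachadj").
SEMANTIC_TAGS = {word: {"sprachadj"} for word in LANGUAGE_ADJECTIVES}


def semantic_tags(token: str) -> set:
    """Look up the semantic tags of a token; empty set if unknown."""
    return SEMANTIC_TAGS.get(token, set())


def matches_by_tag(token: str, tag: str) -> bool:
    """Return True if the token carries the given semantic tag."""
    return tag in semantic_tags(token)
```

With this, a rule would ask matches_by_tag(token, "sprachadj") instead 
of matching the token against the whole disjunction. The lookup cost 
does not grow with the length of the list, which is the point of the 
proposal; how much it would actually save in LT would need profiling.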

I planned to use some parts of the Polish Wordnet for some rules, and 
it was only recently made available under a BSD-like license. 
Classifying some of the words semantically might be really useful for 
some rules.

Regards
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
