On 2012-05-16 20:10, Jan Schreiber wrote:
> BTW, it should be possible to store at least those entities outside the
> file itself, but I don't know how. --Jan

Well, I had a look, and it seems that you are using some of the entities to define fairly long regular expressions (disjunctions). This slows down LT quite substantially (I profiled some rules in the Polish XML file). I had such long lists for Polish reflexive verbs, so I decided to add a new POS tag for them, and that made processing much faster.

But my solution was a hack that can be made more general. We do not need to include such new classifications in the normal tagger file. Since our taggers can be used instead of all such disjunctive regular expressions, you could simply put the lists of adjectives referring to languages (sprachadj) in a dedicated semantic tagger file. This file might be read by a manual tagger or by a morfologik-stemming tagger (which will definitely work faster).

We could, in principle, add a new attribute to the XML, a "semantic classification tag" that would be distinguished from a normal POS tag, and use our existing tagger infrastructure to support this new feature. I planned to use some parts of the Polish Wordnet for some rules, and it was only recently made available under a BSD-like license. Classifying some of the words semantically might be really useful for some rules.

Regards
Marcin

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
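
To illustrate the idea in the message, here is a minimal sketch in Python (not LT code) that contrasts the two approaches: matching a token against a long regex disjunction versus looking it up in a small "semantic tagger" table. The word list, the function names, and the use of "sprachadj" as a tag name are assumptions made only for this illustration; the real entity lists are much longer, and the sketch makes no claim about actual timings in LT.

```python
import re

# Hypothetical stand-in for a long entity list such as "sprachadj";
# the real lists in the grammar files are assumed to be far longer.
LANGUAGE_ADJECTIVES = ["deutsch", "englisch", "polnisch", "französisch"]

# Approach 1: the entity is expanded into one long regex disjunction,
# which the regex engine has to try against every candidate token.
DISJUNCTION = re.compile("|".join(re.escape(w) for w in LANGUAGE_ADJECTIVES))


def matches_by_regex(token: str) -> bool:
    """Return True if the whole token matches the disjunction."""
    return DISJUNCTION.fullmatch(token) is not None


# Approach 2: a tiny "semantic tagger" (hypothetical): a dictionary that
# maps each word to its set of semantic classification tags.
SEMANTIC_TAGS = {word: frozenset({"sprachadj"}) for word in LANGUAGE_ADJECTIVES}


def semantic_tags(token: str) -> frozenset:
    """Look up the semantic tags of a token; empty set if unknown."""
    return SEMANTIC_TAGS.get(token, frozenset())


def matches_by_tag(token: str, tag: str) -> bool:
    """A rule would test for the tag instead of a regex disjunction."""
    return tag in semantic_tags(token)
```

Both functions accept the same tokens, but the second one needs only a single hash lookup per token, and the word list lives in a separate data structure (or file) instead of inside the rule itself.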