Kevin Brubeck Unhammer wrote on 9 Aug 2012 at 14:54:
> Francis Tyers <fty...@prompsit.com> writes:
>> On Thu, 09 Aug 2012 at 10:35 +0200, Per Tunedal wrote:
>>> I consider Apertium suitable for translating the pair Swedish -
>>> Norwegian
Yes.

>>> 3. You might use a level 1 translation (without constraint grammar),
>>> like the pair Swedish - Danish. In that case, you could make the
>>> translation usable for a wide audience by adding the pair to Apertium
>>> Caffeine and the new OmegaT plug-in.
>>
>> In any case there is no free constraint grammar of Swedish currently
>> available.

The lack of a CG for Swedish is a problem. My suggestion would be to write one. To be a bit specific: write the 100 or so rules needed to remove the great majority, say 80(?)%, of the ambiguity.

> What you're describing is gisting/translation for understanding; I can't
> imagine gisting MT would be very useful for sv-nb/nn (and I suspect
> people would use Google for that anyway).

From the Norwegian side, we cannot imagine the need for a sv-nb/nn gisting system. The most help we would need is, in rare cases, a dictionary translating a small number of hard words. How hard Norwegian is for Swedes is of course up to the Swedes to judge. But the competition will be between understanding the Norwegian text and understanding (sic) the MT output.

> But with these closely related
> languages, it's possible to get to a standard good enough for
> post-editing (pre-publishing), e.g. with OmegaT as you mentioned, and in
> that case the users definitely know which language it is already.

Yes, a production system (say, I want to translate a sv article to nn on Wikipedia) is a different matter. My experience from nn-nb translation is that the time saved by post-editing, compared with rewriting or translating from scratch, is around 80%. So yes, that can be a good idea. __But__ the nb-nn lexicon and orthographic principles are the same, so more often than not unknown words will come out as free rides. For sv-nn/nb that will __not__ be the case (to the same extent), since both vocabulary and orthography deviate more. So, fewer free rides for unknown words.
This implies that the transfer lexicon must be __much__ bigger than the nb-nn one in order to get results as good as those we have for nb-nn. The good news is that such an enlarged transfer lexicon can in part be built automatically and then post-edited manually.

>>
>> (3) You make the two translators in the one pair. For this, you could
>> have the same Swedish dictionary, but would need different nb and nn
>> dictionaries, different sv-nb and sv-nn dictionaries and different sv-nb
>> and sv-nn transfer rules.
> (3) sounds best to me too.

I agree.

> Perhaps you could even do with one bidix, and
> just use the alt="nn" vs alt="nb" attribute; a rough and dirty count
> shows that the majority of entries in the nn-nb bidix carry over the
> same lemma/tag:

This could very well be the case, yes (cf. my experiences with free rides).

> That said, I would pick one first and get the system up and running,
> then expand to both later on.

This is also a possibility, yes. But the expansion to both languages should be taken into account in the setup phase.

> https://en.wikipedia.org/wiki/Language_identification
> Using a library like that makes general (you can use it for lots of
> languages) and is a *lot* faster than translating everything twice (or
> thrice or …).

Yes. Language identification.

> http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar has
> more frequency lists (they also taunt you with this enormous corpus, but
> it's currently "in beta", very messy, and best avoided for now).

The best resource is the NoWaC corpus; it also has frequency lists, both for lemmata and for word forms.

My final comment would be that the work will be

1. in the analysis/generation of Swedish
2. … and in the bidix.

As for 1, we should look around the Swedish language technology landscape for open resources, e.g. in Gothenburg (Aarne Ranta, also Språkbanken). As for 2, Lexin might be one resource.
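
For what it is worth, the rough count of entries that carry over the same lemma/tag, as mentioned in the quoted text above, could be reproduced with something like the sketch below. It assumes the usual bidix layout, where each <e> entry holds either a <p> pair with <l> and <r> sides or an <i> identity element. The sample entries and the names `SAMPLE_BIDIX`, `side` and `count_identical` are made up for illustration and are not taken from the actual nn-nb bidix; entries that use paradigms or regular expressions are simply skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature bidix, invented for illustration only.
SAMPLE_BIDIX = """
<dictionary>
  <section id="main" type="standard">
    <e><p><l>hus<s n="n"/><s n="nt"/></l><r>hus<s n="n"/><s n="nt"/></r></p></e>
    <e><i>bil<s n="n"/><s n="m"/></i></e>
    <e><p><l>ikkje<s n="adv"/></l><r>ikke<s n="adv"/></r></p></e>
    <e><p><l>eg<s n="prn"/></l><r>jeg<s n="prn"/></r></p></e>
    <e><p><l>bok<s n="n"/><s n="f"/></l><r>bok<s n="n"/><s n="m"/></r></p></e>
  </section>
</dictionary>
"""


def side(elem):
    """Return (lemma, tags) for an <l>, <r> or <i> element."""
    parts = [elem.text or ""]
    tags = []
    for child in elem:
        if child.tag == "s":        # grammatical tag, e.g. <s n="n"/>
            tags.append(child.get("n"))
        elif child.tag == "b":      # blank inside a multiword lemma
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts).strip(), tuple(tags)


def count_identical(xml_text):
    """Return (same, total): entries whose two sides share lemma and tags."""
    root = ET.fromstring(xml_text)
    same = total = 0
    for entry in root.iter("e"):
        if entry.find("i") is not None:
            # <i> means both sides are identical by definition.
            same += 1
            total += 1
            continue
        pair = entry.find("p")
        if pair is None:
            continue                # e.g. paradigm-only entries: skipped here
        left, right = pair.find("l"), pair.find("r")
        if left is None or right is None:
            continue
        total += 1
        if side(left) == side(right):
            same += 1
    return same, total


if __name__ == "__main__":
    same, total = count_identical(SAMPLE_BIDIX)
    print(f"{same} of {total} entries carry over unchanged")
```

Run over a real bidix file (read its contents and pass them to `count_identical`), the ratio gives a first impression of how many entries could be shared between an nn and an nb target. Comparing lemmas only, rather than lemma plus tags, would be a one-line change in the final comparison.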
I am at Euralex in Oslo right now, and will ask around.

Trond.

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff