On 23 September 2013 08:17, Per Tunedal <per.tune...@operamail.com> wrote:
>
> Hi,
> what should the text-files look like before starting the tagger
> training? One sentence a line? Something else?
>
> Is a text formatted like below OK:
>
> Antingen genom att gå in under rätt rubrik ovan och lägga till ditt
> bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur
> det ska stå.
> I Önskelistan lägger du förslag på sånt du tycker borde vara med.
>
> Or should e.g. the punctuation marks be separated like:
> I Önskelistan lägger du förslag på sånt du tycker borde vara med .
>

No, you don't need to separate the punctuation. You don't really need to
have the text split into sentences either, but it makes life a little
easier if there are problems.

Some of the older language pairs have makefiles for tagger training. At
a minimum, you will need to adapt the variables for language, and make
sure that lt-proc is called with the same set of switches as in the
primary mode (if you're training for Swedish in sv-da, the mode will be
the one that starts <mode name="sv-da" install="yes">).

The tagset specification is where you have the most scope to control the
tagger. I wrote a linter tool because of the problems you were
reporting; I'd recommend that you run it before training.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
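
A minimal sketch of one way to check the lt-proc switches mentioned
above, assuming the pair's modes file uses the usual
<mode>/<pipeline>/<program> layout. The XML fragment, the "-w" switch
and the file names in it are made up for illustration, and
lt_proc_command is a hypothetical helper, not part of Apertium.

```python
import xml.etree.ElementTree as ET

# Hypothetical modes.xml fragment; the real file in your pair will differ.
MODES_XML = """
<modes>
  <mode name="sv-da" install="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="sv-da.automorf.bin"/>
      </program>
      <program name="apertium-tagger -g $2">
        <file name="sv-da.prob"/>
      </program>
    </pipeline>
  </mode>
</modes>
"""


def lt_proc_command(modes_xml: str, mode_name: str) -> str:
    """Return the lt-proc call (with switches and files) used by a mode."""
    root = ET.fromstring(modes_xml)
    for mode in root.iter("mode"):
        if mode.get("name") != mode_name:
            continue
        for program in mode.iter("program"):
            name = program.get("name", "")
            # The switches live inside the name attribute, after the binary.
            if name.split()[:1] == ["lt-proc"]:
                files = [f.get("name") for f in program.findall("file")]
                return " ".join([name] + files)
    raise ValueError(f"no lt-proc step found in mode {mode_name!r}")


if __name__ == "__main__":
    print(lt_proc_command(MODES_XML, "sv-da"))
```

The string it returns can then be compared by eye with the lt-proc line
in the training makefile; if the switches differ, the tagger is being
trained on a different analysis than the one the mode produces.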