On 23 September 2013 08:17, Per Tunedal <per.tune...@operamail.com> wrote:
>
> Hi,
> what should the text-files look like before starting the tagger
> training? One sentence a line? Something else?
>
> Is a text formatted like below OK:
>
> Antingen genom att gå in under rätt rubrik ovan och lägga till ditt
> bidrag eller lägg ditt bidrag i bufferten om du inte vet var eller hur
> det ska stå.
> I Önskelistan lägger du förslag på sånt du tycker borde vara med.
>
> Or should e.g. the punctuation marks be separated like:
> I Önskelistan lägger du förslag på sånt du tycker borde vara med .
>

No, you don't need to do that. You don't really need to have the text
split into sentences either, but it makes life a little easier if
there are problems.
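
If you do want one sentence per line, a minimal sketch along these lines
would do. The function name is made up for illustration, and the split is
naive: it breaks after ".", "!" or "?" followed by whitespace, so
abbreviations will be split wrongly.

```python
import re


def one_sentence_per_line(text):
    """Return a list of sentences, one per entry (naive splitting)."""
    # Undo hard line wrapping by collapsing all whitespace to single spaces.
    flat = " ".join(text.split())
    # Break after sentence-final punctuation that is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


if __name__ == "__main__":
    sample = "det ska stå.\nI Önskelistan lägger du förslag."
    for sentence in one_sentence_per_line(sample):
        print(sentence)
```

The punctuation stays attached to the words, as in your first example.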

Some of the older language pairs have makefiles for tagger training.
At a minimum, you will need to adapt the language variables and make
sure that lt-proc is called with the same switches as in the primary
mode (if you're training for Swedish in sv-da, that is the mode that
starts <mode name="sv-da" install="yes">).
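
To check which switches the primary mode passes to lt-proc, something
like the sketch below can read them out of modes.xml. The helper name and
the SAMPLE document are invented for illustration (SAMPLE is not the real
sv-da modes file); it assumes the usual layout of <program> elements with
<file> children inside the mode.

```python
import xml.etree.ElementTree as ET


def lt_proc_command(modes_xml, mode_name):
    """Return the first lt-proc call of a mode as a list of words, or None."""
    root = ET.fromstring(modes_xml)
    for mode in root.iter("mode"):
        if mode.get("name") != mode_name:
            continue
        for program in mode.iter("program"):
            words = program.get("name", "").split()
            # The first lt-proc in the pipeline is normally the analyser.
            if words[:1] == ["lt-proc"]:
                files = [f.get("name") for f in program.findall("file")]
                return words + files
    return None


# Hypothetical modes.xml content, for illustration only.
SAMPLE = """<modes>
  <mode name="sv-da" install="yes">
    <pipeline>
      <program name="lt-proc -w">
        <file name="sv-da.automorf.bin"/>
      </program>
      <program name="apertium-tagger -g $2">
        <file name="sv-da.prob"/>
      </program>
    </pipeline>
  </mode>
</modes>"""

if __name__ == "__main__":
    print(lt_proc_command(SAMPLE, "sv-da"))
```

Whatever switches this reports for your real modes.xml are the ones the
training makefile should pass to lt-proc as well.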

The tagset specification is where you have the most scope to control
the tagger. I wrote a linter tool because of the problems you were
reporting; I'd recommend that you run it before training.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
