Francis Tyers <fty...@prompsit.com> writes:

> On Tue, 25 Mar 2014 at 12:17 +0000, Jim O'Regan wrote:

[...]

>> Also, I have a tiny feature that allows the user to specify a set of
>> characters to be ignored at runtime (motivated primarily by soft
>> hyphens, but I've left it general[1]). I sent the patch to Sergio to
>> review, but I'd really rather get it in now than wait n years until
>> the next release :)
>> 
>> For the curious, I've attached the patch.
>> 
>> Current behaviour is:
>> $ echo test­ing | lttoolbox/lt-proc \
>>     ~/Apertium/apertium-en-es/en-es.automorf.bin
>> ^test/test<n><sg>/test<vblex><inf>/test<vblex><pres>$­^ing/*ing
>> 
>> Using this as soft-hyphen.icx:
>> 
>> <?xml version="1.0"?>
>> <ignored-chars>
>>   <char value="&#173;"/>
>> </ignored-chars>
>> 
>> $ echo test­ing | lttoolbox/lt-proc -i soft-hyphen.icx \
>>     ~/Apertium/apertium-en-es/en-es.automorf.bin
>> ^testing/test<vblex><ger>/test<vblex><pprs>/test<vblex><subs>/testing<n><sg>$
>
> Could this just be included by default? I mean, are there any cases in
> which we would not want to skip a soft hyphen?

Having an icx file on the command line is nice for developers and for
people who use lt-proc for non-Apertium things, but it would require
changing the modes files of every pair that wants to take advantage of
it. I think a hardcoded ignore-list in lttoolbox would be more helpful
to more users. Are there other use cases than soft hyphens? Or cases
where we want to _not_ ignore the soft hyphen?

(Tino Didriksen noted some other possibly skippable stuff:
http://www.fileformat.info/info/unicode/category/Cf/list.htm )
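
To make the two options concrete, here is a minimal Python sketch (not
lttoolbox code; the names IGNORED_CHARS, strip_ignored and
strip_format_chars are made up for illustration). The first function
mimics a hardcoded ignore-list containing only the soft hyphen; the
second drops everything in Unicode category Cf, which is the list Tino
pointed to.

```python
import unicodedata

# Hypothetical hardcoded ignore-list; only the soft hyphen (U+00AD)
# comes from this thread.
IGNORED_CHARS = frozenset({"\u00ad"})


def strip_ignored(text, ignored=IGNORED_CHARS):
    """Drop every character listed in `ignored` before analysis."""
    return "".join(ch for ch in text if ch not in ignored)


def strip_format_chars(text):
    """Broader variant: drop every character in Unicode category Cf."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


if __name__ == "__main__":
    # Soft hyphen between "test" and "ing", as in the example above.
    print(strip_ignored("test\u00ading"))
    # Same word with a trailing zero-width space, which is also Cf.
    print(strip_format_chars("test\u00ading\u200b"))
```

One thing the sketch shows: category Cf also contains the zero-width
joiner and non-joiner (U+200D, U+200C), which are meaningful in some
scripts, so skipping all of Cf by default may be too aggressive for
some pairs.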




-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
