Bart Lateur wrote:
> The problem is this: given two versions of a word, one without, and one
> with hyphenation, determine which hyphens are soft hyphens (optional
> breakpoints), and which ones are hard hyphens? For example:
>
> hypo-allergeen hy-po-al-ler-geen
>
> (If you wonder about the hyphenation rules: Dutch)
>
> Here, all hyphens are soft hyphens, except the one between the "o" and
> the "a", which is required (I guess. I'm not 100% sure about the spelling,
> people seem to disagree on that one), or at least, let's suppose so.
>
> So, a short and sweet snippet that figures this out, please? The result
> may be whatever form you like.
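
#!/usr/bin/perl -w
use strict;

# $hard: word with only its required hyphens; $soft: fully hyphenated word
my ($hard, $soft) = @ARGV;
my $show = "   $soft   ";    # three spaces of padding center the 7-char window
for (my $i = 0; length $soft; $i++) {
    if ($hard !~ m/^-/ and $soft =~ m/^-/) {
        # hyphen only in $soft: a soft hyphen; print its context and offset
        print substr($show, $i, 7), " $i\n";
    }
    else {
        $hard =~ s/^.//;     # same character in both words: consume it
    }
    $soft =~ s/^.//;
}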
__END__
[xenon ~/d/Temporary Stuff]% perl hyphen hypo-allergeen hy-po-al-ler-geen
 hy-po- 2
-al-ler 8
ler-gee 12
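
If you would rather have the offsets back as a list than printed, the same walk can be wrapped in a sub. This is only a sketch: the name soft_hyphen_offsets is made up for illustration, and it assumes the two words are identical once every hyphen is removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not from the script above).
# Returns the 0-based offsets, within the fully hyphenated word, of the
# hyphens that are soft. Assumes both words match once hyphens are removed.
sub soft_hyphen_offsets {
    my ($hard, $soft) = @_;
    my @h = split //, $hard;
    my @s = split //, $soft;
    my @offsets;
    my $j = 0;                       # read position in @h
    for my $i (0 .. $#s) {
        if ($s[$i] eq '-' and ($j > $#h or $h[$j] ne '-')) {
            push @offsets, $i;       # hyphen only in $soft: soft hyphen
        }
        else {
            $j++;                    # same character in both: advance @h too
        }
    }
    return @offsets;
}

print join(' ', soft_hyphen_offsets('hypo-allergeen', 'hy-po-al-ler-geen')), "\n";
```

For the example word this prints 2 8 12, the same offsets as the run above. Indexing into arrays instead of chopping the strings with s/^.// leaves the arguments untouched, which may or may not matter to you.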
--
Kevin Reid