--- Bill Stephenson <[EMAIL PROTECTED]> wrote:
> You need to get a book on regex's.
I know the solution lies in regexes; the problem is
that I can't quite figure out a generic enough way of
doing it. The problem is for a module, so the list
of valid segments is user-defined. I guess I could do
something like:
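# quotemeta() escapes any regex metacharacters in the user-defined segments
my $segs = '(' . join('|', map { quotemeta($_) } @segs) . ')';
my $first_seg;
$first_seg = $1 if $string =~ s/^$segs//;   # $1 is only valid if the match succeeded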
But I'd have to sort @segs somehow so that the longest
segments come first, and since alphabets can have many,
many different segments, I worry about memory issues.
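A minimal sketch of the longest-first idea, assuming the segments are the keys of a hash. The %alphabet contents and the segment_string() name are hypothetical, chosen only for illustration. Perl tries the branches of an alternation left to right, so sorting the keys by descending length before joining them makes each match the longest segment available at that position. For an alphabet of a few dozen segments, the sort and the compiled pattern should cost very little memory, and both only need to be built once per alphabet.

```perl
use strict;
use warnings;

# Hypothetical alphabet; in a real module the values would hold feature data.
my %alphabet;
$alphabet{$_} = 1 for qw(a e i o u d t th s sh c ch tch);

# Longest segments first ("tch" before "ch" before "c");
# ties are broken alphabetically so the pattern is deterministic.
my @by_length = sort { length($b) <=> length($a) or $a cmp $b } keys %alphabet;

# Escape metacharacters and compile the alternation once.
my $alt    = join '|', map { quotemeta($_) } @by_length;
my $seg_re = qr/$alt/;

# Split a string into segments, always taking the longest match at the
# current position. Returns undef if some position matches no segment.
sub segment_string {
    my ($string) = @_;
    my @segments;
    pos($string) = 0;
    while (pos($string) < length $string) {
        if ($string =~ /\G($seg_re)/gc) {
            push @segments, $1;
        }
        else {
            return;    # unknown character at pos($string)
        }
    }
    return \@segments;
}

my $segs = segment_string('tchetsh');
print join(' ', @$segs), "\n";    # prints "tch e t sh"
```

The /gc flags keep pos() in place between matches, and \G anchors each match at the spot where the previous one ended.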
--- Bill Stephenson <[EMAIL PROTECTED]> wrote:
> Perl.com has the best available, "Mastering
> Regular Expressions" is what you want.
>
> Sounds like a formidable task though. For some
> additional help with your regex you can play
> with a tool posted on the "perlhelp.com" web
> site. Go to "Resources" and look for the
> "Regular Expression Explanation Generator".
Thanks, I'll have to check those out sometime.
--- Rick Measham <[EMAIL PROTECTED]> wrote:
> Wren, when you say 'segments' it appears you
> mean phonemes or phonetics.
Yeah, I do mean phonemes (or something like it). The
module is language independent, but I'll check those
modules out.
--- Chris Devers <[EMAIL PROTECTED]> wrote:
> Your definition of "segment" here is vague; is
> it safe to ignore that and just accept that a
> canonical list of each language's 'segments' is
> a static thing that is already stored as hash
> keys?
By "segment" I mean the smallest character or sequence
of characters that has a regular pronunciation. But
yes, it's safe to ignore that and assume there's a
canonical list of "segments" already in memory.
I am indeed associating the segments with values,
hence storing them as keys in a hash. Also, by storing
them that way, if I'm trying to find the values
associated with a given segment, I can look them up
directly with $all_segments{$segment_in_question} rather
than looping with for or foreach over an array of
an estimated 15 to 50 items.
The idea of a loop that starts from the longest element
sounds good; I'll see if I can get it to work.
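One way to read the "loop from the longest element" suggestion, sketched here without any regex: find the length of the longest segment once, then at each position try substrings from that length down to 1 and keep the first one that exists as a hash key. The %alphabet contents and the segment_by_lookup() name are hypothetical; List::Util ships with core Perl.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical alphabet; in a real module each value would hold the
# segment's feature data rather than 1.
my %alphabet;
$alphabet{$_} = 1 for qw(a e i o u d t th s sh c ch tch);

# Length of the longest segment, computed once per alphabet.
my $max_len = max map { length($_) } keys %alphabet;

# Greedy tokenizer: at each position try the longest possible substring
# first, shrinking it until it is a key of %$alpha.
sub segment_by_lookup {
    my ($string, $alpha, $longest) = @_;
    my @segments;
    my $pos = 0;
    while ($pos < length $string) {
        my $found;
        for (my $len = $longest; $len >= 1; $len--) {
            next if $pos + $len > length $string;    # would run past the end
            my $candidate = substr $string, $pos, $len;
            if (exists $alpha->{$candidate}) {
                $found = $candidate;
                last;
            }
        }
        return unless defined $found;    # no segment matches here
        push @segments, $found;
        $pos += length $found;
    }
    return \@segments;
}

my $segs = segment_by_lookup('shochat', \%alphabet, $max_len);
print join(' ', @$segs), "\n";    # prints "sh o ch a t"
```

Each position costs at most $max_len hash lookups, so no sorted copy of the segment list is needed at all.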
For those who wonder what on earth I'm up to... it's
an OO module for autosegmental phonology. In short, you
feed the object a string and an "alphabet" which maps
segments to values ("d" has +voicing, +dental,
-vocalic, etc) and it creates an array of hashes (or
hash of arrays) where the index is the sequence number
of the segment in the string, and where the key is the
name of the "tier" (voicing, dental, vocalic, etc).
Then there'll be ways to muck around with the object
à la phonetic rules. Then there'll be a method to tie
all of the tiers back together into a single string
(per the alphabet) and spit it back out.
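A rough sketch of the tier structure described above, choosing the array-of-hashes form: index = segment position, key = tier name. Everything here is hypothetical and not the module's actual design: the feature values in %alphabet, the helper names to_tiers(), from_tiers() and feature_key(), and the example devoicing rule. Reassembly assumes each complete feature bundle belongs to exactly one segment, so a reverse hash keyed on the bundle can map it back.

```perl
use strict;
use warnings;

# Hypothetical alphabet: segment => { tier => value } (values illustrative only).
my %alphabet = (
    d => { voicing => '+', dental => '+', vocalic => '-' },
    t => { voicing => '-', dental => '+', vocalic => '-' },
    a => { voicing => '+', dental => '-', vocalic => '+' },
);
my @tiers = qw(voicing dental vocalic);

# Turn a feature bundle into a stable string key, e.g. "voicing=+;dental=+;vocalic=-".
sub feature_key {
    my ($features) = @_;
    return join ';', map { $_ . '=' . $features->{$_} } @tiers;
}

# Reverse map from feature bundle to segment, used when reassembling.
my %by_key;
$by_key{ feature_key($alphabet{$_}) } = $_ for keys %alphabet;

# Array of hashes: one hash of tier values per segment position.
sub to_tiers {
    my @segments = @_;
    my @word;
    for my $seg (@segments) {
        push @word, { %{ $alphabet{$seg} } };    # copy so rules don't alter %alphabet
    }
    return \@word;
}

# Tie the tiers back together into a string via the alphabet.
sub from_tiers {
    my ($word) = @_;
    my $out = '';
    for my $bundle (@$word) {
        my $seg = $by_key{ feature_key($bundle) };
        return unless defined $seg;    # feature combination has no segment
        $out .= $seg;
    }
    return $out;
}

# Example rule (illustrative): devoice the final segment.
my $word = to_tiers(qw(d a d));
$word->[-1]{voicing} = '-';
print from_tiers($word), "\n";    # prints "dat"
```

Copying each bundle in to_tiers() means a rule can rewrite one position's tiers without changing the shared alphabet.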