Re: matchText and accented characters

2007-10-17 Thread Chris Sheffield

Thanks, Ken. Using the hex equivalents is an interesting suggestion.  
I may look into that further.


As for replacing the accented characters with their non-accented  
equivalents, that is also something I've done in the past, but the  
problem here is that this is Mac/PC cross platform, so it's quite a  
few extra lines of code.


So I decided to simply try the offset function, with wholeMatches set  
to true (although I can't really determine if wholeMatches affects  
offset or not), and that seems to be working fine for me. Still  
testing it out to make sure, but so far so good.
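
For reference, the offset approach can be sketched roughly as below. This is only an illustration: the function name findPlain and its parameters are made up, and it assumes the search text and the field text use the same platform encoding. As far as I can tell from the documentation, wholeMatches applies to lineOffset, wordOffset and itemOffset rather than to offset itself, which would explain why setting it seems to make no difference here.

```livecode
-- Hypothetical helper: plain substring search, no regex engine involved.
-- Returns the matched text taken from pText, or empty if there is no match.
function findPlain pNeedle, pText
   local tStart
   -- offset returns the position of the first char of the match, or 0
   put offset(pNeedle, pText) into tStart
   if tStart > 0 then
      return char tStart to (tStart + length(pNeedle) - 1) of pText
   end if
   return empty
end findPlain
```

Usage would be something like: put findPlain("fiancé", fld 1)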


Thanks again for the suggestions.


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your  
subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution



Re: matchText and accented characters

2007-10-16 Thread Chris Sheffield

Sorry, I'm using matchChunk, not matchText. But maybe the solution is  
the same?


On Oct 16, 2007, at 11:49 AM, Chris Sheffield wrote:

The matchText function seems to be failing when searching for  
accented characters like á, é, í, ó, or ú. I'm not really up on my  
regex. Is there something special I need to do to make these  
characters work? For example, one search I'm performing is for the  
word fiancé.


Thanks,
Chris


--
Chris Sheffield
Read Naturally, Inc.
www.readnaturally.com
www.oneminutereader.com

Watch reading achievements rise with Read Naturally's school-to- 
home program, One Minute Reader. Make reading fun straight from  
your classroom right to their home!



Re: matchText and accented characters

2007-10-16 Thread Andres Martinez


Hello Chris

I think you need to check on the unicode setting.

Use the following line before your search...

set the useUnicode to true

Regards,
Andres Martinez
www.baKno.com



Re: matchText and accented characters

2007-10-16 Thread Chris Sheffield

Thanks, Andres. But that didn't seem to fix the problem. That  
property, according to the docs, only seems to apply to the numToChar  
and charToNum functions. I did try it just to make sure.



Re: matchText and accented characters

2007-10-16 Thread Ken Ray

On Tue, 16 Oct 2007 12:18:54 -0600, Chris Sheffield wrote:

 Thanks, Andres. But that didn't seem to fix the problem. That 
 property, according to the docs, only seems to apply to the numToChar 
 and charToNum functions. I did try it just to make sure.

The issue is that PCRE (which is the lib that Rev uses) *optionally* 
supports locales, so I don't know if any locales were compiled into the 
code that Rev uses. If you knew what you were looking for, you could 
replace the accented characters with their hex equivalents and you'd 
get a match:

  put matchChunk(fld 1,".*(fianc\x8E).*",tStart,tEnd)

in this case \x8E means use hex code 8E, which is decimal 142, the code
for é in the Mac Roman character set (at least on my Mac). To determine
this, I ran this code:

  put baseConvert(charToNum("é"),10,16)

which gave me 8E. So if you know specifically the characters to 
match, you can use this.
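
Since the code for é differs by platform (8E in Mac Roman, E9 in the Windows Latin-1 code page), one option is to build the escape at runtime instead of hard-coding it. The following is a minimal sketch under that assumption; hexEscape is a made-up name, and it relies on charToNum returning the character's code in the platform's native encoding:

```livecode
-- Hypothetical helper: returns a PCRE \xHH escape for a single character,
-- using whatever code the current platform assigns to it.
function hexEscape pChar
   local tHex
   put baseConvert(charToNum(pChar), 10, 16) into tHex
   -- pad to two digits so a following pattern char is not read as part of the escape
   if length(tHex) < 2 then put "0" before tHex
   return "\x" & tHex
end hexEscape
```

It could then be used along these lines: put matchChunk(fld 1, ".*(fianc" & hexEscape("é") & ").*", tStart, tEnd)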

On the other hand, if you have a big chunk of text and you don't know 
if there are accented chars or not, I would personally run it the 
brute force way: 

1) put a copy of the text into another variable
2) replace the accented chars with their non-accented counterparts - a 
dozen or so lines like:
   - replace "é" with "e" in myVar
   - replace "ó" with "o" in myVar
   - etc.
3) run your 'matchChunk' on the second clean variable using 
non-accented text (look for fiance and not fiancé)
4) if you get a hit, use the startChar/endChar variables from the 
'matchChunk' to extract the text from the *first* variable (the one 
with the accented text)
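
The four steps above might look something like this in script form. Treat it as a sketch only: the function name matchIgnoringAccents is invented for illustration, it covers just five lowercase vowels, and it assumes pPattern contains exactly one parenthesized group so that matchChunk fills in one start/end pair. Because each replacement swaps one character for one character, positions in the clean copy line up with positions in the original.

```livecode
-- Hypothetical helper: match against an accent-free copy, then pull the
-- matching chars out of the original (accented) text.
function matchIgnoringAccents pText, pPattern
   local tClean, tStart, tEnd
   -- 1) work on a copy so pText keeps its accents
   put pText into tClean
   -- 2) one-for-one replacements keep the character positions aligned
   replace "á" with "a" in tClean
   replace "é" with "e" in tClean
   replace "í" with "i" in tClean
   replace "ó" with "o" in tClean
   replace "ú" with "u" in tClean
   -- 3) run the regex on the clean copy using non-accented text
   if matchChunk(tClean, pPattern, tStart, tEnd) then
      -- 4) use the positions to extract from the original text
      return char tStart to tEnd of pText
   end if
   return empty
end matchIgnoringAccents
```

For example: put matchIgnoringAccents(fld 1, "(fiance)")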

Just my 2 cents,

Ken Ray
Sons of Thunder Software, Inc.
Email: [EMAIL PROTECTED]
Web Site: http://www.sonsothunder.com/
