Re: matchText and accented characters
Thanks, Ken. Using the hex equivalents is an interesting suggestion, and I may look into it further. Replacing the accented characters with their non-accented equivalents is something I've done in the past, but this project is cross-platform (Mac/PC), so it would take quite a few extra lines of code.

So I decided to simply try the offset function with wholeMatches set to true (although I can't really determine whether wholeMatches affects offset or not), and that seems to be working fine for me. I'm still testing it to make sure, but so far so good.

Thanks again for the suggestions.

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
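For reference, here is a minimal sketch of the offset approach in Revolution script. The handler name findWord and its variable names are hypothetical; only the offset function comes from the message above. It does a plain substring search rather than a regex match, and returns the same start/end character positions that matchChunk would otherwise fill in.

```livecode
-- Hypothetical helper, not from the thread: locate pWord in pText with offset()
-- and return "start,end" character positions, or empty if there is no hit.
function findWord pWord, pText
   local tStart
   put offset(pWord, pText) into tStart
   if tStart is 0 then return empty
   -- offset() reports where the hit begins; the end follows from the word's length
   return tStart & comma & (tStart + length(pWord) - 1)
end findWord
```

It could be called as: put findWord("fiancé", fld 1)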
Re: matchText and accented characters
Sorry, I'm using matchChunk, not matchText. But maybe the solution is the same?

On Oct 16, 2007, at 11:49 AM, Chris Sheffield wrote:

> The matchText function seems to be failing when searching for accented characters like á, é, í, ó, or ú. I'm not really up on my regex. Is there something special I need to do to make these characters work? For example, one search I'm performing is for the word fiancé.
>
> Thanks,
> Chris
>
> --
> Chris Sheffield
> Read Naturally, Inc.
> www.readnaturally.com
> www.oneminutereader.com
Re: matchText and accented characters
Hello Chris,

I think you need to check the Unicode setting. Use the following line before your search:

    set the useUnicode to true

Regards,
Andres Martinez
www.baKno.com
Re: matchText and accented characters
Thanks, Andres, but that didn't seem to fix the problem. According to the docs, that property only seems to apply to the numToChar and charToNum functions. I did try it just to make sure.
Re: matchText and accented characters
On Tue, 16 Oct 2007 12:18:54 -0600, Chris Sheffield wrote:

> Thanks, Andres. But that didn't seem to fix the problem. That property, according to the docs, only seems to apply to the numToChar and charToNum functions. I did try it just to make sure.

The issue is that PCRE (which is the library that Rev uses) *optionally* supports locales, so I don't know whether any locales were compiled into the code that Rev uses.

If you knew what you were looking for, you could replace the accented characters with their hex equivalents and you'd get a match:

    put matchChunk(fld 1,".*(fianc\x8E).*",tStart,tEnd)

In this case \x8E means "use hex code 8E", which is character code 142, which is é (at least on my Mac). To determine this, I ran this code:

    put baseConvert(charToNum("é"),10,16)

which gave me 8E. So if you know specifically which characters to match, you can use this.

On the other hand, if you have a big chunk of text and you don't know whether it contains accented chars, I would personally run it the brute-force way:

1) Put a copy of the text into another variable.
2) Replace the accented chars with their non-accented counterparts, using a dozen or so lines like:
   - replace "é" with "e" in myVar
   - replace "ó" with "o" in myVar
   - etc.
3) Run your 'matchChunk' on the second, clean variable using non-accented text (look for "fiance" and not "fiancé").
4) If you get a hit, use the startChar/endChar variables from the 'matchChunk' to extract the text from the *first* variable (the one with the accented text).

Just my 2 cents,

Ken Ray
Sons of Thunder Software, Inc.
Web Site: http://www.sonsothunder.com/
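The four brute-force steps above can be sketched as a single handler. This is only an illustration under stated assumptions: the handler name matchIgnoringAccents and its variable names are hypothetical, the list of replacements is deliberately incomplete, and the pattern passed in must contain one parenthesized group so that matchChunk has something to report positions for. It relies on each replacement swapping exactly one character for one character, so that positions in the clean copy line up with positions in the original.

```livecode
-- Hypothetical helper, not from the thread: match against an accent-free copy,
-- then extract the hit from the original text using the same character positions.
function matchIgnoringAccents pText, pPattern
   local tClean, tStart, tEnd
   -- Step 1: work on a copy so the original text is untouched
   put pText into tClean
   -- Step 2: one-for-one replacements keep character positions aligned
   replace "á" with "a" in tClean
   replace "é" with "e" in tClean
   replace "í" with "i" in tClean
   replace "ó" with "o" in tClean
   replace "ú" with "u" in tClean
   -- Step 3: search the clean copy with a non-accented pattern
   if matchChunk(tClean, pPattern, tStart, tEnd) then
      -- Step 4: pull the accented text out of the original
      return char tStart to tEnd of pText
   end if
   return empty
end matchIgnoringAccents
```

A call such as put matchIgnoringAccents(fld 1, "(fiance)") is meant to hand back the accented spelling as it appears in the field. Keeping the replacements in one handler also keeps the extra lines in a single place, whatever the platform.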