But how can it get a value larger than 255? Even if a value does not fit in one byte, it should be interpreted as two consecutive characters, not one. The problem at hand requires speed. So what can I do to make it either ignore Unicode files altogether or ignore the higher bits (this should work correctly for UTF-8)?
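For what it's worth, here is a minimal sketch of the two workarounds I have in mind, using only the JDK and nothing from ORO. A Java char is 16 bits wide, so a Reader that decodes with a charset other than ISO-8859-1 can hand back values above 255 even though the file is made of bytes. The names EightBitInput, latin1Reader and LowByteReader are my own, hypothetical names, and I am assuming the resulting chars can then be passed to the matcher; I have not verified that against AwkMatcher.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ORO: two ways to keep every char in 0-255.
final class EightBitInput {

    // Option 1: decode as ISO-8859-1, which maps each byte to exactly one
    // char in the range 0-255. A multi-byte UTF-8 sequence then shows up as
    // several consecutive chars rather than one large one.
    static Reader latin1Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.ISO_8859_1);
    }

    // Option 2: wrap an existing Reader and drop the high byte of each char.
    // This is lossy: U+20AC becomes 0xAC, for example.
    static final class LowByteReader extends FilterReader {
        LowByteReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c < 0 ? c : (c & 0xFF); // pass the end-of-stream marker through
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) {
                buf[i] &= 0xFF; // keep only the low 8 bits
            }
            return n;
        }
    }
}
```

Option 1 looks like the better fit for binary files such as JARs, since nothing is thrown away; option 2 only makes sense when the input is already a Reader I do not control.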
----- Original Message -----
From: "Daniel F. Savarese" <[EMAIL PROTECTED]>
To: "ORO Developers List" <[EMAIL PROTECTED]>
Sent: Monday, January 21, 2002 1:06 PM
Subject: Re: Qusetion

>
> In message <005e01c1a242$f74f5fc0$[EMAIL PROTECTED]>, "Hardeep Singh" writes:
> >I have had this problem for a long time now:
> ...
> >However, when I try to use this to search into a binary file (esp. a JAR
> >file), it gives me
> >
> >Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
> at org.apache.oro.text.awk.AwkMatcher._search(AwkMatcher.java:717)
>
> The awk package and AwkMatcher are implemented to only work with input
> containing characters with 8-bit values (0-255). This is because it is
> a straight-up DFA implementation, which results in fast matches (no
> backtracking) but extremely large state transition tables if the range
> of input is expanded beyond 8 bits. This will be documented more
> obviously in the future. At any rate, the reason you're getting the
> exception is because a char value greater than 255 is being encountered,
> for which no state transition is defined. For full Unicode, use the
> Perl or glob matchers.
>
> daniel
>
>
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
