DO NOT REPLY [Bug 9035] New: - big Latitude Longitude RE causes IndexOutOfBoundsException
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9035. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9035

big Latitude Longitude RE causes IndexOutOfBoundsException

           Summary: big Latitude Longitude RE causes IndexOutOfBoundsException
           Product: Regexp
           Version: unspecified
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Other
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

I have two fairly big REs dealing with latitude and longitude. When I use them separately, there are no problems. However, when I combine the two REs so that I can pass one latitude-longitude string to the result, it bombs out with an exception (detailed below).

Here is the test program. Refer to the example run for usage:

import java.io.*;
import java.util.*;
import org.apache.regexp.*;

public class LatLonREBug
{
    private static final String LATITUDE_RE_STRING =
        "-?(([0-8]?[0-9]((\\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]";

    private static final String LONGITUDE_RE_STRING =
        "-?[0-9]?[0-9])|(1[0-7][0-9]))((\\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]";

    public static final String LATITUDE_LONGITUDE_RE_STRING =
        "^" + LATITUDE_RE_STRING + LONGITUDE_RE_STRING + "$";

    public static void main(String[] args) throws Throwable
    {
        RE latlonRE = new RE(LATITUDE_LONGITUDE_RE_STRING);
        System.out.println("LATITUDE_LONGITUDE_RE_STRING: " + LATITUDE_LONGITUDE_RE_STRING);

        RE latRE = new RE("^" + LATITUDE_RE_STRING + "$");
        System.out.println("LATITUDE_RE_STRING: " + LATITUDE_RE_STRING);

        RE lonRE = new RE("^" + LONGITUDE_RE_STRING + "$");
        System.out.println("LONGITUDE_RE_STRING: " + LONGITUDE_RE_STRING);

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String line = br.readLine();
        while (line != null && !line.equals("quit") && !line.equals("exit"))
        {
            StringTokenizer st = new StringTokenizer(line);
            int tokens = st.countTokens();
            if (tokens > 1)
            {
                String command = st.nextToken();
                if (command.equalsIgnoreCase("lat"))
                {
                    String lat = st.nextToken();
                    latRE.match(lat);
                    System.out.println(lat + " is a properly formatted latitude");
                }
                else if (command.equalsIgnoreCase("lon"))
                {
                    String lon = st.nextToken();
                    lonRE.match(lon);
                    System.out.println(lon + " is a properly formatted longitude");
                }
                else if (command.equalsIgnoreCase("latlon"))
                {
                    String latlon = st.nextToken();
                    latlonRE.match(latlon);
                    System.out.println(latlon + " is a properly formatted lat-lon");
                }
                else
                {
                    System.out.println("unknown command: " + command);
                }
            }
            else
            {
                System.out.println("invalid line: " + line);
            }
            line = br.readLine();
        }
    }
}

Here is an example run of the test case. As you will see, when just doing latitude or longitude, the REs match as expected. But when I do a 'latlon' string, it pukes...

[mnewcomb@localhost sandbox]$ java -classpath /usr/local/regexp/jakarta-regexp-1.2.jar:. LatLonREBug
LATITUDE_LONGITUDE_RE_STRING: ^-?(([0-8]?[0-9]((\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]-?[0-9]?[0-9])|(1[0-7][0-9]))((\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]$
LATITUDE_RE_STRING: -?(([0-8]?[0-9]((\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]
LONGITUDE_RE_STRING: -?[0-9]?[0-9])|(1[0-7][0-9]))((\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]
lat 55N
55N is a properly formatted latitude
lat 55.454N
55.454N is a properly formatted latitude
lat 5545N
5545N is a properly formatted latitude
lon 123E
123E is a properly formatted longitude
lon 5E
5E is a properly formatted longitude
lon 123.444E
123.444E is a properly formatted longitude
lon 1784532W
1784532W is a properly formatted longitude
latlon 55N44E
55N44E is a properly formatted lat-lon
latlon 55N44.33E
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at org.apache.regexp.RE.getParenEnd(RE.java:724)
        at org.apache.regexp.RE.matchNodes(RE.java:942)
        at org.apache.regexp.RE.matchNodes(RE.java:933)
        at org.apache.regexp.RE.matchNodes(RE.java:1376)
        at org.apache.regexp.RE.matchNodes(RE.java:1376)
        at org.apache.regexp.RE.matchNodes(RE.java:910)
        at org.apache.regexp.RE.matchNodes(RE.java:1376)
        at org.apache.regexp.RE.matchNodes(RE.java:910)
        at org.apache.regexp.RE.matchNodes(RE.java:1376)
        at org.apache.regexp.RE.matchNodes(RE.java:933)
Re: - IndexOutOfBoundsException: clarification
Actually, you can write much simpler REs to reproduce this :-))

I had wanted to file a bug report (along with a few others): RegExp does not support more than 16 parenthesized sub-expressions. As soon as you have more than 16 '(...)', you get ArrayIndexOOBExceptions :-( (Actually, I saw that while taking a look at the sources and then confirmed the problem by trying it ;)

That's why your two expressions work separately, but not combined.

I guess I'll write a fix for that, but considering I didn't have time to file a bug report...

A workaround in this case (just as a temporary help for Michael): your RE has two clearly defined parts. You can probably use one more general expression to find potential matches and then check the two parts separately. Not nice... Fixing the problem may actually be faster :-))

I had an estimate of 1-3 hours for fixing the code, but I'd need to find out something about the process [of submitting code] first, and that would probably take longer...

Cheers,
Holger
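The workaround suggested in this reply (one general expression to split the input, then a separate check of each part) could look roughly like the sketch below. It is only an illustration: it uses java.util.regex as a stand-in so that it runs without the jakarta-regexp jar, and the class name, the GENERAL pattern, and the simplified LAT and LON patterns are all hypothetical, not the expressions from the bug report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatLonSplitCheck {
    // General expression (assumption): everything up to and including the
    // first hemisphere letter n/N/s/S is the latitude part, the rest is
    // the longitude part. Only two groups are needed here.
    private static final Pattern GENERAL =
        Pattern.compile("^([^nNsS]*[nNsS])(.*)$");

    // Simplified, hypothetical stand-ins for the real latitude and
    // longitude expressions; each stays well below 16 groups.
    private static final Pattern LAT =
        Pattern.compile("-?[0-9]{1,2}(\\.[0-9]+)?[nNsS]");
    private static final Pattern LON =
        Pattern.compile("-?[0-9]{1,3}(\\.[0-9]+)?[eEwW]");

    /** Returns true if s splits into a valid latitude followed by a valid longitude. */
    public static boolean isLatLon(String s) {
        Matcher m = GENERAL.matcher(s);
        if (!m.matches()) {
            return false; // no hemisphere letter, so no latitude part
        }
        // Check the two parts separately instead of with one combined RE.
        return LAT.matcher(m.group(1)).matches()
            && LON.matcher(m.group(2)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLatLon("55N44.33E")); // prints true
        System.out.println(isLatLon("55N44.33"));  // prints false
    }
}
```

Because each part is matched by its own compiled expression, no single expression has to carry the parentheses of both.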
DO NOT REPLY [Bug 9035] - big Latitude Longitude RE causes IndexOutOfBoundsException
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9035

big Latitude Longitude RE causes IndexOutOfBoundsException

[EMAIL PROTECTED] changed:

           What    |Removed |Added
             Status|NEW     |RESOLVED
         Resolution|        |INVALID

--- Additional Comments From [EMAIL PROTECTED] 2002-05-13 16:43 ---

This is a problem caused by too many parentheses. Please apply the patch supplied in bug # 8467 by Vadim Gritsenko (thanks Vadim!). There is no reason for this to not be applied as is. I did it on a fresh check-out from cvs and my code is working perfectly.

Is there anyone with write-access for REGEX available to apply Vadim's patch?

Michael
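Until such a patch is applied, a quick way to see whether an expression is at risk is to count its opening parentheses before compiling it. The sketch below is a rough estimate only: GroupCounter, countGroups, and the LIMIT constant are hypothetical names, the value 16 is taken from the discussion in this thread rather than from the library's documentation, and the scanner handles only simple escapes and character classes.

```java
public class GroupCounter {
    // Limit as reported in this thread (assumption, not a documented constant).
    public static final int LIMIT = 16;

    /**
     * Counts '(' characters that are neither escaped with a backslash
     * nor inside a [...] character class.
     */
    public static int countGroups(String pattern) {
        int count = 0;
        boolean inClass = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++;              // skip the escaped character
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
            } else if (c == '[') {
                inClass = true;
            } else if (c == '(') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Latitude expression as given in the bug report.
        String lat =
            "-?(([0-8]?[0-9]((\\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]";
        int groups = countGroups(lat);
        System.out.println(groups);          // prints 10
        System.out.println(groups > LIMIT);  // prints false
    }
}
```

The latitude expression alone has 10 groups and stays under the limit, which fits the observation that each expression works on its own.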