Thanks, I changed the regexp to
"(^\\S*)\\s(\\S*)\\s(\\S*)\\s\\[([^\\]]+)\\]\\s\"(\\S*)\\s(\\S*)\\s([^\\s\"]*)\"\\s(\\S*)\\s(\\S*)\\s\"([^\\s\"]*)\"\\s\"([^\"]*)\""
which has got me up to about 450 lines a second, but that is still slow. I am still using readLine(), but with readLine() and a string tokenizer I can get 10x this speed, which seems to me to indicate that readLine() is not really the bottleneck and that the regex still is. Or is regex not really designed for this sort of processing, and would I be better off just doing a simple hard-coded parse of the string?

Rob

-----Original Message-----
From: Daniel F. Savarese [mailto:[EMAIL PROTECTED]]
Sent: Saturday, April 27, 2002 10:37 PM
To: ORO Users List
Subject: Re: Slow processing

In message <[EMAIL PROTECTED]>, "Robert Edgar" writes:
>Hi there,
>I am trying to load and parse a web server log file (about 3mb on average)
>with a regex as follows
>"(.*)\\s(.*)\\s(.*)\\s\\[([^\\]]+)\\]\\s\"(.*)\\s(.*)\\s(.*)\"\\s(.*)\\s(.*)
>\\s\"(.*)\"\\s\"(.*)\"\\s\"(.*)\""
>
>problem is it is taking forever, I am getting only about 15 lines a second
>
>code is something like
>
>while((logEntry = bufferedreader.readLine()) != null){
>    if (matcher.contains(logEntry, pattern)) {
>        MatchResult result=matcher.getMatch();
>    }
>}

Half of the problem is calling readLine(). You're converting char arrays
into strings and then back again when you search for a match. If the
files are about 3MB, you'll do better by reading the entire file into a
char array first and then searching for matches.

The other half of the problem may be the regular expression, which is
causing a lot of backtracking. Try replacing .* with \S* and [^\s"]*
where appropriate.

daniel

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
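
For anyone wanting to compare the two approaches discussed in this thread, here is a minimal sketch that parses one log line both with the revised regex and with a simple hard-coded scan. Several things in it are assumptions and not from the thread: it uses java.util.regex rather than the ORO classes, the sample line is made up, and the names LogLineParser, parseWithRegex, parseByHand, upTo and expect are hypothetical. The timing loop is only a rough indication, since it ignores JIT warm-up.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogLineParser {
    // The revised pattern from this thread, compiled with java.util.regex
    // (assumption: the thread itself uses ORO's Perl5 classes instead).
    static final Pattern LOG_PATTERN = Pattern.compile(
        "(^\\S*)\\s(\\S*)\\s(\\S*)\\s\\[([^\\]]+)\\]\\s"
        + "\"(\\S*)\\s(\\S*)\\s([^\\s\"]*)\"\\s(\\S*)\\s(\\S*)\\s"
        + "\"([^\\s\"]*)\"\\s\"([^\"]*)\"");

    // Made-up sample line in the layout the pattern expects.
    static final String SAMPLE =
        "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
        + "\"GET /index.html HTTP/1.0\" 200 2326 "
        + "\"http://www.example.com/start.html\" "
        + "\"Mozilla/4.08 [en] (Win98; I ;Nav)\"";

    // Returns the 11 captured fields, or null if the line does not match.
    static String[] parseWithRegex(String line) {
        Matcher m = LOG_PATTERN.matcher(line);
        if (!m.find()) return null;
        String[] f = new String[m.groupCount()];
        for (int i = 0; i < f.length; i++) f[i] = m.group(i + 1);
        return f;
    }

    // Hard-coded left-to-right scan: no backtracking, one pass over the line.
    // Returns the same 11 fields, or null if a delimiter is missing.
    static String[] parseByHand(String line) {
        String[] f = new String[11];
        int[] pos = {0};                      // cursor shared with the helpers
        try {
            f[0] = upTo(line, pos, ' ');      // host
            f[1] = upTo(line, pos, ' ');      // ident
            f[2] = upTo(line, pos, ' ');      // user
            expect(line, pos, '[');
            f[3] = upTo(line, pos, ']');      // timestamp
            expect(line, pos, ' ');
            expect(line, pos, '"');
            f[4] = upTo(line, pos, ' ');      // method
            f[5] = upTo(line, pos, ' ');      // path
            f[6] = upTo(line, pos, '"');      // protocol
            expect(line, pos, ' ');
            f[7] = upTo(line, pos, ' ');      // status
            f[8] = upTo(line, pos, ' ');      // bytes
            expect(line, pos, '"');
            f[9] = upTo(line, pos, '"');      // referrer
            expect(line, pos, ' ');
            expect(line, pos, '"');
            f[10] = upTo(line, pos, '"');     // user agent
            return f;
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Returns the text from the cursor up to delim and moves the cursor past it.
    private static String upTo(String s, int[] pos, char delim) {
        int end = s.indexOf(delim, pos[0]);
        if (end < 0) throw new IllegalArgumentException("missing delimiter");
        String field = s.substring(pos[0], end);
        pos[0] = end + 1;
        return field;
    }

    // Checks that the character at the cursor is c and steps over it.
    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("unexpected character");
        }
        pos[0]++;
    }

    public static void main(String[] args) {
        String[] byRegex = parseWithRegex(SAMPLE);
        String[] byHand = parseByHand(SAMPLE);
        System.out.println("same fields: " + Arrays.equals(byRegex, byHand));

        int n = 200_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) parseWithRegex(SAMPLE);
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) parseByHand(SAMPLE);
        long t2 = System.nanoTime();
        System.out.printf("regex: %d ms, by hand: %d ms%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

The hand-written scan is less strict than the regex (it only looks for the next delimiter and does not check for stray whitespace inside a field), so it is worth running both over a real log and comparing the fields before switching.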