Thanks
I changed the regexp to
"(^\\S*)\\s(\\S*)\\s(\\S*)\\s\\[([^\\]]+)\\]\\s\"(\\S*)\\s(\\S*)\\s([^\\s\"]*)\"\\s(\\S*)\\s(\\S*)\\s\"([^\\s\"]*)\"\\s\"([^\"]*)\""

which has got me up to about 450 lines a second, but that is still slow.
I am still using readLine(), but with readLine() and a StringTokenizer I
can get 10x this speed. That suggests to me that readLine() is not really
the bottleneck and that the regex still is. Or is regex just not designed
for this sort of processing, and would I be better off doing a simple
hard-coded parse of the string?
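For comparison with the StringTokenizer approach, here is a minimal sketch of what a hard-coded parse of the same 11 fields captured by the regex above might look like. The class `LogLineParser` and its `parse` method are hypothetical names; the sketch assumes single spaces between fields and no escaped quotes inside quoted fields, and it has not been benchmarked against the regex.

```java
/**
 * Hypothetical hand-coded parser for one log line in the layout the regex matches:
 * host ident user [timestamp] "method path protocol" status bytes "referer" "agent".
 * Assumes single spaces between fields and no escaped quotes inside quoted fields.
 */
final class LogLineParser {
    private LogLineParser() {}

    /** Returns the 11 fields, or null if the line does not fit the assumed layout. */
    static String[] parse(String line) {
        String[] f = new String[11];
        int p = 0;
        // Fields 0-2 (host, ident, user): tokens terminated by a space.
        for (int i = 0; i < 3; i++) {
            int sp = line.indexOf(' ', p);
            if (sp < 0) return null;
            f[i] = line.substring(p, sp);
            p = sp + 1;
        }
        // Field 3: the timestamp between '[' and ']' (it may contain a space).
        if (p >= line.length() || line.charAt(p) != '[') return null;
        int close = line.indexOf(']', p);
        if (close < 0) return null;
        f[3] = line.substring(p + 1, close);
        p = close + 2; // step over "] "
        // Fields 4-6: the quoted request, split into method, path and protocol.
        if (p >= line.length() || line.charAt(p) != '"') return null;
        int endQuote = line.indexOf('"', p + 1);
        if (endQuote < 0) return null;
        String[] req = line.substring(p + 1, endQuote).split(" ", 3);
        if (req.length != 3) return null;
        System.arraycopy(req, 0, f, 4, 3);
        p = endQuote + 2; // step over the closing quote and the space after it
        // Fields 7-8 (status, bytes): space-terminated tokens again.
        for (int i = 7; i < 9; i++) {
            int sp = line.indexOf(' ', p);
            if (sp < 0) return null;
            f[i] = line.substring(p, sp);
            p = sp + 1;
        }
        // Fields 9-10 (referer, user agent): quoted strings.
        for (int i = 9; i < 11; i++) {
            if (p >= line.length() || line.charAt(p) != '"') return null;
            int q = line.indexOf('"', p + 1);
            if (q < 0) return null;
            f[i] = line.substring(p + 1, q);
            p = q + 2;
        }
        return f;
    }
}
```

Each field is located with a single forward indexOf scan, so there is no backtracking at all; the trade-off is that every change to the log format means editing code rather than a pattern.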

Rob

-----Original Message-----
From: Daniel F. Savarese [mailto:[EMAIL PROTECTED]]
Sent: Saturday, April 27, 2002 10:37 PM
To: ORO Users List
Subject: Re: Slow processing



In message <[EMAIL PROTECTED]>, "Robert Edgar" writes:
>Hi there,
>I am trying to load and parse a web server log file (about 3mb on average)
>with a regex as follows
>"(.*)\\s(.*)\\s(.*)\\s\\[([^\\]]+)\\]\\s\"(.*)\\s(.*)\\s(.*)\"\\s(.*)\\s(.*)\\s\"(.*)\"\\s\"(.*)\"\\s\"(.*)\""
>
>The problem is that it is taking forever; I am getting only about 15 lines a second.
>
>The code is something like
>
>while((logEntry = bufferedreader.readLine()) != null){
> if (matcher.contains(logEntry, pattern)) {
>   MatchResult result=matcher.getMatch();
> }
>}

Half of the problem is calling readLine().  You're converting char arrays
into strings and then back again when you search for a match.  If the
files are about 3MB, you'll do better by reading the entire file into
a char array first and then searching for matches.  The other half of
the problem may be the regular expression, which is causing a lot of
backtracking.  Try replacing .* with \S* and [^\s"]* where appropriate.
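Both suggestions can be combined in a sketch: read the whole file into one char array, then call find() repeatedly over it with the tightened pattern instead of building a String per line. This sketch swaps in the JDK's java.util.regex rather than ORO (ORO's own route would be a PatternMatcherInput over the char array), and `WholeFileLogScanner`, `readAll` and `scan` are hypothetical names. Because one buffer now holds many lines, the pattern is adjusted from Rob's version: fields are separated by a literal space instead of \s, and newline is excluded from the negated classes, so no match can run across a line break.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical whole-buffer scanner using java.util.regex in place of ORO. */
final class WholeFileLogScanner {
    // Rob's \S* / [^\s"]* pattern, adapted for a multi-line buffer: (?m) lets ^ match
    // at each line start, and no element can match '\n', so matches stay on one line.
    static final Pattern LOG_LINE = Pattern.compile(
        "(?m)^(\\S*) (\\S*) (\\S*) \\[([^\\]\\n]+)\\] \"(\\S*) (\\S*) ([^\\s\"]*)\" "
        + "(\\S*) (\\S*) \"([^\\s\"]*)\" \"([^\"\\n]*)\"");

    private WholeFileLogScanner() {}

    /** Reads everything from the reader into a single char array. */
    static char[] readAll(Reader in) throws IOException {
        CharArrayWriter out = new CharArrayWriter(1 << 16);
        char[] buf = new char[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toCharArray();
    }

    /** Returns the 11 captured groups of every well-formed line; other lines are skipped. */
    static List<String[]> scan(char[] text) {
        List<String[]> entries = new ArrayList<>();
        // CharBuffer.wrap shares the array, so the buffer is not copied into a String.
        Matcher m = LOG_LINE.matcher(CharBuffer.wrap(text));
        while (m.find()) {
            String[] fields = new String[m.groupCount()];
            for (int g = 1; g <= m.groupCount(); g++) {
                fields[g - 1] = m.group(g);
            }
            entries.add(fields);
        }
        return entries;
    }
}
```

Whether this beats per-line matching depends on the JVM and the data. The \S* and [^\s"]* classes help because each group can no longer swallow the spaces and quotes that follow it, which is what forces .* to backtrack.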

daniel



--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>