In message <[EMAIL PROTECTED]>, "Robert Edgar" writes:
>Hi there,
>I am trying to load and parse a web server log file (about 3mb on average)
>with a regex as follows
>"(.*)\\s(.*)\\s(.*)\\s\\[([^\\]]+)\\]\\s\"(.*)\\s(.*)\\s(.*)\"\\s(.*)\\s(.*)\\s\"(.*)\"\\s\"(.*)\"\\s\"(.*)\""
>
>problem is it is taking forever, I am getting only about 15 lines a second
>
>code is something like
>
>while((logEntry = bufferedreader.readLine()) != null){
> if (matcher.contains(logEntry, pattern)) {
>   MatchResult result=matcher.getMatch();
> }
>}

Half of the problem is calling readLine().  You're converting char arrays
into strings and then back again when you search for a match.  If the
files are about 3MB, you'll do better by reading the entire file into
a char array first and then searching for matches.  The other half of
the problem may be the regular expression, which is causing a lot of
backtracking.  Try replacing .* with \S* and [^\s"]* where appropriate.

daniel
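
To illustrate both suggestions above, here is a minimal sketch, not the
original poster's code. It assumes java.util.regex rather than the
matcher API shown in the quoted loop, and the class name LogScan and the
method parse are made up for the example. The pattern assumes the twelve
fields of the quoted regex never contain their own delimiter (no spaces
in unquoted fields, no double quotes inside quoted ones), which may not
hold for every log. Using [^"\n]* for the three trailing quoted fields
is my choice, since referrer and user-agent strings usually contain
spaces.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogScan {
    // Each field is limited to characters that cannot run past its own
    // delimiter, so the engine has almost nothing to backtrack over.
    // MULTILINE lets ^ and $ anchor at every line of the buffer.
    static final Pattern ENTRY = Pattern.compile(
        "^(\\S*) (\\S*) (\\S*) \\[([^\\]\\n]+)\\] "
        + "\"([^\\s\"]*) ([^\\s\"]*) ([^\\s\"]*)\" (\\S*) (\\S*) "
        + "\"([^\"\\n]*)\" \"([^\"\\n]*)\" \"([^\"\\n]*)\"$",
        Pattern.MULTILINE);

    // Scans the whole buffer in one pass instead of one readLine() call
    // per entry. Returns the twelve captured fields of each matching line.
    static List<String[]> parse(CharSequence log) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            String[] fields = new String[m.groupCount()];
            for (int i = 0; i < fields.length; i++) {
                fields[i] = m.group(i + 1);
            }
            entries.add(fields);
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java LogScan <logfile>");
            return;
        }
        // A file of a few megabytes fits comfortably in memory.
        // ISO-8859-1 is an assumption about the log's encoding.
        String log = new String(
            Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.println(parse(log).size() + " entries matched");
    }
}
```

Since Pattern.matcher() accepts any CharSequence, a char array read from
the file could be passed as CharBuffer.wrap(chars) instead of building a
String. Lines that do not fit the pattern are simply skipped.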


