[KERNEL] Issue #SKER4763

Bernt Rostad Wed, 04 Jun 2008 05:10:06 -0700

Activity report on

  *[JIRA] Bug SKER4763 - Is it possible to add more information to the 
<boomerang> log-entries?*


  Scarab Link: http://sesat.no/scarab/issues/id/SKER4763
  Module: Sesat> Kernel


  Activity generated by Bernt Rostad ([EMAIL PROTECTED]) at 06/04/2008 14:04

  *Reasons for the changes*


  *Comments*
  - By Bernt Rostad - 06/04/2008 14:04 ---
  "The short answer: Yes.

This was the reason I asked you to expand the sesam.access <real-url> entry 
last autumn, to get the same elements there as in the original request entry (I 
needed the <browser>, <referer> and <user> elements). I now have the same 
problem with the <boomerang> entries.

A longer answer is based on the following two main points:

1) Filtering away unnecessary log entries
The SAXParser, used for parsing a log entry, does a great job assembling 
entries one by one, and I've been able to speed things up by giving it a rule 
to ignore log entries without a <real-url>, <boomerang> or <response> element. 
If I have to parse the original request, to obtain the extra data, I will 
actually have to parse all the logfile entries since the original entry 
contains nothing unique for filtering. If so, the filter advantage is lost.
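
To illustrate the filter rule in point 1, here is a minimal sketch, not the actual SESTAT code. It assumes each log entry is available as a well-formed XML string; the class name EntryFilter, the method isRelevant and the <entry> wrapper element in the usage note are hypothetical. Only the three element names come from the description above.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: decides whether a log entry is worth assembling.
class EntryFilter {
    // Element names taken from the text; an entry without any of them is ignored.
    private static final Set<String> WANTED = Set.of("real-url", "boomerang", "response");

    // SAX handler that only records whether a wanted element was seen.
    private static class MarkerHandler extends DefaultHandler {
        boolean relevant = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (WANTED.contains(qName)) {
                relevant = true;
            }
        }
    }

    // Assumption: entryXml is one complete, well-formed log entry.
    static boolean isRelevant(String entryXml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            MarkerHandler handler = new MarkerHandler();
            parser.parse(new InputSource(new StringReader(entryXml)), handler);
            return handler.relevant;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse log entry", e);
        }
    }
}
```

With this rule, EntryFilter.isRelevant("<entry><boomerang>x</boomerang></entry>") is true, while an original request entry carrying only <browser>, <referer> and <user> is skipped, which is exactly why the extra data cannot be picked up from it without losing the filter.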

2) Multiple entries parsing
As things work now, the SAXParser only needs to remember the current log entry. 
When done, the data is passed to an event assembler for immediate storage. If I 
need to parse separate log entries, possibly dozens of entries apart, to 
assemble an event, I will have to design a new way of storing fragments of 
events from multiple entries and then assemble the correct fragments into 
proper events when the final pieces are found. This sounds more error-prone to 
me and will make a rollback, if parsing fails on one log entry, difficult (I 
can't start from the beginning of the logfile since many entries will already 
have been stored to DB).
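
A rough sketch of the fragment storage described in point 2 could look like the following. It is an illustration only: the class FragmentBuffer, its methods and the idea of a shared key linking a request entry to its <boomerang> entry are all assumptions, since the text does not say what the two entries have in common.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical buffer holding request fragments until the matching
// boomerang entry shows up, possibly dozens of entries later.
class FragmentBuffer {
    // Assumption: some key (for example a request id) links the two entries.
    private final Map<String, String> pendingRequests = new HashMap<>();

    // Remember the data from an original request entry.
    void onRequest(String key, String requestData) {
        pendingRequests.put(key, requestData);
    }

    // Combine a boomerang entry with its stored request, if there is one.
    Optional<String> onBoomerang(String key, String boomerangData) {
        String requestData = pendingRequests.remove(key);
        if (requestData == null) {
            return Optional.empty(); // no matching request seen yet
        }
        return Optional.of(requestData + "|" + boomerangData);
    }

    // Fragments still waiting for a partner; these are lost if parsing aborts.
    int pending() {
        return pendingRequests.size();
    }
}
```

Anything left in the pending map when parsing fails is exactly the state that makes a rollback awkward, because some assembled events have already been stored while their neighbours have not.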

In conclusion: If it's possible to repeat the trick from <real-url> for 
<boomerang> entries, that would be the best and fastest solution (for me, of 
course) and allow me to wrap up the last part of SESTAT. If not, I'm facing some 
tough design issues and more development.

One option might be to skip the <boomerang>-entries altogether and just parse 
the original entries, i.e. those that contain a Boomerang-like URI, since that 
will cause minimal changes to the assembler and storage framework (just a new 
Boomerang-parser that will be less of a SAXParser and more of a 
StringTokenizer).
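
As a sketch of what such a StringTokenizer-based parser might look like: the layout of a Boomerang URI is not given here, so the "key=value;key=value" parameter format, the class BoomerangUriSketch and the method parseParams are purely hypothetical and would have to be adapted to the real URI format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical parser for the parameter part of a Boomerang-like URI.
class BoomerangUriSketch {
    // Assumption: parameters look like "key=value;key=value".
    static Map<String, String> parseParams(String paramString) {
        Map<String, String> params = new LinkedHashMap<>();
        StringTokenizer pairs = new StringTokenizer(paramString, ";");
        while (pairs.hasMoreTokens()) {
            String pair = pairs.nextToken();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                // Split on the first '=' only, so values may contain '='.
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            // Tokens without a key are skipped in this sketch.
        }
        return params;
    }
}
```

Since this works on a plain string taken from the original entry, no second log entry has to be remembered, which is why the assembler and storage framework would need only minimal changes.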


Let me know what you think, Mick, and I will consider the options."

_______________________________________________
Kernel-issues mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-issues
