Grant Ingersoll <[EMAIL PROTECTED]> wrote on 28/03/2007 10:44:08:

>
> On Mar 28, 2007, at 1:09 PM, Steven Parkes (JIRA) wrote:
>
> > Question (for Doron and anyone else): the file is xml and it's big,
> > so DOM isn't going to work. I could still use something SAX based
> > but since the format is so tightly controlled, I'm thinking regular
> > expressions would be sufficient and have less dependences. Anyone
> > have opinions on this?
>
>
> Personally, I think SAX is the way to go, as you'll get handling of
> escape sequences, etc. out of the box.  And seems like it is easier
> to read/maintain????

TrecDocMaker relies on the strict structure of its input data: the
read() method there "eats" the input stream until it reaches points of
interest, optionally collecting (lines of) text along the way. Depending
on the format here, you may be able to use a variation of this. If the
input here is not that strictly defined, SAX would be better.
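
To make the trade-off concrete, here is a rough sketch of both options
side by side. It is not TrecDocMaker code: the class name, the method
names and the <docs>/<doc>/<title> layout are all made up for
illustration, and the scanning variant assumes each <title> element sits
alone on one line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; neither this class nor the <title> layout comes from Lucene.
class TitleExtractionSketch {

    // Line scanning: "eat" lines until one holds a complete <title> element.
    // Assumes one <title>...</title> per line; the text is returned raw,
    // so entities such as &amp; are NOT decoded.
    static List<String> titlesByScanning(String xml) throws IOException {
        final String open = "<title>";
        final String close = "</title>";
        List<String> titles = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(xml));
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.startsWith(open) && line.endsWith(close)) {
                titles.add(line.substring(open.length(), line.length() - close.length()));
            }
        }
        return titles;
    }

    // SAX: streaming, so memory use stays flat on big files, and the parser
    // decodes escape sequences before the text reaches characters().
    static List<String> titlesBySax(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append each chunk.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(current.toString());
                    current = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<docs>\n<doc>\n<title>Tom &amp; Jerry</title>\n</doc>\n</docs>\n";
        System.out.println(titlesByScanning(xml)); // [Tom &amp; Jerry]
        System.out.println(titlesBySax(xml));      // [Tom & Jerry]
    }
}
```

The scanning version has no XML dependency at all, but it hands back the
escaped text and silently skips any title that is not on a single line.
That is fine only as long as the format really is that tightly
controlled.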

