Hi mailing list,

I had a very interesting discussion with Mathias Razum (Mr. eSciDoc;)
at the ECDL 2009 conference. He told me that whenever you call an API
function on an object, Fedora parses the entire object, including all
versions of all datastreams. This caught my interest, and after the
conference I examined the Fedora code to verify his claim. The gist of
it is true, and Fedora uses a SAX parser to do it.

Now, I would like to start a discussion about this behaviour: whether it
is a problem, and how it could be improved. I am really not sure the
performance hit is a problem at all, so this might be totally
redundant.

First, I am not sure, but I think that the XML storage format does not
need to be true FOXML. As long as we have ObjectSerializers and
DeSerializers, we should be able to use a different storage format
without changing the behaviour in any way. Is this a viable route?
Personally, I fear that it is not.
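
To make the first question concrete, here is a minimal sketch of what
"a different storage format behind the (de)serializers" could look like,
assuming the rest of the repository only ever talks to a storage
abstraction. SimpleObject, StorageFormat and TabFormat are hypothetical
names, not Fedora's actual serializer classes, and the tab-separated
format is just a stand-in for "anything other than FOXML".

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, minimal object model; NOT Fedora's real DigitalObject class.
final class SimpleObject {
    final String pid;
    // dsID -> content of the current version (older versions omitted for brevity)
    final Map<String, String> datastreams = new LinkedHashMap<>();

    SimpleObject(String pid) { this.pid = pid; }
}

// Hypothetical storage-format abstraction: callers only ever see SimpleObject,
// so swapping the on-disk format should not change repository behaviour.
interface StorageFormat {
    byte[] serialize(SimpleObject obj);
    SimpleObject deserialize(byte[] bytes);
}

// A deliberately trivial tab-separated format standing in for "not FOXML".
// Assumes PIDs, datastream IDs and content contain no tabs or newlines.
final class TabFormat implements StorageFormat {
    @Override
    public byte[] serialize(SimpleObject obj) {
        StringBuilder sb = new StringBuilder("PID\t").append(obj.pid).append('\n');
        for (Map.Entry<String, String> e : obj.datastreams.entrySet()) {
            sb.append("DS\t").append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public SimpleObject deserialize(byte[] bytes) {
        SimpleObject obj = null;
        for (String line : new String(bytes, StandardCharsets.UTF_8).split("\n")) {
            // limit 3 keeps an empty trailing content field
            String[] f = line.split("\t", 3);
            if (f[0].equals("PID")) {
                obj = new SimpleObject(f[1]);
            } else if (f[0].equals("DS") && obj != null) {
                obj.datastreams.put(f[1], f[2]);
            }
        }
        return obj;
    }
}
```

Whether Fedora could do this in practice depends on whether every read
and write really goes through the (de)serializers, which is the open
question above.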


Second, and probably more fruitful, we could do some conditional (lazy)
parsing. AFAIK, the SAX parser is blazingly fast if the handler does not
do anything when it hits elements. Is this true?
That way, we could parse the basic structure of the document, but not
the datastreams. When a function then requests a datastream, that
datastream is parsed, but not before. If only the latest version is
requested, the version list is not parsed, and so on.
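
A minimal sketch of the conditional parsing idea, assuming FOXML-like
element names (datastream, datastreamVersion, matched by local name
only) and assuming the last version in document order is the latest.
LazyObjectIndex and its methods are hypothetical, not Fedora code. The
first pass only records datastream and version IDs; content is
collected later, and only for the version that is asked for.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical lazy index over one object's XML; NOT part of Fedora.
final class LazyObjectIndex {
    private final byte[] xml;
    // dsID -> version IDs in document order, filled by the cheap first pass
    private final Map<String, List<String>> versions = new LinkedHashMap<>();

    LazyObjectIndex(byte[] xml) {
        this.xml = xml;
        // Pass 1: structure only. characters() is not overridden, so the
        // text content of every datastream version is simply ignored.
        parse(new DefaultHandler() {
            private String currentDs;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (local.equals("datastream")) {
                    currentDs = atts.getValue("ID");
                    versions.put(currentDs, new ArrayList<>());
                } else if (local.equals("datastreamVersion") && currentDs != null) {
                    versions.get(currentDs).add(atts.getValue("ID"));
                }
            }
        });
    }

    List<String> versionIds(String dsId) {
        return versions.getOrDefault(dsId, List.of());
    }

    // Pass 2, on demand: collect the text of one requested version only.
    String content(String dsId, String versionId) {
        StringBuilder out = new StringBuilder();
        parse(new DefaultHandler() {
            private String currentDs;
            private boolean inTarget;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (local.equals("datastream")) {
                    currentDs = atts.getValue("ID");
                } else if (local.equals("datastreamVersion")) {
                    inTarget = dsId.equals(currentDs) && versionId.equals(atts.getValue("ID"));
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (local.equals("datastreamVersion")) {
                    inTarget = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTarget) {
                    out.append(ch, start, length);
                }
            }
        });
        return out.toString().trim();
    }

    // Assumption: the last version in document order is the latest one.
    String latest(String dsId) {
        List<String> ids = versionIds(dsId);
        return ids.isEmpty() ? null : content(dsId, ids.get(ids.size() - 1));
    }

    private void parse(DefaultHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so 'local' holds the local element name
            factory.newSAXParser().parse(new ByteArrayInputStream(xml), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse object XML", e);
        }
    }
}
```

Note that both passes still scan the whole file with SAX, so this only
saves building in-memory structures for datastreams nobody asked for;
it returns the text of a version, not its markup. A handler could
presumably also stop early by throwing a SAXException once the wanted
version ends, and skipping the bytes entirely would need stored offsets.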

What are your thoughts about this?

Regards



_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
