Hi mailing list,

I had a very interesting discussion with Mathias Razum (Mr. eSciDoc ;) at the ECDL 2009 conference. He told me that whenever you call an API function on an object, Fedora parses the entire object, including all versions of its datastreams. This caught my interest, so after the conference I examined the Fedora code to verify his claim. As far as I could see, the gist of it is true, and Fedora uses a SAX parser.
Now, I would like to start a discussion about this behaviour: whether it is a problem, and how it could be improved. I am really not sure the performance hit is a problem in practice, so this might be totally redundant.

First, I am not sure, but I think the XML storage format does not need to be true FOXML. As long as we have ObjectSerializers and DeSerializers, we should be able to use a different storage format without changing the behaviour in any way. Is this a viable route? Personally, I fear that it is not.

Second, and probably more fruitful, we could do some conditional parsing. AFAIK, a SAX parser is blazingly fast if the handler does nothing when it hits an element. Is this true? If so, we could parse the basic structure of the document but not the datastreams. When a function then requests a datastream, that datastream is parsed, but not before. If only the latest version is requested, the rest of the version list is not parsed, and so on.

What are your thoughts about this?

Regards
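
P.S. To make the conditional parsing idea a bit more concrete, here is a rough sketch of what I have in mind. It is not the Fedora deserializer, just a toy: the class names (LazyFoxmlSketch, Handler) and the sample document are made up for illustration, and I am assuming the usual FOXML element names (datastream, datastreamVersion, xmlContent). The handler records the ID of every datastream, but only buffers content for the one datastream that was asked for. Note that the SAX parser still reads the whole input; the only thing saved is the work done in the handler, so whether this helps at all would have to be measured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LazyFoxmlSketch {

    // Made-up sample object: two datastreams, one of them with two versions.
    public static final String SAMPLE =
        "<foxml:digitalObject xmlns:foxml=\"info:fedora/fedora-system:def/foxml#\" PID=\"demo:1\">"
      + "<foxml:datastream ID=\"DC\">"
      + "<foxml:datastreamVersion ID=\"DC.0\"><foxml:xmlContent>old</foxml:xmlContent></foxml:datastreamVersion>"
      + "<foxml:datastreamVersion ID=\"DC.1\"><foxml:xmlContent>new</foxml:xmlContent></foxml:datastreamVersion>"
      + "</foxml:datastream>"
      + "<foxml:datastream ID=\"BIG\">"
      + "<foxml:datastreamVersion ID=\"BIG.0\"><foxml:xmlContent>skipped</foxml:xmlContent></foxml:datastreamVersion>"
      + "</foxml:datastream>"
      + "</foxml:digitalObject>";

    /** Records every datastream ID, but buffers content only for the wanted datastream. */
    public static class Handler extends DefaultHandler {
        public final List<String> datastreamIds = new ArrayList<>();
        public final Map<String, String> versions = new LinkedHashMap<>();
        private final String wantedId;
        private boolean inWanted = false;
        private String currentVersion = null;
        private StringBuilder buf = null; // non-null only inside a wanted version

        public Handler(String wantedId) {
            this.wantedId = wantedId;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("datastream".equals(local)) {
                String id = atts.getValue("ID");
                datastreamIds.add(id); // basic structure is always recorded
                inWanted = id != null && id.equals(wantedId);
            } else if (inWanted && "datastreamVersion".equals(local)) {
                currentVersion = atts.getValue("ID");
                buf = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // For datastreams nobody asked for, this does nothing at all.
            if (buf != null) {
                buf.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("datastreamVersion".equals(local) && buf != null) {
                versions.put(currentVersion, buf.toString().trim());
                buf = null;
            } else if ("datastream".equals(local)) {
                inWanted = false;
            }
        }
    }

    /** Parses the XML and returns the handler holding the collected results. */
    public static Handler parse(String xml, String wantedId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is filled in
        Handler handler = new Handler(wantedId);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        Handler h = parse(SAMPLE, "DC");
        System.out.println(h.datastreamIds); // [DC, BIG]
        System.out.println(h.versions);      // {DC.0=old, DC.1=new}
    }
}
```

Asking for "DC" gives both of its versions and leaves the content of "BIG" untouched, although "BIG" still shows up in the list of datastream IDs. Returning only the latest version would need one more rule, since in a single streaming pass you only know which version is the latest after you have seen them all, or if you can rely on their order in the file.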
