Re: massive xml docs
Hi Todd,

I don't know how it would behave with a 100MB file, but here is the method I use to get the data from a particular node of the tree without having to construct the full tree.

Best wishes,
Marielle

...

# Get Tag Content In XML tree
#
# -- @Description: Get the content of a tag, given a path in the xml tree
# --    A tag tree has the following syntax: node1:node2:node3:node4
# --    <node1>...<node2>...<node3>data</node3>...</node2>...</node1>
# -- @Returns: Content part when <Tag( Params)?>__Content__</Tag> (text string)
function getTagContent_XML pXMLtext, pTagTree
   if pXMLtext is empty then terminate("BUG x. An empty content was given to parse. pXMLtext shouldn't be empty in function _getTagContentXML.")
   if pTagTree is empty then terminate("BUG x. No Tag Tree was specified. pTagTree shouldn't be empty in function _getTagContentXML.")
   set the itemdel to ":"
   replace quote with empty in pTagTree -- get rid of quotes, in case there are any
   repeat for each item tTag in pTagTree
      put getTagContent(pXMLtext, tTag) into pXMLtext
   end repeat
   return pXMLtext
end getTagContent_XML

...

function getTagContent pXMLtext, pTagName
   # -- @Requires: swapEOL() -- not included here; swaps ends of lines from cr to ¬ to allow for multiline matches with matchtext
   # -- @Requires: stripInitialTabs() -- not included here
   put swapEOL(pXMLtext, "remove") into pXMLtext
   if matchtext(pXMLtext, "(?i)<" & pTagName & "[ ]?[^>]*>(.+?)</" & pTagName & ">", tTagContent) is false then return empty
   put swapEOL(tTagContent, "restore") into tTagContent
   put stripInitialTabs(tTagContent) into pXMLtext
   return pXMLtext
end getTagContent
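[For readers prototyping outside Revolution, the same path-based extraction can be sketched with Python's `re` module. This is an illustrative equivalent of the approach above, not Marielle's code; it keeps her colon-separated tag-path syntax (`node1:node2:node3`) and made-up sample data.]

```python
import re

def get_tag_content(xml_text, tag_name):
    """Return the content of the first <tag_name ...>...</tag_name> pair, or ''.
    DOTALL lets the match span multiple lines, which is what the swapEOL()
    trick achieves in the Transcript version."""
    pattern = r"<" + re.escape(tag_name) + r"[^>]*>(.+?)</" + re.escape(tag_name) + r">"
    m = re.search(pattern, xml_text, re.IGNORECASE | re.DOTALL)
    return m.group(1) if m else ""

def get_tag_content_by_path(xml_text, tag_path):
    """Walk a colon-separated tag path, narrowing the text at each step."""
    for tag in tag_path.split(":"):
        xml_text = get_tag_content(xml_text, tag)
        if not xml_text:
            return ""
    return xml_text

sample = "<a><b><c>hello</c></b></a>"
print(get_tag_content_by_path(sample, "a:b:c"))  # prints hello
```

Note this shares the original's limitation: it always takes the first matching tag at each level, so it suits lookup of a known unique path rather than iteration over repeated siblings.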
Marielle Lange (PhD), Psycholinguist
Alternative emails: [EMAIL PROTECTED]
Homepage: http://homepages.widged.com/mlange/
Easy access to lexical databases: http://lexicall.widged.com/
Supporting Education Technologists: http://revolution.widged.com/wiki/

___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
Re: massive xml docs
Ruslan- Thursday, February 9, 2006, 1:28:53 PM, you wrote:

> Under proprietary may be you mean implementations of engines?

No, I'm referring to syntactical differences and proprietary extensions. This is why XQuery was born, as an open standard to try to rein this in.

> But this is not a problem. SQL also have standard. And many DBMS vendors implement it.

A twisty maze of winding SQL standards, each different. As they say, the nice thing about standards is that there are so many to choose from.

> Exists draft of XUpdate.

...and it's been a draft for some time, but it does seem to be the way XML updates are heading. But it seems to me that XUpdate (and XQuery, for that matter) was developed to do the sort of things people are doing with AJAX these days.

--
-Mark Wieder
[EMAIL PROTECTED]
Re: massive xml docs
On 2/9/06 1:32 AM, Todd Geist [EMAIL PROTECTED] wrote:

>> If I read it correctly, the answer is no - basically, with true as the last param, you're telling the DLL to send you a message when it's about to start parsing the tree, when it encounters each node (so you can extract attributes from the node) and when it encounters data for the node. So you write your own code to work with the XML data as it's being parsed...
> hmmm... that may be perfect for what I need to do, since I don't need to parse the whole document, just the parts the user is interested in. It is unlikely that a user will ever need to get at the vast majority of the data in there.

Yes, and now that I see how this works, I'll add it to my XML Library too... :-)

Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]
Re: massive xml docs
On 2/9/06 9:15 AM, Ken Ray [EMAIL PROTECTED] wrote:

>>> You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)
>> But won't that still choke on something as big as 100MB?
> If I read it correctly, the answer is no - basically, with true as the last param, you're telling the DLL to send you a message when it's about to start parsing the tree, when it encounters each node (so you can extract attributes from the node) and when it encounters data for the node. So you write your own code to work with the XML data as it's being parsed...

So it sounds like SAX, in fact. Right?

--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
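[The SAX parallel Ruslan draws can be made concrete with Python's stdlib `xml.sax`, purely as an illustration of the event model, not of the Rev external's actual API. The three callbacks below correspond roughly to the three messages Ken describes (node start with attributes, node data, node end); the `<book>` document is invented sample data.]

```python
import io
import xml.sax

class NodeCollector(xml.sax.ContentHandler):
    """Receive an event per node instead of building a tree in memory."""
    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.inside = False
        self.found = []

    def startElement(self, name, attrs):
        # Fired when the parser encounters a node; attributes arrive here.
        if name == self.wanted_tag:
            self.inside = True
            self.found.append("")

    def characters(self, content):
        # Fired when the parser encounters data for the current node;
        # may fire more than once per node, so we concatenate.
        if self.inside:
            self.found[-1] += content

    def endElement(self, name):
        if name == self.wanted_tag:
            self.inside = False

doc = "<library><book>Moby Dick</book><book>Ulysses</book></library>"
handler = NodeCollector("book")
xml.sax.parse(io.StringIO(doc), handler)
print(handler.found)  # prints ['Moby Dick', 'Ulysses']
```

The trade-off the thread keeps circling: the parser never holds the whole document, but any state you want to keep (like `found` here) is yours to manage by hand.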
Re: massive xml docs
On 2/9/06 10:02 AM, Ken Ray [EMAIL PROTECTED] wrote:

>>> If I read it correctly, the answer is no - basically, with true as the last param, you're telling the DLL to send you a message when it's about to start parsing the tree, when it encounters each node (so you can extract attributes from the node) and when it encounters data for the node. So you write your own code to work with the XML data as it's being parsed...
>> hmmm... that may be perfect for what I need to do, since I don't need to parse the whole document, just the parts the user is interested in. It is unlikely that user will ever need to get at the vast majority of the data in there.
> Yes, and now that I see how this works, I'll add it to my XML Library too...

Ken, Todd, I think something is wrong here. IF revCreateXMLTree() does not build a DOM, then it works as SAX. Ken writes:
- it will send you an event on each tag -- this is SAX.
- you need to write your own code to handle tags, attributes, data -- this is SAX.

Todd, you CANNOT parse only PART of an XML document. XML is a text document. You (actually not you, but the XML parser) MUST load and read it byte after byte, sequentially. You see? As already was mentioned:

* SAX
  advantages:
  - good for ONE iteration over the XML document
  - eats little RAM, so it can be used for huge XML documents. But again, only one iteration, e.g. transformation of one XML doc into another.
  disadvantages:
  - the developer needs to write a lot of code to handle tags
  - bad if you need to do many iterations

* DOM
  advantages:
  - good for MANY iterations, down and up
  - easier code to use
  disadvantages:
  - eats a lot of RAM; limited to documents that fit in RAM.

Todd, you have pointed out that you need FAST speed. If you think that 500MB of RAM is okay for your app, then go with DOM. If that is not okay, you need special tools. Also, ideally these tools must be able to handle queries to the XML document. This can be XPath, XQuery, or SQL/XML. You have no other options.

Another important question: do you have a STATIC XML document? Are the data fixed? Will you send this to many of your users, or will each user have their own document? IF you have a static document, then you simply need to parse it ONCE and load the data into a database. IF you have a static document, then there is no sense in parsing it millions of times on each query.

--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
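[Ruslan's "parse it ONCE and load the data into a database" advice for static documents can be sketched with Python's stdlib: stream the file with `iterparse` (so the whole 100MB never sits in RAM at once) and insert each record into SQLite. This is only an illustration of the pattern, not the thread's Valentina or Revolution tooling; the `<record id=...>` shape and table name are invented.]

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_xml_into_db(xml_stream, conn):
    """One sequential pass over the XML; every record lands in SQLite,
    after which all further queries hit the database, not the text file."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, body TEXT)")
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "record":
            conn.execute("INSERT INTO records VALUES (?, ?)",
                         (elem.get("id"), elem.text))
            elem.clear()  # discard the parsed element to keep memory low
    conn.commit()

doc = '<data><record id="a">alpha</record><record id="b">beta</record></data>'
conn = sqlite3.connect(":memory:")
load_xml_into_db(io.StringIO(doc), conn)
print(conn.execute("SELECT body FROM records WHERE id = 'b'").fetchone())  # prints ('beta',)
```

After the one-time load, each user lookup is an indexed SELECT instead of a fresh scan of the document, which is exactly the "don't parse it millions of times" point.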
Re: massive xml docs
Hi Tuviah,

>> ...can write a SAX processor, in which case you won't have the whole tree in memory, but your speed will drop noticeably.
> You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)
>
> Tuviah Snyder
> www.mddev.com

Very glad to see (read) that you are still alive and well :-)

Best from Germany
Klaus Major
[EMAIL PROTECTED]
http://www.major-k.de
Re: massive xml docs
Todd- Can you just work with the DTD? And then query the xml document for data elements you're interested in?

--
-Mark Wieder
[EMAIL PROTECTED]
Re: massive xml docs
> Also in ideal these tools must be able handle queries to XML document. This can be Xpath, Xquery, or SQL/XML.

...and I'd advise staying away from XPath as well. It's gotten splintered into too many proprietary spinoffs. XQuery is much easier to read (IMO) and is pretty standardized these days. The only thing you can't do with it is update a document. To my mind, a 100MB xml document is poorly designed. It should be segmented into a hierarchy of smaller documents or exported to a database. But nobody asked me.

--
-Mark Wieder
[EMAIL PROTECTED]
Re: massive xml docs
On Feb 9, 2006, at 1:01 PM, Mark Wieder wrote:

> Todd- Can you just work with the DTD? And then query the xml document for data elements you're interested in?

I am not sure. But that might work.

Todd
--
Todd Geist
__
g e i s t   i n t e r a c t i v e
Re: massive xml docs
>> Also in ideal these tools must be able handle queries to XML document. This can be Xpath, Xquery, or SQL/XML.
> ...and I'd advise staying away from XPath as well. It's gotten splintered into too many proprietary spinoffs. XQuery is much easier to read (IMO) and is pretty standardized these days. The only thing you can't do with it is update a document. To my mind, a 100MB xml document is poorly designed. It should be segmented into a hierarchy of smaller documents or exported to a database. But nobody asked me.

The 100MB file IS (or can be) one of the hierarchy of documents for this application.
Massive XML docs
Hello Everyone.

I have some very large XML files (100MB or more) that I would like to parse. And this thing has to be fast. I mean FAST. Since I have done almost no work with Rev and XML, I am looking for advice on how to proceed.

The user experience needs to be something like this...

I select the XML and very quickly I see the info about the top level nodes. I select one of the top level nodes and I am very quickly presented with more detail on it, which may include all of its children and also information from other parts of the XML document that is related. I continue to work my way around the document, by simply selecting the elements that I need more info on. At no point will a user ever need to see all the data that is in the XML doc. They are almost always looking for info on just one element buried in there.

What I don't know is: if the user is just walking around the XML document one piece at a time, do I need to load the whole thing into RAM in a revXMLTree? And if I do, won't that just be insane for a 100MB XML file? Or can I do it one step at a time as the user requests more detailed info?

Can Valentina or SQLite be employed to help the situation? Or would it be faster to parse it into a whole slew of custom props?

Any ideas and/or thoughts would be much appreciated.

Thanks

Todd
--
Todd Geist
__
g e i s t   i n t e r a c t i v e
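[One stdlib-only way to get the "fetch just the element the user asked for" behaviour Todd describes, without holding a 100MB tree in RAM, is a streaming scan that stops at the first hit. This is purely a sketch in Python (the thread's tooling is Revolution/Valentina); the tag and attribute names are invented.]

```python
import io
import xml.etree.ElementTree as ET

def find_first(xml_stream, tag, attr, value):
    """Stream the document and return the text of the first <tag attr=value>
    element, clearing elements as we go so memory use stays low."""
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == tag and elem.get(attr) == value:
            return elem.text
        elem.clear()  # drop the contents of already-processed elements
    return None

doc = '<root><item id="1">first</item><item id="2">second</item></root>'
print(find_first(io.StringIO(doc), "item", "id", "2"))  # prints second
```

The catch, as the rest of the thread points out, is that every lookup re-scans the file from the top; for repeated interactive queries a one-time load into a database wins.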
Re: Massive XML docs
On 2/8/06 5:50 PM, Todd Geist [EMAIL PROTECTED] wrote:

Hi Todd,

> Hello Everyone. I have some very large XML files (100mb or more) that I would like to parse. And this thing has to be fast. I mean FAST. Since I have done almost no work with rev and XML, I am looking for advice on how to proceed.
> The user experience needs to be something like this... I select the XML and very quickly I see the info about the top level nodes. I select one of the top level nodes and I am very quickly presented with more detail on it, which may include all of its children and also information from other parts of the XML document that is related. I continue to work my way around the document, by simply selecting the elements that I need more info on. At no point will a user ever need to see all the data that is in the XML doc. They are almost always looking for info on just one element buried in there.
> What I don't know is if the user is just walking around the xml document one piece at a time, do I need to load the whole thing into RAM in a revXMLTree, and if I do, won't that just be insane for a 100mb xml file. Or can I do it one step at a time as the user requests more detailed info.

Is RevXMLTree a DOM model? If yes, then expect that such a RAM tree will be about 5 times the size of the original XML document. On the other hand, DOM is the fastest way to iterate a tree.

> Can Valentina or SQLlite be employed to help the situation? Or would it be faster to parse it into a whole slew of custom props? Any ideas and or thoughts would be much appreciated

Todd, you have touched a HUGE issue. First of all, we need to better understand your task: so you parse the document and extract info, what next? Must your user be able to do queries on it? Then it sounds like a DBMS job. For your task there exist a few streams in the world:

A) work on the XML itself, so called Native XML dbs
B) put the XML into a database.

It looks like the major DBMS vendors (Oracle, IBM and MS) are winning this war, so stream B) is becoming the main one. I can make a small announcement: during the last few months we have been developing new XML features in Valentina which will be comparable to Oracle, IBM and MS. Soon we will introduce the first wave of them. To get more details please subscribe to the Valentina beta list.

--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
Re: Massive XML docs
Todd- I'm with Ruslan on this one. DOM is the fastest way to access xml information, but the tree size will be huge (are you talking about a single 100MB file rather than multiple files adding up to 100MB?). You can write a SAX processor, in which case you won't have the whole tree in memory, but your speed will drop noticeably. XML is nice in that it's human-parseable as well as machine-parseable (notice I didn't say readable), but it's quite a bloated format for both storage and searching. Native xml databases provide speed for data storage and retrieval but lack in querying speed unless they use significant resources in indexing.

--
-Mark Wieder
[EMAIL PROTECTED]
Re: Massive XML docs
On 2/8/06 8:59 PM, Mark Wieder [EMAIL PROTECTED] wrote:

Hi Mark, Hi Todd,

> Todd- I'm with Ruslan on this one. DOM is the fastest way to access xml information, but the tree size will be huge (are you talking about a single 100MB file rather than multiple files adding up to 100MB?). You can write a SAX processor, in which case you won't have the whole tree in memory, but your speed will drop noticeably.

SAX is not good for many iterations. Usually it is used to parse all the data and e.g. transform them or store them into a DBMS.

> XML is nice in that it's human-parseable as well as machine-parseable (notice I didn't say readable), but it's quite a bloated format for both storage and searching.

Right.

> Native xml databases provide speed for data storage and retrieval but lack in querying speed unless they use significant resources in indexing.

To cover Native XML databases:
* they tend to use XPath and/or XQuery as the query language.
* they do indexing like a DBMS does.

BUT... I can bet that quite soon they will go into history. I have seen similar opinions on oracle.com :-) The Oracle guys point out that in the last 2 years the vendors of Native XML dbs have taken almost no part in the development of the XQuery standard and so on. This job is done mainly by Oracle, IBM, MS. Again, IMHO, this was just a fashion stream. But a text format is a text format. In the 1970s programmers invented databases exactly to run away from text formats.

We have had a lot of discussion lately in the team, with other developers who heavily use e.g. MS, and with our university guys, about what advantages XML gives and where it should live. Our resume: it belongs at least in these areas:
- the WEB
- data transfer

One developer told me how happy he is that he built his C# .NET application (accounting for German customers) with an XML middle layer, so the application code does not depend directly on the database structure. An interesting point of view, I should say.

--
Best regards,
Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc
Valentina - Joining Worlds of Information
http://www.paradigmasoft.com
[I feel the need: the need for speed]
Re: massive xml docs
> ...can write a SAX processor, in which case you won't have the whole tree in memory, but your speed will drop noticeably.

You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)

Tuviah Snyder
www.mddev.com
Re: massive xml docs
On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:

> You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)

But won't that still choke on something as big as 100MB?

Todd
--
Todd Geist
__
g e i s t   i n t e r a c t i v e
Re: massive xml docs
On 2/8/06 11:26 PM, Todd Geist [EMAIL PROTECTED] wrote:

> On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:
>> You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)
> But won't that still choke on something as big as 100MB?

If I read it correctly, the answer is no - basically, with true as the last param, you're telling the DLL to send you a message when it's about to start parsing the tree, when it encounters each node (so you can extract attributes from the node) and when it encounters data for the node. So you write your own code to work with the XML data as it's being parsed...

Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]
Re: massive xml docs
On Feb 8, 2006, at 11:15 PM, Ken Ray wrote:

> If I read it correctly, the answer is no - basically, with true as the last param, you're telling the DLL to send you a message when it's about to start parsing the tree, when it encounters each node (so you can extract attributes from the node) and when it encounters data for the node. So you write your own code to work with the XML data as it's being parsed...

hmmm... that may be perfect for what I need to do, since I don't need to parse the whole document, just the parts the user is interested in. It is unlikely that a user will ever need to get at the vast majority of the data in there. I wonder...

Todd
--
Todd Geist
__
g e i s t   i n t e r a c t i v e
Re: massive xml docs
> On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:
>> You can use revcreatexmltree, and just tell it to not build the dom tree/or keep the document in memory, and handle the messages. revCreateXMLTree(field "XML Data",true,false,true)
> But won't that still choke on something as big as 100MB?
>
> Todd

Why, Mr. Geist, we seem to be asking the same questions. About FMPro DDR docs, I suspect.
Re: Massive XML docs
> Hello Everyone. I have some very large XML files (100mb or more) that I would like to parse. And this thing has to be fast. I mean FAST. Since I have done almost no work with rev and XML, I am looking for advice on how to proceed.

I've been asking the same questions of Emmanuel over at Satimage. (SUL - Smile User List)