Re: massive xml docs

2006-02-13 Thread Marielle Lange

Hi Todd,

I don't know how it would behave with a 100MB file but here is the  
method I use to get the data from a particular node of the tree  
without having to construct the full tree.


Best wishes,
Marielle

...
 


#
#  Get Tag Content In XML tree
#
# --   @Description: Get the content of a tag, given a path in the xml tree
# --   A tag tree has the following syntax: node1:node2:node3:node4
# --   <node1>...<node2>...<node3>data</node3>...</node2>...</node1>
# --   @Returns:  Content part when <Tag( Params)?> __Content__ </Tag> (text string)


function getTagContent_XML pXMLtext, pTagTree
  -- terminate() is presumably a custom error handler; it is not included here
  if pXMLtext is empty then terminate("BUG x. An empty content was given to parse. pXMLtext shouldn't be empty in function _getTagContentXML.")
  if pTagTree is empty then terminate("BUG x. No Tag Tree was specified. pTagTree shouldn't be empty in function _getTagContentXML.")

  set the itemdel to ":"
  replace quote with empty in pTagTree -- This is to get rid of quotes in case there are any

  -- Descend one tag at a time, narrowing pXMLtext to the content of each tag in turn
  repeat for each item tTag in pTagTree
    put getTagContent(pXMLtext, tTag) into pXMLtext
  end repeat
  return pXMLtext
end getTagContent_XML
...
function getTagContent pXMLtext, pTagName
  # --   @Requires: swapEOL() - not included here -- swaps end of lines from cr to ¬ to allow for multiline matches with matchtext
  # --   @Requires: stripInitialTabs() -- not included here
  put swapEOL(pXMLtext, "remove") into pXMLtext
  -- Capture everything between the first opening tag (with optional attributes) and its closing tag
  if matchtext(pXMLtext, "(?i)<" & pTagName & "[ ]?[^>]*>(.+?)</" & pTagName & ">", tTagContent) is false then return empty

  put swapEOL(tTagContent, "restore") into tTagContent
  put stripInitialTabs(tTagContent) into pXMLtext
  return pXMLtext
end getTagContent
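
For readers who want to try the same idea outside Revolution, here is a minimal sketch in Python (not the language of the handlers above). The function names and the sample document are made up for illustration, and the pattern is slightly stricter than the one above: it requires whitespace before any attributes. Like the original, it takes the first match only and will go wrong when a tag is nested inside a tag of the same name.

```python
import re

def get_tag_content(xml_text, tag_name):
    """Return the text between the first <tag_name ...> and its closing tag, or ''."""
    name = re.escape(tag_name)
    pattern = r"<" + name + r"(?:\s[^>]*)?>(.+?)</" + name + r">"
    # DOTALL lets '.' span line breaks, which replaces the swapEOL() trick
    match = re.search(pattern, xml_text, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else ""

def get_tag_content_by_path(xml_text, tag_path):
    """Follow a colon-separated path such as 'node1:node2:node3'."""
    for tag in tag_path.replace('"', "").split(":"):
        xml_text = get_tag_content(xml_text, tag)
        if not xml_text:
            return ""  # a tag on the path was not found
    return xml_text

sample = "<node1><node2 id='a'><node3>data</node3></node2></node1>"
print(get_tag_content_by_path(sample, "node1:node2:node3"))
```

No tree is built, but each step still scans the text it is given, so the cost on a 100MB file depends on how early the wanted tag appears.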
.


 


Marielle Lange (PhD),  Psycholinguist

Alternative emails: [EMAIL PROTECTED],

Homepage
http://homepages.widged.com/mlange/
Easy access to lexical databases      http://lexicall.widged.com/
Supporting Education Technologists    http://revolution.widged.com/wiki/


___
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution


Re: massive xml docs

2006-02-10 Thread Mark Wieder
Ruslan-

Thursday, February 9, 2006, 1:28:53 PM, you wrote:

> By "proprietary", maybe you mean implementations of engines?

No, I'm referring to syntactical differences and proprietary
extensions. This is why XQuery was born, as an open-source standard to
try to rein this in.

> But this is not a problem. SQL also has a standard. And many DBMS vendors
> implement it.

A twisty maze of winding SQL standards, each different. As they say,
the nice thing about standards is that there are so many to choose
from.

> A draft of XUpdate exists.

...and it's been a draft for some time, but it does seem to be the way
xml updates are heading. But it seems to me that XUpdate (and XQuery,
for that matter) was developed to do the sort of things people are
doing with AJAX these days.

-- 
-Mark Wieder
 [EMAIL PROTECTED]



Re: massive xml docs

2006-02-09 Thread Ken Ray
On 2/9/06 1:32 AM, Todd Geist [EMAIL PROTECTED] wrote:

 
> On Feb 8, 2006, at 11:15 PM, Ken Ray wrote:
>
>> If I read it correctly, the answer is no - basically, with true as the
>> last param, you're telling the DLL to send you a message when it's about to
>> start parsing the tree, when it encounters each node (so you can extract
>> attributes from the node) and when it encounters data for the node. So you
>> write your own code to work with the XML data as it's being parsed...
>
> hmmm... that may be perfect for what I need to do, since I don't need
> to parse the whole document, just the parts the user is interested
> in.  It is unlikely that user will ever need to get at the vast
> majority of the data in there.

Yes, and now that I see how this works, I'll add it to my XML Library too...

:-)

Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]



Re: massive xml docs

2006-02-09 Thread Ruslan Zasukhin
On 2/9/06 9:15 AM, Ken Ray [EMAIL PROTECTED] wrote:

>>> You can use revcreatexmltree, and just tell it to not build the dom
>>> tree/or keep the document in memory, and handle the messages.
>>>
>>> revCreateXMLTree(field "XML Data",true,false,true)
>>
>> But won't that still choke on something as big as 100MB?
>
> If I read it correctly, the answer is no - basically, with true as the
> last param, you're telling the DLL to send you a message when it's about to
> start parsing the tree, when it encounters each node (so you can extract
> attributes from the node) and when it encounters data for the node. So you
> write your own code to work with the XML data as it's being parsed...

So it sounds like SAX, in fact. Right?

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




Re: massive xml docs

2006-02-09 Thread Ruslan Zasukhin
On 2/9/06 10:02 AM, Ken Ray [EMAIL PROTECTED] wrote:

>>> If I read it correctly, the answer is no - basically, with true as the
>>> last param, you're telling the DLL to send you a message when it's about to
>>> start parsing the tree, when it encounters each node (so you can extract
>>> attributes from the node) and when it encounters data for the node. So you
>>> write your own code to work with the XML data as it's being parsed...
>>
>> hmmm... that may be perfect for what I need to do, since I don't need
>> to parse the whole document, just the parts the user is interested
>> in.  It is unlikely that user will ever need to get at the vast
>> majority of the data in there.
>
> Yes, and now that I see how this works, I'll add it to my XML Library too...

Ken, Todd,

I think something is wrong here.

If revCreateXMLTree() does not build a DOM, then it works as SAX.

Ken writes:
  - it will send you an event on each tag -- this is SAX.
  - you need to write your own code to handle tags, attributes, data -- this is SAX.

Todd, you CANNOT parse only PART of an XML document.

XML is a text document. You (actually not you, but the XML parser) MUST load
and read it sequentially, byte after byte. You see?

-
As was already mentioned:

* SAX
advantages:
- is good for ONE iteration over an XML document
- uses little RAM
- so it can be used for huge XML documents. But again, only for one
  iteration, e.g. transformation of one XML doc into another

disadvantages:
- the developer needs to write a lot of code to handle tags
- bad if many iterations are needed


* DOM
advantages:
- is good for MANY iterations, down and up
- easier code to use

disadvantages:
- uses a lot of RAM
- limited to documents that fit in RAM.
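
The trade-off in the two lists can be sketched in Python, used here only as a stand-in for Revolution. The sample data and function names are invented; `ET.iterparse` plays the part of the one-pass streaming approach and `ET.fromstring` the part of the in-memory tree (ElementTree is not a W3C DOM, but the memory behaviour is similar in kind).

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"<db><item>a</item><item>b</item><other/></db>"

def stream_texts(source, tag):
    """One sequential pass; each matching element is emptied once handled."""
    texts = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            texts.append(elem.text)
            elem.clear()  # release this element's text and children
    return texts

def tree_texts(root, tag):
    """Walk an in-memory tree; can be repeated as often as needed."""
    return [elem.text for elem in root.iter(tag)]

streamed = stream_texts(io.BytesIO(SAMPLE), "item")  # needs a fresh pass per question
root = ET.fromstring(SAMPLE)                         # whole document held in RAM
print(streamed, tree_texts(root, "item"), tree_texts(root, "other"))
```

The streaming function has to re-read the source for every new question, while the tree can be queried again and again at the price of keeping everything in memory.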


-
Todd, you make the point that you need FAST speed.

If you think that 500 MB of RAM is okay for your app, then go with DOM.

If that is not okay, you need special tools.
Ideally these tools should also be able to handle queries on the XML document.
This can be XPath, XQuery, or SQL/XML.

You have no other options.


Another important question:
Do you have a STATIC XML document? Is the data fixed?
Will you send it to many of your users?

Or will each user have their own document?


If you have a static document, then you simply need to parse it ONCE and load
the data into a database. If the document is static, there is no sense in
parsing it millions of times, once per query.
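
A minimal sketch of "parse once, then query the database", written in Python with its bundled SQLite module rather than in Revolution or Valentina. The table layout (`node` with `id`, `parent_id`, `tag`, `text`) and the sample document are assumptions made up for this example, not anything from a real schema.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_into_db(source, conn):
    """Stream the XML once and store one row per element."""
    conn.execute("CREATE TABLE IF NOT EXISTS node ("
                 "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)")
    stack = []    # ids of the currently open ancestor elements
    next_id = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            next_id += 1
            parent = stack[-1] if stack else None
            conn.execute("INSERT INTO node (id, parent_id, tag) VALUES (?, ?, ?)",
                         (next_id, parent, elem.tag))
            stack.append(next_id)
        else:
            # text is only guaranteed to be complete at the 'end' event
            conn.execute("UPDATE node SET text = ? WHERE id = ?",
                         ((elem.text or "").strip(), stack.pop()))
            elem.clear()
    conn.commit()

conn = sqlite3.connect(":memory:")
sample = b"<root><table>People</table><table>Orders</table><script>Import</script></root>"
load_into_db(io.BytesIO(sample), conn)

# Later queries touch only the rows they need, never the XML file itself.
top = conn.execute("SELECT id, tag FROM node WHERE parent_id IS NULL").fetchall()
children = conn.execute(
    "SELECT tag, text FROM node WHERE parent_id = ? ORDER BY id", (top[0][0],)).fetchall()
print(top, children)
```

With an index on `parent_id`, showing the children of whichever node the user selects becomes a small lookup instead of a new parse.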


-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




Re: massive xml docs

2006-02-09 Thread Klaus Major

Hi Tuviah,

>> can write a SAX processor, in which case you won't have the whole tree
>> in memory, but your sp
>
> You can use revcreatexmltree, and just tell it to not build the dom
> tree/or keep the document in memory, and handle the messages.
> revCreateXMLTree(field "XML Data",true,false,true)


very glad to see (read) that you are still alive and well :-)


> Tuviah Snyder
> www.mddev.com


Best from germany

Klaus Major
[EMAIL PROTECTED]
http://www.major-k.de



Re: massive xml docs

2006-02-09 Thread Mark Wieder
Todd-

Can you just work with the DTD? And then query the xml document for
data elements you're interested in?

-- 
-Mark Wieder
 [EMAIL PROTECTED]



Re: massive xml docs

2006-02-09 Thread Mark Wieder
> Ideally these tools should also be able to handle queries on the XML document.
> This can be XPath, XQuery, or SQL/XML.

...and I'd advise staying away from XPath as well. It's gotten
splintered into too many proprietary spinoffs.

XQuery is much easier to read (IMO) and is pretty standardized these
days. The only thing you can't do with it is update a document.

To my mind, a 100MB xml document is poorly designed. It should be
segmented into a hierarchy of smaller documents or exported to a
database. But nobody asked me.

-- 
-Mark Wieder
 [EMAIL PROTECTED]



Re: massive xml docs

2006-02-09 Thread Todd Geist


On Feb 9, 2006, at 1:01 PM, Mark Wieder wrote:


> Todd-
>
> Can you just work with the DTD? And then query the xml document for
> data elements you're interested in?


I am not sure.  But that might work.


Todd

--

Todd Geist
__
g e i s t   i n t e r a c t i v e



Re: massive xml docs

2006-02-09 Thread Bruce Robertson
>> Ideally these tools should also be able to handle queries on the XML document.
>> This can be XPath, XQuery, or SQL/XML.
>
> ...and I'd advise staying away from XPath as well. It's gotten
> splintered into too many proprietary spinoffs.
>
> XQuery is much easier to read (IMO) and is pretty standardized these
> days. The only thing you can't do with it is update a document.
>
> To my mind, a 100MB xml document is poorly designed. It should be
> segmented into a hierarchy of smaller documents or exported to a
> database. But nobody asked me.

The 100MB file IS (or can be) one of the hierarchy of documents for this
application.



Massive XML docs

2006-02-08 Thread Todd Geist

Hello Everyone.

I have some very large XML files (100 MB or more) that I would like to
parse. And this thing has to be fast.  I mean FAST. Since I have done
almost no work with rev and XML, I am looking for advice on how to
proceed.


The user experience needs to be something like this...

I select the XML and very quickly I see the info about the top-level
nodes.


I select one of the top-level nodes and I am very quickly presented
with more detail on it, which may include all of its children and
also related information from other parts of the XML document.


I continue to work my way around the document by simply selecting
the elements that I need more info on.



At no point will a user ever need to see all the data that is in the
XML doc.  They are almost always looking for info on just one element
buried in there.  What I don't know is this: if the user is just walking
around the XML document one piece at a time, do I need to load the
whole thing into RAM in a revXMLTree? And if I do, won't that just be
insane for a 100 MB XML file?  Or can I do it one step at a time as the
user requests more detailed info?


Can Valentina or SQLite be employed to help the situation?

Or would it be faster to parse it into a whole slew of custom props?

Any ideas and/or thoughts would be much appreciated.

Thanks

Todd



--

Todd Geist
__
g e i s t   i n t e r a c t i v e



Re: Massive XML docs

2006-02-08 Thread Ruslan Zasukhin
On 2/8/06 5:50 PM, Todd Geist [EMAIL PROTECTED] wrote:

Hi Todd,

> Hello Everyone.
>
> I have some very large XML files (100 MB or more) that I would like to
> parse. And this thing has to be fast.  I mean FAST. Since I have done
> almost no work with rev and XML, I am looking for advice on how to
> proceed.
>
> The user experience needs to be something like this...
>
> I select the XML and very quickly I see the info about the top-level
> nodes.
>
> I select one of the top-level nodes and I am very quickly presented
> with more detail on it, which may include all of its children and
> also related information from other parts of the XML document.
>
> I continue to work my way around the document by simply selecting
> the elements that I need more info on.
>
> At no point will a user ever need to see all the data that is in the
> XML doc.  They are almost always looking for info on just one element
> buried in there.  What I don't know is this: if the user is just walking
> around the XML document one piece at a time, do I need to load the
> whole thing into RAM in a revXMLTree? And if I do, won't that just be
> insane for a 100 MB XML file?  Or can I do it one step at a time as the
> user requests more detailed info?

Is revXMLTree a DOM model?

If yes, then expect such an in-RAM tree to be about 5 times larger than the
original XML document. On the other hand, DOM is the fastest way to iterate a tree.

 
> Can Valentina or SQLite be employed to help the situation?
> Or would it be faster to parse it into a whole slew of custom props?
> Any ideas and/or thoughts would be much appreciated.

Todd, you have touched on a HUGE issue.

First of all, we need to understand your task better:
- So you parse the document and extract info; what next?
  Must your user be able to run queries against it?

Then it sounds like a job for a DBMS.

For your task there are a few streams in the world:

A) work on the XML itself - so-called native XML DBs
B) put the XML into a database.

It looks like the major DBMS vendors such as Oracle, IBM and MS are winning
the war, so stream B) is becoming the main one.


I can make a small announcement: over the last few months we have been
developing new XML features in Valentina which will be comparable to Oracle,
IBM and MS. Soon we will introduce the first wave of them.

To get more details please subscribe to the Valentina beta list.

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




Re: Massive XML docs

2006-02-08 Thread Mark Wieder
Todd-

I'm with Ruslan on this one. DOM is the fastest way to access xml
information, but the tree size will be huge (are you talking about a
single 100MB file rather than multiple files adding up to 100MB?). You
can write a SAX processor, in which case you won't have the whole tree
in memory, but your speed will drop noticeably. XML is nice in that
it's human-parseable as well as machine-parseable (notice I didn't say
readable), but it's quite a bloated format for both storage and
searching. Native xml databases provide speed for data storage and
retrieval but lack in querying speed unless they use significant
resources in indexing.

-- 
-Mark Wieder
 [EMAIL PROTECTED]



Re: Massive XML docs

2006-02-08 Thread Ruslan Zasukhin
On 2/8/06 8:59 PM, Mark Wieder [EMAIL PROTECTED] wrote:

Hi Mark, 
Hi Todd,

> Todd-
>
> I'm with Ruslan on this one. DOM is the fastest way to access xml information,
> but the tree size will be huge (are you talking about a single 100MB file
> rather than multiple files adding up to 100MB?).
>
> You can write a SAX processor, in which case you won't have the whole tree in
> memory, but your speed will drop noticeably.

SAX is not good for many iterations. Usually it is used to parse all the data
and, e.g., transform it or store it in a DBMS.
 
> XML is nice in that it's human-parseable as well as machine-parseable (notice
> I didn't say readable), but it's quite a bloated format for both storage and
> searching.

right

> Native xml databases provide speed for data storage and retrieval but lack in
> querying speed unless they use significant resources in indexing.

Regarding native XML databases:

* they tend to use XPath and/or XQuery as the query language.
* they do indexing like a DBMS does, BUT ...

I can bet that quite soon they will be history.
I have seen similar opinions on oracle.com :-)
The Oracle guys point out that for the last 2 years the vendors of native XML
DBs have taken almost no part in the development of the XQuery standard and so
on. This job is done mainly by Oracle, IBM and MS.

Again, IMHO, this was just a fashion. A text format is still a text format.
In the 1970s programmers invented databases exactly to get away from text formats.

We have had a lot of discussion recently in the team, with other developers
who make heavy use of e.g. MS, and with our university guys, about what
advantages XML gives and where it should live.

Our summary: at least in these areas:
- the Web
- data transfer

One developer told me how happy he is that he built his C# .NET
application (accounting for German customers) with an XML middle layer, so
the application code does not depend directly on the database structure. An
interesting point of view, I should say.

-- 
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]




Re: massive xml docs

2006-02-08 Thread tsnyder
> can write a SAX processor, in which case you won't have the whole tree
> in memory, but your sp

You can use revcreatexmltree, and just tell it to not build the dom
tree/or keep the document in memory, and handle the messages.

revCreateXMLTree(field "XML Data",true,false,true)

Tuviah Snyder
www.mddev.com



Re: massive xml docs

2006-02-08 Thread Todd Geist


On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:


> You can use revcreatexmltree, and just tell it to not build the dom
> tree/or keep the document in memory, and handle the messages.
>
> revCreateXMLTree(field "XML Data",true,false,true)


But won't that still choke on something as big as 100MB?

Todd




--

Todd Geist
__
g e i s t   i n t e r a c t i v e



Re: massive xml docs

2006-02-08 Thread Ken Ray
On 2/8/06 11:26 PM, Todd Geist [EMAIL PROTECTED] wrote:

 
> On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:
>
>> You can use revcreatexmltree, and just tell it to not build the dom
>> tree/or keep the document in memory, and handle the messages.
>>
>> revCreateXMLTree(field "XML Data",true,false,true)
>
> But won't that still choke on something as big as 100MB?

If I read it correctly, the answer is no - basically, with true as the
last param, you're telling the DLL to send you a message when it's about to
start parsing the tree, when it encounters each node (so you can extract
attributes from the node) and when it encounters data for the node. So you
write your own code to work with the XML data as it's being parsed...
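
To make the callback style concrete, here is a rough sketch using Python's `xml.sax` module as a stand-in. It is not the Revolution API: the class, the element names (`rec`, `name`) and the sample data are all hypothetical. The three methods correspond loosely to "node encountered", "data for the node" and "node finished".

```python
import xml.sax

class NameCollector(xml.sax.ContentHandler):
    """Keep only the pieces we care about while the parser streams past the rest."""

    def __init__(self):
        super().__init__()
        self.ids = []        # 'id' attribute of every <rec>
        self.names = []      # text of every <name>
        self._buffer = None  # a list while inside <name>, otherwise None

    def startElement(self, name, attrs):
        # Called when a node is encountered; attributes are available here.
        if name == "rec":
            self.ids.append(attrs.get("id"))
        elif name == "name":
            self._buffer = []

    def characters(self, content):
        # Called with the node's data, possibly in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buffer))
            self._buffer = None

handler = NameCollector()
xml.sax.parseString(
    b"<root><rec id='1'><name>Ann</name></rec><rec id='2'><name>Bob</name></rec></root>",
    handler)
print(handler.ids, handler.names)
```

The parser still reads the whole input from start to finish; only the handler decides what is worth keeping.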


Ken Ray
Sons of Thunder Software
Web site: http://www.sonsothunder.com/
Email: [EMAIL PROTECTED]



Re: massive xml docs

2006-02-08 Thread Todd Geist


On Feb 8, 2006, at 11:15 PM, Ken Ray wrote:



> If I read it correctly, the answer is no - basically, with true as the
> last param, you're telling the DLL to send you a message when it's about to
> start parsing the tree, when it encounters each node (so you can extract
> attributes from the node) and when it encounters data for the node. So you
> write your own code to work with the XML data as it's being parsed...


hmmm... that may be perfect for what I need to do, since I don't need  
to parse the whole document, just the parts the user is interested  
in.  It is unlikely that user will ever need to get at the vast  
majority of the data in there.


I wonder...

Todd

--

Todd Geist
__
g e i s t   i n t e r a c t i v e



Re: massive xml docs

2006-02-08 Thread Bruce Robertson
 
>> On Feb 8, 2006, at 9:44 PM, [EMAIL PROTECTED] wrote:
>>
>>> You can use revcreatexmltree, and just tell it to not build the dom
>>> tree/or keep the document in memory, and handle the messages.
>>>
>>> revCreateXMLTree(field "XML Data",true,false,true)
>>
>> But won't that still choke on something as big as 100MB?
>
> Todd

Why Mr. Geist, we seem to be asking the same questions. About FMPro DDR docs
I suspect.



Re: Massive XML docs

2006-02-08 Thread Bruce Robertson
> Hello Everyone.
>
> I have some very large XML files (100 MB or more) that I would like to
> parse. And this thing has to be fast.  I mean FAST. Since I have done
> almost no work with rev and XML, I am looking for advice on how to
> proceed.

I've been asking the same questions of Emmanuel over at Satimage. (SUL -
Smile User List)
