Re: [Ohrrpgce] Node access by path (RPath?)

Ralph Versteegen Wed, 02 Jun 2010 19:07:18 -0700

On 3 June 2010 11:33, Mike Caron wrote:
> On 02/06/2010 7:24 PM, James Paige wrote:
>>
>> On Wed, Jun 02, 2010 at 07:14:54PM -0400, Mike Caron wrote:
>>>
>>> On 02/06/2010 7:09 PM, James Paige wrote:
>>>>
>>>> On Wed, Jun 02, 2010 at 07:03:37PM -0400, Mike Caron wrote:
>>>>>
>>>>> On 02/06/2010 7:02 PM, James Paige wrote:
>>>>>>
>>>>>> On Wed, Jun 02, 2010 at 06:44:30PM -0400, Mike Caron wrote:
>>>>>>>
>>>>>>> On 02/06/2010 5:52 PM, James Paige wrote:
>>>>>>>>
>>>>>>>> Is this an accurate description of what Rpath is supposed to do?
>>>>>>>>
>>>>>>>> node = Reload.RPath(doc, "/party/slot[3]/stats/stat[0]/max")
>>>>>>>>
>>>>>>>> Which would look in the root node for a node named "party",
>>>>>>>> then look in the party node for a node named "slot" with int value
>>>>>>>> 3,
>>>>>>>> then look in the slot node for a node named stats,
>>>>>>>> then look in the stats node for a node named "stat" with int value
>>>>>>>> 0,
>>>>>>>> then look in the stat node for a node named "max" and return it
>>>>>>>>
>>>>>>>> If that is indeed what RPath is supposed to do, I feel up to the
>>>>>>>> task of
>>>>>>>> implementing it.
>>>>>>>>
>>>>>>>> And if that is NOT what RPath is supposed to mean, then I still feel
>>>>>>>> up
>>>>>>>> to implementing it, but I will call it something different :)
>>>>>>>
>>>>>>> Almost. RPath is intended to mimic XPath, which would interpret your
>>>>>>> example as:
>>>>>>>
>>>>>>> Match the root node named "party", then
>>>>>>>    Match the third sub-node named "slot", then
>>>>>>>     Match the first node named "stats", then
>>>>>>>      Throw an error, because indices are one-based, but if it was a
>>>>>>> one,
>>>>>>>      Match the first node named "stat", then
>>>>>>>       Match the first node named "max"
>>>>>>>        Return this node
>>>>>>>
>>>>>>> To do what you suggest, the expression would look like:
>>>>>>>
>>>>>>> party/slot[.="3"]/stats/stat[.="0"]/max
>>>>>>>
>>>>>>> And match it against the document root, whatever it may be.
>>>>>>> Explanation:
>>>>>>>
>>>>>>> 1. A leading slash matches the *document*, not the root node.
>>>>>>> (Actually,
>>>>>>> in XML, there's a hidden node above the first tag. The leading slash
>>>>>>> matches that)
>>>>>>> 2. The dot represents the "current" node
>>>>>>> 3. The expressions inside the brackets are like WHERE clauses in SQL
>>>>>>> 4. A number by itself means the Xth node.
>>>>>>>
>>>>>>> You can do stuff like
>>>>>>>
>>>>>>> node[@foo="bar"]
>>>>>>>
>>>>>>> Which matches "node" who has an attribute "foo" whose value is "bar",
>>>>>>> and is the intended way to match nodes (Incidentally, this is why I
>>>>>>> reserved the leading @ for attributes converted from XML).
>>>>>>>
>>>>>>> So, where does this leave RPath (which I have not been describing so
>>>>>>> far)? I suggest we stick closer to XPath, but perhaps with some
>>>>>>> limitations:
>>>>>>>
>>>>>>> 1. XPath allows a bunch of pseudo functions inside the brackets,
>>>>>>> things like
>>>>>>>
>>>>>>> node[next(.) == "foo"]
>>>>>>>
>>>>>>> Which would match "node", whose next sibling's value equalled "foo".
>>>>>>> We
>>>>>>> don't need that.
>>>>>>>
>>>>>>> 2. Allow matching based on children alone, so we don't need . to mean
>>>>>>> the current node.
>>>>>>>
>>>>>>> 3. Recommend that people don't use the content of the node itself to
>>>>>>> distinguish between similar nodes. Use children for that. (Or,
>>>>>>> better,
>>>>>>> give the nodes distinct names)
>>>>>>>
>>>>>>> If you need clarification, let me know (or pull me into IRC or
>>>>>>> something)
>>>>>>>
>>>>>>
>>>>>> Okay. Sounds like RPath is a lot heavier than what I have in mind
>>>>>>
>>>>>> I will continue with my simplified implementation, but I will call it
>>>>>> something else to avoid confusion.
>>>>>>
>>>>>> Probably Reload.Ext.NodeByPath
>>>>>
>>>>> Maybe, but really, my entire email could probably have been distilled
>>>>> into the following sentence:
>>>>>
>>>>> "Correct, except I would recommend that [3] means the third node, not a
>>>>> node whose value is 3".
>>>>
>>>> The reason I don't want to do that is because of sparse arrays.
>>>>
>>>> For example, when I write the globals into the rsav file, I only write
>>>> the non-zero values.
>>>>
>>>> Each "global" node has a value which is the global id number, and a
>>>> child node named "int" which has its integer value.
>>>>
>>>> So I want "/script/globals/global[47]/int" to give me the node
>>>> containing the value for global ID 47, not the 47th global saved in the
>>>> file.
>>>
>>> Well, I guess I can agree with that, but perhaps a better way would be
>>> to store them like:
>>>
>>> <globals>
>>>   <global1>123</global1>
>>>   <global43>1</global43>
>>> </globals>
>>>
>>> Or, even,
>>>
>>> <globals>
>>>   <global>
>>>     <num>1</num>
>>>     <val>123</val>
>>>   </global>
>>>   <global>
>>>     <num>43</num>
>>>     <val>1</val>
>>>   </global>
>>> </globals>
>>>
>>> I'm partial to the first one, especially since another idea I had on the
>>> drawing-board (in my brain, not on the wiki page) was to have each node
>>> maintain a hash with all its children. Then, finding global2345 would be
>>> really fast, much faster than iterating each child and looking for one
>>> whose value is 2345.

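To make the two readings of "[3]" in the quoted discussion concrete,
here is a rough Python sketch. It is not RELOAD code: the Node class,
the node_by_path function and its "positional" flag are all made up for
illustration, and it only handles plain names with an optional [n].

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a RELOAD node: a name, an optional int
    # value, and an ordered list of children.
    name: str
    value: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


# One path step: a name, optionally followed by [digits].
STEP = re.compile(r"^([^\[\]/]+)(?:\[(\d+)\])?$")


def node_by_path(root, path, positional=False):
    """Walk a path such as "/party/slot[3]/stats" down from root.

    positional=False: [n] picks the child whose int value is n.
    positional=True:  [n] picks the n-th child with that name (one-based).
    Returns None as soon as a step matches nothing.
    """
    node = root
    for step in path.strip("/").split("/"):
        m = STEP.match(step)
        if m is None:
            raise ValueError("bad path step: %r" % step)
        name, index = m.group(1), m.group(2)
        named = [c for c in node.children if c.name == name]
        if index is None:
            matches = named
        elif positional:
            n = int(index)
            if n < 1:
                raise ValueError("positional indices are one-based")
            matches = named[n - 1:n]
        else:
            matches = [c for c in named if c.value == int(index)]
        if not matches:
            return None
        node = matches[0]
    return node


# Sparse globals: only IDs 1 and 47 were saved.
g1 = Node("global", 1, [Node("int", 123)])
g47 = Node("global", 47, [Node("int", 5)])
doc = Node("doc", None, [Node("script", None, [Node("globals", None, [g1, g47])])])
by_value = node_by_path(doc, "/script/globals/global[47]/int")
by_position = node_by_path(doc, "/script/globals/global[2]/int", positional=True)
```

With the sparse data above, both calls land on the same "int" node, but
only the value-matching form lets the caller ask for global ID 47
without knowing how many globals were written before it.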

Using "global2345" seems worse in every way to me. It takes up more
space, both on disk and in memory (I calculate about 28 bytes per
global instead of 18). It's annoying and slow for the global loading
code to process: to handle node "global2345" you need to check whether
the name is of the form "globalXXXX", then extract the number and
convert it to an int. It breaks abstractions (you no longer have a
"global", you have an unrecognisable "global2345"). And finally,
there's no reason you can't create a hash table of child node
names+values, so fast lookup doesn't require putting the number in the
name.
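
As a rough illustration of those last two points, here is a Python
sketch (not engine code; Node, build_child_index and
parse_numbered_name are hypothetical names). It shows a hash keyed on
(name, value) pairs next to the name parsing a loader would need for
nodes called "global2345".

```python
from collections import namedtuple

# Hypothetical minimal node: a name, an int value (or None), and children.
Node = namedtuple("Node", ["name", "value", "children"])


def build_child_index(parent):
    """Map (child name, child value) to the first child with that pair."""
    index = {}
    for child in parent.children:
        index.setdefault((child.name, child.value), child)
    return index


def parse_numbered_name(name, prefix="global"):
    """The extra work needed when the ID is baked into the node name.

    Checks the "globalXXXX" form, then extracts and converts the number.
    Returns None if the name does not have that form.
    """
    suffix = name[len(prefix):]
    if name.startswith(prefix) and suffix.isdecimal():
        return int(suffix)
    return None


globals_node = Node("globals", None, [
    Node("global", 1, [Node("int", 123, [])]),
    Node("global", 43, [Node("int", 1, [])]),
])
index = build_child_index(globals_node)
found = index.get(("global", 43))    # one dict lookup, no scan over children
missing = index.get(("global", 2))   # this ID was never saved
```

The node names stay plain "global", and the lookup is still a single
hash probe.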

On an unrelated note, let's not worry too much about how globals are
stored, because I think we agreed that RELOAD is not an appropriate
file format for storing script objects. So the globals array will
probably eventually be removed and replaced with a big opaque binary
lump that contains the script interpreter state.

>> faster is good :)
>> I can see how that would improve seek time. What impact would that hash
>> have on memory usage and loading time?
>
> Loading time would be increased, but by a negligible amount (like, it would
> only have a measurable effect on the massive files we were playing with the
> other day.)

That's true as long as you only create hash tables for nodes with more
than X children (e.g. X = 6). A hash table for every node would
probably nearly double memory usage and loading time.
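
A minimal sketch of that threshold idea in Python, assuming a made-up
Node class with a find_child method (none of this is existing RELOAD
API, and 6 is just the example value for X): small nodes are scanned
linearly, and a hash is only built, lazily, for nodes with more than X
children.

```python
INDEX_THRESHOLD = 6  # the "X" above; 6 is only the example value


class Node:
    """Hypothetical node that indexes its children only when there are many."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.children = []
        self._index = None  # stays None for small nodes

    def add_child(self, child):
        self.children.append(child)
        self._index = None  # any cached index is now stale
        return child

    def find_child(self, name, value=None):
        """Return the first child with this exact name and value, or None."""
        if len(self.children) <= INDEX_THRESHOLD:
            # Few children: a linear scan is cheap and costs no extra memory.
            for child in self.children:
                if child.name == name and child.value == value:
                    return child
            return None
        if self._index is None:
            # Many children: build the hash once, on first lookup.
            self._index = {}
            for child in self.children:
                self._index.setdefault((child.name, child.value), child)
        return self._index.get((name, value))


small = Node("stats")
small.add_child(Node("stat", 0))
big = Node("globals")
for i in range(100):
    big.add_child(Node("global", i * 2))
hit = big.find_child("global", 94)
miss = big.find_child("global", 95)
small_hit = small.find_child("stat", 0)
```

Building the index on first lookup rather than at load time also means
nodes that are never searched pay nothing, though that goes a bit
beyond what is proposed above.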

>> I was experimenting with:
>>
>> <globals>
>>   <1>123</1>
>>   <43>1</43>
>> </globals>
>>
>> but TMC talked me out of it :)
>
> Wait, what's wrong with that? That's my alternate suggestion #1! (except,
> mine involves more string concatenation!)
>
>> Perhaps it needs further debate.
>
> I would like to hear TMC's opinion on this.
>
>> ---
>> James
>
>
> --
> Mike
> _______________________________________________
> Ohrrpgce mailing list
> [email protected]
> http://lists.motherhamster.org/listinfo.cgi/ohrrpgce-motherhamster.org
>