On Sat, Jan 21, 2012 at 3:00 AM, Tobias Bocanegra <tri...@adobe.com> wrote:
> nt:unstructured has orderable childnodes, and we also preach to use
> nt:unstructured whereever you can. Also, i assume that a lot of
> applications benefit from node ordering (like a CMS). So I think that
> the majority of the repositories have a lot of nodes with ordered
> childnodes.


agreed, at least for small child node lists (e.g. < 1k child nodes).
but i doubt there's a need for large (e.g. 1M entries)  orderable
child node lists.

> thus i would not put too much effort in having an
> efficient non-ordered format, but making the ordered storage as fast
> as possible.

i am thinking of large child node collections with stable order.
however, they wouldn't be insertion-ordered nor client-orderable.
that's the compromise.

> using lexicographical ordering for unordered child nodes
> would certainly be a good option,

that would be an implementation detail. as long as the order
is stable (repeated reads of the same node revision should
provide the same order of the child nodes) we shouldn't make
any promises about how the order is defined.

cheers
stefan

> [...] as i can reduce the lookup (but not
> insert). also, i think that re-orderings happen relatively rarely.
>
> regards, toby
>
> On Fri, Jan 20, 2012 at 9:13 AM, Stefan Guggisberg
> <stefan.guggisb...@gmail.com> wrote:
>> On Fri, Jan 20, 2012 at 5:43 PM, Jukka Zitting <jukka.zitt...@gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> Thinking about this more generally, it would definitely be useful to
>>> be able to use a more efficient underlying storage for unorderable
>>> nodes. However, there still are hard use cases that require us to
>>> support orderable nodes as indicated by the type of the parent node.
>>> Thus the implementation basically has two options:
>>>
>>> 1) Use a single data structure for all nodes, like Jackrabbit
>>> currently does. This simplifies the implementation but prevents us
>>> from enjoying the performance and scalability benefits of unorderable
>>> nodes.
>>>
>>> 2) Use two data structures, one for orderable and another for
>>> unorderable nodes. This is more complex (not least because it links
>>> node types to the underlying storage model), but is probably required
>>> if we want to efficiently support up to millions of child nodes per a
>>> single parent.
>>
>> i agree that it's probably required for efficiently handling both large
>> and small child node lists.
>>
>> i am not sure that it necessarily means linking node types to the
>> underlying storage modal. the microkernel prototype currently has
>> no notion of node types and that's IMO a good thing.
>>
>> node types have a way to dominant role in the current jackrabbit
>> core implementation. jackrabbit started out as the official reference
>> implementation for jsr-170, the focus was on supporting every
>> feature of the spec.
>>
>> in jr3 i guess we can and should trade some 'note type correctness'
>> for improved efficiency.
>>
>>>
>>> It might also be possible to have the underlying storage model
>>> unorderable, and just include extra ordering metadata at a higher
>>> layer where also node types are handled. Option 2b, if you like, with
>>> probably fairly significant overhead when iterating over nodes (the
>>> implementation would probably need to pre-fetch all child nodes in
>>> advance to access their ordering metadata).
>>>
>>> If we do have native support for unorderable nodes, then I wouldn't
>>> promise any particular ordering (alphabetic, insertion order, etc.)
>>> but rather leave it undefined like in Java's Map interface. The
>>> underlying implementation is then free to use whatever ordering it
>>> thinks is best.
>>
>> i agree.
>>
>> cheers
>> stefan
>>
>>>
>>> PS. Note that ordering affects not just node traversal, but also the
>>> document ordering used in search results. Though in practice we
>>> already now disable document ordering of search results by default.
>>>
>>> BR,
>>>
>>> Jukka Zitting

Reply via email to