Re: How to implement a queue in Oak?
On 01/08/2014 10:42, Tommaso Teofili wrote: I don't yet have a proper proposal, but maybe what could be done may be something similar to what Davide has done with regards to the ordered index (using a skiplist), that is defining a specific structure of the nodes, together with a specific implementation, that avoids having to use sortable node types but is an inherently sorted structure, e.g. binary search tree. Any opinions? Unfortunately both the above options won't efficiently work for a reason or another as we discovered during the implementations of the ordered index. Skip list: * doesn't really scale with large numbers (500k-1M) as of the randomised access due to the probabilistic data structure. On the Segment case for example, fetching the next property resulted in an expensive operations with big data as you end-up most probably fetching a new segment that is not cached (read from disk). * in case of mongo back-end is not concurrent due to the storing of the next property under the node. The stack will raise a merge exception and it can't be catch by the commit hook. That's why we have to keep it asynchronous when running on mongo. Segment is not affected. BTree: Considered it but it requires sooner or later a big tree moving due to tree restructure and this is not efficient with oak. With the result that some CUD operations could take long to complete. Hopefully if the new algorithm will result in a fully working and scalable one we could consider making it a node type. Cheers Davide
Re: How to implement a queue in Oak?
Hi I have the impression, that JCR itself can adequately solve the problem using ordered child nodes. The problem is not JCR per-se but the implementation and the early-on decision to not optimize the use case of ordered child nodes (in favor of having support for optimized implementation of the unsorted case). Now with Oak, we also have customizable indices. Would it be possible to define an ordering property, index that property and then use a query to get these nodes. The query could be created such that the ordering property index would be considered (only). As a result we get quick and transparent sorted nodes ? Regards Felix Am 01.08.2014 um 07:56 schrieb Carsten Ziegeler cziege...@apache.org: I'm wondering if anyone has a good idea how to model a queue with efficient operations in JCR - or is JCR not suited for this use case? Regards Carsten 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler cziege...@apache.org: Using a different storage than JCR would be easy in my case, however I *want* to use JCR Carsten 2014-07-30 14:55 GMT+02:00 Lukas Smith sm...@pooteeweet.org: Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
Hi, On Mon, Aug 4, 2014 at 8:04 AM, Felix Meschberger fmesc...@adobe.com wrote: I have the impression, that JCR itself can adequately solve the problem using ordered child nodes. The problem is not JCR per-se but the implementation and the early-on decision to not optimize the use case of ordered child nodes (in favor of having support for optimized implementation of the unsorted case). The preference against ordering is more than just a performance optimization. Maintaining strict ordering semantics is impossible in an eventually consistent system with no global synchronization. For example, if two clients on different cluster nodes add entries at the end of the list at the same time, which entry should come first? And how do we ensure that the ordering remains consistent across all clients? Now with Oak, we also have customizable indices. Would it be possible to define an ordering property, index that property and then use a query to get these nodes. The query could be created such that the ordering property index would be considered (only). As a result we get quick and transparent sorted nodes? Yes, that's possible. My order by node name suggestion works roughly in the same way. The problem with these approaches is that you won't get strict ordering semantics without some external synchronization or a global clock to decide how the values of the ordering property should be assigned. However, it sounds like in Carsten's case relaxed ordering semantics based on millisecond timings from the system clock should be good enough. BR, Jukka Zitting
Re: How to implement a queue in Oak?
Hi, On Fri, Aug 1, 2014 at 11:58 AM, Carsten Ziegeler cziege...@apache.org wrote: in my case I have several producers and a single consumer, so your second suggestion should do the trick. Actually, the getChildNodeNames() feature is only available at Oak level, so you'll need to use JCR's getNodes() call to get something like this: // consumes all entries currently in the queue, call again later to continue void consume(Node queue) { // collect and sort all current entries in the queue MapString, Node entries = new TreeMapString, Node(); for (Node node : JcrUtils.getNodes(queue)) { entries.put(node.getName(), node); } // process all current entries in correct order for (Node entry : entries.values()) { processEntry(entry); entry.remove(); } queue.getSession().save(); } Can you make any comments about the performance of such a solution? I assume Node#getChildNodeNames() is pretty cheap, what about the addNode, getNode, removeNode methods? The exact performance depends on the storage backend you're using, but generally I'd expect the performance of the above code to be governed by the processEntry() call if it needs to do any non-trivial work on the entries in the queue. The getChildNodeNames/getNodes call (including iteration over the results) is probably the most expensive individual call if the queue is long, but since its performance is O(n) with all Oak backends, and the constant factor is pretty low, I would expect that part to only take a fraction of the time spent in the above call. The addNode/getNode/removeNode calls are O(log n) operations or faster, so the overall performance of the above method would follow O(n log n), including the explicit sorting. The more expensive the processEntry() call is, the closer the performance will be to O(n), which is the best case limit for n processEntry() calls regardless of which queue structure is used. Note that good performance here is achieved by making the consumer process entries in bulk. If you instead have a need to process entries only one by one in separate method calls (i.e. you can't keep the sorted list of entries in memory), you'd end up with an O(n^2) algorithm to consume all entries in the queue, as each consume call would need to traverse the list of remaining entries to find the one that should be processed first. BR, Jukka Zitting
Re: How to implement a queue in Oak?
I don't yet have a proper proposal, but maybe what could be done may be something similar to what Davide has done with regards to the ordered index (using a skiplist), that is defining a specific structure of the nodes, together with a specific implementation, that avoids having to use sortable node types but is an inherently sorted structure, e.g. binary search tree. Any opinions? Regards, Tommaso 2014-08-01 7:56 GMT+02:00 Carsten Ziegeler cziege...@apache.org: I'm wondering if anyone has a good idea how to model a queue with efficient operations in JCR - or is JCR not suited for this use case? Regards Carsten 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler cziege...@apache.org: Using a different storage than JCR would be easy in my case, however I *want* to use JCR Carsten 2014-07-30 14:55 GMT+02:00 Lukas Smith sm...@pooteeweet.org: Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
There are the utilities in the org.apache.jackrabbit.commons.flat package, which were built for mapping flat structures to a JCR hierarchy. See the BTreeManager class for a good starting point. Michael On 1.8.14 7:56 , Carsten Ziegeler wrote: I'm wondering if anyone has a good idea how to model a queue with efficient operations in JCR - or is JCR not suited for this use case? Regards Carsten 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler cziege...@apache.org: Using a different storage than JCR would be easy in my case, however I *want* to use JCR Carsten 2014-07-30 14:55 GMT+02:00 Lukas Smith sm...@pooteeweet.org: Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
Hi, On Wed, Jul 30, 2014 at 7:27 AM, Carsten Ziegeler cziege...@apache.org wrote: Does this make sense? Is there anything else to be considered? You didn't mention whether you expect the queue to be concurrently accessed. If there is just a single producer and a single consumer, then you could just name the entries using a running sequence number, and there's no need for repository-level ordering. For example: void produce(Node queue) { long number = queue.getProperty(writeCount).getLong(); queue.addNode(entry + number); queue.setProperty(writeCount, number +1); queue.getSession().save(); } // consumes all entries currently in the queue, call again later to continue void consume(Node queue) { long original = queue.getProperty(readCount).getLong(); long number = original; while (queue.hasChild(entry + number)) { queue.getNode(entry + number++).remove(); } if (number != original) { queue.setProperty(readCount, number); queue.getSession().save(); } } If there is a single consumer but multiple concurrent producers, you could name the entries using timestamps and do explicit ordering in the consumer: void produce(Node queue) { queue.addNode(entry- + System.currentTimeMillis() + - + UUID.randomUUID()); queue.getSession().save(); } // consumes all entries currently in the queue, call again later to continue void consume(Node queue) { ListString names = Lists.newArrayList(queue.getChildNodeNames()); Collections.sort(names); for (String name : names) { queue.getNode(name).remove(); } queue.getSession().save(); } Note that the above does not guarantee strict ordering between entries produced at roughly the same time (typically within a few dozens of milliseconds). For that you'll need more explicit synchronization between the producers. Similarly, supporting concurrent consumers will likely require some extra synchronization between the consumers as the repository itself does not provide strict guarantees in this area (unless you want to rely on the features of specific backends like the TarMK). -- Jukka Zitting
Re: How to implement a queue in Oak?
Thanks Jukka, in my case I have several producers and a single consumer, so your second suggestion should do the trick. Can you make any comments about the performance of such a solution? I assume Node#getChildNodeNames() is pretty cheap, what about the addNode, getNode, removeNode methods? Carsten 2014-08-01 15:03 GMT+02:00 Jukka Zitting ju...@zitting.name: Hi, On Wed, Jul 30, 2014 at 7:27 AM, Carsten Ziegeler cziege...@apache.org wrote: Does this make sense? Is there anything else to be considered? You didn't mention whether you expect the queue to be concurrently accessed. If there is just a single producer and a single consumer, then you could just name the entries using a running sequence number, and there's no need for repository-level ordering. For example: void produce(Node queue) { long number = queue.getProperty(writeCount).getLong(); queue.addNode(entry + number); queue.setProperty(writeCount, number +1); queue.getSession().save(); } // consumes all entries currently in the queue, call again later to continue void consume(Node queue) { long original = queue.getProperty(readCount).getLong(); long number = original; while (queue.hasChild(entry + number)) { queue.getNode(entry + number++).remove(); } if (number != original) { queue.setProperty(readCount, number); queue.getSession().save(); } } If there is a single consumer but multiple concurrent producers, you could name the entries using timestamps and do explicit ordering in the consumer: void produce(Node queue) { queue.addNode(entry- + System.currentTimeMillis() + - + UUID.randomUUID()); queue.getSession().save(); } // consumes all entries currently in the queue, call again later to continue void consume(Node queue) { ListString names = Lists.newArrayList(queue.getChildNodeNames()); Collections.sort(names); for (String name : names) { queue.getNode(name).remove(); } queue.getSession().save(); } Note that the above does not guarantee strict ordering between entries produced at roughly the same time (typically within a few dozens of milliseconds). For that you'll need more explicit synchronization between the producers. Similarly, supporting concurrent consumers will likely require some extra synchronization between the consumers as the repository itself does not provide strict guarantees in this area (unless you want to rely on the features of specific backends like the TarMK). -- Jukka Zitting -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
I'm wondering if anyone has a good idea how to model a queue with efficient operations in JCR - or is JCR not suited for this use case? Regards Carsten 2014-07-30 15:57 GMT+02:00 Carsten Ziegeler cziege...@apache.org: Using a different storage than JCR would be easy in my case, however I *want* to use JCR Carsten 2014-07-30 14:55 GMT+02:00 Lukas Smith sm...@pooteeweet.org: Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org
Re: How to implement a queue in Oak?
Using a different storage than JCR would be easy in my case, however I *want* to use JCR Carsten 2014-07-30 14:55 GMT+02:00 Lukas Smith sm...@pooteeweet.org: Hi, I can totally see that it might be useful to be able to go through the Oak/JCR API to have a queue but maybe this is stretching Oak a bit far if you end up with 1k+ queues. However I think it would be great to look more into federation for this. I think ModeShape supports this quite well already, ie. being able to hook in another JCR tree, a file system, a git repository, CMIS .. I am sure that it would also be possible to implement on top of some MQ standard. see also https://docs.jboss.org/author/display/MODE/Federation?_sscc=t regards, Lukas On 30 Jul 2014, at 14:41, Angela Schreiber anch...@adobe.com wrote: hi carsten if you are expecting your nodes to be in a given order (e.g. the order of creation) you need to have a parent that has orderable children... in which case we don't make any promises about huge child collections... it will not work well. if you don't have the requirement of ordered children, you can have _many_ but need to make sure that your parent node doesn't have orderable children (e.g. oak:Unstructured)... but then you cannot expect that new children are appended at the end of the list... there is no list and there is not guaranteed order. i guess you have a little misunderstanding when it comes to the concept of orderable child nodes - JSR 283 will be your friend. regards angela On 30/07/14 13:27, Carsten Ziegeler cziege...@apache.org wrote: Hi, afaik with Oak the too many child nodes problem of JR2 is solved, therefore I'm wondering what the best way to store a queue in the repository is? In my use cases, there are usually not many items within a single queue, let's say a few hundreds. In some cases the queue might grow to some thousands but not more than maybe 20k. The idea is that new entries (nodes) are added to the end of the queue, and processing would read the first node from the queue, update the properties and once done, remove it. My initial design was to simply store all entries as sub nodes of some queue root entry without any hierarchy. addNode should add them at the end and simply iteration over the child nodes of the root gives the first entry. No need for sortable nodes. Does this make sense? Is there anything else to be considered? Regards Carsten -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org -- Carsten Ziegeler Adobe Research Switzerland cziege...@apache.org