Re: IndexEditorProvider behaviour question.
Hi,

Thanks for looking at this; it sounds like you are on the case already. If I see anything else I'll let you know.

Best Regards
Ian

On 15 September 2016 at 05:33, Chetan Mehrotra wrote:
> Note that so far LuceneIndexEditor was used only for async indexing
> case and hence invoked only on leader node every 5 sec. So performance
> aspects here were not that critical. However with recent work on
> Hybrid indexes they would be used in critical path and hence such
> aspects are important
>
> On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston wrote:
> > A and B mean that the work of creating the tree and working out the changes
> > in a tree will be duplicated roughly n times, where n is the number of
> > index definitions.
>
> Here note that diff would be performed only once at any level and
> IndexUpdate would then pass them to various editors. However
> construction of trees can be avoided and I have opened OAK-4806 for
> that now. Oak issue has details around why Tree was used also.
>
> Also with multiple index editors performance does decrease. See
> OAK-1273. If we switch to Hybrid Index then this aspects improves a
> bit as instead of having 50 different property indexes (with 50 editor
> instance for each commit) we can have a single editor with 50 property
> definition. This can be seen in benchmark in Hybrid Index (OAk-4412)
> by changing the numOfIndexes
>
> If you see any other area of improvement say around unnecessary object
> generation then let us know!
>
> Chetan Mehrotra
Re: IndexEditorProvider behaviour question.
Note that so far LuceneIndexEditor was used only for the async indexing case, and hence invoked only on the leader node every 5 sec, so performance aspects here were not that critical. However, with the recent work on Hybrid indexes these editors would be used in the critical path, and hence such aspects are important.

On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston wrote:
> A and B mean that the work of creating the tree and working out the changes
> in a tree will be duplicated roughly n times, where n is the number of
> index definitions.

Here note that the diff would be performed only once at any level, and IndexUpdate would then pass the changes to the various editors. However, construction of trees can be avoided, and I have opened OAK-4806 for that now. The Oak issue also has details on why Tree was used.

Also, performance does decrease with multiple index editors; see OAK-1273. If we switch to Hybrid Index then this aspect improves a bit, as instead of having 50 different property indexes (with 50 editor instances for each commit) we can have a single editor with 50 property definitions. This can be seen in the Hybrid Index benchmark (OAK-4412) by changing numOfIndexes.

If you see any other area of improvement, say around unnecessary object generation, then let us know!

Chetan Mehrotra
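To make the "50 editors versus one editor with 50 property definitions" point concrete, here is a minimal sketch. Every name in it (ChangeEditor, SinglePropertyEditor, MultiPropertyEditor, dispatch) is hypothetical and stands in for the real Oak classes; it is not the Oak IndexEditor API, and it only counts callbacks rather than measuring anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FanOutSketch {

    /** Hypothetical stand-in for an index editor; NOT Oak's IndexEditor interface. */
    interface ChangeEditor {
        void propertyChanged(String path, String name, String value);
    }

    /** Models one editor per property index definition. */
    static final class SinglePropertyEditor implements ChangeEditor {
        private final String property;
        final List<String> updates = new ArrayList<>();

        SinglePropertyEditor(String property) {
            this.property = property;
        }

        @Override
        public void propertyChanged(String path, String name, String value) {
            if (property.equals(name)) {
                updates.add(path + "/" + name + "=" + value);
            }
        }
    }

    /** Models a single editor that covers many property definitions. */
    static final class MultiPropertyEditor implements ChangeEditor {
        private final Set<String> properties;
        final List<String> updates = new ArrayList<>();

        MultiPropertyEditor(Set<String> properties) {
            this.properties = properties;
        }

        @Override
        public void propertyChanged(String path, String name, String value) {
            if (properties.contains(name)) {
                updates.add(path + "/" + name + "=" + value);
            }
        }
    }

    /**
     * Walks the change list once and hands each change {path, name, value}
     * to every editor. Returns the number of editor callbacks made.
     */
    static int dispatch(List<String[]> changes, List<? extends ChangeEditor> editors) {
        int calls = 0;
        for (String[] change : changes) {
            for (ChangeEditor editor : editors) {
                editor.propertyChanged(change[0], change[1], change[2]);
                calls++;
            }
        }
        return calls;
    }
}
```

With 50 SinglePropertyEditor instances, each change triggers 50 callbacks; with one MultiPropertyEditor it triggers one callback, while the same updates are recorded. The diff itself is walked once in both cases, which matches the description above.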
IndexEditorProvider behaviour question.
Hi,

The behaviour of calls to the IndexEditorProvider appears to be suboptimal. Has this area been looked at before? I am working from a complete lack of historical knowledge about the area, so I probably don't know the full picture.

Based on logging the calls into IndexEditorProvider.getIndexEditor(), and reading the LuceneIndexEditorProvider, this is what I have observed.

A. Every commit results in 1 call to IndexEditorProvider.getIndexEditor() per index definition (perhaps 100 in a full system).
B. Each IndexEditor then gets called, building a tree of IndexEditors which work out the changes needed to update their index.
C. IndexEditors sometimes filter subtrees based on the index definition, but this seems to be the exception rather than the rule.
D. Index Editor Providers produce a subtree based on type (i.e. a property index definition doesn't generate an IndexEditor for Lucene indexes, and vice versa).

A and B mean that the work of creating the tree and working out the changes in a tree will be duplicated roughly n times, where n is the number of index definitions. (D means it's not n*p, where p is the number of IndexEditorProviders.) I haven't looked at how much C reduces the cost in reality.

Has anyone looked at building the tree once, and passing the fully built tree to the indexers? Even if the computational effort is not great, the number of objects being created and passing through GC seems higher than it needs to be.

As I said, I have no historical knowledge, so if doing this doesn't improve things and the reason why is recorded, just say so (ideally with a pointer) so I can read and understand more.

Best Regards
Ian
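A rough sketch of the object-count concern in A and B, under stated assumptions: each index definition gets its own root editor per commit, and each root creates one child editor per level of a single changed path. CountingEditor and editorsCreated are hypothetical names used only for this illustration; they are not Oak classes, and filtering (C) is ignored.

```java
public class EditorTreeCostSketch {

    /** Hypothetical editor that only counts how many instances get created. */
    static final class CountingEditor {
        private final int[] counter;

        CountingEditor(int[] counter) {
            this.counter = counter;
            counter[0]++; // one more editor object allocated
        }

        /** Descending into a changed child node yields a new editor (assumption). */
        CountingEditor child() {
            return new CountingEditor(counter);
        }
    }

    /**
     * Counts editor objects created for one commit that changes a single
     * path of the given depth, with one editor tree per index definition.
     */
    static int editorsCreated(int indexDefinitions, int changedDepth) {
        int[] counter = {0};
        for (int i = 0; i < indexDefinitions; i++) {
            CountingEditor editor = new CountingEditor(counter); // root editor
            for (int level = 0; level < changedDepth; level++) {
                editor = editor.child();
            }
        }
        return counter[0];
    }
}
```

Under these assumptions the count is n * (depth + 1): 100 index definitions and a change 4 levels deep give 500 short-lived editor objects, against 5 if a single tree were built and shared.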