[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256706#comment-16256706 ]

Vikas Saurabh commented on OAK-4887:

[~tmueller],
bq. This is because the Lucene index returns a lower cost if "order by" of that property is supported by the index (I think each indexed property reduces the cost of the index).

Maybe we can remove the sort hack from the Lucene index planner now. But I think it's still better that an index-based sort gets preferred over an in-memory sort done by the query engine, because even getting the first row in the in-memory sort case requires loading the whole result set. (I mean, maybe index-based sort should still be preferred, but the logic moves into the query engine and the hack is removed from the Lucene index planner.)

> Query cost estimation: ordering by an unindexed property not reflected
> ----------------------------------------------------------------------
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.4.2
> Reporter: Alexander Klimetschek
> Assignee: Thomas Mueller
> Fix For: 1.8, 1.7.12
>
>
> A query that orders by an unindexed property seems to have no effect on the
> cost estimation, compared to the same query without the order by, although it
> has a big impact on the execution performance for larger results/indexes.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255479#comment-16255479 ] Alexander Klimetschek commented on OAK-4887: Thanks! So I assume something else changed since 1.4.x.
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255158#comment-16255158 ] Thomas Mueller commented on OAK-4887: - http://svn.apache.org/r1815440 (trunk)
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255157#comment-16255157 ] Thomas Mueller commented on OAK-4887: - Even without the patch, with current trunk, the right index is used (if there are two indexes as described above). This is because the Lucene index returns a lower cost if "order by" of that property is supported by the index (I think each indexed property reduces the cost of the index). However, with the patch, the limit is used correctly to adjust the cost.
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253245#comment-16253245 ]

Thomas Mueller commented on OAK-4887:
-------------------------------------

Actually, I found that Lucene returns all plans. It is true that right now, the order doesn't affect the cost. The following patch (not tested yet) should resolve this issue:

{noformat}
--- src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java (revision 1814727)
+++ src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java (working copy)
@@ -1004,8 +1004,12 @@
     if (p.getSupportsPathRestriction()) {
         entryCount = scaleEntryCount(rootState, filter, entryCount);
     }
-
-    entryCount = Math.min(maxEntryCount, entryCount);
+    if (sortOrder == null || p.getSortOrder() != null) {
+        // if the query is unordered, or
+        // if the query contains "order by" and the index can sort on that,
+        // then we don't need to read all entries from the index
+        entryCount = Math.min(maxEntryCount, entryCount);
+    }
     double c = p.getCostPerExecution() + entryCount * p.getCostPerEntry();
     if (LOG.isDebugEnabled()) {
{noformat}

> Query cost estimation: ordering by an unindexed property not reflected
> ----------------------------------------------------------------------
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.4.2
> Reporter: Alexander Klimetschek
> Assignee: Thomas Mueller
> Fix For: 1.8
>
>
> A query that orders by an unindexed property seems to have no effect on the
> cost estimation, compared to the same query without the order by, although it
> has a big impact on the execution performance for larger results/indexes.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
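The rule in the patch can be tried out in isolation. The following is only a minimal sketch, not Oak code: `SortAwareCost`, `PlanSketch`, and the `canSort` flag are hypothetical names that stand in for the plan object `p` and the `p.getSortOrder() != null` check in the diff. The limit (`maxEntryCount`) caps the number of entries only when the query is unordered or the index can deliver the requested order.

```java
/**
 * Sketch of the sort-aware cost rule from the patch.
 * All names here are hypothetical; this is not Oak's index plan API.
 */
final class SortAwareCost {

    /** Simplified, hypothetical view of what an index reports for a query. */
    static final class PlanSketch {
        final double costPerExecution;
        final double costPerEntry;
        final long estimatedEntryCount;
        final boolean canSort; // true if the index returns rows in the requested order

        PlanSketch(double costPerExecution, double costPerEntry,
                   long estimatedEntryCount, boolean canSort) {
            this.costPerExecution = costPerExecution;
            this.costPerEntry = costPerEntry;
            this.estimatedEntryCount = estimatedEntryCount;
            this.canSort = canSort;
        }
    }

    /** Cost = costPerExecution + entryCount * costPerEntry, as in the patched line. */
    static double cost(PlanSketch p, boolean queryIsOrdered, long maxEntryCount) {
        long entryCount = p.estimatedEntryCount;
        // The limit only helps if rows can be returned without an in-memory sort.
        if (!queryIsOrdered || p.canSort) {
            entryCount = Math.min(maxEntryCount, entryCount);
        }
        return p.costPerExecution + entryCount * p.costPerEntry;
    }

    public static void main(String[] args) {
        PlanSketch sorted = new PlanSketch(1.0, 1.0, 10_000, true);
        PlanSketch unsorted = new PlanSketch(1.0, 1.0, 10_000, false);
        System.out.println(cost(sorted, true, 100));    // limit applies
        System.out.println(cost(unsorted, true, 100));  // all entries must be read
        System.out.println(cost(unsorted, false, 100)); // unordered query: limit applies
    }
}
```

With these made-up numbers, an ordered query with a limit of 100 costs 101.0 on the index that can sort and 10001.0 on the one that cannot, so the two plans are no longer indistinguishable.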
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249241#comment-16249241 ] Thomas Mueller commented on OAK-4887: - [~alexander.klimetschek] this issue is quite old and hasn't shown up in real world problems I'm aware of. Is it OK to move to later (1.9)? > Query cost estimation: ordering by an unindexed property not reflected > -- > > Key: OAK-4887 > URL: https://issues.apache.org/jira/browse/OAK-4887 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Affects Versions: 1.4.2 >Reporter: Alexander Klimetschek >Assignee: Thomas Mueller > Fix For: 1.9 > > > A query that orders by an unindexed property seems to have no effect on the > cost estimation, compared to the same query without the order by, although it > has a big impact on the execution performance for larger results/indexes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615393#comment-15615393 ] Thomas Mueller commented on OAK-4887: - Thanks a lot! I believe the Lucene index implementation only returns one plan right now, so the query engine can't decide which Lucene index to use (damAssetLuceneCreated or damAssetLucene). It would probably be best if the Lucene index returns one plan per index, so that the query engine can decide which one to use best. I don't think internal API changes are needed. After that, there are probably some smaller changes needed in the query engine, to account for the limit (if set). > Query cost estimation: ordering by an unindexed property not reflected > -- > > Key: OAK-4887 > URL: https://issues.apache.org/jira/browse/OAK-4887 > Project: Jackrabbit Oak > Issue Type: Improvement > Components: query >Affects Versions: 1.4.2 >Reporter: Alexander Klimetschek >Assignee: Thomas Mueller > Fix For: 1.6 > > > A query that orders by an unindexed property seems to have no effect on the > cost estimation, compared to the same query without the order by, although it > has a big impact on the execution performance for larger results/indexes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595953#comment-15595953 ]

Alexander Klimetschek commented on OAK-4887:

h4. 2 indexes to test index selection

Note that in the above case there is only one index.

Now I tried to add another index named {{damAssetLuceneCreated}} that would include the {{jcr:created}} property, to see if it would get preferred over the existing index {{damAssetLucene}}. I basically copied the index, and added jcr:created as property in the indexRules.

However, now it only ever considers the new index - the other one is never included in the index cost list anymore:

{noformat}
//element(*, dam:Asset) order by @jcr:created
{noformat}

{noformat}
cost using filter Filter(query=select [jcr:path], [jcr:score], * from [dam:Asset] as a order by [jcr:created] /* xpath: //element(*, dam:Asset) order by @jcr:created */, path=*)
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
cost for lucene-property[/oak:index/damAssetLuceneCreated] is 464.0
cost for aggregate lucene is Infinity
cost for solr is Infinity
cost for traverse is 226100.0
{noformat}

h4. 2 indexes with created vs. modified

Then I also removed the indexing of the {{jcr:lastModified}} property from the new index ({{damAssetLuceneCreated}}), reindexed both, and then ran this query:

{noformat}
//element(*, dam:Asset) order by @jcr:lastModified
{noformat}

This will still pick the newer {{damAssetLuceneCreated}} index, although it does not index the jcr:lastModified property and there is a better index with {{damAssetLucene}} that does index it.

{noformat}
cost using filter Filter(query=select [jcr:path], [jcr:score], * from [dam:Asset] as a order by [jcr:lastModified] /* xpath: //element(*, dam:Asset) order by @jcr:lastModified */, path=*)
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
cost for lucene-property[/oak:index/damAssetLuceneCreated] is 464.0
cost for aggregate lucene is Infinity
cost for solr is Infinity
cost for traverse is 226100.0
{noformat}

I would expect it to pick the optimal index {{damAssetLucene}}.
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595899#comment-15595899 ]

Alexander Klimetschek commented on OAK-4887:

h4. Index

Note that {{jcr:created}} is not covered by the index.

{noformat}
{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "compatVersion": 2,
  "type": "lucene",
  "async": "async",
  "evaluatePathRestrictions": true,
  "reindex": false,
  "reindexCount": 1,
  "aggregates": {
    "jcr:primaryType": "nt:unstructured",
    "dam:Asset": {
      "jcr:primaryType": "nt:unstructured",
      "include0": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content" },
      "include1": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/metadata" },
      "include2": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/metadata/*" },
      "include3": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/renditions" },
      "include4": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/renditions/original" },
      "include5": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/renditions/original/jcr:content" },
      "include6": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/comments" },
      "include7": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/comments/*" },
      "include8": { "jcr:primaryType": "nt:unstructured", "path": "jcr:content/usages" }
    }
  },
  "indexRules": {
    "jcr:primaryType": "nt:unstructured",
    "dam:Asset": {
      "jcr:primaryType": "nt:unstructured",
      "properties": {
        "jcr:primaryType": "nt:unstructured",
        "cqTags": { "jcr:primaryType": "nt:unstructured", "nodeScopeIndex": true, "useInSuggest": true, "propertyIndex": true, "useInSpellcheck": true, "name": "jcr:content/metadata/cq:tags" },
        "dcFormat": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "analyzed": true, "name": "jcr:content/metadata/dc:format" },
        "damStatus": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/dam:status" },
        "videoCodec": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/videoCodec" },
        "audioCodec": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/audioCodec" },
        "dcTitle": { "jcr:primaryType": "nt:unstructured", "nodeScopeIndex": true, "useInSuggest": true, "propertyIndex": true, "useInSpellcheck": true, "name": "jcr:content/metadata/dc:title", "boost": 2 },
        "dcDescription": { "jcr:primaryType": "nt:unstructured", "nodeScopeIndex": true, "useInSuggest": true, "propertyIndex": true, "useInSpellcheck": true, "name": "jcr:content/metadata/dc:description" },
        "xmpMMInstanceId": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/xmpMM:InstanceID" },
        "xmpMMDocumentId": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/xmpMM:DocumentID" },
        "damSha1": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/dam:sha1" },
        "hasValidMetadata": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/hasValidMetadata", "type": "Boolean" },
        "videoBitrate": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/videoBitrate" },
        "audioBitRate": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/metadata/audioBitrate" },
        "usedBy": { "jcr:primaryType": "nt:unstructured", "propertyIndex": true, "name": "jcr:content/usages/usedBy" },
        "jcrLastModified": { "jcr:primaryType": "nt:unstructured", "ordered": true, "propertyIndex": true, "name": "jcr:content/jcr:lastModified", "type": "Date" },
        "expirationDate": {
          "jcr:primaryType": "nt:unstructured",
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591350#comment-15591350 ]

Thomas Mueller commented on OAK-4887:
-------------------------------------

[~alexander.klimetschek] could you provide an example (query, index definitions) so that I can understand what the most pressing issue is?

Would it help in your case if we added a "query option" mechanism as follows, so that the query engine knows what is more important?

Example (about 1 rows with

{noformat}
//element(*, 'ec:Product')[@color = 'red' and @size = 'L'] order by @popularity desc
-> expects the whole result to be read
-> will favor an index on color, size

//element(*, 'ec:Product')[@color = 'red' and @size = 'L'] order by @popularity desc option(fast 10)
-> will favor an index on ec:Product + popularity, so that the first 10 rows can be returned quickly
{noformat}
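To make the trade-off behind the proposed "option(fast 10)" concrete, here is a rough sketch of how the number of rows the caller expects to read could flip the preference between the two indexes in the example. It is not Oak code and rests on loud assumptions: `FastRowsSketch` and its methods are hypothetical, the row counts are made up, and matching rows are assumed to be spread evenly over the sort order.

```java
/**
 * Hypothetical model of the "option(fast N)" idea from the comment above.
 * Assumes matching rows are evenly distributed in sort order.
 */
final class FastRowsSketch {

    /**
     * Index on the filter conditions (color, size) that cannot sort:
     * every matching row is read and then sorted in memory.
     */
    static long readsFilteredUnsorted(long matchingRows) {
        return matchingRows;
    }

    /**
     * Index on node type + sort property (popularity) that cannot filter:
     * rows arrive in order and are read until enough of them match.
     */
    static long readsSortedUnfiltered(long totalRows, long matchingRows, long rowsWanted) {
        if (matchingRows <= 0) {
            return totalRows; // nothing matches, so the whole index is scanned
        }
        long wanted = Math.min(rowsWanted, matchingRows);
        // ceil(wanted * totalRows / matchingRows), capped at the index size
        long expected = (wanted * totalRows + matchingRows - 1) / matchingRows;
        return Math.min(totalRows, expected);
    }

    /** Returns "sorted" or "filtered", whichever index reads fewer entries. */
    static String preferred(long totalRows, long matchingRows, long rowsWanted) {
        long sorted = readsSortedUnfiltered(totalRows, matchingRows, rowsWanted);
        long filtered = readsFilteredUnsorted(matchingRows);
        return sorted < filtered ? "sorted" : "filtered";
    }

    public static void main(String[] args) {
        // 10,000 products, 1,000 of them red and size L (made-up numbers)
        System.out.println(preferred(10_000, 1_000, 10));    // caller wants the first 10 rows
        System.out.println(preferred(10_000, 1_000, 1_000)); // caller reads the whole result
    }
}
```

Under these assumptions the sorted index needs about 100 reads to produce the first 10 rows, against 1,000 for the filtered index, while reading the whole result reverses the preference.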
[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected
[ https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547892#comment-15547892 ]

Thomas Mueller commented on OAK-4887:
-------------------------------------

Yes, the cost of ordering is not accounted for within the query engine (org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan).

For small results that easily fit in memory, let's say 1000 rows, the cost of ordering itself is very low, almost negligible. But one problem is that with "order by propertyName", the whole result needs to be read into memory before the first row can be returned. Without "order by", the first row can be returned much faster.

If there are two indexes with similar cost, one that returns rows in sorted order and one that returns rows unsorted, then it would be better to use the index that returns rows in sorted order. How much better is hard to say; it depends a lot on the number of rows that we expect to be read, and that number is not known.

Maybe we should support a way to specify the number of rows that the query engine should optimize for; that is, the number of rows that are _expected_ to be read. This would be similar to the "fastfirstrow" option / "option (fast )" in MS SQL Server: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/09a6060a-1f72-4438-a3b2-209c240ee4d6/fastfirstrow?forum=transactsql . Currently, we support the "limit" option, but maybe a user needs to read more than rows in some cases, so the limit is not known.

By the way, currently, if the chosen index supports ordering, then ordering is not done afterwards. So this part is working fine.
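The point about the first row can be illustrated with a small sketch that counts how many entries are pulled from an index before the first result row is available. This is not Oak code; `FirstRowSketch` and `CountingIterator` are hypothetical helpers, and plain integers stand in for index entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration: reads needed before the first row can be returned. */
final class FirstRowSketch {

    /** Wraps an iterator and counts how many entries were pulled from it. */
    static final class CountingIterator implements Iterator<Integer> {
        private final Iterator<Integer> delegate;
        int reads;

        CountingIterator(Iterator<Integer> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public Integer next() {
            reads++;
            return delegate.next();
        }
    }

    /** The index already returns rows in the requested order: one read is enough. */
    static int readsForFirstRowPresorted(List<Integer> sortedRows) {
        CountingIterator it = new CountingIterator(sortedRows.iterator());
        if (it.hasNext()) {
            it.next(); // this is already the first result row
        }
        return it.reads;
    }

    /** The index returns rows unsorted: everything is buffered and sorted first. */
    static int readsForFirstRowInMemorySort(List<Integer> unsortedRows) {
        CountingIterator it = new CountingIterator(unsortedRows.iterator());
        List<Integer> buffer = new ArrayList<>();
        while (it.hasNext()) {
            buffer.add(it.next());
        }
        Collections.sort(buffer); // only now is the first result row known
        return it.reads;
    }

    public static void main(String[] args) {
        System.out.println(readsForFirstRowPresorted(Arrays.asList(1, 2, 3, 4, 5)));
        System.out.println(readsForFirstRowInMemorySort(Arrays.asList(3, 1, 5, 2, 4)));
    }
}
```

For five entries the presorted case needs 1 read and the in-memory sort needs all 5; the gap grows with the result size, which is why the sort itself can be cheap while the time to the first row is not.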