[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-17 Thread Vikas Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256706#comment-16256706
 ] 

Vikas Saurabh commented on OAK-4887:


[~tmueller],
bq. This is because the Lucene index returns a lower cost if "order by" of that 
property is supported by the index (I think each indexed property reduces the 
cost of the index).
Maybe we can remove the sort hack from the Lucene index planner now, but I think 
it's still better that an index-based sort gets preferred over the in-memory sort 
done by the query engine, because even getting the first row in the in-memory 
sort case requires loading the whole result set. (I mean, maybe the index-based 
sort should still be preferred, but the logic moves into the query engine and 
the hack is removed from the Lucene index planner.)

> Query cost estimation: ordering by an unindexed property not reflected
> --
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.4.2
>Reporter: Alexander Klimetschek
>Assignee: Thomas Mueller
> Fix For: 1.8, 1.7.12
>
>
> A query that orders by an unindexed property seems to have no effect on the 
> cost estimation, compared to the same query without the order by, although it 
> has a big impact on the execution performance for larger results/indexes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-16 Thread Alexander Klimetschek (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255479#comment-16255479
 ] 

Alexander Klimetschek commented on OAK-4887:


Thanks! So I assume something else changed since 1.4.x.



[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255158#comment-16255158
 ] 

Thomas Mueller commented on OAK-4887:
-

http://svn.apache.org/r1815440 (trunk)



[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-16 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255157#comment-16255157
 ] 

Thomas Mueller commented on OAK-4887:
-

Even without the patch, with current trunk, the right index is used (if there 
are two indexes as described above). This is because the Lucene index returns a 
lower cost if "order by" of that property is supported by the index (I think 
each indexed property reduces the cost of the index).

However, with the patch, the limit is used correctly to adjust the cost.




[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-15 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253245#comment-16253245
 ] 

Thomas Mueller commented on OAK-4887:
-

Actually, I found that Lucene returns all plans. It is true that right now, the 
order doesn't affect the cost. The following patch (not tested yet) should 
resolve this issue:

{noformat}
--- src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java (revision 1814727)
+++ src/main/java/org/apache/jackrabbit/oak/query/QueryImpl.java (working copy)
@@ -1004,8 +1004,12 @@
                 if (p.getSupportsPathRestriction()) {
                     entryCount = scaleEntryCount(rootState, filter, entryCount);
                 }
-
-                entryCount = Math.min(maxEntryCount, entryCount);
+                if (sortOrder == null || p.getSortOrder() != null) {
+                    // if the query is unordered, or
+                    // if the query contains "order by" and the index can sort on that,
+                    // then we don't need to read all entries from the index
+                    entryCount = Math.min(maxEntryCount, entryCount);
+                }
                 double c = p.getCostPerExecution() + entryCount * p.getCostPerEntry();
 
                 if (LOG.isDebugEnabled()) {
{noformat}
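
For illustration only, the effect of the patched condition can be tried out in isolation. The following is a simplified sketch, not the Oak code: the {{Plan}} class, its fields and all numbers are hypothetical stand-ins, and {{maxEntryCount}} is assumed to be derived from the query limit.

```java
// Simplified model of the patched cost computation; NOT the actual Oak code.
final class OrderByCostSketch {

    /** Hypothetical stand-in for the plan values that the cost formula uses. */
    static final class Plan {
        final double costPerExecution;
        final double costPerEntry;
        final long estimatedEntryCount;
        final boolean canSort; // models "p.getSortOrder() != null" in the patch

        Plan(double costPerExecution, double costPerEntry,
                long estimatedEntryCount, boolean canSort) {
            this.costPerExecution = costPerExecution;
            this.costPerEntry = costPerEntry;
            this.estimatedEntryCount = estimatedEntryCount;
            this.canSort = canSort;
        }
    }

    /** maxEntryCount is assumed to come from the query limit. */
    static double cost(Plan p, boolean queryHasOrderBy, long maxEntryCount) {
        long entryCount = p.estimatedEntryCount;
        if (!queryHasOrderBy || p.canSort) {
            // unordered query, or the index can deliver the requested order:
            // reading stops once the limit is reached
            entryCount = Math.min(maxEntryCount, entryCount);
        }
        // otherwise all entries have to be read before sorting in memory
        return p.costPerExecution + entryCount * p.costPerEntry;
    }

    public static void main(String[] args) {
        Plan sorted = new Plan(1.0, 1.0, 100_000, true);
        Plan unsorted = new Plan(1.0, 1.0, 100_000, false);
        long limit = 10;
        System.out.println(cost(sorted, true, limit));    // 11.0
        System.out.println(cost(unsorted, true, limit));  // 100001.0
        System.out.println(cost(unsorted, false, limit)); // 11.0
    }
}
```

With these made-up numbers, an "order by" query with a limit of 10 is only cheap on the index that can sort; the index that cannot sort is charged for all 100000 entries.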


> Query cost estimation: ordering by an unindexed property not reflected
> --
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.4.2
>Reporter: Alexander Klimetschek
>Assignee: Thomas Mueller
> Fix For: 1.8
>
>
> A query that orders by an unindexed property seems to have no effect on the 
> cost estimation, compared to the same query without the order by, although it 
> has a big impact on the execution performance for larger results/indexes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2017-11-13 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249241#comment-16249241
 ] 

Thomas Mueller commented on OAK-4887:
-

[~alexander.klimetschek] this issue is quite old and hasn't shown up in real 
world problems I'm aware of. Is it OK to move to later (1.9)?

> Query cost estimation: ordering by an unindexed property not reflected
> --
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.4.2
>Reporter: Alexander Klimetschek
>Assignee: Thomas Mueller
> Fix For: 1.9
>
>
> A query that orders by an unindexed property seems to have no effect on the 
> cost estimation, compared to the same query without the order by, although it 
> has a big impact on the execution performance for larger results/indexes.





[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2016-10-28 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615393#comment-15615393
 ] 

Thomas Mueller commented on OAK-4887:
-

Thanks a lot!

I believe the Lucene index implementation only returns one plan right now, so 
the query engine can't decide which Lucene index to use (damAssetLuceneCreated 
or damAssetLucene). It would probably be best if the Lucene index returns one 
plan per index, so that the query engine can decide which one to use best. I 
don't think internal API changes are needed. After that, there are probably 
some smaller changes needed in the query engine, to account for the limit (if 
set).
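
As a rough sketch of what "one plan per index" would allow: if each index contributes its own plan with its own cost, the query engine can simply pick the cheapest one. The {{IndexPlan}} class and the cost values below are hypothetical and only illustrate the selection step; they are not taken from Oak.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of selecting the cheapest of several per-index plans.
final class PlanSelectionSketch {

    /** Hypothetical plan: just an index name and an estimated cost. */
    static final class IndexPlan {
        final String indexName;
        final double cost;

        IndexPlan(String indexName, double cost) {
            this.indexName = indexName;
            this.cost = cost;
        }
    }

    /** Returns the plan with the lowest estimated cost. */
    static IndexPlan cheapest(List<IndexPlan> plans) {
        return plans.stream()
                .min(Comparator.comparingDouble((IndexPlan p) -> p.cost))
                .orElseThrow(() -> new IllegalArgumentException("no plans"));
    }

    public static void main(String[] args) {
        // made-up costs: one plan per index, so both can be compared
        List<IndexPlan> plans = Arrays.asList(
                new IndexPlan("damAssetLucene", 464.0),
                new IndexPlan("damAssetLuceneCreated", 232.0));
        System.out.println(cheapest(plans).indexName); // damAssetLuceneCreated
    }
}
```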

> Query cost estimation: ordering by an unindexed property not reflected
> --
>
> Key: OAK-4887
> URL: https://issues.apache.org/jira/browse/OAK-4887
> Project: Jackrabbit Oak
>  Issue Type: Improvement
>  Components: query
>Affects Versions: 1.4.2
>Reporter: Alexander Klimetschek
>Assignee: Thomas Mueller
> Fix For: 1.6
>
>
> A query that orders by an unindexed property seems to have no effect on the 
> cost estimation, compared to the same query without the order by, although it 
> has a big impact on the execution performance for larger results/indexes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2016-10-21 Thread Alexander Klimetschek (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595953#comment-15595953
 ] 

Alexander Klimetschek commented on OAK-4887:


h4. 2 indexes to test index selection

Note that in the above case there is only one index.

Now I tried to add another index named {{damAssetLuceneCreated}} that would 
include the {{jcr:created}} property, to see if it would get preferred over the 
existing index {{damAssetLucene}}. I basically copied the index, and added 
jcr:created as property in the indexRules.

However, now it only ever considers the new index - the other one is never 
included in the index cost list anymore:
{noformat}
//element(*, dam:Asset) order by @jcr:created
{noformat}

{noformat}
cost using filter Filter(query=select [jcr:path], [jcr:score], * from 
[dam:Asset] as a order by [jcr:created] /* xpath: //element(*, dam:Asset) order 
by @jcr:created */, path=*)
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
cost for lucene-property[/oak:index/damAssetLuceneCreated] is 464.0
cost for aggregate lucene is Infinity
cost for solr is Infinity
cost for traverse is 226100.0
{noformat}

h4. 2 indexes with created vs. modified

Then I also removed the indexing of the {{jcr:lastModified}} property from the 
new index ({{damAssetLuceneCreated}}), reindexed both, and then ran this query:
{noformat}
//element(*, dam:Asset) order by @jcr:lastModified
{noformat}

This will still pick the newer {{damAssetLuceneCreated}} index, although it 
does not index the jcr:lastModified property and there is a better index with 
{{damAssetLucene}} that does index it.

{noformat}
cost using filter Filter(query=select [jcr:path], [jcr:score], * from 
[dam:Asset] as a order by [jcr:lastModified] /* xpath: //element(*, dam:Asset) 
order by @jcr:lastModified */, path=*)
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
cost for lucene-property[/oak:index/damAssetLuceneCreated] is 464.0
cost for aggregate lucene is Infinity
cost for solr is Infinity
cost for traverse is 226100.0
{noformat}

I would expect it to pick the optimal index {{damAssetLucene}}.




[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2016-10-21 Thread Alexander Klimetschek (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595899#comment-15595899
 ] 

Alexander Klimetschek commented on OAK-4887:


h4. Index

Note that {{jcr:created}} is not covered by the index.
{noformat}
{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "compatVersion": 2,
  "type": "lucene",
  "async": "async",
  "evaluatePathRestrictions": true,
  "reindex": false,
  "reindexCount": 1,
  "aggregates": {
    "jcr:primaryType": "nt:unstructured",
    "dam:Asset": {
      "jcr:primaryType": "nt:unstructured",
      "include0": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content"
      },
      "include1": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/metadata"
      },
      "include2": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/metadata/*"
      },
      "include3": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/renditions"
      },
      "include4": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/renditions/original"
      },
      "include5": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/renditions/original/jcr:content"
      },
      "include6": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/comments"
      },
      "include7": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/comments/*"
      },
      "include8": {
        "jcr:primaryType": "nt:unstructured",
        "path": "jcr:content/usages"
      }
    }
  },
  "indexRules": {
    "jcr:primaryType": "nt:unstructured",
    "dam:Asset": {
      "jcr:primaryType": "nt:unstructured",
      "properties": {
        "jcr:primaryType": "nt:unstructured",
        "cqTags": {
          "jcr:primaryType": "nt:unstructured",
          "nodeScopeIndex": true,
          "useInSuggest": true,
          "propertyIndex": true,
          "useInSpellcheck": true,
          "name": "jcr:content/metadata/cq:tags"
        },
        "dcFormat": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "analyzed": true,
          "name": "jcr:content/metadata/dc:format"
        },
        "damStatus": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/dam:status"
        },
        "videoCodec": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/videoCodec"
        },
        "audioCodec": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/audioCodec"
        },
        "dcTitle": {
          "jcr:primaryType": "nt:unstructured",
          "nodeScopeIndex": true,
          "useInSuggest": true,
          "propertyIndex": true,
          "useInSpellcheck": true,
          "name": "jcr:content/metadata/dc:title",
          "boost": 2
        },
        "dcDescription": {
          "jcr:primaryType": "nt:unstructured",
          "nodeScopeIndex": true,
          "useInSuggest": true,
          "propertyIndex": true,
          "useInSpellcheck": true,
          "name": "jcr:content/metadata/dc:description"
        },
        "xmpMMInstanceId": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/xmpMM:InstanceID"
        },
        "xmpMMDocumentId": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/xmpMM:DocumentID"
        },
        "damSha1": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/dam:sha1"
        },
        "hasValidMetadata": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/hasValidMetadata",
          "type": "Boolean"
        },
        "videoBitrate": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/videoBitrate"
        },
        "audioBitRate": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/metadata/audioBitrate"
        },
        "usedBy": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "jcr:content/usages/usedBy"
        },
        "jcrLastModified": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": true,
          "propertyIndex": true,
          "name": "jcr:content/jcr:lastModified",
          "type": "Date"
        },
        "expirationDate": {
          "jcr:primaryType": "nt:unstructured",
  

[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2016-10-20 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591350#comment-15591350
 ] 

Thomas Mueller commented on OAK-4887:
-

[~alexander.klimetschek] could you provide an example (query, index 
definitions) so that I can understand what the most pressing issue is? Would it 
help in your case if we added a "query option" mechanism as follows, so that the 
query engine knows what is more important?

Example (about 1 rows with 

{noformat}
//element(*, 'ec:Product')[@color = 'red' and @size = 'L'] order by @popularity desc
-> expects the whole result to be read
-> will favor an index on color, size

//element(*, 'ec:Product')[@color = 'red' and @size = 'L'] order by @popularity desc option(fast 10)
-> will favor an index on ec:Product + popularity, 
so that the first 10 rows can be returned quickly
{noformat}
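
To make the trade-off concrete, here is a very rough model of how such an option could influence index choice. Everything in it is an assumption made for illustration: cost is counted as "index rows read", the filter index on color and size must deliver all matching rows (they are then sorted in memory), and the index sorted by popularity is scanned until enough matching rows are found, assuming matches are spread evenly. The method names and numbers are hypothetical.

```java
// Rough model of index choice with a hypothetical "option(fast n)" hint.
final class FastOptionSketch {

    /** Filter index: every matching row is read, then sorted in memory. */
    static long rowsReadViaFilterIndex(long matchingRows) {
        return matchingRows;
    }

    /**
     * Sorted index: rows arrive in the requested order, so scanning can stop
     * once expectedRows matches were seen. Assumes matchingRows > 0 and that
     * matches are evenly distributed over the index.
     */
    static long rowsReadViaSortedIndex(long totalRows, long matchingRows,
            long expectedRows) {
        long wanted = Math.min(expectedRows, matchingRows);
        // ceil(wanted * totalRows / matchingRows) using integer arithmetic
        long scanned = (wanted * totalRows + matchingRows - 1) / matchingRows;
        return Math.min(totalRows, scanned);
    }

    static String preferredIndex(long totalRows, long matchingRows,
            long expectedRows) {
        long viaSorted = rowsReadViaSortedIndex(totalRows, matchingRows, expectedRows);
        long viaFilter = rowsReadViaFilterIndex(matchingRows);
        return viaSorted < viaFilter ? "popularity" : "color,size";
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000;  // made-up number of ec:Product nodes
        long matchingRows = 10_000;  // made-up number of red products in size L
        // whole result expected to be read
        System.out.println(preferredIndex(totalRows, matchingRows, matchingRows)); // color,size
        // only the first 10 rows expected to be read
        System.out.println(preferredIndex(totalRows, matchingRows, 10)); // popularity
    }
}
```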



[jira] [Commented] (OAK-4887) Query cost estimation: ordering by an unindexed property not reflected

2016-10-05 Thread Thomas Mueller (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547892#comment-15547892
 ] 

Thomas Mueller commented on OAK-4887:
-

Yes, the cost of ordering is not accounted for within the query engine 
(org.apache.jackrabbit.oak.query.QueryImpl.getBestSelectorExecutionPlan). For 
small results that easily fit in memory, let's say 1000 rows, the cost of 
ordering itself is very low, almost negligible. But one problem is that with 
"order by propertyName", the whole result needs to be read in memory before the 
first row can be returned. Without "order by", the first row can be returned 
much faster.
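
The difference in the time to the first row can be shown with a small standalone sketch. It is not Oak code: {{CountingIterator}} is a hypothetical wrapper that only counts how many rows were pulled from the underlying source before the first result row is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Counts source rows read before the first result row can be returned.
final class FirstRowSketch {

    /** Hypothetical wrapper that counts how many rows were pulled. */
    static final class CountingIterator implements Iterator<Integer> {
        private final Iterator<Integer> delegate;
        int rowsRead;

        CountingIterator(Iterator<Integer> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public Integer next() {
            rowsRead++;
            return delegate.next();
        }
    }

    /** Unsorted source: the whole result is loaded and sorted first. */
    static int firstRowWithInMemorySort(CountingIterator unsorted) {
        List<Integer> all = new ArrayList<>();
        while (unsorted.hasNext()) {
            all.add(unsorted.next());
        }
        Collections.sort(all);
        return all.get(0);
    }

    /** Source that already delivers sorted rows: one read is enough. */
    static int firstRowFromSortedSource(CountingIterator sorted) {
        return sorted.next();
    }

    public static void main(String[] args) {
        CountingIterator unsorted =
                new CountingIterator(Arrays.asList(5, 3, 9, 1, 7).iterator());
        System.out.println(firstRowWithInMemorySort(unsorted)); // 1
        System.out.println(unsorted.rowsRead);                  // 5

        CountingIterator sorted =
                new CountingIterator(Arrays.asList(1, 3, 5, 7, 9).iterator());
        System.out.println(firstRowFromSortedSource(sorted));   // 1
        System.out.println(sorted.rowsRead);                    // 1
    }
}
```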

If there are two indexes with similar cost, one that returns rows in sorted 
order, and one that returns rows unsorted, then it would be better to use the 
index that returns rows in sorted order. How much better is hard to say; it 
depends a lot on the number of rows that are expected to be read, and that 
number is not known.

Maybe we should support a way to specify the number of rows that the query 
engine should optimize for; that is, the number of rows that are _expected_ to 
be read. This would be similar to the "fastfirstrow" option / "option (fast 
)" in MS SQL Server: 
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/09a6060a-1f72-4438-a3b2-209c240ee4d6/fastfirstrow?forum=transactsql
 . Currently, we support the "limit" option, but maybe a user needs to read 
more than  rows in some cases, so the limit is not known.

By the way, currently, if the chosen index supports ordering, then ordering is 
not done afterwards. So this part is working fine.



