Hi,

one of the things to do for improving the query performance is to get rid 
of the "INTO" clause, as "INTO" will copy all documents found per group 
into a new variable "g":

 FOR a in Asset 
  COLLECT attr = a.attribute1 INTO g
 RETURN { value: attr, count: length(g) }

The query without "INTO" would look like this:

 FOR a in Asset 
  COLLECT value = a.attribute1 WITH COUNT INTO length
 RETURN { value, length }

It should be faster than the one with "INTO", but I am not sure how much. 
This probably depends on the actual data.

Can you give it a try?
Thanks
Jan




Am Donnerstag, 14. September 2017 15:53:20 UTC+2 schrieb Roman Kuzmik:
>
> We are evaluating ArangoDB performance in space of facets calculations.
> There are number of other products capable of doing the same, either via 
> special API  or query language:
>
>    - MarkLogic Facets <https://docs.marklogic.com/jsearch.facet>
>    - ElasticSearch Aggregations 
>    
> <https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html>
>    - Solr Faceting 
>    <https://cwiki.apache.org/confluence/display/solr/Faceting>
>    - etc
>
> We understand, there is no special API in Arango to calculate factes 
> explicitly.
> But in reality, it is not needed, thanks for a comprehensive AQL it can be 
> easily achieved via simple query, like:
>
>  FOR a in Asset 
>   COLLECT attr = a.attribute1 INTO g
>  RETURN { value: attr, count: length(g) }
>
> This query calculate a facet on *attribute1* and yields frequency in the 
> form of:
>
> [
>   {
>     "value": "test-attr1-1",
>     "count": 2000000
>   },
>   {
>     "value": "test-attr1-2",
>     "count": 2000000
>   },
>   {
>     "value": "test-attr1-3",
>     "count": 3000000
>   }
> ]
>
>
> It is saying, that across my entire collection *attribute1* took three 
> forms (test-attr1-1, test-attr1-2 and test-attr1-3) with related counts 
> provided.
> Pretty much we run a DISTINCT query and aggregated counts.
>
> Looks simple and clean. With only one, but really big issue - 
> *performance*.
>
> Provided query above runs for !*31 seconds*! on top of the test 
> collection with only *8M* documents.
> We have experimented with different index types, storage engines (with 
> rocksdb and without), investigating explanation plans at no avail.
> Test documents we use in this test are very concise with only three short 
> attributes.
>
> We would appreciate any input at this point.
> Either we doing something wrong. Or ArangoDB simply is not designed to 
> perform in this particular area.
>
> btw, ultimate goal would be to run something like the following in 
> under-second time:
>
> LET docs = (FOR a IN Asset 
>
>   FILTER a.name like 'test-asset-%'
>
>   SORT a.name
>
>  RETURN a)
>
> LET attribute1 = (
>
>  FOR a in docs 
>
>   COLLECT attr = a.attribute1 INTO g
>
>  RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute2 = (
>
>  FOR a in docs 
>
>   COLLECT attr = a.attribute2 INTO g
>
>  RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute3 = (
>
>  FOR a in docs 
>
>   COLLECT attr = a.attribute3 INTO g
>
>  RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute4 = (
>
>  FOR a in docs 
>
>   COLLECT attr = a.attribute4 INTO g
>
>  RETURN { value: attr, count: length(g[*])}
>
> )
>
> RETURN {
>
>   counts: (RETURN {
>
>     total: LENGTH(docs), 
>
>     offset: 2, 
>
>     to: 4, 
>
>     facets: {
>
>       attribute1: {
>
>         from: 0, 
>
>         to: 5,
>
>         total: LENGTH(attribute1)
>
>       },
>
>       attribute2: {
>
>         from: 5, 
>
>         to: 10,
>
>         total: LENGTH(attribute2)
>
>       },
>
>       attribute3: {
>
>         from: 0, 
>
>         to: 1000,
>
>         total: LENGTH(attribute3)
>
>       },
>
>       attribute4: {
>
>         from: 0, 
>
>         to: 1000,
>
>         total: LENGTH(attribute4)
>
>       }
>
>     }
>
>   }),
>
>   items: (FOR a IN docs LIMIT 2, 4 RETURN {id: a._id, name: a.name}),
>
>   facets: {
>
>     attribute1: (FOR a in attribute1 SORT a.count LIMIT 0, 5 return a),
>
>     attribute2: (FOR a in attribute2 SORT a.value LIMIT 5, 10 return a),
>
>     attribute3: (FOR a in attribute3 LIMIT 0, 1000 return a),
>
>     attribute4: (FOR a in attribute4 SORT a.count, a.value LIMIT 0, 1000 
> return a)
>
>    }
>
> }
>
> Thanks!
>
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to