Hi,
one of the things to do for improving the query performance is to get rid
of the "INTO" clause, as "INTO" will copy all documents found per group
into a new variable "g":
FOR a in Asset
COLLECT attr = a.attribute1 INTO g
RETURN { value: attr, count: length(g) }
The query without "INTO" would look like this:
FOR a in Asset
COLLECT value = a.attribute1 WITH COUNT INTO length
RETURN { value, length }
It should be faster than the one with "INTO", but I am not sure how much.
This probably depends on the actual data.
Can you give it a try?
Thanks
Jan
Am Donnerstag, 14. September 2017 15:53:20 UTC+2 schrieb Roman Kuzmik:
>
> We are evaluating ArangoDB performance in space of facets calculations.
> There are number of other products capable of doing the same, either via
> special API or query language:
>
> - MarkLogic Facets <https://docs.marklogic.com/jsearch.facet>
> - ElasticSearch Aggregations
>
> <https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html>
> - Solr Faceting
> <https://cwiki.apache.org/confluence/display/solr/Faceting>
> - etc
>
> We understand, there is no special API in Arango to calculate factes
> explicitly.
> But in reality, it is not needed, thanks for a comprehensive AQL it can be
> easily achieved via simple query, like:
>
> FOR a in Asset
> COLLECT attr = a.attribute1 INTO g
> RETURN { value: attr, count: length(g) }
>
> This query calculate a facet on *attribute1* and yields frequency in the
> form of:
>
> [
> {
> "value": "test-attr1-1",
> "count": 2000000
> },
> {
> "value": "test-attr1-2",
> "count": 2000000
> },
> {
> "value": "test-attr1-3",
> "count": 3000000
> }
> ]
>
>
> It is saying, that across my entire collection *attribute1* took three
> forms (test-attr1-1, test-attr1-2 and test-attr1-3) with related counts
> provided.
> Pretty much we run a DISTINCT query and aggregated counts.
>
> Looks simple and clean. With only one, but really big issue -
> *performance*.
>
> Provided query above runs for !*31 seconds*! on top of the test
> collection with only *8M* documents.
> We have experimented with different index types, storage engines (with
> rocksdb and without), investigating explanation plans at no avail.
> Test documents we use in this test are very concise with only three short
> attributes.
>
> We would appreciate any input at this point.
> Either we doing something wrong. Or ArangoDB simply is not designed to
> perform in this particular area.
>
> btw, ultimate goal would be to run something like the following in
> under-second time:
>
> LET docs = (FOR a IN Asset
>
> FILTER a.name like 'test-asset-%'
>
> SORT a.name
>
> RETURN a)
>
> LET attribute1 = (
>
> FOR a in docs
>
> COLLECT attr = a.attribute1 INTO g
>
> RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute2 = (
>
> FOR a in docs
>
> COLLECT attr = a.attribute2 INTO g
>
> RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute3 = (
>
> FOR a in docs
>
> COLLECT attr = a.attribute3 INTO g
>
> RETURN { value: attr, count: length(g[*])}
>
> )
>
> LET attribute4 = (
>
> FOR a in docs
>
> COLLECT attr = a.attribute4 INTO g
>
> RETURN { value: attr, count: length(g[*])}
>
> )
>
> RETURN {
>
> counts: (RETURN {
>
> total: LENGTH(docs),
>
> offset: 2,
>
> to: 4,
>
> facets: {
>
> attribute1: {
>
> from: 0,
>
> to: 5,
>
> total: LENGTH(attribute1)
>
> },
>
> attribute2: {
>
> from: 5,
>
> to: 10,
>
> total: LENGTH(attribute2)
>
> },
>
> attribute3: {
>
> from: 0,
>
> to: 1000,
>
> total: LENGTH(attribute3)
>
> },
>
> attribute4: {
>
> from: 0,
>
> to: 1000,
>
> total: LENGTH(attribute4)
>
> }
>
> }
>
> }),
>
> items: (FOR a IN docs LIMIT 2, 4 RETURN {id: a._id, name: a.name}),
>
> facets: {
>
> attribute1: (FOR a in attribute1 SORT a.count LIMIT 0, 5 return a),
>
> attribute2: (FOR a in attribute2 SORT a.value LIMIT 5, 10 return a),
>
> attribute3: (FOR a in attribute3 LIMIT 0, 1000 return a),
>
> attribute4: (FOR a in attribute4 SORT a.count, a.value LIMIT 0, 1000
> return a)
>
> }
>
> }
>
> Thanks!
>
>
>
>
--
You received this message because you are subscribed to the Google Groups
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.