Hi,
I tried it myself on my local laptop, and here are the results:
Original query:
    FOR a IN Asset
      COLLECT attr = a.attribute1 INTO g
      RETURN { value: attr, count: length(g) }
This executes in about 35 seconds with the 8M documents. The execution plan
is not ideal, because it sorts the entire collection before grouping, and
INTO g keeps all documents of each group around just to count them:
Execution plan:
 Id   NodeType                      Est.   Comment
  1   SingletonNode                    1   * ROOT
  2   EnumerateCollectionNode    8000000     - FOR a IN p   /* full collection scan */
  3   CalculationNode            8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  7   SortNode                   8000000       - SORT #3 ASC
  4   CollectNode                6400000       - COLLECT attr = #3 INTO g   /* sorted */
  5   CalculationNode            6400000       - LET #5 = { "value" : attr, "count" : LENGTH(g) }   /* simple expression */
  6   ReturnNode                 6400000       - RETURN #5
Adjusted query:
    FOR a IN Asset
      COLLECT value = a.attribute1 WITH COUNT INTO length
      RETURN { value, length }
Changing the query as I suggested makes it finish in a bit over 6 seconds.
The execution plan is already better: the COLLECT now uses the hash method,
and only the grouped result is sorted afterwards:
Execution plan:
 Id   NodeType                      Est.   Comment
  1   SingletonNode                    1   * ROOT
  2   EnumerateCollectionNode    8000000     - FOR a IN p   /* full collection scan */
  3   CalculationNode            8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  4   CollectNode                6400000       - COLLECT value = #3 WITH COUNT INTO length   /* hash */
  7   SortNode                   6400000       - SORT value ASC
  5   CalculationNode            6400000       - LET #5 = { "value" : value, "length" : length }   /* simple expression */
  6   ReturnNode                 6400000       - RETURN #5
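To make the difference between the two COLLECT methods a bit more tangible,
here is a rough Python sketch. It is only an illustration of the idea, not
how ArangoDB implements it; the function names count_sorted and count_hash
are made up for this example. The "sorted" variant has to sort every input
value before it can count runs of equal values, while the "hash" variant
counts in a hash table and sorts only the distinct keys afterwards.

```python
from collections import Counter
from itertools import groupby


def count_sorted(values):
    # Roughly what the "sorted" COLLECT does: sort ALL input values first,
    # then count each run of equal neighbours.
    return [(key, sum(1 for _ in run)) for key, run in groupby(sorted(values))]


def count_hash(values):
    # Roughly what the "hash" COLLECT does: count in a hash table,
    # then sort only the distinct keys of the result.
    counts = Counter(values)
    return sorted(counts.items())


# Tiny made-up input standing in for the attribute1 values.
values = ["b", "a", "c", "a", "b", "a"]
print(count_sorted(values))  # [('a', 3), ('b', 2), ('c', 1)]
print(count_hash(values))    # [('a', 3), ('b', 2), ('c', 1)]
```

Both return the same counts; the hash variant simply avoids sorting the
full input.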
With a sorted (skiplist) index on "attribute1", the execution time goes
down to around 5.3 seconds, and the plan changes to:
Execution plan:
 Id   NodeType                      Est.   Comment
  1   SingletonNode                    1   * ROOT
 10   IndexNode                  8000000     - FOR a IN p   /* skiplist index scan */
  3   CalculationNode            8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  4   CollectNode                6400000       - COLLECT value = #3 WITH COUNT INTO length   /* sorted */
  7   CalculationNode            6400000       - LET #7 = { "value" : value, "length" : length }   /* simple expression */
  8   ReturnNode                 6400000       - RETURN #7
Still not ideal, but already better than the initial 30+ seconds.
That's all that comes to my mind right now that can easily be done for this
particular query.
Maybe there is something more that can be optimized here, but this may
require changes to the code.
Best regards
Jan
On Thursday, 14 September 2017 20:38:31 UTC+2, Roman Kuzmik wrote:
>
> Thanks Jan for your reply!
>
> But, yes, we have tried the "2.x old school" approach *WITH COUNT*, as well
> as the brand new *DISTINCT*.
> Both yield similarly sluggish results :-/
>