Hi,

I tried it myself on my local laptop, and here are the results:

Original query:

    FOR a IN Asset COLLECT attr = a.attribute1 INTO g RETURN { value: attr, count: length(g) }

This executes in about 35 seconds with the 8M documents. The execution plan
is not ideal, because it sorts the entire collection before grouping, and
INTO g has to keep every document of each group just so LENGTH(g) can count
them:

Execution plan:
 Id   NodeType                     Est.   Comment
  1   SingletonNode                   1   * ROOT
  2   EnumerateCollectionNode   8000000     - FOR a IN p   /* full collection scan */
  3   CalculationNode           8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  7   SortNode                  8000000       - SORT #3 ASC
  4   CollectNode               6400000       - COLLECT attr = #3 INTO g   /* sorted */
  5   CalculationNode           6400000       - LET #5 = { "value" : attr, "count" : LENGTH(g) }   /* simple expression */
  6   ReturnNode                6400000       - RETURN #5


Adjusted query:

    FOR a IN Asset COLLECT value = a.attribute1 WITH COUNT INTO length RETURN { value, length }

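Changing the query as I suggested makes it finish in 6.x seconds. The
execution plan is already better: COLLECT now uses the hash method, and the
sort runs on the grouped result instead of on the whole collection: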

Execution plan:
 Id   NodeType                     Est.   Comment
  1   SingletonNode                   1   * ROOT
  2   EnumerateCollectionNode   8000000     - FOR a IN p   /* full collection scan */
  3   CalculationNode           8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  4   CollectNode               6400000       - COLLECT value = #3 WITH COUNT INTO length   /* hash */
  7   SortNode                  6400000       - SORT value ASC
  5   CalculationNode           6400000       - LET #5 = { "value" : value, "length" : length }   /* simple expression */
  6   ReturnNode                6400000       - RETURN #5
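
As a rough illustration of the difference between the two plans above, here
is a small Python sketch. It is not how ArangoDB implements COLLECT
internally; the sample data and the function names are made up for the
example. The first function sorts all documents and materializes each group
before counting it, the second keeps one counter per distinct value and only
sorts the (smaller) result.

```python
from collections import Counter
from itertools import groupby

# Hypothetical stand-in for the documents in the collection.
docs = [{"attribute1": v} for v in ["b", "a", "b", "c", "a", "b"]]


def collect_into_sorted(docs):
    """Roughly the first plan: sort everything, build each group, count it."""
    ordered = sorted(docs, key=lambda d: d["attribute1"])
    result = []
    for value, group in groupby(ordered, key=lambda d: d["attribute1"]):
        members = list(group)  # every document of the group is kept in memory
        result.append({"value": value, "count": len(members)})
    return result


def collect_with_count_hash(docs):
    """Roughly the second plan: hash-based counting, sort only the result."""
    counts = Counter(d["attribute1"] for d in docs)
    return [{"value": v, "length": n} for v, n in sorted(counts.items())]


print(collect_into_sorted(docs))
print(collect_with_count_hash(docs))
```

Both functions return the same groups and counts; only the amount of sorting
and the memory held per group differ.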

With a sorted (skiplist) index on "attribute1", the execution time goes 
down to around 5.3 seconds, and the plan changes to:

Execution plan:
 Id   NodeType             Est.   Comment
  1   SingletonNode           1   * ROOT
 10   IndexNode         8000000     - FOR a IN p   /* skiplist index scan */
  3   CalculationNode   8000000       - LET #3 = a.`attribute1`   /* attribute expression */   /* collections used: a : p */
  4   CollectNode       6400000       - COLLECT value = #3 WITH COUNT INTO length   /* sorted */
  7   CalculationNode   6400000       - LET #7 = { "value" : value, "length" : length }   /* simple expression */
  8   ReturnNode        6400000       - RETURN #7

Still not ideal, but already much better than the initial 30+ seconds.
That is all that comes to my mind right now that can easily be done for this
particular query.
Maybe more can be optimized here, but that may require changes to the code.

Best regards
Jan

On Thursday, 14 September 2017 20:38:31 UTC+2, Roman Kuzmik wrote:
>
> Thanks Jan for your reply!
>
> But, yes, we have tried the "2.x old school" approach WITH COUNT, as well
> as the brand new DISTINCT.
> Both yield similarly sluggish results :-/
>
