Recreating the indexes after import does not make a difference. Returning doc._path for 20.000 items takes 50 ms, returning doc._path takes minutes
The _path index is deduplicated, the _type index is not The only difference in the execution plans is "index only" when "RETURN doc._type". Since both _type and _path are fully indexed I would assume that the query is executed in both times based on index data. So ArangoDB will load all 100.000 objects for picking up the value of _path? The overall data is meanwhile 55 GB (about one third of the data is binary data (files and images base64 encoded). This is all no big problem for me since we perform such queries once before a migration run and it does matter taking a migration running for some hours a minutes more or less but I want to understand what is going on here (in particular this is unexpected behavior). Query String: for doc in import filter doc._type == 'Image' return doc._type Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 7 IndexNode 2214 - FOR doc IN import /* hash index scan, index only, projections: `_type` */ 5 CalculationNode 2214 - LET #3 = doc.`_type` /* attribute expression */ /* collections used: doc : import */ 6 ReturnNode 2214 - RETURN #3 Indexes used: By Type Collection Unique Sparse Selectivity Fields Ranges 7 hash import false false 0.05 % [ `_type` ] (doc.`_type` == "Image") Optimization rules applied: Id RuleName 1 move-calculations-up 2 move-filters-up 3 move-calculations-up-2 4 move-filters-up-2 5 use-indexes 6 remove-filter-covered-by-index 7 remove-unnecessary-calculations-2 8 reduce-extraction-to-projection Query String: for doc in import filter doc._type == 'Image' return doc._path Execution plan: Id NodeType Est. Comment 1 SingletonNode 1 * ROOT 7 IndexNode 2214 - FOR doc IN import /* hash index scan, projections: `_path` */ 5 CalculationNode 2214 - LET #3 = doc.`_path` /* attribute expression */ /* collections used: doc : import */ 6 ReturnNode 2214 - RETURN #3 Indexes used: By Type Collection Unique Sparse Selectivity Fields Ranges 7 hash import false false 0.05 % [ `_type` ] (doc.`_type` == "Image") Optimization rules applied: Id RuleName 1 move-calculations-up 2 move-filters-up 3 move-calculations-up-2 4 move-filters-up-2 5 use-indexes 6 remove-filter-covered-by-index 7 remove-unnecessary-calculations-2 8 reduce-extraction-to-projection On Friday, June 14, 2019 at 9:54:10 AM UTC+2, Andreas Jung wrote: > > Using RocksDB (default installation). > > I create a new collection for every import of the data including the > indexes. > > Unfortunately I don't have the key names in my hands. They are coming > from a JSON dump of a CMS. > > Am Freitag, 14. Juni 2019 09:50:41 UTC+2 schrieb Wilfried Gösgens: >> >> Hi, >> afair you're using rocksdb? >> >> can you try to re-create that index to be on `_type`, `_path`, `_key` for >> better using of projections? >> >> Please note that you shouldn't use fieldnames starting with `_` since >> they're defined as system specific fields in arangodb. >> >> Cheers, >> Willi >> >> On Friday, June 14, 2019 at 9:41:24 AM UTC+2, Andreas Jung wrote: >>> >>> _key is a UUID4 >>> _path is standard filesystem path not longer than 100 chars each >>> >>> That can not be the problem. >>> >>> Am Freitag, 14. Juni 2019 09:36:17 UTC+2 schrieb James Courtier-Dutton: >>>> >>>> Hi, >>>> >>>> What is the average size of the returned data? It could just be the >>>> time it takes to serialise the data being returned >>>> >>>> James >>>> >>>> On Fri, 14 Jun 2019, 05:45 'Andreas Jung' via ArangoDB, < >>>> [email protected]> wrote: >>>> >>>>> Hi there, >>>>> >>>>> this query >>>>> >>>>> for doc in import >>>>> filter doc._type == 'Image' >>>>> return {path: doc._path, key: doc._key} >>>>> >>>>> takes about 45 seconds on decent hardware with an import collection of >>>>> about 100.000 items with about 21.000 of _type = 'Image'. >>>>> There is an index of _type. Using PyArango as client...I really wander >>>>> why this query is running so slow?! >>>>> >>>>> Running ArangoDB 3.4.3 >>>>> >>>>> Profile >>>>> >>>>> Query String: >>>>> for doc in import >>>>> filter doc._type == 'Image' >>>>> return {path: doc._path, key: doc._key} >>>>> >>>>> Execution plan: >>>>> Id NodeType Calls Items Runtime [s] Comment >>>>> 1 SingletonNode 1 1 0.00000 * ROOT >>>>> 7 IndexNode 21 20617 32.73956 - FOR doc IN >>>>> import /* hash index scan, projections: `_key`, `_path` */ >>>>> 5 CalculationNode 21 20617 0.04354 - LET #3 = { >>>>> "path" : doc.`_path`, "key" : doc.`_key` } /* simple expression */ /* >>>>> collections used: doc : import */ >>>>> 6 ReturnNode 21 20617 0.00016 - RETURN #3 >>>>> >>>>> Indexes used: >>>>> By Type Collection Unique Sparse Selectivity Fields >>>>> Ranges >>>>> 7 hash import false false 0.05 % [ `_type` ] >>>>> (doc.`_type` == "Image") >>>>> >>>>> Optimization rules applied: >>>>> Id RuleName >>>>> 1 move-calculations-up >>>>> 2 move-filters-up >>>>> 3 move-calculations-up-2 >>>>> 4 move-filters-up-2 >>>>> 5 use-indexes >>>>> 6 remove-filter-covered-by-index >>>>> 7 remove-unnecessary-calculations-2 >>>>> 8 reduce-extraction-to-projection >>>>> >>>>> Query Statistics: >>>>> Writes Exec Writes Ign Scan Full Scan Index Filtered Exec >>>>> Time [s] >>>>> 0 0 0 20617 0 >>>>> 32.78928 >>>>> >>>>> Query Profile: >>>>> Query Stage Duration [s] >>>>> initializing 0.00001 >>>>> parsing 0.00010 >>>>> optimizing ast 0.00001 >>>>> loading collections 0.00002 >>>>> instantiating plan 0.00005 >>>>> optimizing plan 0.00032 >>>>> executing 32.78841 >>>>> finalizing 0.00032 >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "ArangoDB" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> -- You received this message because you are subscribed to the Google Groups "ArangoDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/arangodb/7657a5b0-2506-4796-a752-796396e7ef52%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
