All _path values are unique; we have about 20 different values for _type.
I am not sure if I can break down the dataset into something smaller.
The data is generally sensitive and not easy to share or anonymize.
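
To make the suspected cause concrete, here is a minimal Python sketch of the difference between the two execution plans quoted below ("index only" versus a plain index scan). It is a toy model, not ArangoDB code: `ToyCollection`, `ensure_index`, `query` and `run_demo` are hypothetical names, and it assumes that a projected attribute missing from the index forces a full document load for every match, which is what the absent "index only" remark seems to indicate.

```python
# Toy model (NOT ArangoDB or PyArango code): a secondary index maps an indexed
# value to document keys; attributes that are not stored in the index require
# loading the full document for every match.
from collections import defaultdict


class ToyCollection:
    def __init__(self, docs):
        self.docs = {d["_key"]: d for d in docs}
        self.indexes = {}
        self.fetches = 0  # number of full-document loads

    def ensure_index(self, fields):
        """Build an index keyed on fields[0] that stores all `fields`."""
        index = defaultdict(list)
        for key, doc in self.docs.items():
            entry = {f: doc[f] for f in fields}
            index[doc[fields[0]]].append((key, entry))
        self.indexes[tuple(fields)] = index

    def _load(self, key):
        # Stands in for the expensive step: reading a (possibly large) document.
        self.fetches += 1
        return self.docs[key]

    def query(self, index_fields, value, wanted):
        """Return the `wanted` attributes of docs whose fields[0] == value."""
        index = self.indexes[tuple(index_fields)]
        covered = set(wanted) <= set(index_fields)  # "index only" is possible
        rows = []
        for key, entry in index.get(value, []):
            source = entry if covered else self._load(key)
            rows.append({f: source[f] for f in wanted})
        return rows


def run_demo():
    # Hypothetical data: 1000 documents, every fifth one is an 'Image'.
    docs = [
        {"_key": f"k{i}", "_type": "Image" if i % 5 == 0 else "Page",
         "_path": f"/site/item-{i}", "payload": "x" * 1000}
        for i in range(1000)
    ]
    coll = ToyCollection(docs)
    coll.ensure_index(["_type"])
    coll.ensure_index(["_type", "_path", "_key"])

    coll.query(["_type"], "Image", ["_type"])
    type_only = coll.fetches

    coll.fetches = 0
    coll.query(["_type"], "Image", ["_path"])
    path_uncovered = coll.fetches

    coll.fetches = 0
    coll.query(["_type", "_path", "_key"], "Image", ["_path", "_key"])
    path_covered = coll.fetches

    return type_only, path_uncovered, path_covered
```

In this model, returning `_type` from the `_type` index needs no document loads, returning `_path` from the same index needs one load per matching document, and the combined index suggested further down the thread avoids the loads again. Whether the real slowdown comes from exactly this effect would still need to be confirmed against the server.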


On Friday, June 14, 2019 at 1:03:59 PM UTC+2, Wilfried Gösgens wrote:
>
>
> Hi, 
> Can you share a set of sample documents? How even is the distribution of 
> `_type`? Which samples are there? 
> On Friday, June 14, 2019 at 11:22:51 AM UTC+2, Andreas Jung wrote:
>>
>> Recreating the indexes after import does not make a difference.
>>
>> Returning doc._type for 20.000 items takes 50 ms, returning doc._path 
>> takes minutes.
>>
>> The _path index is deduplicated, the _type index is not 
>>
>> The only difference in the execution plans is "index only" when "RETURN 
>> doc._type". Since both _type and _path
>> are fully indexed I would assume that the query is executed in both cases 
>> based on index data.
>>
>> So ArangoDB will load all 100.000 objects for picking up the value of 
>> _path? The overall data is 55 GB by now
>> (about one third of the data is binary data: files and images, base64 
>> encoded).
>>
>> This is all no big problem for me, since we perform such queries once 
>> before a migration run, and it does not matter whether a migration that 
>> runs for some hours takes a few minutes more or less. But I want to 
>> understand what is going on here (in particular because
>> this is unexpected behavior).
>>
>>
>> Query String:
>>  for doc in import 
>>  filter doc._type == 'Image'
>>  return doc._type
>>
>> Execution plan:
>>  Id   NodeType          Est.   Comment
>>   1   SingletonNode        1   * ROOT
>>   7   IndexNode         2214     - FOR doc IN import   /* hash index scan, index only, projections: `_type` */
>>   5   CalculationNode   2214       - LET #3 = doc.`_type`   /* attribute expression */   /* collections used: doc : import */
>>   6   ReturnNode        2214       - RETURN #3
>>
>> Indexes used:
>>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>>
>> Optimization rules applied:
>>  Id   RuleName
>>   1   move-calculations-up
>>   2   move-filters-up
>>   3   move-calculations-up-2
>>   4   move-filters-up-2
>>   5   use-indexes
>>   6   remove-filter-covered-by-index
>>   7   remove-unnecessary-calculations-2
>>   8   reduce-extraction-to-projection
>>
>>
>>
>> Query String:
>>  for doc in import 
>>  filter doc._type == 'Image'
>>  return doc._path
>>
>> Execution plan:
>>  Id   NodeType          Est.   Comment
>>   1   SingletonNode        1   * ROOT
>>   7   IndexNode         2214     - FOR doc IN import   /* hash index scan, projections: `_path` */
>>   5   CalculationNode   2214       - LET #3 = doc.`_path`   /* attribute expression */   /* collections used: doc : import */
>>   6   ReturnNode        2214       - RETURN #3
>>
>> Indexes used:
>>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>>
>> Optimization rules applied:
>>  Id   RuleName
>>   1   move-calculations-up
>>   2   move-filters-up
>>   3   move-calculations-up-2
>>   4   move-filters-up-2
>>   5   use-indexes
>>   6   remove-filter-covered-by-index
>>   7   remove-unnecessary-calculations-2
>>   8   reduce-extraction-to-projection
>>
>>
>>
>>
>> On Friday, June 14, 2019 at 9:54:10 AM UTC+2, Andreas Jung wrote:
>>>
>>> Using RocksDB (default installation).
>>>
>>> I create a new collection for every import of the data including the 
>>> indexes.
>>>
>>> Unfortunately I have no control over the key names. They come
>>> from a JSON dump of a CMS.
>>>
>>> On Friday, June 14, 2019 at 9:50:41 AM UTC+2, Wilfried Gösgens wrote:
>>>>
>>>> Hi, 
>>>> AFAIR you're using RocksDB?
>>>>
>>>> Can you try to re-create that index on `_type`, `_path`, `_key` 
>>>> to make better use of projections?
>>>>
>>>> Please note that you shouldn't use field names starting with `_` since 
>>>> they're defined as system-specific fields in ArangoDB.
>>>>
>>>> Cheers, 
>>>> Willi
>>>>
>>>> On Friday, June 14, 2019 at 9:41:24 AM UTC+2, Andreas Jung wrote:
>>>>>
>>>>> _key is a UUID4
>>>>> _path is a standard filesystem path, no longer than 100 chars each
>>>>>
>>>>> That cannot be the problem.
>>>>>
>>>>> On Friday, June 14, 2019 at 9:36:17 AM UTC+2, James Courtier-Dutton wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> What is the average size of the returned data? It could just be the 
>>>>>> time it takes to serialise the data being returned
>>>>>>
>>>>>> James
>>>>>>
>>>>>> On Fri, 14 Jun 2019, 05:45 'Andreas Jung' via ArangoDB, <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> this query 
>>>>>>>
>>>>>>>  for doc in import 
>>>>>>>    filter doc._type == 'Image'
>>>>>>>    return {path: doc._path, key: doc._key}
>>>>>>>
>>>>>>> takes about 45 seconds on decent hardware with an import collection 
>>>>>>> of about 100.000 items, about 21.000 of which have _type = 'Image'.
>>>>>>> There is an index on _type. Using PyArango as client... I really 
>>>>>>> wonder why this query is running so slowly.
>>>>>>>
>>>>>>> Running ArangoDB 3.4.3
>>>>>>>
>>>>>>> Profile
>>>>>>>
>>>>>>> Query String:
>>>>>>>  for doc in import 
>>>>>>>  filter doc._type == 'Image'
>>>>>>>  return {path: doc._path, key: doc._key}
>>>>>>>
>>>>>>> Execution plan:
>>>>>>>  Id   NodeType          Calls   Items   Runtime [s]   Comment
>>>>>>>   1   SingletonNode         1       1       0.00000   * ROOT
>>>>>>>   7   IndexNode            21   20617      32.73956     - FOR doc IN import   /* hash index scan, projections: `_key`, `_path` */
>>>>>>>   5   CalculationNode      21   20617       0.04354       - LET #3 = { "path" : doc.`_path`, "key" : doc.`_key` }   /* simple expression */   /* collections used: doc : import */
>>>>>>>   6   ReturnNode           21   20617       0.00016       - RETURN #3
>>>>>>>
>>>>>>> Indexes used:
>>>>>>>  By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
>>>>>>>   7   hash   import       false    false         0.05 %   [ `_type` ]   (doc.`_type` == "Image")
>>>>>>>
>>>>>>> Optimization rules applied:
>>>>>>>  Id   RuleName
>>>>>>>   1   move-calculations-up
>>>>>>>   2   move-filters-up
>>>>>>>   3   move-calculations-up-2
>>>>>>>   4   move-filters-up-2
>>>>>>>   5   use-indexes
>>>>>>>   6   remove-filter-covered-by-index
>>>>>>>   7   remove-unnecessary-calculations-2
>>>>>>>   8   reduce-extraction-to-projection
>>>>>>>
>>>>>>> Query Statistics:
>>>>>>>  Writes Exec   Writes Ign   Scan Full   Scan Index   Filtered   Exec Time [s]
>>>>>>>            0            0           0        20617          0        32.78928
>>>>>>>
>>>>>>> Query Profile:
>>>>>>>  Query Stage           Duration [s]
>>>>>>>  initializing               0.00001
>>>>>>>  parsing                    0.00010
>>>>>>>  optimizing ast             0.00001
>>>>>>>  loading collections        0.00002
>>>>>>>  instantiating plan         0.00005
>>>>>>>  optimizing plan            0.00032
>>>>>>>  executing                 32.78841
>>>>>>>  finalizing                 0.00032
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "ArangoDB" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/arangodb/6c2de54c-3936-4aa5-8b6a-2dae3e5afcf7%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>

