Hey Simran,

In our case we do want to scan the whole content of the collection, but we 
want to make sure we don't spend 90% of our processing time on the "SKIP" 
step of the query. (Sorry, I should have said SKIP in the previous post.)

We have several use cases. I can't tell you much (it's my company's 
project), but you can guess that if we want to dump the whole collection, 
it is legitimate to read all of its documents.
There are no specifically "relevant" documents, because all of them are 
relevant!

I just read the issue you opened on GitHub; I guess that by the time it's 
resolved, it could do the trick.

Do you have anything like physical pointers? To be honest, we are 
currently using OrientDB, and it has a very cool feature: iterating over 
RIDs (Record IDs). RIDs are physical pointers that support the comparison 
operators (<, <=, =, >=, >), so you can write queries such as: 
"select * from my_collection where @rid > #10:100".

If "my_collection" starts at RID #10:0, that is basically 
"select * from my_collection SKIP 100". The main benefit of this feature 
is that it runs extremely fast, because the time spent "skipping" records 
is almost nonexistent: the RID points straight to the record's physical 
position.

I'm no OrientDB evangelist, but if you are curious, the feature is 
documented here: 
https://orientdb.com/docs/3.0.x/sql/Pagination.html#use-the-rid-limit
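For reference, the equivalent idea in AQL would be keyset pagination: sort on _key, remember the last key of each page, and resume after it with a range FILTER instead of an offset. This is only a sketch — it relies on the range FILTER on _key being able to use the primary index, which (per the GitHub issue you opened) is not the case yet; "my_collection", @lastKey and the page size of 1000 are placeholders:

```aql
// First page: order by _key and remember the last key returned.
FOR doc IN my_collection
  SORT doc._key
  LIMIT 1000
  RETURN doc

// Next page: resume after the last key seen instead of using an
// offset, so no documents have to be skipped and re-counted.
FOR doc IN my_collection
  FILTER doc._key > @lastKey
  SORT doc._key
  LIMIT 1000
  RETURN doc
```

It is the same trick as the RID comparison: seek directly to a position in an index rather than counting past records.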

Thank you for your answer,
Cyprien

On Monday, December 10, 2018 at 4:54:11 PM UTC+1, Simran Brucherseifer wrote:
>
> Hey Cyprien,
>
> it's going well, thanks!
>
> In general, yes, using LIMIT with a high offset can take more time than 
> returning the first few documents if there's a huge dataset to process to 
> answer the query. But it depends on the exact query. Reading the full 
> content of the documents can be avoided in several cases, but it may still 
> be necessary to walk through an index data structure up to the documents 
> that need to be processed and returned.
>
> Can you post the exact query (or queries) you need so that we can better 
> understand the goal and check what the options are to optimize it?
> Maybe a secondary index can be utilized to select the relevant documents?
> Or maybe you know the document keys and can do point lookups for them?
>
> Regarding range FILTERs: It is currently not possible to do that using an 
> index on the _key attribute, but this may change:
> https://github.com/arangodb/arangodb/issues/7720
>
> Best,
> Simran
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.