[arangodb-google] Re: Querying for the document with a max value

Christoph Engel Tue, 25 Oct 2016 08:46:01 -0700

Hi Jan,

thanks for the detailed answer.
The solution with the skiplist index works well.


Best regards
Christoph

Am Dienstag, 25. Oktober 2016 16:21:28 UTC+2 schrieb Jan:
>
> Hi,
>
> let's start by simplifying the RETURN part of the query a bit:
>
>     FOR vertex IN MyCollection 
>       FILTER HAS(vertex, 'val') 
>       COLLECT AGGREGATE maxVal = max(vertex.val) INTO maxVertex 
>       RETURN { maxVal_1: maxVal, maxVal_2: maxVertex }
>
> Because of the COLLECT AGGREGATE, the result of the query is one new 
> document, containing the attributes "maxVal_1" and "maxVal_2".
> "maxVal_1" will contain the overall maximum value of the "val" attribute. 
> "maxVal_2" in my case will contain all the visited documented that were 
> inspected while processing the COLLECT statement (because of the FILTER 
> that is all the documents that have a "val" attribute).
>
> The documents in "maxVal_2" are produced by a COLLECT, and COLLECT will 
> store all variables from its scope in its out variable prefixed with the 
> variable name. In this case, the collect will store documents from the 
> variable "vertex" in a sub-attribute "vertex" in its out variable 
> "maxVertex".
> With my test data, the contents of "maxVertex" looked like this:
>
>     "maxVal2" : [ 
>       { 
>         "vertex" : { 
>           "_key" : "5186868", 
>           "_id" : "MyCollection/5186868", 
>           "_rev" : "5186868", 
>           "val" : "baz" 
>         } 
>       }, 
>       { 
>         "vertex" : { 
>           "_key" : "5186859", 
>           "_id" : "MyCollection/5186859", 
>           "_rev" : "5186859", 
>           "val" : 6 
>         } 
>       }, 
>       { 
>         "vertex" : { 
>           "_key" : "5186865", 
>           "_id" : "MyCollection/5186865", 
>           "_rev" : "5186865", 
>           "val" : 8 
>         }
>       }
>     ]
>
> Note that there will be as many documents in "maxVertex"."vertex" as there 
> are documents that satisfy the filter condition. Please also note that 
> these documents will appear in the "maxVertex" variable in non-predictable 
> order (technically they will appear in the same order as the collection 
> iteration is done).
>
> In your query, you are accessing the 0th element of "maxVertex", which is 
> just a "random" document from the ones iterated over, but not necessarily 
> the one with the maximum value. In my case, it would be this one:
>
>       { 
>         "vertex" : { 
>           "_key" : "5186868", 
>           "_id" : "MyCollection/5186868", 
>           "_rev" : "5186868", 
>           "val" : "baz" 
>         } 
>       }
>
> Accessing the "val" sub-attribute of the 0th element will produce "null", 
> because the 0th element does not have the "val" attribute itself. It is 
> necessary to access "vertex"."val".
> So the query becomes:
>
>     FOR vertex IN MyCollection 
>       FILTER HAS(vertex, 'val') 
>       COLLECT AGGREGATE maxVal = max(vertex.val) INTO maxVertex 
>       RETURN { maxVal_1: maxVal, maxVal_2: maxVertex[0].vertex.val }
>
> Still "maxVal_2" will contain a "val" value from a "random" document and 
> not necessarily the maximum value.
>
> The main differerence is that "maxVal" is an aggregate value: it is 
> constantly updated during the iteration over all the documents, and will 
> always contain just the maximum value of the documents seen. 
> "maxVertex" however is not an aggregate. It will be build during the 
> COLLECT too, but will contain all the documents visited by the COLLECT. It 
> will not contain a single document but all.
>
>
> By the way, an easier way to access the maximum value of an attribute in a 
> collection is to create a skiplist index on the attribute and perform a 
> SORT LIMIT 1 on it, e.g.
>
>     db.MyCollection.ensureIndex({ type: "skiplist", fields: [ "val" ] });
>     q = "FOR doc IN MyCollection SORT doc.val DESC LIMIT 1 RETURN doc";
>     db._query(q).toArray();
>
> This will also be more efficient because it will look at a single document 
> only, whereas the initial query will always have to look at all the 
> documents in the collection.
>
> Best regards
> J
>
> Am Dienstag, 25. Oktober 2016 15:16:32 UTC+2 schrieb Christoph Engel:
>>
>> Hi,
>>
>> I would like to get the document from my collection which contains the 
>> maximum value "val" of all stored documents.
>> I've created the following statement:
>>
>> FOR vertex in MyCollection
>> filter has(vertex, "val")
>> COLLECT AGGREGATE 
>>     maxVal = max(vertex.val) INTO maxVertex
>> RETURN 
>> {
>>     maxVal_1: maxVal,
>>     maxVal_2: maxVertex[0].val
>> }
>>
>> As far as I understand it, *maxVertex *should contain a set of documents 
>> for each value of *maxVal*. Because there is only onevalue of *maxVal*, 
>> there should only be one group containing the document with *maxVal* in 
>> the *maxVertex*-group.
>> In this case *maxVal_1* and *maxVal_2* should have the same value, but 
>> both values are different.
>>
>> What did I understand wrong?
>>
>> best regards
>> Christoph
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[arangodb-google] Re: Querying for the document with a max value

Reply via email to