[arangodb-google] Re: Querying for the document with a max value

Jan Tue, 25 Oct 2016 07:22:07 -0700

Hi,

let's start by simplifying the RETURN part of the query a bit:


    FOR vertex IN MyCollection 
      FILTER HAS(vertex, 'val') 
      COLLECT AGGREGATE maxVal = max(vertex.val) INTO maxVertex 
      RETURN { maxVal_1: maxVal, maxVal_2: maxVertex }

Because of the COLLECT AGGREGATE, the result of the query is one new 
document, containing the attributes "maxVal_1" and "maxVal_2".
"maxVal_1" will contain the overall maximum value of the "val" attribute. 
"maxVal_2" in my case will contain all the visited documented that were 
inspected while processing the COLLECT statement (because of the FILTER 
that is all the documents that have a "val" attribute).

The documents in "maxVal_2" are produced by a COLLECT, and COLLECT will 
store all variables from its scope in its out variable prefixed with the 
variable name. In this case, the collect will store documents from the 
variable "vertex" in a sub-attribute "vertex" in its out variable 
"maxVertex".
With my test data, the contents of "maxVertex" looked like this:

    "maxVal2" : [ 
      { 
        "vertex" : { 
          "_key" : "5186868", 
          "_id" : "MyCollection/5186868", 
          "_rev" : "5186868", 
          "val" : "baz" 
        } 
      }, 
      { 
        "vertex" : { 
          "_key" : "5186859", 
          "_id" : "MyCollection/5186859", 
          "_rev" : "5186859", 
          "val" : 6 
        } 
      }, 
      { 
        "vertex" : { 
          "_key" : "5186865", 
          "_id" : "MyCollection/5186865", 
          "_rev" : "5186865", 
          "val" : 8 
        }
      }
    ]

Note that there will be as many documents in "maxVertex"."vertex" as there 
are documents that satisfy the filter condition. Please also note that 
these documents will appear in the "maxVertex" variable in non-predictable 
order (technically they will appear in the same order as the collection 
iteration is done).

In your query, you are accessing the 0th element of "maxVertex", which is 
just a "random" document from the ones iterated over, but not necessarily 
the one with the maximum value. In my case, it would be this one:

      { 
        "vertex" : { 
          "_key" : "5186868", 
          "_id" : "MyCollection/5186868", 
          "_rev" : "5186868", 
          "val" : "baz" 
        } 
      }

Accessing the "val" sub-attribute of the 0th element will produce "null", 
because the 0th element does not have the "val" attribute itself. It is 
necessary to access "vertex"."val".
So the query becomes:

    FOR vertex IN MyCollection 
      FILTER HAS(vertex, 'val') 
      COLLECT AGGREGATE maxVal = max(vertex.val) INTO maxVertex 
      RETURN { maxVal_1: maxVal, maxVal_2: maxVertex[0].vertex.val }

Still "maxVal_2" will contain a "val" value from a "random" document and 
not necessarily the maximum value.

The main differerence is that "maxVal" is an aggregate value: it is 
constantly updated during the iteration over all the documents, and will 
always contain just the maximum value of the documents seen. 
"maxVertex" however is not an aggregate. It will be build during the 
COLLECT too, but will contain all the documents visited by the COLLECT. It 
will not contain a single document but all.


By the way, an easier way to access the maximum value of an attribute in a 
collection is to create a skiplist index on the attribute and perform a 
SORT LIMIT 1 on it, e.g.

    db.MyCollection.ensureIndex({ type: "skiplist", fields: [ "val" ] });
    q = "FOR doc IN MyCollection SORT doc.val DESC LIMIT 1 RETURN doc";
    db._query(q).toArray();

This will also be more efficient because it will look at a single document 
only, whereas the initial query will always have to look at all the 
documents in the collection.

Best regards
J

Am Dienstag, 25. Oktober 2016 15:16:32 UTC+2 schrieb Christoph Engel:
>
> Hi,
>
> I would like to get the document from my collection which contains the 
> maximum value "val" of all stored documents.
> I've created the following statement:
>
> FOR vertex in MyCollection
> filter has(vertex, "val")
> COLLECT AGGREGATE 
>     maxVal = max(vertex.val) INTO maxVertex
> RETURN 
> {
>     maxVal_1: maxVal,
>     maxVal_2: maxVertex[0].val
> }
>
> As far as I understand it, *maxVertex *should contain a set of documents 
> for each value of *maxVal*. Because there is only onevalue of *maxVal*, 
> there should only be one group containing the document with *maxVal* in 
> the *maxVertex*-group.
> In this case *maxVal_1* and *maxVal_2* should have the same value, but 
> both values are different.
>
> What did I understand wrong?
>
> best regards
> Christoph
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

[arangodb-google] Re: Querying for the document with a max value

Reply via email to