Hi,
> I could do this in an app, but is there a way to do this using an AQL
> query?
Do you mean that you want to generate the session id and home event in a
selection query, when reading the data, or will these attributes already be
present in the document data?
Thanks for clarifying.
Best regards
Jan
Am Freitag, 19. August 2016 19:48:29 UTC+2 schrieb Daniel:
>
> So because I do not have the ability to get the millisecond data, the
> first option is not really possible.
>
> I did use sequential keys for entering the data, so that should work well
>
> I'm using this to get them sorted
>
>
> FOR doc IN `stats_log`
>   FILTER doc.`t_id` IN ["266", "267", "221", "220"]
>   SORT doc.`terminal_id`, doc.`date` ASC, doc.`time` ASC, doc.`_key` ASC
>   RETURN doc
>
> I'm thinking that for processing them I would add an incrementing session
> id based on the home event: whenever a 'home' is found, I would
> generate a new session id and use it until the next 'home' is found.
> I would also add a step counter and increment it while the session stays the same.
>
> I could do this in an app, but is there a way to do this using an AQL
> query?
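In an app, the session/step assignment described above could look like the following. This is a minimal JavaScript sketch, not the poster's actual code; it assumes the documents arrive already sorted by terminal, date, time, and key, as in the query above:

```javascript
// Assign an incrementing session id and step to a sorted list of events:
// every "home" event opens a new session; step counts events within a session.
// Field names follow the sample documents in this thread.
function assignSessions(events) {
  let session = 0;
  let step = 0;
  return events.map((e) => {
    if (e.location === "home") {
      session += 1; // a new session begins at each "home"
      step = 0;     // restart the step counter
    } else {
      step += 1;
    }
    return { ...e, session, step };
  });
}

const events = [
  { id: "917", time: "10:33:37", location: "home" },
  { id: "917", time: "10:33:39", location: "category/1" },
  { id: "917", time: "10:33:45", location: "category/4" },
  { id: "917", time: "10:33:45", location: "item/6" },
  { id: "917", time: "10:33:50", location: "home" },
];
// The second "home" opens session 2 with step reset to 0.
const annotated = assignSessions(events);
```

Doing this in the app (rather than in AQL) keeps the logic simple, since the session boundary depends on scanning events in order.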
>
>
> On Friday, August 19, 2016 at 2:47:48 AM UTC-6, Jan wrote:
>>
>> Hi,
>>
>> one possible solution would be to add more precision to the timing
>> events, e.g. by adding milliseconds to each timestamp. This would make
>> it much less likely that two events have the exact same timestamp, though
>> it could still happen in rare cases.
>> It sounds like you've already considered this option and don't want to go
>> that way.
>>
>> Another option is to use the `_key` attribute of each document and
>> compare them. By default, the `_key` values generated by ArangoDB are
>> increasing numbers packaged into strings. If there is only one process that
>> inserts documents sequentially into the collection, then documents inserted
>> later will get "higher" values in `_key` (technically they are not higher
>> because `_key` is a string, but the "higher" assumption holds when `_key`
>> is converted into a number).
>> For example,
>>
>> FOR doc IN collection
>>   FILTER doc.time == '10:33:45'
>>   SORT TO_NUMBER(doc._key) ASC
>>   RETURN doc
>>
>> would give you the documents for time `10:33:45` in insertion order.
>>
>> A third alternative is to have the insertion process generate an increasing
>> value per document it inserts. This will work fine if only one process
>> inserts documents into the collection. The process could simply
>> compare the data by time, and use an increasing number for each document
>> with the same time value, e.g.
>>
>> {"id":"917", "date":"2016-08-01", "time":"10:33:37",
>> "location":"home","seq":0},
>> {"id":"917", "date":"2016-08-01", "time":"10:33:39",
>> "location":"category/1","seq":0},
>> {"id":"917", "date":"2016-08-01", "time":"10:33:45",
>> "location":"category/4","seq":0},
>> {"id":"917", "date":"2016-08-01", "time":"10:33:45",
>> "location":"item/6","seq":1},
>> {"id":"917", "date":"2016-08-01", "time":"10:33:50",
>> "location":"home","seq":0}
>>
>> You could then query events by sorting by time first and then seq.
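The `seq` values in the example above could be produced at insert time, for instance like this. This is a hypothetical JavaScript sketch, assuming a single process inserts the documents in order:

```javascript
// Generate a per-timestamp "seq" counter while preparing documents for
// insertion: documents sharing the same date+time get increasing seq values,
// and the counter restarts at 0 whenever the timestamp changes.
// Sketch only; assumes one sequential inserting process.
function withSeq(docs) {
  let lastKey = null;
  let seq = 0;
  return docs.map((d) => {
    const key = `${d.date} ${d.time}`;
    seq = key === lastKey ? seq + 1 : 0; // restart at 0 for a new timestamp
    lastKey = key;
    return { ...d, seq };
  });
}

const docs = [
  { id: "917", date: "2016-08-01", time: "10:33:45", location: "category/4" },
  { id: "917", date: "2016-08-01", time: "10:33:45", location: "item/6" },
  { id: "917", date: "2016-08-01", time: "10:33:50", location: "home" },
];
console.log(withSeq(docs).map((d) => d.seq)); // prints [ 0, 1, 0 ]
```

The annotated documents can then be inserted as-is, and a `SORT doc.time ASC, doc.seq ASC` in AQL recovers the original order.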
>> I hope this helps.
>>
>> Best regards
>> Jan
>>
>> Am Freitag, 19. August 2016 00:30:37 UTC+2 schrieb Daniel:
>>>
>>> How can I get the behaviour flow of a user using a collection of logs?
>>>
>>>
>>> Background:
>>>
>>> I'm using ArangoDB to hold an app's log data that is similar to web
>>> traffic (I hope that this is a good use case)
>>>
>>> I've got two things I'm doing from the nodejs app that processes it
>>> 1) insert parsed data
>>> 2) pre-aggregate data to get event counts quickly
>>>
>>> In some cases I may need to get more advanced information such as the
>>> most common behaviour flow.
>>>
>>> Assuming I've got data such as this:
>>> [
>>> {"id":"917", "date":"2016-08-01", "time":"10:33:37", "location":"home"},
>>> {"id":"917", "date":"2016-08-01", "time":"10:33:39",
>>> "location":"category/1"},
>>> {"id":"917", "date":"2016-08-01", "time":"10:33:45",
>>> "location":"category/4"},
>>> {"id":"917", "date":"2016-08-01", "time":"10:33:45",
>>> "location":"item/6"},
>>> {"id":"917", "date":"2016-08-01", "time":"10:33:50", "location":"home"},
>>> etc...
>>> ]
>>>
>>> The problem I've found is that even though I'm inserting them
>>> sequentially, once I add two lines with the same timestamp (there is no
>>> millisecond info) I can't tell which one came first.
>>>
>>> Is this something I could use the graph component for?
>>>
>>
--
You received this message because you are subscribed to the Google Groups
"ArangoDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.