Hi,
one possible solution would be to add more precision to the timing events,
e.g. by adding milliseconds to each timestamp. This would make it less
likely that two events get the exact same timestamp, though it could still
happen in rare cases.
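For illustration, millisecond-precision timestamps could look like this
(the exact format is up to you; the values here are made up, and with a
fixed-width format the `time` strings also sort chronologically):
{"id":"917", "date":"2016-08-01", "time":"10:33:45.120", "location":"category/4"},
{"id":"917", "date":"2016-08-01", "time":"10:33:45.384", "location":"item/6"}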
It sounds like you've already considered this option and don't want to go
that way.
Another option is to use the `_key` attribute of each document and compare
these values. By default, the `_key` values generated by ArangoDB are
increasing numbers packaged into strings. If there is only one process that
inserts documents sequentially into the collection, then documents inserted
later will get "higher" `_key` values (technically they are not higher
because `_key` is a string, but the assumption holds once `_key` is
converted into a number).
For example,
FOR doc IN collection
  FILTER doc.time == '10:33:45'
  SORT TO_NUMBER(doc._key) ASC
  RETURN doc
would give you the documents for time `10:33:45` in insertion order.
A third alternative is to have the insertion process generate an increasing
value per document it inserts. This will work fine if only one process
inserts documents into the collection. The process can then compare the data
by time and assign an increasing sequence number to each document that
shares the same time value, e.g.
{"id":"917", "date":"2016-08-01", "time":"10:33:37",
"location":"home","seq":0},
{"id":"917", "date":"2016-08-01", "time":"10:33:39",
"location":"category/1","seq":0},
{"id":"917", "date":"2016-08-01", "time":"10:33:45",
"location":"category/4","seq":0},
{"id":"917", "date":"2016-08-01", "time":"10:33:45",
"location":"item/6","seq":1},
{"id":"917", "date":"2016-08-01", "time":"10:33:50",
"location":"home","seq":0}
You could then query events in order by sorting by time first and then by
seq.
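For example, a query along these lines (just a sketch; the filter on `id`
and `date` is only for illustration) would return that user's events for the
day in insertion order:
FOR doc IN collection
  FILTER doc.id == '917' AND doc.date == '2016-08-01'
  SORT doc.time ASC, doc.seq ASC
  RETURN doc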
I hope this helps.
Best regards
Jan
On Friday, August 19, 2016 at 00:30:37 UTC+2, Daniel wrote:
>
> How can I get the behaviour flow of a user from a collection of logs?
>
>
> Background:
>
> I'm using ArangoDB to hold an app's log data that is similar to web
> traffic (I hope that this is a good use case)
>
> I've got two things I'm doing from the Node.js app that processes it:
> 1) insert parsed data
> 2) pre-aggregate data to get event counts quickly
>
> In some cases I may need to get more advanced information such as the most
> common behaviour flow.
>
> Assuming I've got data such as this:
> [
> {"id":"917", "date":"2016-08-01", "time":"10:33:37", "location":"home"},
> {"id":"917", "date":"2016-08-01", "time":"10:33:39",
> "location":"category/1"},
> {"id":"917", "date":"2016-08-01", "time":"10:33:45",
> "location":"category/4"},
> {"id":"917", "date":"2016-08-01", "time":"10:33:45", "location":"item/6"},
> {"id":"917", "date":"2016-08-01", "time":"10:33:50", "location":"home"},
> etc...
> ]
>
> The problem I've found already is that even though I'm inserting them
> sequentially, once I add two lines with the same timestamp (no millisecond
> info) I can't tell which one came first.
>
> Is this something I could use the graph component for?
>