[orientdb] Re: Time series, best practise for queries between two points in time?

Curtis Stanford Thu, 17 Sep 2015 15:42:07 -0700

For your first solution, I don't see the point in using the time series 
structure at all. You're just using an indexed timestamp field and next 
pointers to traverse the range. The solution I outlined doesn't need any 
index or next links. Finding the first date is very fast and, no matter how 
big the range, I get data coming back immediately. I don't know for sure, 
but I imagine those next links would be a bit of a maintenance hassle.
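
To make the comparison concrete, here is a minimal sketch of the index-plus-next-link approach from the quoted message below. It is plain in-memory Python, not OrientDB code: the `Doc` class, `build_chain`, `range_query` and the sorted list standing in for the timestamp index are all hypothetical names chosen for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Doc:
    ts: int                       # timestamp, e.g. epoch milliseconds
    next: Optional["Doc"] = None  # link to the next document in time order


def build_chain(timestamps: List[int]) -> List[Doc]:
    """Create documents sorted by timestamp and wire up the 'next' links."""
    docs = [Doc(ts) for ts in sorted(timestamps)]
    for cur, nxt in zip(docs, docs[1:]):
        cur.next = nxt
    return docs


def range_query(docs: List[Doc], index: List[int], t1: int, t2: int) -> Iterator[Doc]:
    """Yield documents with t1 <= ts <= t2: one index lookup, then link hops."""
    i = bisect_left(index, t1)                 # first document with ts >= t1
    node = docs[i] if i < len(docs) else None
    while node is not None and node.ts <= t2:  # follow 'next' until past t2
        yield node
        node = node.next


docs = build_chain([50, 10, 40, 20, 30])
index = [d.ts for d in docs]                   # stands in for the timestamp index
result = [d.ts for d in range_query(docs, index, 15, 40)]
```

Note that the sketch only appends in sorted order once. Inserting or deleting a document in the middle would mean rewiring the neighbouring `next` link as well as updating the index, which is the maintenance cost I am guessing at above.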


Your "epoch day" idea seems to be a coarser version of the original time 
series structure. The time series tree (with year, month, day, hour, etc.) 
is basically the ultimate in partitioning and sub-partitioning. Once you 
figure out the general querying algorithm, the data structures themselves 
are simple and fairly easy to maintain and you can do any kind of 
aggregation or query for any time range at any granularity.
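
As a rough illustration of what a range query over such a tree can look like, the sketch below uses nested Python dicts keyed by year, month, day and hour, with the records stored in lists at the hour level. This is an assumption about the general shape only, not the exact algorithm from my earlier message and not OrientDB API; `key_path`, `insert` and `query` are hypothetical helper names.

```python
from datetime import datetime
from typing import Any, Dict, Iterator, Tuple

LEVELS = ("year", "month", "day", "hour")


def key_path(ts: datetime) -> Tuple[int, ...]:
    """Path of keys from the root of the tree down to the hour bucket."""
    return (ts.year, ts.month, ts.day, ts.hour)


def insert(tree: Dict[int, Any], ts: datetime, value: Any) -> None:
    """Create missing year/month/day nodes and append to the hour bucket."""
    node = tree
    path = key_path(ts)
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node.setdefault(path[-1], []).append((ts, value))


def query(tree: Dict[int, Any], start: datetime, end: datetime) -> Iterator[Tuple[datetime, Any]]:
    """Yield (timestamp, value) pairs with start <= timestamp <= end, in time order."""
    lo, hi = key_path(start), key_path(end)

    def walk(node, depth, on_lo, on_hi):
        # on_lo / on_hi: are we still on the boundary path of start / end?
        for key in sorted(node):
            if on_lo and key < lo[depth]:
                continue              # branch lies entirely before the range
            if on_hi and key > hi[depth]:
                break                 # branch lies entirely after the range
            child = node[key]
            if depth == len(LEVELS) - 1:
                for ts, value in sorted(child, key=lambda rec: rec[0]):
                    if start <= ts <= end:
                        yield ts, value
            else:
                yield from walk(child, depth + 1,
                                on_lo and key == lo[depth],
                                on_hi and key == hi[depth])

    yield from walk(tree, 0, True, True)


tree: Dict[int, Any] = {}
insert(tree, datetime(2014, 12, 31, 23, 30), "a")
insert(tree, datetime(2015, 1, 1, 0, 15), "b")
insert(tree, datetime(2015, 3, 5, 12, 0), "c")
insert(tree, datetime(2015, 9, 17, 15, 42), "d")
insert(tree, datetime(2015, 9, 17, 16, 10), "e")
hits = [v for _, v in query(tree, datetime(2014, 12, 31, 23, 45),
                            datetime(2015, 9, 17, 15, 50))]
```

Branches wholly outside the range are skipped without being opened, and results start coming back as soon as the first matching hour bucket is reached, regardless of how wide the range is.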

Of course, this day or month partitioning is mandatory for someone using a 
key-value store or something like Cassandra with a partition key but we can 
do better, no?
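
For reference, the kind of day partitioning I mean can be sketched as follows, assuming UTC timestamps and one bucket per day since the Unix epoch. The function names `epoch_day` and `day_buckets` are made up for illustration.

```python
from datetime import datetime, timezone
from typing import List

SECONDS_PER_DAY = 86_400


def epoch_day(ts: datetime) -> int:
    """Number of whole days between 1970-01-01 (UTC) and ts."""
    return int(ts.timestamp() // SECONDS_PER_DAY)


def day_buckets(start: datetime, end: datetime) -> List[int]:
    """Every day partition that a query over [start, end] has to touch."""
    return list(range(epoch_day(start), epoch_day(end) + 1))


start = datetime(2015, 9, 17, 15, 42, tzinfo=timezone.utc)
end = datetime(2015, 9, 19, 1, 0, tzinfo=timezone.utc)
buckets = day_buckets(start, end)
```

A range query then reads each bucket in turn and filters the first and last one by exact timestamp, so the cost grows with the number of days spanned even when some of those days hold no data.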

On Thursday, September 17, 2015 at 12:33:59 PM UTC-6, Timo Pulkkinen wrote:
>
> Thanks Curtis,
>
> we were thinking something similar, but then thought that we could 
> simplify the search algorithm with the following additions (using documents 
> because we are currently using the Document API):
>
> "find all documents between timestamp_1 and timestamp_2"
>
> 1. Add a link between adjacent documents ( i.e. add a property 'next' to a 
> document => points to the next document available in time series) 
> (2. Keep track of the timestamp list tail somewhere, so we can always 
> quickly determine the latest data point)
> 3. Create an index of the timestamps in the documents
> 4. Find the closest document to timestamp_1 using the index (that is >= 
> timestamp_1)
> 5. Traverse the links until we reach the closest document matching 
> timestamp_2 (that is <= timestamp_2)
>
> Here the obvious problem is the potential size of the index, but on the 
> other hand it is only accessed to find the starting node. 
> And of course in this case the actual time series tree would only be used 
> for possible aggregate calculations.
>
> The other idea was to create an additional hierarchy, where the documents 
> are linked to an "epoch day" vertex, that represents days from Unix epoch 
> starting date (one instance per day since 01.01.1970). Then we could 
> partition the search by first calculating the "epoch day" of timestamp_1 
> and search the best matching timestamp linked to it. This way we would have 
> a continuous "epoch day" variable instead of repetitive and irregular time 
> units and for example queries spanning multiple years would be easier to 
> do. Of course for aggregates the usual "time tree" would be available 
> besides this. 
>
> What do you think?
>
> Timo
>
>
>
