I am investigating the use of InfluxDB for storing statistics in our project. 
So far I love the features and ease of use of InfluxDB, but I am worried about 
the details around series cardinality in our use case.

Basically, we have to keep track of a number of statistics per user.

We expect two kinds of queries:
A. Queries by users limited to their own statistics
B. System-wide queries for our own benefit, *probably* with no conditions on 
user IDs.
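To make the two query types concrete, here is a minimal sketch that builds InfluxQL strings for each, assuming InfluxDB 1.x and the tag-based layout discussed below. The measurement name `stats`, the tag key `user_id` and the field names are hypothetical placeholders, not part of our actual schema.

```python
def user_query(user_id: str, field: str, window: str = "7d") -> str:
    # Type A: restricted to one user's series via the hypothetical user_id tag
    return (f'SELECT mean("{field}") FROM "stats" '
            f"WHERE \"user_id\" = '{user_id}' AND time > now() - {window}")


def system_query(field: str, window: str = "7d") -> str:
    # Type B: no condition on user_id, so it aggregates across all users' series
    return f'SELECT mean("{field}") FROM "stats" WHERE time > now() - {window}'


print(user_query("u42", "logins"))
print(system_query("logins"))
```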

The easiest approach would be to store the user-id as a tag value. However, we 
expect a regular influx (pun intended ;-)) of new users and deactivation of old 
users. Unless we can somehow clean up the existing data by removing old users, 
this would mean the series cardinality would always go up, eventually getting 
us into trouble.
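A minimal sketch of the tag-based approach, assuming InfluxDB 1.x line protocol and InfluxQL. The measurement `stats`, the tag key `user_id` and the field names are hypothetical. Each point carries the user ID as a tag; the `DROP SERIES` statement is one way old users' series could be cleaned up after deactivation.

```python
def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values
    # to be backslash-escaped
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def point_with_user_tag(measurement: str, user_id: str,
                        fields: dict, timestamp_ns: int) -> str:
    # Fields are written as floats, sorted by key for a stable output;
    # measurement and field keys are assumed to need no escaping
    field_str = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},user_id={escape_tag(user_id)} {field_str} {timestamp_ns}"


def drop_user_series(user_id: str) -> str:
    # InfluxQL statement that deletes every series tagged with this user ID
    escaped = user_id.replace("\\", "\\\\").replace("'", "\\'")
    return f"DROP SERIES WHERE \"user_id\" = '{escaped}'"


print(point_with_user_tag("stats", "u42", {"logins": 3}, 1500000000000000000))
print(drop_user_series("u42"))
```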

The alternative would be to store all data twice:
1. In per-user series for user queries (user-id in the series name)
2. In a system-wide series without user-id info for our own system wide queries
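The double-write approach can be sketched as emitting two line-protocol points per statistic. This assumes InfluxDB 1.x line protocol; the naming scheme `stats_<user-id>` for per-user measurements and `stats_all` for the system-wide one is hypothetical. Note that in InfluxDB a point with the same measurement, tag set and timestamp overwrites an earlier one, so the system-wide series is assumed to hold pre-aggregated values or distinct timestamps.

```python
def escape_measurement(name: str) -> str:
    # Line protocol requires commas and spaces in measurement names to be escaped
    return name.replace(",", r"\,").replace(" ", r"\ ")


def format_fields(fields: dict) -> str:
    # Float fields, sorted by key so the output is deterministic
    return ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))


def dual_write_lines(user_id: str, fields: dict, timestamp_ns: int) -> list:
    fs = format_fields(fields)
    # 1. Per-user series: the user ID is part of the (hypothetical) measurement name
    per_user = f"{escape_measurement('stats_' + user_id)} {fs} {timestamp_ns}"
    # 2. System-wide series: no user information at all
    system_wide = f"stats_all {fs} {timestamp_ns}"
    return [per_user, system_wide]


for line in dual_write_lines("u42", {"logins": 3}, 10):
    print(line)
```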

Not ideal, but it might be workable.

As always, the devil is in the details. We expect a maximum of about 10000 new 
users per year, with a maximum of about 50000 active users at any one time. The 
basic cardinality without user-id is about 100.

This means that in ten years the cardinality would grow to about 
(50000+10*10000)*100 = 15 million. This would put it in the category of 
"probably infeasible" in the general hardware guidelines for a single node.
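The arithmetic above as a small helper, using only the numbers from this post. The worst case assumes no series are ever deleted; the second function assumes (hypothetically) that deactivated users' series are always dropped, so only active users count.

```python
def projected_cardinality(active_users: int, new_users_per_year: int,
                          years: int, series_per_user: int) -> int:
    # Worst case: every user who was ever active keeps their series forever
    total_users = active_users + years * new_users_per_year
    return total_users * series_per_user


def bounded_cardinality(active_users: int, series_per_user: int) -> int:
    # If old users' series are cleaned up, only active users contribute
    return active_users * series_per_user


print(projected_cardinality(50_000, 10_000, 10, 100))
print(bounded_cardinality(50_000, 100))
```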

I suspect that the problem with such large cardinality is the memory required 
for the index. Is there any way to estimate what that memory requirement would 
be?

Would this high cardinality be less of an issue in a multi-node setup?

Are there any plans to mitigate the cardinality issues in such a use case?

Would the second approach (storing the data twice) actually help, or would it 
require the same amount of memory (or even more) than the straightforward 
approach?

I would very much appreciate any feedback on these issues, as at this point in 
the development of our project it is still relatively easy to pick an approach. 
A migration later on would be rather costly.

Regards,
Pieter.
