Re: High number of Riak buckets
Hiya Alexander,

Thanks much indeed for the detailed note... very interesting insights.

As you deduced, I omitted some pieces from my email for the sake of simplicity. I'm using a transient / stateless chat server (ejabberd), wherein messages get delivered on live sessions / streams without the client having to do look-ups. So the storage in Riak is a post-facto delivery / archival step rather than something that happens before the client receives the messages. Hence determining the time key for the look-up isn't going to be an issue unless I run some analytics where I query all keys (which would be an issue, as I now understand from your comments).

There is of course the question of offline messages, whose delivery would depend on look-ups, but there ejabberd uses the username (the offline storage also uses a secondary index on leveldb), so the timestamp is not important. Riak TS sure looks promising there, but I'll check further whether the change would be justified for offline messages alone, or in case other use cases crop up.

Makes sense that listing all keys in a bucket is expensive, though - let me see how I can model my data to avoid that!

Thanks again for your inputs... very informative.

Cheers.
Vikram

On Fri, Sep 30, 2016 at 12:23 PM, Alexander Sicular wrote:
> Hi Vikram,
>
> Bucket maximums aside, why are you modeling in this fashion? How will you retrieve individual keys if you don't know the time stamp in advance? Do you have a lookup somewhere else? Doable as lookup keys or CRDTs or other systems. Are you relying on listing all keys in a bucket? Definitely don't do that.
>
> Yes, there is a better way. Use Riak TS. Create a table with a composite primary key of topic and time. You can then retrieve by topic equality and time range. You can then cache those results in deterministic keys as necessary.
>
> If you don't already know, Riak TS is basically (there are some notable differences) Riak KV plus the time series data model. Riak TS makes all sorts of time series oriented projects easier than modeling them against KV. Oh, and you can also leverage KV buckets alongside TS (resource limitations notwithstanding).
>
> Would love to hear more,
> Alexander
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: High number of Riak buckets
Hi Luke - many thanks. Actually, I was planning to give different bucket types different n_val settings, or I might end up doing so. The thinking is that I intend to start my production workloads with fewer replicas, but as the system matures / stabilizes (and the userbase grows!), I would want to increase n_val.

In the testing I did a few weeks ago, each time I tried to increase the n_val of an existing bucket I got conflicting results (prior question here: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2016-July/018631.html) - perhaps due to read-repair taking time - not sure. I understood from various Riak papers that decreasing n_val should not be done, but I couldn't yet conclude why increasing it would be an issue.

So to avoid that scenario, I've been thinking that as the system criticality increases, I would create a new bucket (with a higher n_val) and then start pushing newer conversations on to that bucket. Still not sure how this would behave, but let me test further with bucket types as you suggest.

Do let me know please if there's something glaring I'm missing, as I am trying to clarify the thought process to myself as well!

Cheers.

On Fri, Sep 30, 2016 at 12:07 PM, Luke Bakken wrote:
> Hi Vikram,
>
> If all of your buckets use the same bucket type with your custom n_val, there won't be a performance issue. Just be sure to set n_val on the bucket type, and that all buckets are part of that bucket type.
>
> http://docs.basho.com/riak/kv/2.1.4/developing/usage/bucket-types/
>
> --
> Luke Bakken
> Engineer
> lbak...@basho.com
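The plan described in the reply above (start on a bucket type with a lower n_val, then send newer conversations to a type with a higher n_val) might be sketched as a deterministic routing rule, so that reads can work out where a conversation lives without a separate lookup. This is only an illustration under assumptions: the type names `messages_n2` and `messages_n3`, the cutover dates, and the function name are hypothetical and do not come from the thread.

```python
from datetime import datetime, timezone
from typing import List, Tuple

# (effective-from, bucket type) pairs in ascending date order.
# Both the dates and the type names are hypothetical.
TYPE_SCHEDULE: List[Tuple[datetime, str]] = [
    (datetime(2016, 1, 1, tzinfo=timezone.utc), "messages_n2"),
    (datetime(2017, 1, 1, tzinfo=timezone.utc), "messages_n3"),
]


def bucket_type_for(conversation_started: datetime) -> str:
    """Pick the bucket type that was current when the conversation began."""
    chosen = None
    for effective_from, type_name in TYPE_SCHEDULE:
        if conversation_started >= effective_from:
            chosen = type_name  # keep the latest entry that still applies
    if chosen is None:
        raise ValueError("conversation predates the first bucket type")
    return chosen
```

Because the choice depends only on the conversation's start time, a conversation never moves between types, which sidesteps changing n_val on data that already exists.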
Re: High number of Riak buckets
Hi Vikram,

Bucket maximums aside, why are you modeling in this fashion? How will you retrieve individual keys if you don't know the time stamp in advance? Do you have a lookup somewhere else? Doable as lookup keys or CRDTs or other systems. Are you relying on listing all keys in a bucket? Definitely don't do that.

Yes, there is a better way. Use Riak TS. Create a table with a composite primary key of topic and time. You can then retrieve by topic equality and time range. You can then cache those results in deterministic keys as necessary.

If you don't already know, Riak TS is basically (there are some notable differences) Riak KV plus the time series data model. Riak TS makes all sorts of time series oriented projects easier than modeling them against KV. Oh, and you can also leverage KV buckets alongside TS (resource limitations notwithstanding).

Would love to hear more,
Alexander

@siculars
http://siculars.posthaven.com

Sent from my iRotaryPhone

> On Sep 29, 2016, at 19:42, Vikram Lalit wrote:
>
> Hi - I am creating a messaging platform wherein I am modeling each topic to serve as a separate bucket. That means there can potentially be millions of buckets, with each message from a user becoming a value on a distinct timestamp key.
>
> My question: is there any downside to modeling my data in such a manner? Or can folks advise a better way of storing the same in Riak?
>
> Secondly, I would like to modify the default bucket properties (n_val) - I understand that such 'custom' buckets have a higher performance overhead due to the extra load on the gossip protocol. Is there a way the default n_val of newly created buckets can be changed so that even if I have the above-mentioned high number of buckets, there is no performance degradation? I believe there was such a config allowed in app.config, but I'm not sure that file is used any more after riak.conf was introduced.
>
> Thanks much.
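The Riak TS suggestion above (a table with a composite primary key of topic and time, queried by topic equality and a time range) could look roughly like the sketch below. It is not from the thread: the table name `messages`, the `sender` and `body` columns, the 15-minute quantum, and the helper names are assumptions, and the DDL follows the Riak TS `CREATE TABLE` / `QUANTUM` syntax as I understand it from the Riak TS documentation. The code only builds the statements as strings; running them would need a Riak TS client or riak-shell.

```python
from datetime import datetime, timezone

# Hypothetical table: the name, the sender/body columns and the
# 15-minute quantum are illustrative assumptions.
CREATE_TABLE = """
CREATE TABLE messages (
    topic   VARCHAR   NOT NULL,
    time    TIMESTAMP NOT NULL,
    sender  VARCHAR   NOT NULL,
    body    VARCHAR,
    PRIMARY KEY ((topic, QUANTUM(time, 15, 'm')), topic, time)
)
""".strip()


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return int(moment.timestamp() * 1000)


def topic_range_query(topic: str, start: datetime, end: datetime) -> str:
    """Build a SELECT using topic equality and a bounded time range."""
    if "'" in topic:
        # Quote escaping is deliberately not attempted in this sketch.
        raise ValueError("topic must not contain single quotes")
    if end <= start:
        raise ValueError("end must be after start")
    return (
        "SELECT * FROM messages "
        f"WHERE topic = '{topic}' "
        f"AND time >= {to_epoch_ms(start)} AND time < {to_epoch_ms(end)}"
    )
```

Putting `topic` in the partition key keeps one topic's messages for a given time quantum together, which is what makes the equality-plus-range query cheap compared with listing keys in a KV bucket.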
Re: High number of Riak buckets
Hi Vikram,

If all of your buckets use the same bucket type with your custom n_val, there won't be a performance issue. Just be sure to set n_val on the bucket type, and that all buckets are part of that bucket type.

http://docs.basho.com/riak/kv/2.1.4/developing/usage/bucket-types/

--
Luke Bakken
Engineer
lbak...@basho.com

On Thu, Sep 29, 2016 at 4:42 PM, Vikram Lalit wrote:
> Hi - I am creating a messaging platform wherein I am modeling each topic to serve as a separate bucket. That means there can potentially be millions of buckets, with each message from a user becoming a value on a distinct timestamp key.
>
> My question: is there any downside to modeling my data in such a manner? Or can folks advise a better way of storing the same in Riak?
>
> Secondly, I would like to modify the default bucket properties (n_val) - I understand that such 'custom' buckets have a higher performance overhead due to the extra load on the gossip protocol. Is there a way the default n_val of newly created buckets can be changed so that even if I have the above-mentioned high number of buckets, there is no performance degradation? I believe there was such a config allowed in app.config, but I'm not sure that file is used any more after riak.conf was introduced.
>
> Thanks much.
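The advice above (set n_val once on a bucket type, then keep every topic bucket under that type) might be sketched as follows. The type name `messages_n2`, the n_val of 2, the host, and the helper names are assumptions for illustration. The `riak-admin bucket-type create` / `activate` commands and the `/types/<type>/buckets/<bucket>/keys/<key>` HTTP path are the forms I know from the Riak KV 2.x documentation linked in the message. The code only builds the command lines and the URL; it does not talk to a cluster.

```python
import json
import shlex
from typing import List
from urllib.parse import quote


def bucket_type_commands(type_name: str, n_val: int) -> List[str]:
    """Return the shell commands that create and activate a bucket type."""
    props = json.dumps({"props": {"n_val": n_val}}, separators=(",", ":"))
    return [
        f"riak-admin bucket-type create {type_name} {shlex.quote(props)}",
        f"riak-admin bucket-type activate {type_name}",
    ]


def object_url(host: str, type_name: str, topic: str, key: str) -> str:
    """Build the HTTP URL of an object in a typed bucket (one bucket per topic)."""
    parts = [quote(part, safe="") for part in (type_name, topic, key)]
    return "http://{}/types/{}/buckets/{}/keys/{}".format(host, *parts)
```

Since the buckets themselves carry no custom properties, adding another topic bucket under the same type adds no extra bucket metadata to gossip around the cluster.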
High number of Riak buckets
Hi - I am creating a messaging platform wherein I am modeling each topic to serve as a separate bucket. That means there can potentially be millions of buckets, with each message from a user becoming a value on a distinct timestamp key.

My question: is there any downside to modeling my data in such a manner? Or can folks advise a better way of storing the same in Riak?

Secondly, I would like to modify the default bucket properties (n_val) - I understand that such 'custom' buckets have a higher performance overhead due to the extra load on the gossip protocol. Is there a way the default n_val of newly created buckets can be changed so that even if I have the above-mentioned high number of buckets, there is no performance degradation? I believe there was such a config allowed in app.config, but I'm not sure that file is used any more after riak.conf was introduced.

Thanks much.