Re: last_write_wins
Also, when using last_write_wins = true, do I always need to send the vclock on a PUT request? The official documentation says that Riak will look only at the timestamp of the requests.

Best regards,

On 29 January 2014 10:29, Edgar Veiga edgarmve...@gmail.com wrote:

Hi Russell,

No, it doesn't depend. It's always a new value.

Best regards

On 29 January 2014 10:10, Russell Brown russell.br...@me.com wrote:

On 29 Jan 2014, at 09:57, Edgar Veiga edgarmve...@gmail.com wrote:

tl;dr If I guarantee that the same key is only written at 5 second intervals, is last_write_wins=true worth enabling?

It depends. Does the value you write depend in any way on the value you read, or are you always writing a totally new value that replaces what is in Riak (regardless of what is in Riak)?

On 27 January 2014 23:25, Edgar Veiga edgarmve...@gmail.com wrote:

Hi there everyone!

I would like to know if my current application is a good use case for setting last_write_wins to true.

Basically I have a cluster of node.js workers reading from and writing to Riak. Each node.js worker is responsible for a set of keys, so I can guarantee some kind of non-distributed cache...

The real deal here is that the write operation is not run every time an object is changed, but every 5 seconds, in a batch insertion/update style. This guarantees that the same object cannot be written to Riak at the same time, not even within the same second; there's always a 5 second window between each insertion/update.

That said, is it worthwhile for me to set last_write_wins to true? I've been facing some massive write delays under high load and it would be nice to have some way to tune Riak.

Thanks a lot and keep up the good work!

___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: last_write_wins
On 30 Jan 2014, at 10:37, Edgar Veiga edgarmve...@gmail.com wrote:

Also, when using last_write_wins = true, do I always need to send the vclock on a PUT request? The official documentation says that Riak will look only at the timestamp of the requests.

OK, from what you’ve said it sounds like you always want to replace what is at a key with the new information you are putting. If that is the case, then you have the perfect use case for LWW=true. And indeed, you do not need to pass a vclock with your put request. It also sounds like there is no need for you to fetch before put, since that is only done to get context and resolve siblings.

Curious about your use case if you can share more.

Cheers

Russell
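Russell's point about blind puts can be sketched in a few lines of Python. This is only an illustration, not anyone's production code: the helper names `set_lww_request` and `put_request`, the bucket name, and the local node address are made up for the example, and the functions only describe the HTTP requests (bucket properties and key URLs as exposed by Riak's HTTP interface) rather than sending them.

```python
import json

# Assumption: a local node listening on Riak's default HTTP port.
RIAK_URL = "http://127.0.0.1:8098"


def set_lww_request(bucket):
    """Describe the request that enables last_write_wins on one bucket."""
    return {
        "method": "PUT",
        "url": f"{RIAK_URL}/buckets/{bucket}/props",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"props": {"last_write_wins": True}}),
    }


def put_request(bucket, key, document, vclock=None):
    """Describe a PUT; the vclock header is attached only if one is given."""
    headers = {"Content-Type": "application/json"}
    if vclock is not None:
        # Only needed when the write depends on a previously fetched value.
        headers["X-Riak-Vclock"] = vclock
    return {
        "method": "PUT",
        "url": f"{RIAK_URL}/buckets/{bucket}/keys/{key}",
        "headers": headers,
        "body": json.dumps(document),
    }
```

With LWW=true and always-overwrite data, the second helper would simply be called without a vclock, so no fetch is needed beforehand.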
Re: last_write_wins
On 30 Jan 2014, at 10:58, Guido Medina guido.med...@temetra.com wrote:

Hi,

Now I'm curious too. According to http://docs.basho.com/riak/latest/ops/advanced/configs/configuration-files/ the default value for the Erlang property last_write_wins is false. Now, if 95% of the buckets/keys have no siblings (or conflict resolution), does that mean that last_write_wins is set to true for such buckets? I'm wondering what the effect is (if any) when allow_mult on a bucket is false.

In other words, could I assume that:

* If allow_mult is true, then last_write_wins will be ignored, because the vclock is needed for conflict resolution?
* If allow_mult is false, then last_write_wins is true?

They’re independent settings, but allow_mult=true + lww=true makes no sense (in reality, in the code, I’m pretty sure the lww=true will be applied).

allow_mult=false + lww=false means that at each vnode there is a read-before-write. Causally dominated values are dropped and sibling values are created, but before we write to disk (or return to the user on a get) we pick the sibling with the highest timestamp. This means that you get _one_ of the causally concurrent values, the one with the largest timestamp.

allow_mult=false + lww=true means that at the coordinating vnode we just increment whatever vclock the put has (probably none, right?) and write it to disk (no read of the local value first). Downstream at the replicas it is the same thing: just store it. I need to check, but I believe that on a get, if there are siblings, we just pick the one with the highest timestamp.

I really think that for Riak, 90% of the time, allow_mult=true is your best choice. John Daily did a truly exhaustive set of blog posts on this, http://basho.com/understanding-riaks-configurable-behaviors-part-1/, which I highly recommend. If your data is always overwritten, maybe LWW makes sense for you. If it is write once, read ever after, LWW is perfect.

Cheers

Russell

Correct me if I'm wrong. Again, we have a very similar scenario, where we create/modify keys and we are certain we have the latest version, so for us last_write_wins...

Regards,

Guido.
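The allow_mult=false + lww=false behaviour described above (drop causally dominated values, then keep the concurrent value with the largest timestamp) can be modelled with a toy sketch. This is not Riak's code: the `Version` type, the tuple-based vector clock, and the `resolve` function are simplifications invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    vclock: tuple     # toy vector clock: ((actor, counter), ...)
    timestamp: float  # last-modified time of this version


def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    seen = dict(a)
    return all(seen.get(actor, 0) >= count for actor, count in b)


def dominated(v, others):
    """True if some other version strictly descends `v`."""
    return any(
        o is not v
        and descends(o.vclock, v.vclock)
        and not descends(v.vclock, o.vclock)
        for o in others
    )


def resolve(versions):
    """Drop dominated versions, then pick the survivor with the newest timestamp."""
    survivors = [v for v in versions if not dominated(v, versions)]
    return max(survivors, key=lambda v: v.timestamp)
```

Note that a dominated value is dropped even if its timestamp happens to be the largest; the timestamp only breaks ties between causally concurrent values.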
Re: last_write_wins
Hi Russell,

Thanks for your response. I understand most of it. I know LWW=true and allow_mult=true don't make sense together, but look at this scenario: all of our buckets have allow_mult=false except for the one bucket we use for CRDT counters. Our application requires a certain level of consistency, so we have full control of our reads/writes using a fine-grained locking mechanism combined with an in-memory cache. So in our case, is LWW=true what we would want? We haven't touched this parameter, so it is at its default value.

I'm assuming it will improve performance in our case, but if we set LWW=true, will it affect the bucket(s) with allow_mult=true? Is it safe to assume that if allow_mult=true, LWW will be ignored? We only modify bucket properties using the Riak Java client 1.4.x at the moment.

Also, about safety: does LWW=true use the timestamp and LWW=false the vclock? What is the future of both? Should we leave it untouched? We don't really want to use something that could jeopardise our data consistency requirement, even if it means better performance.

Hopefully I'm enriching the subject and not hijacking it.

Thanks,

Guido.
Re: last_write_wins
I'll try to explain this the best I can. Although it's a simple architecture, I'm not describing it in my native language :)

I have a set of node.js workers (64 for now) that serve as a cache/middleware layer for a dozen PHP applications. Each worker deals with a set of documents (it's not a distributed cache system). Each worker updates the documents in memory and tags them as dirty (just like the OS file cache), and from time to time (for now, every 5 seconds) a persister module persists those dirty documents to Riak. If a document isn't in memory, it is fetched from Riak.

If you want document X, you need to ask the worker responsible for it. Two different workers never deal with the same document. That way we can guarantee that there will be no concurrent writes to Riak.

Best regards,
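The dirty-tagging scheme described above can be sketched as a small write-behind cache. This is a minimal illustration and not Edgar's actual node.js code: the class name `WriteBehindCache` and the `load`/`store` callables (standing in for a Riak fetch and a Riak put) are hypothetical, and the 5 second timer that would call `flush()` is left out.

```python
class WriteBehindCache:
    """In-memory documents with a dirty set that is persisted in batches."""

    def __init__(self, load, store):
        self._load = load      # hypothetical: fetches a document by key
        self._store = store    # hypothetical: persists (key, document)
        self._docs = {}
        self._dirty = set()

    def get(self, key):
        # Fall back to the backing store only on a cache miss.
        if key not in self._docs:
            self._docs[key] = self._load(key)
        return self._docs[key]

    def update(self, key, document):
        # Change the in-memory copy and remember that it needs persisting.
        self._docs[key] = document
        self._dirty.add(key)

    def flush(self):
        """Persist every dirty document once; return the keys written."""
        written = []
        for key in sorted(self._dirty):
            self._store(key, self._docs[key])
            # Clear the flag only after the store call returned.
            self._dirty.discard(key)
            written.append(key)
        return written
```

However many times a document changes between two flushes, only its latest state is written, which is what makes each put a plain overwrite.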
Re: last_write_wins
I'm not sure Riak is the best fit for this. Riak is great for applications where it is the source of data, and it has very strong consistency when used in this way. You are using it as a cache, where Riak will be significantly slower than other cache solutions, especially since you say that each worker has a set of documents it is responsible for. Something like a local memcache or redis would likely suit this use case just as well, but do it much faster with less overhead.

Riak will guarantee 3 writes to disk (by default), whereas something like memcache or redis will stay in memory and, if local, won't have network latency either. In the worst case, where a node goes offline, the real data can be pulled from the backend again, so it isn't a big deal. It will also simplify your application, because node.js can always request from the cache and not worry about the speed, instead of maintaining its own cache layer.

I'm as happy as the next person on this list to see Riak being used for all sorts of things, but I believe in the right tool for the right job. Unless there is something I don't understand, Riak is probably the wrong tool. It will work, but there is other software that will work much better.

I hope this helps,

Jason Campbell
Re: last_write_wins
Actually, people use Riak as a distributed cache all the time. In fact, many customers use it exclusively as a cache system.

Not all backends write to disk. Riak supports a main memory backend[1], complete with size limits and TTL.

Eric

[1]: http://docs.basho.com/riak/latest/ops/advanced/backends/memory/
Re: last_write_wins
Hi!

I think there is some confusion here... I'm not using Riak for caching purposes; it's exactly the opposite! Riak is my end persistence system. I need to store the documents in a strong, secure, available and consistent place. That's Riak.

As I've said before, just think of the Linux file cache as an analogy. The node.js workers simulate that in-memory cache, the PHP applications write to and read from them, and when something is dirty, it's persisted to Riak...

Best regards
Re: last_write_wins
For clarity, I was responding to Jason's assertion that Riak shouldn't be used as a cache, not to your specific issue, Edgar. Eric
On Jan 30, 2014, at 2:54 PM, Edgar Veiga edgarmve...@gmail.com wrote: Hi! I think there is some confusion here... I'm not using Riak for cache purposes; it's exactly the opposite! Riak is my end persistence system. I need to store the documents in a strong, secure, available and consistent place. That's Riak. As I've said before, just make an analogy with the Linux file cache system. Node.js workers simulate that in-memory cache, PHP applications write to and read from them, and when something is dirty, it's persisted to Riak... Best regards
On 30 January 2014 22:26, Eric Redmond eredm...@basho.com wrote: Actually people use Riak as a distributed cache all the time. In fact, many customers use it exclusively as a cache system. Not all backends write to disk. Riak supports a main memory backend[1], complete with size limits and TTL. Eric [1]: http://docs.basho.com/riak/latest/ops/advanced/backends/memory/
On Jan 30, 2014, at 1:48 PM, Jason Campbell xia...@xiaclo.net wrote: I'm not sure Riak is the best fit for this. Riak is great for applications where it is the source of data, and has very strong consistency when used in this way. You are using it as a cache, where Riak will be significantly slower than other cache solutions, especially since you say that each worker will have a set of documents it is responsible for. Something like a local memcache or redis would likely suit this use case just as well, but do it much faster with less overhead. Riak will guarantee 3 writes to disk (by default), where something like memcache or redis will stay in memory, and if local, won't have network latency either. In the worst case where a node goes offline, the real data can be pulled from the backend again, so it isn't a big deal. It will also simplify your application, because node.js can always request from cache and not worry about the speed, instead of maintaining its own cache layer. I'm as happy as the next person on this list to see Riak being used for all sorts of uses, but I believe in the right tool for the right job. Unless there is something I don't understand, Riak is probably the wrong tool. It will work, but there is other software that will work much better. I hope this helps, Jason Campbell
- Original Message - From: Edgar Veiga edgarmve...@gmail.com To: Russell Brown russell.br...@me.com Cc: riak-users riak-users@lists.basho.com Sent: Friday, 31 January, 2014 3:20:42 AM Subject: Re: last_write_wins I'll try to explain this the best I can; although it's a simple architecture, I'm not describing it in my native language :) I have a set of node.js workers (64 for now) that serve as a cache/middleware layer for a dozen PHP applications. Each worker deals with a set of documents (it's not a distributed cache system). Each worker updates the documents in memory and tags them as dirty (just like the OS file cache), and from time to time (for now, every 5 seconds) a persister module persists those dirty documents to Riak. If a document isn't in memory, it will be fetched from Riak. If you want document X, you need to ask the corresponding worker dealing with it. Two different workers never deal with the same document. That way we can guarantee that there will be no concurrent writes to Riak. Best Regards,
On 30 January 2014 10:46, Russell Brown russell.br...@me.com wrote: On 30 Jan 2014, at 10:37, Edgar Veiga edgarmve...@gmail.com wrote: Also, using last_write_wins = true, do I need to always send the vclock on a PUT request? The official documentation says that Riak will look only at the timestamp of the requests. Ok, from what you’ve said it sounds like you are always wanting to replace what is at a key with the new information you are putting. If that is the case, then you have the perfect use case for LWW=true. And indeed, you do not need to pass a vclock with your put request. And it sounds like there is no need for you to fetch-before-put, since that is only to get context / resolve siblings. Curious about your use case if you can share more. Cheers Russell Best regards,
On 29 January 2014 10:29, Edgar Veiga edgarmve...@gmail.com wrote: Hi Russel, No, it doesn't depend. It's always a new value. Best regards On 29 January 2014 10:10, Russell Brown russell.br...@me.com wrote: On 29 Jan 2014, at 09:57, Edgar Veiga edgarmve...@gmail.com wrote: tl;dr If I guarantee that the same key is only written with a 5 second interval, is last_write_wins=true profitable
Re: last_write_wins
Yes Eric, I understood :)
Re: last_write_wins
Here's a (bad) mockup of the solution: https://cloudup.com/cOMhcPry38U Hope that this time I've made myself a little more clear :) Regards
Re: last_write_wins
No problem Jason, I'm glad you've tried to help :) I'm using the LevelDB backend, and the system has been running in production for about 6 months. It's been quite an interesting experience, but now that the load is getting bigger, and the amount of data in Riak too, we need to start tuning these little things. Best regards!
On 30 January 2014 23:17, Jason Campbell xia...@xiaclo.net wrote: Oh, I completely misunderstood, I'm sorry for that. I was thinking of your application as a typical web application which could regenerate the data at any time (making that the authoritative source, not Riak). In that case, Riak does sound perfect, but I would definitely not use the memory backend if that is the only copy of the data. Eric, I'm sorry if I made it sound like Riak is a poor cache in all situations, I just didn't think it fit here (although I clearly misunderstood). There is a tradeoff between speed and consistency/reliability, and the whole application has to take advantage of the extra consistency and reliability for it to make sense. Sorry again, Jason Campbell
Re: last_write_wins
Replies inline. (Thanks to Russell for the link to my blog series, but honestly, as I now re-read the section on conflict resolution, I’m unhappy with it. It’s a very confusing topic and I regret not doing a better job of clarifying it. This answer will undoubtedly also be more confusing than I’d like.)
On Jan 30, 2014, at 9:53 AM, Guido Medina guido.med...@temetra.com wrote: All of our buckets have allow_mult=false except for the one bucket we have for CRDT counters. Our application requires a certain level of consistency, so we have full control of our reads/writes using a fine-grained locking mechanism combined with an in-memory cache. So in our case, is LWW=true what we would want? We haven't touched this parameter, so it is at its default value.
It’s a bit confusing to refer to “LWW” because the “last write wins” strategy is often referred to as LWW, and separately we have a last_write_wins configuration parameter, and they’re not the same thing. I’m going to stick to last_write_wins to be explicit when I’m referring to the parameter, and “last write wins” when referring to the strategy. (Informally I often refer to LWW as the strategy and lww as the parameter, but I’ll spare you that casual pedantry here.) The “last write wins” strategy comes into play whenever allow_mult is set to false, regardless of the value of last_write_wins. Setting last_write_wins=true when allow_mult=false will optimize Bitcask[1] “put” requests to not bother reading any existing value to compare vector clocks, but if servers are offline or there are network partitions during the put operation, then when read repair or active anti-entropy is invoked later, the vector clock (including server timestamp) will be used to guess[2] which version of the object is the “last.”
If you can truly guarantee serialization at the application layer and you can guarantee that no two updates to a single value will occur within the worst-case clock skew across your cluster, then the “last write wins” strategy is reasonable. If you have any doubt about either and data safety is important, you really should set allow_mult=true and deal with siblings. Unfortunately, worst-case clock skew across a cluster can be pretty bad. NTP will typically keep it under control, but it’s all too easy for both NTP and your monitoring of NTP to be broken.
I'm assuming it will improve performance for our case, but if we set LWW=true, will it affect the bucket(s) with allow_mult=true? Is it safe to assume that if allow_mult=true, LWW will be ignored? We only modify bucket properties using Riak Java client 1.4.x at the moment.
No, it’s definitely not safe. If you set your cluster default to last_write_wins=true, you should explicitly set your allow_mult=true buckets to last_write_wins=false using the Java client. As Russell indicated, the behavior if both allow_mult and last_write_wins are set to true is undefined and not guaranteed at all to be what you want, regardless of the current state of the code.
Also, about safety: does LWW=true use timestamps and LWW=false use vclocks? What is the future of both? Should we leave it untouched? We don't really want to use something that could jeopardise our data consistency requirement, even if it means better performance.
Vector clocks (with embedded server timestamps) are used to help Riak decide what to do about data inconsistencies regardless of the configuration settings. Riak generates (or updates) vector clocks with each put. -John
[1] Why does last_write_wins=true only really impact Bitcask writes? If the backend supports 2i (Memory or LevelDB currently) then we have to read the old value from disk to determine whether any indexes need to be updated when that value is replaced.
[2] You could use the word “determine” here, but given the inherent unreliability of server clocks, it’s just as accurate to say “guess.”
___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
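The rule in the reply above (never leave allow_mult and last_write_wins both true on the same bucket) can be sketched as a small guard. This is only a minimal Python sketch, not the Java client mentioned in the thread. It assumes Riak's HTTP bucket-properties endpoint (PUT /buckets/<bucket>/props with a JSON "props" object); the helper name bucket_props_request, the base URL and the bucket name are hypothetical. It builds the request and never sends it.

```python
import json
from urllib.request import Request


def bucket_props_request(base_url, bucket, allow_mult, last_write_wins):
    """Build (but do not send) a PUT request that sets two bucket properties.

    Hypothetical helper for illustration; the endpoint shape is an
    assumption based on Riak's HTTP API.
    """
    if allow_mult and last_write_wins:
        # The thread describes this combination as undefined, so refuse it.
        raise ValueError("allow_mult and last_write_wins must not both be true")
    body = json.dumps(
        {"props": {"allow_mult": allow_mult, "last_write_wins": last_write_wins}}
    )
    return Request(
        url=f"{base_url}/buckets/{bucket}/props",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# A sibling-enabled bucket (for example, the counters bucket) is pinned to
# last_write_wins=false explicitly, whatever the cluster default is.
req = bucket_props_request("http://127.0.0.1:8098", "counters", True, False)
```

Passing the request to urllib.request.urlopen would apply it; that step is left out here so the sketch has no side effects.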
Re: last_write_wins
tl;dr If I guarantee that the same key is only written with a 5 second interval, is last_write_wins=true profitable?
On 27 January 2014 23:25, Edgar Veiga edgarmve...@gmail.com wrote: Hi there everyone! I would like to know if my current application is a good use case for setting last_write_wins to true. Basically I have a cluster of node.js workers reading from and writing to Riak. Each node.js worker is responsible for a set of keys, so I can guarantee some kind of non-distributed cache... The real deal here is that the write operation is not run every time an object is changed, but every 5 seconds in a batch insertion/update style. This guarantees that the same object cannot be written to Riak at the same time, not even in the same second; there's always a 5-second window between each insertion/update. That said, is it profitable for me to set last_write_wins to true? I've been facing some massive write delays under high load, and it would be nice to have some way to tune Riak. Thanks a lot and keep up the good work!
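Whether a 5-second gap is enough depends on how far apart the clocks that stamp the writes can drift. The following is a toy Python model of timestamp-only resolution, not Riak's actual code; the names Write, resolve_by_timestamp and interval_is_safe are hypothetical. It shows that the newer write wins only while the clock error stays below the write interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    true_time: float     # when the write really happened, in seconds
    clock_offset: float  # error of the clock that stamped it, in seconds

    @property
    def timestamp(self) -> float:
        # The only time the resolver ever sees.
        return self.true_time + self.clock_offset


def resolve_by_timestamp(a: Write, b: Write) -> Write:
    """Keep whichever write carries the larger stamped timestamp."""
    return a if a.timestamp >= b.timestamp else b


def interval_is_safe(write_interval: float, worst_case_skew: float) -> bool:
    """Timestamps order writes correctly only if the gap exceeds the skew."""
    return write_interval > worst_case_skew


# Writes 5 seconds apart, both clocks correct: the newer value is kept.
good = resolve_by_timestamp(Write("v1", 0.0, 0.0), Write("v2", 5.0, 0.0))

# Same writes, but the first was stamped by a clock running 6 seconds fast:
# the older value now looks newer and is kept instead.
bad = resolve_by_timestamp(Write("v1", 0.0, 6.0), Write("v2", 5.0, 0.0))
```

In this model good.value is "v2" and bad.value is "v1", which is why the skew between servers matters more than the 5-second window itself.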
Re: last_write_wins
On 29 Jan 2014, at 09:57, Edgar Veiga edgarmve...@gmail.com wrote: tl;dr If I guarantee that the same key is only written with a 5 second interval, is last_write_wins=true profitable?
It depends. Does the value you write depend in any way on the value you read, or are you always writing a totally new value that replaces what is in Riak (regardless of what is in Riak)?
Re: last_write_wins
Hi Russel, No, it doesn't depend. It's always a new value. Best regards