Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
I'm planning the migration of our app to HRD. It is a collective buying site, and I have found lots of places where I need to change my models/queries. One case where we need consistency is this scenario:

    class Product { @Id Long productId; }
    class Order { @Id Long orderId; List<Long> productId; }
    class Voucher { @Id Long voucherId; Long orderId; Long productId; }

Vouchers must be created before orders, so they are currently root entities. When an order is approved, I have a specialized queue with max_concurrent_requests = 1 that picks the next available voucher (one which has orderId = null) and associates it with the order. To check whether the order is filled with all its vouchers, I count how many Vouchers are linked to that orderId, and if any Vouchers are missing, I enqueue another task to consume a Voucher again.

On HRD, this doesn't work, because the query to get the next Voucher and the query to count the Vouchers I have for an Order are likely to be inconsistent. What I'm planning is to group Vouchers that belong to the same product (~30k vouchers per product) and then perform an ancestor query, as suggested by the docs. In this case, I'll end up with:

    class Voucher { @Parent Key<Product> productId; @Id Long voucherId; Long orderId; }

... and will be able to query for how many Vouchers are linked to an order (one query for each of the items in Order.productId). Is this a good pattern for this particular scenario? Writes/second is not a problem for us: if the order waits a few minutes until all its Vouchers are filled, that is OK.

Another issue: I have to keep a financial accounting registry, and currently I have this entity:

    class AccountingRegistry { @Parent Key<AccountRegistry> parentRegistry; @Id Long id; Date date; Long ammount; List<String> filters; }

To represent an accounting transaction, I group in the same entity group all registry entries that are related and that sum to 0. To avoid performing the same transaction twice (i.e., registering the same order approval twice), I use the filters list property to query for another registry entry that has the same filters (i.e. the order id, the APROVED keyword, the domain, etc.). The filters are also useful for some specialized reports, like all sales that came from a given domain (the domain is one value of the list property).

On M/S, as Jeff said, the time window is small, and the chance of hitting a problem is small, but on HRD the window may last several minutes, and in that case I may have a very inconsistent sales report at the end of the day. Do you guys think that I can use the same pattern Jeff suggested to solve this problem? Any advice?

Thanks in advance
- Ronoaldo

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/QC0MbTJiIwUJ.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
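A minimal in-memory sketch of the per-product voucher grouping described in this message, in plain Java. It does not use the App Engine or Objectify APIs: ProductVoucherGroup, claimNext and countFor are hypothetical names, and `synchronized` only stands in for a transaction on a single entity group. What it illustrates is that claiming the next free voucher and counting the vouchers of an order happen against the same consistent view.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one entity group: all vouchers of a single product. */
class ProductVoucherGroup {
    /** Hypothetical voucher record mirroring the fields in the post. */
    static final class Voucher {
        final long voucherId;
        Long orderId; // null means the voucher is still available

        Voucher(long voucherId) {
            this.voucherId = voucherId;
        }
    }

    private final List<Voucher> vouchers = new ArrayList<>();

    ProductVoucherGroup(int size) {
        for (long id = 1; id <= size; id++) {
            vouchers.add(new Voucher(id));
        }
    }

    /**
     * Claims the next free voucher for the order and returns how many vouchers
     * the order now holds in this group, or -1 if no voucher is free.
     * synchronized is an assumption standing in for an entity-group transaction.
     */
    synchronized int claimNext(long orderId) {
        Voucher free = null;
        int held = 0;
        for (Voucher v : vouchers) {
            if (v.orderId == null) {
                if (free == null) {
                    free = v;
                }
            } else if (v.orderId == orderId) {
                held++;
            }
        }
        if (free == null) {
            return -1;
        }
        free.orderId = orderId;
        return held + 1;
    }

    /** Counts the vouchers of this product already linked to the order. */
    synchronized int countFor(long orderId) {
        int held = 0;
        for (Voucher v : vouchers) {
            if (v.orderId != null && v.orderId == orderId) {
                held++;
            }
        }
        return held;
    }
}
```

Because the whole group is serialized, throughput per product is limited, which matches the trade-off accepted in the message (an order may wait a few minutes to be filled).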
[google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
I don't think Ikai read your post...

Robert and I wanted to write a little HRD status site to track this and get real data, but we haven't done so yet. I have never seen the replication take more than about 1s. I think 1s will cover about four 9's, but that is just an educated guess. Until we (the users) actually measure this over time, I don't think we can know for sure.

-Mike

On Sep 19, 7:16 pm, Jeff Schnitzer <j...@infohazard.org> wrote:
I know that an index update in the HRD will typically be visible within a couple seconds. That's the average case. What is the worst case? Assuming something in the datacenter goes wacky, how long might it take for an index to update? Tens of seconds, minutes, hours, days?

Thanks,
Jeff
[google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
And then I went and used the word replication... I meant index lag.
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
Well, indexes are just Bigtable rows, so replication lag does apply to them as well.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com | twitter.com/ikai
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
I'm doing a lot of work lately with data that requires a large degree of transactional consistency. One pattern I've found that makes some of the pain of HRD eventuality go away is to add an extra entity that uses your query field as a natural key. This really requires global transactions to work (as announced, it's in trusted testing, wheee!) but here's an example:

Say you associate a facebook id with an account. In M/S, you'd probably have something like this:

    class User { @Id Long id; long fbId; ... }

...and then when a request arrives with a facebook id, you would query for the user record. No user record? Create one. With eventual consistency, this creates a larger window (with M/S it was small) where you can get duplicate Users for the same fbId.

The way to get transactional integrity and strong consistency is to add an FbId entity:

    class FbId { @Id String fbId; long userId; }

I've now got several of these mapping entities in place. Using global transactions to create the FbId and the User at the same time seems to solve the consistency issues entirely. I don't know how it will perform under load yet, but obviously there's not heavy contention in this situation, so I would be surprised if the 2pc hurt much.

I'm starting to notice several of these FbId-type mapping objects showing up in my code as a way to force queries (for unique items) into strong consistency. I'm guessing you could do this for multi-item queries using a list property: instead of query(Thing.class).filter("color", someColor), you could keep updating an entity like this:

    class ColorThings { @Id String color; List<Key<Thing>> things; }

...which feels upside-down but really has a lot of advantages. If you put ColorThings in memcache, it's like a query cache which actually updates properly.

Is anyone else noticing their code being pushed into this pattern by the HRD?

Jeff
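A minimal sketch of the FbId mapping idea, in plain Java with in-memory maps rather than the datastore or Objectify. FbIdMappingSketch, getOrCreateUser and userCount are hypothetical names, and the atomic `computeIfAbsent` on a `ConcurrentHashMap` is only an assumed stand-in for the global transaction that creates the FbId and the User together. The point it shows is that lookup by natural key replaces the query, so two requests with the same Facebook id cannot create two users.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory illustration of a mapping entity keyed by the queried field. */
class FbIdMappingSketch {
    // Stand-in for FbId entities: the Facebook id is the key, the user id the value.
    private final Map<String, Long> fbIdToUserId = new ConcurrentHashMap<>();
    // Stand-in for User entities, keyed by a generated id.
    private final Map<Long, String> users = new ConcurrentHashMap<>();
    private final AtomicLong nextUserId = new AtomicLong(1);

    /**
     * Returns the user id for the Facebook id, creating the user on first sight.
     * computeIfAbsent runs the creation at most once per key, which plays the
     * role of the transaction spanning FbId and User in the post.
     */
    long getOrCreateUser(String fbId) {
        return fbIdToUserId.computeIfAbsent(fbId, id -> {
            long userId = nextUserId.getAndIncrement();
            users.put(userId, id);
            return userId;
        });
    }

    /** Number of User records created so far. */
    int userCount() {
        return users.size();
    }
}
```

The same shape would apply to any "unique by some field" lookup: the field becomes the key of a small mapping record, and the read is a get by key instead of an index query.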
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
An interesting notion. Although you could also just use ColorThings(key_name=color) as the parent entity for all the Things. This way the list of Things would be queryable directly (using an ancestor query) and there would not be a limit on the number and size of Things. They also sit next to each other in the underlying Bigtable, so there is only one 'seek' to find them (which is the largest cost when looking things up if you don't count serialization).
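A rough sketch of why children of a ColorThings parent can be read with one 'seek', in plain Java. A `TreeMap` stands in for a sorted key-value store; the key layout (parent name, then "/", then child id) and the names AncestorScanSketch, put and thingsUnder are assumptions made for illustration, not the datastore's actual key encoding. It also assumes that color names never contain "/".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Sorted-map illustration of an ancestor query as a contiguous key range. */
class AncestorScanSketch {
    // Rows sorted by key; a child key is parent + "/" + childId (assumed layout).
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String color, String thingId, String payload) {
        rows.put(color + "/" + thingId, payload);
    }

    /**
     * Returns every payload stored under the given parent, in key order.
     * All keys with prefix color + "/" lie in [color + "/", color + "0"),
     * because '0' is the character immediately after '/'.
     */
    List<String> thingsUnder(String color) {
        String start = color + "/";
        String end = color + "0";
        return new ArrayList<>(rows.subMap(start, end).values());
    }
}
```

The range read locates the first child once and then walks neighbouring rows, which is the locality argument made in the message above.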
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
Ikai is correct to think about replication in this case. In a single replica you could have one of three states:

Applied - fully visible
Committed - has the log entry, but has yet to apply it
Missing - the log entry has yet to be replicated

Only in the first case is it visible to a global query.

When you write something, the log is committed to at least a majority of replicas. The datastore returns success, then immediately tries to apply the write everywhere it committed the log entry. It usually takes a couple hundred ms to apply. This is why the majority of cases take O(100 ms) to become visible.

For a very small % of writes, the write either cannot commit to the local replica or cannot be applied after the commit. In these cases the datastore will still return success, but the write won't be visible until a background process picks it up and applies it. In these cases it can take O(minutes) to be picked up and replicated/applied.

If there is something wrong in the replica you are querying (for example replication is backed up, or the Bigtable is unavailable, or the background processes in that replica are having issues), then it could take a good deal longer (this becomes very very unlikely very quickly, but not impossible). There really is no hard upper bound, because distributed systems will have pieces that fail (and are designed to still function when they do).

- Alfred
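The three replica states in Alfred's explanation can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: ReplicatedWriteSketch and its methods are hypothetical names, timing is not modelled at all, and nothing here reflects the datastore's real implementation. It shows that a write can be acknowledged (committed on a majority) while still invisible to a global query on a replica that has not applied it.

```java
import java.util.Arrays;

/** Toy model of one write across several replicas; not a datastore API. */
class ReplicatedWriteSketch {
    /** Per-replica states as named in the explanation. */
    enum State { MISSING, COMMITTED, APPLIED }

    private final State[] states;

    ReplicatedWriteSketch(int replicaCount) {
        states = new State[replicaCount];
        Arrays.fill(states, State.MISSING);
    }

    /** Commits the log entry to the given replicas. */
    void commitTo(int... replicas) {
        for (int r : replicas) {
            if (states[r] == State.MISSING) {
                states[r] = State.COMMITTED;
            }
        }
    }

    /** Success is returned once a majority of replicas holds the log entry. */
    boolean acknowledged() {
        int holding = 0;
        for (State s : states) {
            if (s != State.MISSING) {
                holding++;
            }
        }
        return holding > states.length / 2;
    }

    /** Fast path: apply the write wherever the log entry was committed. */
    void applyWhereCommitted() {
        for (int r = 0; r < states.length; r++) {
            if (states[r] == State.COMMITTED) {
                states[r] = State.APPLIED;
            }
        }
    }

    /** Slow path: a background process replicates and applies on one replica. */
    void backgroundCatchUp(int replica) {
        states[replica] = State.APPLIED;
    }

    /** Only an applied write is visible to a global (non-ancestor) query. */
    boolean visibleToGlobalQuery(int replica) {
        return states[replica] == State.APPLIED;
    }
}
```

In this model the gap between `acknowledged()` turning true and `visibleToGlobalQuery(r)` turning true on the replica being queried is the window the thread is asking about.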
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
The problem with using a key-parent is that it limits you to a single index -- say I want to index Things by color and texture. The downside of this multiple-thing index entity is that (like a parent-key) it limits throughput. And since there's a 2pc involved, it probably limits throughput quite a lot...

Jeff
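A small in-memory sketch of the "multiple index entities" idea from Jeff's reply, in plain Java: one Thing is listed in both a per-color and a per-texture index record, which a single parent key could not express. ThingIndexSketch and its members are hypothetical names, and `synchronized` is an assumed stand-in for the cross-group transaction (the 2pc) that keeps the Thing and its index records in step.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustration of hand-maintained index records for two properties. */
class ThingIndexSketch {
    static final class Thing {
        final long id;
        final String color;
        final String texture;

        Thing(long id, String color, String texture) {
            this.id = id;
            this.color = color;
            this.texture = texture;
        }
    }

    private final Map<Long, Thing> things = new HashMap<>();
    // Each map entry plays the role of one index entity (e.g. ColorThings).
    private final Map<String, Set<Long>> byColor = new HashMap<>();
    private final Map<String, Set<Long>> byTexture = new HashMap<>();

    /** Saves the Thing and updates both index records in one atomic step. */
    synchronized void save(Thing t) {
        Thing old = things.put(t.id, t);
        if (old != null) {
            // Drop the stale index entries left by the previous version.
            byColor.get(old.color).remove(Long.valueOf(old.id));
            byTexture.get(old.texture).remove(Long.valueOf(old.id));
        }
        byColor.computeIfAbsent(t.color, k -> new TreeSet<>()).add(t.id);
        byTexture.computeIfAbsent(t.texture, k -> new TreeSet<>()).add(t.id);
    }

    synchronized Set<Long> idsByColor(String color) {
        return new TreeSet<>(byColor.getOrDefault(color, Collections.emptySet()));
    }

    synchronized Set<Long> idsByTexture(String texture) {
        return new TreeSet<>(byTexture.getOrDefault(texture, Collections.emptySet()));
    }
}
```

Every save touches the Thing plus up to four index records under one lock, which is a simplified picture of the throughput limit mentioned in the reply.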
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
Thanks... while I didn't follow it exactly, I get the gist of what's going on. Sounds like I should expect five- or six-sigma probabilities of minute+ eventuality in global query indexes.

Jeff
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
I've been using the same pattern as Jeff mentions for quite some time -- even while I was on M/S. I use it because it reduces my problems to fetch by key scenarios, and I can build multiple specialized indexes in this way. Part of the reason I started doing this was due to exploding indexes type issues; this lets me control the explosion, and possibly even defer the writes in some cases. It also allows you to avoid contention issues when the Things are frequently updated, but the indexed values may not be. Robert On Tue, Sep 20, 2011 at 15:03, Alfred Fuller arfuller+appeng...@google.com wrote: An interesting notion. Although you could also just use ColorThings(key_name=color) as the parent entity for all the Things. This way the list of things would be queriable directly (using an ancestor query) and there would not be a limit on the number and size of Things. They also exist next to each other in the underlying big table so there is only one 'seek' to find them (which is the largest cost when looking things up if you don't count serialization). On Tue, Sep 20, 2011 at 12:37 PM, Jeff Schnitzer j...@infohazard.org wrote: I'm doing a lot of work lately with data that requires a large degree of transactional consistency. One pattern I've found that makes some of the pain of HRD eventuality go away is to add an extra entity that uses your query field as a natural key. This really requires global transactions to work (as announced, it's in trusted testing, wheee!) but here's an example: Say you associate a facebook id with an account. In M/S, you'd probably have something like this: class User { @Id Long id; long fbId; ... } ...and then when a request arrives with a facebook id, you would query for the user record. No user record? Create one. With eventual consistency, this creates a larger window (with M/S it was small) where you can get duplicate Users for the same fbId. 
The solution to transactional integrity and strong consistency is to add a FbId entity: class FbId { @Id String fbId; long userId; } I've now got several of these mapping entities in place now. Using global transactions to create the FbId and the User at the same time, it seems to solve consistency issues entirely. I don't know how it will perform yet under load, but obviously there's not heavy contention in this situation so I would be surprised if the 2pc hurt much. I'm starting to notice several of these FbId-type mapping objects showing up in my code as a way to force queries (for unique items) into strong consistency. I'm guessing you could do this for multi-item queries using a list property instead: Instead of query(Thing.class).filter(color, someColor), you could instead keep updating an entity like this: class ColorThings { @Id String color; ListKeyThing things; } ...which feels upside-down but really has a lot of advantages. If you put ColorThings in memcache, it's like a query cache which actually updates properly. Is anyone else noticing their code being pushed into this pattern by the HRD? Jeff On Tue, Sep 20, 2011 at 10:10 AM, Ikai Lan (Google) ika...@google.com wrote: Well, indexes are just Bigtable rows, so replication lag does apply to them as well. -- Ikai Lan Developer Programs Engineer, Google App Engine plus.ikailan.com | twitter.com/ikai On Tue, Sep 20, 2011 at 7:42 AM, Mike Wesner mbwes...@gmail.com wrote: And then I went and used the word replication... i meant index lag. On Sep 20, 9:40 am, Mike Wesner mbwes...@gmail.com wrote: I don't think Ikai read your post... Robert and I wanted to write a little HRD status site to track this and get real data, but we haven't done so yet. I have never seen the replication take more than about 1s. I think 1s will cover about four 9's, but that is just an educated guess. Until we (the users) actually measure this over time I don't think we can know for sure. 
-Mike

On Sep 19, 7:16 pm, Jeff Schnitzer j...@infohazard.org wrote:

I know that an index update in the HRD will typically be visible within a couple seconds. That's the average case. What is the worst-case? Assuming something in the datacenter goes wacky, how long might it take for an index to update? Tens of seconds, minutes, hours, days?

Thanks,
Jeff

--
You received this message because you are subscribed to the Google Groups Google App Engine group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
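
For anyone skimming the thread, here is a rough sketch of the FbId mapping-entity idea Jeff describes above. It is only an in-memory stand-in, not App Engine or Objectify code: the class `FbIdMappingSketch`, its two maps and `getOrCreateUser` are made up for illustration, and the atomic `computeIfAbsent` call merely plays the role of the transaction that would create the FbId and the User together. The point it tries to show is that looking the user up by a natural key, instead of by query, leaves no window in which a second User can be created for the same fbId.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the datastore (assumption: NOT the App Engine API). */
class FbIdMappingSketch {
    // Plays the role of the FbId entities: key = fbId, value = userId.
    // A lookup here models a get-by-key rather than an index query.
    private final Map<String, Long> fbIdToUserId = new ConcurrentHashMap<>();
    // Plays the role of the User entities, keyed by userId.
    private final Map<Long, String> users = new ConcurrentHashMap<>();
    private final AtomicLong nextUserId = new AtomicLong(1);

    /** Returns the userId for fbId, creating the User and its mapping only once. */
    long getOrCreateUser(String fbId) {
        // computeIfAbsent is atomic per key; it stands in for the
        // transaction that would write FbId and User together.
        return fbIdToUserId.computeIfAbsent(fbId, id -> {
            long userId = nextUserId.getAndIncrement();
            users.put(userId, id);
            return userId;
        });
    }

    /** Number of User records created so far. */
    int userCount() {
        return users.size();
    }

    public static void main(String[] args) {
        FbIdMappingSketch store = new FbIdMappingSketch();
        long first = store.getOrCreateUser("fb-123");
        long second = store.getOrCreateUser("fb-123");
        System.out.println("same user both times: " + (first == second));
        System.out.println("users created: " + store.userCount());
    }
}
```

In the real datastore the two writes would span two entity groups, which is why Jeff notes that the pattern depends on global transactions.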
Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?
I get that indexes are just Bigtable rows too, and that the normal replication rules we all know and love apply, so I guess this boils down to indexes being written separately from the entity. Does the index write apply to the same nodes, or possibly to different nodes?

Alfred, your next project idea: write some type of low-level, high-performance batcher providing crazy-high write rates to a single entity group. Perhaps with that you (or we) could come up with a higher-performance way to maintain global indexes. ;)

Robert

Oh, for those wondering what this thread is about... we're just making up words / phrases.

On Tue, Sep 20, 2011 at 15:28, Alfred Fuller arfuller+appeng...@google.com wrote:

Ikai is correct to think about replication in this case. In a single replica a write could be in one of three states:

Applied - fully visible
Committed - has the log entry, but has yet to apply it
Missing - the log entry has yet to be replicated

Only in the first case is it visible to a global query.

When you write something, the log entry is committed to at least a majority of replicas. The datastore returns success, then immediately tries to apply the write everywhere it committed the log entry. It usually takes a couple hundred ms to apply. This is why the majority of cases take O(100 ms) to become visible.

For a very small % of writes, the write either cannot commit to the local replica or cannot be applied after the commit. In these cases the datastore will still return success, but the write won't be visible until a background process picks it up and applies it. In these cases it can take O(minutes) to be picked up and replicated/applied.

If there is something wrong in the replica you are querying (for example, replication is backed up, the Bigtable is unavailable, or the background processes in that replica are having issues), then it could take a good deal longer (this becomes very, very unlikely very quickly, but not impossible).
There really is no hard upper bound, because distributed systems will have pieces that fail (and are designed to still function when they do).

- Alfred
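
To make the three replica states in Alfred's explanation concrete, here is a toy model. It is a sketch under assumptions, not how the datastore is implemented: the names `ReplicaState`, `ReplicaVisibilitySketch`, `writeSucceeds` and `visibleToGlobalQuery` are invented for illustration. It only encodes the two rules stated above: a write returns success once a majority of replicas hold the log entry, and a global query served by a replica sees the write only if that replica has applied it.

```java
import java.util.Arrays;
import java.util.List;

/** The three per-replica states from Alfred's message. */
enum ReplicaState { APPLIED, COMMITTED, MISSING }

/** Toy model of write success versus query visibility (illustrative only). */
class ReplicaVisibilitySketch {

    /** A global query sees the write only on a replica that has applied it. */
    static boolean visibleToGlobalQuery(ReplicaState state) {
        return state == ReplicaState.APPLIED;
    }

    /** The write returns success once a majority of replicas hold the log entry. */
    static boolean writeSucceeds(List<ReplicaState> replicas) {
        long holding = replicas.stream()
                .filter(s -> s != ReplicaState.MISSING)
                .count();
        return holding * 2 > replicas.size();
    }

    public static void main(String[] args) {
        // Three replicas: one committed but not yet applied, one applied,
        // and one that has not received the log entry at all.
        List<ReplicaState> replicas = Arrays.asList(
                ReplicaState.COMMITTED, ReplicaState.APPLIED, ReplicaState.MISSING);

        System.out.println("write succeeds: " + writeSucceeds(replicas));
        for (int i = 0; i < replicas.size(); i++) {
            System.out.println("replica " + i + " (" + replicas.get(i)
                    + ") shows the write to a global query: "
                    + visibleToGlobalQuery(replicas.get(i)));
        }
    }
}
```

The case worth noticing is the first replica: the write has already returned success, yet a global query served from that replica does not see it until the apply step runs, which is the window this thread is asking about.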