Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-21 Thread Ronoaldo José de Lana Pereira
I'm planning the migration of our app to HRD. It is a collective buying 
site, and I have found lots of places where I need to change my 
models/queries. One case where we need consistency is this scenario:

class Product {
   @Id Long productId;
}
class Order {
   @Id Long orderId;
   List<Long> productId;
}
class Voucher {
   @Id Long voucherId;
   Long orderId;
   Long productId;
}

Vouchers must be created before orders, so they are currently root entities. 
When an order is approved, I have a specialized queue with 
max_concurrent_requests = 1 that picks the next available voucher (one which 
has orderId = null) and associates it with the order. To check whether the 
order is filled with all its vouchers, I count how many Vouchers are linked 
to that orderId, and if there are Vouchers missing, I schedule another task 
to consume a Voucher again.

On HRD, this doesn't work because the query to get the next Voucher and the 
query to check how many vouchers I have for an Order are more likely to be 
inconsistent. What I'm planning is to group the Vouchers that belong to the 
same product (~ 30k vouchers per product) and then perform an ancestor 
query, as suggested by the docs. In this case, I'll end up with:

class Voucher {
   @Parent Key<Product> productId;
   @Id Long voucherId;
   Long orderId;
}

... and will be able to query for how many Vouchers are linked to an order 
(one query for each of the items in Order.productId). Is this a good pattern 
for this particular scenario? Writes per second are not a problem for us: 
i.e. if the order waits a few minutes until the Vouchers are all filled, 
that is OK.
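
A minimal sketch of the bookkeeping described above, in plain Java. It is an in-memory stand-in only: it calls no App Engine or Objectify API, and the names VoucherPool, assignNext and countForOrder are hypothetical. On the real datastore, both methods would presumably be ancestor queries against the Product's entity group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for Vouchers grouped under their Product parent.
// All names are hypothetical; no App Engine API is used.
final class VoucherPool {
    static final class Voucher {
        final long productId;   // plays the role of the @Parent key
        final long voucherId;
        Long orderId;           // null until the voucher is linked to an order

        Voucher(long productId, long voucherId) {
            this.productId = productId;
            this.voucherId = voucherId;
        }
    }

    // One list per product, standing in for one entity group per Product.
    private final Map<Long, List<Voucher>> byProduct = new HashMap<>();

    void add(long productId, long voucherId) {
        byProduct.computeIfAbsent(productId, k -> new ArrayList<>())
                 .add(new Voucher(productId, voucherId));
    }

    // Links the next free voucher of the product to the order.
    // Returns the voucher, or null when the product has none left.
    Voucher assignNext(long productId, long orderId) {
        for (Voucher v : byProduct.getOrDefault(productId, List.of())) {
            if (v.orderId == null) {
                v.orderId = orderId;
                return v;
            }
        }
        return null;
    }

    // Counts the vouchers of one product already linked to the order.
    long countForOrder(long productId, long orderId) {
        return byProduct.getOrDefault(productId, List.of()).stream()
                .filter(v -> v.orderId != null && v.orderId == orderId)
                .count();
    }

    public static void main(String[] args) {
        VoucherPool pool = new VoucherPool();
        pool.add(1L, 100L);
        pool.add(1L, 101L);
        pool.assignNext(1L, 7L);
        System.out.println("linked to order 7: " + pool.countForOrder(1L, 7L));
    }
}
```

Because both operations only touch the vouchers of a single product, they map onto one entity group each, which is what makes the ancestor-query approach applicable.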

Another issue: I have to keep a financial accounting registry, and 
currently I have this entity:

class AccountingRegistry {
   @Parent Key<AccountRegistry> parentRegistry;
   @Id Long id;
   Date date;
   Long ammount;
   List<String> filters;
}

To represent an accounting transaction, I group in the same entity group 
all registry entries that are related and that sum to 0. To avoid 
performing the same transaction twice (i.e., registering the same order 
approval twice), I use the filters list property to query for another 
registry entry that has the same filters (i.e. the order id, the APROVED 
keyword, the domain, etc.). The filters are also useful for some specialized 
reports, like all sales that came from a given domain (the domain is one 
value in the list property). On M/S, as Jeff said, the time window is small, 
and the chance of a problem is small, but on HRD the window may last several 
minutes, and in that case I may have a very inconsistent sales report at the 
end of the day.

Do you guys think that I can use the same pattern Jeff suggested to solve 
this problem? Any advice?

Thanks in advance

- Ronoaldo

-- 
You received this message because you are subscribed to the Google Groups 
Google App Engine group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/QC0MbTJiIwUJ.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.



[google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Mike Wesner
I don't think Ikai read your post...

Robert and I wanted to write a little HRD status site to track this
and get real data, but we haven't done so yet.  I have never seen the
replication take more than about 1s.  I think 1s will cover about four
9's, but that is just an educated guess.  Until we (the users)
actually measure this over time I don't think we can know for sure.

-Mike

On Sep 19, 7:16 pm, Jeff Schnitzer <j...@infohazard.org> wrote:
> I know that an index update in the HRD will typically be visible
> within a couple seconds.  That's the average case.  What is the
> worst-case?
>
> Assuming something in the datacenter goes wacky, how long might it
> take for an index to update?  Tens of seconds, minutes, hours, days?
>
> Thanks,
> Jeff




[google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Mike Wesner
And then I went and used the word replication... I meant index lag.




Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Ikai Lan (Google)
Well, indexes are just Bigtable rows, so replication lag does apply to them
as well.

--
Ikai Lan
Developer Programs Engineer, Google App Engine
plus.ikailan.com | twitter.com/ikai






Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Jeff Schnitzer
I'm doing a lot of work lately with data that requires a large degree
of transactional consistency.  One pattern I've found that makes some
of the pain of HRD eventuality go away is to add an extra entity that
uses your query field as a natural key.  This really requires global
transactions to work (as announced, it's in trusted testing, wheee!)
but here's an example:

Say you associate a facebook id with an account.  In M/S, you'd
probably have something like this:

class User {
    @Id Long id;
    long fbId;
    ...
}

...and then when a request arrives with a facebook id, you would query
for the user record.  No user record?  Create one.  With eventual
consistency, this creates a larger window (with M/S it was small)
where you can get duplicate Users for the same fbId.

The solution to transactional integrity and strong consistency is to
add a FbId entity:

class FbId {
    @Id String fbId;
    long userId;
}

I've got several of these mapping entities in place now.  Using global
transactions to create the FbId and the User at the same time seems to
solve the consistency issues entirely.  I don't know yet how it will
perform under load, but obviously there's not heavy contention in this
situation, so I would be surprised if the 2PC hurt much.
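
A toy model of why the mapping entity helps, in plain Java. It assumes only what the thread states: a query index can lag behind writes, while the lookup by key does not. Every name (NaturalKeyDemo, findOrCreateByQuery, findOrCreateByKey) is hypothetical, and nothing here calls the App Engine API. On the real datastore the key-based path would additionally need a transaction spanning both entities, as described above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: key lookups see every write, the query index may lag.
// Every name is hypothetical; no App Engine API is used.
final class NaturalKeyDemo {
    private final Map<Long, String> usersById = new HashMap<>();    // User: id -> fbId
    private final Map<String, Long> fbIdEntities = new HashMap<>(); // FbId: fbId -> userId
    private final Map<String, Long> laggingIndex = new HashMap<>(); // index on User.fbId
    private long nextId = 1;

    // Simulates the index catching up with the stored entities.
    void applyIndex() {
        usersById.forEach((id, fbId) -> laggingIndex.putIfAbsent(fbId, id));
    }

    // Query-then-create: the index may not yet show a recent write.
    long findOrCreateByQuery(String fbId) {
        Long found = laggingIndex.get(fbId);
        if (found != null) {
            return found;
        }
        long id = nextId++;
        usersById.put(id, fbId);
        return id;
    }

    // Get-by-key-then-create: the FbId entity is keyed by the query field.
    long findOrCreateByKey(String fbId) {
        Long found = fbIdEntities.get(fbId);
        if (found != null) {
            return found;
        }
        long id = nextId++;
        usersById.put(id, fbId);
        fbIdEntities.put(fbId, id);
        return id;
    }

    int userCount() {
        return usersById.size();
    }

    public static void main(String[] args) {
        NaturalKeyDemo byQuery = new NaturalKeyDemo();
        byQuery.findOrCreateByQuery("fb-1");
        byQuery.findOrCreateByQuery("fb-1"); // index has not caught up yet
        System.out.println("users via query: " + byQuery.userCount());

        NaturalKeyDemo byKey = new NaturalKeyDemo();
        byKey.findOrCreateByKey("fb-1");
        byKey.findOrCreateByKey("fb-1");
        System.out.println("users via key: " + byKey.userCount());
    }
}
```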

I'm starting to notice several of these FbId-type mapping objects
showing up in my code as a way to force queries (for unique items)
into strong consistency.  I'm guessing you could do this for
multi-item queries using a list property instead:

Instead of query(Thing.class).filter("color", someColor), you could
instead keep updating an entity like this:

class ColorThings {
   @Id String color;
   List<Key<Thing>> things;
}

...which feels upside-down but really has a lot of advantages.  If you
put ColorThings in memcache, it's like a query cache which actually
updates properly.
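
A minimal in-memory sketch of keeping such a ColorThings entity up to date, in plain Java. The class and method names (ColorIndex, setColor, thingsWithColor) are hypothetical and Thing ids stand in for Key<Thing>; on the real datastore the two updates would have to happen in one transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for Thing entities plus one ColorThings entity per color.
// Names are hypothetical; no App Engine API is used.
final class ColorIndex {
    private final Map<Long, String> colorOfThing = new HashMap<>();      // Thing id -> color
    private final Map<String, List<Long>> colorThings = new HashMap<>(); // color -> Thing ids

    // Sets a Thing's color and keeps the per-color lists in step.
    void setColor(long thingId, String color) {
        String old = colorOfThing.put(thingId, color);
        if (old != null) {
            // remove(Object) removes by value rather than by list position
            colorThings.get(old).remove(Long.valueOf(thingId));
        }
        colorThings.computeIfAbsent(color, k -> new ArrayList<>()).add(thingId);
    }

    // Replaces the query on Thing.color with a single lookup by key.
    List<Long> thingsWithColor(String color) {
        return List.copyOf(colorThings.getOrDefault(color, List.of()));
    }

    public static void main(String[] args) {
        ColorIndex index = new ColorIndex();
        index.setColor(1L, "red");
        index.setColor(2L, "red");
        index.setColor(1L, "blue");
        System.out.println("red things: " + index.thingsWithColor("red"));
    }
}
```

The cost of this design is that every color change writes to up to two list entities, so hot colors become write hot spots.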

Is anyone else noticing their code being pushed into this pattern by the HRD?

Jeff




Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Alfred Fuller
An interesting notion. Although you could also just
use ColorThings(key_name=color) as the parent entity for all the Things.
This way the list of Things would be queryable directly (using
an ancestor query) and there would be no limit on the number and size of
Things. They also sit next to each other in the underlying Bigtable, so
there is only one 'seek' to find them (which is the largest cost when
looking things up, if you don't count serialization).
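
A rough illustration of the "next to each other" point, in plain Java. It assumes a simplified key layout (a parent path followed by a zero-padded child id) held in a sorted map; the real datastore key encoding differs, and the names AncestorScan, putThing and thingsUnder are hypothetical. The point is only that children of one parent form a contiguous key range, so finding them is one seek plus a scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Sorted-map stand-in for rows ordered by key path.
// The key layout is an assumption made for illustration only.
final class AncestorScan {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void putThing(String color, long thingId, String payload) {
        // Zero-padding keeps numeric ids in order under string comparison.
        String key = String.format(Locale.ROOT, "ColorThings:%s/Thing:%010d", color, thingId);
        rows.put(key, payload);
    }

    // Returns the payloads of all Things stored under the given parent.
    List<String> thingsUnder(String color) {
        String start = "ColorThings:" + color + "/";
        // '0' is the character right after '/', so this bounds the prefix range.
        String end = "ColorThings:" + color + "0";
        return new ArrayList<>(rows.subMap(start, true, end, false).values());
    }

    public static void main(String[] args) {
        AncestorScan scan = new AncestorScan();
        scan.putThing("red", 2L, "second red thing");
        scan.putThing("blue", 1L, "a blue thing");
        scan.putThing("red", 1L, "first red thing");
        System.out.println(scan.thingsUnder("red"));
    }
}
```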


Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Alfred Fuller
Ikai is correct to think about replication in this case. In a single replica
a write can be in one of three states:

Applied - fully visible
Committed - has the log entry, but has yet to apply it
Missing - the log entry has yet to be replicated

Only in the first case is it visible to a global query. When you write
something, the log entry is committed to at least a majority of replicas. The
datastore returns success, then immediately tries to apply the write
everywhere it committed the log entry. It usually takes a couple hundred ms
to apply. This is why the majority of writes take O(100 ms) to become
visible. For a very small % of writes, the write either cannot commit to the
local replica or cannot be applied after the commit. In these cases the
datastore will still return success, but the write won't be visible until a
background process picks it up and applies it. In these cases it can take
O(minutes) to be picked up and replicated/applied. If there is something
wrong in the replica you are querying (for example replication is backed up,
the Bigtable is unavailable, or the background processes in that replica
are having issues), then it could take a good deal longer (this becomes very,
very unlikely very quickly, but not impossible). There really is no hard upper
bound, because distributed systems will have pieces that fail (and are
designed to still function when they do).

 - Alfred
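
The three states and the majority rule can be modeled in a few lines of plain Java. This is a sketch of the description above only, not of the datastore's implementation; the names ReplicaStates, visibleToGlobalQuery and writeSucceeds are hypothetical.

```java
import java.util.List;

// Model of the per-replica write states described in the message above.
// Names are hypothetical; this is not how the datastore is implemented.
final class ReplicaStates {
    enum State { APPLIED, COMMITTED, MISSING }

    // Only an applied write shows up in a global query on that replica.
    static boolean visibleToGlobalQuery(State state) {
        return state == State.APPLIED;
    }

    // A write reports success once a majority of replicas hold the log entry.
    // An APPLIED replica necessarily holds the log entry as well.
    static boolean writeSucceeds(List<State> replicas) {
        long withLogEntry = replicas.stream()
                .filter(s -> s != State.MISSING)
                .count();
        return withLogEntry * 2 > replicas.size();
    }

    public static void main(String[] args) {
        List<State> replicas = List.of(State.COMMITTED, State.APPLIED, State.MISSING);
        System.out.println("write succeeds: " + writeSucceeds(replicas));
        for (State s : replicas) {
            System.out.println(s + " visible to global query: " + visibleToGlobalQuery(s));
        }
    }
}
```

The gap between the two methods is the eventual-consistency window: a write can report success while the replica serving a query is still Committed or Missing.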




Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Jeff Schnitzer
The problem with using a key-parent is that it limits you to a single
index -- say I want to index Things by color and texture.

The downside of this multiple-thing index entity is that (like a
parent-key) it limits throughput.  And since there's a 2PC involved,
it probably limits throughput quite a lot...

Jeff


Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Jeff Schnitzer
Thanks... while I didn't follow it exactly, I get the gist of what's
going on.  Sounds like I should expect five- or six-sigma
probabilities of minute+ eventuality in global query indexes.

Jeff




Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Robert Kluin
I've been using the same pattern Jeff mentions for quite some time
-- even while I was on M/S.  I use it because it reduces my problems
to fetch-by-key scenarios, and I can build multiple specialized
indexes this way.  Part of the reason I started doing this was
exploding-index issues; this lets me control the explosion, and
possibly even defer the writes in some cases.

It also allows you to avoid contention issues when the Things are
frequently updated, but the indexed values may not be.


Robert



On Tue, Sep 20, 2011 at 15:03, Alfred Fuller
arfuller+appeng...@google.com wrote:
 An interesting notion. Although you could also just
 use ColorThings(key_name=color) as the parent entity for all the Things.
 This way the list of Things would be queryable directly (using
 an ancestor query) and there would not be a limit on the number and size of
 Things. They also sit next to each other in the underlying Bigtable, so
 there is only one 'seek' to find them (which is the largest cost when
 looking things up if you don't count serialization).

 On Tue, Sep 20, 2011 at 12:37 PM, Jeff Schnitzer j...@infohazard.org
 wrote:

 I'm doing a lot of work lately with data that requires a large degree
 of transactional consistency.  One pattern I've found that makes some
 of the pain of HRD eventuality go away is to add an extra entity that
 uses your query field as a natural key.  This really requires global
 transactions to work (as announced, it's in trusted testing, wheee!)
 but here's an example:

 Say you associate a facebook id with an account.  In M/S, you'd
 probably have something like this:

 class User {
    @Id Long id;
    long fbId;
    ...
 }

 ...and then when a request arrives with a facebook id, you would query
 for the user record.  No user record?  Create one.  With eventual
 consistency, this creates a larger window (with M/S it was small)
 where you can get duplicate Users for the same fbId.

 The solution to transactional integrity and strong consistency is to
 add a FbId entity:

 class FbId {
    @Id String fbId;
    long userId;
 }

 I've now got several of these mapping entities in place.  Using
 global transactions to create the FbId and the User at the same time
 seems to solve the consistency issues entirely.  I don't know how it
 will perform under load yet, but obviously there's not heavy
 contention in this situation, so I would be surprised if the 2PC hurt
 much.
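
 A minimal sketch of the mapping-entity idea above, written as an in-memory simulation rather than datastore code. The names FbIdRegistry, getOrCreateUser, and userCount are hypothetical, and ConcurrentHashMap.computeIfAbsent only stands in for the global transaction that would create the FbId and the User together. The point it illustrates is that the lookup is a get by natural key, so two requests with the same Facebook id cannot both decide that no user exists.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the FbId -> User mapping; not the App Engine API. */
final class FbIdRegistry {
    // Plays the role of the FbId kind: the Facebook id is the key itself.
    private final ConcurrentHashMap<String, Long> userIdByFbId = new ConcurrentHashMap<>();
    // Hypothetical id allocator for newly created users.
    private final AtomicLong nextUserId = new AtomicLong(1);

    /** Returns the user id mapped to fbId, creating the mapping if it is absent. */
    long getOrCreateUser(String fbId) {
        // computeIfAbsent is atomic per key; here it substitutes for the
        // transaction that writes the FbId and User entities together.
        return userIdByFbId.computeIfAbsent(fbId, id -> nextUserId.getAndIncrement());
    }

    /** Number of distinct users created so far. */
    int userCount() {
        return userIdByFbId.size();
    }

    public static void main(String[] args) {
        FbIdRegistry registry = new FbIdRegistry();
        long first = registry.getOrCreateUser("fb-1001");
        long second = registry.getOrCreateUser("fb-1001");
        System.out.println("same user for same fbId: " + (first == second));
    }
}
```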

 I'm starting to notice several of these FbId-type mapping objects
 showing up in my code as a way to force queries (for unique items)
 into strong consistency.  I'm guessing you could do this for
 multi-item queries using a list property instead:

 Instead of query(Thing.class).filter("color", someColor), you could
 instead keep updating an entity like this:

 class ColorThings {
   @Id String color;
   List<Key<Thing>> things;
 }

 ...which feels upside-down but really has a lot of advantages.  If you
 put ColorThings in memcache, it's like a query cache which actually
 updates properly.
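
 The bookkeeping this implies can be sketched as follows. This is an in-memory simulation under stated assumptions, not datastore or Objectify code: ColorIndex, putThing, and thingsWithColor are hypothetical names, Thing keys are reduced to plain long ids, and the synchronized methods stand in for a transaction covering the Thing and its ColorThings entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for ColorThings entities keyed by color. */
final class ColorIndex {
    // color -> ids of Things with that color (the ColorThings.things list).
    private final Map<String, Set<Long>> thingIdsByColor = new HashMap<>();
    // thing id -> current color (the Thing entity itself).
    private final Map<Long, String> colorByThingId = new HashMap<>();

    /** Saves a Thing and updates the index entries for its old and new color together. */
    synchronized void putThing(long thingId, String color) {
        String old = colorByThingId.put(thingId, color);
        if (old != null && !old.equals(color)) {
            Set<Long> ids = thingIdsByColor.get(old);
            ids.remove(thingId);
            if (ids.isEmpty()) {
                thingIdsByColor.remove(old);
            }
        }
        thingIdsByColor.computeIfAbsent(color, c -> new TreeSet<>()).add(thingId);
    }

    /** A lookup by key, replacing the query on Thing.color. */
    synchronized List<Long> thingsWithColor(String color) {
        List<Long> result = new ArrayList<>();
        Set<Long> ids = thingIdsByColor.get(color);
        if (ids != null) {
            result.addAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        ColorIndex index = new ColorIndex();
        index.putThing(1L, "red");
        index.putThing(2L, "red");
        index.putThing(1L, "blue"); // moves Thing 1 out of the "red" entry
        System.out.println("red things: " + index.thingsWithColor("red"));
    }
}
```

 The cost is that every write to a Thing's color also touches one or two index entries, which is where contention on a popular color would show up.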

 Is anyone else noticing their code being pushed into this pattern by the
 HRD?

 Jeff

 On Tue, Sep 20, 2011 at 10:10 AM, Ikai Lan (Google) ika...@google.com
 wrote:
  Well, indexes are just Bigtable rows, so replication lag does apply to
  them as well.
  --
  Ikai Lan
  Developer Programs Engineer, Google App Engine
  plus.ikailan.com | twitter.com/ikai
 
 
  On Tue, Sep 20, 2011 at 7:42 AM, Mike Wesner mbwes...@gmail.com wrote:
 
  And then I went and used the word replication... I meant index lag.
 
  On Sep 20, 9:40 am, Mike Wesner mbwes...@gmail.com wrote:
   I don't think Ikai read your post...
  
   Robert and I wanted to write a little HRD status site to track this
   and get real data, but we haven't done so yet.  I have never seen the
   replication take more than about 1s.  I think 1s will cover about four
   9's, but that is just an educated guess.  Until we (the users)
   actually measure this over time I don't think we can know for sure.
  
   -Mike
  
   On Sep 19, 7:16 pm, Jeff Schnitzer j...@infohazard.org wrote:
  
  
  
  
  
  
  
I know that an index update in the HRD will typically be visible
within a couple seconds.  That's the average case.  What is the
worst-case?
  
Assuming something in the datacenter goes wacky, how long might it
take for an index to update?  Tens of seconds, minutes, hours,
days?
  
Thanks,
Jeff
 

Re: [google-appengine] Re: Worst-case scenario for eventual consistency in the HRD?

2011-09-20 Thread Robert Kluin
I get that indexes are just Bigtable rows too, and that the normal
replication rules we all know and love apply, so I guess this boils
down to indexes being written separately from the entity.  Does the
index write apply to the same nodes, or possibly to different nodes?

Alfred, your next project idea: write some type of low-level
high-performance batcher providing crazy high write-rates to a single
entity group.  Perhaps with that you (or we) could come up with a
higher performance way to maintain global indexes.  ;)


Robert

Oh, for those wondering what this thread is about... we're just making
up words / phrases.





On Tue, Sep 20, 2011 at 15:28, Alfred Fuller
arfuller+appeng...@google.com wrote:
 Ikai is correct to think about replication in this case. In a single replica,
 a write can be in one of three states:
   Applied - fully visible
   Committed - has the log entry, but has yet to apply it
   Missing - the log entry has yet to be replicated
 Only in the first case is it visible to a global query.

 When you write something, the log entry is committed to at least a majority
 of replicas. The datastore returns success, then immediately tries to apply
 the write everywhere it committed the log entry. It usually takes a couple
 hundred ms to apply, which is why the majority of writes take O(100 ms) to
 become visible.

 For a very small % of writes, the write either cannot commit to the local
 replica or cannot be applied after the commit. In these cases the datastore
 will still return success, but the write won't be visible until a background
 process picks it up and replicates/applies it, which can take O(minutes).

 If there is something wrong in the replica you are querying (for example,
 replication is backed up, the Bigtable is unavailable, or the background
 processes in that replica are having issues), then it could take a good deal
 longer (this becomes very, very unlikely very quickly, but not impossible).
 There really is no hard upper bound, because distributed systems will have
 pieces that fail (and are designed to still function when they do).
  - Alfred
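
 The three states and the majority rule described above can be written down as a small model. This is only a sketch of the description in this message, not datastore internals: ReplicaVisibility, WriteState, visibleToGlobalQuery, and acknowledged are hypothetical names, and counting both Committed and Applied replicas as holding the log entry is an assumption drawn from the wording above.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of per-replica write states; not how the datastore is implemented. */
final class ReplicaVisibility {
    /** State of one write on one replica. */
    enum WriteState { APPLIED, COMMITTED, MISSING }

    /** Only an applied write is visible to a global query served by that replica. */
    static boolean visibleToGlobalQuery(WriteState state) {
        return state == WriteState.APPLIED;
    }

    /** Success is returned once a majority of replicas hold the log entry. */
    static boolean acknowledged(List<WriteState> replicas) {
        int holding = 0;
        for (WriteState s : replicas) {
            if (s != WriteState.MISSING) {
                holding++;
            }
        }
        return holding * 2 > replicas.size();
    }

    public static void main(String[] args) {
        // Two of three replicas have the log entry, none has applied it yet.
        List<WriteState> replicas =
            Arrays.asList(WriteState.COMMITTED, WriteState.COMMITTED, WriteState.MISSING);
        System.out.println("acknowledged: " + acknowledged(replicas));
        System.out.println("visible on replica 0: " + visibleToGlobalQuery(replicas.get(0)));
    }
}
```

 In this model a write can be acknowledged while no replica would yet show it to a global query, which is the window the thread is asking about.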
 On Tue, Sep 20, 2011 at 10:10 AM, Ikai Lan (Google) ika...@google.com
 wrote:

 Well, indexes are just Bigtable rows, so replication lag does apply to
 them as well.
 --
 Ikai Lan
 Developer Programs Engineer, Google App Engine
 plus.ikailan.com | twitter.com/ikai


 On Tue, Sep 20, 2011 at 7:42 AM, Mike Wesner mbwes...@gmail.com wrote:

 And then I went and used the word replication... I meant index lag.

 On Sep 20, 9:40 am, Mike Wesner mbwes...@gmail.com wrote:
  I don't think Ikai read your post...
 
  Robert and I wanted to write a little HRD status site to track this
  and get real data, but we haven't done so yet.  I have never seen the
  replication take more than about 1s.  I think 1s will cover about four
  9's, but that is just an educated guess.  Until we (the users)
  actually measure this over time I don't think we can know for sure.
 
  -Mike
 
  On Sep 19, 7:16 pm, Jeff Schnitzer j...@infohazard.org wrote:
 
 
 
 
 
 
 
   I know that an index update in the HRD will typically be visible
   within a couple seconds.  That's the average case.  What is the
   worst-case?
 
   Assuming something in the datacenter goes wacky, how long might it
   take for an index to update?  Tens of seconds, minutes, hours, days?
 
   Thanks,
   Jeff


