This past Wednesday, the App Engine team hosted the latest session of its bimonthly IRC office hours. A transcript of the session and a summary of the topics covered are provided below. The next session will take place on Wednesday, November 4th from 7:00-8:00 p.m. PST in the #appengine channel on irc.freenode.net.
Note that this will be the first Chat Time to occur after daylight saving time ends in the U.S., which means that it will take place one hour earlier than usual in countries or states that don't observe daylight saving time. Please be aware of this time difference so you don't inadvertently miss the session.

- Jason

--SUMMARY-----------------------------------------------------------

- Q: Why am I seeing more than 0.1% of datastore operations time out, and is anything being done to reduce this?
  A: A certain level of datastore timeouts is expected (generally between 0.1% and 0.2% of all datastore operations), but we are actively working on ways to improve datastore reliability. If you are seeing a much higher rate, be sure to inspect your data model for write contention, which often manifests as datastore timeouts. [9:02-9:07]

- Q: What is the recommended approach to datastore capacity planning ahead of a large bulk upload?
  A: Entities are stored as protocol buffers (http://code.google.com/p/protobuf/) -- if you familiarize yourself with the protobuf specification, you can determine the space needed to store each entity, minus the datastore overhead, fairly easily. An article that explains how entities and indexes are stored in much more detail is coming out soon. [9:04-9:05]

- Q: Can a high level of read operations result in datastore contention?
  A: Datastore contention is usually the result of too many attempted concurrent writes to the same datastore entity or entity group. Before implementing your data model, consider the expected read/write access patterns and design your data model accordingly, sharding entities that you expect to update more than once per second (http://code.google.com/appengine/articles/scaling/contention.html). While concurrent writes generally result in contention, concurrent reads generally result in better performance due to caching. [9:08-9:09, 9:11-9:13, 9:18]

- Q: Are there any plans to support more file extensions for attachments to outgoing email, e.g. .doc, .docx, etc.?
  A: There are no immediate plans to support these extensions due to the prevalence of viruses contained in files of these types. In the meantime, you can include links to the files or share them via Google Docs. [9:14, 9:16, 9:19-9:20]

- Q: What is the recommended approach to paging large data sets in App Engine?
  A: The offset approach is *not* recommended because it won't work for result sets larger than 1,000. Until datastore cursors are available, the recommended approaches are summarized in http://code.google.com/appengine/articles/paging.html. [9:21-9:23]

- Q: How can one avoid exploding indexes when using list properties?
  A: In general, you should avoid referencing more than one list property in any query, especially if one or both list properties contain a large number of elements. Index rows have to be added for every combination of values in the lists, which can result in index "explosion". See the video at http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore to learn more about why exploding indexes occur. [9:22, 9:26, 9:28-9:30, 9:32-9:33, 9:40]

- Q: In Java, can one use sequence methods in JPA to get a sequence of datastore IDs?
  A: No, you have to use the low-level datastore API's allocateIds() method for now. [9:31, 9:33]

- If you're looking to use Google Web Toolkit (GWT) and App Engine together, there are a number of combo samples available in http://code.google.com/p/googleappengine/source/browse/#svn/trunk/java/demos including gwtguestbook, sticky, and taskengine. [9:46, 9:48, 9:50-9:51]

- Q: What is being done to address long initialization times for Java applications?
  A: We are definitely aware of the issue and are rolling out several back-end enhancements over the next few releases to try to minimize this startup time as much as possible. [9:52-9:53]

--FULL TRANSCRIPT---------------------------------------------------

[09:01am] scudder_google: Hi all, welcome to another instlallment of our hour long chat time with people on the App Engine team
[09:01am] johnvdenley: Is there any kind of formality to this session? or is it just a free for all?
[09:01am] moraes: take what you can!
[09:02am] moraes: meh.
[09:02am] Jason_Google_: johnvdenley: It's basically a free-for-all.
[09:02am] scudder_google: so far from Google we have nickjohnson, Jason_Google and a few others may join as we go
[09:02am] scudder_google: yes, jump right in questions and comments welcome
[09:02am] mbw: Is anything being done to reduce timeouts? I am seeing a lot more than .01% timeouts. We even use a low level catch and retry trick to try and reduce its effect. We saw a huge spike of them yesterday at one point.
[09:02am] johnvdenley: OK, brb then, just need to move my car!...
[09:03am] scudder_google: mbw are these timeouts with datastore operations?
[09:03am] mbw: yes
[09:03am] nickjohnson: mbw: We're actively working on datastore timeouts. Bear in mind that they frequently highly correlated: When you see them at all, they come in batches.
[09:04am] brett_ae: heyo
[09:04am] dw: re: idle question from last week, is there any good advice going on capacity planning for datastore? i note that even very small entities have a metadata overhead of 100+ bytes, and was wondering how that metadata number is calculated (is it constant, dependent on indexed fields, field count, etc.)
[09:04am] scudder_google: ah ok, there are a few things that you can do but a small percentage of timeouts is currecntly expected
[09:04am] mbw: we see a steady amount of timeouts during the day.
[09:04am] mbw: i'd be happy with .01% ...
[09:05am] Jason_Google_: dw: I have an article coming out really soon that explains all this. I'll try to get it published in the next week, if you can hold out.
[09:05am] nickjohnson: dw: Entities are stored as Protocol Buffers; the overhead in the datastore stats is simply the total size of the entity's PB less the space used for each field.
[09:05am] dw: Jason_Google: that's great. more a curiosity than anything right now
[09:05am] scudder_google: I'm assuming these are timeouts on writes, about how many indexes need to be updated with a write
[09:05am] nickjohnson: The simplest way to reduce overhead is to use shorter field names.
[09:05am] dw: nice
[09:06am] mbw: timeouts happen on reads for us as much as writes. They don't seem to happen any more on big operations vs. small simple queries or gets
[09:06am] nickjohnson: You can specify the field name to use internally to the Property subclass constructor, by the way, so you don't need to compromise the design of your model.
[09:06am] dw: nickjohnson: +10 points for preempting the evil thoughts i was having
[09:07am] nickjohnson: mbw: Do you typically tend to write a lot to the same entity groups?
[09:07am] scudder_google: mbw: ah ok, I'd like to look into this more sepecifically for your app, what is the app ID?
[09:07am] mbw: scudder_google: ill PM it
[09:07am] moraes: i was thinking alongthe lines of 'store everything in a big pickle property named "a".'
[09:08am] _tmatsuo: Talking of timeouts, if there's too many accesses to a particlar node in a short period of time, could it be a reason for datastore timeouts?
[09:08am] nickjohnson: moraes: Pickle is, amongst other things, bulky.
[09:09am] nickjohnson: _tmatsuo: For a given value of 'too many', yes
[09:09am] brett_ae: _tmatsuo: for writes, possibly; for reads, no; it should actually get faster
[09:09am] brett_ae: because of caching
[09:09am] johnvdenley: whats the status of a local datastore viewer?
[09:09am] moraes: johnvdenley: there's one.
[09:10am] dw: johnvdenley: /_ah/admin/datastore url when running dev_appserver
[09:11am] Jason_Google_: johnvdenley: A local data viewer was added in the Java SDK a couple of releases back.
[09:11am] mbw: scudder_google: did you receive my PM?
[09:11am] _tmatsuo: nick: brett_ae: thanks. In such a case(timeouts on writes because of massive access), in my opinion, re-partitioning of data will help reducing timeouts. Is there any mechanism for re-partitioning of data?
[09:12am] scudder_google: mbw: yes, just replied, apologies for the delay
[09:12am] nickjohnson: _tmatsuo: That's too general a question to answer as-is. It depends highly on the data model in question.
[09:12am] nickjohnson: Frequently, simple optimisations do make a big difference, though
[09:12am] brett_ae: _tmatsuo: you can do a migration (by changing your schema/entity groups) yourself, which can be difficult; easiest thing to do is think about your datamodel ahead of time and think of your read/write access patterns
[09:12am] johnvdenley: ah, apologies, i must have been reading an old message
[09:13am] brett_ae: _tmatsuo: So if you know you're writing to a single piece of data more than once per second, maybe split it somehow?
[09:14am] max-oizo: When I was doing "diff" between versions 1.2.5 and 1.2.6, I found a CompiledQuery. What is it? Part future support cursors?
[09:14am] Sylvain_: News to support new extensions in the e-mail service ? and particularly MS Office (Word, Excel) and Open Office files ? (issue 494).
[09:14am] brett_ae: max:
[09:14am] Sylvain_: For example, We'd like to create a "HR section" where people can send their resumé (most of the time) : .doc, docx. We'd like to send them by mail then.
[09:14am] Sylvain_: And if possible not case sensitive (issue 493)
[09:16am] brett_ae: sylvain_: It's something we should support; the concern thus far has been virus propagation
[09:17am] johnvdenley: AWESOME http://localhost:8080/_ah/admin/datastore works beautifully thanks!
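
The "split it somehow" suggestion at 09:13 is usually done by sharding: spreading one hot value across several entities so that concurrent writers rarely collide. What follows is a minimal sketch in plain Python, not App Engine code. The ShardedCounter class and its in-memory list are hypothetical stand-ins for a set of datastore entities that would each live in their own entity group.

```python
import random


class ShardedCounter:
    """In-memory illustration of a counter split across several shards.

    Each list slot stands in for one datastore entity; nothing here
    talks to a real datastore.
    """

    def __init__(self, num_shards=20, rng=None):
        self.shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount=1):
        # Pick a shard at random so that concurrent writers are unlikely
        # to update the same entity at the same moment.
        index = self._rng.randrange(len(self.shards))
        self.shards[index] += amount

    def value(self):
        # Reading sums every shard. Reads are the cheap side of the
        # trade-off, since contention comes from concurrent writes.
        return sum(self.shards)


counter = ShardedCounter(num_shards=5, rng=random.Random(0))
for _ in range(1000):
    counter.increment()
print(counter.value())  # 1000
```

The trade-off is that a read has to touch every shard, which fits the point made in the chat that reads are cheap and cacheable while writes to a single entity group are the bottleneck.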
[09:18am] _tmatsuo: brett_ae: Ok. That's understandable, but what if any node which holds my particular entity group also has another entity-group of other application, which is massively accessed by other application? Is there anything I could do?
[09:18am] wcr: Good morning folks.
[09:18am] wcr: or afternoon
[09:18am] brett_ae: _tmatsuo: You're confusing bigtable separation of data (which is transparent to you, the developer) and entity group separation (which you, as a developer, are in full control of)
[09:18am] brett_ae: bigtable separation should not affect you
[09:19am] brett_ae: some of this is here: http://code.google.com/appengine/articles/scaling/contention.html
[09:19am] Sylvain_: brett_ae: ok, thank you. I didn't know word, excel,... was a big source of virus. I can understand .cmd .bat .vbs, .js,....
[09:19am] nickjohnson: Sylvain_: In the meantime, emailing users links to download the doc is probably a good plan. That or share it with them on Google Docs.
[09:19am] Jason_Google: wcr: Hi
[09:20am] moraes: max-oizo: CompiledQuery is used by cursors. res = query.fetch(10) / cursor = query.cursor() / next_res = query.with_cursor(cursor).fetch(10) -> last time i checked, was only working in dev, cursors are always None in production.
[09:20am] brett_ae: sylvain_: Yeah they're pretty notorious
[09:20am] Sylvain_: yes nickjohnson , but my users want to receive a mail with an attachement not a link
[09:20am] max-oizo: And another, i found that an images API will support a blob_key in the constructor. When can we expect a support of "Service for storing and serving large files"?
[09:21am] brett_ae: max: good digging you've done
[09:21am] brett_ae: nothing to announce today, but it's on the roadmap
[09:21am] wcr: What is currently the best method for paging results, since offset is not an option > 1000. Someone mentioned something about sorting by key, anyone have any more details?
[09:21am] practicalint: newbie. Updated eclipse with latest GWT and AE plugins. Can't get Taskengine demo to run. should it work out of the box with the latest?
[09:22am] nickjohnson: wcr: The basic technique is to store the value of the sort field (Which by default is the key) for the last entity you saw, then pick up where you left off.
[09:22am] max-oizo: 2brett_ae: only diff with winmerge
[09:22am] nickjohnson: There are libraries that will help with this
[09:22am] Aneon: it would be nice with some more documentation/articles about indexes, especially related to list properties, as these are so important in app engine. for example, i would like to know more about when exploding indexes actually becomes a performance/storage problem
[09:23am] wcr: nickjohnson: Do you have a blog post about this by any chance?
[09:23am] scudder_google: There are also several pagination techniquest discussed in this article http://code.google.com/appengine/articles/paging.html
[09:23am] johnvdenley: practicalint, there is a setting you have to add to the properties for the java VM, Ill see if I can find the article about this, unless someone else beats me to it!
[09:23am] nickjohnson: wcr: http://code.google.com/appengine/articles/paging.html
[09:23am] nickjohnson: wcr: also http://appengine-cookbook.appspot.com/recipe/efficient-paging-for-any-query-and-any-model/
[09:23am] nickjohnson: scudder_google: Snap!
[09:24am] wcr: lol
[09:24am] scudder_google: nickjohnson yes jinx!
[09:25am] scudder_google: Aneon: that's a great idea, in fact we've been thinking about publishing some more datastore related articles in the not too distant future
[09:26am] max-oizo: 2google_team: in issue 354 niall.kenned wrote recently: "bslatkin of GAE team confirmed last week he is working on this feature and will work similar to urlfetch." - is it true?
[09:26am] scudder_google: Aneon: the threshold of pain for exploding indexes depends in part on how much data you have.
[09:27am] wcr: soon(tm!)
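
The technique nickjohnson describes at 09:22 (remember the sort value of the last entity you saw, then resume just past it) can be sketched without any datastore at all. The example below is plain Python over a sorted list of keys. The fetch_page function and the key format are hypothetical, chosen only for illustration. On the datastore the same idea would presumably be expressed as a key-ordered query with a "greater than the bookmark" filter, as the paging article linked above discusses.

```python
import bisect


def fetch_page(sorted_keys, page_size, after_key=None):
    """Return (page, bookmark) for the keys strictly greater than after_key.

    sorted_keys stands in for an index ordered by key. The bookmark is
    the last key on the page, or None when the page is empty.
    """
    if after_key is None:
        start = 0
    else:
        # Jump straight past the bookmark instead of skipping N rows,
        # which is what lets this work beyond the 1,000-result offset limit.
        start = bisect.bisect_right(sorted_keys, after_key)
    page = sorted_keys[start:start + page_size]
    bookmark = page[-1] if page else None
    return page, bookmark


# 2,500 zero-padded keys so that string order matches numeric order.
keys = ["k%04d" % i for i in range(2500)]

seen = []
page_sizes = []
bookmark = None
while True:
    page, bookmark = fetch_page(keys, 1000, bookmark)
    if not page:
        break
    seen.extend(page)
    page_sizes.append(len(page))

print(page_sizes)  # [1000, 1000, 500]
print(len(seen))   # 2500
```

Because each request carries only the bookmark, the cost of fetching a page does not grow with how deep into the result set it is.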
[09:27am] max-oizo: * Issue 354: Feature: DNS Lookup
[09:27am] nickjohnson: max-oizo: What feature?
[09:27am] Aneon: scudder_google: that sounds great! just what i feel missing. because i feel that it's very hard in the beginning, before you have your app up and running and can perform tests, to actually make good judgement on model design, especially related to list properties.
[09:27am] max-oizo: 2nick: Feature: DNS Lookup
[09:28am] Aneon: scudder_google: how much data you have in the specific list property you mean? or in total?
[09:29am] scudder_google: Aneon: what I was trying to get at is that say you had a list property in a model and each entity has just a few values in it
[09:29am] nickjohnson: Aneon: The 'under the covers' datastore talk is an excellent one to watch if you want to learn about why exploding indexes happen
[09:29am] scudder_google: ah yes, good suggestion Nick
[09:29am] Aneon: thanks, i'll make a search for that
[09:30am] Jason_Google: Aneon: There's a reasonable explanation in the indexes documentation, but you basically want to avoid, as much as possible, having more than one indexed list property for a given kind, especially if you plan to store a lot of values in them. Due to the way that indexes have to be built using the various permutations of these lists, etc.
[09:30am] _tmatsuo: brett_ae: Thanks. Thats good to know developpers souldn't care about other applications datastore acccess.
[09:30am] scudder_google: Aneon: so with just a few values, indexing all 2 pair combinations might not be too bad, but if you start indexing three pairs, or you have lists with a large number of values, the number of index rows per entity increases exponentially
[09:30am] Jason_Google: Jinx x 2
[09:31am] scudder_google: we're on a roll today
[09:31am] wcr: Anyone happen to be using any libraries that handle paging? Looking for a recommendation
[09:31am] lent: <java> Is there any way to define and access a sequence directly in JPA with Appengine? When allocation of ids came in with issue 110, Max Ross commented that appengine supports pm.getSequence() in JDO which allows for accessing sequence directly. Any thing like this with JPA? If not, is the only way to directly use the low level api with allocated ids?
[09:32am] johnvdenley: practicalint, im not sure this is the right link, but It sounds like the same issue I had: http://groups.google.com/group/google-appengine-java/browse_frm/thread/3497eec1c4908bbf/14b2963f245a37f4?lnk=gst&q=1.7.1#14b2963f245a37f4
[09:32am] nickjohnson: wcr: My recommendation would be to wait, personally
[09:32am] moraes: wcr: simply use the one described here http://code.google.com/appengine/articles/paging.html while cursors don't come.
[09:32am] wcr: nickjohnson: wait for what?
[09:32am] max-oizo: 2google_team: So it's true or not? (about bslatkin and comment into the issue 354:Feature: DNS Lookup)
[09:32am] Aneon: Jason_Google: that's what i was suspecting. so a kind with two list properties with about 20-40 values each could be a problem? and is it only a problem when you actually filter on them on reads (if you disregard the storage costs)?
[09:32am] moraes: cursors!
[09:32am] wcr: oh!
[09:33am] wcr: I haven't read about cursors... someone link me!
[09:33am] tobyr: lent: Yes, currently if you're using JPA, you'll need to use the low level API to get at "sequences".
[09:33am] nickjohnson: Aneon: Only if they're both used in the same custom index
[09:33am] nickjohnson: Using the same list property more than once in a custom index has the same issues
[09:34am] nickjohnson: wcr: There's no public docs yet
[09:34am] moraes: wcr: there're no link, as they were not relesead. you'll find hints in google.appengine.ext.db code.
[09:35am] wcr: I'll look forward to seeing this completed sometime this weekend, thank you.
[09:35am] wcr: BUt yeah, that's great to hear
[09:35am] practicalint: johnvdenley doesn't look like my problem. Have RT arg of tasks. I can get it to run to the point of logging in from web loading the task module, then runtime exception cannot load module.
[09:36am] johnvdenley: practicalint, i would suggest cutting and pasting the error into the groups search and seeing what comes up
[09:39am] practicalint: johnvdenley I have - most of the hits are about files not containing all the proper values which came from the demo. Had to change one set properties to set configuration properties due to GWT changes.
[09:39am] _tmatsuo: google_team: Will appengine/java have remote_api soon?
[09:39am] nickjohnson: _tmatsuo: It's being worked on.
[09:40am] Aneon: nickjohnson: you mean only when i define them in index.yaml? ah, didn't know the same problem could happen with one property used two times in one index. i'll try to read up and experiment more with this, but it would be really cool with some more in-depth articles and best-practices/example texts than those available in the official docs. they're nice, but i feel they could be expanded on a lot as this is such an important area for
[09:40am] Aneon: optimization
[09:40am] practicalint: johnvdenley I suspect other GWT changes not compatible with demo code, maybe in conjunction with I have ie8 on the box maybe?
[09:40am] nickjohnson: Aneon: yes
[09:41am] johnvdenley: practicalin, ive only been on GWT/GAE for a couple of months, your problem seems to be beyond my capabilities!
[09:41am] Aneon: thanks for the response guys, have to go. keep up the good work, i like app engine a lot!
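
To put rough numbers on Aneon's 09:32 question about two list properties with 20-40 values each, the sketch below counts the rows a single entity would contribute to one custom index that names both lists. It is plain Python and rests on a simplifying assumption: one index row per combination of values across the indexed properties. The index_rows helper and the property names are hypothetical.

```python
from itertools import product


def index_rows(entity, indexed_properties):
    """List the value combinations one entity adds to a composite index.

    Assumes one row per combination of values across the indexed
    properties; a single-valued property counts as a list of length one.
    """
    value_lists = []
    for name in indexed_properties:
        value = entity[name]
        value_lists.append(value if isinstance(value, list) else [value])
    return list(product(*value_lists))


small = {"tags": ["a", "b", "c"], "groups": ["x", "y"], "owner": "bob"}
print(len(index_rows(small, ["tags", "groups", "owner"])))  # 6

large = {"tags": list(range(20)), "groups": list(range(40))}
print(len(index_rows(large, ["tags", "groups"])))  # 800
print(len(index_rows(large, ["tags"])))            # 20
```

Under this assumption the row count is the product of the list lengths, so it grows multiplicatively with each additional list property in the same index, while an index that names only one of the lists stays linear in that list's length.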
[09:43am] johnvdenley: Id just like to say a big thanks to all at google for providing GWT/GAE, Ive only been using it (and java) since September, and I did a demo of my new application today to my business partners, who were simply stunned at the functionality and speed that has been achieved in such a short development period
[09:44am] nickjohnson: johnvdenley: We're the wrong people to thank, but happy to accept your gratitude anyway
[09:44am] practicalint: johnvdenley OK thanks, I was picking on you cause you reponded. anyone know if re-running the demos would occur before the plugin release - trying to figure out is it me or is the demo broken
[09:46am] johnvdenley: nickjohnson, feel free to pass it on to the "right" people
[09:46am] practicalint: I learn/code best from example and was trying to use demo as GWT/GAE example to build on. suggestions for another place to get such an example with code ?
[09:48am] Jason_Google: practicalint: There are a few GWT/GAE sample apps -- gwtguestbook, stickynotes, etc. Have you looked at these?
[09:48am] Jason_Google: practicalint: The full set of demos is here: http://code.google.com/p/googleappengine/source/browse/#svn/trunk/java/demos (stickynotes is actually called sticky)
[09:49am] _tmatsuo: currently, google moderator have many users and series. Is google moderator billing enabled?
[09:49am] johnvdenley: practicalint, i think i have a fully working stockwatcher demo which takes some concepts from the stickynotes/guestbook somewhere which I can send over to you. Sounds like you learn the same way I do, as I say Im only 2 months ahead of you and I had real trouble getting the standard demos actually working fully!
[09:50am] scudder_google: practicalint: the taskengine demo also uses GWT
[09:50am] practicalint: Jason_Google yes - each different purposes (and I can run them). I want GWT with GAE as I am trying to learn/deploy with both
[09:50am] Jason_Google: _tmatsuo: I imagine that Moderator does require more than the free quotas, but it's also a Google application so it's not necessarily billed the same as third-party apps.
[09:51am] practicalint: scudder_google thats the one I can't get to run
[09:51am] Jason_Google: practicalint: After the chat, I'll try to look into why the other demo isn't working and see if it's an issue in the demo itself. In the meantime, the other apps that I mentioned and the updated StockWatcher app should be enough to get you started.
[09:52am] WdWeaver: I'm interest in improvement of spinning up time for appengine/java. How is that status?
[09:52am] _tmatsuo: Jason_Google: Thank you. Could it be possible for us to know how much it costs if google moderator is third-party app and billing enabled?
[09:53am] Jason_Google: WdWeaver: We have various improvements that we're working on, spreading them out over the next few SDK releases. Hopefully you'll begin to see the startup time improving, slowly but surely.
[09:53am] max-oizo: 2Jason: You promised to do the off-line documentation in other languages (ru/etc.). How soon it happens or better to parse it me with a special programm?
[09:54am] Jason_Google: _tmatsuo: I don't have this data point on-hand, but I'll try to find out sometime.
[09:55am] johnvdenley: oh, have we solved the problem with the hosted browser trying to load 127.0.0.1 instead of the application.html? the groups suggest its IE caching, but after having this problem hundreds of times, ive pretty much prooved that clearing the IE cache has no direct effect, it usually is just a matter of waiting a few minutes, but sometimes even waiting 20 minutes still doesnt clear the problem!
[09:55am] Jason_Google: max: I asked the person who built the script that automatically packages the documentation set shortly after our last conversation. They're working on enhancements to automatically get the intl docs in there also, but he didn't give me an ETA. So it will probably be a little while, but it will happen.
[09:55am] practicalint: johnvdenley: does your example use GWT?
[09:57am] Jason_Google: johnvdenley: Unfortunately, I'm not a GWT expert. Best to ask that in the GWT discussion forums.
[09:57am] practicalint: Jason_Google: thanks please let me know what you find. my gmail is the same nick here.
[09:57am] _tmatsuo: Jason_Google: Thanks. That data could encourage more developers diving into appengine
[09:58am] max-oizo: 2Jason: thank you very match
[09:59am] wcr: Are there any plans to allow adsense generated revenue to go into your billing enabled pool, or is that too much to coordinate ?
[09:59am] johnvdenley: practicalint yes stockwatcher uses GWT & GAE. Ive been using GWT/GAE as if they are one tool, as such i often get confused about which question to direct (case in point, my question above, which appears to be a GWT question rather than a GAE question!!)
[10:00am] johnvdenley: (Im being surprisingly chatty for someone who WAS planning on just watching this discussion!)
[10:01am] Jason_Google: wcr: Interesting, I just got that question in person a few days back. The short answer is, it's not on the short-term roadmap, so I wouldn't expect anything like this around the corner. But I do concede that it's an interesting integration angle.
[10:01am] nickjohnson: deferred.defer(developer_chat, _eta=datetime.datetime(2009, 11, 4, 19, 00, 00))
[10:03am] nickjohnson: In other words, this marks the end of this week's official developer chat. Some of us will be around for longer, and there are many enthusiasts in the channel, so feel free to ask questions any time.
[10:03am] max-oizo: goodbye!

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group.
To post to this group, send email to google-appengine-java@googlegroups.com
To unsubscribe from this group, send email to google-appengine-java+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en
-~----------~----~----~----~------~----~------~--~---