I hope everyone is having/has a great Tuesday. I also encourage you to join in with me and spend Tuesday - just tuesday - really looking at performance issues.
But enough of the sermon; did Robert do anything interesting today? Why yes, yes he did. The restricted librarian change to be visible on the internet is 'code complete' - that is, there are still deployment changes to make, HTTPS certificates to get hold of and so on. The change is guarded on the appserver side by a feature flag: once all the bits are ready and any rollout fallout has completed, we can switch it on, live, on the appservers. We also still have to QA it (which needs the new certs)... we may find, for instance, that there are tweaks to do to support it in APIs (API clients will need to follow redirects, for instance). This will: - remove all the 'high duration' times which are really internet clients downloading soyuz logs and bug attachments from the appserver - as a result make the appservers less prone to fall over in a screaming heap when the librarian glitches. I also landed a very small tweak to requesttimeline to not use a 999999ms sentinel, so OOPS reports after todays edge rollout will be back to normal (but with the extra bits still added). Following on from that I put up a storm patch to let us instrument how long it takes to retrieve data from the raw resultset in storm : this may be unneeded, James Henstridge is very sure (has checked code sure) that its all in-memory and essentially free. Then, and this is a bit that may interest Danilo, I got stuck back into Distribution:+assignments - bug 618367; this page shows 3000 specification assignments, and takes quite a while to do so. However the query to retrieve them is only 800ms (with eager loading of people) or 500ms (without, but the separate person query then takes 1.5 seconds... go figure). So where goeth the time? All the gory details are in the bug, but to save folk chasing pointers I'll summarise here. Gary's ++profile++ decorator is very sexy. Thanks! I asked spm to enable that on staging for a limited time, and grabbed a profile of the page. The profile showed 16 seconds (of 25 up to the kill-point) going into storm object deserialisation. 30000 objects. (Thats 3000 rows, 10 objects per row). So, I filed bugs on storm - 632140 for NULL object checks (small saving, but probably worth doing), and bug 632145 about the overall performance. I then went to see if doing two queries would be better. Again, some help from spm to apply the patch and enable profiling and we go the answer.. its a wash : it still takes 12 seconds (with profiler, 10 without) to process the *3000* rows of Specification objects *alone*. So in the 30K object trace, 4 seconds for 27K objects, 12 seconds for the other 3K. I suspect there are some constructs in the class definition that perform poorly in storm / trigger some sort of pathology; more digging is needed. For instance, there are 100K checkpont calls, on 3000 objects - 30 checkpoints *per object* on average. That is 2% of the total runtime of the page, so not a big candidate to spend time on - but its still worrying that this housekeeping, which is *entirely uninteresting in read only transactions* is being paid for. I speculate that ripping that code out of the main path in storm and putting it into a separate facility used for mutation operations would make a phenominal difference. Now the bit that *may* interest Danilo: if this is right, when we have high nonsql time, and lots of rows coming back, we should add storm into the list of suspects (and get a profile, always get a profile :)). -Rob _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : [email protected] Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp

