Re: Clearing cache between tests (#11505)
On Oct 30, 2011, at 4:19 PM, Jeremy Dunck wrote:

Hi Jeremy,

> I locally have a test-runner class that calls .clear() on setUp.
> Works for me. But we run CI, so test against prod. I needed to point
> to different CACHES under test to avoid flushing prod cache 20 times
> per day.

Right. You've got a great real-world use case where clearing the cache is valuable, yet it's important for you to ensure it's not getting cleared frequently.

> Another approach that can work is to have the test runner generate a
> unique key prefix for each test run. Then the keys will all be misses
> on later runs. If invading the available key length isn't acceptable,
> then the test runner could also have a custom KEY_FUNCTION to hash the
> resulting key. A downside is that this can create significant
> pressure on the cache. I think that is less bad than clearing cache,
> though, since that is effectively infinite cache pressure.

I think this is indeed a workable alternative, and I recall it from the previous discussion. I had the same concern about the downside, namely the pressure it puts on the cache. For someone like you who's running integration tests 20 times a day against production, I do think it would add up. The end result might be cache data falling out of production, though it would certainly be preferable to clearing out the cache.

BTW, also in the last conversation, Jacob proposed something similar to what you do now, i.e. a "test" cache (or caches) that the user could specify, falling back to locmem if one is not defined.

So I'm curious what your opinion is about the alternatives we've bandied about here? If you had your personal wish, which implementation would best suit you?

-- You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com. To unsubscribe from this group, send email to django-developers+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.
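The unique-prefix idea Jeremy describes above could be sketched roughly like this. This is a hedged illustration, not his actual code: `RUN_PREFIX` and `make_key` are hypothetical names, though the signature matches what Django expects of a `KEY_FUNCTION`, and the 250-character ceiling is memcached's key-length limit.

```python
import hashlib
import uuid

# A fresh prefix is generated for every test run, so keys written by
# earlier runs are simply cache misses this time around.
RUN_PREFIX = uuid.uuid4().hex[:8]

def make_key(key, key_prefix=RUN_PREFIX, version=1):
    """KEY_FUNCTION-style helper: prefix the key, hashing if too long."""
    full_key = "%s:%s:%s" % (key_prefix, version, key)
    if len(full_key) > 250:  # memcached's key-length ceiling
        return hashlib.md5(full_key.encode()).hexdigest()
    return full_key
```

The hashing branch addresses Jeremy's point about "invading the available key length": the prefix still makes each run's keys unique, but the final key stays within the backend's limit.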
Re: Clearing cache between tests (#11505)
On Oct 29, 2011, at 11:14 AM, Aymeric Augustin wrote:

Thanks, Aymeric.

>> As an alternative, I propose we:
>>
>> 1. Add a new optional setting under CACHES, CLEAR_BETWEEN_TESTS. The default
>> for this setting is False. To be clear, the value is set independently for
>> each cache defined in CACHES.
>>
>> 2. Between each test, cycle through each cache in CACHES. If
>> CLEAR_BETWEEN_TESTS is True, then clear() that cache. Otherwise, move on.
>
> Here's what cache.clear() does on each cache backend:
> - db: it runs a SQL query "DELETE FROM "
> - dummy: it's a no-op
> - filebased: it calls shutil.rmtree on the cache directory
> - locmem: it clears two Python dicts
> - memcached: it calls the "flush_all" API method
>
> It's dirt cheap with locmem, which is probably the most popular cache backend
> for testing. It's a little bit expensive with db, but if you're using the
> database cache backend, you're clearly not a performance junkie.
>
> The fact that Django doesn't reset cache state between tests is a bug, which
> means we don't need to be 100% backwards compatible. I'd be comfortable with
> calling clear() unconditionally on each cache, after each test. I don't feel
> strongly against the CLEAR_BETWEEN_TESTS flag, but if you add it, I suggest
> making it True by default.
>
> To sum up: +1 on .clear(), -0 on CLEAR_BETWEEN_TESTS.

I *believe* that when this conversation has been had in the past, the simple solution of cache.clear() has been sort of a nonstarter. Certainly it's by far the simplest solution, and it's tolerable for most of the caches. The main issue is that, with the memcached backend, clear() flushes the entire cache: not just the keys set by the tests, not just the keys you might have set with this application, but literally the whole of the cache. So the challenge has been to find a way to clear the cache between tests without potentially destructive side effects.
If I personally were given the choice between cache.clear() and nothing at all, I'd take cache.clear() and be careful with it. But I'm not sure the rest of the community is willing to accept such a solution. Hence the CLEAR_BETWEEN_TESTS flag as a compromise.

Cheers,
Jim
Clearing cache between tests (#11505)
I noticed that Aymeric Augustin committed a patch last week [17042] that fixed some issues in the cache backends. Apparently the fact that the cache carries over from test to test remains a problem.

I've progressed pretty far with an aggressive "fix" for this problem with the patch in #11505. The current patch modifies all the keys set during a test run (so they don't overwrite any existing values), tracks the values set during each test, and deletes those and only those that were set during that test. In essence, the cache is "restored" to its original state after each test. This is achieved by monkeypatching each cache backend in use and adding some "bookkeeping" code, so to speak, to track the keys set. The patch includes thorough tests, and as of last week it includes a fix for the cache middleware problem as well (where views cached by middleware get carried over from test to test, so the view function does not run). The full Django test suite passes with it.

That all said, for the most part reception has been lukewarm for this patch. I think that's pretty understandable, since there's a fair amount of fiddly stuff taking place that makes people uncomfortable. Anyhow, this morning I thought of a vastly simpler alternative proposal, which would fix the cache between tests and ensure we did not step on anyone's toes by unknowingly clearing a cache that the developer did not want cleared.

As an alternative, I propose we:

1. Add a new optional setting under CACHES, CLEAR_BETWEEN_TESTS. The default for this setting is False. To be clear, the value is set independently for each cache defined in CACHES.

2. Between each test, cycle through each cache in CACHES. If CLEAR_BETWEEN_TESTS is True, then clear() that cache. Otherwise, move on.

Since test problems caused by cache values set in the tests are fairly infrequent, the default use case is just to leave this setting alone. However, if your test(s) need it, the option is there. It's also then trivial to e.g.
set it to True in your local environment but False in production, if e.g. you wanted to run your tests there without clearing your entire production cache.

I'd love to break the logjam on #11505 and get it taken care of. Are any core devs willing to buy in to this alternative proposal? If so, I can create a patch for it with relative ease. Part of me still likes my original patch, and I think there are good reasons to take the "thorough" approach. But this works as well, and I could really go either way. Let me know, and thanks for listening.

Here are some references for previous conversations that have taken place about this:

- https://groups.google.com/d/topic/django-developers/zlaPsP13dUY/discussion
- https://code.djangoproject.com/ticket/11505
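The two steps of the proposal could be sketched like this. A hedged illustration: CLEAR_BETWEEN_TESTS is the name proposed in this thread, not an actual Django setting, and `clear_caches_between_tests` is a hypothetical helper the test runner would call after each test.

```python
# Step 1: an opt-in flag set independently for each cache in CACHES.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "CLEAR_BETWEEN_TESTS": True,   # safe: locmem is per-process
    },
    "shared": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
        "CLEAR_BETWEEN_TESTS": False,  # never flush a real shared cache
    },
}

def clear_caches_between_tests(caches_settings, get_cache):
    """Step 2: after each test, clear() only the caches that opted in."""
    for alias, conf in caches_settings.items():
        if conf.get("CLEAR_BETWEEN_TESTS", False):
            get_cache(alias).clear()
```

The default (False) preserves today's behavior, so nobody's production memcached gets flushed unless they ask for it.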
Re: The state of per-site/per-view middleware caching in Django
On Oct 21, 2011, at 8:04 AM, Kääriäinen Anssi wrote:

> I do not know nearly enough about caching to participate fully in this
> discussion. But it strikes me that the attempt to have CSRF protected
> anonymous page cached is not that smart. If you have an anonymous submittable
> form, why bother with CSRF protection? I mean, what is it protecting against?
> Making complex arrangements in the caching layer for this use case seems like
> wasted effort. Or am I missing something obvious?

The first issue is that CSRF can matter for anonymous users. From http://www.squarefree.com/securitytips/web-developers.html#CSRF:

"Attacks can also be based on the victim's IP address rather than cookies:
• Post an anonymous comment that is shown as coming from the victim's IP address. ...
• Perform a distributed password-guessing attack without a botnet. (This assumes they have a way to tell whether the login succeeded, perhaps by submitting a second form that isn't protected against CSRF.)"

So two very common use cases for anonymous forms are login forms and anonymous comment forms, both of which are potentially vulnerable. I guess I feel like it's quite common to have forms on a page these days, even for anonymous users.

Second is -- and I don't know about this -- but I don't know how well CSRF handles authentication conditionally. Say I have a page with forms in it for logged-in users but nothing for anonymous users: can I conditionally exempt the formless page from CSRF? I have no idea, but by default I presume it's on and I presume the cache is varying on it.

So, yes, you could probably optimize a lot of this to skip around the CSRF issue, and it's not a deal breaker. But my main argument has been that the ubiquity of CSRF + user authentication in Django projects means a solution to both of these is a requirement for page caching to become easy and applicable in most scenarios.
> The following is from the stupid ideas department: Maybe there could be a
> "reverse cache" template tag, such that you would mark the places where you
> want changing content as non-cacheable. You would need two views for this,
> one which would construct the "base content" and then another which would
> construct the dynamic parts. Something like:

Your idea sounds a lot like the "server side include" or "two phased template rendering" approach that I know some people are doing. Here's an excellent example of this approach being used at EveryBlock (from two years ago):

http://www.holovaty.com/writing/django-two-phased-rendering/

And it looks like some core devs have been involved at some point in this implementation of that concept:

https://github.com/codysoyland/django-phased

That looks almost exactly like your idea: "django-phased contains a templatetag, phased, which defines blocks that are to be parsed during the second phase. A middleware class, PhasedRenderMiddleware, processes the response to render the parts that were skipped during the first rendering."

I guess that was sort of what I was hinting at in my previous discussion about how to handle CSRF. In the first link he takes it to the next level (even logged-in users get the cached page, and the per-user parts are rendered afterwards). Anyhow, it's obviously a sensible conceptual approach. It would obviously be a stretch to fit it into Django's existing page caching approach.
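As a toy, framework-free illustration of the two-phased idea (this is not django-phased's actual API, and the marker syntax is invented for the sketch): phase one renders and caches the shared page with per-user regions left as literal markers, and a cheap second pass fills them in on every request, after the cache fetch.

```python
import re

# Per-user regions survive the first (cached) render as literal markers.
PHASED = re.compile(r"\{% phased %\}(.*?)\{% endphased %\}", re.S)

def render_second_phase(cached_html, context):
    """Fill <<name>>-style placeholders inside each phased block."""
    def fill(match):
        body = match.group(1)
        for key, value in context.items():
            body = body.replace("<<%s>>" % key, value)
        return body
    return PHASED.sub(fill, cached_html)
```

The point of the design is that the expensive render happens once and is shared, while the per-user work is reduced to a string substitution on the cached result.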
Re: The state of per-site/per-view middleware caching in Django
On Oct 20, 2011, at 6:02 PM, Carl Meyer wrote:

> Hi Jim,
>
> This is a really useful summary of the current state of things, thanks
> for putting it together.
>
> Re the anonymous/authenticated issue, CSRF token, and Google Analytics
> cookies, it all boils down to the same root issue. And Niran is right,
> what we currently do re setting Vary: Cookie is what we have to do in
> order to be correct with respect to HTTP and upstream caches. For
> instance, we can't just remove Vary: Cookie from unauthenticated
> responses, because then upstream caches could serve that unauthenticated
> response to anyone, even if they are actually authenticated.
>
> Currently the Django page caching middleware behaves pretty much just
> like an upstream cache in terms of the Vary header. Apart from the
> CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting, it just looks at the response,
> it doesn't make use of any additional "inside information" about what
> your Django site did to generate that response in order to decide what
> to cache and how to cache it.
>
> This approach is pretty attractive, because it's conceptually simple,
> consistent with upstream HTTP caching, and conservative (quite unlikely
> to serve the wrong cached content).
>
> It might be possible to make it "smarter" in certain cases, and allow it
> to cache more aggressively than an upstream cache can. #9249 is one
> proposal to do this for cookies that aren't used on the server, either
> via explicit setting or (in a recently-added proposal) via tracking
> which cookie values are accessed. If we did that, plus special-cased the
> session cookie if the user is unauthenticated and the session isn't used
> outside of contrib.auth, I think that could possibly solve the
> unauthenticated-users and GA issues.
>
> However, this (especially the latter) would come with the cost of making
> the cache middleware implementation more fragile and coupled to other
> parts of the framework.
> And it still doesn't help with CSRF, which is a
> much tougher nut to crack, because every response for pages using CSRF
> comes with a Set-Cookie header and probably with a CSRF token embedded in
> the response content; and those both mean that response really can't be
> re-used for anyone else. (Getting rid of the token embedded in the HTML
> means forms couldn't ever POST without JS help, which is not an option
> as the documented default approach). You can mark some form-using views
> that are available to anonymous users as csrf-exempt, which exposes you
> potentially to CSRF-based spam, but isn't a security issue if you aren't
> treating authenticated submissions any differently from
> non-authenticated ones.
>
> Generally, I come down on the side of skepticism that introducing these
> special cases into the cache middleware really buys enough to be worth
> the added complexity (though I could be convinced that #9249 is worth it).

Thanks Carl. This is definitely a good, clarifying response to what I was mulling over. A few thoughts of my own to add here:

* You and Niran are certainly right about upstream caches. Regardless of what we do here, we'll have to vary by cookie in the response header. This makes sense for a site that offers authentication: Django needs to check on every page view whether the user is authenticated, so we can't have an upstream cache holding on to a page for us.

* Agreed that the "smartness" comes at the cost of brittleness if the implementations are too tightly coupled. That said, I can squint and sort of see an implementation that could thread the needle here. It would require something like:

- An API in the cache middleware instructing it to ignore certain cookies for the purposes of caching (i.e. something along the lines of #9249).

- Some kind of "pre-fetch" hook in the cache middleware.
Whether it's a flag in the request object, a signal, or something else, give other systems the ability to look at a request before it hits FetchFromCacheMiddleware and either allow or prevent the response from being pulled from the cache. E.g. if there were a flag request.invalidate_cache defaulting to False, the contrib.auth app could, in combination with the above, pull the session id from consideration in the cache key and do an authentication check on its own, invalidating the cache itself if the user is authenticated. The core idea is what you already suggested; I'm illustrating here that it can conceivably be implemented as an API, making it less brittle.

- Some kind of "post-fetch" hook in the cache middleware, combined with a retooling of the CSRF middleware. This is getting into the clouds a bit, but a hook on the opposite end of the fetch operation could allow the CSRF app to add its token after the response was pulled from the cache. I say we're in the clouds here because for something like this to work the CSRF would have to do a little
Re: The state of per-site/per-view middleware caching in Django
On Oct 20, 2011, at 10:26 AM, Niran Babalola wrote:

> This problem is inherent to page caching. Workarounds to avoid varying
> by cookie for anonymous users are conceptually incorrect. If a single
> URL can give different responses depending on who's viewing it, then
> it varies by cookie. Preventing CSRF is inherently session-variable as
> well. Loading the token via a separate AJAX call is possible, but
> there are simpler solutions.

You may in fact be correct, but I'm not convinced by what you're saying here (not that there is any onus on you to convince me of anything, of course). I'm suggesting that all anonymous users *could* receive an identical page from the server, theoretically, since the same URL does *not* need to return a different response depending on which (anonymous) user is viewing it. CSRF is obviously a trickier problem, and it's not really worth solving the anonymous user problem if CSRF isn't solved as well. But if both problems were somehow solvable, then we're in a position where the per-site cache would be viable for many common scenarios, such as the one I described in my original post. If these two problems are in fact unsolvable, or not worth solving because simpler alternatives exist, that's fine and understandable. Perhaps per-site/per-view caching are indeed exceptionally limited tools that are beneficial in a very limited number of use cases, and perhaps the "solution" here is tidying up the outstanding bugs and clarifying the documentation as needed to make the limitations more explicit.

> If you want to cache pages with small portions that vary by user, then
> you want edge side includes and something like Varnish to process
> them. If you want a much slower, pure-python solution that doesn't
> require a separate service running somewhere, then you want
> armstrong.esi[1].

Thanks. This post wasn't really about what *I* need, btw; I can definitely sort out my caching strategies in other areas as I need to.
The post only relates to "me" because I sat down yesterday and said, "Gee, I wonder if I could make use of Django's per-site caching feature for this project I'm working on." I turned it on to test it out and then spent the next 6 hours delving into the source code, IRC, the ticket tracker, Google, etc. to figure out why it wasn't working at all and why @cache_page was. After finally sorting it out and grokking all of the moving parts, I realized that there was extraordinarily limited value in a per-site/per-view caching strategy that caches per unique visitor, which is pretty much unavoidable for most common usage patterns. So yeah, maybe it's me and I'm looking at things the wrong way, but needless to say it wasn't a particularly pleasant or worthwhile experience. Not looking for pity, btw; just wondering what I/we can or should do to make it better.

Jim
The state of per-site/per-view middleware caching in Django
I spent the better part of yesterday mucking around in the dregs of Django's cache middleware and related modules, and in doing so I've come to the conclusion that, due to an accumulation of hindrances and minor bugs, the per-site and per-view caching mechanisms are effectively broken for many fairly typical usage patterns.

Let me demonstrate by fictional example, with what I would consider to be a pretty typical configuration and use case for the per-site cache:

Let's pretend I'm developing a blog powered by Django. I'm using memcached, and I would like to cache pages on that blog for anonymous users, who are going to make up the vast majority of my site's visitors. Ideally, I will serve the exact same cached version of a blog post to every single anonymous visitor to my site, which will help keep server load under control, particularly when I get slashdotted/reddited/what-have-you.

Like any blog, a typical page view features the content primarily (e.g. a blog post). It also has some "auth" stuff at the top right, which will say "Log in / Register" for anonymous users but show a username and welcome message for logged-in users. Each blog post also has an empty comment form at the bottom where users can leave comments on the post. Like 99% of the websites out there, I will be using Google Analytics to track my visitors. Pretty straightforward, right?

Let me count the ways that Django's cache middleware will muck up my goals in the above scenario. First, I'm going to try to use the per-site cache. Here's what's going to go wrong for me:

* It's going to be virtually impossible for me to avoid my cache varying by cookie, and thus by visitor. Because my templates check whether the current user is logged in, I'm touching the session, which is going to set the Vary: Cookie header.
That means if there is any difference in the cookies users are requesting my pages with, I'm going to be sending each user a separate cached page, keyed off of SESSION_COOKIE_NAME, which is unique for every visitor.

* Even if I avoid touching the request user somehow, the CSRF middleware presents the same issue. Because I have a comment form on every page, I have a unique CSRF token for each visitor. Thankfully Django doesn't let me completely shoot myself in the foot by caching the page with one user's token and serving it to everybody else; it helpfully sets a CSRF token cookie and varies on it to prevent this. However, that cookie is different for every unique user, which triggers the same problem as above: I again cannot avoid caching a unique page for each unique visitor.

* Unfortunately, my troubles are not over, even if I resign myself to having a cache that varies per visitor. You see, Google Analytics actually sets a handful of other cookies with each page request. And guess what? The values for those cookies are unique *for each request*. This means I'm actually not caching at all. Cookies are unique for each and every page request thanks to Google Analytics. My per-site cache configuration is totally and completely inoperable, all because I'm using a tracking service that pretty much *everybody* uses.

Since that didn't work, I wonder if it'll work if I do per-view caching? It shouldn't work at all, should it, since none of the factors I outlined above are different whether I'm using the @cache_page decorator or the per-site cache. Well, the sad news is caching does "work" when I use cache_page, and that's not a good thing:

* @cache_page caches the direct output of the view/render function. It skips over the middleware that might have very good reason to introduce Vary headers, and doesn't introduce any Vary headers of its own.
So now, with this applied, I *am* serving a cached version of this page even though I absolutely should not be. Some poor user's token is now being sent to everybody. My only chance of redemption is if I happen to have read the docs and discovered that this incantation is required to prevent cache_page from improperly caching the page:

    @cache_page(60 * 15)
    @csrf_protect
    def my_view(request):
        # ...

Of course, the above just puts me right back where I started at the per-site level. There was never any chance of making cache_page work any differently from the per-site cache, but it certainly proved to be a temptation for a hurried developer, frustrated about why the per-site cache wasn't working and "thankful" that the cache could be made to start "working" with the cache_page decorator.

Hopefully the above example makes it clear how all of the seemingly minor bugs and imperfections add up to a broken situation for someone coming to this with a pretty standard set of expectations and requirements. Anyhow, the good news is that a good portion of what I have written about already has open tickets which in some
Re: Deferred constraint checking causing problems during test runs
On Jul 14, 2011, at 1:31 AM, Simon Riggs wrote:

> On Sun, Jul 10, 2011 at 3:27 PM, Jim Dalton <jim.dal...@gmail.com> wrote:
>> On Jul 10, 2011, at 3:13 AM, Simon Riggs wrote:
>>
>>> Maintaining the property of deferrable constraints seems important
>>> here, so changing the deferrability of constraints, or overriding it
>>> using the SET CONSTRAINTS command at the top of the transaction might
>>> not be what we want.
>>
>> Well, that's just it. We want tests to behave as much like production code
>> as possible, so we actually *don't* want constraint checks to be deferred.
>>
>> When you're working with the DB normally in production, i.e. outside of a
>> transaction, constraints are checked immediately. It's only when you get
>> into a transaction that they are deferred. Each of our tests runs inside of
>> a transaction, which is an artificial construct. So to emulate production,
>> I'm arguing that this property is not something we want (*unless* we are
>> testing a transaction or transaction-like behavior, in which case it does
>> make sense to temporarily suspend constraint checks).
>
> My role here is to help with Postgres details, so any comments I make
> are just attempts at providing useful information.

Thank you, btw, for offering your input in that capacity. I found it quite reassuring when I read your first reply that someone with your credentials was giving feedback on this issue. :)

> It sounds like there is a slight confusion about the way PostgreSQL
> works. Everything is a transaction, so there is no difference between
> inside/outside a transaction.
> You can choose whether you wish immediate or deferred constraint
> checking. It's completely user selectable, though the default is
> immediate. If you're seeing deferred checks, they can be changed.
>
> ISTM that the tests should work the same way as things do in
> production, and it looks possible to make that happen.
Okay, this is interesting -- and thank you for clarifying this point, which I had grossly oversimplified. I think you shed a huge amount of light on part of the problem here (at least for me).

So in normal, everyday use, Django opens a connection to PostgreSQL with an open transaction. Per the docs, it commits this transaction immediately whenever you do an INSERT or UPDATE, etc. At that point PostgreSQL would run its constraint checks and roll back the transaction on error, or else commit. During a test, we open up a big transaction during setUp and then *disable* the transaction commit and rollback operations, i.e. a call to either does nothing. Since we can't nest transactions in PostgreSQL (right?), this has been the only sensible way to allow tests to proceed.

The problem has been -- I am positing -- that we're getting away with a lot of stuff as a result, because constraint checks are *never* run during testing. The small slew of bugs I found is a demonstration of that, to me. My solution up to now, however, is also "unlifelike", which your comment has helped me realize. It works somewhat like the way Django works normally, because in some ways Django sort of emulates immediate checks in its default behavior. But I think my present solution is a kludge in the other direction, and I agree it's not close enough to reality.

I'm thinking this might not be too hard to solve, though. Rather than *only* enabling immediate constraint checks as I was suggesting before, we could instead focus on the transaction-related methods that we are presently just blanking out (by this I mean that when tests are set up, transaction.commit(), transaction.rollback(), etc. are monkeypatched with a function that does nothing). Even though we can't nest a transaction explicitly, we can better emulate the behavior of normal Django operation.
We can keep things in deferred mode as they are in production, but then do a temporary flip to IMMEDIATE at commit(), which will fire the constraint checks. Presumably, we could even do a SAVEPOINT, as you suggested, at the start, so that the test could recover and continue if e.g. an IntegrityError needed to be caught during testing. I imagine that this will still flush out most if not all of the little errors I had discovered (and prevent us from writing test cases with similar problems going forward), but at the same time will very closely emulate production. The challenge will be wrangling all this together with the other DBs (i.e. MySQL and SQLite), which don't share this execution model. Should be workable, though.

Anyhow, thank you once again, Simon, for shedding light on this for me.

Cheers,
Jim
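A rough sketch of what the patched commit() might do under this scheme. Hedged heavily: `fake_commit` and the savepoint name are hypothetical, and real code would go through Django's transaction machinery and the psycopg2 cursor rather than raw SQL strings like this.

```python
# Keep constraints DEFERRED inside the test transaction, but have the
# patched commit() flip them to IMMEDIATE so pending checks fire at the
# exact point production would raise them, then flip back so the test
# can continue.
def fake_commit(cursor):
    cursor.execute("SAVEPOINT before_checks")
    try:
        # Fires any deferred FK/unique checks accumulated so far.
        cursor.execute("SET CONSTRAINTS ALL IMMEDIATE")
    except Exception:
        # An IntegrityError here points at the offending test line;
        # restore the savepoint so the test can catch it and go on.
        cursor.execute("ROLLBACK TO SAVEPOINT before_checks")
        raise
    finally:
        cursor.execute("SET CONSTRAINTS ALL DEFERRED")
```

The savepoint is what makes recovery possible: without it, a fired check would abort the whole test transaction, which is the concern raised earlier in the thread.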
Re: Deferred constraint checking causing problems during test runs
On Jul 10, 2011, at 3:13 AM, Simon Riggs wrote:

> Maintaining the property of deferrable constraints seems important
> here, so changing the deferrability of constraints, or overriding it
> using the SET CONSTRAINTS command at the top of the transaction might
> not be what we want.

Well, that's just it. We want tests to behave as much like production code as possible, so we actually *don't* want constraint checks to be deferred. When you're working with the DB normally in production, i.e. outside of a transaction, constraints are checked immediately. It's only when you get into a transaction that they are deferred. Each of our tests runs inside of a transaction, which is an artificial construct. So to emulate production, I'm arguing that this property is not something we want (*unless* we are testing a transaction or transaction-like behavior, in which case it does make sense to temporarily suspend constraint checks).

> What I would recommend is that we issue an explicit SET CONSTRAINTS
> ALL IMMEDIATE command immediately before the ROLLBACK at *end* of
> test. This will fire any outstanding checks. That way all constraint
> checks will occur in the same place they would during a commit, yet we
> can maintain the situation that the test ends with a rollback.

This would conceivably work, I think. I'm pretty sure Ramiro was exploring this approach, actually. My feeling, however, is that this still allows you to get away with stuff you might not otherwise get away with in production. I also think it's more helpful to see exactly where an integrity issue came up so you can address it. This is, for example, what allowed me to understand the handful of bugs that were hiding, i.e. because I could trace the IntegrityError to the exact line of code that was triggering it.

Your questions raise one issue that had not occurred to me, though. One possible "problem" with putting constraint checks at the beginning is that there is no way for a test to recover from them.
If you try to put bad data into the DB with immediate constraint checks on, you will raise an error *and*, if I'm not mistaken, the transaction will be rolled back at that very instant. So if for some reason you knew you were putting bad data in and wanted to recover from it in your test and keep going, that would not be possible. I'm not sure that's actually a problem, but it's certainly something to consider. It's another dimension of behavior that is changed.
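Simon's end-of-test suggestion could be sketched like this. Hedged: `teardown_test_transaction` is a hypothetical helper, not Django's actual teardown code; the real version would operate on the connection's cursor.

```python
# Fire all deferred constraint checks at the end of the test, right
# before the rollback that discards the test's data. Any violation
# raises here -- though, as discussed above, without pointing at the
# line of test code that caused it.
def teardown_test_transaction(cursor, rollback):
    try:
        cursor.execute("SET CONSTRAINTS ALL IMMEDIATE")
    finally:
        rollback()  # the test's work is discarded either way
```

Note the finally: the test still ends with a rollback whether or not a constraint fires, which is the property Simon wanted to preserve.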
Re: Resetting cache gracefully at the end of every test?
On Jul 4, 2011, at 8:01 AM, Jacob Kaplan-Moss wrote:

> On Mon, Jul 4, 2011 at 8:54 AM, Jim Dalton <jim.dal...@gmail.com> wrote:
>> I've created a ticket for this and uploaded a patch:
>> https://code.djangoproject.com/ticket/16401
>> Please feel free to review and let me know if this is a good idea in the
>> first place, as well as if the patch makes sense and is the right approach.
>
> Hm.
>
> While I think the feature's a good idea (I've been stung by tests
> running against a live cache) I can't say I'm particularly thrilled by
> the approach here. Monkeypatching is acceptable in situations like
> these, but there's a limit... and I think patching five methods is
> starting to reach it. Remember: as much as possible, tests should
> operate just like production. Any difference, no matter how small,
> introduces the worst kind of bugs: ones that only show up in
> production.

Yeah, I agree with you in principle there. I'm definitely an advocate of avoiding monkey patching like the plague in production code. While I also agree that this code looks a bit hairy when you sort of squint your eyes at it, if you actually look closely at what's going on, we're not modifying things in any way that's tricky or complicated. Really, the only two things we are doing are adding a bit of code that tracks keys and then swapping a few methods out. Pretty similar to what we do already with Template.render -- just x5. The monkey patching stuff is not modifying any arguments or results, nor is it changing the observable behavior of the modified methods. The tests demonstrate this pretty thoroughly. The only place I've managed to think of where this is different from production is in the length of the key. Because we are prepending "_test_" to the start of KEY_PREFIX, you're going to get six fewer characters in your key than you would otherwise.
It *is* a difference, so I don't want to downplay it, but at the same time it's a difference where you'll get an error in testing that you wouldn't get in production -- which is better than the opposite. The only other place where someone might run into trouble is if they themselves were monkey patching the cache and their changes collided somehow. I understand that this is the kind of thing that happens in Ruby a lot, and it strikes me as unpleasant. Anyhow, that's not to say that there isn't a better solution or that your proposal below isn't preferable (more on that in a sec), but I'd argue that my patch is at least a plausible solution, i.e. it works in lieu of a better alternative.

> I have an alternate proposal:
>
> Instead of using the default cache defined in CACHES, use a "test
> cache." This can safely be flushed, meaning that no monkeypatching is
> needed. The added benefit here is that this works similarly to the
> database engine, so it'll be fairly easy to understand.
>
> Now, unlike DB test we can't just prefix "test_" to the database name
> (because we haven't got one), nor can we prefix keys (because not all
> cache backends can do something like "delete test:*"). So I'd suggest
> we look for a "test" backend defined in CACHES and fall back to a
> local memory cache if not defined.
>
> I think this simplifies things immensely, and also gives a fair bit of
> control -- if I want to test against a real cache backend, I just
> define CACHES['test'] and roll.

So, in many ways I like this idea. I agree that the implementation is going to be more straightforward. Here's just a quick list of concerns or possible drawbacks I was able to think of:

* Not sure how to handle multiple cache backends. I imagine if you're someone who has implemented multiple cache backends, then you are using them in specific ways for specific reasons. If your tests end up sharing a single test backend, that could result in unpredictable behavior.
One possible resolution would be to amend your proposal a bit: for each cache backend defined, we swap it out with the locmem backend unless you have yourself defined a test_CACHE_NAME version of that cache. So e.g. if I have three backends named "default", "file" and "db", I then need to define test_default, test_file and test_db if I want to specify how the test framework is going to set up the test cache. Otherwise, it'll fall back to replacing it with a locmem cache.

* At least when it comes to memcached, this doesn't help you much with the cache.clear() issue, since cache.clear() will blank your entire memcached cache. So if we're calling cache.clear() at the end of each test and it's blanking the whole cache, I start to wonder why we bother at all? In other words, why not just call cache.clear() at the end of every test and call it a day, and make a note in the documentation that running the test will clear a
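[Editor's sketch] The amended proposal above can be sketched in plain Python. The `setup_test_caches` helper and the config dicts are illustrative, not Django's actual test machinery; only the backend path strings are real Django names:

```python
# For each configured cache alias, prefer a user-supplied "test_<alias>"
# config if one exists; otherwise fall back to a local-memory backend.
LOCMEM = {"BACKEND": "django.core.cache.backends.locmem.LocMemCache"}

def setup_test_caches(caches):
    test_caches = {}
    for alias in caches:
        if alias.startswith("test_"):
            continue  # the test overrides themselves are not swapped
        override = caches.get("test_" + alias)
        test_caches[alias] = override if override is not None else dict(LOCMEM)
    return test_caches

CACHES = {
    "default": {"BACKEND": "django.core.cache.backends.memcached.MemcachedCache"},
    "file": {"BACKEND": "django.core.cache.backends.filebased.FileBasedCache"},
    # Explicit test override: run "file" tests against a real (db) cache.
    "test_file": {"BACKEND": "django.core.cache.backends.db.DatabaseCache"},
}

result = setup_test_caches(CACHES)
# "default" falls back to locmem; "file" uses the test_file config.
print(result["default"]["BACKEND"])
print(result["file"]["BACKEND"])
```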
Re: Resetting cache gracefully at the end of every test?
I've created a ticket for this and uploaded a patch: https://code.djangoproject.com/ticket/16401 Please feel free to review and let me know if this is a good idea in the first place, as well as if the patch makes sense and is the right approach. Thanks, Jim
Re: Resetting cache gracefully at the end of every test?
Awesome feedback, thanks.

On Jul 1, 2011, at 10:01 AM, Jeremy Dunck wrote:

> Well, I think you forgot all the other low-level cache options, like
> .incr, .add, etc.?

Yep, forgot those, so they would conceivably need to be tracked as well.

> If I were to do this, I wouldn't have cache shared between tests or
> environments, so I'd be tempted to use flush.

I'm not totally sure I'm following you on this point?

> Rather than "test_" and bookkeeping on key names, the key prefix was
> the hostname + timestamp from the beginning of the run, so that test
> keys would never collide?

What I like about this idea is it avoids the bookkeeping and monkey-patching nonsense. What I don't like about it is that it feels like you could potentially be adding a lot of junk to your cache. A test run might end up hitting it with many values, each unique. A requirement I'm sort of working with in my mind here is that the cache returns to the "pristine" state it was in prior to the test run.

> Or, what if we just had hooks called from setUp and tearDown, either
> NotImplemented methods, or signals?
>
> Book-keeping with monkey-patched methods sounds fiddly to me; consider
> the case where celery pokes keys in, but our bookkeeping doesn't know
> about it?

I know what you mean about it being fiddly, but I'm kind of envisioning something not all that different from what goes on with Template.render(), settings.EMAIL_BACKEND, or the database backends during test setup and teardown. The bookkeeping required would in truth be nothing more than a set() where you add keys. The monkey patching would entail nothing more than a line to add the passed keys to that set and then a call to the original function. I'm just not sure it's much more fiddly than the fiddling we already do for those examples, is what I'm saying. I'm not quite sure what you mean about celery poking keys in. Maybe you can elaborate a bit?
To be clear, I'm not sure we could keep the actual underlying cache entirely pristine if you were using it in different ways from different systems; the goal would really be to ensure that calls made through the cache backend API are rolled back with every test. Anyhow, thanks once again for the feedback and the critical thoughts.
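[Editor's sketch] The bookkeeping being described is small enough to show concretely. The `SimpleCache` class below is a stand-in for a real Django cache backend, and `patch_for_test` is a hypothetical helper, not the actual patch from the ticket:

```python
# Wrap the cache's write methods so every key they touch lands in a set,
# then delete those keys (and restore the originals) at teardown.
class SimpleCache:
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def add(self, key, value):
        self._data.setdefault(key, value)
    def get(self, key, default=None):
        return self._data.get(key, default)
    def delete(self, key):
        self._data.pop(key, None)

def patch_for_test(cache):
    touched = set()
    originals = {}
    for name in ("set", "add"):  # a real patch would cover incr, set_many, etc.
        originals[name] = getattr(cache, name)
        def wrapper(key, value, _orig=originals[name]):
            touched.add(key)          # one line of bookkeeping...
            return _orig(key, value)  # ...then a call to the original method
        setattr(cache, name, wrapper)
    def restore():
        for name, orig in originals.items():
            setattr(cache, name, orig)
        for key in touched:
            cache.delete(key)
    return restore

cache = SimpleCache()
cache.set("survives", 1)          # pre-existing data is left untouched
restore = patch_for_test(cache)
cache.set("temp", 2)              # tracked during the "test"
restore()                         # teardown: only tracked keys are removed
print(cache.get("survives"), cache.get("temp"))  # → 1 None
```

Note Jeremy's caveat still applies: keys written by an outside process (e.g. celery) bypass the wrappers and are never tracked.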
Re: Thoughts on solution to forward references in MySQL (#3615)
On Jun 28, 2011, at 6:29 AM, Cal Leeming [Simplicity Media Ltd] wrote:

> Sorry for the noobish question but, could someone explain the definition of
> "forward references"? Is this a MySQL or a django term? Google wasn't very
> forthcoming :X

Jacob actually requested that I add a note in the database docs on the topic this patch addresses. So as a way of answering your question I'll paste that note verbatim here:

In previous versions of Django, fixtures with forward references (i.e. relations to rows that have not yet been inserted into the database) would fail to load when using the InnoDB storage engine. This was due to the fact that InnoDB deviates from the SQL standard by checking foreign key constraints immediately instead of deferring the check until the transaction is committed. This problem has been resolved in Django 1.4. Fixture data is now loaded with foreign key checks turned off; foreign key checks are then re-enabled when the data has finished loading, at which point the entire table is checked for invalid foreign key references and an `IntegrityError` is raised if any are found.

Is that note illuminating to you? If not, then presumably it needs work :)
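[Editor's sketch] The load-then-check approach the note describes can be mimicked with SQLite's pragmas standing in for MySQL's `SET foreign_key_checks`; the tables and fixture rows are made up:

```python
import sqlite3

# Turn foreign key checks off, load fixture rows that contain a forward
# reference, then re-enable checks and scan for invalid references.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, "
             "author_id INTEGER REFERENCES author(id))")

# Fixture load: the book refers to an author that doesn't exist yet.
conn.execute("INSERT INTO book VALUES (1, 7)")   # forward reference
conn.execute("INSERT INTO author VALUES (7)")    # resolved later in the fixture

# End of load: re-enable checks, then verify every reference resolved.
conn.execute("PRAGMA foreign_keys = ON")
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print(violations)  # → [] : every forward reference was eventually satisfied
```

Had author 7 never been inserted, `foreign_key_check` would report the dangling `book` row, which is the point at which Django's patch raises `IntegrityError`.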
Re: Thoughts on solution to forward references in MySQL (#3615)
On Jun 28, 2011, at 6:25 AM, Karen Tracey wrote:

>> It actually doesn't *need* to return False; pass is the same as not returning
>> anything or returning None. The boolean check just treats it the same way as
>> False. "Should it?" is another question. On the one hand it's a bit more
>> clear, this value is called and always returns False, unless a backend has
>> overridden it. On the other hand, pass is in keeping with other methods in
>> that class that are meant to be overridden in backends, so I went with pass
>> to emphasize that aspect of the code.
>
> Hmm, well, I did not know that falling off the end of a method with pass
> guaranteed a return value of False or None (and can't find that noted in the
> doc here: http://docs.python.org/tutorial/controlflow.html#pass-statements) so
> in my mind explicitly returning False would be clearer/better...

Ah, okay. To be clear, the return value actually has nothing to do specifically with the use of pass or not. The behavior of Python is to return None from any function that does not explicitly return a value. This is from the exact same page in the Python docs you linked to: "In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called None (it's a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written." Obviously it's a fairly minor point, but we want things to be as clear as possible. Anyone else want to weigh in on "pass" vs. an explicit return value?
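[Editor's sketch] The point under discussion in a few lines: a function that ends without an explicit return (whether via `pass` or just falling off the end) returns None, which is falsy, so a boolean check treats it the same as an explicit `return False`:

```python
def with_pass():
    pass  # no return statement at all

def with_false():
    return False

print(with_pass(), with_false())                 # → None False
print(bool(with_pass()) == bool(with_false()))   # → True: both are falsy
print(with_pass() is False)                      # → False: None is falsy, but it is not False
```

This is why `if backend.disable_foreign_key_checks():` behaves identically whichever style the base implementation uses; the disagreement is purely about readability.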
Re: Thoughts on solution to forward references in MySQL (#3615)
On Jun 28, 2011, at 5:27 AM, Karen Tracey wrote:

> Also, though I don't have MySQL 4 handy to test on, I'd be astonished if
> there were any issue there compared to MySQL 5. The set foreign_key_check
> command is certainly supported in MySQL 4 and the select being issued to do
> the check is nothing special at all.

Agreed. I did find some circumstantial evidence to support the idea that it would work on MySQL 4 here: http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html#c3009 This comment from 2003 (when MySQL 4 was still in beta) describes the technique. If it worked in 2003, I have to imagine we're in good shape.

A slightly higher concern is the new get_key_relations() method. Since the information_schema database is not available before MySQL 5, there is a workaround in there for version 4. The contents of this method were originally part of get_indexes() and I just moved them to their own method (that's why you don't see it in the commit). So assuming that function worked fine previously in Django, I don't anticipate this will have a problem either.

> I have not had time to try out the patch, but did look at it. Doesn't the
> base implementation of disable_foreign_key_checks need to return False
> instead of just passing? The return value is used in loaddata processing to
> decide whether it's necessary to re-enable/check.

It actually doesn't *need* to return False; pass is the same as not returning anything or returning None. The boolean check just treats it the same way as False. "Should it?" is another question. On the one hand it's a bit more clear: this value is called and always returns False, unless a backend has overridden it. On the other hand, pass is in keeping with other methods in that class that are meant to be overridden in backends, so I went with pass to emphasize that aspect of the code.

> Thanks for working on this -- wish I'd thought of that idea two years ago!

You're welcome.
It was actually fun to solve a cold case :)
Re: Thoughts on solution to forward references in MySQL (#3615)
On Jun 27, 2011, at 5:44 PM, Michael Blume wrote:

> I see a variable saved_objects being written, but I don't see it being
> accessed -- is this to ease future features, or am I missing a code path?

Thanks, good catch. This was a remnant from an earlier iteration of this patch, in which I tried (unsuccessfully) to call save() on any objects created during the fixture load after foreign key checks had been re-enabled, in hopes that the UPDATE call this generated would trigger a constraint check. It did not, because MySQL ignores an UPDATE if the data being set matches what's already in the DB. I have some other changes I'm making in response to some of Jacob's comments on the ticket, so this will be gone in the next version of the patch.

> If I'm reading correctly, check_for_invalid_foreign_keys extends over all the
> rows in a table. loaddata is called by syncdb and South's migrate, not just
> when a db is set up, so this could easily wind up run over lots and lots of
> non-fixture data. I don't know MySQL's performance characteristics that well
> -- is this likely to be expensive?

This is a good question. First, though it's true that loaddata is called in several places other than during tests, I would imagine it is rarely if ever called during a normal request/response cycle in production. I'm operating under the assumption that the performance of this patch is important, but not as critical as it would be if this were a command used in production. That caveat aside, I will try later to find a large data set I can run the query on to get an idea of what kind of performance hit the check entails. If anyone else has a large data set with two related tables, you can try it out as well and report your results.
The basic structure of the SQL statement being run is as follows:

    SELECT REFERRING.`id`, REFERRING.`id_pointing_to_some_related_table`
    FROM `some_table` AS REFERRING
    LEFT JOIN `some_related_table` AS REFERRED
        ON (REFERRING.`id_pointing_to_some_related_table` = REFERRED.`id`)
    WHERE REFERRING.`id_pointing_to_some_related_table` IS NOT NULL
        AND REFERRED.`id` IS NULL

And keep in mind this is being called for *each* relation in the table. So that'll run through all the rows once for every relation. Of course, it won't return anything unless you have bad data, so the I/O aspect should be minimal. If we notice an "unacceptable" performance hit, the likely solution would be to scope that SELECT statement to only the rows that were added during the loaddata routine. Ideally I'm hoping to steer clear of that, because it entails more work and complexity in loaddata, since you'd have to track all the IDs that were added and then pass them along here. Wouldn't be a big deal, but that's why I only wanted to do it if necessary. Thanks for your input and help testing this btw! Jim
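[Editor's sketch] The same orphan-detection pattern, runnable against SQLite so its behavior can be seen directly (table and column names are illustrative, and the MySQL backtick quoting is dropped). A non-NULL foreign key is reported only when it fails to match a row in the referred table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_related_table (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE some_table (id INTEGER PRIMARY KEY, related_id INTEGER)")
conn.execute("INSERT INTO some_related_table VALUES (1)")
conn.executemany("INSERT INTO some_table VALUES (?, ?)",
                 [(1, 1),      # valid reference
                  (2, None),   # NULL FK: excluded by the IS NOT NULL clause
                  (3, 99)])    # dangling reference: no related row with id 99

bad_rows = conn.execute("""
    SELECT REFERRING.id, REFERRING.related_id
    FROM some_table AS REFERRING
    LEFT JOIN some_related_table AS REFERRED
        ON REFERRING.related_id = REFERRED.id
    WHERE REFERRING.related_id IS NOT NULL
      AND REFERRED.id IS NULL
""").fetchall()
print(bad_rows)  # → [(3, 99)] : only the dangling reference is reported
```

As Jim notes, the result set stays empty on good data, so the cost is in the scan (one pass per relation), not the output.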
Re: Thoughts on solution to forward references in MySQL (#3615)
On Jun 27, 2011, at 4:52 PM, Russell Keith-Magee wrote:

> Unfortunately, not much. Your test has validated that the extra code
> doesn't break anything under MyISAM, and this is certainly useful.
> However, the root problem only exists with InnoDB because of its...
> eclectic... implementation of row level constraints. MyISAM doesn't
> have constraints, so your test hasn't exercised the part of the code
> that needs to be exercised heavily.

One unintended benefit of the current patch is that it actually will (if I'm not mistaken) check for invalid foreign key references in MyISAM as well, since the check is run for all MySQL implementations (though not for any other backends). So MyISAM would normally silently allow bad foreign keys to be loaded from your fixture data, but with the patch you'll now get an error.
Should test failures/errors in the Django test suite when executed under MySQL be considered blockers?
I've been trying to do a little work recently on some tickets related to MySQL-specific issues that come up using Django (#2495 and #3615 specifically). As part of trying to get my head around resolving these, I executed the test suite in the current Django trunk using a MySQL InnoDB backend. The results weren't pretty. I had to make a few changes to get the tests to even execute (related to a missing "pk" field in regression_tests/admin_views), and once I did I got FAILED (failures=91, errors=219, skipped=21, expected failures=2). It's a bit hard to know where a lot of these are stemming from, but presumably a big chunk relate to the fixture loading problems identified in #3615. Regardless, it appears that many of these errors have existed across multiple version releases. Also, some of the tickets that relate to these issues are marked as severity normal instead of blocker.

I guess I'd like to understand a bit better what to make of this. Are test failures with the MySQL backend considered acceptable? Do we consider it okay to ship new releases while these issues go unresolved? Is MySQL not considered a fully supported backend? As I mentioned in a comment recently on #3615, I'd like to get involved in helping move forward on some of these issues (my intent is not to gripe, if that's what it sounds like I'm doing). At the same time, I'd like to have a clearer understanding of the Django developers' commitment to supporting MySQL in the test suite. If this is not something that's considered to be a big deal, or if other decisions have been made on this topic already that I missed, then that's cool. Hopefully someone can shed some light on it if that is the case. For what it's worth, I haven't come across any serious issues running Django under MySQL in production -- it's all basically related to running the test suite, etc.
Still, I'd love to get the test suite for the MySQL backend up to speed because it would make developing and fixing bugs for that backend that much easier. Thanks for any thoughts on this.