Re: Url regex keeps django busy/crashing
On 3-8-2012 2:34, Melvyn Sopacua wrote:

Correction:

> url(r'^(?P<item_url>\w[\w-]+-/$', 'detail')

insert question mark here --> ?

-- 
Melvyn Sopacua

-- 
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com.
To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
Re: Url regex keeps django busy/crashing
On 26-7-2012 16:45, Joe wrote:
> Hey, I have a url regex like this which is keeping django extremely busy
> (20secs to 1min to handle a request). On some urls it even crashes.
>
> my regex:
>
> url(r'^(?P<item_url>(\w+-?)*)/$', 'detail'),

Turn the * into a + and you'll see great improvements. I also think you
don't want to match '//' as a valid URL part.

Also, I think this example will satisfy your requirements in practice:

    url(r'^(?P<item_url>\w[\w-]+-/$', 'detail')

The only difference is that dashes are allowed to follow each other. I can
only think of one valid reason not to use the above URL, and that is if
"multiple dashes" are captured in another URL.

Remember that URL patterns are not your validators. It's nice if you can
prevent a view from being called by carefully constructing your URL
patterns, but if parsing the regex takes longer than calling the view, you
lose performance instead.

Also, validating whether a URL contains two or more consecutive dashes is
easily done in a view and does not even need regular expressions:

    from django.http import Http404

    def detail(request, item_url):
        if '--' in item_url:
            raise Http404

You get even more improvement if you keep the URLs lower case (or upper
case, but not mixed case) and use [a-z0-9_] instead of \w.

-- 
Melvyn Sopacua
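A minimal sketch of the kind of view-side check described above, kept free of Django so it runs on its own. The helper name is_acceptable_item_url and the exact rules (lower-case letters, digits, underscores and single dashes, no leading or trailing dash) are assumptions for illustration; the original poster never spelled out the requirements.

```python
import string

# Assumed character set: the lower-case variant suggested above, plus '-'.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "_-")


def is_acceptable_item_url(item_url):
    """Return True if item_url passes the (assumed) slug rules."""
    if not item_url:
        return False
    # One linear pass: reject any character outside the allowed set.
    if any(ch not in ALLOWED for ch in item_url):
        return False
    # Reject leading or trailing dashes.
    if item_url.startswith("-") or item_url.endswith("-"):
        return False
    # Reject consecutive dashes with a plain substring test, no regex.
    return "--" not in item_url


print(is_acceptable_item_url("my-first-page"))
print(is_acceptable_item_url("my--page"))
```

Inside a view, a False result would typically be turned into a 404 response, for example by raising Http404.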
Re: Url regex keeps django busy/crashing
On 07/26/12 09:45, Joe wrote:
> url(r'^(?P<item_url>(\w+-?)*)/$', 'detail'),
>
> replaced with:
>
> url(r'^(?P<item_url>[\w-]+)/$', 'detail'),

Russell gave you good background on the why (including that Django was
stung by the same issue).

It would help if you more clearly defined what you wanted to target. Your
first one can match things like

    x-x-x-x-

with trailing dashes, and your second one can match things like

    --       # pure dashes
    ---xxx   # leading dashes
    --xx--   # leading and trailing dashes

I suspect you want an expression something like

    (?P<item_url>\w+(?:-\w+)*)/$

perhaps having a "?" after the terminal slash to make it optional. This
expression is roughly "one or more \words separated by one dash." You might
change "\w" to "[a-zA-Z]" to ensure you can't match odd things like

    _-_-_-_

("\w" includes underscores).

-tkc
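As a quick check of the expression suggested above, the following sketch runs it against the example strings from this message using Python's re module directly. The leading ^ anchor and the group name item_url (taken from the view signature in the original question) are assumptions; the candidate list is only illustrative.

```python
import re

# The suggested expression, anchored at both ends. The group name item_url
# is assumed from the view signature in the original question.
PATTERN = re.compile(r'^(?P<item_url>\w+(?:-\w+)*)/$')

candidates = ["x-x-x-x/", "x-x-x-x-/", "--/", "---xxx/", "--xx--/", "_-_-_-_/"]

for path in candidates:
    match = PATTERN.match(path)
    # Print the captured slug, or None when the path is rejected.
    print(path, "->", match.group("item_url") if match else None)
```

Only the first and last candidates are accepted. The last one passes because "\w" includes the underscore, which is the case for switching to an explicit character class.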
Re: Url regex keeps django busy/crashing
On Thu, Jul 26, 2012 at 10:45 PM, Joe wrote:
> Hey, I have a url regex like this which is keeping django extremely busy
> (20secs to 1min to handle a request). On some urls it even crashes.
>
> my regex:
>
> url(r'^(?P<item_url>(\w+-?)*)/$', 'detail'),
>
> view:
>
> def detail(request, item_url):
>     i = get_object_or_404(Page, url=item_url, published=True)
>     return render_to_response('item/detail.html', {'item': i},
>                               context_instance=RequestContext(request))
>
> replaced with:
>
> url(r'^(?P<item_url>[\w-]+)/$', 'detail'),
>
> The replacement works like a charm. What is wrong with the first regex?

Hi Joe,

There's nothing strictly *wrong* with the first regex -- it just describes
a very complex lookup strategy, and as a result, it takes extra time to
compute it.

In the second regex, you're asking for "a string of 1 or more characters
that are either word-like or '-'". That's a very easy thing to check - if
you think of how you would manually implement code that checks that policy,
it could be done with a simple if inside a while loop; as soon as you find a
character that doesn't match, you can bail out.

However, the first regex is asking for "0 or more groups of word-like
characters, each of which might be followed by a '-'". Consider a trivial
case, matching against the string abcde. It can match the first regex in an
incredible number of ways:

    (a)(b)(c)(d)(e)
    (ab)(c)(d)(e)
    (abc)(d)(e)
    (abcd)(e)
    (abcde)
    (a)(bc)(d)(e)
    (a)(bcd)(e)
    (a)(bcde)
    (a)(b)(cde)
    ... and so on.

Because you're asking the regex to preserve groups, the algorithm needs to
essentially work out every single one of these groups, and then determine
which set will be reported as the actual match. As you can guess, this can
take some time, which you're observing as a 1 minute delay in serving a URL.

This is one of the gotchas that comes from using regular expressions.
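The enumeration above can be counted directly. The sketch below is not what the regex engine does internally; it only lists every way to cut a dash-free string into consecutive non-empty groups, which is the set of groupings a backtracking engine may end up trying when the overall match fails. The function name splits is hypothetical.

```python
def splits(text):
    """Yield every way to cut text into consecutive non-empty groups."""
    if not text:
        # Nothing left to cut: one way, the empty grouping.
        yield []
        return
    for cut in range(1, len(text) + 1):
        head = text[:cut]
        # Combine this first group with every grouping of the remainder.
        for rest in splits(text[cut:]):
            yield [head] + rest


all_splits = list(splits("abcde"))
print(len(all_splits))  # 16, i.e. 2 ** (len("abcde") - 1)

for groups in all_splits:
    print("".join("(%s)" % g for g in groups))
```

The count doubles with every extra character, so a 30-character URL segment already has more than 500 million groupings.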
They're a very powerful language for expressing constraints, but you need to
be careful that you don't accidentally fall into a trap where you're asking
for something very complex.

And don't worry - you're in good company being bitten by this problem.
There was a Django security release caused *specifically* by a regular
expression like yours. Django uses regular expressions to validate URLs and
email form inputs, and at one point, the regex that was used to validate
email addresses was constructed in such a way that it was possible to
provide a very simple string that would cause the validator to take 30
seconds to confirm that it wasn't valid. Write a tool that hits the same URL
and validates the same string 100 times, and you've got yourself a DDOS
attack.

So - when you're building your URL patterns, you should be trying to keep
your regular expressions as simple as possible -- i.e., simple linear
probes. If you really do need to match a complex pattern, you'd be better
served using a simple regex in the URL pattern, and then doing more specific
validation in the view (and raising 404 if the pattern doesn't match what
you need it to).

Yours,
Russ Magee %-)
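To see the difference between the two patterns from the original question outside Django, the following sketch times both against short inputs that cannot match. The group name item_url is assumed from the view signature, the helper time_match is hypothetical, and the input lengths are kept small on purpose; actual timings depend on the machine and Python version.

```python
import re
import time

# The two patterns from the original question, with the group name
# item_url assumed from the view signature.
SLOW = re.compile(r'^(?P<item_url>(\w+-?)*)/$')
FAST = re.compile(r'^(?P<item_url>[\w-]+)/$')


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


# A run of word characters followed by something that is not '/', so
# neither pattern can succeed and the first one has to backtrack.
for length in (8, 12, 16):
    text = "a" * length + "!"
    _, slow_s = time_match(SLOW, text)
    _, fast_s = time_match(FAST, text)
    print("%2d chars: slow %.6fs, fast %.6fs" % (length, slow_s, fast_s))
```

The time for the first pattern should grow sharply with each added character while the second stays roughly flat, which is why lengths much beyond these are best avoided when experimenting.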
Url regex keeps django busy/crashing
Hey, I have a url regex like this which is keeping django extremely busy
(20secs to 1min to handle a request). On some urls it even crashes.

my regex:

    url(r'^(?P<item_url>(\w+-?)*)/$', 'detail'),

view:

    def detail(request, item_url):
        i = get_object_or_404(Page, url=item_url, published=True)
        return render_to_response('item/detail.html', {'item': i},
                                  context_instance=RequestContext(request))

replaced with:

    url(r'^(?P<item_url>[\w-]+)/$', 'detail'),

The replacement works like a charm. What is wrong with the first regex?

Thanks in advance.

-- 
To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/lVIrewdZipMJ.