Re: (Non-) Capturing REs

2011-10-26 Thread Karsten Bräckelmann
On Tue, 2011-10-25 at 18:46 -0700, Adam Katz wrote:
> On 10/24/2011 02:45 PM, Karsten Bräckelmann wrote:

> [...] though I seem to recall the SA debug output includes the
> matched text (which implies $&), though if this were important, I'm sure
> we'd have already concluded it worthwhile to do stupid things like
> surrounding entire regexps with (?=this).

Good point, the debug output does include the match -- coincidentally
a fact I stressed very recently in another thread. As I said earlier, I
didn't find any use of these special match variables when grepping the
code. That made me curious, and a quick dig turned up the following in
Check.pm:

  # note: keep this in 'single quotes' to avoid the $ & performance hit,
  # unless specifically requested by the caller.   Also split the
  # two chars, just to be paranoid and ensure that a buggy perl interp
  # doesn't impose that hit anyway (just in case)

And here is the code, which is later evaluated to the special $& match
variable and only run in debug mode:

$match = '($' . '&' . '|| "negative match")';


Given the rather explicit comment and this code, I am increasingly
convinced my interpretation of the perlre docs was correct. The global
performance penalty for capturing grouping ONLY applies if these special
vars are used. It's a non-issue for normal operation [1], regardless of
capturing or non-capturing grouping.
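
To illustrate the trick in the Check.pm snippet quoted above, here is a
minimal sketch. The helper name matched_text() and the $debug argument
are made up for this example and are not SpamAssassin's actual API. The
expression is assembled from single-quoted pieces, so the match variable
never appears in the compiled code and is only reached through a string
eval when debugging is requested:

```perl
use strict;
use warnings;

# Assemble the expression from single-quoted pieces, so the two
# characters of the special match variable never sit next to each
# other in code that perl compiles up front.
my $match_expr = '($' . '&' . ' || "negative match")';

# Hypothetical helper, for illustration only: returns the matched text
# in debug mode, undef otherwise.
sub matched_text {
    my ($text, $re, $debug) = @_;
    return unless $text =~ $re;   # no hit, nothing to report
    return unless $debug;         # normal operation: never touch the variable
    return eval $match_expr;      # debug only: the string eval compiles it now
}

# The rule uses (?:non-capturing) grouping, yet the whole match is
# still available to the debug path.
print matched_text("Buy cheap pills now", qr/cheap (?:pills|meds)/, 1), "\n";
```

With the debug flag off, the string is never evaluated, so in this
sketch the match variable is never compiled at all.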

I'd still be happy for someone with more in-depth knowledge about the
subject to confirm -- or disprove.


[1] For local rules, anyway. Regarding the bulk of the stock rules, I
still support using (?:non) capturing grouping as best-practice.


> > Not trying to be confrontational, just honestly asking and wondering
> > about the real impact. After all, the perlre docs specifically 
> > mention to strongly prefer non-capturing grouping basically once
> > only -- in the warning paragraph about the special vars.




Re: Responsibility of sites that hold user-created documents (was Re: One-line URI body spam)

2011-10-26 Thread SM

At 13:03 19-10-2011, David F. Skoll wrote:

> In my dream world, people would blacklist Google.  I made a suggestion


The approach would also apply to pastebin (which is generally
suggested on this mailing list) and to any other free service.  The
subject could be rewritten as "responsibility of free services that
hold user-created documents".


Regards,
-sm