On Mar 17, 2:25 pm, kangax <kan...@gmail.com> wrote:
> On Mar 16, 9:44 pm, RobG <rg...@iinet.net.au> wrote:
>
> > On Mar 17, 4:10 am, kangax <kan...@gmail.com> wrote:
>
> > > On Mar 16, 1:13 pm, arkady <arkad...@gmail.com> wrote:
>
> > > > if trying to strip off everything before the <body> and everything
> > > > after </body>
>
> > > response.replace(/.*(?=<body>)/, '').replace(/(<\/body>).*/, '$1');
>
> > That seems a bit risky: the string may not always have lowercase tag
> > names, and the body opening tag may include attributes. New lines in
>
> I actually took OP's issue too literally; i.e. - "strip off everything
> before the <body> and after </body>" : )
>
> > the string might trip it up too. In any case, it doesn't work for me
> > at all in Firefox 3 or IE 6.
>
> Which string did you feed it with? The dot doesn't match newlines, does
> it? [\s\S] should match:
>
> response.replace(/[\s\S]*(?=<body)/i, '');

I was giving it the innerHTML of the doc it was in (with new lines,
returns, etc.), and the above does the trick. I guess any pair of
complementary character classes would do; [\d\D] works too. For the
more general (and the OP’s) case, it also needs to trim everything after
the closing </body> (which *should* only ever be followed by </html>
and maybe some whitespace, but who knows what a server might send?), as
you’ve done below.
According to the innerHTML, Firebug puts a div between the head and
body elements. I’m not sure I like that; if the markup is fed back to
the browser, error correction will deal with it (moving it into the
body or perhaps ignoring it completely).

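A minimal sketch of the [\s\S] approach as a reusable function, assuming the response is a plain string of HTML. The names `extractBodyRegex` and `extractBodyDot` are hypothetical. The `i` flag covers upper-case tags, and stopping the first pattern at `<body` (with no `>`) keeps any attributes. Because `[\s\S]*` is greedy, the first pattern cuts at the *last* `<body` in the string, while the second cuts at the *first* `</body>`.

```javascript
// Hypothetical helper: keep everything from "<body" through the first "</body>".
// [\s\S] matches any character, including newlines; the i flag handles <BODY>.
function extractBodyRegex(html) {
  return html
    .replace(/[\s\S]*(?=<body)/i, '')     // drop everything before the last "<body"
    .replace(/(<\/body>)[\s\S]*/i, '$1'); // drop everything after the first "</body>"
}

// The first pattern from the thread, for comparison. "." does not match line
// breaks, so it can only strip text on the same line as "<body>"; it is also
// case-sensitive and needs a literal "<body>" with no attributes.
function extractBodyDot(html) {
  return html.replace(/.*(?=<body>)/, '').replace(/(<\/body>).*/, '$1');
}

var sample = [
  '<html>',
  '<head><title>t</title></head>',
  '<BODY class="x">',
  '<p>hi</p>',
  '</BODY>',
  '</html>'
].join('\n');

console.log(extractBodyRegex(sample));

// Even with a plain lower-case <body>, the line break before it defeats ".".
var plain = '<html>\n<body>x</body>\n</html>';
console.log(extractBodyDot(plain) === plain); // true: nothing was stripped
```

Run as-is, the first `console.log` should print the three lines from `<BODY class="x">` through `</BODY>`, and the second prints `true`. Where ES2018 regular expressions are available, the `s` (dotAll) flag makes `.` match line breaks as well, so `/.*(?=<body)/is` is an alternative to `[\s\S]`.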
> > An alternative, provided all new lines are removed, is:
>
> > response.match(/<body.*body>/i)[0];
>
> > or
>
> > response.replace(/\s/g,' ').match(/\<body.+body\>/i)[0];
>
> > A sub-string version is:
>
> > var start = response.toLowerCase().indexOf('<body');
> > var end = response.toLowerCase().indexOf('</body>') + 7;
> > var theBody = response.substring(start, end);
>
> Obviously, string-based matching should be marginally faster than
> regex, especially when that regex is based on a relatively slow
> positive lookahead : )
But the substring stuff just *looks* clunky. :-p
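If the clunkiness is the only objection, the substring version can be wrapped up once and then forgotten. Here is a sketch, assuming a plain HTML string. It lower-cases the string once (as the benchmark does), also accepts `<body>` tags with attributes, and returns `null` when either tag is missing. The name `extractBodySubstring` and the `null` convention are assumptions. Without the check, a missing `</body>` makes `indexOf` return -1, and -1 + 7 would quietly return the wrong slice of the string.

```javascript
// Hypothetical helper: the indexOf/substring approach with a guard for
// responses that lack a <body> or </body> tag.
function extractBodySubstring(html) {
  // Search a lower-cased copy; assumes lower-casing keeps the length (true for ASCII).
  var lower = html.toLowerCase();
  var start = lower.indexOf('<body');          // also matches <body class="...">
  var close = lower.indexOf('</body>', start); // first </body> after the opening tag
  if (start === -1 || close === -1) {
    return null;                               // no body found; let the caller decide
  }
  // Add '</body>'.length (7) so the closing tag itself is kept.
  return html.substring(start, close + '</body>'.length);
}

var page = '<HTML>\n<head></head>\n<Body id="main">\n<p>hi</p>\n</BODY>\n</HTML>';
console.log(extractBodySubstring(page));
console.log(extractBodySubstring('<p>no body here</p>')); // null
```

One caveat, shared with the original: the indices come from the lower-cased copy, so they only line up with the original string if lower-casing preserves length. That holds for ASCII markup, but a few characters (U+0130, for example) become two code units when lower-cased.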
>
> var response = document.documentElement.innerHTML;
> console.time(1);
> for (var i = 0; i < 100; i++) {
>   var l = response.toLowerCase();
>   response.substring(l.indexOf('<body'), l.indexOf('</body>') + 7);
> }
> console.timeEnd(1);
>
> var response = document.documentElement.innerHTML;
> console.time(2);
> for (var i = 0; i < 100; i++) {
>   response.replace(/[\s\S]*(?=<body)/i, '')
>     .replace(/(<\/body>)[\s\S]*/i, '$1');
> }
> console.timeEnd(2);
>
> //1: 186ms
> //2: 2664ms
For that sort of speed gain, I’d use substring every time; the match()
version is about 50% slower still.
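To rerun the comparison outside the browser, a sketch along these lines should work in Node.js or any environment with `console.time`. It uses a synthetic page (an assumption) in place of `document.documentElement.innerHTML`. Before timing anything, it checks that the substring, replace and match versions return identical strings, because a benchmark of functions that disagree tells you nothing. It also uses `[\s\S]` in the match pattern so whitespace does not need replacing first. The page size and iteration count are arbitrary, and timings will vary by engine.

```javascript
// Three ways to pull "<body ...> ... </body>" out of an HTML string.
function bySubstring(html) {
  var lower = html.toLowerCase();
  return html.substring(lower.indexOf('<body'), lower.indexOf('</body>') + 7);
}

function byReplace(html) {
  return html.replace(/[\s\S]*(?=<body)/i, '').replace(/(<\/body>)[\s\S]*/i, '$1');
}

function byMatch(html) {
  // [\s\S] instead of "." so no newline stripping is needed first.
  return html.match(/<body[\s\S]*<\/body>/i)[0];
}

// Synthetic test page (an assumption; the thread used the live document).
var rows = [];
for (var r = 0; r < 2000; r++) {
  rows.push('<p class="row">line ' + r + '</p>');
}
var page = '<html>\n<head><title>bench</title></head>\n<body>\n' +
  rows.join('\n') + '\n</body>\n</html>';

// Only time the variants once they are known to agree on the result.
var expected = bySubstring(page);
[byReplace, byMatch].forEach(function (fn) {
  if (fn(page) !== expected) {
    throw new Error(fn.name + ' disagrees with bySubstring');
  }
});

[bySubstring, byReplace, byMatch].forEach(function (fn) {
  console.time(fn.name);
  for (var i = 0; i < 100; i++) {
    fn(page);
  }
  console.timeEnd(fn.name);
});
```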
--
Rob
You received this message because you are subscribed to the Google Groups
"Prototype & script.aculo.us" group.