Re: Regexp. Should be easy I think..

2004-04-21 Thread Januk Aggarwal
Hello Leif,

On Tuesday, April 20, 2004 at 20:13 GMT -0600, a stampede was started
when Leif Gregory hollered:

It just occurred to me: if you're using PHP, can't you get all the
info you need without using regexps?  I seem to recall that there are
built-in functions that can get you pretty much any info you want,
though I can't find it in the phpinfo() function...

 I'm pretty sure the HTTP request type will always be first so I can
 anchor with ^, but IIS likes to put the SERVER before the DATE, and
 Sambar is the reverse. So, my thinking was that if I did the HTTP part
 and then moved up to the first : and backed up to the first
 whitespace, I could grab the next chunk (either DATE or SERVER) up to
 the next : (and then back to the first whitespace), and continue that
 until I hit the EOL.

You can do that with the chunk that I wrote. What I don't know is how
PHP handles regexps. I know in VBScript, when you do a regexp, all
possible matches are stored in an array, so it is pretty easy to
get out all the parts you want. In TB, that isn't the case, so I tend
to forget about that option. If PHP will populate an array, then
you're golden. The regexp could be fairly simple.

JA (?i)(Date:\s*.*?)((\s\S+:\s)|\z)
JA But increasing accuracy has the price of decreasing tolerance for
JA errors in the string.

 Exactly.

The way I wrote the above regexp, you should be pretty accurate
without losing any generality.

 I didn't check an Apache server (I'll do that tomorrow) to
 see how it outputs its HTTP headers. I am looking for something
 generic, hence my hoping I could use the : as jump points to back up
 from.

If you really want to do that, you should use a look-ahead assertion.
Something like:
(\S*:\s*.*?)\s(?=\S*:\s)

I haven't tried this in PHP, but in principle it should work.
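
A minimal sketch of the look-ahead idea, for illustration only. It is
written in Python because its re module accepts nearly the same syntax;
it is not tested PHP. One adjustment is an assumption added here: as
written, the pattern needs a following label to look ahead to, so the
final field would not match. The sketch therefore adds an end-of-string
alternative (\Z in Python, where TB and PCRE use \z). The HTTP status
line has no "label: " form, so it is not captured by this pattern.

```python
import re

# Sample header string from the original question, joined onto one line.
HEADERS = ("HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT "
           "Server: SAMBAR Last-modified: Thu, 01 Jan 2004 19:56:39 GMT "
           "Connection: close Content-type: text/html")

# (\S*:\s*.*?)   label, colon, and the shortest possible value
# \s(?=\S*:\s)   stop at whitespace that is followed by the next "label: "
# |\Z            or stop at the end of the string (added for the last field)
FIELD = re.compile(r"(\S*:\s*.*?)(?:\s(?=\S*:\s)|\Z)")

# findall() returns every match of group 1 as a list.
fields = FIELD.findall(HEADERS)
for field in fields:
    print(field)
```

The colons inside the time (17:28:23) are not mistaken for labels,
because the look-ahead requires whitespace right after the colon. In
PHP the same pattern would presumably be handed to preg_match_all(),
which fills an array with all matches.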

 Right. That shouldn't be a problem. I have a list of the atoms for PHP
 and they are close to TB.

Excellent.  Do you mind sending me either a link or the list (off list
if you like)?  I was slowly learning some PHP stuff myself, so that
could be very useful.

 I had considered that (just doing multiple reg matches), but wondered
 if there was a better way. It is a very small script, so it wouldn't
 really kill the performance by doing multiple reg matches.

Like I mentioned above, if PHP fills an array with all the matches,
you get the best of both worlds.

 So far, this one has always been first. It'll get ugly if it pops up
 somewhere else on some strange webserver.

Well then, it doesn't have to be hard, just use:
^(.*?)\s+(\S*:\s)
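
A short sketch of how that pattern behaves, again in Python as a
stand-in for PHP. The function name status_line and the second sample
string (with the invented server name ExampleServer) are hypothetical
and only there to show that the field order after the status line does
not matter.

```python
import re

# ^(.*?)     everything from the start, as short as possible
# \s+        the whitespace before the first label
# (\S*:\s)   the first "label: " that follows
STATUS = re.compile(r"^(.*?)\s+(\S*:\s)")

def status_line(text):
    """Return the text before the first 'label: ', or None if no label is found."""
    match = STATUS.match(text)
    return match.group(1) if match else None

# Date first, as in the Sambar sample from the original question.
date_first = "HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR"
# Server first; "ExampleServer" is a made-up value for illustration.
server_first = "HTTP/1.1 200 OK Server: ExampleServer Date: Tue, 20 Apr 2004 17:28:23 GMT"

print(status_line(date_first))
print(status_line(server_first))
```

Both calls give back the same status line, since the pattern only
depends on the HTTP part coming first.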

 The order does change, with the exception of HTTP, at least from what
 I've discovered so far.

Well, with multiple regexps, this isn't an issue.  A single TB style
match is more difficult with this restriction.  The only way around it
would be to use If..then statements, but the question becomes: which
is worse?  Running several matches, or processing the matches through
a conditional cascade?

 This might just be the best way to do it.

It certainly is the easiest, though you will probably pay in
performance if every clock cycle counts.

 Yeah, but then I'd have to read at least ten posts telling me to
 Google it. Like I really hadn't thought of that! <grin>

<sigh> That's why we need TBPHP, TBEverything_Under_The_Sun.  You'd be
willing to moderate a few more lists, right? ;-)

-- 
Thanks for writing,
 Januk Aggarwal






http://www.silverstones.com/thebat/TBUDLInfo.html


Regexp. Should be easy I think..

2004-04-20 Thread Leif Gregory
Hello tbtech,

  Ok, I love coding PHP and ASP, but regexps just really kick my
  rear... Don't know why, but they do...

  At any rate, can anyone come up with a regexp to break this down:

  HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close Content-type: text/html

  So that I can echo it out like this:

  HTTP/1.1 200 OK
  Date: Tue, 20 Apr 2004 17:28:23 GMT
  Server: SAMBAR
  Last-modified: Thu, 01 Jan 2004 19:56:39 GMT
  Connection: close
  Content-type: text/html

  I think if I can at least see how to chop it up, then I can get it
  to work with PHP preg_match_all() or preg_match().

  I don't know if I should home in on the colon and then back up to
  the first whitespace, or what.

  My plan is to use if...then...elseif to output them regardless of
  what order they get pulled out of the initial string (i.e. IIS
  switches the order of the items in the original string).
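
  One way the "output them regardless of order" plan could look, as a
  rough sketch in Python rather than PHP. It swaps the
  if...then...elseif cascade for a lookup table, which is a different
  technique from the one described above. The names ORDER and
  ordered_lines are hypothetical, and the split pattern assumes every
  label is followed by a colon and whitespace.

```python
import re

# Fixed output order; labels are looked up case-insensitively.
ORDER = ("Date", "Server", "Last-modified", "Connection", "Content-type")

def ordered_lines(text):
    """Split a one-line header dump and return its lines in ORDER."""
    # Split on whitespace that is immediately followed by "label: ".
    chunks = re.split(r"\s+(?=\S+:\s)", text.strip())
    lines = [chunks[0]]          # assumed to be the HTTP status line
    fields = {}
    for chunk in chunks[1:]:
        name, _, value = chunk.partition(":")   # split at the first colon only
        fields[name.lower()] = value.strip()
    # Emit known labels in the fixed order; labels not in ORDER are dropped.
    for name in ORDER:
        if name.lower() in fields:
            lines.append(f"{name}: {fields[name.lower()]}")
    return lines

sample = ("HTTP/1.1 200 OK Server: SAMBAR Date: Tue, 20 Apr 2004 17:28:23 GMT "
          "Content-type: text/html Connection: close "
          "Last-modified: Thu, 01 Jan 2004 19:56:39 GMT")
print("\n".join(ordered_lines(sample)))
```

  The sample string deliberately puts Server before Date; the output
  order comes from ORDER, not from the input.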

  And yes, I know this is a TB list. The PHP list is less than
  friendly.

  Thanks.
  

Tagline of the day:
It's is not, it isn't ain't, and it's it's, not its, if you
mean it is.  If you don't, it's its.  Then too, it's hers.
It isn't her's.  It isn't our's either.  It's ours, and
likewise yours and theirs.
   -- Oxford University Press, Edpress News



-- 
Leif (TB list moderator and fellow end user).

Using The Bat! 2.10 RC/1 under Windows 2000 5.0
Build 2195 Service Pack 4 on a Pentium 4 2GHz with 512MB











Re: Regexp. Should be easy I think..

2004-04-20 Thread Leif Gregory
Hi Januk,

On Tue, 20 Apr 2004, at 15:16:34 [GMT -0700] (which was 4:16 PM where
I live) you wrote:
JA I'll try. What really helps is if you can determine what parts of
JA your string are constant and which parts change. For example, the
JA Date part is probably always going to be:
JA Date: DAY, DD MMM YYYY HH:mm:ss GMT

I'm pretty sure the HTTP request type will always be first so I can
anchor with ^, but IIS likes to put the SERVER before the DATE, and
Sambar is the reverse. So, my thinking was that if I did the HTTP part
and then moved up to the first : and backed up to the first
whitespace, I could grab the next chunk (either DATE or SERVER) up to
the next : (and then back to the first whitespace), and continue that
until I hit the EOL.

I just wasn't sure what I should do to get it started.


JA (?i)(Date:\s*.*?)((\s\S+:\s)|\z)
JA But increasing accuracy has the price of decreasing tolerance for
JA errors in the string.

Exactly. I didn't check an Apache server (I'll do that tomorrow) to
see how it outputs its HTTP headers. I am looking for something
generic, hence my hoping I could use the : as jump points to back up
from.


JA Note that I'm using TB specific atoms. You may have to modify the
JA syntax of these to work in PHP, I don't know. \s means any white
JA space character. \S means any non-whitespace character. \z is
JA end of subject (independent of multiline settings). The (?i)
JA just sets the regexp to be case-insensitive. I don't know if PHP
JA requires a different method for internal option setting. The +
JA means, match one or more characters of the preceding type.

Right. That shouldn't be a problem. I have a list of the atoms for PHP
and they are close to TB.

JA So in this case, your basic regexp would be:
JA (?i)(Date:\s*.*?)((\s\S*:\s*)|\z)

JA And you could just change the term Date to Server,
JA Last-modified, or Connection as necessary.  The desired
JA information should be in subpattern 1

I had considered that (just doing multiple reg matches), but wondered
if there was a better way. It is a very small script, so it wouldn't
really kill the performance by doing multiple reg matches.
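
A rough sketch of the "one regexp per label" approach, in Python as a
stand-in for PHP. The helper name get_field is hypothetical, and \Z is
Python's spelling of the \z end-of-subject atom. Note that the sketch
requires whitespace after the terminating colon (\S+:\s). With \S*:\s*
instead, the lazy match would stop at the first colon of the time
(17:28:23) and truncate the Date value to "Date: Tue, 20 Apr 2004".

```python
import re

# Sample header string from the original question, joined onto one line.
HEADERS = ("HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT "
           "Server: SAMBAR Last-modified: Thu, 01 Jan 2004 19:56:39 GMT "
           "Connection: close Content-type: text/html")

def get_field(label, text):
    """Return 'Label: value' for one header label, or None if it is absent."""
    # (?i)              case-insensitive
    # (label:\s*.*?)    subpattern 1: the label and the shortest possible value
    # (?:\s\S+:\s|\Z)   stop before the next "label: " or at end of string
    pattern = r"(?i)(" + re.escape(label) + r":\s*.*?)(?:\s\S+:\s|\Z)"
    match = re.search(pattern, text)
    return match.group(1) if match else None

for label in ("Date", "Server", "Last-modified", "Connection", "Content-type"):
    print(get_field(label, HEADERS))
```

Because each label is searched for separately, the order of the fields
in the string does not matter, at the cost of one match per label.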


JA Now the HTTP one is a bit trickier. If you know that the HTTP
JA section is always first, and the next field is always the date
JA field, then your easiest bet is:

So far, this one has always been first. It'll get ugly if it pops up
somewhere else on some strange webserver.


JA That seems fairly reasonable. What would also work is if you
JA definitely know the order that the tokens will be listed in. Then
JA you could search for everything between two labels.

The order does change, with the exception of HTTP, at least from what
I've discovered so far.

JA No need to do that if you do a few simple searches instead of one
JA complex one.

This might just be the best way to do it.

JA But they would have a better shot at correct syntax... ;-)

Yeah, but then I'd have to read at least ten posts telling me to
Google it. Like I really hadn't thought of that! <grin>


Thank you very much for the help.



-- 
Cheers,
Leif Gregory 

List Moderator (and fellow registered end-user)
PCWize Editor  /  ICQ 216395  /  PGP Key ID 0x7CD4926F
Web Site http://www.PCWize.com
TB FAQ   http://www.silverstones.com/thebat/FAQ.html
Using The Bat! 2.05 Beta/16 under Windows 2000 5.0 Build 2195 Service Pack 4 
on a P4 1.6Ghz OC'd to 2.32Ghz with 512MB.

Tagline of the day:
A bad day: Transfer completed (5720468 bytes, 56651 errors, 1 CPS)





