Re: Regexp. Should be easy I think..
Hello Leif,

On Tuesday, April 20, 2004 at 20:13 GMT -0600, a stampede was started
when Leif Gregory hollered:

It just occurred to me: if you're using PHP, can't you get all the info
you need without using regexps? I seem to recall that there are built-in
functions that can get you pretty much any info you want, though I can't
find it in the phpinfo() function...

LG I'm pretty sure the HTTP request type will always be first so I can
LG anchor with ^, but IIS likes to put the SERVER before the DATE, and
LG Sambar is the reverse. So, my thinking was that if I did the HTTP
LG part and then moved up to the first : and backed up to the first
LG whitespace, I could grab the next chunk (either DATE or SERVER) up
LG to the next : (and then back to the first whitespace), and continue
LG that until I hit the EOL.

You can do that with the chunk that I wrote. What I don't know is how
PHP handles regexps. I know in VBScript, when you do a regexp, all
possible matches are stored in an array, so it is pretty easy to get out
all the parts you want. In TB, that isn't the case, so I tend to forget
about that option. If PHP will populate an array, then you're golden.
The regexp could be fairly simple.

JA (?i)(Date:\s*.*?)((\s\S+:\s)|\z)

JA But increasing accuracy has the price of decreasing tolerance for
JA errors in the string.

LG Exactly.

The way I wrote the above regexp, you should be pretty accurate without
losing any generality.

LG I didn't check an Apache server (I'll do that tomorrow) to see how
LG it outputs its HTTP headers. I am looking for something generic,
LG hence my hoping I could use the : as jump points to back up from.

If you really want to do that, you should use a look-ahead assertion.
Something like:

(\S*:\s*.*?)\s(?=\S*:\s)

I haven't tried this in PHP, but in principle it should work.

LG Right. That shouldn't be a problem. I have a list of the atoms for
LG PHP and they are close to TB.

Excellent. Do you mind sending me either a link or the list (off list if
you like)? I was slowly learning some PHP stuff myself, so that could be
very useful.

LG I had considered that (just doing multiple reg matches), but
LG wondered if there was a better way. It is a very small script, so it
LG wouldn't really kill the performance by doing multiple reg matches.

Like I mentioned above, if PHP fills an array with all the matches, you
get the best of both worlds.

LG So far, this one has always been first. It'll get ugly if it pops up
LG somewhere else on some strange webserver.

Well then, it doesn't have to be hard, just use:

^(.*?)\s+(\S*:\s)

LG The order does change, with the exception of HTTP, as far as I've
LG discovered so far anyway.

Well, with multiple regexps, this isn't an issue. A single TB-style
match is more difficult with this restriction. The only way around it
would be to use If..then statements, but the question becomes: which is
worse? Running several matches, or processing the matches through a
conditional cascade?

LG This might just be the best way to do it.

It certainly is the easiest, though you will probably pay in performance
if every clock cycle counts.

LG Yeah, but then I'd have to read at least ten posts telling me to
LG Google it. Like I really hadn't thought of that! (grin)

(sigh) That's why we need TBPHP, TBEverything_Under_The_Sun. You'd be
willing to moderate a few more lists, right? ;-)

--
Thanks for writing,
Januk Aggarwal

http://www.silverstones.com/thebat/TBUDLInfo.html
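The look-ahead pattern and the status-line pattern from the message above can be tried out as follows. This is only a sketch: it uses Python's re module as a stand-in (the thread never shows PHP code), the function names are made up for illustration, and the FIELD_OR_END variant is an assumption added here, not something from the thread. Python's findall returns every match as a list, which is the "array of all matches" behaviour the message hopes for; on the PHP side the thread's own candidate for that is preg_match_all().

```python
import re

# Sample header string from the original question, flattened onto one line.
RAW = (
    "HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR "
    "Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close "
    "Content-type: text/html"
)

# Status line pattern from the message: everything before the first "Label: ".
STATUS = re.compile(r"^(.*?)\s+(\S*:\s)")

# Look-ahead pattern from the message, unchanged.
FIELD = re.compile(r"(\S*:\s*.*?)\s(?=\S*:\s)")

# Assumption, not from the thread: let the look-ahead also accept the end of
# the subject so the last field is captured. Python spells this \Z; PCRE
# spells it \z.
FIELD_OR_END = re.compile(r"(\S*:\s*.*?)(?=\s\S*:\s|\Z)")


def status_line(raw):
    """Return the text before the first label, or None if nothing matches."""
    match = STATUS.match(raw)
    return match.group(1) if match else None


def fields(raw, pattern=FIELD):
    """Return every 'Label: value' chunk the pattern finds, as a list."""
    return pattern.findall(raw)


if __name__ == "__main__":
    print(status_line(RAW))
    for chunk in fields(RAW, FIELD_OR_END):
        print(chunk)
```

With the pattern exactly as posted, the final field (Content-type here) is not returned, because nothing follows it for the look-ahead to see. The time of day does not cause a false split, since its colons are followed by digits rather than whitespace.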
Regexp. Should be easy I think..
Hello tbtech,

Ok, I love coding PHP and ASP, but regexps just really kick my rear...
Don't know why, but they do...

At any rate, can anyone come up with a regexp to break this down:

HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close Content-type: text/html

So that I can echo it out like this:

HTTP/1.1 200 OK
Date: Tue, 20 Apr 2004 17:28:23 GMT
Server: SAMBAR
Last-modified: Thu, 01 Jan 2004 19:56:39 GMT
Connection: close
Content-type: text/html

I think if I can at least see how to chop it up, then I can get it to
work with PHP preg_match_all() or preg_match(). I don't know if I should
home in on the colon and then back up to the first whitespace, or what.

My plan is to use if...then...elseif to output them regardless of what
order they get pulled out of the initial string (i.e. IIS switches the
order of the items in the original string).

And yes, I know this is a TB list. The PHP list is less than friendly.

Thanks.

Tagline of the day:
It's is not, it isn't ain't, and it's it's, not its, if you mean it is.
If you don't, it's its. Then too, it's hers. It isn't her's. It isn't
our's either. It's ours, and likewise yours and theirs.
-- Oxford University Press, Edpress News

--
Leif (TB list moderator and fellow end user).

Using The Bat! 2.10 RC/1 under Windows 2000 5.0
Build 2195 Service Pack 4 on a Pentium 4 2GHz with 512MB
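One way to read "home in on the colon and then back up to the first whitespace" is to split the string at any whitespace that is immediately followed by a label and a colon. The sketch below does that in Python (a stand-in chosen for illustration, not the PHP the question asks about). It also swaps the planned if...then...elseif cascade for a dictionary lookup keyed on the lowercased label. All function names, the ORDER tuple, and RAW_REORDERED are hypothetical; RAW_REORDERED is just the sample with Server and Date swapped by hand, not captured IIS output.

```python
import re

# Sample from the question (Sambar order), flattened onto one line.
RAW_SAMBAR = (
    "HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR "
    "Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close "
    "Content-type: text/html"
)

# Hypothetical: the same sample with Server and Date swapped by hand.
RAW_REORDERED = (
    "HTTP/1.1 200 OK Server: SAMBAR Date: Tue, 20 Apr 2004 17:28:23 GMT "
    "Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close "
    "Content-type: text/html"
)

# Output order chosen for this sketch; labels are compared in lowercase.
ORDER = ("date", "server", "last-modified", "connection", "content-type")


def split_headers(raw):
    """Split at whitespace that is followed by 'Label:' plus whitespace."""
    return re.split(r"\s+(?=\S+:\s)", raw.strip())


def parse(raw):
    """Return (status line, {lowercased label: original 'Label: value'})."""
    status, *chunks = split_headers(raw)
    by_label = {}
    for chunk in chunks:
        label = chunk.partition(":")[0]
        by_label[label.lower()] = chunk
    return status, by_label


def render(raw, order=ORDER):
    """Return one header per line, in a fixed order, skipping absent ones."""
    status, by_label = parse(raw)
    lines = [status]
    lines.extend(by_label[key] for key in order if key in by_label)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(RAW_REORDERED))
```

Both sample strings render to the same six lines, so the output no longer depends on the order in which the server listed the fields.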
Re: Regexp. Should be easy I think..
Hi Januk,

On Tue, 20 Apr 2004, at 15:16:34 [GMT -0700] (which was 4:16 PM where I
live) you wrote:

JA I'll try. What really helps is if you can determine what parts of
JA your string are constant and which parts change. For example, the
JA Date part is probably always going to be:

JA Date: DAY, DD MMM HH:mm:ss GMT

I'm pretty sure the HTTP request type will always be first so I can
anchor with ^, but IIS likes to put the SERVER before the DATE, and
Sambar is the reverse. So, my thinking was that if I did the HTTP part
and then moved up to the first : and backed up to the first whitespace,
I could grab the next chunk (either DATE or SERVER) up to the next :
(and then back to the first whitespace), and continue that until I hit
the EOL. I just wasn't sure what I should do to get it started.

JA (?i)(Date:\s*.*?)((\s\S+:\s)|\z)

JA But increasing accuracy has the price of decreasing tolerance for
JA errors in the string.

Exactly. I didn't check an Apache server (I'll do that tomorrow) to see
how it outputs its HTTP headers. I am looking for something generic,
hence my hoping I could use the : as jump points to back up from.

JA Note that I'm using TB specific atoms. You may have to modify the
JA syntax of these to work in PHP, I don't know. \s means any white
JA space character. \S means any non-whitespace character. \z is
JA end of subject (independent of multiline settings). The (?i)
JA just sets the regexp to be case-insensitive. I don't know if php
JA requires a different method for internal option setting. The +
JA means, match one or more characters of the preceding type.

Right. That shouldn't be a problem. I have a list of the atoms for PHP
and they are close to TB.

JA So in this case, your basic regexp would be:

JA (?i)(Date:\s*.*?)((\s\S*:\s*)|\z)

JA And you could just change the term Date to Server,
JA Last-modified, or Connection as necessary. The desired
JA information should be in subpattern 1

I had considered that (just doing multiple reg matches), but wondered if
there was a better way. It is a very small script, so it wouldn't really
kill the performance by doing multiple reg matches.

JA Now the HTTP one is a bit trickier. If you know that the HTTP
JA section is always first, and the next field is always the date
JA field, then your easiest bet is:

So far, this one has always been first. It'll get ugly if it pops up
somewhere else on some strange webserver.

JA That seems fairly reasonable. What would also work is if you
JA definitely know the order that the tokens will be listed in. Then
JA you could search for everything between two labels.

The order does change, with the exception of HTTP, as far as I've
discovered so far anyway.

JA No need to do that if you do a few simple searches instead of one
JA complex one.

This might just be the best way to do it.

JA But they would have a better shot at correct syntax... ;-)

Yeah, but then I'd have to read at least ten posts telling me to Google
it. Like I really hadn't thought of that! (grin)

Thank you very much for the help.

--
Cheers,
Leif Gregory

List Moderator (and fellow registered end-user)
PCWize Editor / ICQ 216395 / PGP Key ID 0x7CD4926F
Web Site http://www.PCWize.com
TB FAQ http://www.silverstones.com/thebat/FAQ.html

Using The Bat! 2.05 Beta/16 under Windows 2000 5.0
Build 2195 Service Pack 4 on a P4 1.6GHz OC'd to 2.32GHz with 512MB.

Tagline of the day:
A bad day: Transfer completed (5720468 bytes, 56651 errors, 1 CPS)
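The "few simple searches" approach quoted in this message, one pattern per label with the result in subpattern 1, might look like the sketch below. It is written in Python as a stand-in for PHP, the helper names are hypothetical, and \z is written \Z because that is how Python's re spells end of subject. The quoted messages contain two variants of the pattern. Run against the sample string, the one ending in (\s\S*:\s*) stops at the first colon inside the time of day, so the sketch builds on the (\s\S+:\s) variant and keeps the other only for comparison.

```python
import re

# Sample header string from the original question, flattened onto one line.
RAW = (
    "HTTP/1.1 200 OK Date: Tue, 20 Apr 2004 17:28:23 GMT Server: SAMBAR "
    "Last-modified: Thu, 01 Jan 2004 19:56:39 GMT Connection: close "
    "Content-type: text/html"
)

# The second variant quoted in the thread, kept only to show its behaviour.
# Its trailing \s* lets " 17:" count as the start of the next label.
LOOSE_DATE = re.compile(r"(?i)(Date:\s*.*?)((\s\S*:\s*)|\Z)")


def field_pattern(label):
    """Build the (\\s\\S+:\\s) variant of the thread's pattern for one label."""
    return re.compile(
        r"(?i)(" + re.escape(label) + r":\s*.*?)((\s\S+:\s)|\Z)"
    )


def get_field(raw, label):
    """Return subpattern 1 ('Label: value') or None if the label is absent."""
    match = field_pattern(label).search(raw)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in ("Date", "Server", "Last-modified", "Connection", "Content-type"):
        print(get_field(RAW, name))
    print(LOOSE_DATE.search(RAW).group(1))
```

Each lookup is independent, so the order in which the server lists Date and Server does not matter. The cost is one scan of the string per label.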