Re: Regex Question
* rahlqu...@gmail.com : > As said before blocking at the MTA would be less resource intensive but I > want the whole message to feed bayes. But you already KNOW you don't want that stuff :) No need to poison your bayesdb with that... > As for Ralf and his lightly gruff response, its to be expected when > asking for help on the net and I grew my thick skin 8 years ago asking > questions on setting up SMTP Auth on the Sendmail list. Compared to > some of the folks there Ralf nearly blew me a kiss. I was not trying to be rude. I just want to keep things simple. -- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebra...@charite.de | http://www.charite.de
Re: Regex Question
On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote:

> Thanks! Your earlier Regex is in place and doing quite well.

Pleased to be of service.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174
pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Individual liberties are always "loopholes" to absolute authority.
---
Tomorrow: Veterans Day
Re: Regex Question
On Tue, Nov 10, 2009 at 3:57 PM, John Hardin wrote:

> On Tue, 10 Nov 2009, Ralf Hildebrandt wrote:
>
>>> On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:
>>>> * rahlqu...@gmail.com :
>>>>> Ok regex is not my strong suit by any means. Trying to get a match for
>>>>> email addresses that start with a pipe character ( about 15% of my spam is
>>>>> this ).
>>>>
>>>> That's not needed. Why are you accepting mail to NON-EXISTING recipients at all?
>
> {snip}
>
>> He's generating throwaway addresses to find out who's selling these
>> contact addresses.
>
> In that case, depending on the MTA logging, perhaps he could still disable
> catchall and then troll the logs to see which invalid addresses were
> attempted.
>
> ...or does _no_ modern MTA log the recipient addresses it rejects? I
> haven't actually looked... :)

John,

Thanks! Your earlier Regex is in place and doing quite well. As said before, blocking at the MTA would be less resource intensive, but I want the whole message to feed bayes. The emails that do make it through the specified layout I described before drop into an account I can search or dump as I wish, and sometimes I even eke out a wanted email, ad, or Pizza coupon (solicited). This all works well, and if I am at a store and they request an email addy it's really easy to just give them one.

As for Ralf and his lightly gruff response, it's to be expected when asking for help on the net, and I grew my thick skin 8 years ago asking questions on setting up SMTP Auth on the Sendmail list. Compared to some of the folks there Ralf nearly blew me a kiss.

Thanks for all the help and y'all be good to each other.
Re: Regex Question
Ralf Hildebrandt wrote: > * Benny Pedersen : >> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote >>> Please keep this in your mind in future before trotting out that tired >>> old gas. >> imho Ralf have never being banned in maillist here, if you dont like >> his answers just unsubscribe > > Good point, but richard has been banned multiple times on the postfix > list for asocial behaviour... Be careful, Ralf, else you risk inciting richard to reappear on the SA list as another fictitious user and start his flaming rants and raves again, as he has done in the past... Bill
RE: Regex Question
some centos people are having a pub party and the "kings and queens" in london it might be over already based upon time difference from usa maybe all of you could go there and drink beer and duke it out or something constructive ;-> - rh
Re: Regex Question
* John Hardin :

> In that case, depending on the MTA logging, perhaps he could still
> disable catchall and then troll the logs to see which invalid
> addresses were attempted.

Or block the mail to any recipient starting with "|". In Postfix that could be done with

check_recipient_access regexp:/etc/postfix/blocked_recipients

with /etc/postfix/blocked_recipients:

/^\|/ REJECT

I would weed that absolutely unwanted stuff out at the MTA level to keep resource usage low (bandwidth, mostly).

> ...or does _no_ modern MTA log the recipient addresses it rejects? I
> haven't actually looked... :)

I've seen sendmail & postfix log the non-existing addresses.

--
Ralf Hildebrandt
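For readers who want to see what the /^\|/ pattern in the table above accepts and refuses, here is a minimal sketch in Python. It is only a stand-in: Postfix regexp tables use their own regex engine, and the sketch assumes the table is looked up with the full recipient address. The addresses and the would_reject helper are hypothetical.

```python
import re

# Same idea as the table entry /^\|/ : a literal pipe at the very start.
LEADING_PIPE = re.compile(r"^\|")


def would_reject(recipient: str) -> bool:
    """Return True if the recipient address starts with a pipe character."""
    return LEADING_PIPE.search(recipient) is not None


# Hypothetical recipient addresses, for illustration only.
for rcpt in ["|bogus@example.org", "user@example.org", "us|er@example.org"]:
    verdict = "reject" if would_reject(rcpt) else "pass"
    print(f"{rcpt}: {verdict}")
```

Because the pattern is anchored with ^, a pipe anywhere else in the address is left alone.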
Re: Regex Question
On Tue, 10 Nov 2009, Ralf Hildebrandt wrote: On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote: * rahlqu...@gmail.com : Ok regex is not my strong suit by any means. Trying to get a match for email addresses that start with a pipe character ( about 15% of my spam is this ). That's not needed. Why are you accepting mail to NON-EXISTING recipients at all? {snip} He's generating throwaway addresses to find out who's selling these contact addresses. In that case, depending on the MTA logging, perhaps he could still disable catchall and then troll the logs to see which invalid addresses were attempted. ...or does _no_ modern MTA log the recipient addresses it rejects? I haven't actually looked... :) -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Perfect Security and Absolute Safety are unattainable; beware those who would try to sell them to you, regardless of the cost, for they are trying to sell you your own slavery. --- Tomorrow: Veterans Day
Re: Regex Question
* Matus UHLAR - fantomas :

> Ralf's question was in no way offensive. He is just trying to solve the
> problem by way that is most efficient for most of e-mail users and admins.

What the OP intends to do ("Who's selling away my addresses?") can be done in the MTA entirely. A colleague at tu-bs.de did that over 15 years ago by simply "increasing" a numerical portion in his email addresses.

Problem being: making an address "valid". If I define bahn...@example.org as a contact address when contacting "bahn.de", then I have to have some sort of database of WHICH addresses have been "used", and which have been "abused" (targeted by anybody BUT bahn.de senders).

To avoid this DB he simply made all addresses valid and forwarded them to his real address (or something like that with a filter in between).

Isn't there an automatic tool for this?

--
Ralf Hildebrandt
Re: Regex Question
* Benny Pedersen : > On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote > >Please keep this in your mind in future before trotting out that tired > >old gas. > > imho Ralf have never being banned in maillist here, if you dont like > his answers just unsubscribe Good point, but richard has been banned multiple times on the postfix list for asocial behaviour... -- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebra...@charite.de | http://www.charite.de
Re: Regex Question
* rich...@buzzhost.co.uk : > On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote: > > * rahlqu...@gmail.com : > > > Ok regex is not my strong suit by any means. Trying to get a match for > > > email > > > addresses that start with a pipe character ( about 15% of my spam is this > > > ). > > > > That's not needed. Why are you accepting mail to NON-EXISTING > > recipients at all? > > > Ralf, may I ask, do you predictably trot this offensive answer out all > the time for fun, or just because you are bored? If you make your system accept mail for non existing addresses, then you can do all kinds of useful research, but then you also usually know how to handle stuff you REALLY don't want to receive. In the OP's case (like he said in a PM), it's probably better to block RCPT TO:<|.*> on the MTA level. He's generating throwaway addresses to find out who's selling these contact addresses. > FYI, the last time I looked it was not a criminal offence to use a catch > all, unless the law is different in Germany? I fail to see how that matters, since he's not in Germany. And it's not. > I make heavy use of catchalls for spam tracking using 'balloon race' and > watermarking. I may, however, wish to skew and filter some combinations > despite running catch all. Makes perfect sense. > Please keep this in your mind in future before trotting out that tired > old gas. For everybody but the old scientific anti-spam geek in his/her sekrit lab it's really safer to just block mail to non-existing recipients. We're still getting enough spam that way. -- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebra...@charite.de | http://www.charite.de
Re: Regex Question
From: Sent: Tuesday, 2009/November/10 09:14 On Tue, 2009-11-10 at 11:45 -0500, Alex wrote: >> imho Ralf have never being banned in maillist here, if you dont like >> his answers just unsubscribe >> > Trotting out useless, pointless, tardy, curt, terse replies benefit > nobody at all and makes the poster look arrogant especially when the > answer is mere opinion. I sometimes welcome the terse replies; it illicit's clarification from the OP. I hardly think Ralf is interested in wasting his time playing games on this mailing list. Even if it were true, I think Ralf has also earned the ability to be a bit arrogant. Regards, Alex ... Rather than let this drift into a hijacked free-for-all perhaps one of the guru's of REGEX here would actually like to answer the OP's question. This is a human being asking for help. I don't know the answer myself or I would. I'm guessing that escaping the pipe \| does not work? Condescendingly pats the youngster on the head, "It's too late, boy. Stop yourself before it is too late." {^_^} (I get to do that at my age to most of the people on the net. {^_-})
Re: Regex Question
From: Sent: Tuesday, 2009/November/10 08:27 On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote: On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote > Please keep this in your mind in future before trotting out that tired > old gas. imho Ralf have never being banned in maillist here, if you dont like his answers just unsubscribe Trotting out useless, pointless, tardy, curt, terse replies benefit nobody at all and makes the poster look arrogant especially when the answer is mere opinion. The OP asked a perfectly civil question that did not warrant such a tired, rude old skool style micro flaming. It does not make someone look superior or 'clever' to offer such a response, it simply makes them look like a backside lacking in social skills. Your support for the response is duly noted, but there is no love lost between us in any case. 1) Justifying your curt thoughtless reply is adding noise to the list. (That's just a thought to bear in mind here.) 2) The way the question was asked I almost made exactly the same reply. With the number of replies present, I stayed silent. Fuggheadedness (note gg not ck, different things) draws me out sometimes, though. 3) Once the question was asked properly an answer useful for you was forthcoming. Should that be a wake-up call for you to ask your questions with a little more detail about why and what you are trying to do. {^_^}
Re: Regex Question
On Tue, 10 Nov 2009, rich...@buzzhost.co.uk wrote: Rather than let this drift into a hijacked free-for-all perhaps one of the guru's of REGEX here would actually like to answer the OP's question. If you hadn't gotten distracted by your multiple nemeses you would have noticed I've done so. :) -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- North Korea: the only country in the world where people would risk execution to flee to communist China. -- Ride Fast --- Tomorrow: Veterans Day
Re: Regex Question
On Tue, 2009-11-10 at 11:45 -0500, Alex wrote: > >> imho Ralf have never being banned in maillist here, if you dont like > >> his answers just unsubscribe > >> > > Trotting out useless, pointless, tardy, curt, terse replies benefit > > nobody at all and makes the poster look arrogant especially when the > > answer is mere opinion. > > I sometimes welcome the terse replies; it illicit's clarification from > the OP. I hardly think Ralf is interested in wasting his time playing > games on this mailing list. Even if it were true, I think Ralf has > also earned the ability to be a bit arrogant. > > Regards, > Alex I don't think that being plain bloody rude is playing games, and it's surprisingly common output - not just from Ralf, but from that Postfix set who seem to place some extreme value on their self importance. Alex, you hold Ralf in high regard and that is noble. There are many people I hold in high regard, but I base it on a process of merit, pivotal to which is how they treat 'little' people asking perfectly polite questions. In my eyes it is perfectly acceptable to challenging people who seem to have lost their place in reality, when they treat people in such a negative way. The terse answer given was nothing more than opinion. There are clearly occasions when accepting mail for @domain is a perfectly legitimate thing to do, provided, of course you don't bounce it after accepting it. Rather than let this drift into a hijacked free-for-all perhaps one of the guru's of REGEX here would actually like to answer the OP's question. This is a human being asking for help. I don't know the answer myself or I would. I'm guessing that escaping the pipe \| does not work?
Re: Regex Question
> On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote: > > * rahlqu...@gmail.com : > > > Ok regex is not my strong suit by any means. Trying to get a match for > > > email > > > addresses that start with a pipe character ( about 15% of my spam is this > > > ). > > > > That's not needed. Why are you accepting mail to NON-EXISTING > > recipients at all? On 10.11.09 14:26, rich...@buzzhost.co.uk wrote: > Ralf, may I ask, do you predictably trot this offensive answer out all > the time for fun, or just because you are bored? Ralf's question was in no way offensive. He is just trying to solve the problem by way that is most efficient for most of e-mail users and admins. > FYI, the last time I looked it was not a criminal offence to use a catch > all, unless the law is different in Germany? And it is not criminal offence to ask why is someone using using catch-all. Maybe the OP DOES want to use catch-all for this reason. Maybe the OP does NOT need catch-all. We can find this out by asking the poster WHY. > I make heavy use of catchalls for spam tracking using 'balloon race' and > watermarking. I may, however, wish to skew and filter some combinations > despite running catch all. you are, others are not. > Please keep this in your mind in future before trotting out that tired > old gas. Please keep that above in your mind before you start accusing people of being trolls and thus behaving exactly as troll. -- Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/ Warning: I wish NOT to receive e-mail advertising to this address. Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu. Atheism is a non-prophet organization.
Re: Regex Question
On Tue, Nov 10, 2009 at 11:49 AM, John Hardin wrote: > On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote: > > On Tue, Nov 10, 2009 at 9:09 AM, John Hardin wrote: >> >> * rahlqu...@gmail.com : >>> Ok regex is not my strong suit by any means. Trying to get a match > for email addresses that start with a pipe character ( about 15% of my > spam is this ). > >>> Richard, could you post the headers from one such to pastebin so we can >>> see >>> exactly what you're talking about? >>> >> >> Here you are John; >> http://pastebin.com/m733a7113 >> >> And no, I do indeed mean sent to. >> > > Okay. > > Comment: it would be better to catch and reject these at the MTA level, if > at all possible. I'm sure one of the Postfix admins could suggest how to do > so. > > How about this? > > header ENV_TO_BAR Received =~ / for <\|/ > > You don't need to match the entire address syntax. > > You might want to tighten it up a tiny bit (assuming the headers weren't > sanitized): > > header ENV_TO_BAR Received =~ / by dark\.pcsites\.com .* for <\|/ > > > -- > I could reject at the MTA but I want it to help me to filter and train bayes, many of these are going to multiple users. I'll give these a whack and see if anything squeaks! Thanks!
Re: Regex Question
>> I sometimes welcome the terse replies; it illicit's clarification from the >> OP. > > ITYM "elicits". Heh, yes, thanks. I don't think they're involved in some illicit sex scandal :-) In either case, the apostrophe was wrong, too. Working on getting a new toolchain compiled and working straight since 4pm yesterday :-) Thanks, Alex
Re: Regex Question
On 10-Nov-2009, at 09:27, rich...@buzzhost.co.uk wrote: > On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote: >> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote >>> Please keep this in your mind in future before trotting out that tired >>> old gas. >> >> imho Ralf have never being banned in maillist here, if you dont like >> his answers just unsubscribe >> > Trotting out useless, pointless, tardy, curt, terse replies benefit > nobody at all and makes the poster look arrogant especially when the > answer is mere opinion. I think you need to grow a thicker skin. > but there is no love lost between us in any case. Ah, there's the reason you wigged out. -- a freudian slip is when you say one thing but you're really thinking about a mother. no, a freudian slip is sexy underwear your mother wears
Re: Regex Question
On Tue, 10 Nov 2009, Alex wrote: imho Ralf have never being banned in maillist here, if you dont like his answers just unsubscribe Trotting out useless, pointless, tardy, curt, terse replies benefit nobody at all and makes the poster look arrogant especially when the answer is mere opinion. I sometimes welcome the terse replies; it illicit's clarification from the OP. ITYM "elicits". -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Our government should bear in mind the fact that the American Revolution was touched off by the then-current government attempting to confiscate firearms from the people. --- Tomorrow: Veterans Day
Re: Regex Question
On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote:

> On Tue, Nov 10, 2009 at 9:09 AM, John Hardin wrote:
>
>>> * rahlqu...@gmail.com :
>>> Ok regex is not my strong suit by any means. Trying to get a match
>>> for email addresses that start with a pipe character ( about 15% of my
>>> spam is this ).
>>
>> Richard, could you post the headers from one such to pastebin so we can
>> see exactly what you're talking about?
>
> Here you are John;
> http://pastebin.com/m733a7113
>
> And no, I do indeed mean sent to.

Okay.

Comment: it would be better to catch and reject these at the MTA level, if at all possible. I'm sure one of the Postfix admins could suggest how to do so.

How about this?

header ENV_TO_BAR Received =~ / for <\|/

You don't need to match the entire address syntax.

You might want to tighten it up a tiny bit (assuming the headers weren't sanitized):

header ENV_TO_BAR Received =~ / by dark\.pcsites\.com .* for <\|/

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174
pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
I have never learned to fight for my freedom. I was only good at enjoying it.
-- Dutchman Oscar van den Boogaard, showing why Europe is doomed
---
Tomorrow: Veterans Day
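The two ENV_TO_BAR patterns above can be tried out away from SpamAssassin. The sketch below uses Python's re module as a stand-in for the Perl engine (the syntax used here is the same in both). The Received headers are made up for illustration, not taken from the pastebin, and are assumed to be unfolded onto a single line.

```python
import re

# The loose and the tightened pattern from the two header rules.
ENV_TO_BAR = re.compile(r" for <\|")
ENV_TO_BAR_TIGHT = re.compile(r" by dark\.pcsites\.com .* for <\|")

# Hypothetical Received headers (invented hosts, IDs and addresses).
piped = ("from mail.example.net (mail.example.net [192.0.2.1]) "
         "by dark.pcsites.com (Postfix) with ESMTP id ABC123 "
         "for <|throwaway@example.org>; Tue, 10 Nov 2009 09:00:00 -0500")
normal = piped.replace("<|throwaway@", "<user@")
other_host = piped.replace("dark.pcsites.com", "mx.example.com")

for name, header in [("piped", piped), ("normal", normal), ("other_host", other_host)]:
    loose = bool(ENV_TO_BAR.search(header))
    tight = bool(ENV_TO_BAR_TIGHT.search(header))
    print(f"{name}: loose={loose} tight={tight}")
```

The tightened form only fires when the hop that recorded the envelope recipient is the named host, which guards against a forged Received line added further upstream.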
Re: Regex Question
>> imho Ralf have never being banned in maillist here, if you dont like >> his answers just unsubscribe >> > Trotting out useless, pointless, tardy, curt, terse replies benefit > nobody at all and makes the poster look arrogant especially when the > answer is mere opinion. I sometimes welcome the terse replies; it illicit's clarification from the OP. I hardly think Ralf is interested in wasting his time playing games on this mailing list. Even if it were true, I think Ralf has also earned the ability to be a bit arrogant. Regards, Alex
Re: Regex Question
On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote: > On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote > > Please keep this in your mind in future before trotting out that tired > > old gas. > > imho Ralf have never being banned in maillist here, if you dont like > his answers just unsubscribe > Trotting out useless, pointless, tardy, curt, terse replies benefit nobody at all and makes the poster look arrogant especially when the answer is mere opinion. The OP asked a perfectly civil question that did not warrant such a tired, rude old skool style micro flaming. It does not make someone look superior or 'clever' to offer such a response, it simply makes them look like a backside lacking in social skills. Your support for the response is duly noted, but there is no love lost between us in any case.
Re: Regex Question
On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote Please keep this in your mind in future before trotting out that tired old gas. imho Ralf have never being banned in maillist here, if you dont like his answers just unsubscribe -- xpoint
Re: Regex Question
On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote: > * rahlqu...@gmail.com : > > Ok regex is not my strong suit by any means. Trying to get a match for email > > addresses that start with a pipe character ( about 15% of my spam is this ). > > That's not needed. Why are you accepting mail to NON-EXISTING > recipients at all? > Ralf, may I ask, do you predictably trot this offensive answer out all the time for fun, or just because you are bored? FYI, the last time I looked it was not a criminal offence to use a catch all, unless the law is different in Germany? I make heavy use of catchalls for spam tracking using 'balloon race' and watermarking. I may, however, wish to skew and filter some combinations despite running catch all. Please keep this in your mind in future before trotting out that tired old gas.
Re: Regex Question
On Tue, 10 Nov 2009, Ralf Hildebrandt wrote: * rahlqu...@gmail.com : Ok regex is not my strong suit by any means. Trying to get a match for email addresses that start with a pipe character ( about 15% of my spam is this ). That's not needed. Why are you accepting mail to NON-EXISTING recipients at all? He may be referring to the From: header, not the envelope header. Richard, could you post the headers from one such to pastebin so we can see exactly what you're talking about? -- John Hardin KA7OHZhttp://www.impsec.org/~jhardin/ jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Tomorrow: Veterans Day
Re: Regex Question
* rahlqu...@gmail.com : > Ok regex is not my strong suit by any means. Trying to get a match for email > addresses that start with a pipe character ( about 15% of my spam is this ). That's not needed. Why are you accepting mail to NON-EXISTING recipients at all? -- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebra...@charite.de | http://www.charite.de
Regex Question
Ok regex is not my strong suit by any means. Trying to get a match for email addresses that start with a pipe character (about 15% of my spam is this). What I have so far is this:

[^a-z0-9]\b[a-z0-9._%+...@[a-z0-9.-]+\.[a-z]{2,4}\b

To me that looks right but it's not hitting. Any other suggestions? I've tried just

\\|\b[a-z0-9._%+...@[a-z0-9.-]+\.[a-z]{2,4}\b

along with dozens of others.

Thanks!

--
Richard Ahlquist
Systems Analyst
http://www.patentlystupid.com
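One detail in the second attempt is worth a look: the doubled backslash before the pipe. Assuming the pattern reaches the regex engine exactly as typed, \\ is a literal backslash and the | that follows is plain alternation, so no pipe is required at all. A small sketch in Python (used as a stand-in for Perl, where the escaping rule is the same), with "abc" standing in for the rest of the pattern:

```python
import re

# "\\|abc" reads as: a literal backslash, OR the text "abc".
DOUBLE = re.compile(r"\\|abc")
# "\|abc" reads as: a literal pipe followed by "abc".
SINGLE = re.compile(r"\|abc")

for text in ["abc", "|abc"]:
    d = DOUBLE.search(text)
    s = SINGLE.search(text)
    print(repr(text),
          "double:", d.group() if d else None,
          "single:", s.group() if s else None)
```

With the doubled backslash the pattern matches "abc" with or without a leading pipe, which is not what the question was after; the single backslash makes the pipe mandatory.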
Re: Regex Question
Nigel Frankcom wrote:

> pointed out by a kind list member, there are various 'flavours' of
> regex. Can anyone tell me which particular flavour I'm best
> concentrating on for SA rules?

man perlre

--
Matthias
Regex Question
Hi All, I've recently invested in some books and software to help me figure out what I *thought* I already knew pretty well (regex). As was pointed out by a kind list member, there are various 'flavours' of regex. Can anyone tell me which particular flavour I'm best concentrating on for SA rules? TIA Nigel
Re: Rule Regex Question.
On Mon, 26 Feb 2007, Nigel Frankcom wrote:

> Can anyone tell me if I need to escape the characters within the
> square braces in the following?
>
> body NF_REM_CHAR1 /remove [*%!+`"£$%^&()_-=#~]/i

A dash indicates a range (e.g. a-z) - if you need a literal dash, it's safest to put it as the first character in the set. Right now you're specifying "any character between _ and =, inclusive".

The ^ would have significance if it was first, but it's not.

You don't have a closing square bracket in the set, which is the third one to worry about.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
[EMAIL PROTECTED]    FALaholic #11174
pgpk -a [EMAIL PROTECTED]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Microsoft is not a standards body.
---
15 days until Albert Einstein's 128th Birthday
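The advice about the dash can be checked quickly. The sketch below uses Python's re module as a stand-in for Perl. One thing it brings out: "_" (code 95) sorts after "=" (code 61), so "_-=" is a descending range, which Python refuses to compile; as far as I know Perl also complains about an invalid range here, but that is worth confirming on your own version. The compiles helper and the DASH_FIRST name are just for this illustration.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the pattern is accepted by the regex compiler."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# "_" is code point 95 and "=" is 61, so "_-=" runs backwards.
print(ord("_"), ord("="))
print("range _-= compiles:", compiles(r"[_-=]"))
print("dash first compiles:", compiles(r"[-_=]"))

# The class from the rule with the dash moved to the front, where it is literal.
DASH_FIRST = re.compile(r'remove [-*%!+`"£$%^&()_=#~]', re.IGNORECASE)
print(bool(DASH_FIRST.search("Remove -")), bool(DASH_FIRST.search("remove x")))
```

None of the other characters in the set need escaping: inside a character class only the dash, a leading ^, the closing bracket and the backslash are special.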
Rule Regex Question.
Hi All,

Can anyone tell me if I need to escape the characters within the square braces in the following?

body     NF_REM_CHAR1 /remove [*%!+`"£$%^&()_-=#~]/i
score    NF_REM_CHAR1 4.0
describe NF_REM_CHAR1 remove chars for URL spams

TIA

Nigel
Re: Advanced regex question - backtracking vs. negative lookaheads
Good point, you're completely right! Thanks for pointing that out... :)

Cheers,
Jeremy

"John Rudd" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

> On Apr 25, 2006, at 6:33 AM, Jeremy Fairbrass wrote:
>
>> /style="[^>]+color:blue/
>
> Just a small note, which may be mostly a digression but:
>
> I don't think the above regex will match that string at all.
>
> The regex, because it has a + instead of a *, requires at least one
> character between the " and color:blue ... your string doesn't have that.
Re: Advanced regex question - backtracking vs. negative lookaheads
On Apr 25, 2006, at 6:33 AM, Jeremy Fairbrass wrote:

> /style="[^>]+color:blue/

Just a small note, which may be mostly a digression but:

I don't think the above regex will match that string at all.

The regex, because it has a + instead of a *, requires at least one character between the " and color:blue ... your string doesn't have that.
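The + versus * point is easy to reproduce. The example string from the original post did not survive in this archive, so the two tags below are assumptions made up for illustration; Python's re module stands in for Perl, and the two behave the same way for these patterns.

```python
import re

PLUS = re.compile(r'style="[^>]+color:blue')   # needs at least one character first
STAR = re.compile(r'style="[^>]*color:blue')   # zero characters is fine

# Hypothetical tags: one with nothing before color:blue, one with a property first.
bare = '<span style="color:blue">'
padded = '<span style="border:0px; color:blue">'

for name, tag in [("bare", bare), ("padded", padded)]:
    print(f"{name}: plus={bool(PLUS.search(tag))} star={bool(STAR.search(tag))}")
```

With + the bare tag is missed, because the class has to consume at least one character before "color:blue" can start.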
Re: Advanced regex question - backtracking vs. negative lookaheads
Thanks guys for the clarifications! My understanding of how regex worked was the same as Bowie's, i.e.:

> My understanding is that with [^"]+ the engine will scan from left to
> right until it finds a quote. Then, in the context of the previous
> regex, it will start backtracking to find a match for "color:blue".

I use the free Regex Coach tool from http://www.weitz.de/regex-coach/ to test my regex, and it works the way Bowie described above, i.e. using backtracking. In other words, using:

/style="[^>]+color:blue/

...the [^>]+ causes the regex to go all the way to the closing > character, then backtracks until it finds the "color:blue" part. This also agrees with what is explained at www.regular-expressions.info which I believe is a reliable guide to Perl regex.

Also, Bowie suggested using laziness instead:

/style="[^"]+?color:blue/

But I believe laziness also uses backtracking, so I'm not sure there is *much* of an advantage of this over the greedy regex shown above. Probably the main advantage of the lazy version would be if there was little or no text between the first quote-mark and the "color:blue" part, and/or lots of text between "color:blue" and the last quote-mark, eg: ...The regex would hit this much quicker using the lazy version than the greedy version. But I'm not sure if there really is a difference, especially if I want to be able to hit on SPAN tags that might have more text before the "color:blue" OR might have more text afterwards. Probably it's six of one and half a dozen of the other, right?!

Why did David describe the lazy version as "slightly less good" than the greedy version?

Incidentally the reason I used [^>]+ rather than [^"]+ was to prevent it from using lots of memory if there was no closing quote - as an alternative to using {1,20}.

In any case, both Bowie and David agree that my first solution using (.(?!color))+ is a really bad idea, and that was the main thing I wanted to know! :)

Thanks,
Jeremy

"Bowie Bailey" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

> David Landgren wrote:
>> Bowie Bailey wrote:
>>
>> [...]
>>
>>>> An alternative solution would be this:
>>>>
>>>> /style="[^>]+color:blue/
>>>
>>> This looks better. It is probably less resource-intensive than
>>> your previous attempt and is definitely easier to read. But why
>>> are you looking for > when you anchor the beginning with a quote?
>>>
>>> How about this:
>>>
>>> /style="[^"]+?color:blue/
>>>
>>> This is also non-greedy, so it will start looking for the
>>> "color:blue" match at the beginning of the string instead of
>>> having the + slurp up everything up to the quote and then
>>> backtracking to find the match.
>>
>> The regexp engine doesn't slurp. It just scans from left to right,
>> noting "I might have to come back here" along the way.
>
> Ok, so "slurp" was a bit of a simplification. :)
>
> My understanding is that with [^"]+ the engine will scan from left to
> right until it finds a quote. Then, in the context of the previous
> regex, it will start backtracking to find a match for "color:blue".
>
> In any case, with the non-greedy quantifier, it will stop looking when
> it finds the first "color:blue" string instead of continuing to the
> end of the string.
>
> --
> Bowie
RE: Advanced regex question - backtracking vs. negative lookaheads
David Landgren wrote:

> Bowie Bailey wrote:
>
> [...]
>
>>> An alternative solution would be this:
>>>
>>> /style="[^>]+color:blue/
>>
>> This looks better. It is probably less resource-intensive than
>> your previous attempt and is definitely easier to read. But why
>> are you looking for > when you anchor the beginning with a quote?
>>
>> How about this:
>>
>> /style="[^"]+?color:blue/
>>
>> This is also non-greedy, so it will start looking for the
>> "color:blue" match at the beginning of the string instead of
>> having the + slurp up everything up to the quote and then
>> backtracking to find the match.
>
> The regexp engine doesn't slurp. It just scans from left to right,
> noting "I might have to come back here" along the way.

Ok, so "slurp" was a bit of a simplification. :)

My understanding is that with [^"]+ the engine will scan from left to right until it finds a quote. Then, in the context of the previous regex, it will start backtracking to find a match for "color:blue".

In any case, with the non-greedy quantifier, it will stop looking when it finds the first "color:blue" string instead of continuing to the end of the string.

--
Bowie
Re: Advanced regex question - backtracking vs. negative lookaheads
Bowie Bailey wrote:

[...]

>> An alternative solution would be this:
>>
>> /style="[^>]+color:blue/
>
> This looks better. It is probably less resource-intensive than your previous attempt and is definitely easier to read. But why are you looking for > when you anchor the beginning with a quote?
>
> How about this:
>
> /style="[^"]+?color:blue/
>
> This is also non-greedy, so it will start looking for the "color:blue" match at the beginning of the string instead of having the + slurp up everything up to the quote and then backtracking to find the match.

The regexp engine doesn't slurp. It just scans from left to right, noting "I might have to come back here" along the way.

> For SA purposes, you may want to limit the search as well.
>
> /style="[^"]{1,20}?color:blue/
>
> This way, it will stop looking after 20 characters. This prevents it from using lots of memory if the quotes aren't closed.

Good point.

>> But this will certainly involve some backtracking, especially if there is even more text after the "color:blue" but before the closing > character, for example the "font-size:small" text.

No it won't. It will scan once and quit. It never encountered any other alternatives that would require backtracking.

David
--
"It's overkill of course, but you can never have too much overkill."
RE: Advanced regex question - backtracking vs. negative lookaheads
Jeremy Fairbrass wrote:
>
> Let's say I want to use regex to search for the phrase "color:blue"
> within a tag as in the example below (just a made-up example
> for the sake of this question):
>
>
>
> In this case, the "color:blue" part is preceded by some other text
> ("border:0px") after the first quote mark, but that preceding text
> could in fact be anything, and I want to allow for the fact that it
> could be anything.
>
> I've read at http://www.regular-expressions.info that it's best to
> avoid backtracking if possible because that is resource-intensive.
>
> So one possible solution would be the following:
>
> /style="(.(?!color))+.color:blue/

This seems to me to be very inefficient. At each point in the string it has to read forward to check for "color".

> In other words, after the first " (quote mark) it looks for any
> character NOT followed by the word "color", and repeats that with the
> + character, until it gets to the actual word "color". I believe this
> results in no (or almost no?) backtracking. But I'm not sure if it's
> resource-intensive anyway, because of the negative lookahead - are
> negative lookaheads particularly resource intensive, when compared to
> backtracking? Is one preferable over the other?
>
> An alternative solution would be this:
>
> /style="[^>]+color:blue/

This looks better. It is probably less resource-intensive than your previous attempt and is definitely easier to read. But why are you looking for > when you anchor the beginning with a quote?

How about this:

/style="[^"]+?color:blue/

This is also non-greedy, so it will start looking for the "color:blue" match at the beginning of the string instead of having the + slurp up everything up to the quote and then backtracking to find the match.

For SA purposes, you may want to limit the search as well.

/style="[^"]{1,20}?color:blue/

This way, it will stop looking after 20 characters. This prevents it from using lots of memory if the quotes aren't closed.

> But this will certainly involve some backtracking, especially if
> there is even more text after the "color:blue" but before the
> closing > character, for example the "font-size:small" text.
>
> So what do you think?! Which way is best, i.e. most efficient or least
> resource-intensive?

--
Bowie
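The effect of the {1,20} bound suggested above can be checked with a small sketch. This uses Python's re module as a stand-in for Perl (the pattern syntax is identical here); the three input strings and the helper name has_blue are made up for the example and are not from the thread.

```python
import re

# Bounded, non-greedy variant suggested above: at most 20 non-quote
# characters may sit between the opening quote and "color:blue".
BOUNDED = re.compile(r'style="[^"]{1,20}?color:blue')


def has_blue(text):
    """True if the bounded pattern finds "color:blue" in text."""
    return BOUNDED.search(text) is not None


# Hypothetical inputs, invented for this sketch.
NEAR = 'style="border:0px; color:blue"'                # 12 characters before "color"
FAR = 'style="' + "margin:0px; " * 3 + 'color:blue"'   # 36 characters before "color"
UNCLOSED = 'style="' + "x" * 5000                      # the quote is never closed

if __name__ == "__main__":
    for name, text in [("near", NEAR), ("far", FAR), ("unclosed", UNCLOSED)]:
        print(name, has_blue(text))
```

The bound is a trade-off: it caps how far the engine looks after an unclosed quote, but it also means a genuine "color:blue" that sits more than 20 characters into the attribute is missed, so the number would need tuning against real mail.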
Re: Advanced regex question - backtracking vs. negative lookaheads
Jeremy Fairbrass wrote:

[...]

> So one possible solution would be the following:
>
> /style="(.(?!color))+.color:blue/

Eeep!

> In other words, after the first " (quote mark) it looks for any character NOT followed by the word "color", and repeats that with the + character, until it gets to the actual word "color". I believe this results in no (or almost no?) backtracking. But I'm not sure if it's resource-intensive anyway, because of the negative lookahead - are negative lookaheads particularly resource intensive, when compared to backtracking? Is one preferable over the other?
>
> An alternative solution would be this:
>
> /style="[^>]+color:blue/
>
> But this will certainly involve some backtracking,

False.

> especially if there is even more text after the "color:blue" but before the closing > character, for example the "font-size:small" text.

Irrelevant. If you get a '>', then you didn't find what you were looking for. If you get a "color:blue", you did. In either case, the regexp quits right there.

You can use perl from the command line to get a good idea of the differences:

# all on one line
perl -Mre=debug -e 'q{} =~ /style="(.(?!color))+.color:blue/'

The above produces scads of output: you can see it inching down the string char by char. On the other hand,

perl -Mre=debug -e 'q{} =~ /style="[^>]+color:blue/'

is *fast*. Slightly less good is

/style=".*?color:blue/

David
--
"It's overkill of course, but you can never have too much overkill."
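As a rough companion to the message above, the sketch below runs the three patterns through Python's re module (same syntax for these patterns) and reports what each one matches. It does not reproduce the step-by-step trace that perl -Mre=debug prints, and it says nothing about speed. The two input strings are assumptions: the original sample tag was lost from the archive, so they are assembled from the property names mentioned in the thread.

```python
import re

# The three patterns discussed in this message, in Python syntax.
PATTERNS = {
    "lookahead": re.compile(r'style="(.(?!color))+.color:blue'),
    "negated class": re.compile(r'style="[^>]+color:blue'),
    "lazy dot": re.compile(r'style=".*?color:blue'),
}

# Hypothetical inputs, invented for this sketch.
INSIDE = '<font style="border:0px; color:blue; font-size:small">'
OUTSIDE = '<font style="border:0px">color:blue</font>'


def results(text):
    """Map each pattern name to the text it matched in `text` (or None)."""
    out = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(text)
        out[name] = m.group(0) if m else None
    return out


if __name__ == "__main__":
    # "color:blue" inside the tag: all three patterns match the same text.
    print(results(INSIDE))
    # "color:blue" only after the closing '>': the negated-class pattern
    # gives up at the '>', while the other two run past it and still match.
    print(results(OUTSIDE))
```

On these inputs only /style="[^>]+color:blue/ stays inside the tag, which is the "if you get a '>', you didn't find what you were looking for" behaviour described above.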
Advanced regex question - backtracking vs. negative lookaheads
Hi all,

I wonder if one of you regex gurus might be able to give me some advice regarding the most efficient way of writing a particular rule.

Let's say I want to use regex to search for the phrase "color:blue" within a tag as in the example below (just a made-up example for the sake of this question):

In this case, the "color:blue" part is preceded by some other text ("border:0px") after the first quote mark, but that preceding text could in fact be anything, and I want to allow for the fact that it could be anything.

I've read at http://www.regular-expressions.info that it's best to avoid backtracking if possible because that is resource-intensive.

So one possible solution would be the following:

/style="(.(?!color))+.color:blue/

In other words, after the first " (quote mark) it looks for any character NOT followed by the word "color", and repeats that with the + character, until it gets to the actual word "color". I believe this results in no (or almost no?) backtracking. But I'm not sure if it's resource-intensive anyway, because of the negative lookahead - are negative lookaheads particularly resource intensive, when compared to backtracking? Is one preferable over the other?

An alternative solution would be this:

/style="[^>]+color:blue/

But this will certainly involve some backtracking, especially if there is even more text after the "color:blue" but before the closing > character, for example the "font-size:small" text.

So what do you think?! Which way is best, i.e. most efficient or least resource-intensive?

Cheers,
Jeremy
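One way to approach the "which is cheaper" question is to measure rather than guess. The sketch below times the two candidate patterns with Python's re and timeit modules. Everything about the test input is an assumption (the tag built by make_tag is hypothetical, as are the helper names), and Python timings do not carry over to Perl or SpamAssassin; it only illustrates the measuring approach.

```python
import re
import timeit

# The two candidate patterns from the question, in Python syntax.
LOOKAHEAD = re.compile(r'style="(.(?!color))+.color:blue')
NEGATED = re.compile(r'style="[^>]+color:blue')


def make_tag(filler_repeats):
    """Build a hypothetical tag with filler text before "color:blue"."""
    filler = "border:0px; " * filler_repeats
    return '<font style="' + filler + 'color:blue; font-size:small">'


def time_pattern(pattern, text, runs=200):
    """Return total seconds for `runs` searches of `pattern` over `text`."""
    return timeit.timeit(lambda: pattern.search(text), number=runs)


if __name__ == "__main__":
    tag = make_tag(200)
    for name, pattern in [("lookahead", LOOKAHEAD), ("negated class", NEGATED)]:
        print(f"{name}: {time_pattern(pattern, tag):.6f}s")
```

Varying the argument to make_tag shows how each pattern's cost grows with the amount of text in front of "color:blue"; the absolute numbers depend on the machine and the engine.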