Re: Regex Question

2009-11-11 Thread Ralf Hildebrandt
* rahlqu...@gmail.com :

> As said before blocking at the MTA would be less resource intensive but I
> want the whole message to feed bayes.

But you already KNOW you don't want that stuff :) No need to poison
your bayesdb with that...

> As for Ralf and his lightly gruff response, it's to be expected when
> asking for help on the net and I grew my thick skin 8 years ago asking
> questions on setting up SMTP Auth on the Sendmail list. Compared to
> some of the folks there Ralf nearly blew me a kiss.

I was not trying to be rude. I just want to keep things simple.

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote:


Thanks! Your earlier Regex is in place and doing quite well.


Pleased to be of service.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Individual liberties are always "loopholes" to absolute authority.
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread rahlquist
On Tue, Nov 10, 2009 at 3:57 PM, John Hardin wrote:

> On Tue, 10 Nov 2009, Ralf Hildebrandt wrote:
>
>>> On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:
>>>
>>>> * rahlqu...@gmail.com :
>>>>
>>>>> Ok regex is not my strong suit by any means. Trying to get a match for
>>>>> email addresses that start with a pipe character ( about 15% of my spam is
>>>>> this ).
>>>>
>>>> That's not needed. Why are you accepting mail to NON-EXISTING recipients
>>>> at all?
>>>
>> {snip}
>>
>> He's generating throwaway addresses to find out who's selling these
>> contact addresses.
>>
>
> In that case, depending on the MTA logging, perhaps he could still disable
> catchall and then troll the logs to see which invalid addresses were
> attempted.
>
> ...or does _no_ modern MTA log the recipient addresses it rejects? I
> haven't actually looked... :)
>
>
> --
>
John,

Thanks! Your earlier Regex is in place and doing quite well.

As said before blocking at the MTA would be less resource intensive but I
want the whole message to feed bayes.

The emails that do make it through the specified layout I described before
drop into an account I can search or dump as I wish and sometimes I even eke
out a wanted email, ad, or Pizza coupon (solicited). This all works well and
if I am at a store and they request an email addy it's really easy to just
give them one.

As for Ralf and his lightly gruff response, it's to be expected when asking
for help on the net and I grew my thick skin 8 years ago asking questions on
setting up SMTP Auth on the Sendmail list. Compared to some of the folks
there Ralf nearly blew me a kiss.

Thanks for all the help and y'all be good to each other.


Re: Regex Question

2009-11-10 Thread Bill Landry
Ralf Hildebrandt wrote:
> * Benny Pedersen :
>> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote
>>> Please keep this in your mind in future before trotting out that tired
>>> old gas.
>> imho Ralf have never being banned in maillist here, if you dont like
>> his answers just unsubscribe
> 
> Good point, but richard has been banned multiple times on the postfix
> list for asocial behaviour...

Be careful, Ralf, else you risk inciting richard to reappear on the SA
list as another fictitious user and start his flaming rants and raves
again, as he has done in the past...

Bill



RE: Regex Question

2009-11-10 Thread R-Elists

some centos people are having a pub party and the "kings and queens" in
london

it might be over already based upon time difference from usa

maybe all of you could go there and drink beer and duke it out or something
constructive

;->

 - rh



Re: Regex Question

2009-11-10 Thread Ralf Hildebrandt
* John Hardin :

> In that case, depending on the MTA logging, perhaps he could still
> disable catchall and then troll the logs to see which invalid
> addresses were attempted.

Or block the mail to any recipient starting with "|".
In postfix that could be done with

check_recipient_access regexp:/etc/postfix/blocked_recipients

with /etc/postfix/blocked_recipients:

/^\|/    REJECT

I would weed that absolutely unwanted stuff out at the MTA level to
keep resource usage low (bandwidth, mostly)
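
To see what that one-line table does to individual recipients, here is a
minimal Python sketch. It is not Postfix itself: it assumes the table is
looked up with the full recipient address, the sample addresses are made up,
and "DUNNO" is used only as shorthand for "no match, carry on with the other
checks". Python's re module is a stand-in for the Postfix regexp syntax.

```python
import re

# Same pattern as in /etc/postfix/blocked_recipients: a literal pipe at the
# very start of the recipient address.
BLOCKED = re.compile(r"^\|")

def action_for(recipient: str) -> str:
    """Return the action this sketch would take for one RCPT TO address."""
    return "REJECT" if BLOCKED.search(recipient) else "DUNNO"

# Hypothetical recipients, for illustration only.
for rcpt in ["|sales@example.org", "sales@example.org", "a|b@example.org"]:
    print(rcpt, "->", action_for(rcpt))
```

Only the first address is rejected; a pipe later in the local part does not
trigger the anchored pattern.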

> ...or does _no_ modern MTA log the recipient addresses it rejects? I
> haven't actually looked... :)

I've seen sendmail & postfix log the non-existing addresses.

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, Ralf Hildebrandt wrote:


On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:

* rahlqu...@gmail.com :
Ok regex is not my strong suit by any means. Trying to get a match 
for email addresses that start with a pipe character ( about 15% of 
my spam is this ).


That's not needed. Why are you accepting mail to NON-EXISTING 
recipients at all?


{snip}


He's generating throwaway addresses to find out who's selling these
contact addresses.


In that case, depending on the MTA logging, perhaps he could still disable 
catchall and then troll the logs to see which invalid addresses were 
attempted.


...or does _no_ modern MTA log the recipient addresses it rejects? I 
haven't actually looked... :)


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Perfect Security and Absolute Safety are unattainable; beware
  those who would try to sell them to you, regardless of the cost,
  for they are trying to sell you your own slavery.
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread Ralf Hildebrandt
* Matus UHLAR - fantomas :

> Ralf's question was in no way offensive. He is just trying to solve the
> problem by way that is most efficient for most of e-mail users and admins.

What the OP intends to do ("Who's selling away my addresses?") can be
done in the MTA entirely. A colleague at tu-bs.de did that over 15
years ago by simply "increasing" a numerical portion in his email
addresses.

Problem being: Making an address "valid" -- If I define:

bahn...@example.org

as a contact address when contacting "bahn.de", then I have to have
some sort of database WHICH addresses have been "used", and which have
been "abused" (targeted by anybody BUT bahn.de senders).

To avoid this DB he simply made all addresses valid and forwarded them
to his real address (or something like that with a filter in between).
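
The bookkeeping described above (which address was handed to whom, and who
is actually sending to it) can be sketched in a few lines of Python. This is
only an illustration under assumptions: the registry, the function names and
the address "bahn-0001@example.org" are hypothetical, and it trusts the
envelope sender, which a spammer can forge.

```python
from typing import Dict, Optional

# Hypothetical registry: throwaway local part -> the one domain it was given to.
issued: Dict[str, str] = {}

def issue(local_part: str, given_to: str) -> str:
    """Record that an address was handed to a single organisation."""
    issued[local_part.lower()] = given_to.lower()
    return f"{local_part}@example.org"

def classify(recipient: str, sender: str) -> str:
    """Return 'unknown', 'used' or 'abused' for one delivery."""
    local_part = recipient.split("@", 1)[0].lower()
    owner: Optional[str] = issued.get(local_part)
    if owner is None:
        return "unknown"          # never handed out by us
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    if sender_domain == owner or sender_domain.endswith("." + owner):
        return "used"             # mail from the organisation it was given to
    return "abused"               # somebody else got hold of the address

addr = issue("bahn-0001", "bahn.de")
print(classify(addr, "newsletter@bahn.de"))
print(classify(addr, "pills@spam.example"))
print(classify("nobody@example.org", "x@bahn.de"))
```

Making every address valid, as the colleague did, amounts to dropping the
"unknown" case and sorting it out in a filter later.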

Isn't there an automatic tool for this?

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Regex Question

2009-11-10 Thread Ralf Hildebrandt
* Benny Pedersen :
> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote
> >Please keep this in your mind in future before trotting out that tired
> >old gas.
> 
> imho Ralf have never being banned in maillist here, if you dont like
> his answers just unsubscribe

Good point, but richard has been banned multiple times on the postfix
list for asocial behaviour...

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Regex Question

2009-11-10 Thread Ralf Hildebrandt
* rich...@buzzhost.co.uk :
> On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:
> > * rahlqu...@gmail.com :
> > > Ok regex is not my strong suit by any means. Trying to get a match
> > > for email addresses that start with a pipe character ( about 15% of
> > > my spam is this ).
> > 
> > That's not needed. Why are you accepting mail to NON-EXISTING
> > recipients at all?
> > 
> Ralf, may I ask, do you predictably trot this offensive answer out all
> the time for fun, or just because you are bored?

If you make your system accept mail for non existing addresses, then you
can do all kinds of useful research, but then you also usually know how to
handle stuff you REALLY don't want to receive. In the OP's case (like he
said in a PM), it's probably better to block RCPT TO:<|.*> on the MTA
level.

He's generating throwaway addresses to find out who's selling these
contact addresses.

> FYI, the last time I looked it was not a criminal offence to use a catch
> all, unless the law is different in Germany?

I fail to see how that matters, since he's not in Germany. And it's not.
 
> I make heavy use of catchalls for spam tracking using 'balloon race' and
> watermarking. I may, however, wish to skew and filter some combinations
> despite running catch all.

Makes perfect sense.

> Please keep this in your mind in future before trotting out that tired
> old gas.

For everybody but the old scientific anti-spam geek in his/her sekrit lab
it's really safer to just block mail to non-existing recipients. We're
still getting enough spam that way. 

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Re: Regex Question

2009-11-10 Thread jdow

From: 
Sent: Tuesday, 2009/November/10 09:14



On Tue, 2009-11-10 at 11:45 -0500, Alex wrote:

>> imho Ralf have never being banned in maillist here, if you dont like
>> his answers just unsubscribe
>>
> Trotting out useless, pointless, tardy, curt, terse replies benefit
> nobody at all and makes the poster look arrogant especially when the
> answer is mere opinion.

I sometimes welcome the terse replies; it illicit's clarification from
the OP. I hardly think Ralf is interested in wasting his time playing
games on this mailing list. Even if it were true, I think Ralf has
also earned the ability to be a bit arrogant.

Regards,
Alex



...

Rather than let this drift into a hijacked free-for-all perhaps one of
the guru's of REGEX here would actually like to answer the OP's
question. This is a human being asking for help. I don't know the answer
myself or I would. I'm guessing that escaping the pipe \| does not work?


Condescendingly pats the youngster on the head, "It's too late, boy. Stop
yourself before it is too late."

{^_^}   (I get to do that at my age to most of the people on the net. {^_-})


Re: Regex Question

2009-11-10 Thread jdow

From: 
Sent: Tuesday, 2009/November/10 08:27



On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote:

On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote
> Please keep this in your mind in future before trotting out that tired
> old gas.

imho Ralf have never being banned in maillist here, if you dont like  
his answers just unsubscribe



Trotting out useless, pointless, tardy, curt, terse replies benefit
nobody at all and makes the poster look arrogant especially when the
answer is mere opinion.

The OP asked a perfectly civil question that did not warrant such a
tired, rude old skool style micro flaming. It does not make someone look
superior or 'clever' to offer such a response, it simply makes them look
like a backside lacking in social skills. Your support for the response
is duly noted, but there is no love lost between us in any case.


1) Justifying your curt thoughtless reply is adding noise to the list.
  (That's just a thought to bear in mind here.)
2) The way the question was asked I almost made exactly the same reply.
  With the number of replies present, I stayed silent. Fuggheadedness
  (note gg not ck, different things) draws me out sometimes, though.
3) Once the question was asked properly an answer useful for you was
  forthcoming. Should that be a wake-up call for you to ask your
  questions with a little more detail about why and what you are trying
  to do.

{^_^}


Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, rich...@buzzhost.co.uk wrote:

Rather than let this drift into a hijacked free-for-all perhaps one of 
the guru's of REGEX here would actually like to answer the OP's 
question.


If you hadn't gotten distracted by your multiple nemeses you would have 
noticed I've done so. :)


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  North Korea: the only country in the world where people would risk
  execution to flee to communist China.  -- Ride Fast
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread rich...@buzzhost.co.uk
On Tue, 2009-11-10 at 11:45 -0500, Alex wrote:
> >> imho Ralf have never being banned in maillist here, if you dont like
> >> his answers just unsubscribe
> >>
> > Trotting out useless, pointless, tardy, curt, terse replies benefit
> > nobody at all and makes the poster look arrogant especially when the
> > answer is mere opinion.
> 
> I sometimes welcome the terse replies; it illicit's clarification from
> the OP. I hardly think Ralf is interested in wasting his time playing
> games on this mailing list. Even if it were true, I think Ralf has
> also earned the ability to be a bit arrogant.
> 
> Regards,
> Alex

I don't think that being plain bloody rude is playing games, and it's
surprisingly common output - not just from Ralf, but from that Postfix
set who seem to place some extreme value on their self importance.

Alex, you hold Ralf in high regard and that is noble. There are many
people I hold in high regard, but I base it on a process of merit,
pivotal to which is how they treat 'little' people asking perfectly
polite questions. In my eyes it is perfectly acceptable to challenge
people who seem to have lost their place in reality, when they treat
people in such a negative way.

The terse answer given was nothing more than opinion. There are clearly
occasions when accepting mail for @domain is a perfectly
legitimate thing to do, provided, of course you don't bounce it after
accepting it.

Rather than let this drift into a hijacked free-for-all perhaps one of
the guru's of REGEX here would actually like to answer the OP's
question. This is a human being asking for help. I don't know the answer
myself or I would. I'm guessing that escaping the pipe \| does not work?










Re: Regex Question

2009-11-10 Thread Matus UHLAR - fantomas
> On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:
> > * rahlqu...@gmail.com :
> > > Ok regex is not my strong suit by any means. Trying to get a match
> > > for email addresses that start with a pipe character ( about 15% of
> > > my spam is this ).
> > 
> > That's not needed. Why are you accepting mail to NON-EXISTING
> > recipients at all?

On 10.11.09 14:26, rich...@buzzhost.co.uk wrote:
> Ralf, may I ask, do you predictably trot this offensive answer out all
> the time for fun, or just because you are bored?

Ralf's question was in no way offensive. He is just trying to solve the
problem by way that is most efficient for most of e-mail users and admins.

> FYI, the last time I looked it was not a criminal offence to use a catch
> all, unless the law is different in Germany?

And it is not a criminal offence to ask why someone is using a catch-all.
Maybe the OP DOES want to use catch-all for this reason. Maybe the OP does
NOT need catch-all. We can find this out by asking the poster WHY.

> I make heavy use of catchalls for spam tracking using 'balloon race' and
> watermarking. I may, however, wish to skew and filter some combinations
> despite running catch all.

you are, others are not.

> Please keep this in your mind in future before trotting out that tired
> old gas.

Please keep the above in mind before you start accusing people of
being trolls and thus behaving exactly like a troll.
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Atheism is a non-prophet organization. 


Re: Regex Question

2009-11-10 Thread rahlquist
On Tue, Nov 10, 2009 at 11:49 AM, John Hardin wrote:

> On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote:
>
>> On Tue, Nov 10, 2009 at 9:09 AM, John Hardin wrote:
>>
>>>> * rahlqu...@gmail.com :
>>>>
>>>>> Ok regex is not my strong suit by any means. Trying to get a match
>>>>> for email addresses that start with a pipe character ( about 15% of my
>>>>> spam is this ).
>>>
>>> Richard, could you post the headers from one such to pastebin so we can
>>> see
>>> exactly what you're talking about?
>>>
>>
>> Here you are John;
>> http://pastebin.com/m733a7113
>>
>> And no, I do indeed mean sent to.
>>
>
> Okay.
>
> Comment: it would be better to catch and reject these at the MTA level, if
> at all possible. I'm sure one of the Postfix admins could suggest how to do
> so.
>
> How about this?
>
>  header  ENV_TO_BAR   Received =~ / for <\|/
>
> You don't need to match the entire address syntax.
>
> You might want to tighten it up a tiny bit (assuming the headers weren't
> sanitized):
>
>  header  ENV_TO_BAR   Received =~ / by dark\.pcsites\.com .* for <\|/
>
>
> --
>
I could reject at the MTA but I want it to help me to filter and train
bayes, many of these are going to multiple users.

I'll give these a whack and see if anything squeaks! Thanks!


Re: Regex Question

2009-11-10 Thread Alex
>> I sometimes welcome the terse replies; it illicit's clarification from the
>> OP.
>
> ITYM "elicits".

Heh, yes, thanks. I don't think they're involved in some illicit sex scandal :-)

In either case, the apostrophe was wrong, too. Working on getting a
new toolchain compiled and working straight since 4pm yesterday :-)

Thanks,
Alex


Re: Regex Question

2009-11-10 Thread LuKreme
On 10-Nov-2009, at 09:27, rich...@buzzhost.co.uk wrote:
> On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote:
>> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote
>>> Please keep this in your mind in future before trotting out that tired
>>> old gas.
>> 
>> imho Ralf have never being banned in maillist here, if you dont like  
>> his answers just unsubscribe
>> 
> Trotting out useless, pointless, tardy, curt, terse replies benefit
> nobody at all and makes the poster look arrogant especially when the
> answer is mere opinion.

I think you need to grow a thicker skin.

> but there is no love lost between us in any case.

Ah, there's the reason you wigged out.


-- 
 a freudian slip is when you say one thing but you're
really thinking about a mother. 
 no, a freudian slip is sexy underwear your mother wears



Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, Alex wrote:


imho Ralf have never being banned in maillist here, if you dont like
his answers just unsubscribe


Trotting out useless, pointless, tardy, curt, terse replies benefit
nobody at all and makes the poster look arrogant especially when the
answer is mere opinion.


I sometimes welcome the terse replies; it illicit's clarification from 
the OP.


ITYM "elicits".

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our government should bear in mind the fact that the American
  Revolution was touched off by the then-current government
  attempting to confiscate firearms from the people.
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, rahlqu...@gmail.com wrote:


On Tue, Nov 10, 2009 at 9:09 AM, John Hardin  wrote:


 * rahlqu...@gmail.com :



Ok regex is not my strong suit by any means. Trying to get a match
for email addresses that start with a pipe character ( about 15% of my
spam is this ).


Richard, could you post the headers from one such to pastebin so we can see
exactly what you're talking about?


Here you are John;
http://pastebin.com/m733a7113

And no, I do indeed mean sent to.


Okay.

Comment: it would be better to catch and reject these at the MTA level, if 
at all possible. I'm sure one of the Postfix admins could suggest how to 
do so.


How about this?

  header  ENV_TO_BAR   Received =~ / for <\|/

You don't need to match the entire address syntax.

You might want to tighten it up a tiny bit (assuming the headers weren't 
sanitized):


  header  ENV_TO_BAR   Received =~ / by dark\.pcsites\.com .* for <\|/
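
A quick way to sanity-check patterns like these before putting them in a
rule is to run them against a saved header. The sketch below uses Python's
re module as a stand-in for the Perl regex engine that SpamAssassin uses.
The Received header is invented (the pastebin sample is not reproduced
here), so the host names, IP and addresses are assumptions.

```python
import re

# The two patterns from the rules above, without the surrounding slashes.
LOOSE = re.compile(r" for <\|")
TIGHT = re.compile(r" by dark\.pcsites\.com .* for <\|")

# Hypothetical Received headers, unfolded onto one line.
hit = ("from mail.example.net (mail.example.net [192.0.2.10]) "
       "by dark.pcsites.com (Postfix) with ESMTP id ABC123 "
       "for <|someone@example.org>; Tue, 10 Nov 2009 09:00:00 -0500")
miss = hit.replace("<|someone", "<someone")
other_host = hit.replace("dark.pcsites.com", "mx.example.org")

for name, header in [("hit", hit), ("miss", miss), ("other_host", other_host)]:
    print(name, bool(LOOSE.search(header)), bool(TIGHT.search(header)))
```

The tighter pattern only fires on the Received line added by the named
host, which is the point of the second rule.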

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  I have never learned to fight for my freedom. I was only good at
  enjoying it.-- Dutchman Oscar van den Boogaard,
 showing why Europe is doomed
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread Alex
>> imho Ralf have never being banned in maillist here, if you dont like
>> his answers just unsubscribe
>>
> Trotting out useless, pointless, tardy, curt, terse replies benefit
> nobody at all and makes the poster look arrogant especially when the
> answer is mere opinion.

I sometimes welcome the terse replies; it illicit's clarification from
the OP. I hardly think Ralf is interested in wasting his time playing
games on this mailing list. Even if it were true, I think Ralf has
also earned the ability to be a bit arrogant.

Regards,
Alex


Re: Regex Question

2009-11-10 Thread rich...@buzzhost.co.uk
On Tue, 2009-11-10 at 16:50 +0100, Benny Pedersen wrote:
> On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote
> > Please keep this in your mind in future before trotting out that tired
> > old gas.
> 
> imho Ralf have never being banned in maillist here, if you dont like  
> his answers just unsubscribe
> 
Trotting out useless, pointless, tardy, curt, terse replies benefit
nobody at all and makes the poster look arrogant especially when the
answer is mere opinion.

The OP asked a perfectly civil question that did not warrant such a
tired, rude old skool style micro flaming. It does not make someone look
superior or 'clever' to offer such a response, it simply makes them look
like a backside lacking in social skills. Your support for the response
is duly noted, but there is no love lost between us in any case.





Re: Regex Question

2009-11-10 Thread Benny Pedersen

On tir 10 nov 2009 15:26:43 CET, "rich...@buzzhost.co.uk" wrote

Please keep this in your mind in future before trotting out that tired
old gas.


imho Ralf have never being banned in maillist here, if you dont like  
his answers just unsubscribe


--
xpoint



Re: Regex Question

2009-11-10 Thread rich...@buzzhost.co.uk
On Tue, 2009-11-10 at 14:32 +0100, Ralf Hildebrandt wrote:
> * rahlqu...@gmail.com :
> > Ok regex is not my strong suit by any means. Trying to get a match for email
> > addresses that start with a pipe character ( about 15% of my spam is this ).
> 
> That's not needed. Why are you accepting mail to NON-EXISTING
> recipients at all?
> 
Ralf, may I ask, do you predictably trot this offensive answer out all
the time for fun, or just because you are bored?

FYI, the last time I looked it was not a criminal offence to use a catch
all, unless the law is different in Germany?

I make heavy use of catchalls for spam tracking using 'balloon race' and
watermarking. I may, however, wish to skew and filter some combinations
despite running catch all.

Please keep this in your mind in future before trotting out that tired
old gas.




Re: Regex Question

2009-11-10 Thread John Hardin

On Tue, 10 Nov 2009, Ralf Hildebrandt wrote:


* rahlqu...@gmail.com :

Ok regex is not my strong suit by any means. Trying to get a match
for email addresses that start with a pipe character ( about 15% of 
my spam is this ).


That's not needed. Why are you accepting mail to NON-EXISTING
recipients at all?


He may be referring to the From: header, not the envelope header.

Richard, could you post the headers from one such to pastebin so we can 
see exactly what you're talking about?


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Tomorrow: Veterans Day


Re: Regex Question

2009-11-10 Thread Ralf Hildebrandt
* rahlqu...@gmail.com :
> Ok regex is not my strong suit by any means. Trying to get a match for email
> addresses that start with a pipe character ( about 15% of my spam is this ).

That's not needed. Why are you accepting mail to NON-EXISTING
recipients at all?

-- 
Ralf Hildebrandt
  Geschäftsbereich IT | Abteilung Netzwerk
  Charité - Universitätsmedizin Berlin
  Campus Benjamin Franklin
  Hindenburgdamm 30 | D-12203 Berlin
  Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
  ralf.hildebra...@charite.de | http://www.charite.de



Regex Question

2009-11-10 Thread rahlquist
Ok regex is not my strong suit by any means. Trying to get a match for email
addresses that start with a pipe character ( about 15% of my spam is this ).
What I have so far is this;

[^a-z0-9]\b[a-z0-9._%+...@[a-z0-9.-]+\.[a-z]{2,4}\b

To me that looks right but it's not hitting. Any other suggestions? I've
tried just \\|\b[a-z0-9._%+...@[a-z0-9.-]+\.[a-z]{2,4}\b along with dozens
of others.
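
One thing worth checking in the second attempt: if the pattern really
starts with two backslashes, "\\|" reads as "a literal backslash, OR
whatever follows", not as a literal pipe. The Python sketch below shows the
difference with a simplified, made-up address pattern (it is not the
expression above, which is truncated in the archive), and the sample
recipient is hypothetical.

```python
import re

# Simplified stand-in for an address pattern; not the original expression.
ADDR = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}"

single = re.compile(r"\|" + ADDR, re.I)    # literal pipe, then an address
double = re.compile(r"\\|" + ADDR, re.I)   # a backslash, OR an address

sample = "|bob@example.org"   # hypothetical recipient

m1 = single.search(sample)
m2 = double.search(sample)
print(m1.group(), m1.start())   # the match includes the pipe
print(m2.group(), m2.start())   # the match starts after the pipe

# The doubled form also hits addresses with no pipe at all.
print(bool(single.search("bob@example.org")),
      bool(double.search("bob@example.org")))
```

So the doubled backslash would not fail outright; it would match nearly
every address, which is a different problem from not hitting.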

Thanks!

-- 
Richard Ahlquist
Systems Analyst
http://www.patentlystupid.com


Re: Regex Question

2007-03-03 Thread Matthias Leisi



Nigel Frankcom wrote:

> pointed out by a kind list member, there are various 'flavours' of
> regex. Can anyone tell me which particular flavour I'm best
> concentrating on for SA rules?

man perlre

-- Matthias


Regex Question

2007-03-03 Thread Nigel Frankcom
Hi All,

I've recently invested in some books and software to help me figure
out what I *thought* I already knew pretty well (regex). As was
pointed out by a kind list member, there are various 'flavours' of
regex. Can anyone tell me which particular flavour I'm best
concentrating on for SA rules?

TIA

Nigel


Re: Rule Regex Question.

2007-02-26 Thread John D. Hardin
On Mon, 26 Feb 2007, Nigel Frankcom wrote:

> Can anyone tell me if I need to escape the characters within the
> square braces in the following?
> 
> body NF_REM_CHAR1 /remove [*%!+`"£$%^&()_-=#~]/i

A dash indicates a range (e.g. a-z) - if you need that, it's safest to 
put it as the first character in the set. Right now you're specifying 
"any character between _ and =, inclusive".

The ^ would have significance if it was first, but it's not.

You don't have a closing square bracket, which is the third one to 
worry about.
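
The dash problem is easy to reproduce outside SpamAssassin. The sketch below
uses Python's re module as a stand-in for Perl, so it is an approximation:
Python refuses to compile the class as posted because "_" sorts after "=",
which makes "_-=" a reversed range, and Perl is likely to complain in a
similar way. The test strings are made up.

```python
import re

# As posted: inside the class, "_-=" is parsed as a range, and "_" (0x5F)
# sorts after "=" (0x3D), so Python rejects the pattern.
try:
    re.compile(r'remove [*%!+`"£$%^&()_-=#~]', re.I)
    compiled_as_posted = True
except re.error:
    compiled_as_posted = False

# With the dash moved to the front it is an ordinary literal character.
fixed = re.compile(r'remove [-*%!+`"£$%^&()_=#~]', re.I)

print(compiled_as_posted)
print(bool(fixed.search("Remove -this")), bool(fixed.search("remove this")))
```

None of the other characters in the set needs escaping here, since the
caret is not first and there is no closing bracket inside the class.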

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174     pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Microsoft is not a standards body.
---
 15 days until Albert Einstein's 128th Birthday



Rule Regex Question.

2007-02-26 Thread Nigel Frankcom
Hi All,

Can anyone tell me if I need to escape the characters within the
square braces in the following?

body NF_REM_CHAR1 /remove [*%!+`"£$%^&()_-=#~]/i
score NF_REM_CHAR1 4.0
describe NF_REM_CHAR1 remove chars for URL spams

TIA

Nigel


Re: Advanced regex question - backtracking vs. negative lookaheads

2006-04-26 Thread Jeremy Fairbrass
Good point, you're completely right! Thanks for pointing that out... :)

Cheers,
Jeremy


"John Rudd" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>
> On Apr 25, 2006, at 6:33 AM, Jeremy Fairbrass wrote:
>
>>
>>
>> /style="[^>]+color:blue/
>>
>>
>>
>> 
>>
>
> Just a small note, which may be mostly a digression but:
>
> I don't think the above regex will match that string at all.
>
> The regex, because it has a + instead of a *, requires at least one 
> character between the " and color:blue ... your string doesn't have that.
>
>
> 





Re: Advanced regex question - backtracking vs. negative lookaheads

2006-04-25 Thread John Rudd


On Apr 25, 2006, at 6:33 AM, Jeremy Fairbrass wrote:




/style="[^>]+color:blue/







Just a small note, which may be mostly a digression but:

I don't think the above regex will match that string at all.

The regex, because it has a + instead of a *, requires at least one 
character between the " and color:blue ... your string doesn't have 
that.
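
The point about + versus * can be checked with a short Python sketch. The
original example string did not survive in the archive, so the two tags
below are hypothetical stand-ins: one with "color:blue" directly after the
opening quote, one with other text in front of it.

```python
import re

plus = re.compile(r'style="[^>]+color:blue')   # needs at least one character first
star = re.compile(r'style="[^>]*color:blue')   # allows zero characters first

# Hypothetical tags, for illustration only.
bare = '<span style="color:blue">'
padded = '<span style="font-size:12px; color:blue">'

print(bool(plus.search(bare)), bool(star.search(bare)))
print(bool(plus.search(padded)), bool(star.search(padded)))
```

With + the bare tag is missed, because nothing sits between the quote and
"color:blue"; with * both tags match.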





Re: Advanced regex question - backtracking vs. negative lookaheads

2006-04-25 Thread Jeremy Fairbrass
Thanks guys for the clarifications! My understanding of how regex worked was
the same as Bowie's, ie:
-
> My understanding is that with [^"]+ the engine will scan from left to
> right until it finds a quote.  Then, in the context of the previous
> regex, it will start backtracking to find a match for "color:blue".
-

I use the free Regex Coach tool from http://www.weitz.de/regex-coach/ to
test my regex, and it works the way Bowie described above, ie. using
backtracking. In other words, using:

/style="[^>]+color:blue/

...the [^>]+ causes the regex to go all the way to the closing > character,
then backtracks until it finds the "color:blue" part. This also agrees with
what is explained at www.regular-expressions.info which I believe is a
reliable guide to Perl regex.

Also, Bowie suggested using laziness instead:

/style="[^"]+?color:blue/

But I believe laziness also uses backtracking, so I'm not sure there is
*much* of an advantage of this over the greedy regex shown above. Probably
the main advantage of the lazy version would be if there was little or no
text between the first quote-mark and the "color:blue" part, and/or lots of
text between "color:blue" and the last quote-mark, eg:



...The regex would hit this much quicker using the lazy version than the
greedy version. But I'm not sure if there really is a difference, especially
if I want to be able to hit on SPAN tags that might have more text before
the "color:blue" OR might have more text afterwards. Probably it's six of
one and half a dozen of the other, right?! Why did David describe the lazy
version as "slightly less good" than the greedy version?
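
The difference between the two quantifiers is easiest to see in what they
end up matching, rather than in how fast they get there. A minimal Python
sketch, using re as a stand-in for Perl and a made-up tag in which
"color:blue" occurs twice:

```python
import re

greedy = re.compile(r'style="[^>]+color:blue')
lazy = re.compile(r'style="[^"]+?color:blue')

# Hypothetical tag with "color:blue" appearing twice inside the attribute.
tag = '<span style="x:1; color:blue; border-color:blue; y:2">'

g = greedy.search(tag)
l = lazy.search(tag)
print(g.group())   # runs on to the last "color:blue"
print(l.group())   # stops at the first "color:blue"
```

Both versions report a hit on this tag, so for a rule that only asks
"does it match?" the choice mostly affects how much work the engine does,
which this sketch does not measure.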

Incidentally the reason I used [^>]+ rather than [^"]+ was to prevent it
from using lots of memory if there was no closing quote - as an alternative
to using {1,20}.

In any case, both Bowie and David agree that my first solution using
(.(?!color))+ is a really bad idea, and that was the main thing I wanted to
know! :)

Thanks,
Jeremy




"Bowie Bailey" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> David Landgren wrote:
>> Bowie Bailey wrote:
>>
>> [...]
>>
>> > > An alternative solution would be this:
>> > >
>> > > /style="[^>]+color:blue/
>> >
>> > This looks better.  It is probably less resource-intensive than
>> > your previous attempt and is definitely easier to read.  But why
>> > are you looking for > when you anchor the beginning with a quote?
>> >
>> > How about this:
>> >
>> > /style="[^"]+?color:blue/
>> >
>> > This is also non-greedy, so it will start looking for the
>> > "color:blue" match at the beginning of the string instead of
>> > having the + slurp up everything up to the quote and then
>> > backtracking to find the match.
>>
>> The regexp engine doesn't slurp. It just scans from left to right,
>> noting "I might have to come back here" along the way.
>
> Ok, so "slurp" was a bit of a simplification.  :)
>
> My understanding is that with [^"]+ the engine will scan from left to
> right until it finds a quote.  Then, in the context of the previous
> regex, it will start backtracking to find a match for "color:blue".
>
> In any case, with the non-greedy quantifier, it will stop looking when
> it finds the first "color:blue" string instead of continuing to the
> end of the string.
>
> -- 
> Bowie
>






RE: Advanced regex question - backtracking vs. negative lookaheads

2006-04-21 Thread Bowie Bailey
David Landgren wrote:
> Bowie Bailey wrote:
> 
> [...]
> 
> > > An alternative solution would be this:
> > > 
> > > /style="[^>]+color:blue/
> > 
> > This looks better.  It is probably less resource-intensive than
> > your previous attempt and is definitely easier to read.  But why
> > are you looking for > when you anchor the beginning with a quote?
> > 
> > How about this:
> > 
> > /style="[^"]+?color:blue/
> > 
> > This is also non-greedy, so it will start looking for the
> > "color:blue" match at the beginning of the string instead of
> > having the + slurp up everything up to the quote and then
> > backtracking to find the match. 
> 
> The regexp engine doesn't slurp. It just scans from left to right,
> noting "I might have to come back here" along the way.

Ok, so "slurp" was a bit of a simplification.  :)

My understanding is that with [^"]+ the engine will scan from left to
right until it finds a quote.  Then, in the context of the previous
regex, it will start backtracking to find a match for "color:blue".

In any case, with the non-greedy quantifier, it will stop looking when
it finds the first "color:blue" string instead of continuing to the
end of the string.
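
The "first match versus end of the string" point can be seen in what each version actually captures. This is a small sketch in Python's re module, assuming it treats these two patterns the same way Perl does; the sample tag is hypothetical and contains "color:blue" twice.

```python
import re

# Hypothetical style attribute containing "color:blue" twice.
sample = '<span style="border:0px; color:blue; outline-color:blue">'

greedy = re.search(r'style="[^"]+color:blue', sample)
lazy = re.search(r'style="[^"]+?color:blue', sample)

print(greedy.group())  # runs on to the last "color:blue" inside the quotes
print(lazy.group())    # stops at the first "color:blue"
```

For a rule that only needs a yes/no hit, both versions succeed here; they differ in how much text ends up in the match.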

-- 
Bowie


Re: Advanced regex question - backtracking vs. negative lookaheads

2006-04-21 Thread David Landgren

Bowie Bailey wrote:

[...]

> > An alternative solution would be this:
> >
> > /style="[^>]+color:blue/
>
> This looks better.  It is probably less resource-intensive than your
> previous attempt and is definitely easier to read.  But why are you
> looking for > when you anchor the beginning with a quote?
>
> How about this:
>
> /style="[^"]+?color:blue/
>
> This is also non-greedy, so it will start looking for the "color:blue"
> match at the beginning of the string instead of having the + slurp up
> everything up to the quote and then backtracking to find the match.

The regexp engine doesn't slurp. It just scans from left to right,
noting "I might have to come back here" along the way.

> For SA purposes, you may want to limit the search as well.
>
> /style="[^"]{1,20}?color:blue/
>
> This way, it will stop looking after 20 characters.  This prevents it
> from using lots of memory if the quotes aren't closed.

Good point.

> > But this will certainly involve some backtracking, especially if
> > there is even more text after the "color:blue" but before the
> > closing > character, for example the "font-size:small" text.

No it won't. It will scan once and quit. It never encountered any other
alternatives that would require backtracking.


David
--
"It's overkill of course, but you can never have too much overkill."



RE: Advanced regex question - backtracking vs. negative lookaheads

2006-04-21 Thread Bowie Bailey
Jeremy Fairbrass wrote:
> 
> Let's say I want to use regex to search for the phrase "color:blue"
> within a <SPAN> tag as in the example below (just a made-up example
> for the sake of this question):
> 
> 
> 
> In this case, the "color:blue" part is preceded by some other text
> ("border:0px") after the first quote mark, but that preceding text
> could in fact be anything, and I want to allow for the fact that it
> could be anything.
> 
> I've read at http://www.regular-expressions.info that it's best to
> avoid backtracking if possible because that is resource-intensive.
> 
> So one possible solution would be the following:
> 
> /style="(.(?!color))+.color:blue/

This seems to me to be very inefficient.  At each point in the string
it has to read forward to check for "color".
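
Apart from the cost, the lookahead version does not always match the same input as the simpler one. The sketch below uses Python's re module (assumed to handle these constructs like Perl) and two hypothetical sample tags; the second has an earlier "color" inside another property name.

```python
import re

lookahead = re.compile(r'style="(.(?!color))+.color:blue')
greedy = re.compile(r'style="[^>]+color:blue')

# Hypothetical samples; the second has an earlier "color" in another property.
plain = '<span style="border:0px; color:blue; font-size:small">'
earlier_color = '<span style="background-color:red; color:blue">'

for name, text in [("plain", plain), ("earlier_color", earlier_color)]:
    print(name, bool(lookahead.search(text)), bool(greedy.search(text)))
```

In the second sample the lookahead group cannot step past the "-" in "background-color", so that pattern never reaches the later "color:blue", while the [^>]+ version still hits.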

> In other words, after the first " (quote mark) it looks for any
> character NOT followed by the word "color", and repeats that with the
> + character, until it gets to the actual word "color". I believe this
> results in no (or almost no?) backtracking. But I'm not sure if it's
> resource-intensive anyway, because of the negative lookahead - are
> negative lookaheads particularly resource intensive, when compared to
> backtracking? Is one preferable over the other?
> 
> An alternative solution would be this:
> 
> /style="[^>]+color:blue/

This looks better.  It is probably less resource-intensive than your
previous attempt and is definitely easier to read.  But why are you
looking for > when you anchor the beginning with a quote?

How about this:

/style="[^"]+?color:blue/

This is also non-greedy, so it will start looking for the "color:blue"
match at the beginning of the string instead of having the + slurp up
everything up to the quote and then backtracking to find the match.

For SA purposes, you may want to limit the search as well.

/style="[^"]{1,20}?color:blue/

This way, it will stop looking after 20 characters.  This prevents it
from using lots of memory if the quotes aren't closed.
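
The bound is a trade-off: it caps the work, but it also caps how much text may precede "color:blue". A minimal sketch in Python's re module, assuming the same behaviour as Perl for these patterns, with two hypothetical sample tags:

```python
import re

bounded = re.compile(r'style="[^"]{1,20}?color:blue')
unbounded = re.compile(r'style="[^"]+?color:blue')

# Hypothetical samples: 12 and 37 characters before "color:blue".
near = '<span style="border:0px; color:blue">'
far = '<span style="border:0px; margin:0px; padding:0px; color:blue">'

for name, text in [("near", near), ("far", far)]:
    print(name, bool(bounded.search(text)), bool(unbounded.search(text)))
```

The bounded rule misses the second sample because more than 20 characters sit between the opening quote and "color:blue", so the limit would need to be chosen with the expected attribute length in mind.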

> But this will certainly involve some backtracking, especially if
> there is even more text after the "color:blue" but before the
> closing > character, for example the "font-size:small" text.
> 
> So what do you think?! Which way is best, ie. most efficient or least
> resource-intensive?

-- 
Bowie


Re: Advanced regex question - backtracking vs. negative lookaheads

2006-04-21 Thread David Landgren

Jeremy Fairbrass wrote:

[...]

> So one possible solution would be the following:
>
> /style="(.(?!color))+.color:blue/

Eeep!

> In other words, after the first " (quote mark) it looks for any character
> NOT followed by the word "color", and repeats that with the + character,
> until it gets to the actual word "color". I believe this results in no (or
> almost no?) backtracking. But I'm not sure if it's resource-intensive
> anyway, because of the negative lookahead - are negative lookaheads
> particularly resource intensive, when compared to backtracking? Is one
> preferable over the other?
>
> An alternative solution would be this:
>
> /style="[^>]+color:blue/
>
> But this will certainly involve some backtracking,

False.

> especially if there is even more text after the "color:blue" but before
> the closing > character, for example the "font-size:small" text.

Irrelevant. If you get a '>', then you didn't find what you were looking
for. If you get a "color:blue", you did. In either case, the regexp
quits right there.


You can use perl from the command line to get a good idea of the 
differences:


# all on one line

perl -Mre=debug -e 'q{} =~ /style="(.(?!color))+.color:blue/'


The above produces scads of output: you can see it inching down the 
string char by char. On the other hand,


perl -Mre=debug -e 'q{} =~ /style="[^>]+color:blue/'


is *fast*. Slightly less good is

/style=".*?color:blue/
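
One visible difference between the two, which may or may not be what "slightly less good" refers to: .*? is not stopped by the closing > and can run on into later text. A small sketch in Python's re module (assumed to match Perl for these patterns), with hypothetical samples:

```python
import re

dot_lazy = re.compile(r'style=".*?color:blue')
negated = re.compile(r'style="[^>]+color:blue')

# Hypothetical samples: the phrase inside the tag, then only in the body text.
inside = '<span style="border:0px; color:blue">text</span>'
outside = '<span style="border:0px">the phrase color:blue in body text</span>'

for name, text in [("inside", inside), ("outside", outside)]:
    print(name, bool(dot_lazy.search(text)), bool(negated.search(text)))
```

Both patterns hit the first sample, but only the .*? version hits the second, where "color:blue" is outside the tag altogether.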

David
--
"It's overkill of course, but you can never have too much overkill."



Advanced regex question - backtracking vs. negative lookaheads

2006-04-21 Thread Jeremy Fairbrass
Hi all,
I wonder if one of you regex gurus might be able to give me some advice
regarding the most efficient way of writing a particular rule.

Let's say I want to use regex to search for the phrase "color:blue" within a
<SPAN> tag as in the example below (just a made-up example for the sake of
this question):



In this case, the "color:blue" part is preceded by some other text
("border:0px") after the first quote mark, but that preceding text could in
fact be anything, and I want to allow for the fact that it could be 
anything.

I've read at http://www.regular-expressions.info that it's best to avoid 
backtracking if possible because that is resource-intensive.

So one possible solution would be the following:

/style="(.(?!color))+.color:blue/

In other words, after the first " (quote mark) it looks for any character 
NOT followed by the word "color", and repeats that with the + character, 
until it gets to the actual word "color". I believe this results in no (or 
almost no?) backtracking. But I'm not sure if it's resource-intensive 
anyway, because of the negative lookahead - are negative lookaheads 
particularly resource intensive, when compared to backtracking? Is one 
preferable over the other?

An alternative solution would be this:

/style="[^>]+color:blue/

But this will certainly involve some backtracking, especially if there is 
even more text after the "color:blue" but before the closing > character, 
for example the "font-size:small" text.

So what do you think?! Which way is best, ie. most efficient or least 
resource-intensive?

Cheers,
Jeremy