Re: [R] R-help "spam" detection; please help the moderators

2010-06-02 Thread Joshua Wiley
Hello Martin and Ted,

First off, thank you both, and all the volunteers, for providing
this wonderful service.  I have two questions.

1)  Do you know if it is a problem to respond to a post from Nabble
using a Gmail account?

2)  Would it be easier for you if people just used non-free accounts?
I don't particularly relish the idea, but if it helped it would be
worth it.


Thanks again,

Josh


-- 
Joshua Wiley
Senior in Psychology
University of California, Riverside
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help "spam" detection; please help the moderators

2010-06-01 Thread Ted Harding
Hi Joris,
The "matched a filter rule" is the principal reason for holding
messages for moderation. Please don't become anxious about it --
one of the reasons we have become concerned is that people whose
messages get held up do tend to worry. This is unnecessary -- they,
and you, are not really doing anything wrong!

As I understand it, the filter rules are set by the ethz.ch admins,
and even Martin does not seem to know, in detail, what they are.
Also, it is likely that the filters "learn", and they may well
be "learning" from a lot of other emails received at ethz.ch which
have nothing to do with R-help but which are true spam -- then the
headers of such messages could be folded into "Bayes spam scores"
which can trigger "matched a filter rule".

As far as R-help is concerned, the situation seems to be that
gmail.com and nabble.com are important triggers (though there
are plenty of others). Lots of email addresses which do not have
the username ending in digits are trapped in this way.

In round figures, proportions I have logged myself amongst the
messages held because they "matched a filter rule" are:

non-gmail, non-nabble: 30%
non-gmail, nabble: 25%
gmail, non-nabble: 32%
gmail, nabble: 13%

gmail : 45%
nabble: 52%
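
As a quick cross-check, the two marginal shares can be recomputed from the four cells above. This is only a sketch using the rounded percentages as quoted; the variable names are illustrative. Note that the cells as listed add up to 38% for nabble rather than 52%, so one of the quoted figures may have been mistyped or may count something differently.

```python
# Shares (in percent) of held messages, keyed by (is_gmail, is_nabble),
# copied from the rounded figures quoted above.
cells = {
    (False, False): 30,  # non-gmail, non-nabble
    (False, True): 25,   # non-gmail, nabble
    (True, False): 32,   # gmail, non-nabble
    (True, True): 13,    # gmail, nabble
}

total = sum(cells.values())
gmail_share = sum(pct for (is_gmail, _), pct in cells.items() if is_gmail)
nabble_share = sum(pct for (_, is_nabble), pct in cells.items() if is_nabble)

print(f"total:  {total}%")
print(f"gmail:  {gmail_share}%")
print(f"nabble: {nabble_share}%")
```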

The true nature of this situation is still unclear!

Ted.



Re: [R] R-help "spam" detection; please help the moderators

2010-06-01 Thread Joris Meys
Hi all,

I also couldn't help but notice that some of my messages are bounced for
the following reason:

   The message headers matched a filter rule

I included the header of one of the messages below, but neither of these
messages was sent through Nabble, nor does any of the mail addresses have
digits in it.
I also never had that before. Did you change some of the rules somehow?

Cheers
Joris

---

MIME-Version: 1.0
Received: by 10.140.173.9 with HTTP; Fri, 28 May 2010 05:32:32 -0700 (PDT)
In-Reply-To: 
References: 

Date: Fri, 28 May 2010 14:32:32 +0200
Delivered-To: jorism...@gmail.com
Message-ID: 
Subject: Re: [R] How to get values out of a string using regular expressions?
From: Joris Meys 
To: Gabor Grothendieck 
Cc: R mailing list 
Content-Type: multipart/alternative; boundary=000e0cd2295481515c0487a6b3be

--000e0cd2295481515c0487a6b3be
Content-Type: text/plain; charset=ISO-8859-1






-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



[R] R-help "spam" detection; please help the moderators

2010-06-01 Thread Martin Maechler
Dear readers of R-help,

as most of you will *not* be aware, R-help has continued to work the
way it does only thanks to a dozen volunteers;
see https://stat.ethz.ch/mailman/listinfo/r-help .

The volunteers manually moderate e-mails that "look like spam" (and
sometimes are and sometimes are not).
While much more than 90% of the spam is filtered out long before
a human sees it, with the increasing sophistication of spammers,
manual intervention has been deemed necessary and has served the
community very well.

OTOH, in recent weeks, the amount of work for the volunteers has
increased, mainly because an increasing number of non-spam postings are
erroneously tagged as "possibly spam".
We have discussed this, done some analysis, and found
that most of the messages that produce a considerable amount of
extra work share two properties:
 1) they are posted via Nabble {which *always* attaches a small
    pro-Nabble spam at the end of the message}
 2) the e-mail address of the sender is from a freemail
    provider, quite often 'at gmail dot com', and often the part
    *before* the '@' (at-sign) ends with digits.

We hereby ask those among you who use a freemail account to
please no longer post via Nabble.
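
To make the two properties concrete, here is a minimal sketch of how a poster might check an address against them. It is not the filter actually running on the list server, whose rules are not given in this thread. The domain list and all function names are hypothetical, and only gmail.com is named in the message above.

```python
import re

# Hypothetical freemail list for illustration; the message above names
# only gmail.com explicitly.
FREEMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def is_freemail(address: str) -> bool:
    """True if the part after the last '@' is a listed freemail domain."""
    _, _, domain = address.rpartition("@")
    return domain.lower() in FREEMAIL_DOMAINS


def local_part_ends_with_digits(address: str) -> bool:
    """True if the part before the first '@' ends with a digit."""
    local, _, _ = address.partition("@")
    return bool(re.search(r"\d$", local))


def shares_both_properties(address: str, via_nabble: bool) -> bool:
    """Property 1 (posted via Nabble) and property 2 (freemail sender)."""
    return via_nabble and is_freemail(address)


# Example addresses are made up.
print(shares_both_properties("user42@gmail.com", via_nabble=True))
print(local_part_ends_with_digits("user42@gmail.com"))
print(shares_both_properties("someone@ugent.be", via_nabble=True))
```

The trailing-digits check is kept separate because the message describes it as something that is "often" true of such addresses, not as a requirement.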

Thank you for your support of R-help, *the* "community mailing
list" of the R project since even before that project existed
"formally", namely since 1997-04-01,
13 years and two months ago today.

Martin Maechler, ETH Zurich
(and R-help creator and principal manager)
