Re: filtering on language

2001-11-21 Thread Dierk Haasis

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello Thomas!

On Wednesday, 21 November 2001 at 04:56:39 you wrote:

> No, I just received a spam message about a diet. No sex was required,
> so I deleted it. You lost the bet.

You must have overlooked something. Any diet without sex isn't worth a
mention ...

> You are right. 50% of my private messages are about either sex or
> money, or the combination of both.

Bad count, but it shows that most of us don't have a life. Well, none
worth living. The correct ratio for sex-or-money messages to
non-sex-or-money* should be at least 9:1.

> Just a hint: sex in Turkish is sex, like in most other languages I
> know.

I just took the examples given.


*This is a non-exclusive or, as opposed to an exclusive either-or.



- --
Dierk Haasis
http://www.Write4U.de

PGP keys available: mailto:[EMAIL PROTECTED]?Subject=SendMyPGPkeys

The Bat 1.54/10e on Windows 95 4.0 67306684 C

The whole problem with the world is that fools and fanatics are always
so certain of themselves, but wiser people so full of doubts.
(Bertrand Russell)

-BEGIN PGP SIGNATURE-
Version: PGP 6.5.8ckt
Comment: Privacy is the core element to Freedom!

iQA/AwUBO/tYK/To1oA8g8dLEQJJGACgwZxYjIOJnfWUOWrqYu8BgDeg6TgAoK79
kw4h6P3+AcP2l9gX4KIeQ2QM
=vQOl
-END PGP SIGNATURE-


-- 

Archives   : http://tbudl.thebat.dutaint.com
Moderators : mailto:[EMAIL PROTECTED]
TBTech List: mailto:[EMAIL PROTECTED]
Unsubscribe: mailto:[EMAIL PROTECTED]
Latest Vers: 1.53d
FAQ: http://faq.thebat.dutaint.com 




filtering on language

2001-11-20 Thread Jan Rifkinson

Hello TB! Listers.

  I've been getting a lot of msgs like the one below; it's
  really annoying.

----- Original Message Starts -----
www.internethaber.com yilin en iyi
ancorhman_n_ belirliyor.
Siz de oy kullanarak tercihinizi koyun.

www.internethaber.com ve www.gazeteoku.com
ile biz haberin tekelini kirdik; gelin siz
de bu siteleri ziyaret ederek, bu tekeli
kirin.
----- Original Message Ends -----

  Can anyone imagine a built in macro that identifies
  language used, i.e. %IF not %English? Does anyone think
  this is even possible?

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Peter Palmreuther

Hello Jan,

On Tuesday, 20 November 2001 at 14:02:14 you wrote (at least in part):

JR   Can anyone imagine a built in macro that identifies
JR   language used, i.e. %IF not %English? Does anyone think
JR   this is even possible?

1.) No
2.) Define 'English', and on what basis '%English' should be set :-)
    Now you should know why: 'No' :-)
-- 
Regards
Peter Palmreuther  mailto:[EMAIL PROTECTED]
(The Bat! v1.54/10e on Windows NT 5.0 Build 2195 Service Pack 2)

Let's start a new country up






Re: filtering on language

2001-11-20 Thread Thomas F

Hello Jan,

On Tue, 20 Nov 2001 08:02:14 -0500 GMT (20/11/2001, 21:02 +0800 GMT),
Jan Rifkinson wrote:

JR   Can anyone imagine a built in macro that identifies
JR   language used, i.e. %IF not %English? Does anyone think
JR   this is even possible?

You can build a RegEx that looks for the encoding.

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

Outside a disco: SMARTS IS THE MOST EXCLUSIVE DISCO IN TOWN. EVERYONE
WELCOME.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello TB! List.

On Tue, 20 Nov 2001 at 14:29 GMT +0100 (11/20/2001 8:29 AM
where I live) [EMAIL PROTECTED] [Peter] wrote to
[EMAIL PROTECTED] re: 'filtering on language':

Peter 2.) Define 'English' and on what basis setting
Peter '%English' should be [...]

  Well I'm not as technically oriented as you are but I
  would think it could be related to the %language macro
  that already exists.

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Thomas F

Hello Jan,

On Tue, 20 Nov 2001 09:04:32 -0500 GMT (20/11/2001, 22:04 +0800 GMT),
Jan Rifkinson wrote:

Peter 2.) Define 'English' and on what basis setting
Peter '%English' should be [...]

JR   Well I'm not as technically oriented as you are but I
JR   would think it could be related to the %language macro
JR   that already exists.

That macro will choose the dictionary you use for writing the message
or reply.

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

One does not speak at full moon.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello Thomas.

At 8:59 AM on Tuesday, November 20, 2001 you wrote the
following on the posted subject 'filtering on language':

JR   Can anyone imagine a built in macro that identifies
JR   language used, i.e. %IF not %English? Does anyone think
JR   this is even possible?

Thomas You can build a RegEx that looks for the encoding.

  I'm not sure what this means but since it deals with
  RegExp, I should move it to TBTech.

  Thanks, Thomas.

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello Thomas.

At 9:10 AM on Tuesday, November 20, 2001 you wrote the
following on the posted subject 'filtering on language':

Thomas That macro will choose the dictionary you use for writing the message
Thomas or reply.

  Yes, I understand, which is why I used the word 'related'. I
  guess I wasn't clear because I was thinking along the
  lines of:

  %IF %TEXT does not = %language=English (it looked
  thru the dictionary), then Take an action. ???

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Peter Palmreuther

Hello Jan,

On Tuesday, 20 November 2001 at 15:19:38 you wrote (at least in part):

JR I guess I wasn't clear because I was thinking along the lines of:

JR %IF %TEXT does not = %language=English (it looked thru the
JR dictionary), then Take an action. ???

I think this ain't as clear as you wanted it. What action (e.g.)
should be taken? Do you want the macro in filters to scan incoming
mails? In a template? What else?
-- 
Regards
Peter Palmreuther  mailto:[EMAIL PROTECTED]
(The Bat! v1.54/10e on Windows NT 5.0 Build 2195 Service Pack 2)

I'm not sure if life is trying to pass me by or run me over!






Re: filtering on language

2001-11-20 Thread Thomas F

Hello Jan,

On Tue, 20 Nov 2001 09:06:53 -0500 GMT (20/11/2001, 22:06 +0800 GMT),
Jan Rifkinson wrote:

Thomas You can build a RegEx that looks for the encoding.

JR   I'm not sure what this means but since it deals with
JR   RegExp, I should move it to TBTech.

Yes, but only when it gets technical. <g> The way you do it is to look
for the Content-encoding header or any header line that contains
8859-x (substitute the Turkish encoding suffix for the x) or whatever you
can identify in these messages. How you do that with RegEx, now *that*
is for TBTECH.

Be careful though, it will catch all messages with that encoding, also
legitimate ones.
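
A minimal sketch of the header check described above, written in Python
rather than as a The Bat! filter. It assumes the character set is declared
in the Content-Type header (as in charset=ISO-8859-9) and that ISO-8859-9
and Windows-1254 are the charsets of interest for Turkish; the function
name and the sample message are made up for illustration.

```python
import re
from email import message_from_string

# Assumption: these two charsets are the ones commonly used for Turkish text.
TURKISH_CHARSET = re.compile(
    r'charset\s*=\s*"?(?:iso-8859-9|windows-1254)\b', re.IGNORECASE
)

def has_turkish_charset(raw_message: str) -> bool:
    """Return True if the Content-Type header declares a Turkish charset."""
    msg = message_from_string(raw_message)
    content_type = msg.get("Content-Type", "")
    return bool(TURKISH_CHARSET.search(content_type))

# Hypothetical message used only to exercise the function.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=ISO-8859-9\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(has_turkish_charset(SAMPLE))
```

As warned above, a match says only which character set the sender's
mailer declared, not what language the message is written in.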

JR   Thanks, Thomas.

Don't mention it. FWIW I don't use any spam filters at all, because I
get a lot more carbon-based spam in my snail mail inbox than e-spam.
;-)

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

Furthermore, my nerves are shot and I am dealing with a severe case of
castritis.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Thomas F

Hello Jan,

On Tue, 20 Nov 2001 09:19:38 -0500 GMT (20/11/2001, 22:19 +0800 GMT),
Jan Rifkinson wrote:

JR   %IF %TEXT does not = %language=English (it looked
JR   thru the dictionary), then Take an action. ???

Let me assume you are not a programmer. ;-) What you suggest is
technically possible but would mean such an overhead that it is
impractical for our use. I am sure that intelligence agencies use this
approach, though.

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

Salmon day: Swimming upstream all day to get screwed in the end.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Alastair Scott


- Original Message -
From: Jan Rifkinson [EMAIL PROTECTED]
To: TB! UDL [EMAIL PROTECTED]
Sent: 20 November 2001 1:02 pm
Subject: filtering on language


 Hello TB! Listers.

   I've been getting a lot of msgs like the one below; it's
   really annoying.

 ----- Original Message Starts -----
 www.internethaber.com yilin en iyi
 ancorhman_n_ belirliyor.
 Siz de oy kullanarak tercihinizi koyun.

 www.internethaber.com ve www.gazeteoku.com
 ile biz haberin tekelini kirdik; gelin siz
 de bu siteleri ziyaret ederek, bu tekeli
 kirin.
 ----- Original Message Ends -----

   Can anyone imagine a built in macro that identifies
   language used, i.e. %IF not %English? Does anyone think
   this is even possible?

It's certainly possible through some sort of dictionary comparison, but that
would probably be fatally slow on modest PCs (imagine checking a 2,000 word
email :)
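
A rough sketch of the dictionary comparison idea, in Python. The tiny word
set below is only a stand-in; a real filter would load a full word list,
and the 0.3 threshold is an arbitrary assumption, not a tuned value.

```python
import re

# Stand-in for a real dictionary (assumption: a full word list would be loaded instead).
ENGLISH_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that",
    "this", "you", "i", "for", "on", "with", "not", "be", "are", "was",
    "have", "has", "can", "anyone", "think", "like", "one", "lot",
}

def english_ratio(text: str) -> float:
    """Fraction of the words in `text` that appear in ENGLISH_WORDS."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ENGLISH_WORDS)
    return hits / len(words)

def looks_english(text: str, threshold: float = 0.3) -> bool:
    """True if at least `threshold` of the words are known English words."""
    return english_ratio(text) >= threshold

if __name__ == "__main__":
    print(looks_english("Can anyone think of a macro that identifies the language?"))
    print(looks_english("Siz de oy kullanarak tercihinizi koyun."))
```

Each word costs one set lookup, so the running time grows linearly with
the length of the message.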

There may be more clever statistical methods - the above is Turkish, and
it's pretty obvious the relative frequency of various letters (eg z and
i) is entirely different from that of English - but more similar languages
(eg two Indo-European ones ;) might not be so simple to differentiate. I am
not a professional linguist so I can't comment definitively.
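
The letter-frequency idea could be sketched like this in Python. The
reference texts are two short lines taken from this thread, which is far
too little for a reliable profile; real use would need much larger samples
per language. The function names and the use of cosine similarity are
choices made for this illustration only.

```python
from collections import Counter
from math import sqrt

def letter_profile(text: str) -> Counter:
    """Count how often each alphabetic character occurs (case-folded)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine between two count vectors; close to 1.0 means similar proportions."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def closest_language(text: str, references: dict) -> str:
    """Name of the reference text whose letter profile is most similar to `text`."""
    profile = letter_profile(text)
    return max(
        references,
        key=lambda name: cosine_similarity(profile, letter_profile(references[name])),
    )

# Illustrative reference samples only (assumption: far too small for real use).
REFERENCES = {
    "english": "I've been getting a lot of msgs like the one below",
    "turkish": "Siz de oy kullanarak tercihinizi koyun",
}

if __name__ == "__main__":
    print(closest_language("bu siteleri ziyaret ederek", REFERENCES))
```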

Alastair



_
This message has been checked for all known viruses by the 
MessageLabs Virus Scanning Service. For further information visit
http://www.messagelabs.com/stats.asp






Re: filtering on language

2001-11-20 Thread Thomas F

Hello Alastair,

On Tue, 20 Nov 2001 14:29:05 - GMT (20/11/2001, 22:29 +0800 GMT),
Alastair Scott wrote:

AS There may be more clever statistical methods - the above is Turkish, and
AS it's pretty obvious the relative frequency of various letters (eg z and
AS i) is entirely different from that of English -

This would be difficult to implement in a TB filter. But I just had an
idea:

You can actually filter for certain words that are likely to occur in
most Turkish-language spams, such as siteler (web sites), for example.
You can also use other simple words from the Turkish language. Without
a scoring mechanism - i.e. just if one of those five or ten words is
found, it's a hit - make your own, very simple, language parser in the
form of a TB filter.
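
The no-scoring word filter described above might look like this in
Python. The trigger words are taken from the spam sample quoted earlier in
this thread plus the "siteler" example; which words to use in practice is
an assumption, and the names TRIGGER_WORDS and is_hit are made up for this
sketch.

```python
import re

# Words from the quoted spam sample; the choice of list is an assumption.
TRIGGER_WORDS = ("siteler", "siteleri", "haberin", "tekelini", "ziyaret", "kullanarak")

# One alternation, matched on whole words only, ignoring case.
TRIGGER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TRIGGER_WORDS)) + r")\b", re.IGNORECASE
)

def is_hit(body: str) -> bool:
    """True as soon as any one trigger word occurs; there is no scoring."""
    return TRIGGER_RE.search(body) is not None

if __name__ == "__main__":
    print(is_hit("de bu siteleri ziyaret ederek, bu tekeli kirin."))
    print(is_hit("Please visit our web sites."))
```

Because a single word is enough for a hit, the list should hold only words
that are unlikely to appear in wanted mail.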

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

Analogies in writing are like feathers on a snake.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Alastair Scott


- Original Message -
From: Thomas F [EMAIL PROTECTED]
To: Alastair Scott on TBUDL [EMAIL PROTECTED]
Sent: 20 November 2001 2:49 pm
Subject: Re: filtering on language


 Hello Alastair,

 On Tue, 20 Nov 2001 14:29:05 - GMT (20/11/2001, 22:29 +0800 GMT),
 Alastair Scott wrote:

 AS There may be more clever statistical methods - the above is Turkish, and
 AS it's pretty obvious the relative frequency of various letters (eg z and
 AS i) is entirely different from that of English -

 This would be difficult to implement in a TB filter. But I just had an
 idea:

 You can actually filter for certain words that are likely to occur in
 most Turkish-language spams, such as siteler (web sites), for example.
 You can also use other simple words from the Turkish language. Without
 a scoring mechanism - i.e. just if one of those five or ten words is
 found, it's a hit - make your own, very simple, language parser in the
 form of a TB filter.

That would work - translations of sex and money would probably catch 95
per cent of spam ;)

The frequency analysis is actually very subtle - two other languages which
have lots of zs that come to mind are German and Polish. The huge mass of
rules needed to differentiate one language from another would probably be
just as slow as the dictionary lookup.

Alastair









Re: filtering on language

2001-11-20 Thread Dierk Haasis

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello Alastair!

On Tuesday, 20 November 2001 at 15:29:05 you wrote:

> There may be more clever statistical methods - the above is Turkish, and
> it's pretty obvious the relative frequency of various letters (eg z and
> i) is entirely different from that of English - but more similar languages
> (eg two Indo-European ones ;) might not be so simple to differentiate. I am
> not a professional linguist so I can't comment definitively.

You mean, for instance, Faeroese and Pashto?


- --
Dierk Haasis
http://www.Write4U.de

PGP keys available: mailto:[EMAIL PROTECTED]?Subject=SendMyPGPkeys

The Bat 1.54/10e on Windows 95 4.0 67306684 C

Talk slowly, but think quickly.

-BEGIN PGP SIGNATURE-
Version: PGP 6.5.8ckt
Comment: Privacy is the core element to Freedom!

iQA/AwUBO/pnnPTo1oA8g8dLEQKKeQCfZzZvntnFJ6gwBV4lysgLw16f5wwAoPK5
JL/OwG3TjzYE3c3PVbaOgxOt
=cAiN
-END PGP SIGNATURE-






Re[2]: filtering on language

2001-11-20 Thread Gerard de Vries


Tuesday, November 20, 2001, 4:12:25 PM, you wrote:


AS - Original Message -
AS From: Thomas F [EMAIL PROTECTED]
AS To: Alastair Scott on TBUDL [EMAIL PROTECTED]
AS Sent: 20 November 2001 2:49 pm
AS Subject: Re: filtering on language




 On Tue, 20 Nov 2001 14:29:05 - GMT (20/11/2001, 22:29 +0800 GMT),
 Alastair Scott wrote:

AS That would work - translations of sex and money would probably catch 95
AS per cent of spam ;)

Too bad sex is sex in almost any language ;-)

Best regards,
 Gerard  

Real men don't ask directions








OT: filtering on language

2001-11-20 Thread Thomas F

Hello Alastair,

On Tue, 20 Nov 2001 15:12:25 - GMT (20/11/2001, 23:12 +0800 GMT),
Alastair Scott wrote:

AS The frequency analysis is actually very subtle - two other languages which
AS have lots of zs that come to mind are German and Polish. The huge mass of
AS rules needed to differentiate one language from another would probably be
AS just as slow as the dictionary lookup.

I just came across some language guessers on the internet:

http://www.xrce.xerox.com/research/mltt/tools/guesser/

This one identified the text Jan originally posted as Turkish_iso9.

http://odur.let.rug.nl/~vannoord/TextCat/Demo/textcat.html

This one identified the language as unknown, even though Turkish is
in their list of supported languages. However, it is open source and -
yes, a Perl script! - so you can run it in TB v2. Oh, and I just saw
that he gives a comprehensive list of competitors, i.e. links to
other language identifiers.

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list.

It was so hot during football practice that a lot of kids keeled over
from nervous prostitution.

Message reply created with The Bat! 1.54/10
under Chinese Windows 98 4.10 Build  A 
using an AMD Athlon K7 1.2GHz, 128MB RAM






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello Peter

At 9:25 AM on Tuesday, November 20, 2001 you wrote the
following on the posted subject 'filtering on language':

JR %IF %TEXT does not = %language=English (it looked thru the
JR dictionary), then Take an action. ???

Peter I think this ain't as clear as you wanted it. What action (e.g.)
Peter should be taken? Do you want the macro in filters to scan incoming
Peter mails? In a template? What else?

  Move to Trash for example.

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello Thomas

At 9:26 AM on Tuesday, November 20, 2001 you wrote the
following on the posted subject 'filtering on language':

Thomas [...] The way you do it is to look for the
Thomas Content-encoding header or any header line that
Thomas contains 8859-x (substitute Turkish encoding suffix
Thomas for the x) or whatever you can identify in these
Thomas messages. How you do that with RegEx, now *that* is
Thomas for TBTECH. [...]

  I'm not sure I understand where this content-encoding
  header 8859- resides. I searched the header for this w/o
  result.

Thomas Be careful though, it will catch all messages with
Thomas that encoding, also legitimate ones.

  Well, my assumption is that if anyone from Turkey wants to
  communicate w me legitimately, they'll write me in English
  -- not because I'm being arrogant or nationalistic --
  because that's my native language.

  I'm going to follow up on those translator links you
  posted as well. Thanks.

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re[2]: filtering on language

2001-11-20 Thread Jernej Simoni

Hello Jan,

20 November 2001, 18:32:50, you wrote:

JR   I'm not sure I understand where this content-encoding
JR   header 8859- resides. I searched the header for this w/o
JR   result.

Check this message. Its encoding is ISO-8859-2, so it has this in the
headers:
Content-Type: text/plain; charset=ISO-8859-2

JR   Well, my assumption is that if anyone from Turkey wants to
JR   communicate w me legitimately, they'll write me in English
JR   -- not because I'm being arrogant or nationalistic --
JR   because that's my native language.

Many users don't even know how to set character encoding. Besides,
with ISO-8859-x encodings you can still write English - just look at
this message. The only thing that differs is the display of special
characters, like ...
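
To illustrate the point, here is a small Python check. Plain English text
uses only ASCII characters, and those encode to the same bytes in every
ISO-8859-x character set; only the special characters differ. The helper
names and the choice of sample letter are made up for this sketch.

```python
# Charsets compared in this sketch; ISO-8859-9 is the Turkish variant.
CHARSETS = ("ascii", "iso-8859-1", "iso-8859-2", "iso-8859-9")

def same_bytes_everywhere(text: str) -> bool:
    """True if `text` encodes to identical bytes under every charset in CHARSETS."""
    return len({text.encode(name) for name in CHARSETS}) == 1

def differs_between(char: str, first: str, second: str) -> bool:
    """True if `char` maps to different bytes in the two charsets."""
    return char.encode(first) != char.encode(second)

ENGLISH_TEXT = "With ISO-8859-x encodings you can still write English."
S_CEDILLA = "\u015f"  # s with cedilla, chosen here as an example letter

if __name__ == "__main__":
    print(same_bytes_everywhere(ENGLISH_TEXT))
    print(differs_between(S_CEDILLA, "iso-8859-2", "iso-8859-9"))
```

So a filter keyed on the charset alone cannot tell an English message
sent from a mailer set to ISO-8859-9 apart from a Turkish one.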

-- 
Jernej Simoncic, [EMAIL PROTECTED]
http://www2.arnes.si/~sopjsimo/
ICQ: 26266467

[The Bat! v1.54/10e on Windows 98 4.10.67766222. ]

1. Life can only be understood backwards, but it must be lived
   forwards.
2. No matter what goes wrong, there is always somebody who knew it
   would.
   -- Laws of Understanding






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

Hello Jernej.

At 1:19 PM on Tuesday, November 20, 2001 you wrote the
following on the posted subject 'filtering on language':

Jernej Check this message. Its encoding is ISO-8859-2

  Found it.

Jernej Many users don't even know how to set character
Jernej encoding.

  So this is not something that is set by the email client
  automatically? Would it be safe for me to assume that if
  this guy is using a Turkish email program that it would
  just be set up that way or if he was using a non-Turkish
  email client that he would have to set something that
  would reveal the language setting?

Jernej Besides, with ISO-8859-x encodings you can still
Jernej write English - just look at this message. The only
Jernej thing that differs is the display of special
Jernej characters, like ...

  So you are saying that I could also lose a lot of English
  msgs as well if I start fooling around with this idea as
  a filter?

-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re[2]: filtering on language

2001-11-20 Thread Jernej Simoni

Hello Jan,

20 November 2001, 23:01:58, you wrote:

JR   So this is not something that is set by the email client
JR   automatically? Would it be safe for me to assume that if
JR   this guy is using a Turkish email program that it would
JR   just be set up that way or if he was using a non-Turkish
JR   email client that he would have to set something that
JR   would reveal the language setting?

You usually set the default encoding, but not necessarily. Some mailers
automatically set the encoding to the corresponding Windows codepage.

JR   So you are saying that I could also lose a lot of English
JR   msgs as well if I start fooling around with this idea as
JR   a filter?

You'd probably lose a lot of messages, as many are written in
ISO-8859-x, and in different Windows- encodings...

-- 
Jernej Simoncic, [EMAIL PROTECTED]
http://www2.arnes.si/~sopjsimo/
ICQ: 26266467

[The Bat! v1.54/10e on Windows 98 4.10.67766222. ]

What you don't know will always hurt you.
   -- Law of Blissful Ignorance






Re: filtering on language

2001-11-20 Thread Jan Rifkinson

On Tue, 20 Nov 2001 at 23:52 GMT +0100 (11/20/2001 5:52 PM
where I live) [EMAIL PROTECTED] [Jernej]
wrote to [EMAIL PROTECTED] re: 'filtering on
language':

Jernej You'd probably lose a lot of messages, as many are
Jernej written in ISO-8859-x, and in different Windows-
Jernej encodings...

  OK, thanks for helping me understand.
  
-- 
Jan Rifkinson
Ridgefield, CT USA
TB! V1.54/10/W2K_SP2/PGP Key ID: 0x3F14A060
ICQ 41116329






Re: filtering on language

2001-11-20 Thread Thomas F

Hi Dierk,

On Tue, 20 Nov 2001 17:01:50 +0100GMT (21/11/2001, 00:01 +0800GMT),
Dierk Haasis wrote:

 That would work - translations of sex and money would probably
 catch 95 per cent of spam ;)

DH I bet it would get 100%, but it doesn't matter how good it works
DH positively.

No, I just received a spam message about a diet. No sex was required,
so I deleted it. You lost the bet.

DH You have to take into account its negative: How many percent of
DH legitimate - and maybe wanted - messages get caught?

You are right. 50% of my private messages are about either sex or
money, or the combination of both.

DH In this special case - presumed the receiver doesn't even speak/read a
DH language - any message in this language can be positively deleted.
DH None of them would be alright. As long as no one tries to send him a
DH message containing some of the trigger words (e.g. money or sex in
DH Turkish)

Just a hint: sex in Turkish is sex, like in most other languages I
know.

-- 

Cheers,
Thomas.

Moderator of the German The Bat! beginners' list. Subscribe at:
[EMAIL PROTECTED]  

Message reply created with The Bat! 1.53t
under Chinese Windows 98 4.10 Build 1998  
on a Pentium II/350 MHz.

