Re: help with regular expressions
Hi Task,

22-Feb-2003, 03:02 -0400 (07:02 UK time) Task Control said:

> To process the text of the mails in vampire, I first apply an uppercase function (i.e. uppercase(GooGLe) = GOOGLE). My questions are:
> - What happens if I apply the uppercase function to a regular expression? Will it work fine?

Don't. It won't work fine. The RegEx must be passed unmodified raw text, letting the analysis routines within RegEx do their own case conversion.

> Which is better?
> a. Tell the plug-in user: your mails will be converted to uppercase letters before processing, so write your regex filters accordingly.

No.

> b. Silently convert the expressions and mail texts to uppercase.

No. Best to say: if an expression is a regex, leave the case unmodified.

> - Would you like to see a regex filter in vampire?

Yes.

> Does anyone know a good regular expressions guide? Please reply with the URL.

http://www.silverstones.com/thebat/RegEx.html and TheBat help file.

--
Cheers --
.\\arck D Pearlstone -- List moderator
TB! v1.63 Beta/7 on Windows 2000 5.0.2195 Service Pack 2

Current version is 1.62 | Using TBDEV information: http://www.silverstones.com/thebat/TBUDLInfo.html
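A small illustration of why uppercasing a regex is risky, as a minimal sketch only. It assumes Python's re module as a stand-in for whatever regex engine the plug-in uses, and the pattern and sample text are made up for the example. Uppercasing the pattern also uppercases the escape letters, so \b (word boundary) turns into \B (not a word boundary) and the meaning changes. Asking the engine for a case-insensitive match avoids touching either string.

```python
import re

pattern = r"\bgoogle\b"        # hypothetical filter: the whole word "google"
text = "Try GooGLe today"      # hypothetical mail text


def matches(pat, s, flags=0):
    """Return True if the pattern is found anywhere in the string."""
    return re.search(pat, s, flags) is not None


# Approach 1: uppercase both the pattern and the text.
# The escape \b becomes \B, which has the opposite meaning.
broken = pattern.upper()
upper_both = matches(broken, text.upper())

# Approach 2: leave both untouched and let the engine ignore case.
case_insensitive = matches(pattern, text, re.IGNORECASE)

print(broken, upper_both, case_insensitive)
```

With this pattern the uppercased version stops matching the word it was written for, while the case-insensitive flag finds it.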
Re[4]: Regular expressions in AntiSpam: is it possible?
Hello, Task.

You wrote in <mid:[EMAIL PROTECTED]>

TC> and I don't understand what you are thinking. Please explain.
TC> First question: what is a regular expression?

Well, the simplest example: you can see my letter prefix, 'You wrote in <mid:[EMAIL PROTECTED]>'. It is generated automatically by a template using a regular expression. Every letter contains an ID such as <[EMAIL PROTECTED]>, and you need to insert 'mid:' inside the angle brackets. This is done by just one line in the reply template:

%SetPattRegExp='\<(.*)\>'You wrote in <mid:%RegExpMatch='%OMSGID'>

The pattern for the regular expression is \<(.*)\>. The first \< and the last \> simply mean the literal characters < and >. The phrase in parentheses, (.*), marks the substring that will be returned as the result. Inside it, . is called an 'atom' and means any symbol; * is called a 'quantifier', is attached to . and means 'the previous atom any number of times'. So .* means any run of symbols. By applying the whole regexp pattern to a string enclosed in angle brackets, like '<[EMAIL PROTECTED]>', you get the string without brackets, i.e. '[EMAIL PROTECTED]', because the brackets in the pattern are outside the parentheses. So the macro defines the pattern '\<(.*)\>' with %SetPattRegExp, then applies it to the message ID with %RegExpMatch='%OMSGID', and then places the result between 'You wrote in <mid:' and '>'.

...just a few more examples. If you have to find 'elephant' in a letter, you only need to define an appropriate pattern that recognizes the variants you want (let me treat a non-empty result as TRUE and an empty string as FALSE):

'\belephant\b'
    TRUE for 'my elephant red' and FALSE for 'telephantom'
'\Belephant\B'
    FALSE for 'my elephant red' and TRUE for 'telephantom'
'\W(elephant)\W'
    TRUE for '$elephant#' and 's elephant df', FALSE for '3elephants'
'e\s*l\s*e\s*p\s*h\s*a\s*n\s*t' (the \s* between every pair of letters)
    TRUE for 'elephant' and 'e l e ph a nt'
'(?s)e.?l.?e.?p.?h.?a.?n.?t' (the .? between every pair of letters)
    TRUE for 'elephant', 'e.l.e.p.h.a.n.t' and 'e l e p h a n t'

So, if you need to filter some specific variant, you just construct the regexp from your signal word by adding a prefix and/or suffix to it and (if necessary) by inserting something between the letters.

Let us take a line to search for, for example our 'elephant'. We can test a text for the line in two steps:

1. Construct a regexp pattern from the signal line:
   - for a 'partial search', leave the line as it is:
     'elephant' --> 'elephant' (no change)
   - for a 'whole-word search', add '\b' before and after the line:
     'elephant' --> '\belephant\b'
   - for a 'space-delimited search', add '\s*' between the letters:
     'elephant' --> 'e\s*l\s*e\s*p\s*h\s*a\s*n\s*t'
     (with '\s+' instead, whitespace would be required between every pair of letters, so plain 'elephant' would no longer match)
   - for any other fantasy, just learn how to make the appropriate regexp.
2. Apply the constructed pattern to the target text by calling pcre.dll.

That's all!

TC> we can make a plug-in that calls a DLL, and use the DLL in The Bat!

I meant that The Bat! already uses regular expressions! And if I understand correctly, it uses the same pcre.dll, but it is statically linked into TheBat.exe (I think that Stefan Tanurkov knows it better :) My point was that if The Bat! developers made the library accessible for sharing, it would not be necessary for 'outsiders' to deploy an extra copy of the library with their plugins.

--
Sincerely, Alexey.
Using TB 1.63b7 on WinXP SP1 Corp + MUI RU, spelling by ORFO2002
mailto:[EMAIL PROTECTED]
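The two steps described in the message above (build a pattern from the signal word, then apply it to the text) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's re module instead of pcre.dll, the function names build_pattern and found and the mode names are hypothetical, and the message ID is an invented placeholder. The escapes used here (\b, \B, \W, \s, (?s)) behave the same way in both engines.

```python
import re


def build_pattern(word, mode):
    """Step 1: construct a regexp from a signal word (mode names are hypothetical)."""
    if mode == "partial":
        return re.escape(word)
    if mode == "whole_word":
        return r"\b" + re.escape(word) + r"\b"
    if mode == "spaced":
        # optional whitespace between every pair of letters
        return r"\s*".join(re.escape(ch) for ch in word)
    raise ValueError("unknown mode: " + mode)


def found(pattern, text):
    """Step 2: apply the pattern; a match counts as TRUE, no match as FALSE."""
    return re.search(pattern, text) is not None


whole = build_pattern("elephant", "whole_word")
spaced = build_pattern("elephant", "spaced")

print(whole, found(whole, "my elephant red"), found(whole, "telephantom"))
print(spaced, found(spaced, "e l e ph a nt"))

# The message-ID trick: capture what sits between the angle brackets.
msgid = "<12345@example.invalid>"          # invented placeholder ID
inner = re.search(r"\<(.*)\>", msgid).group(1)
prefix = "You wrote in <mid:" + inner + ">"
print(prefix)
```

Passing the word through re.escape is a design choice of this sketch, so that a signal word containing characters such as '.' or '+' is still matched literally.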
Re[2]: vampire updated now 0.01c
Dear followers of tbdev at thebat.dutaint.com:

Regarding what Luc posted earlier:

L> Good evening Task,
L> It was foretold that on 22-2-2003 16:50:28 GMT-0400 (which was
L> 21:50:28 where I live) Task Control would mumble:
L> snipped a bit
TC> The first time, was it put in the same directory where we put
TC> vampire? Do you see the correct directory in the title of the window in
TC> the configuration utility?
L> Yes
TC> including c:\vampire.ini
L> Question first: does the vampire.ini file need to be in the same
L> directory as vampire?

No, vampire only works if vampire.ini is in c:\. In the next releases I will move the information from vampire.ini to the Windows registry.

--
Regards, Task Control
mail: TaskControl at SoftHome dot net
Using: - Windows 98 4.10.1998 - AVG 6.0 Free Edition - The Bat! 1.63 Beta/7 - Trillian PRO 1.0 B
Re: PACSPAM words list
Dear followers of tbdev at thebat.dutaint.com:

Regarding what Leif posted earlier:

LG> Hello users,
LG> Well, here again is my current list of words. These have been tweaked
LG> over and over since PACSPAM came out, and so far,
LG> the only ones getting through right now are the ones completely in
LG> Russian, or Korean. Anyone have any thoughts on being able to
LG> filter those ones?

Easy: capture the ASCII characters that represent these words and put them in your list files.

LG> The problem I'm having filtering on those is trying to find words
LG> (quoted, because it all looks like garbage on my screen), which would
LG> be more or less common in all of them.

Maybe you can filter on the name of the character set (the encoding table); I think you can find it in the raw message.

LG> So, these words are in the UNKNOWN / BODY / COMPLETE WORDS list. I'd
LG> really like to see other people's lists too. Once I'm happy that
LG> nothing is being caught that shouldn't, I'll move this words list into
LG> the TRASH tab.

Why are you not using vampire? I abandoned the development of pacspam because... you can find the explanation on the pacspam page. You can use your pacspam list in vampire.

LG> :pacSpam - unknow list
LG> : [...]
LG> :

If you put the list of words in the body, the plug-in can say: it is spam, you know, a lot of words from spam! Please zip it (or rar it) when you send it.

--
Regards, Task Control
mail: TaskControl at SoftHome dot net
Using: - Windows 98 4.10.1998 - AVG 6.0 Free Edition - The Bat! 1.63 Beta/7 - Trillian PRO 1.0 B
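The idea of filtering on the character set named in the raw message could look something like the sketch below. It is an illustration only, not how pacspam or vampire work: it uses Python's standard email package, the function names and the BLOCKED_CHARSETS set are hypothetical, and the sample message is invented. It reads the charset parameter of each Content-Type header, so it catches nothing when a message declares no charset at all.

```python
from email import message_from_string

# Hypothetical list of charsets the user never expects legitimate mail in.
BLOCKED_CHARSETS = {"koi8-r", "windows-1251", "euc-kr", "ks_c_5601-1987"}


def declared_charsets(raw_message):
    """Collect the charset of every MIME part that declares one."""
    msg = message_from_string(raw_message)
    charsets = set()
    for part in msg.walk():
        charset = part.get_content_charset()  # None if not declared
        if charset:
            charsets.add(charset.lower())
    return charsets


def looks_unwanted(raw_message):
    """True if any declared charset is on the blocked list."""
    return bool(declared_charsets(raw_message) & BLOCKED_CHARSETS)


sample = (
    "From: someone@example.invalid\n"
    "Subject: test\n"
    'Content-Type: text/plain; charset="KOI8-R"\n'
    "\n"
    "body text\n"
)
plain = "From: someone@example.invalid\nSubject: test\n\nbody text\n"

print(declared_charsets(sample), looks_unwanted(sample), looks_unwanted(plain))
```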
Re: help with regular expressions
Hi Task,

On Sat, 22 Feb 2003, at 03:02:49 [GMT -0400] (which was 12:02 AM where I live) you wrote:

TC> Hi, I'm working on a regular expressions (regex) filter in
TC> vampire. I use the freeware TRegExpr, available at
TC> http://anso.da.ru

One thing that regexps should help with is the case I mentioned once before, where I was receiving a lot of SPAM with text similar to the below:

To be re move d from fu ture e mail ings cli ck here

With regexps, we can have it ignore whitespace (space, tab etc). That should help out considerably if Bayesian filtering catches on, because it's the only way I can foresee them being able to send you their message without inventing a whole new lexicon (short of just misspelling everything).

Speaking of which... Everyone has put thebat.dutaint.com as a PARTIAL word in the KLUDGES of the SECURE tab, right? If you don't do something to that effect, then since we are discussing SPAM and word lists, these messages will be caught by the PACSPAM plugin. Doing that covers all the lists except for TBOT, but here's my PARTIAL words list for KLUDGES on the SECURE tab:

thebat.dutaint.com
[EMAIL PROTECTED]
TBLH
silverstones.com

The last two are for another list I run, and for list moderation of the TB lists that Marck has set up, so you probably don't need those last two.

--
Cheers, Leif Gregory
List Moderator (and fellow registered end-user)
PCWize Editor / ICQ 216395 / PGP Key ID 0x7CD4926F
Web Site http://www.PCWize.com
TB FAQ http://www.silverstones.com/thebat/FAQ.html
Using The Bat! 1.63 Beta/6 under Windows 2000 5.0 Build 2195 Service Pack 3 on a P4 1.6Ghz OC'd to 2.32Ghz with 512MB.
Tagline of the day: Where are we going and why am I in this handbasket?
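One possible way to ignore whitespace for the broken-up spam line above, sketched in Python rather than with TRegExpr: remove all whitespace from both the text and the phrase, then do a plain substring test. The function names squash and contains_phrase are hypothetical, and this is only one option among several.

```python
import re


def squash(text):
    """Lowercase the text and remove every whitespace character."""
    return re.sub(r"\s+", "", text).lower()


def contains_phrase(text, phrase):
    """True if the phrase occurs in the text once whitespace is ignored."""
    return squash(phrase) in squash(text)


spam = "To be re move d from fu ture e mail ings cli ck here"

print(contains_phrase(spam, "removed from future emailings"))
print(contains_phrase(spam, "click here"))
print(contains_phrase(spam, "unsubscribe"))
```

The trade-off is that word boundaries disappear too, so a phrase can be "found" across two neighbouring words that just happen to join up. Short phrases are therefore more likely to give false positives than long ones.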
vampire with support to regular expressions uploaded
Dear followers of tbdev at thebat.dutaint.com,

A beta version for testing is available, this time with support for regular expressions. You can get it and test it from: http://fyberger.tripod.com/vampire/vampire.htm

News in this version:
- new way to handle international support
- regular expressions
- syntax check utility for regexes

--
Regards, Task Control
mail: TaskControl at SoftHome dot net
Using: - Windows 98 4.10.1998 - AVG 6.0 Free Edition - The Bat! 1.63 Beta/7 - Trillian PRO 1.0 B