Hi there,

On Fri, 31 Aug 2012, Maarten Broekman wrote:

> I see where your confusion comes from.  I'm not generating pdb
> signatures.  I'm generating ndb signatures ...

Sorry, bit of a senior moment there.  They seem to be creeping up on
me lately. :(  I had to go back and read

http://www.clamav.net/doc/latest/signatures.pdf

again.

I'm still perplexed by the numbers here.  You say that you have
signatures of the order of 8k characters, and you want to save O(10)
characters here and there in the signatures.  It seems like you're
fighting an uphill battle.  What am I missing?  Have you estimated
the gains you're going to be able to make?  How many occurrences of
the target replacements do you expect to find in the signatures?

A *long* time ago I was faced with something superficially similar, in
the context of trying to fit the descriptions for 50,000+ stationery
products into 40 character strings.  Descriptions were abbreviated,
ad-hoc, apparently by careless staff for whom English was at best a
second language.  A very large number of corrections was necessary.
It was a nightmare, and it needed to be done four times per annum, so
I wrote a simple parser in Perl.  Amongst other things, it used a kind
of 'thesaurus' of text strings.  Here's a brief extract:
...
*B/FILE
*BXFILE
*BOXFILE
BOX FILE
*BRACKETS
*BRCK
BRACKET
...

The asterisk is just a character which didn't often appear in the
input descriptions.  Your thesaurus would probably look something like
...
*hyyp://
*hyyps://
{7-8}
...

It's a very simple idea.  The input is a catalogue which contains tens
of thousands of single-line descriptions of products.  Each description
line is matched against the thesaurus.  If the line contains one of the
thesaurus strings prefixed by an asterisk, that string is replaced by
the next thesaurus entry which is not prefixed by an asterisk.  It's an
easy thing to do in Perl, but if Perl isn't your second language you
might find it testing.  If it's of interest please give me some more
examples of your replacement requirements and I'll dust off the code.
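
In the meantime, here is a rough sketch of the lookup described above,
written in Python rather than Perl.  It is not the original parser.  The
function names parse_thesaurus and build_replacer are invented for
illustration, and it assumes the thesaurus is a plain list of lines in
the format shown above, with the signatures handled as ordinary text
strings.

```python
import re


def parse_thesaurus(lines):
    """Turn thesaurus lines into a {variant: replacement} mapping.

    Lines starting with '*' are variants to look for. The next line
    without a leading '*' is the replacement for every variant
    collected since the previous replacement line.
    """
    mapping = {}
    pending = []
    for raw in lines:
        entry = raw.rstrip("\n")
        if not entry:
            continue  # ignore blank lines
        if entry.startswith("*"):
            pending.append(entry[1:])
        else:
            for variant in pending:
                mapping[variant] = entry
            pending = []
    return mapping


def build_replacer(mapping):
    """Return a function that applies all replacements in a single pass."""
    if not mapping:
        return lambda line: line
    # Longest variants first, so at any position the longest one wins.
    variants = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in variants))
    # One pass over the line: replaced text is never scanned again.
    return lambda line: pattern.sub(lambda m: mapping[m.group(0)], line)


# Hypothetical example data, taken from the two extracts above.
THESAURUS = [
    "*B/FILE", "*BXFILE", "*BOXFILE", "BOX FILE",
    "*BRACKETS", "*BRCK", "BRACKET",
    "*hyyp://", "*hyyps://", "{7-8}",
]

replace_line = build_replacer(parse_thesaurus(THESAURUS))
```

Calling replace_line("A4 BXFILE BLUE") would then give
"A4 BOX FILE BLUE".  Doing everything in one regular-expression pass
means text that has already been replaced is not matched a second time,
which matters once the thesaurus grows.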

--

73,
Ged.
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
