On Thu, 16 Jul 2009, Justin Mason wrote:
> I plan to ask (on users@, on my blog etc.) for submissions of archives
> of ham. Submissions of _just_ false positives are OK, as long as they're
> labelled as such, because they'll have differing profiles and too many
> FPs in the corpus will cause trouble for the score generation step.
>
> I'll then have a quick go at hand-classifying the submitted corpora,
> spotting obvious FNs that slipped in, etc., and will then leave them on
> the zone for nightly mass-checks to use as well. So the corpora won't
> be private submissions.
>
> Thoughts?

Liability? Someone who provides you with a corpus voluntarily is implying
they don't care if it becomes public, but you might still want to require
a liability release.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
[email protected] FALaholic #11174 pgpk -a [email protected]
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79