On 25/06/2024 16:21, Bill Cole wrote:
On 2024-06-24 at 17:18:11 UTC-0400 (Mon, 24 Jun 2024 17:18:11 -0400)
Mark London <m...@psfc.mit.edu>
is rumored to have said:
I received a spam email with the text below that wasn't caught by
SpamAssassin (at least mine). The text actually looks like something
that was generated using ChatGPT. In any event, I put the text
through ChatGPT and asked if it looked like spam. At the bottom of
this email is its analysis. I've not been fully reading this
group. Has there been any work to allow SpamAssassin to use AI?
"Artificial intelligence" does not exist. It is a misnomer.
Large language models like ChatGPT have a provenance problem. There's
no way to know why exactly the model "says" anything. In a single
paragraph, ChatGPT is capable of making completely and directly
inconsistent assertions. The only way to explain that is that despite
appearances, a request to answer the ham/spasm question generates text
with no semantic connection to the original, but which seems like an
explanation.
SpamAssassin's code and rules all come from ASF committers, and the
scores are determined by examining the scan results from contributors
and optimizing them to a threshold of 5.0. Every scan of a message
results in a list of hits against documented rules. The results can be
analyzed and understood.
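As a rough illustration of that scoring model (the rule names and
per-rule scores below are invented for the example, not real
SpamAssassin rules), a scan amounts to summing the scores of every
rule that hit and comparing the total to the 5.0 threshold:

```python
# Minimal sketch of SpamAssassin-style additive scoring.
# Rule names and scores are hypothetical, for illustration only.
THRESHOLD = 5.0

def classify(hits):
    """hits: dict mapping rule name -> score for each rule that matched.
    Returns (total score, spam verdict)."""
    total = sum(hits.values())
    return total, total >= THRESHOLD

hits = {
    "EXAMPLE_SUBJ_ALLCAPS": 1.5,   # hypothetical subject-line rule
    "EXAMPLE_URI_SHORTENER": 2.2,  # hypothetical URI rule
    "EXAMPLE_BAYES_99": 3.5,       # hypothetical Bayes bucket
}
total, is_spam = classify(hits)
print(f"score={total:.1f} spam={is_spam}")
```

Because every verdict is just this sum over named, documented rules,
any decision can be traced back to the exact rules that fired.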
We know that ChatGPT and other LLMs that are publicly available have
been trained on data to which they had no license. There is no way to
remove any particular ingested data. There's no way to know where any
particular LLM will have problems and no way to fix those problems.
This all puts them outside of the boundaries we have as an ASF
project. However, we do have a plugin architecture, so it is possible
for third parties to create a plugin for LLM integration.
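Real SpamAssassin plugins are Perl modules built on
Mail::SpamAssassin::Plugin; purely as a language-neutral sketch of
what a third-party LLM check could look like (every name below is
invented), such a plugin would wrap the model behind an ordinary
rule that contributes a configured score like any other:

```python
# Hypothetical sketch only: a rule hit driven by an external model.
# This is not SpamAssassin's actual plugin API (which is Perl).

def llm_spam_check(message_text, llm_score_fn, cutoff=0.8):
    """Return True (i.e. the rule 'hits') when the external model's
    estimated spam probability meets the cutoff. The hit would then
    add its configured score toward the 5.0 threshold, so the rest
    of the scoring stays inspectable even if the model is not."""
    probability = llm_score_fn(message_text)
    return probability >= cutoff

# Usage with a stub model standing in for the LLM call:
hit = llm_spam_check("WIN A PRIZE!!!", lambda text: 0.95)
print(f"rule hit: {hit}")
```

Keeping the model behind a scored rule preserves one property the
post emphasizes: the final verdict remains a documented sum of hits,
even when one input to it is opaque.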
I asked a well-known chatbot: What would Bill Cole say if he were
asked about integrating AI into SpamAssassin?
Bill Cole, the maintainer of SpamAssassin, is likely to be
enthusiastic about the potential of AI to improve spam detection.
Here's what he might say:
* Improved Spam Filtering: AI could be used to analyze spam emails
with greater accuracy than traditional rule-based methods.
SpamAssassin currently relies on matching email content to predefined
rules, but AI could identify spam based on subtler cues like phrasing,
sender behavior, and content structure.
* Adapting to Evolving Threats: Spammers constantly develop new
tactics to bypass filters. AI can learn and adapt to these evolving
threats much faster than manually updating rules.
* Mitigating False Positives: AI could help reduce the number of
legitimate emails mistakenly flagged as spam (false positives). This
is a major concern for SpamAssassin users.
However, Bill Cole might also mention some challenges:
* Training Data Bias: The effectiveness of AI depends on the quality
and balance of training data. Biases in the training data could lead
to the AI disproportionately flagging certain types of emails as spam.
* Transparency and Explainability: It can be difficult to understand
how AI systems arrive at their decisions. This lack of transparency
could make it hard to debug errors or identify potential biases.
* Computational Cost: Training and running AI models can require
significant computational resources, which might not be feasible for
all SpamAssassin users.
Overall, Bill Cole would likely be cautiously optimistic about the
potential of AI to improve SpamAssassin. He would acknowledge the
potential benefits but also highlight the challenges that need to be
addressed.
:)