On 25/06/2024 16:21, Bill Cole wrote:
On 2024-06-24 at 17:18:11 UTC-0400 (Mon, 24 Jun 2024 17:18:11 -0400)
Mark London <m...@psfc.mit.edu>
is rumored to have said:

I received a spam email with the text below that wasn't caught by SpamAssassin (at least mine). The text actually looks like something that was generated using ChatGPT. In any event, I put the text through ChatGPT and asked if it looked like spam. At the bottom of this email is its analysis. I've not been fully reading this group. Has there been any work to allow SpamAssassin to use AI?

"Artificial intelligence" does not exist. It is a misnomer.

Large language models like ChatGPT have a provenance problem. There's no way to know why exactly the model "says" anything. In a single paragraph, ChatGPT is capable of making completely and directly inconsistent assertions. The only way to explain that is that, despite appearances, a request to answer the ham/spam question generates text with no semantic connection to the original, but which seems like an explanation.

SpamAssassin's code and rules all come from ASF committers, and the scores are determined by examining the scan results from contributors and optimizing them to a threshold of 5.0. Every scan of a message results in a list of hits against documented rules. The results can be analyzed and understood.
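For example, a scan records its verdict and the rules it hit in headers along these lines (the rule names and score here are illustrative, not from any real scan):

    X-Spam-Status: Yes, score=7.3 required=5.0
        tests=BAYES_99,HTML_MESSAGE,URIBL_BLACK autolearn=no

Each test name maps to a documented rule with a published score, so anyone can work out why a message crossed the threshold.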

We know that ChatGPT and other LLMs that are publicly available have been trained on data to which they had no license. There is no way to remove any particular ingested data. There's no way to know where any particular LLM will have problems and no way to fix those problems. This all puts them outside of the boundaries we have as an ASF project. However, we do have a plugin architecture, so it is possible for 3rd parties to create a plugin for LLM integration.
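As a rough sketch of what that could look like (untested, and llm_classify() is a hypothetical stand-in for whatever external classifier a third party would actually call), such a plugin would be a Perl module that subclasses Mail::SpamAssassin::Plugin and registers an eval test:

    package LLMCheck;
    use strict;
    use warnings;
    use Mail::SpamAssassin::Plugin;
    our @ISA = qw(Mail::SpamAssassin::Plugin);

    sub new {
        my ($class, $mailsa) = @_;
        my $self = $class->SUPER::new($mailsa);
        bless $self, $class;
        # Expose the eval test so rules files can reference it.
        $self->register_eval_rule('check_llm_verdict');
        return $self;
    }

    sub check_llm_verdict {
        my ($self, $pms) = @_;
        # Hand the pristine message text to an external classifier;
        # the rule hits if the classifier answers 'spam'.
        my $text = $pms->get_message->get_pristine();
        return llm_classify($text) eq 'spam' ? 1 : 0;
    }

    sub llm_classify {
        my ($text) = @_;
        # Stub: a real plugin would POST $text to some LLM API here.
        return 'ham';
    }

    1;

It would then be wired in from a local .cf file, with the score left to the administrator rather than the project:

    loadplugin LLMCheck /etc/mail/spamassassin/LLMCheck.pm
    header   LLM_SPAM  eval:check_llm_verdict()
    describe LLM_SPAM  External LLM classifier judged this spam
    score    LLM_SPAM  0.1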

I asked a well-known chatbot: What would Bill Cole say if he were asked about integrating AI into SpamAssassin?

Bill Cole, the maintainer of SpamAssassin, is likely to be enthusiastic about the potential of AI to improve spam detection.

Here's what he might say:

* Improved Spam Filtering: AI could be used to analyze spam emails with greater accuracy than traditional rule-based methods. SpamAssassin currently relies on matching email content to predefined rules, but AI could identify spam based on subtler cues like phrasing, sender behavior, and content structure.

* Adapting to Evolving Threats: Spammers constantly develop new tactics to bypass filters. AI can learn and adapt to these evolving threats much faster than manually updating rules.

* Mitigating False Positives: AI could help reduce the number of legitimate emails mistakenly flagged as spam (false positives). This is a major concern for SpamAssassin users.

However, Bill Cole might also mention some challenges:

* Training Data Bias: The effectiveness of AI depends on the quality and balance of training data. Biases in the training data could lead to the AI disproportionately flagging certain types of emails as spam.

* Transparency and Explainability: It can be difficult to understand how AI systems arrive at their decisions. This lack of transparency could make it hard to debug errors or identify potential biases.

* Computational Cost: Training and running AI models can require significant computational resources, which might not be feasible for all SpamAssassin users.

Overall, Bill Cole would likely be cautiously optimistic about the potential of AI to improve SpamAssassin. He would acknowledge the potential benefits but also highlight the challenges that need to be addressed.

:)
