Five things privacy experts know about AI:

1. AI models memorize their training data
2. AI models then leak their training data
3. Ad hoc protections don't work
4. Robust protections exist, though their mileage may vary
5. The larger the model, the worse it gets
Bonus thing: AI companies are overwhelmingly dishonest


In November, I participated in a technologist roundtable about privacy and AI, 
for an audience of policy folks and regulators. The discussion was great! It 
also led me to realize that there are a lot of things that privacy experts know and 
agree on about AI… but might not be common knowledge outside our bubble.

That seems like the kind of thing I should write a blog post about!

1. AI models memorize their training data
When you train a model with some input data, the model will retain a 
high-fidelity copy of some data points. If you "open up" the model and analyze 
it in the right way, you can reconstruct some of its input data nearly exactly. 
This phenomenon is called memorization.

Memorization happens by default, to all but the most basic AI models. It's 
often hard to quantify: you can't say in advance which data points will be 
memorized, or how many. Even after the fact, it can be hard to measure 
precisely. Memorization is also hard to avoid: most naive attempts at 
preventing it fail miserably — more on this later.
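
One way researchers measure memorization anyway is the "canary" methodology 
from Carlini et al.'s "The Secret Sharer" paper: plant a fake secret in the 
training data, then check whether the trained model finds it suspiciously 
familiar compared to decoys. Here is a deliberately toy sketch of that idea in 
Python (the bigram "model", the canary, and every number are made up for 
illustration):

    import math
    import random
    from collections import defaultdict

    def train_char_bigram(corpus):
        """Fit a character-level bigram model: counts of (char -> next char)."""
        counts = defaultdict(lambda: defaultdict(int))
        for a, b in zip(corpus, corpus[1:]):
            counts[a][b] += 1
        return counts

    def log_prob(counts, text):
        """Log-likelihood of `text` under the bigram model, with add-one smoothing."""
        total = 0.0
        for a, b in zip(text, text[1:]):
            row = counts[a]
            total += math.log((row[b] + 1) / (sum(row.values()) + 256))
        return total

    # Plant a fake secret (the "canary") once in an otherwise boring corpus.
    canary = "my PIN is 481632"
    corpus = "the quick brown fox jumps over the lazy dog. " * 50 + canary
    model = train_char_bigram(corpus)

    # Exposure proxy: how does the canary rank against random decoy secrets
    # that were never in the training data?
    decoys = ["my PIN is " + "".join(random.choices("0123456789", k=6))
              for _ in range(999)]
    ranked = sorted(decoys + [canary], key=lambda s: -log_prob(model, s))
    print(f"canary ranks {ranked.index(canary) + 1} out of {len(ranked)} candidates")

Even in this toy setup, the planted secret should rank at or near the top of 
the candidate list; the same effect, measured more carefully, is how 
memorization gets quantified in real models.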

Memorization can be lossy, especially with images, which aren't memorized 
pixel-for-pixel. But if your training data contains things like phone numbers, 
email addresses, recognizable faces… some of it will inevitably be stored by 
your AI model. This has obvious privacy implications.

2. AI models then leak their training data
Once a model has memorized some training data, an adversary can typically 
extract it, even without direct access to the internals of the model. So the 
privacy risks of memorization are not theoretical: AI models don't just 
memorize data, they regurgitate it as well.
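
To give a sense of what this looks like, here is a rough sketch of the 
prefix-prompting idea behind training-data extraction attacks, assuming the 
Hugging Face transformers library and GPT-2 as a stand-in model (real attacks 
generate and filter huge numbers of candidate prompts rather than a single 
hand-picked one):

    from transformers import pipeline

    # Ask a pretrained model to continue a string that likely appeared many
    # times in its web-scraped training data.
    generator = pipeline("text-generation", model="gpt2")
    prefix = "To be, or not to be, that is the"
    output = generator(prefix, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    print(output[len(prefix):])

    # A verbatim continuation of the original text is evidence that the sequence
    # was memorized rather than composed on the fly; real extraction attacks
    # automate this at scale and use better-calibrated membership signals.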

In general, we don't know how to robustly prevent AI models from doing things 
they're not supposed to do. That includes giving away the data they dutifully 
memorized. There's a lot of research on this topic, called "adversarial machine 
learning"… and it's fair to say that the attackers are winning against the 
defenders by a comfortable margin.

Will this change in the future? Maybe, but I'm not holding my breath. To really 
secure a thing against clever adversaries, we first have to understand how the 
thing works. We do not understand how AI models work. Nothing seems to indicate 
that we will figure it out in the near future.

3. Ad hoc protections don't work
There are a bunch of naive things you can do to try and avoid problems 1 and 2. 
You can remove obvious identifiers in your training data. You can deduplicate 
the input data. You can use regularization during training. You can apply 
alignment techniques after the fact to try and teach your model to not do bad 
things. You can tweak your prompt and tell your chatbot to pretty please not 
reidentify people like a creep. You can add a filter to your language model to 
catch things that look bad before they reach users.
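
To make that last item concrete, here is a sketch of what such an output filter 
typically boils down to, with deliberately simplistic, made-up patterns:

    import re

    # Naive "PII filter" applied to model outputs before they reach users.
    PII_PATTERNS = [
        re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),   # US-style phone numbers
        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),        # email addresses
    ]

    def filter_output(text: str) -> str:
        """Redact anything matching one of the hard-coded patterns."""
        for pattern in PII_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text

    print(filter_output("Call me at 555-867-5309 or write to jane@example.com"))
    # -> Call me at [REDACTED] or write to [REDACTED]
    # A phone number spelled out in words, a street address, or a memorized
    # medical record passes through untouched.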

You can list all those in a nice-looking document, give it a fancy title like 
"Best practices in AI privacy", and feel really good about yourself. But at 
best, these will limit the chances that something goes wrong during normal 
operation, and make it marginally more difficult for attackers. The model will 
still have memorized a bunch of data. It will still leak some of this data if 
someone finds a clever way to extract it.

Fundamental problems don't get solved by adding layers of ad hoc mitigations.

4. Robust protections exist, though their mileage may vary
To prevent AI models from memorizing their input, we know exactly one robust 
method: differential privacy (DP). But crucially, DP requires you to precisely 
define what you want to protect. For example, to protect individual people, you 
must know which piece of data comes from which person in your dataset. If you 
have a dataset with identifiers, that's easy. If you want to use a humongous 
pile of data crawled from the open Web, that's not just hard: that's 
fundamentally impossible.

In practice, this means that for massive AI models, you can't really protect 
the massive pile of training data. This probably doesn't matter to you: chances 
are, you can't afford to train one from scratch anyway. But you may want to use 
sensitive data to fine-tune them, so they can perform better on some task. 
There, you may be able to use DP to mitigate the memorization risks on your 
sensitive data.
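
For intuition, here is a toy sketch of the core mechanism behind DP-SGD, the 
standard way of training models with differential privacy: clip each example's 
gradient, then add calibrated Gaussian noise before the update. The data and 
hyperparameters below are made up; real DP fine-tuning would use a library like 
Opacus and a proper privacy accountant to track the actual guarantee.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                                      # toy features
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)   # toy labels

    clip_norm = 1.0    # per-example gradient clipping bound C
    noise_mult = 1.1   # noise multiplier sigma (drives the privacy guarantee)
    lr, steps, batch = 0.1, 50, 50

    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch, replace=False)
        grad_sum = np.zeros_like(w)
        for i in idx:
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))                        # prediction
            g = (p - y[i]) * X[i]                                      # per-example gradient
            g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))  # clip to norm C
            grad_sum += g
        noise = rng.normal(scale=noise_mult * clip_norm, size=w.shape)
        w -= lr * (grad_sum + noise) / batch                           # noisy averaged step

    print("weights learned with per-example clipping and noise:", np.round(w, 2))

The clipping bounds how much any single example can influence the model, and 
the noise hides whatever influence remains: that combination is what makes the 
guarantee hold for every record, rather than just on average.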

This still requires you to be OK with the inherent risk of the off-the-shelf 
LLMs, whose privacy and compliance story boils down to "everyone else is doing 
it, so it's probably fine?".

To avoid this last problem, and get robust protection, and probably get better 
results… Why not train a reasonably-sized model entirely on data that you fully 
understand instead?

It will likely require additional work. But it will get you higher-quality 
models, with a much cleaner privacy and compliance story. Understanding your 
training data better will also lead to safer models that you can debug and 
improve more easily.

5. The larger the model, the worse it gets
Every privacy problem gets worse for larger models. They memorize more training 
data. They do so in ways that are more difficult to predict and measure. Their 
attack surface is larger. Ad hoc protections get less effective.

Larger, more complex models also make it harder to use robust privacy notions 
for the entire training data. The privacy-accuracy trade-offs are steeper, the 
performance costs are higher, and it typically gets more difficult to really 
understand the privacy properties of the original data.


Bonus thing: AI companies are overwhelmingly dishonest
I think most privacy experts would agree with this post so far. There are 
divergences of opinion when you start asking "do the benefits of AI outweigh 
the risks?". If you ask me, the benefits are extremely over-hyped, while the 
harms (including, but not limited to, privacy risks) are very tangible and 
costly. But other privacy experts I respect are more bullish on the potential 
of this technology, so I don't think there's a consensus there.

AI companies, however, do not want to carefully weigh benefits against risks. 
They want to sell you more AI, so they have a strong incentive to downplay the 
risks, and no ethical qualms doing so. So all these facts about privacy and AI… 
they're pretty inconvenient. AI salespeople would like it a lot if everyone — 
especially regulators — stayed blissfully unaware of these.

Conveniently for AI companies, things that are obvious truths to privacy 
experts are not widely understood. In fact, they can be pretty 
counter-intuitive!

From a distance, memorization is surprising. When you train an LLM, sentences 
are tokenized, words are transformed into numbers, then a whole bunch of math 
happens. It certainly doesn't look like you copy-pasted the input anywhere.

LLMs do an impressive job at pretending to be human. It's super easy for us to 
anthropomorphize them, and think that if we give them good enough instructions, 
they'll "understand", and behave well. It can seem strange that they're so 
vulnerable to adversarial inputs. The attacks that work on them would never 
work on real people!

People really want to believe that every problem can be fixed with just a 
little more work, a few more patches. We're very resistant to the idea that 
some problem might be fundamental, and not have a solution at all.

Companies building large AI models use this to their advantage, and do not 
hesitate to make statements that they clearly know to be false. Here's OpenAI 
publishing statements like "memorization is a rare failure of the training 
process". This isn't an unintentional blunder, they know how this stuff works! 
They're lying through their teeth, hoping that you won't notice.

Like every other point outlined in this post, this isn't actually AI-specific. 
But that's a story for another day…

Additional remarks and further reading
On memorization: I recommend Katharine Jarmul's blog post series on the topic. 
It goes into much more detail about this phenomenon and its causes, and comes 
with a bunch of references. One thing I find pretty interesting is that 
memorization may be unavoidable: some theoretical results suggest that some 
learning tasks cannot be solved without memorizing some of the input!

On privacy attacks on AI models: this paper is a famous example of how to 
extract training data from language models. It also gives figures on how much 
training data gets memorized. This paper is another great example of how bad 
these attacks can be. Both come with lots of great examples in the appendix.

On the impossibility of robustly preventing attacks on AI models: I recommend 
two blog posts by Arvind Narayanan and Sayash Kapoor: one about what alignment 
can and cannot do, the other about safety not being a property of the model. 
The entire blog post series is worth a read.

On robust mitigations against memorization: this survey paper provides a great 
overview of how to train AI models with DP. Depending on the use case, 
achieving a meaningful privacy notion can be very tricky: this paper discusses 
the specific complexities of natural language data, while this paper outlines 
the subtleties of using a combination of public and private data during AI 
training.

Acknowledgments
Thanks a ton to Alexander Knop, Amartya Sanyal, Gavin Brown, Joe Near, Liudas 
Panavas, Marika Swanberg, and Thomas Steinke for their excellent feedback on 
earlier versions of this post.


<https://desfontain.es/blog/privacy-in-ai.html>
