It seems to me that we have come full circle back to a Turing Test. If LLMs encode and demonstrate skill (they certainly do), and these skills can advance the solution of some real-world problem, then it is just empty chauvinism to say they don’t understand a topic.
From: Friam <[email protected]> on behalf of Steve Smith <[email protected]>
Date: Thursday, September 11, 2025 at 10:12 AM
To: [email protected]
Subject: Re: [FRIAM] Hallucinations

I find LLM engagement to be somewhere between that with a highly plausible gossip and a well-researched survey paper in a subject I am interested in? Where a given conversation lands in this interval seems to depend almost exclusively on my care in crafting my prompts. I don't expect 'truth' out of either gossip or a survey paper... just 'perspective'?

On 9/11/25 10:55 AM, glen wrote:

OK. You're right in principle. But we might want to think of this in the context of all algorithms. For example, let's say you run an FFT on a signal and it outputs some frequencies. Does the signal *actually* contain or express those frequencies? Or is it just an inference that we find reliable?

The same is true of the LLM inferences. Whether one ascribes truth or falsity to those inferences is only relevant to metaphysicians and philosophers. What matters is how reliable the inferences are when we do some task. Yelling at the kids on your lawn doesn't achieve anything. It's better to go out there and talk to them. 8^D
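A minimal sketch of the FFT example, assuming Python with NumPy (the signal and numbers below are invented for illustration, not from the thread): the transform reports frequencies as an inference about the signal, and what we actually lean on is the reliability of that inference, e.g. that inverting it reproduces the signal.

    # Illustrative only: an FFT "finds" frequencies in a signal, and the
    # check we rely on is that the inference is reliable, e.g. inverting
    # the transform reproduces the signal.
    import numpy as np

    t = np.linspace(0.0, 1.0, 1000, endpoint=False)   # 1 second at 1 kHz sampling
    signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])

    # The two strongest inferred frequencies
    top2 = sorted(float(f) for f in freqs[np.argsort(np.abs(spectrum))[-2:]])
    print(top2)                                        # [5.0, 12.0]

    # Reliability check: the inference reconstructs the signal
    print(np.allclose(np.fft.irfft(spectrum, n=t.size), signal))  # True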
On 9/10/25 8:38 PM, Russ Abbott wrote:

Glen, I wish people would stop talking about whether LLM-generated sentences are true or false. The mechanisms LLMs employ to generate a sentence have nothing to do with whether the sentence turns out to be true or false. A sentence may have a higher probability of being true if the training data consisted entirely of true sentences. (Even that's not guaranteed; similar true sentences might have their components interchanged during generation.) But the point is: the transformer process has no connection to the validity of its output. If an LLM reliably generates true sentences, no credit is due to the transformer. If the training data consists entirely of true/false sentences, the generated output is more likely to be true/false. Output validity plays no role in how an LLM generates its output.

Marcus, if an LLM is trained entirely on false statements, its "confidence" in its output will presumably be the same as it would be if it were trained entirely on true statements. Truthfulness is not a consideration in the generation process. Speaking of a need to reduce ambiguity suggests that the LLM understands the input and realizes it might have multiple meanings. But of course, LLMs don't understand anything, they don't realize anything, and they can't take meaning into consideration when generating output.

On Tue, Sep 9, 2025 at 5:20 PM glen <[email protected]> wrote:

It's unfortunate jargon [⛧]. So it's nothing like whether an LLM is red (unless you adopt a jargonal definition of "red"). And your example is a great one for understanding how language fluency *is* at least somewhat correlated with fidelity. The statistical probability of the phrase "LLMs hallucinate" is > 0, whereas the probability of the phrase "LLMs are red" is vanishingly small. It would be the same for black swans and Lewis Carroll writings *if* they weren't canonical teaching devices. It can't be that sophisticated if children think it's funny.

But imagine all the woo out there where words like "entropy" or "entanglement" are used falsely. IDK for sure, but my guess is the false sentences outnumber the true ones by a lot. So the LLM has a high probability of forming false sentences.

Of course, in that sense, if a physicist finds themselves talking to an expert in the "Law of Attraction" (e.g. the movie "The Secret") and makes scientifically true statements about entanglement, the guru may well judge them as false. So there's "true in context" (validity) and "ontologically true" (soundness). A sentence can be true in context but false in the world and vice versa, depending on who's in control of the reinforcement.

[⛧] We could discuss the strength of the analogy between human hallucination and LLM "hallucination", especially in the context of predictive coding. But we don't need to. Just consider it jargon and move on.
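As a rough, hedged illustration of the phrase-probability point: one could compare the log-probabilities a small causal language model assigns to the two phrases. The sketch below assumes the Hugging Face transformers package and the public "gpt2" checkpoint, neither of which is mentioned in the thread; any causal LM scored the same way would do.

    # Rough check of the phrase-probability claim: compare the total
    # log-probability a small causal LM assigns to each phrase.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def sentence_logprob(text: str) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)   # out.loss = mean NLL per predicted token
        return -out.loss.item() * (ids.shape[1] - 1)

    for s in ["LLMs hallucinate.", "LLMs are red."]:
        print(s, round(sentence_logprob(s), 2))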
On 9/9/25 4:37 PM, Russ Abbott wrote:

Marcus, Glen,

Your responses are much too sophisticated for me. Now that I'm retired (and, in truth, probably before as well), I tend to think in much simpler terms.

My basic point was to express my surprise at realizing that it makes as much sense to ask whether an LLM hallucinates as it does to ask whether an LLM is red. It's a category mismatch--at least I now think so.

-- Russ <https://russabbott.substack.com/>

On Tue, Sep 9, 2025 at 3:45 PM glen <[email protected]> wrote:

The question of whether fluency is (well) correlated with accuracy seems to assume something like mentalizing, the idea that there's a correspondence between minds mediated by a correspondence between the structure of the world and the structure of our minds/language. We've talked about the "interface theory of perception", where Hoffman (I think?) argues we're more likely to learn *false* things than we are true things. And we've argued about realism, pragmatism, predictive coding, and everything else under the sun on this list.

So it doesn't surprise me if most people assume there will be more true statements in the corpus than false statements, at least in domains where there exists a common sense, where the laity *can* perceive the truth. In things like quantum mechanics or whatever, all bets are off because there are probably more false sentences than true ones.

If there are more true than false sentences in the corpus, then reinforcement methods like Marcus' only bear a small burden (in lay domains). The implicit fidelity does the lion's share. But in those domains where counter-intuitive facts dominate, the reinforcement does the most work.

On 9/9/25 3:12 PM, Marcus Daniels wrote:

Three ways come to mind. I would guess that OpenAI, Google, Anthropic, and xAI are far more sophisticated.

1. Add a softmax penalty to the loss that tracks non-factual statements or grammatical constraints. Cross entropy may not understand that some parts of content are more important than others.
2. Change how the beam search works during inference to skip sequences that fail certain predicates – like a lookahead that says "Oh, I can't say that."
3. Grade the output, either using human or non-LLM supervision, and re-train.
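A toy sketch in the spirit of Marcus's second suggestion (a lookahead predicate applied during beam search). The step_logprobs and allowed callables are invented stand-ins, not any production decoder's API; continuations that fail the predicate are simply dropped before the beam is re-ranked.

    # Toy constrained beam search: continuations that fail a predicate
    # (e.g. "non-factual" or ungrammatical) are pruned before re-ranking.
    from typing import Callable, List, Tuple

    def constrained_beam_search(
        step_logprobs: Callable[[List[str]], List[Tuple[str, float]]],
        allowed: Callable[[List[str]], bool],
        beam_width: int = 3,
        max_len: int = 20,
    ) -> List[str]:
        beams: List[Tuple[List[str], float]] = [([], 0.0)]
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                for token, logp in step_logprobs(seq):
                    new_seq = seq + [token]
                    if not allowed(new_seq):   # lookahead: "Oh, I can't say that.."
                        continue
                    candidates.append((new_seq, score + logp))
            if not candidates:                 # every extension was vetoed
                break
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]

    # e.g. constrained_beam_search(my_lm_step, my_fact_predicate), where both
    # arguments are whatever model step and checker you happen to have.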
From: Friam <[email protected]> On Behalf Of Russ Abbott
Sent: Tuesday, September 9, 2025 3:03 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: [FRIAM] Hallucinations

OpenAI just published a paper on hallucinations <https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf> as well as a post summarizing the paper <https://openai.com/index/why-language-models-hallucinate/>. The two of them seem wrong-headed in such a simple and obvious way that I'm surprised the issue they discuss is still alive.

The paper and post point out that LLMs are trained to generate fluent language--which they do extraordinarily well. The paper and post also point out that LLMs are not trained to distinguish valid from invalid statements. Given those facts about LLMs, it's not clear why one should expect LLMs to be able to distinguish true statements from false statements--and hence why one should expect to be able to prevent LLMs from hallucinating.

In other words, LLMs are built to generate text; they are not built to understand the texts they generate and certainly not to be able to determine whether the texts they generate make factually correct or incorrect statements.

Please see my post <https://russabbott.substack.com/p/why-language-models-hallucinate-according> elaborating on this.

Why is this not obvious, and why is OpenAI still talking about it?
.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives: 5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
