It seems to me that we have come full circle, back to a Turing Test. If LLMs 
encode and demonstrate skill (they certainly do), and those skills can 
advance a solution to some real-world problem, then it is just empty 
chauvinism to say they don’t understand a topic. 

From: Friam <[email protected]> on behalf of Steve Smith 
<[email protected]>
Date: Thursday, September 11, 2025 at 10:12 AM
To: [email protected] <[email protected]>
Subject: Re: [FRIAM] Hallucinations 

I find LLM engagement to fall somewhere between chatting with a highly 
plausible gossip and reading a well-researched survey paper on a subject I am 
interested in?

Where a given conversation lands in that interval seems to depend almost 
exclusively on the care I take in crafting my prompts.

I don't expect 'truth' out of either gossip or a survey paper... just 
'perspective'?

On 9/11/25 10:55 am, glen wrote:
> OK. You're right in principle. But we might want to think of this in 
> the context of all algorithms. For example, let's say you run an FFT on 
> a signal and it outputs some frequencies. Does the signal *actually* 
> contain or express those frequencies? Or is it just an inference that 
> we find reliable?
>
> The same is true of the LLM inferences. Whether one ascribes truth or 
> falsity to those inferences is only relevant to metaphysicians and 
> philosophers. What matters is how reliable the inferences are when we 
> do some task. Yelling at the kids on your lawn doesn't achieve 
> anything. It's better to go out there and talk to them. 8^D
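
(A minimal sketch of the FFT point above, in Python with numpy and a made-up
two-tone signal: the transform reliably reports 50 Hz and 120 Hz components
whether or not we want to say the signal "really contains" them.)

import numpy as np

fs = 1000                                   # sample rate in Hz (arbitrary)
t = np.arange(0, 1, 1/fs)                   # one second of samples
x = np.sin(2*np.pi*50*t) + 0.5*np.sin(2*np.pi*120*t)   # 50 Hz + 120 Hz tones

spectrum = np.fft.rfft(x)                   # real-input FFT
freqs = np.fft.rfftfreq(len(x), 1/fs)       # frequency of each bin
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]        # two strongest bins
print(np.sort(peaks))                       # [ 50. 120.]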
>
>
> On 9/10/25 8:38 PM, Russ Abbott wrote:
>> Glen, I wish people would stop talking about whether LLM-generated 
>> sentences are true or false. The mechanisms LLMs employ to generate a 
>> sentence have nothing to do with whether the sentence turns out to be 
>> true or false. A sentence may have a higher probability of being true 
>> if the training data consisted entirely of true sentences. (Even 
>> that's not guaranteed; similar true sentences might have their 
>> components interchanged when used during generation.) But the point 
>> is: the transformer process has no connection to the validity of its 
>> output. If an LLM reliably generates true sentences, no credit is due 
>> to the transformer. If the training data consists entirely of 
>> true (respectively false) sentences, the generated output is more likely 
>> to be true (respectively false). Output validity plays no role in how an LLM generates its 
>> output.
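
(A toy decoding loop in Python illustrating that point, under stated
assumptions: fake_logits() below is a made-up stand-in for a transformer
forward pass, not a real model. The only thing the sampler consults is a
probability distribution over tokens; nothing in the loop checks whether the
growing sentence is true.)

import numpy as np

rng = np.random.default_rng(0)
vocab = ["paris", "london", "is", "the", "capital", "of", "france", "."]

def fake_logits(context):
    # Stand-in for a transformer forward pass: made-up scores, no truth model.
    return rng.normal(size=len(vocab))

def sample_next(context, temperature=1.0):
    logits = fake_logits(context) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                    # softmax over the vocabulary
    return rng.choice(vocab, p=probs)       # chosen by probability, not truth

tokens = ["the", "capital", "of", "france", "is"]
tokens.append(sample_next(tokens))          # might be "paris", might be "london"
print(" ".join(tokens))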
>>
>> Marcus, if an LLM is trained entirely on false statements, its 
>> "confidence" in its output will presumably be the same as it would be 
>> if it were trained entirely on true statements. Truthfulness is not a 
>> consideration in the generation process. Speaking of a need to reduce 
>> ambiguity suggests that the LLM understands the input and realizes it 
>> might have multiple meanings. But of course, LLMs don't understand 
>> anything, they don't realize anything, and they can't take meaning 
>> into consideration when generating output.
>>
>>
>>
>>
>>
>> On Tue, Sep 9, 2025 at 5:20 PM glen <[email protected]> wrote:
>>
>> It's unfortunate jargon [⛧]. So it's nothing like whether an LLM 
>> is red (unless you adopt a jargonal definition of "red"). And your 
>> example is a great one for understanding how language fluency *is* at 
>> least somewhat correlated with fidelity. The statistical probability 
>> of the phrase "LLMs hallucinate" is >> 0, whereas the prob for the 
>> phrase "LLMs are red" is vanishingly small. It would be the same for 
>> black swans and Lewis Carroll writings *if* they weren't canonical 
>> teaching devices. It can't be that sophisticated if children think 
>> it's funny.
>>
>> But imagine all the woo out there where words like "entropy" or 
>> "entanglement" are used falsely. IDK for sure, but my guess is the 
>> false sentences outnumber the true ones by a lot. So the LLM has a 
>> high probability of forming false sentences.
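
(A toy sketch of that corpus-statistics point, using a tiny made-up bigram
model rather than a real LLM: a phrase that appears in the training text gets
a non-negligible probability, a phrase that never appears gets essentially
zero, independent of which phrase is true of the world.)

from collections import Counter

# Made-up miniature "corpus"; in a real model this would be web-scale text.
corpus = ("llms hallucinate sometimes . llms hallucinate under pressure . "
          "roses are red . llms generate text .").split()

bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def phrase_prob(phrase):
    # Chain rule over bigram frequencies; unseen pairs contribute zero.
    words = phrase.split()
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= bigrams[(a, b)] / unigrams[a] if unigrams[a] else 0.0
    return p

print(phrase_prob("llms hallucinate"))   # well above zero
print(phrase_prob("llms are red"))       # 0.0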
>>
>> Of course, in that sense, if a physicist finds themselves talking 
>> to an expert in the "Law of Attraction" (e.g. the movie "The Secret") 
>> and makes scientifically true statements about entanglement, the guru 
>> may well judge them as false. So there's "true in context" (validity) 
>> and "ontologically true" (soundness). A sentence can be true in 
>> context but false in the world and vice versa, depending on who's in 
>> control of the reinforcement.
>>
>>
>> [⛧] We could discuss the strength of the analogy between human 
>> hallucination and LLM "hallucination", especially in the context of 
>> predictive coding. But we don't need to. Just consider it jargon and 
>> move on.
>>
>> On 9/9/25 4:37 PM, Russ Abbott wrote:
>> > Marcus, Glen,
>> >
>> > Your responses are much too sophisticated for me. Now that I'm 
>> retired (and, in truth, probably before as well), I tend to think in 
>> much simpler terms.
>> >
>> > My basic point was to express my surprise at realizing that it 
>> makes as much sense to ask whether an LLM hallucinates as it does to 
>> ask whether an LLM is red. It's a category mismatch--at least I now 
>> think so.
>> >
>> > -- Russ <https://russabbott.substack.com/>
>> >
>> >
>> >
>> >
>> > On Tue, Sep 9, 2025 at 3:45 PM glen <[email protected]> wrote:
>> >
>> > The question of whether fluency is (well) correlated to 
>> accuracy seems to assume something like mentalizing, the idea that 
>> there's a correspondence between minds mediated by a correspondence 
>> between the structure of the world and the structure of our 
>> minds/language. We've talked about the "interface theory of 
>> perception", where Hoffman (I think?) argues we're more likely to 
>> learn *false* things than we are true things. And we've argued about 
>> realism, pragmatism, predictive coding, and everything else under the 
>> sun on this list.
>> >
>> > So it doesn't surprise me if most people assume there will 
>> be more true statements in the corpus than false statements, at least 
>> in domains where there exists a common sense, where the laity *can* 
>> perceive the truth. In things like quantum mechanics or whatever, 
>> then all bets are off because there are probably more false sentences 
>> than true ones.
>> >
>> > If there are more true than false sentences in the corpus, 
>> then reinforcement methods like Marcus' only bear a small burden (in 
>> lay domains). The implicit fidelity does the lion's share. But in 
>> those domains where counter-intuitive facts dominate, the 
>> reinforcement does the most work.
>> >
>> >
>> > On 9/9/25 3:12 PM, Marcus Daniels wrote:
>> > > Three ways come to mind... I would guess that OpenAI, 
>> Google, Anthropic, and xAI are far more sophisticated.
>> > >
>> > > 1. Add a softmax penalty to the loss that tracks 
>> non-factual statements or grammatical constraints. Cross entropy may 
>> not understand that some parts of content are more important than 
>> others.
>> > > 2. Change how the beam search works during inference 
>> to skip sequences that fail certain predicates – like a lookahead 
>> that says “Oh, I can’t say that..” (a toy sketch follows below).
>> > > 3. Grade the output, either using human or non-LLM 
>> supervision, and re-train.
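
(A minimal sketch of option 2 in Python: violates_predicate() and toy_scores()
below are hypothetical stand-ins for a real factuality or grammar check and a
real next-token scorer, not any vendor's actual implementation.)

import heapq

def violates_predicate(seq):
    # Hypothetical constraint; a real system would test factual or
    # grammatical predicates over the candidate continuation.
    return "cant_say_that" in seq

def toy_scores(seq):
    # Made-up next-token log-probabilities; the "bad" token is the likeliest.
    return [("ok", -1.0), ("cant_say_that", -0.1), ("fine", -2.0)]

def beam_search(score_fn, start, width=3, steps=3):
    beams = [(0.0, [start])]                      # (log-prob, token sequence)
    for _ in range(steps):
        candidates = []
        for logp, seq in beams:
            for tok, lp in score_fn(seq):
                new_seq = seq + [tok]
                if violates_predicate(new_seq):   # the lookahead: "I can't say that"
                    continue
                candidates.append((logp + lp, new_seq))
        beams = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return beams

print(beam_search(toy_scores, "start", width=2, steps=2))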
>> > >
>> > > *From:* Friam <[email protected]> *On Behalf Of* Russ Abbott
>> > > *Sent:* Tuesday, September 9, 2025 3:03 PM
>> > > *To:* The Friday Morning Applied Complexity Coffee Group <[email protected]>
>> > > *Subject:* [FRIAM] Hallucinations
>> > >
>> > > OpenAI just published a paper on hallucinations 
>> <https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf> 
>> as well as a post summarizing the paper 
>> <https://openai.com/index/why-language-models-hallucinate/>. The 
>> two of them seem wrong-headed in such a simple and obvious way that 
>> I'm surprised the issue they discuss is still alive.
>> > >
>> > > The paper and post point out that LLMs are trained to 
>> generate fluent language--which they do extraordinarily well. The 
>> paper and post also point out that LLMs are not trained to 
>> distinguish valid from invalid statements. Given those facts about 
>> LLMs, it's not clear why one should expect LLMs to be able to 
>> distinguish true statements from false statements--and hence why one 
>> should expect to be able to prevent LLMs from hallucinating.
>> > >
>> > > In other words, LLMs are built to generate text; they 
>> are not built to understand the texts they generate and certainly not 
>> to be able to determine whether the texts they generate make 
>> factually correct or incorrect statements.
>> > >
>> > > Please see my post 
>> <https://russabbott.substack.com/p/why-language-models-hallucinate-according> 
>> elaborating on this.
>> > >
>> > > Why is this not obvious, and why is OpenAI still 
>> talking about it?
>> > >
>> -- 
>
>

.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
