On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:

Very thoughtful piece by Jon Udell - Why LLM-assisted table transformation 
is a big deal 
<https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/>.

 
In my day job, I have to pull useful items out of PDFs - pictures, text, 
tables. PDFs often make this difficult, both because of password-protected 
access and because the information that renders as neatly organized text 
and tables when printed or displayed in a viewer is not neatly organized 
inside the file, so the data in the PDF requires rearrangement. Jon Udell's 
article mentions this without discussing the specifics of the articles he 
processes. 
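
To make the rearrangement problem concrete, here is a minimal sketch in 
Python. It assumes some extraction tool has already handed back text 
fragments with page coordinates, in arbitrary order; the (x, y, text) tuple 
shape, the function name rebuild_rows, and the y_tolerance value are all 
hypothetical, chosen only for illustration and not taken from any 
particular PDF library. 

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), with y growing upward as in
# PDF page coordinates. Real extraction tools use their own structures.
Fragment = Tuple[float, float, str]


def rebuild_rows(fragments: List[Fragment],
                 y_tolerance: float = 2.0) -> List[List[str]]:
    """Group fragments into rows by similar y, then order each row by x."""
    rows: List[Tuple[float, List[Fragment]]] = []
    # Visit fragments from the top of the page to the bottom.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - frag[1]) <= y_tolerance:
            # Close enough to the current row's baseline: same row.
            rows[-1][1].append(frag)
        else:
            # Otherwise start a new row anchored at this fragment's y.
            rows.append((frag[1], [frag]))
    # Within each row, read left to right.
    return [[text for _, _, text in sorted(row, key=lambda f: f[0])]
            for _, row in rows]


# Fragments as they might sit in the file: not in reading order.
scrambled = [
    (200.0, 700.0, "Qty"),
    (50.0, 680.5, "Bolt"),
    (50.0, 700.0, "Part"),
    (200.0, 681.0, "12"),
]
table = rebuild_rows(scrambled)
```

Even this toy version depends on a guessed tolerance, and it would 
mishandle merged cells or wrapped cell text, which is part of why I think 
recreating tables from a PDF is error-prone. 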

It is true that tools like ChatGPT are trained on text and as such are most 
likely to work on text, but they do not reason about non-text. I would 
argue that a PDF is non-text, and as such, recreating neatly organized text 
and tables from it is error-prone. If we really value the facts in a 
technical publication, we need to start with a suitable source, which 
probably needs carefully done markup created by experts in the subject 
matter of the publication. 

I would not trust a complex table produced by ChatGPT: not only is it not a 
subject matter expert, it also cannot reason as a human being can when 
making sense of such a document. 

I don't know what to say about the extraordinary domain of software that 
produces those PDFs. How many of those software applications incorporate 
features meant to allow exploration of the structure of a document? This 
sounds to me like the sort of job for which Leo is well-equipped! 

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/78528da2-3174-4437-af37-2d763dc28bcfn%40googlegroups.com.
