Even copying selected text out of a PDF file can be unpleasant. Often the copied text contains no newlines, so words that were visually separated by a line break run together.
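To illustrate the run-together problem, here is a minimal Python sketch. It assumes you can get the text one visual line at a time, which plain copy and paste often does not give you. The function name `join_extracted_lines` is made up for this example and is not part of any PDF library.

```python
def join_extracted_lines(lines):
    """Join visual lines of extracted text into a single paragraph.

    Assumption: `lines` holds one string per visual line of the page.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if parts and parts[-1].endswith("-"):
            # Heuristic: treat a trailing hyphen as a word split across lines.
            # This also merges genuine compounds such as "well-" / "equipped".
            parts[-1] = parts[-1][:-1] + line
        else:
            parts.append(line)
    # A single space between lines keeps the words from running together.
    return " ".join(parts)


print(join_extracted_lines(["the data in the PDF requires rearrange-", "ment"]))
```

This prints `the data in the PDF requires rearrangement`. Joining the same lines with an empty string instead of a space is what produces the run-together words.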
On Thursday, June 22, 2023 at 8:52:14 AM UTC-4 David Szent-Györgyi wrote:

> On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:
>
> Very thoughtful piece by Jon Udell - Why LLM-assisted table transformation is a big deal <https://blog.jonudell.net/2023/06/18/why-llm-assisted-table-transformation-is-a-big-deal/>.
>
> In my day job, I have to pull useful items out of PDFs - pictures, text, tables. PDFs often make this difficult - because of password-protected access, and because the information that renders as neatly organized text and tables when printed or displayed in a viewer is not neatly organized - the data in the PDF requires rearrangement. Jon Udell's article mentions this without discussing the specifics of the articles he processes.
>
> It is true that tools like ChatGPT are trained on text and as such most likely to work on text, but they do not reason about non-text. I would argue that a PDF is non-text, and as such, recreating neatly organized text and tables is error-prone; if we really value the facts in a technical publication, we need to start with suitable source, which probably needs carefully done markup created by experts in the subject matter of the publication.
>
> I would not trust a complex table produced by ChatGPT, since it is not only not a subject matter expert, it cannot reason as a human being can when making sense of such a document.
>
> I don't know what to say about the extraordinary domain of software that produces those PDFs. How many of those software applications incorporate features meant to allow exploration of the structure of a document? This sounds to me like the sort of job for which Leo is well-equipped!