Re: ChatGPT Helpful In Translating Tables

2023-06-22 Thread Thomas Passin
Even copying selected text out of a pdf file can be unpleasant.  Often 
there will be no newlines, so words may run together when they were 
visually separated by a line break.

On Thursday, June 22, 2023 at 8:52:14 AM UTC-4 David Szent-Györgyi wrote:

> On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:
>
> Very thoughtful piece by Jon Udell - Why LLM-assisted table
> transformation is a big deal.
>
>  
> In my day job, I have to pull useful items out of PDFs  - pictures, text, 
> tables. PDFs often make this difficult - because of password-protected 
> access, and because the information that renders as neatly organized text 
> and tables when printed or displayed in a viewer is not neatly organized - 
> the data in the PDF requires rearrangement. Jon Udell's article mentions 
> this without discussing the specifics of the articles he processes. 
>
> It is true that tools like ChatGPT are trained on text and as such most 
> likely to work on text, but they do not reason about non-text. I would 
> argue that a PDF is non-text, and as such, recreating neatly organized text 
> and tables is error-prone; if we really value the facts in a technical 
> publication, we need to start with suitable source, which probably needs 
> carefully done markup created by experts in the subject matter of the 
> publication. 
>
> I would not trust a complex table produced by ChatGPT: not only is it not
> a subject matter expert, it also cannot reason as a human being can when
> making sense of such a document.
>
> I don't know what to say about the extraordinary domain of software that 
> produces those PDFs. How many of those software applications incorporate 
> features meant to allow exploration of the structure of a document? This 
> sounds to me like the sort of job for which Leo is well-equipped! 
>

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/acc2979a-b113-4cd1-8b52-283a5aa61e63n%40googlegroups.com.


Re: ChatGPT Helpful In Translating Tables

2023-06-22 Thread David Szent-Györgyi
On Sunday, June 18, 2023 at 11:06:30 PM UTC-4 tbp1...@gmail.com wrote:

> Very thoughtful piece by Jon Udell - Why LLM-assisted table transformation
> is a big deal.

 
In my day job, I have to pull useful items out of PDFs  - pictures, text, 
tables. PDFs often make this difficult - because of password-protected 
access, and because the information that renders as neatly organized text 
and tables when printed or displayed in a viewer is not neatly organized - 
the data in the PDF requires rearrangement. Jon Udell's article mentions 
this without discussing the specifics of the articles he processes. 

It is true that tools like ChatGPT are trained on text and as such most 
likely to work on text, but they do not reason about non-text. I would 
argue that a PDF is non-text, and as such, recreating neatly organized text 
and tables is error-prone; if we really value the facts in a technical 
publication, we need to start with suitable source, which probably needs 
carefully done markup created by experts in the subject matter of the 
publication. 

I would not trust a complex table produced by ChatGPT: not only is it not
a subject matter expert, it also cannot reason as a human being can when
making sense of such a document.

I don't know what to say about the extraordinary domain of software that 
produces those PDFs. How many of those software applications incorporate 
features meant to allow exploration of the structure of a document? This 
sounds to me like the sort of job for which Leo is well-equipped! 



ENB: Test Driven Study (TDS)

2023-06-22 Thread Edward K. Ream
This Engineering notebook post records Ahas that arose in confronting
#3181. These Ahas may seem small, but there is no such thing as a small Aha!

*Aha:* Start with the unit test.

Reading the code of *g.findUNL* put me to sleep. g.findUNL is part of an 
inherently complex ecosystem. 

Improving *test_g_findUnl* woke me up. This unit test now contains smaller
sub-tests of *all* the possible UNLs that can arise from error messages.
These sub-tests allowed me to focus on more manageable issues.

*Aha*: The unit test should test all expected error messages.


Each tool (flake8, mypy, pyflakes, pylint, and python) uses a different 
format for its error message. Tests should cover each.
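
As a rough illustration of "one sub-test per tool", here is a minimal sketch. 
It is not the code of g.findUNL or test_g_findUnl: the helper 
parse_error_location, the two regular expressions, the sample messages, and 
the path /tmp/demo/example.py are all hypothetical. The samples assume each 
tool's default text output.

```python
import re
import unittest

# Hypothetical patterns (not Leo's): one for "path:line:..." messages
# (flake8, mypy, pyflakes, pylint) and one for Python traceback lines.
COLON_STYLE = re.compile(r'^(?P<path>.+?\.py):(?P<line>\d+):')
TRACEBACK_STYLE = re.compile(r'^\s*File "(?P<path>.+?)", line (?P<line>\d+)')

def parse_error_location(message):
    """Return (path, line_number), or None if the message is not recognized."""
    for pattern in (TRACEBACK_STYLE, COLON_STYLE):
        m = pattern.match(message)
        if m:
            return m.group('path'), int(m.group('line'))
    return None

# Illustrative absolute path and messages, assuming default output formats.
PATH = '/tmp/demo/example.py'
SAMPLES = {
    'flake8': f'{PATH}:12:1: E302 expected 2 blank lines, found 1',
    'mypy': f'{PATH}:12: error: Name "x" is not defined',
    'pyflakes': f"{PATH}:12:5: undefined name 'x'",
    'pylint': f'{PATH}:12:0: C0116: Missing function or method docstring '
              '(missing-function-docstring)',
    'python': f'  File "{PATH}", line 12, in <module>',
}

class TestParseErrorLocation(unittest.TestCase):

    def test_all_tools(self):
        # One sub-test per tool, so a failure names the offending format.
        for tool, message in SAMPLES.items():
            with self.subTest(tool=tool):
                self.assertEqual(parse_error_location(message), (PATH, 12))

    def test_unrecognized_message(self):
        self.assertIsNone(parse_error_location('no location here'))
```

Saved in a file, this should run with python -m unittest. Using subTest means 
a failure for one tool's format does not hide failures for the others.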


Similarly, the recently-improved unit test for *LM.scanOptions* tests all 
variations of Leo's command-line options, both valid and invalid.


*Aha:* All error messages (regardless of the tool that generates the message)
contain the full *absolute* path to the erroneous file.


*Summary*


- Working on a unit test is a great way to get into action.

- TDD works for study as well as development and testing.

  Let's call this approach *TDS: test-driven study*.

- 100% code coverage is not always enough.

  Tests should cover all options.


Edward



Re: leojs alpha

2023-06-22 Thread Edward K. Ream
> ...So anywhere from a week or two, or a month or two, hard to say, but
> it's going to be this summer! :D

Assuming vs-code allows it, I encourage you to release an alpha version 
asap. There is nothing wrong with a list of known bugs.

Edward



Re: leojs alpha

2023-06-22 Thread Edward K. Ream
On Wednesday, June 21, 2023 at 11:40:46 PM UTC-5 Félix wrote:

> After coding for a few years, I just spent a few minutes tonight playing
> around with a 'somewhat working' leojs...!
>
> ...
>
> ...So anywhere from a week or two, or a month or two, hard to say, but it's
> going to be this summer! :D


The first release of leoJS will be an important milestone:

- leoJS will integrate with VS Code more smoothly.
- leoJS fully transliterates Leo's core into TypeScript.

> Thanks to all who supported me (or just gave feedback and suggestions). It
> really made a difference and motivated me!


You're welcome. And many thanks for *your* tireless work :-)

Edward
