I am willing to donate some gpu time from my personal machine towards this
project. Maybe by running some OCR?
From what I understand, you will be setting up a "vibe" search for the Epstein
files?


_ Cody Smith _
[email protected]


On Fri, Feb 6, 2026 at 10:31 PM Roger Critchlow <[email protected]> wrote:

> There was an amusing revelation on another list of mine about PDF
> conversion.  A blind user was complaining that the PDF manuals are useless
> for screen readers.  Another reader on the list produced an HTML conversion
> of the PDF in a few hours.  It was done by Claude.  I was gobsmacked,
> having been thwarted by PDF documents many times before, but of course it's
> the perfect task for an LLM.
>
> -- rec --
>
>
> On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson <[email protected]> wrote:
>
>> Thanks, Marcus.
>>
>> So you're saying there seems to be no consistency or standardization of
>> methodology in the so-called review and release process. That is a story in
>> itself.
>>
>> Also, having to convert PDF to Text is another time suck but necessary at
>> some point.
>>
>> Another approach would be to FOIA the DOJ for directives from whoTK to
>> those assigned to do the reviews/redactions. Of course given that Trump has
>> shut down many of the offices that responded to FOIAs, it's unlikely we
>> would see those documents in our lifetime.
>>
>> Onward,
>> Tom
>> (TK means "to come" in journalese)
>> =======================
>> Tom Johnson
>> Inst. for Analytic Journalism
>> Santa Fe, New Mexico
>> 505-577-6482
>> =======================
>>
>> On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]> wrote:
>>
>>> So.. The early tranches were the FBI searches of the properties.   Then
>>> there were a bunch of personal photographs of Epstein and Maxwell on their
>>> travels with various famous people. Amusingly, faces some folks on this
>>> list would recognize.   (Read 2and3.md if so inclined and look-up Maxwell’s
>>> recent proffer to Blanche.)
>>>
>>> The volume was modest enough in the early sets that I could push a
>>> lot through Claude, even images.  Summaries of that are attached.
>>>
>>> The new documents vary a lot in size.  There are examples of subpoenaed
>>> e-mail accounts that go on and on for hundreds of pages, but also single
>>> isolated e-mails.   There’s an unusually large volume on investigating
>>> Epstein’s demise in prison.   Overall, it is mostly PDF format, and it is
>>> often the case that text can be extracted, e.g., using pdftotext.   It’s
>>> just the DOJ convention to use PDF. It doesn’t mean they are composed
>>> documents.
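
A minimal sketch of how the pdftotext step could be used to decide which files still need OCR. It assumes the Poppler `pdftotext` CLI is on the PATH and that its output ends each page with a form feed; the function names and the 50-characters-per-page threshold are hypothetical, not anything used on the actual download.

```python
import shutil
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return the embedded text of a PDF via the pdftotext CLI (Poppler)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # Passing "-" as the output file sends the extracted text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def needs_ocr(text: str, min_chars_per_page: int = 50) -> bool:
    """Heuristic: very little visible text per page suggests a scanned image.

    Assumption: pages are separated by form feeds in the pdftotext output.
    The threshold is an arbitrary starting point, not a tuned value.
    """
    pages = max(text.count("\f"), 1)
    visible = sum(1 for ch in text if not ch.isspace())
    return visible < min_chars_per_page * pages
```

Files for which `needs_ocr(extract_text(path))` is true would be the ones worth sending to a GPU OCR pass; the rest already have a usable text layer.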
>>>
>>>
>>>
>>> I’ve been focused on “Dataset 9” as that one is large, and the DOJ
>>> failed (or refused?) to make a zip file that would be easy to download. This
>>> dataset gives more insight into Epstein’s contemptible personality.  There
>>> are many emotionally manipulative e-mails to some of his more independent
>>> young female associates.   I haven’t worked with the new data
>>> systematically yet, just spot checking the download from time to time.   I
>>> feel guilty wasting GPU cycles and energy on traumatizing a perfectly good
>>> AI on this stuff.
>>>
>>>
>>>
>>> The file numbering has become sparse in the later datasets.   In the
>>> early batches, that occurred when Donald Trump was in a picture.  Just
>>> sayin.
>>>
>>>
>>>
>>> Marcus
>>>
>>> *From: *Friam <[email protected]> on behalf of Tom Johnson <
>>> [email protected]>
>>> *Date: *Thursday, February 5, 2026 at 9:38 PM
>>> *To: *The Friday Morning Applied Complexity Coffee Group <
>>> [email protected]>
>>> *Subject: *Re: [FRIAM] Gauging interest..
>>>
>>> Marcus--
>>>
>>> Congrats and many thanks for harvesting this whole crop and keeping it
>>> in various grain bins.
>>>
>>>
>>>
>>> Quick questions:
>>>
>>>
>>>
>>> The DOJ, on multiple occasions, has talked about various numbers of
>>> pages. How many "pages" do you think you have? Are they all standard 8.5x11
>>> pages? All PDF? If so, searchable PDF?
>>>
>>> Do the various batches released come with any kind of title page, index?
>>> Glossary?
>>>
>>>
>>>
>>> Are the pages/documents in any chronological order or any categorical
>>> order?
>>>
>>>
>>>
>>> Do you think we could do a word count vs. lines (each containing a
>>> words-per-line estimate) redacted? (i.e., a story reporting X percent of the
>>> documents still hidden or useless).
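
The arithmetic behind that estimate can be sketched as follows. This is only an illustration: how redacted lines get counted is left open, and the function name and the default of 12 words per line are assumptions.

```python
def redacted_share(visible_words: int, redacted_lines: int,
                   words_per_line: float = 12.0) -> float:
    """Estimate the fraction of words hidden by redaction.

    hidden words ~= redacted_lines * words_per_line (an estimate, not a count)
    share        =  hidden / (visible + hidden)
    """
    hidden = redacted_lines * words_per_line
    total = visible_words + hidden
    return hidden / total if total else 0.0
```

For example, 900 visible words plus 10 redacted lines at 10 words per line gives 100 / 1000, or roughly 10 percent hidden.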
>>>
>>>
>>>
>>> I'm sure I can bug you for more.
>>>
>>> Tom
>>>
>>>
>>>
>>> =======================
>>> Tom Johnson
>>> Inst. for Analytic Journalism
>>> Santa Fe, New Mexico
>>> 505-577-6482
>>> =======================
>>>
>>>
>>>
>>> On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]>
>>> wrote:
>>>
>>> I’m closing in on a full download of Dataset 9 of the Epstein
>>> Transparency Act.  (I have the rest.)   I’m thinking of building a vector
>>> database (e.g. pgvector for Postgres).   I was thinking of wrapping an MCP
>>> server around it so LLMs can get a directory of articles and then
>>> summarize, or cross-reference sets of them.   RAG is what Perplexity does,
>>> but apparently, they don’t have the content yet.
>>>
>>>
>>>
>>> I imagine a SETI-at-home type project to reduce the data.  Another
>>> analogy that comes to mind is annotations of the genome: Line all the
>>> documents up and then slowly fill in the summaries.   The vector database
>>> could help inform how to combine documents for consumption within context
>>> window limits (PCA vicinity).
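
One way the "combine nearby documents within a context window" idea could look, as a rough sketch. It swaps in plain cosine similarity on precomputed embeddings rather than a PCA projection, and the data layout (`docs` mapping an id to an embedding and a token count) and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pack_neighbors(seed, docs, token_budget):
    """Greedily batch the seed document with its nearest neighbors.

    docs maps doc_id -> (embedding, token_count). Neighbors are visited
    nearest-first and added whenever they still fit in the token budget.
    """
    seed_vec, seed_tokens = docs[seed]
    ranked = sorted(
        (d for d in docs if d != seed),
        key=lambda d: cosine(seed_vec, docs[d][0]),
        reverse=True,
    )
    batch, used = [seed], seed_tokens
    for d in ranked:
        tokens = docs[d][1]
        if used + tokens <= token_budget:
            batch.append(d)
            used += tokens
    return batch
```

Each batch could then be handed to one worker for summarizing. A real version would probably also stop at a similarity cutoff so that unrelated small documents do not get pulled in just because they fit.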
>>>
>>>
>>>
>>> I could keep my Max subscription on it and make some progress, but
>>> really such a project needs tens or hundreds of workers.
>>>
>>>
>>>
>>> Marcus
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --.
>>> / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
>>> FRIAM Applied Complexity Group listserv
>>> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
>>> https://bit.ly/virtualfriam
>>> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
>>> FRIAM-COMIC http://friam-comic.blogspot.com/
>>> archives:  5/2017 thru present
>>> https://redfish.com/pipermail/friam_redfish.com/
>>>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
