There was an amusing revelation on another list of mine about PDF
conversion.  A blind user was complaining that the PDF manuals are useless
for screen readers.  Another reader on the list produced an HTML conversion
of the PDF in a few hours.  It was done by Claude.  I was gobsmacked,
having been thwarted by PDF documents many times before, but of course it's
the perfect task for an LLM.

-- rec --


On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson <[email protected]> wrote:

> Thanks, Marcus.
>
> So you're saying there seems to be no consistency or standardization of
> methodology in the so-called review and release process. That is a story in
> itself.
>
> Also, having to convert PDF to Text is another time suck but necessary at
> some point.
>
> Another approach would be to FOIA the DOJ for directives from whoTK to
> those assigned to do the reviews/redactions. Of course given that Trump has
> shut down many of the offices that responded to FOIAs, it's unlikely we
> would see those documents in our lifetime.
>
> Onward,
> Tom
> (TK means "to come" in journalese)
> =======================
> Tom Johnson
> Inst. for Analytic Journalism
> Santa Fe, New Mexico
> 505-577-6482
> =======================
>
> On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]> wrote:
>
>> So.. The early tranches were the FBI searches of the properties.   Then
>> there were a bunch of personal photographs of Epstein and Maxwell on their
>> travels with various famous people. Amusingly, faces some folks on this
>> list would recognize.   (Read 2and3.md if so inclined and look-up Maxwell’s
>> recent proffer to Blanche.)
>>
>> The volume was modest enough in the early sets that I could push a
>> lot through Claude, even images.  Summaries of that are attached.
>>
>> The new documents vary a lot in size.  There are examples of subpoenaed
>> e-mail accounts that go on and on for hundreds of pages, but also single
>> isolated e-mails.   There’s an unusually large volume on investigating
>> Epstein’s demise in prison.   Overall, it is mostly PDF format, and it is
>> often the case that text can be extracted, e.g., using pdftotext.   It’s
>> just the DOJ convention to use PDF. It doesn’t mean they are composed
>> documents.
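
A minimal sketch of the pdftotext step mentioned above, assuming the poppler-utils pdftotext binary is on the PATH. The function names and directory layout are illustrative, not what Marcus actually ran. It also flags PDFs that come back with no text at all, which would be the scanned, image-only ones that need OCR instead.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, out_dir: Path) -> list[str]:
    """Build the pdftotext call for one PDF; -layout keeps the page layout."""
    return ["pdftotext", "-layout", str(pdf), str(out_dir / (pdf.stem + ".txt"))]


def needs_ocr(text: str) -> bool:
    """Treat output with no visible characters as a scanned, image-only PDF."""
    return not text.strip()


def extract_all(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every PDF in pdf_dir; return the ones that yielded no text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not installed")
    out_dir.mkdir(parents=True, exist_ok=True)
    image_only = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        cmd = pdftotext_command(pdf, out_dir)
        subprocess.run(cmd, check=True)
        # The last element of cmd is the text file pdftotext just wrote.
        if needs_ocr(Path(cmd[-1]).read_text(errors="replace")):
            image_only.append(pdf)
    return image_only
```

Calling extract_all(Path("pdfs"), Path("text")) with hypothetical directory names would leave one .txt file per PDF and return the list of files worth sending to OCR.
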
>>
>>
>>
>> I’ve been focused on “Dataset 9” as that one is large, and the DOJ failed
>> (or refused?) to make a zip file that would be easy to download. This dataset
>> gives more insight into Epstein’s contemptible personality.  There are many
>> emotionally manipulative e-mails to some of his more independent young
>> female associates.   I haven’t worked with the new data systematically yet,
>> just spot checking the download from time to time.   I feel guilty wasting
>> GPU cycles and energy on traumatizing a perfectly good AI on this stuff.
>>
>>
>>
>> The file numbering has become sparse in the later datasets.   In the
>> early batches, that occurred when Donald Trump was in a picture.  Just
>> sayin.
>>
>>
>>
>> Marcus
>>
>> *From: *Friam <[email protected]> on behalf of Tom Johnson <
>> [email protected]>
>> *Date: *Thursday, February 5, 2026 at 9:38 PM
>> *To: *The Friday Morning Applied Complexity Coffee Group <
>> [email protected]>
>> *Subject: *Re: [FRIAM] Gauging interest..
>>
>> Marcus--
>>
>> Congrats and many thanks for harvesting this whole crop and keeping it in
>> various grain bins.
>>
>>
>>
>> Quick questions:
>>
>>
>>
>> The DOJ, on multiple occasions, has talked about various numbers of
>> pages. How many "pages" do you think you have? Are they all standard 8.5x11
>> pages? All PDF? If so, searchable PDF?
>>
>> Do the various batches released come with any kind of title page, index?
>> Glossary?
>>
>>
>>
>> Are the pages/documents in any chronological order or any categorical
>> order?
>>
>>
>>
>> Do you think we could do a word count vs. lines (each containing a
>> words-per-line estimate) redacted? (i.e., a story reporting X percent of the
>> documents still hidden or useless).
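
The arithmetic behind that question could be sketched roughly as follows. The function name, the default of 12 words per line, and the idea that the redacted lines have already been counted somehow (by hand or by image analysis) are all assumptions for illustration; nothing here comes from the actual release.

```python
def redacted_share(visible_words: int, redacted_lines: int,
                   words_per_line: float = 12.0) -> float:
    """Estimate the fraction of the text hidden behind redactions.

    visible_words:  words recovered by text extraction
    redacted_lines: blacked-out lines, counted separately (assumed input)
    words_per_line: assumed average; measure it on unredacted pages
    """
    if visible_words < 0 or redacted_lines < 0 or words_per_line <= 0:
        raise ValueError("counts must be non-negative and words_per_line positive")
    hidden_words = redacted_lines * words_per_line
    total = visible_words + hidden_words
    return hidden_words / total if total else 0.0


# 900 readable words plus 10 redacted lines at about 10 words per line
print(f"{redacted_share(900, 10, 10):.0%}")  # prints "10%"
```

The estimate is only as good as the words-per-line figure, so it would be worth measuring that on a sample of fully readable pages from the same batch.
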
>>
>>
>>
>> I'm sure I can bug you for more.
>>
>> Tom
>>
>>
>>
>> =======================
>> Tom Johnson
>> Inst. for Analytic Journalism
>> Santa Fe, New Mexico
>> 505-577-6482
>> =======================
>>
>>
>>
>> On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]>
>> wrote:
>>
>> I’m closing-in on a full download of Dataset 9 of the Epstein
>> Transparency Act.  (I have the rest.)   I’m thinking of building a vector
>> database (e.g. pgvector for Postgres).   I was thinking of wrapping an MCP
>> server around it so LLMs can get a directory of articles and then
>> summarize, or cross-reference sets of them.   RAG is what Perplexity does,
>> but apparently, they don’t have the content yet.
>>
>>
>>
>> I imagine a SETI-at-home type project to reduce the data.  Another
>> analogy that comes to mind is annotations of the genome: Line all the
>> documents up and then slowly fill in the summaries.   The vector database
>> could help inform how to combine documents for consumption within context
>> window limits (PCA vicinity).
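
One way to picture the "combine documents within context window limits" idea, as a rough sketch only. It uses plain cosine similarity over embeddings held in memory as a stand-in for whatever the vector database (or a PCA neighborhood) would provide, and the document ids, embeddings, and token counts are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pack_similar(docs: dict[str, tuple[list[float], int]],
                 seed_id: str, token_budget: int) -> list[str]:
    """Greedily gather the documents most similar to the seed that still fit.

    docs maps a document id to (embedding, token_count).
    """
    seed_vec, seed_tokens = docs[seed_id]
    if seed_tokens > token_budget:
        return []
    ranked = sorted((d for d in docs if d != seed_id),
                    key=lambda d: cosine(seed_vec, docs[d][0]),
                    reverse=True)
    batch, used = [seed_id], seed_tokens
    for doc_id in ranked:
        tokens = docs[doc_id][1]
        if used + tokens <= token_budget:  # skip anything that would overflow
            batch.append(doc_id)
            used += tokens
    return batch


# Hypothetical toy data: 2-d embeddings and token counts
docs = {
    "a": ([1.0, 0.0], 100),
    "b": ([0.9, 0.1], 200),
    "c": ([0.0, 1.0], 50),
    "d": ([1.0, 0.05], 500),
}
print(pack_similar(docs, "a", 400))  # ['a', 'b', 'c']
```

Here "d" is the closest match to "a" but is too long for the budget, so it is skipped and the smaller documents fill the batch. A real setup would presumably let the database do the ranking and use a proper tokenizer for the counts.
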
>>
>>
>>
>> I could keep my Max subscription on it and make some progress, but really
>> such a project needs tens or hundreds of workers.
>>
>>
>>
>> Marcus
>>
>> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. /
>> ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
>> FRIAM Applied Complexity Group listserv
>> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
>> https://bit.ly/virtualfriam
>> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
>> FRIAM-COMIC http://friam-comic.blogspot.com/
>> archives:  5/2017 thru present
>> https://redfish.com/pipermail/friam_redfish.com/
>>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
