Tom writes:

 

“As usual, I'm more interested in tracking the metadata and methods and process 
than the outcome.”

 

Similar interests here.   A particular concern is that the metadata is not provided 
in any meaningful way, hence the motivation for AI reading and a bottom-up approach.

 

 

From: Friam <[email protected]> On Behalf Of Tom Johnson
Sent: Wednesday, February 11, 2026 4:51 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: Re: [FRIAM] Gauging interest..

 

  
 Steve Smith:
Thanks for your interest and taking the time to research and write.

 

Yes indeed, there's a mountain of relatively new digital tools for 
investigators around today. Take a careful look at what Bellingcat has done and 
is doing:  https://www.bing.com/search?FORM=ARPSEC&PC=ARPL&PTAG=1318&q=bellingcat%20toolkit 
And there's an ever-increasing amount of Open Data in our galaxy for 
investigators. But the Epstein Files <https://www.justice.gov/epstein> 
present, I think, a unique set of problems for investigators and the 
interested public. As I mentioned earlier, it seems that there were some people 
knowledgeable about data structure and categorization at the DOJ. I am trying to 
track down the instructions that attorneys probably issued to the hundreds of 
people (all lawyers????) doing the redactions. For example, what were they told 
to redact in terms of images? Or, if there are AV documents, how many are there, 
and how were they to be edited?

 

As usual, I'm more interested in tracking the metadata and methods and process 
than the outcome. 

 

Tom

 

On Wed, Feb 11, 2026 at 12:29 PM cody dooderson <[email protected]> wrote:

I am willing to donate some gpu time from my personal machine towards this 
project. Maybe by running some OCR?

From what I understand, you will be setting up a "vibe" search for the Epstein 
files?

 

 

_ Cody Smith _

[email protected]

 

 

On Fri, Feb 6, 2026 at 10:31 PM Roger Critchlow <[email protected]> wrote:

There was an amusing revelation on another list of mine about PDF conversion.  
A blind user was complaining that the PDF manuals are useless for screen 
readers.  Another reader on the list produced an HTML conversion of the PDF in 
a few hours. It was done by Claude.  I was gobsmacked, having been thwarted by 
PDF documents many times before, but of course it's the perfect task for an LLM.

 

-- rec --

 

 

On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson <[email protected]> wrote:

Thanks, Marcus.

 

So you're saying there seems to be no consistency or standardization of 
methodology in the so-called review and release process. That is a story in 
itself. 

 

Also, having to convert PDF to Text is another time suck but necessary at some 
point.

 

Another approach would be to FOIA the DOJ for directives from whoTK to those 
assigned to do the reviews/redactions. Of course, given that Trump has shut down 
many of the offices that responded to FOIAs, it's unlikely we would see those 
documents in our lifetime.

 

Onward,

Tom

(TK means "to come" in journalese)

=======================
Tom Johnson
Inst. for Analytic Journalism
Santa Fe, New Mexico
505-577-6482
=======================

 

On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]> wrote:

So.. The early tranches were the FBI searches of the properties.   Then there 
were a bunch of personal photographs of Epstein and Maxwell on their travels 
with various famous people. Amusingly, faces some folks on this list would 
recognize.   (Read 2and3.md if so inclined and look up Maxwell’s recent proffer 
to Blanche.)

The volume in the early sets was modest enough that I could push a lot 
through Claude, even images.  Summaries of that are attached.

The new documents vary a lot in size.  There are examples of subpoenaed e-mail 
accounts that go on and on for hundreds of pages, but also single, isolated 
e-mails.   There’s an unusually large volume on investigating Epstein’s demise 
in prison.   Overall, it is mostly PDF format, and it is often the case that 
text can be extracted, e.g., using pdftotext.   It’s just the DOJ convention to 
use PDF. It doesn’t mean they are composed documents.
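
A minimal sketch of that extraction step, assuming the poppler-utils pdftotext 
command is installed and on the PATH. The function names and the 40-character 
threshold are illustrative assumptions, not part of the DOJ release or of 
anyone's actual pipeline. It flags pages that yield little or no text, which 
would be the candidates for OCR.

```python
import shutil
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the text it writes to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends output to stdout
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def pages_needing_ocr(text: str, min_chars: int = 40) -> list[int]:
    """Return 1-based page numbers whose extracted text is too short to use.

    pdftotext separates pages with a form feed. min_chars is an arbitrary
    threshold chosen for illustration.
    """
    pages = text.split("\f")
    # The output normally ends with a form feed, which leaves an empty tail.
    if pages and pages[-1].strip() == "":
        pages = pages[:-1]
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len(page.strip()) < min_chars
    ]
```

Calling pages_needing_ocr(extract_text(path)) on a hypothetical file would give 
the pages that look like scanned images rather than composed text.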

 

I’ve been focused on “Dataset 9” as that one is large, and the DOJ failed (or 
refused?) to make a zip file that would be easy to download. This dataset gives 
more insight into Epstein’s contemptible personality.  There are many 
emotionally manipulative e-mails to some of his more independent young female 
associates.   I haven’t worked with the new data systematically yet, just spot 
checking the download from time to time.   I feel guilty wasting GPU cycles and 
energy on traumatizing a perfectly good AI on this stuff.

 

The file numbering has become sparse in the later datasets.   In the early 
batches, that occurred when Donald Trump was in a picture.  Just sayin.
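
One way to quantify how sparse the numbering is would be to list the missing 
ranges. This is only a sketch: it assumes the numeric part of each file name 
has already been parsed into integers, and the function name and sample 
numbers are hypothetical.

```python
def numbering_gaps(numbers: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges in a numbering."""
    ordered = sorted(set(numbers))
    gaps = []
    # Compare each number with its successor; a jump larger than 1 is a gap.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


if __name__ == "__main__":
    # Hypothetical file numbers, not taken from any real dataset.
    print(numbering_gaps([1, 2, 5, 6, 10]))
```

A gap only shows that a number is absent from the download; it says nothing 
about why.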

 

Marcus

From: Friam <[email protected]> on behalf of Tom Johnson <[email protected]>
Date: Thursday, February 5, 2026 at 9:38 PM
To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
Subject: Re: [FRIAM] Gauging interest..

Marcus--

Congrats and many thanks for harvesting this whole crop and keeping it in 
various grain bins.

 

Quick questions:

 

The DOJ, on multiple occasions, has talked about various numbers of pages. How 
many "pages" do you think you have? Are they all standard 8.5x11 pages? All 
PDF? If so, searchable PDF?

Do the various batches released come with any kind of title page, index? 
Glossary?

 

Are the pages/documents in any chronological order or any categorical order? 

 

Do you think we could do a word count vs. lines redacted (each containing a 
words-per-line estimate)? (I.e., a story reporting X percent of the 
documents still hidden or useless.)
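
A rough sketch of that estimate, under stated assumptions: the count of 
redacted lines is supplied by the caller (how to detect them is not settled 
here), words per line is averaged from the visible lines on the same page, and 
the fallback of 10 words per line for fully redacted pages is an arbitrary 
illustrative value. The function name is hypothetical.

```python
def estimate_redacted_fraction(
    visible_lines: list[str],
    redacted_line_count: int,
    default_words_per_line: float = 10.0,
) -> float:
    """Estimate the share of words hidden by redaction, between 0.0 and 1.0."""
    non_empty = [line for line in visible_lines if line.strip()]
    visible_words = sum(len(line.split()) for line in non_empty)
    # Average words per visible line; fall back when nothing is visible.
    if non_empty:
        words_per_line = visible_words / len(non_empty)
    else:
        words_per_line = default_words_per_line
    hidden_words = redacted_line_count * words_per_line
    total = visible_words + hidden_words
    return hidden_words / total if total else 0.0
```

For example, a page with two visible four-word lines and two redacted lines 
would come out as half hidden under these assumptions.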

 

I'm sure I can bug you for more.

Tom

 

=======================
Tom Johnson
Inst. for Analytic Journalism
Santa Fe, New Mexico
505-577-6482
=======================

 

On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]> wrote:

I’m closing in on a full download of Dataset 9 of the Epstein Transparency Act. 
(I have the rest.)   I’m thinking of building a vector database (e.g. pgvector 
for Postgres).   I was thinking of wrapping an MCP server around it so LLMs can 
get a directory of articles and then summarize, or cross-reference sets of 
them.   RAG is what Perplexity does, but apparently, they don’t have the 
content yet. 
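
To make the vector-database idea concrete, here is a pure-Python stand-in for 
the nearest-neighbour lookup such a database performs. It is a sketch, not the 
proposed system: the embeddings are assumed to come from some embedding model 
not shown here, and the function names and document ids are hypothetical. In 
pgvector the equivalent ranking would be an ORDER BY on the cosine-distance 
operator (<=>).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_documents(
    query: list[float],
    index: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query, vector))
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan like this is fine for a few thousand documents; the point 
of a real vector index is to avoid scanning everything.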

 

I imagine a SETI-at-home type project to reduce the data.  Another analogy that 
comes to mind is annotations of the genome: Line all the documents up and then 
slowly fill in the summaries.   The vector database could help inform how to 
combine documents for consumption within context window limits (PCA vicinity).  
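
One simple reading of "combine documents within context window limits" is a 
greedy packing: take the documents closest to a seed document first and stop 
adding when the token budget is used up. The sketch below assumes similarity 
scores and token counts have already been computed elsewhere; the function 
name, ids, and numbers are hypothetical.

```python
def pack_for_context(
    candidates: list[tuple[str, float, int]],
    token_budget: int,
) -> tuple[list[str], int]:
    """Greedily choose documents by similarity without exceeding the budget.

    candidates holds (doc_id, similarity_to_seed, token_count) tuples.
    Returns the chosen ids in selection order and the tokens used.
    """
    chosen: list[str] = []
    used = 0
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    for doc_id, _similarity, tokens in ranked:
        # Skip anything that would overflow; smaller documents may still fit.
        if used + tokens <= token_budget:
            chosen.append(doc_id)
            used += tokens
    return chosen, used
```

Greedy selection is not optimal packing, but it keeps the most relevant 
documents together, which matters more for summarization than filling every 
last token.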

 

I could keep my Max subscription on it and make some progress, but really such 
a project needs tens or hundreds of workers.

 

Marcus

 

 

 

 

.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/

 

-- 

++++++++++++++++++++++++++++
Tom Johnson - [email protected]
+1 505 577 6482
Santa Fe, New Mexico USA
New Mexico Writers <https://nmwriters.org/> 
++++++++++++++++++++++++++++

