Thanks, Marcus.

So you're saying there seems to be no consistency or standardization of
methodology in the so-called review and release process. That is a story in
itself.

Also, having to convert PDF to Text is another time suck but necessary at
some point.
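For the conversion step, a minimal sketch of batch PDF-to-text extraction, assuming poppler's pdftotext is installed and on PATH (the directory names are illustrative, not from the actual release):

```python
# Batch-convert a directory of PDFs to plain text with poppler's pdftotext.
# Assumes pdftotext is installed and on PATH; directory names are illustrative.
import subprocess
from pathlib import Path

def txt_path(pdf: Path, out_dir: Path) -> Path:
    """Map document.pdf to out_dir/document.txt."""
    return out_dir / (pdf.stem + ".txt")

def convert_all(pdf_dir: str, out_dir: str) -> list[Path]:
    """Run pdftotext on every PDF in pdf_dir; return the .txt files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = txt_path(pdf, out)
        # -layout keeps the rough page layout, which helps with tabular scans
        subprocess.run(["pdftotext", "-layout", str(pdf), str(txt)], check=True)
        written.append(txt)
    return written
```

Scanned pages with no text layer will come out empty, so an OCR pass would still be needed for those.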

Another approach would be to FOIA the DOJ for directives from whoTK to
those assigned to do the reviews/redactions. Of course, given that Trump has
shut down many of the offices that responded to FOIAs, it's unlikely we
would see those documents in our lifetime.

Onward,
Tom
(TK means "to come" in journalese.)
=======================
Tom Johnson
Inst. for Analytic Journalism
Santa Fe, New Mexico
505-577-6482
=======================

On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]> wrote:

> So.. The early tranches were the FBI searches of the properties.   Then
> there were a bunch of personal photographs of Epstein and Maxwell on their
> travels with various famous people. Amusingly, there are faces some folks
> on this list would recognize.   (Read 2and3.md if so inclined and look up
> Maxwell’s recent proffer to Blanche.)
>
> The volume was modest enough in the early sets that I could push a lot
> through Claude, even images.  Summaries of those are attached.
>
> The new documents vary a lot in size.  There are examples of subpoenaed
> e-mail accounts that go on and on for hundreds of pages, but also single,
> isolated e-mails.   There’s an unusually large volume on investigating
> Epstein’s demise in prison.   Overall, it is mostly PDF format, and it is
> often the case that text can be extracted, e.g., using pdftotext.   It’s
> just the DOJ convention to use PDF. It doesn’t mean they are composed
> documents.
>
>
>
> I’ve been focused on “Dataset 9” as that one is large, and the DOJ failed
> (or refused?) to make a zip file that would be easy to download. This dataset
> gives more insight into Epstein’s contemptible personality.  There are many
> emotionally manipulative e-mails to some of his more independent young
> female associates.   I haven’t worked with the new data systematically yet,
> just spot checking the download from time to time.   I feel guilty wasting
> GPU cycles and energy on traumatizing a perfectly good AI on this stuff.
>
>
>
> The file numbering has become sparse in the later datasets.   In the early
> batches, that occurred when Donald Trump was in a picture.  Just sayin.
>
>
>
> Marcus
>
> *From: *Friam <[email protected]> on behalf of Tom Johnson <
> [email protected]>
> *Date: *Thursday, February 5, 2026 at 9:38 PM
> *To: *The Friday Morning Applied Complexity Coffee Group <
> [email protected]>
> *Subject: *Re: [FRIAM] Gauging interest..
>
> Marcus--
>
> Congrats and many thanks for harvesting this whole crop and keeping it in
> various grain bins.
>
>
>
> Quick questions:
>
>
>
> The DOJ, on multiple occasions, has talked about various numbers of pages.
> How many "pages" do you think you have? Are they all standard 8.5x11 pages?
> All PDF? If so, searchable PDF?
>
> Do the various batches released come with any kind of title page, index?
> Glossary?
>
>
>
> Are the pages/documents in any chronological order or any categorical
> order?
>
>
>
> Do you think we could compare the word count against the number of
> redacted lines (using a words-per-line estimate for each)? (I.e., a story
> reporting that X percent of the documents are still hidden or useless.)
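A rough heuristic sketch of that estimate, assuming redactions survive text extraction as a visible marker such as "(b)(6)"-style exemption codes, "[REDACTED]", or runs of X/█ characters. These patterns, and the words-per-redaction figure, are illustrative assumptions, not observed conventions from the actual release:

```python
# Heuristic estimate of how much of an extracted text file is redacted.
# The marker patterns and words_per_redaction value are assumptions; they
# vary by release and would need tuning against real pages.
import re

REDACTION_PATTERN = re.compile(r"\(b\)\(\d\)|\[REDACTED\]|[X█]{4,}", re.IGNORECASE)

def redaction_stats(text: str, words_per_redaction: int = 5) -> dict:
    """Return word count, redaction-marker count, and an estimated hidden fraction."""
    words = len(text.split())
    redactions = len(REDACTION_PATTERN.findall(text))
    hidden = redactions * words_per_redaction  # assumed words lost per redaction
    total = words + hidden
    return {
        "words": words,
        "redactions": redactions,
        "estimated_hidden_fraction": hidden / total if total else 0.0,
    }
```

The output would feed a "X percent still hidden" figure, with the caveat that black-box redactions sometimes extract as nothing at all and so undercount.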
>
>
>
> I'm sure I can bug you for more.
>
> Tom
>
>
>
> =======================
> Tom Johnson
> Inst. for Analytic Journalism
> Santa Fe, New Mexico
> 505-577-6482
> =======================
>
>
>
> On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]> wrote:
>
> I’m closing-in on a full download of Dataset 9 of the Epstein Transparency
> Act.  (I have the rest.)   I’m thinking of building a vector database (e.g.
> pgvector for Postgres).   I was thinking of wrapping an MCP server around it
> so LLMs can get a directory of articles and then summarize, or
> cross-reference sets of them.   RAG is what Perplexity does, but
> apparently, they don’t have the content yet.
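The core of the retrieval step can be sketched in memory: embed the documents, then return the k nearest by cosine distance. In Postgres with pgvector this becomes `ORDER BY embedding <=> query_vec LIMIT k`; the version below only illustrates the ranking, and the embeddings are placeholders for whatever model would actually be used:

```python
# Minimal in-memory sketch of the nearest-neighbor step a pgvector-backed
# RAG setup performs. Embeddings are placeholder vectors, not real model output.
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank document ids by cosine distance to the query embedding."""
    return sorted(docs, key=lambda d: cosine_distance(query, docs[d]))[:k]
```

An MCP tool wrapping this would take a query string, embed it, and hand the top-k document ids (or their text) back to the LLM for summarizing or cross-referencing.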
>
>
>
> I imagine a SETI@home-type project to reduce the data.  Another analogy
> that comes to mind is annotations of the genome: Line all the documents up
> and then slowly fill in the summaries.   The vector database could help
> inform how to combine documents for consumption within context window
> limits (PCA vicinity).
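The "combine documents within context window limits" idea might look like a greedy packing pass: take documents already ranked nearest-first by the vector database and add them until a token budget is hit. The token counts and ranking here are assumed inputs, not something computed from the real corpus:

```python
# Greedy sketch of packing related documents into one context window.
# ranked_docs and budget_tokens are assumed inputs (e.g. from the vector DB
# and the target model's context limit); nothing here reflects the real data.
def pack_context(ranked_docs: list[tuple[str, int]], budget_tokens: int) -> list[str]:
    """ranked_docs: (doc_id, token_count) pairs sorted nearest-first.
    Returns the ids that fit within the budget, preserving rank order."""
    chosen, used = [], 0
    for doc_id, tokens in ranked_docs:
        if used + tokens <= budget_tokens:
            chosen.append(doc_id)
            used += tokens
    return chosen
```

Skipping an oversized document rather than stopping at it lets smaller, still-relevant documents further down the ranking fill the remaining space.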
>
>
>
> I could keep my Max subscription on it and make some progress, but really
> such a project needs tens or hundreds of workers.
>
>
>
> Marcus
>
>
>
>
>
>
>
>
>
> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. /
> ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives:  5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
>
> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. /
> ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives:  5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
>   1/2003 thru 6/2021  http://friam.383.s1.nabble.com/
>
.- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. / ... 
--- -- . / .- .-. . / ..- ... . ..-. ..- .-..
FRIAM Applied Complexity Group listserv
Fridays 9a-12p Friday St. Johns Cafe   /   Thursdays 9a-12p Zoom 
https://bit.ly/virtualfriam
to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
FRIAM-COMIC http://friam-comic.blogspot.com/
archives:  5/2017 thru present https://redfish.com/pipermail/friam_redfish.com/
  1/2003 thru 6/2021  http://friam.383.s1.nabble.com/

Reply via email to