I am willing to donate some GPU time from my personal machine towards this project, maybe by running some OCR? From what I understand, you will be setting up a "vibe" search for the Epstein files?
_ Cody Smith _
[email protected]

On Fri, Feb 6, 2026 at 10:31 PM Roger Critchlow <[email protected]> wrote:

> There was an amusing revelation on another list of mine about PDF
> conversion. A blind user was complaining that the PDF manuals are useless
> for screen readers. Another reader on the list produced an HTML conversion
> of the PDF in a few hours. It was done by Claude. I was gobsmacked,
> having been thwarted by PDF documents many times before, but of course it's
> the perfect task for an LLM.
>
> -- rec --
>
> On Fri, Feb 6, 2026 at 7:19 AM Tom Johnson <[email protected]> wrote:
>
>> Thanks, Marcus.
>>
>> So you're saying there seems to be no consistency or standardization of
>> methodology in the so-called review and release process. That is a story
>> in itself.
>>
>> Also, having to convert PDF to text is another time suck, but necessary
>> at some point.
>>
>> Another approach would be to FOIA the DOJ for directives from whoTK to
>> those assigned to do the reviews/redactions. Of course, given that Trump
>> has shut down many of the offices that responded to FOIAs, it's unlikely
>> we would see those documents in our lifetime.
>>
>> Onward,
>> Tom
>> (TK means "to come" in journalese)
>> =======================
>> Tom Johnson
>> Inst. for Analytic Journalism
>> Santa Fe, New Mexico
>> 505-577-6482
>> =======================
>>
>> On Fri, Feb 6, 2026, 1:36 AM Marcus Daniels <[email protected]> wrote:
>>
>>> So, the early tranches were the FBI searches of the properties. Then
>>> there were a bunch of personal photographs of Epstein and Maxwell on
>>> their travels with various famous people. Amusingly, faces some folks
>>> on this list would recognize. (Read 2and3.md if so inclined and look up
>>> Maxwell’s recent proffer to Blanche.)
>>>
>>> The volume was modest enough in the early sets that I could push a lot
>>> through Claude, even images. Summaries of that are attached.
>>>
>>> The new documents vary a lot in size. There are examples of subpoenaed
>>> e-mail accounts that go on and on for hundreds of pages, but also
>>> single, isolated e-mails. There’s an unusually large volume on
>>> investigating Epstein’s demise in prison. Overall, it is mostly PDF
>>> format, and it is often the case that text can be extracted, e.g.,
>>> using pdftotext. It’s just the DOJ convention to use PDF. It doesn’t
>>> mean they are composed documents.
>>>
>>> I’ve been focused on “Dataset 9” as that one is large, and the DOJ
>>> failed (or refused?) to make a zip file that would be easy to download.
>>> This dataset gives more insight into Epstein’s contemptible
>>> personality. There are many emotionally manipulative e-mails to some of
>>> his more independent young female associates. I haven’t worked with the
>>> new data systematically yet, just spot-checking the download from time
>>> to time. I feel guilty wasting GPU cycles and energy on traumatizing a
>>> perfectly good AI on this stuff.
>>>
>>> The file numbering has become sparse in the later datasets. In the
>>> early batches, that occurred when Donald Trump was in a picture. Just
>>> sayin.
>>>
>>> Marcus
>>>
>>> *From: *Friam <[email protected]> on behalf of Tom Johnson <
>>> [email protected]>
>>> *Date: *Thursday, February 5, 2026 at 9:38 PM
>>> *To: *The Friday Morning Applied Complexity Coffee Group <
>>> [email protected]>
>>> *Subject: *Re: [FRIAM] Gauging interest..
>>>
>>> Marcus--
>>>
>>> Congrats and many thanks for harvesting this whole crop and keeping it
>>> in various grain bins.
>>>
>>> Quick questions:
>>>
>>> The DOJ, on multiple occasions, has talked about various numbers of
>>> pages. How many "pages" do you think you have? Are they all standard
>>> 8.5x11 pages? All PDF? If so, searchable PDF?
>>>
>>> Do the various batches released come with any kind of title page,
>>> index? Glossary?
>>>
>>> Are the pages/documents in any chronological order or any categorical
>>> order?
>>>
>>> Do you think we could do a word count vs. lines (each containing a
>>> words-per-line estimate) redacted? (i.e., a story reporting X percent
>>> of the documents still hidden or useless).
>>>
>>> I'm sure I can bug you for more.
>>>
>>> Tom
>>>
>>> =======================
>>> Tom Johnson
>>> Inst. for Analytic Journalism
>>> Santa Fe, New Mexico
>>> 505-577-6482
>>> =======================
>>>
>>> On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]>
>>> wrote:
>>>
>>> I’m closing in on a full download of Dataset 9 of the Epstein
>>> Transparency Act. (I have the rest.) I’m thinking of building a vector
>>> database (e.g. pgvector for Postgres). I was thinking of wrapping an
>>> MCP server around it so LLMs can get a directory of articles and then
>>> summarize, or cross-reference sets of them. RAG is what Perplexity
>>> does, but apparently, they don’t have the content yet.
>>>
>>> I imagine a SETI-at-home type project to reduce the data. Another
>>> analogy that comes to mind is annotations of the genome: Line all the
>>> documents up and then slowly fill in the summaries. The vector database
>>> could help inform how to combine documents for consumption within
>>> context window limits (PCA vicinity).
>>>
>>> I could keep my Max subscription on it and make some progress, but
>>> really such a project needs tens or hundreds of workers.
>>>
>>> Marcus
>>>
>>> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --.
>>> / ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
>>> FRIAM Applied Complexity Group listserv
>>> Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom
>>> https://bit.ly/virtualfriam
>>> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
>>> FRIAM-COMIC http://friam-comic.blogspot.com/
>>> archives: 5/2017 thru present
>>> https://redfish.com/pipermail/friam_redfish.com/
>>> 1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
