Marcus-- Congrats and many thanks for harvesting this whole crop and keeping it in various grain bins.
Quick questions: The DOJ, on multiple occasions, has talked about various numbers of pages. How many "pages" do you think you have? Are they all standard 8.5x11 pages? All PDF? If so, searchable PDF? Do the various batches released come with any kind of title page, index, or glossary? Are the pages/documents in any chronological or categorical order?

Do you think we could do a count of words vs. redacted lines (each with a words-per-line estimate)? (i.e., a story reporting that X percent of the documents are still hidden or useless.)

I'm sure I can bug you for more.

Tom

=======================
Tom Johnson
Inst. for Analytic Journalism
Santa Fe, New Mexico
505-577-6482
=======================

On Thu, Feb 5, 2026, 10:37 PM Marcus Daniels <[email protected]> wrote:

> I’m closing in on a full download of Dataset 9 of the Epstein Transparency
> Act. (I have the rest.) I’m thinking of building a vector database (e.g.
> pgvector for Postgres). I was thinking of wrapping an MCP server around it
> so LLMs can get a directory of articles and then summarize, or
> cross-reference sets of them. RAG is what Perplexity does, but
> apparently, they don’t have the content yet.
>
> I imagine a SETI-at-home type project to reduce the data. Another analogy
> that comes to mind is annotations of the genome: Line all the documents up
> and then slowly fill in the summaries. The vector database could help
> inform how to combine documents for consumption within context window
> limits (PCA vicinity).
>
> I could keep my Max subscription on it and make some progress, but really
> such a project needs tens or hundreds of workers.
>
> Marcus
>
> .- .-.. .-.. / ..-. --- --- - . .-. ... / .- .-. . / .-- .-. --- -. --. /
> ... --- -- . / .- .-. . / ..- ... . ..-. ..- .-..
> FRIAM Applied Complexity Group listserv
> Fridays 9a-12p Friday St. Johns Cafe / Thursdays 9a-12p Zoom
> https://bit.ly/virtualfriam
> to (un)subscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/
> archives: 5/2017 thru present
> https://redfish.com/pipermail/friam_redfish.com/
> 1/2003 thru 6/2021 http://friam.383.s1.nabble.com/
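
A rough sketch of the words-vs.-redacted-lines estimate raised above, in case it helps frame the story. Everything here is an assumption for illustration: the function name, the idea that visible words and redacted lines have already been counted per document, and the single words-per-line figure used to convert redacted lines into an estimated word count. It is not a description of how the released files are actually structured.

```python
def estimate_hidden_share(visible_words, redacted_lines, words_per_line):
    """Estimate the fraction of a document's words hidden by redaction.

    visible_words  -- count of readable words (hypothetical input, e.g. from OCR)
    redacted_lines -- count of fully redacted lines (hypothetical input)
    words_per_line -- assumed average words on a line; this is a guess, not a measurement
    """
    if visible_words < 0 or redacted_lines < 0 or words_per_line <= 0:
        raise ValueError("counts must be non-negative and words_per_line positive")
    hidden_words = redacted_lines * words_per_line
    total_words = visible_words + hidden_words
    # An empty document has nothing hidden.
    return hidden_words / total_words if total_words else 0.0


# Example with made-up numbers: 9,000 visible words, 100 redacted lines,
# and an assumed 10 words per line.
share = estimate_hidden_share(9000, 100, 10)
print(f"Estimated share hidden: {share:.0%}")
```

The result is only as good as the words-per-line guess, so a story would probably want to report a range (say, 8 to 12 words per line) rather than a single percentage.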
