On Tue, May 19, 2026 at 10:27 AM Martin Mueller < [email protected]> wrote:
> I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet
> on steroids that analyzes rare or defective spellings in a corpus of 65,000
> texts and 1.5 billion words. I typically extract data from the corpus with
> python scripts, turn them into tables and load them into the database.
>
> On my Mac with 32 GB of memory, performance is OK with queries that
> typically extract data rows within seconds from tables with up to ten
> million rows. If the result set is large, I suspect that most of the
> machine's time is spent displaying result sets. I have used indexing
> sparingly. While it helps, the time savings often don't matter much.
>
> I am thinking about scaling up to a table with about 60 million rows. Are
> there things to do or watch out for?

Use the correct tool for the task at hand, even if you are not a carpenter and thus only know how to use a hammer.

> Or should I proceed on the assumption that 60 million records are
> within scope and that the added time cost is roughly linear?

In my experience, database performance follows a hockey-stick curve: good while the working set fits in memory, and then suddenly much worse.

The correct tool for full text search is PG's Full Text Search (tsvector) facility, paired with GIN indexes. Do you use them? Probably not, based on your comments, but they would "keep 'everything' in memory", thus staving off performance degradation.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
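For illustration only, here is a minimal Python sketch (the OP already uses Python scripts) that builds the SQL for the tsvector-plus-GIN setup described above. The table `texts`, its columns `id` and `body`, and the helper names are hypothetical. It assumes the built-in 'simple' text search configuration, which lowercases tokens but does not stem them, so variant spellings are less likely to be folded together than with a language-specific configuration. Stored generated columns need PostgreSQL 12 or later.

```python
"""Hypothetical sketch: SQL for PostgreSQL full text search with a GIN index.

The table `texts` and columns `id`/`body` are illustrative assumptions,
not names taken from the original message.
"""

TABLE = "texts"       # hypothetical table name
COLUMN = "body"       # hypothetical text column holding each document
TS_CONFIG = "simple"  # 'simple' lowercases tokens but does not stem them


def fts_setup_statements(table: str = TABLE, column: str = COLUMN,
                         config: str = TS_CONFIG) -> list[str]:
    """Return DDL adding a stored tsvector column plus a GIN index on it.

    Identifiers are interpolated directly into the SQL text, so only
    pass trusted, constant names (never user input).
    """
    tsv = f"{column}_tsv"
    return [
        # The tsvector is computed once per row and kept up to date by PG.
        f"ALTER TABLE {table} ADD COLUMN {tsv} tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('{config}', {column})) STORED;",
        # The GIN index lets @@ searches avoid scanning every row.
        f"CREATE INDEX {table}_{tsv}_idx ON {table} USING GIN ({tsv});",
    ]


def fts_query(table: str = TABLE, column: str = COLUMN,
              config: str = TS_CONFIG) -> str:
    """Return a parameterized search query; the search text goes in %s."""
    return (f"SELECT id FROM {table} "
            f"WHERE {column}_tsv @@ plainto_tsquery('{config}', %s);")


if __name__ == "__main__":
    for stmt in fts_setup_statements():
        print(stmt)
    print(fts_query())
```

The printed statements could be run once in psql or the GUI client; from a Python driver, the search word would be passed as the single query parameter rather than pasted into the SQL string.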
