On Wed, Feb 10, 2010 at 10:11 AM, Steve Atkins <st...@blighty.com> wrote:
>
> On Feb 9, 2010, at 10:38 PM, AI Rumman wrote:
>
>> Thanks for your quick answers.
>>
>> But if I use a file and then store the name in the database, is it
>> possible to use text search tsvector and tsquery indexing on these
>> external files?
>
> No, no way at all.
>
> A database isn't really the right way to do full text search for single
> files that big. Even if they'd fit in the database, they're way bigger
> than what the underlying index types tsquery uses are designed for.
>
> Are you sure that the documents are that big? A single document of that
> size would be 400 times the size of the Bible. That's a ridiculously
> large amount of text, most of a small library.
>
> If the answer is "yes, it's really that big and it's really text" then
> look at clucene or, better, hire a specialist.
>
> Cheers,
>   Steve
>
>> On Wed, Feb 10, 2010 at 12:26 PM, Steve Atkins <st...@blighty.com> wrote:
>>
>> On Feb 9, 2010, at 9:52 PM, Scott Marlowe wrote:
>>
>> > On Tue, Feb 9, 2010 at 9:38 PM, AI Rumman <rumman...@gmail.com> wrote:
>> >> How to save a 2 GB or larger text string in PostgreSQL?
>> >> Which data type should I use?
>> >
>> > If you have to, you can use either the lo interface or bytea. Large
>> > Object (i.e. lo) allows for access much like fopen/fseek etc. in C,
>> > but the actual data are not stored in a row with other data; they
>> > live alone in the lo space. Bytea is a legit type that you can have
>> > as one of many in a row, but you retrieve the whole thing at once
>> > when you get the row.
>>
>> Bytea definitely won't handle more than 1 GB. I don't think the lo
>> interface will handle more than 2 GB.
>>
>> > Preferred way to store 2 GB of data is to put it into a file and put
>> > the name of the file into the database.
>>
>> This.
>>
>> Cheers,
>>   Steve
Two simple (and possibly wrong) solutions are presented below.

a) Store only the file name in the DB and leave the file on the file
system, reading the file name from the DB to determine the file's
location. This solution offers the possibility of using many of the
very fast and efficient file content search and manipulation tools
that are part of most OSes (for instance grep, sed, ...).

b) If you really need to store the file contents in the DB, you can
write a small routine that reads the file contents in batches of n
bytes and writes each batch to a new table row. One thing to remember
is to capture and store the batch number (which should be sequential)
in another field of the respective row, and to indicate the last batch
by, say, setting yet another field in that row to 'true'. Doing so
lets you parallelize persisting the batches to the DB while still
having a way to correctly put the sections of the file contents (the
batches) back together when reconstructing the file. It also keeps
each transaction as small as a single batch. See the sketch in the
P.S. below.

Allan.
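P.S. Here is a minimal sketch of approach (b) in Python using psycopg2.
The table name (file_chunks), its columns, and the batch size are my own
illustration, not anything settled in this thread, and insertion here is
sequential; the parallel variant is left out for brevity. It assumes a
table like:

    CREATE TABLE file_chunks (
        file_name text,
        batch_no  integer,
        is_last   boolean,
        data      bytea,
        PRIMARY KEY (file_name, batch_no)
    );

    import psycopg2

    BATCH_SIZE = 1024 * 1024  # 1 MB per row; purely illustrative

    def store_file(conn, path):
        """Read the file in fixed-size batches, one row per batch."""
        with open(path, "rb") as f:
            cur = conn.cursor()
            batch_no = 0
            chunk = f.read(BATCH_SIZE)
            while chunk:
                # Read one batch ahead so we can flag the last one.
                next_chunk = f.read(BATCH_SIZE)
                cur.execute(
                    "INSERT INTO file_chunks (file_name, batch_no, is_last, data)"
                    " VALUES (%s, %s, %s, %s)",
                    (path, batch_no, not next_chunk, psycopg2.Binary(chunk)),
                )
                conn.commit()  # keeps each transaction to a single batch
                batch_no += 1
                chunk = next_chunk

    def rebuild_file(conn, path, out_path):
        """Reassemble the file by reading the batches back in order."""
        cur = conn.cursor()
        cur.execute(
            "SELECT data FROM file_chunks"
            " WHERE file_name = %s ORDER BY batch_no",
            (path,),
        )
        with open(out_path, "wb") as out:
            for (data,) in cur:
                out.write(data)

Since reconstruction orders by batch_no anyway, the is_last flag mostly
serves as a sanity check: on read-back you can verify the final row has
it set, i.e. that no trailing batches are missing.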