Hi Kevin,

Thanks again for the reply. I suspect that casting to text and using octet_length() would not be accurate. Using "extract[ed] text" keywords or summaries would indeed be quick, but that is not what I'm looking for. I am asking about real-world numbers for full text search of large documents; I'm not sure what more detail you could want. I'm not demanding anything, just using examples to clarify my question. I am indeed open to alternatives.
Thank you, Kevin. pg_column_size looks like exactly what I'm looking for.

http://www.postgresql.org/docs/9.0/static/functions-admin.html

pg_column_size(any) | int | Number of bytes used to store a particular value (possibly compressed)

On Tue, Jun 14, 2011 at 11:36 AM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:

> Tim <elatl...@gmail.com> wrote:
>
> > I would be surprised if there is no general "how big is this
> > object" method in PostgreSQL.
>
> You could cast to text and use octet_length().
>
> > If it's "bad design" to store large text documents (pdf,docx,etc)
> > as a BLOBs or on a filesystem and make them searchable with
> > tsvectors can you suggest a good design?
>
> Well, I suggested that storing a series of novels as a single entry
> seemed bad design to me. Perhaps one entry per novel or even finer
> granularity would make more sense in most applications, but there
> could be exceptions. Likewise, a list of distinct words is of
> dubious value in most applications' text searches. We extract text
> from court documents and store a tsvector for each document; we
> don't aggregate all court documents for a year and create a
> tsvector for that -- that would not be useful for us.
>
> > If making your own search implementation is "better" what is the
> > point of tsvectors?
>
> I remember you asking about doing that, but I don't think anyone
> else has advocated it.
>
> > Maybe I'm missing something here?
>
> If you were to ask for real-world numbers you'd probably get farther
> than demanding that people volunteer their time to perform tests
> that you define but don't seem willing to run. Or if you describe
> your use case in more detail, with questions about alternative
> approaches, you're likely to get useful advice.
>
> -Kevin
>

On Tue, Jun 14, 2011 at 11:44 AM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:

> "Kevin Grittner" <kevin.gritt...@wicourts.gov> wrote:
>
> > You could cast to text and use octet_length().
>
> Or perhaps you're looking for pg_column_size().
>
> http://www.postgresql.org/docs/9.0/interactive/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE
>
> -Kevin
>
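To illustrate why the two measures discussed above can disagree: octet_length(value::text) counts the bytes of a value's text form, while pg_column_size() reports the bytes actually used to store the value, which may be compressed. For a tsvector, the text cast also measures its printed form rather than its stored binary representation. The Python sketch below only mimics the compression gap, using zlib purely as a stand-in; PostgreSQL's own TOAST compression (pglz, or lz4 in newer releases) is different, so these numbers will not match what the server reports, and the sketch ignores the small varlena header that pg_column_size() counts. The function names are hypothetical.

```python
import zlib


def logical_size(text: str) -> int:
    """Bytes in the UTF-8 encoding of the text.

    Roughly analogous to octet_length(value::text) in a UTF-8 database.
    """
    return len(text.encode("utf-8"))


def compressed_size(text: str) -> int:
    """Bytes after zlib compression of the UTF-8 encoding.

    A stand-in (NOT PostgreSQL's pglz/lz4) for the "possibly compressed"
    storage size that pg_column_size() reports.
    """
    return len(zlib.compress(text.encode("utf-8")))


if __name__ == "__main__":
    # A highly repetitive, document-like value compresses very well.
    doc = "the quick brown fox jumps over the lazy dog " * 1000
    print("logical bytes:   ", logical_size(doc))
    print("compressed bytes:", compressed_size(doc))
```

On this deliberately repetitive sample the compressed size is far smaller than the logical size; on real documents the ratio depends heavily on the content, which is why measuring with pg_column_size() on actual data is more informative than estimating from the text length.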