Re: [SQL] column type for pdf file
On Wed, May 18, 2011 at 05:06:36PM -0600, Eric McKeeth wrote: > On Wed, May 18, 2011 at 2:20 PM, Emi Lu wrote: > > > Hello, > > > > To save pdf files into postgresql8.3, what is the best column type? > > > > bytea, blob, etc? > > > > Thank you, > > Emi > > > > Everyone else has pointed out reasons for not doing this, and I agree with > them that in the large majority of cases just storing a reference to a file > stored outside the database is preferable. However, to answer the question > you asked, my rule of thumb is that if you need to store binary data in the > database is to use a bytea column, unless you need the random access > capabilities that the large object interface provides. A bytea column is > typically easier to use, and has proper transactional behavior, enforcement > of referential integrity, etc. > I'm with Eric on this one: for smaller use cases, the convenience of bytea in the db is nice. As to random access, I wrote a client-side wrapper for our middleware that implements a file iterator interface for python on top of substr(bytea,position,blocksize). I was sort of surprised at how well it performed. We're using it in production right now. I actually store files in a leaf table w/ and id and hash, with filenames in a separate linking table, so I'm even getting data deduplication (all the rage in biz these days) for free. Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer & Admin, Research Scientistphone: 713-348-6166 Connexions http://cnx.orgfax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] extracting location info from string
On Thu, 26 May 2011 10:15:50 +1200 Andrej wrote: >On 26 May 2011 09:13, Tarlika Elisabeth Schmitz > wrote: >> On Wed, 25 May 2011 09:25:48 -0600 >> Rob Sargent wrote: >> >>> >>> >>>On 05/24/2011 10:57 AM, Lew wrote: Tarlika Elisabeth Schmitz wrote: > CREATE TABLE person > ( > id integer NOT NULL, > "name" character varying(256) NOT NULL, > "location" character varying(256), > CONSTRAINT person_pkey PRIMARY KEY (id) > ); > > this was just a TEMPORARY table I created for quick analysis > of my CSV data (now renamed to temp_person). >> >> CREATE TABLE country >> ( >> id character varying(3) NOT NULL, -- alpha-3 code >> "name" character varying(50) NOT NULL, >> CONSTRAINT country_pkey PRIMARY KEY (id) >> ); >> >> >>>To minimize the ultimately quite necessary human adjudication, one >>>might make good use of what is often termed "crowd sourcing": Keep >>>all the distinct "hand entered" values and a map to the final human >>>assessment. >>[...] >> I could do with a concept for this problem, which applies to a lot of >> string-type info. > >I'd start w/ downloading a list as mentioned here: >http://answers.google.com/answers/threadview?id=596822 > >And run it through a wee perl script using >http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm >to make phonetic matches ... > >Then I'd run your own data through DoubleMetaphone, and clean up >matches if not too many false positives show up. Many thanks for all your suggestions. It will take me a while to work my way through these as I have several open ends. In a similar vein, the PERSONs names are 1) 2) (more common) 3) | Where I have firstname and or title I'd be quite keen to determine sex as it would be interesting from a statistics point of view to distinguish. I am basically just interested in people from two countries, names mainly English. -- Best Regards, Tarlika Elisabeth Schmitz -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql