Re: [SQL] column type for pdf file

2011-05-26 Thread Ross J. Reedstrom
On Wed, May 18, 2011 at 05:06:36PM -0600, Eric McKeeth wrote:
> On Wed, May 18, 2011 at 2:20 PM, Emi Lu  wrote:
> 
> > Hello,
> >
> > To save pdf files into postgresql8.3, what is the best column type?
> >
> > bytea, blob, etc?
> >
> > Thank you,
> > Emi
> >
> 
> Everyone else has pointed out reasons for not doing this, and I agree with
> them that in the large majority of cases just storing a reference to a file
> stored outside the database is preferable. However, to answer the question
> you asked, my rule of thumb is that if you need to store binary data in the
> database is to use a bytea column, unless you need the random access
> capabilities that the large object interface provides. A bytea column is
> typically easier to use, and has proper transactional behavior, enforcement
> of referential integrity, etc.
> 

I'm with Eric on this one: for smaller use cases, the convenience of bytea
in the db is nice. As to random access, I wrote a client-side wrapper
for our middleware that implements a file iterator interface for python
on top of substr(bytea,position,blocksize). I was sort of surprised at
how well it performed. We're using it in production right now.

I actually store files in a leaf table w/ and id and hash, with
filenames in a separate linking table, so I'm even getting data
deduplication (all the rage in biz these days) for free.

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientistphone: 713-348-6166
Connexions  http://cnx.orgfax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] extracting location info from string

2011-05-26 Thread Tarlika Elisabeth Schmitz
On Thu, 26 May 2011 10:15:50 +1200
Andrej  wrote:

>On 26 May 2011 09:13, Tarlika Elisabeth Schmitz
> wrote:
>> On Wed, 25 May 2011 09:25:48 -0600
>> Rob Sargent  wrote:
>>
>>>
>>>
>>>On 05/24/2011 10:57 AM, Lew wrote:
 Tarlika Elisabeth Schmitz wrote:

> CREATE TABLE person
> (
> id integer NOT NULL,
> "name" character varying(256) NOT NULL,
> "location" character varying(256),
> CONSTRAINT person_pkey PRIMARY KEY (id)
> );
>
> this was just a TEMPORARY table I created for quick analysis
> of my CSV data (now renamed to temp_person).
>>
>> CREATE TABLE country
>> (
>>  id character varying(3) NOT NULL, -- alpha-3 code
>>  "name" character varying(50) NOT NULL,
>>  CONSTRAINT country_pkey PRIMARY KEY (id)
>> );
>>
>>
>>>To minimize the ultimately quite necessary human adjudication, one
>>>might make good use of what is often termed "crowd sourcing":  Keep
>>>all the distinct "hand entered" values and a map to the final human
>>>assessment.
>>[...]
>> I could do with a concept for this problem, which applies to a lot of
>> string-type info.
>
>I'd start w/ downloading a list as mentioned here:
>http://answers.google.com/answers/threadview?id=596822
>
>And run it through a wee perl script using
>http://search.cpan.org/~maurice/Text-DoubleMetaphone-0.07/DoubleMetaphone.pm
>to make phonetic matches ...
>
>Then I'd run your own data through DoubleMetaphone, and clean up
>matches if not too many false positives show up.


Many thanks for all your suggestions. It will take me a while to work
my way through these as I have several open ends.


In a similar vein, the PERSONs names are
1)  
2)   (more common)
3)  | 


Where I have firstname and or title I'd be quite keen to determine sex
as it would be interesting from a statistics point of view to
distinguish.

I am basically just interested in people from two countries, names
mainly English.

-- 

Best Regards,
Tarlika Elisabeth Schmitz

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql