Re: [GENERAL] Best way to store case-insensitive data?

2010-06-13 Thread Lew

Please don't top-post.

Mike Christensen wrote:

Ah, I should probably upgrade to 8.4.  However, I'll probably just
wait for 9.0 to come out.  So it seems like citext will be about the
same as casting both sides to LOWER(), plus putting an index on the
lowercase version of the text.  I'd probably use that if it were out
of the box, but I'm trying to stay away from adding too many
dependencies..  I think I'll stick with my original approach of only
storing lowercase data in the DB, and perhaps put a CHECK constraint
on there to ensure no upper case letters sneak in.


If your db contains international text there are some corner cases where 
lower( upper( val )) != val or upper( lower( val )) != val.  Or there should 
be, because that's what happens in certain languages.


For example, upper-case 'ß' should be 'SS' in German.  Lower-case 'SS' is 'ss'.

--
Lew

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-12 Thread Adrian von Bidder
On Friday 11 June 2010 09.27:15 Michal Politowski wrote:

[email address local part is case sensitive]

 In practice I've yet to see a system having both smith and Smith
 and them being different, but still it is theoretically posible.

I routinely modify email addresses I store to my addressbook to all 
lowercase.  I have yet to have a single case where this gets me into 
problems; I think it's probably quite exotic to find a system that actually 
is case sensitive.

cheers
-- vbi

-- 
featured product: ClamAV Antivirus - http://www.clamav.net/


signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-12 Thread Scott Marlowe
2010/6/11 Mike Christensen m...@kitchenpc.com:
 Yea this is a valid point.  It's very possible my design won't work
 for the long term, and at some point I'll have to store the email name
 exactly as it was entered, and allow the lookup logic to be case
 insensitive with a lowercase index.  However, I think the way I have
 it now should not break any known email server heh.

Instead of mangling data when you store it, mangle it later when you
retrieve it.  with a functional index on the column, you get the
comparison data stored in an index, ready to go.

Performance test the index:

create test_index on table (lower(fieldname));

versus storing the emails in lower case.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-12 Thread Scott Marlowe
n Sat, Jun 12, 2010 at 3:21 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 Performance test the index:

 create test_index on table (lower(fieldname));

 versus storing the emails in lower case.

Some quick testing on that tells me that storing in lower case will be
about twice as fast at retrieval.  But we're talking things like 1ms
versus 2ms.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-11 Thread Adrian von Bidder
Heyho!

On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.

I'd do it this way.  Plus either a CHECK condition on the table (email = 
lowercase(email)) (this will reliably catch all cases, but you will 
experience failures until you have found all cases)

Or a BEFORE trigger that converts email to lowercase.  (This is mostly 
transparent for storing, but I usually try to avoid triggers that modify 
data like this.  But that's probably just me.)

In either case, obviously you'll still need to change the code that is used 
for retrieving and comparing email addresses.

cheers
-- vbi

-- 
featured link: http://www.pool.ntp.org


signature.asc
Description: This is a digitally signed message part.


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-11 Thread Mike Christensen
Yup, I actually ended up doing this with this constraint:

ALTER TABLE Users ADD CONSTRAINT check_email CHECK (email ~ E'^[^A-Z]+$');

However, I like your version better so I'll use that instead :)

Mike

On Thu, Jun 10, 2010 at 11:48 PM, Adrian von Bidder
avbid...@fortytwo.ch wrote:
 Heyho!

 On Thursday 10 June 2010 22.50:23 Mike Christensen wrote:
 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.

 I'd do it this way.  Plus either a CHECK condition on the table (email =
 lowercase(email)) (this will reliably catch all cases, but you will
 experience failures until you have found all cases)

 Or a BEFORE trigger that converts email to lowercase.  (This is mostly
 transparent for storing, but I usually try to avoid triggers that modify
 data like this.  But that's probably just me.)

 In either case, obviously you'll still need to change the code that is used
 for retrieving and comparing email addresses.

 cheers
 -- vbi

 --
 featured link: http://www.pool.ntp.org


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-11 Thread Michal Politowski
On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:

NB: technically the local part in an email address can be case sensitive.
As RFC 5321 says:
   The local-part of a mailbox MUST BE treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   smith is different from the user Smith.  However, exploiting the
   case sensitivity of mailbox local-parts impedes interoperability and
   is discouraged.  Mailbox domains follow normal DNS rules and are
   hence not case sensitive.

In practice I've yet to see a system having both smith and Smith
and them being different, but still it is theoretically posible.

-- 
Michał Politowski
Talking has been known to lead to communication if practiced carelessly.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-11 Thread Mike Christensen
Yea this is a valid point.  It's very possible my design won't work
for the long term, and at some point I'll have to store the email name
exactly as it was entered, and allow the lookup logic to be case
insensitive with a lowercase index.  However, I think the way I have
it now should not break any known email server heh.

Mike

2010/6/11 Michal Politowski m...@charybda.icm.edu.pl:
 On Thu, 10 Jun 2010 13:50:23 -0700, Mike Christensen wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:

 NB: technically the local part in an email address can be case sensitive.
 As RFC 5321 says:
   The local-part of a mailbox MUST BE treated as case sensitive.
   Therefore, SMTP implementations MUST take care to preserve the case
   of mailbox local-parts.  In particular, for some hosts, the user
   smith is different from the user Smith.  However, exploiting the
   case sensitivity of mailbox local-parts impedes interoperability and
   is discouraged.  Mailbox domains follow normal DNS rules and are
   hence not case sensitive.

 In practice I've yet to see a system having both smith and Smith
 and them being different, but still it is theoretically posible.

 --
 Michał Politowski
 Talking has been known to lead to communication if practiced carelessly.

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Mike Christensen
I have a column called email that users login with, thus I need to
be able to lookup email very quickly.  The problem is, emails are
case-insensitive.  I want f...@bar.com to be able to login with
f...@bar.com as well.  There's two ways of doing this, that I can see:

1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER().  I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email.  This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.

Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column?  Thanks!

Mike

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Mike Christensen
Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
installer to install it.

On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen m...@kitchenpc.com wrote:
 Where do I get info on installing this?

 Very much depends on OS and how you installed pgsql


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Andy Colson

On 6/10/2010 3:50 PM, Mike Christensen wrote:

I have a column called email that users login with, thus I need to
be able to lookup email very quickly.  The problem is, emails are
case-insensitive.  I want f...@bar.com to be able to login with
f...@bar.com as well.  There's two ways of doing this, that I can see:

1) Every time I lookup an email in the database, do a case-insensitive
ilike, or cast both sides with LOWER().  I think both are slow,
correct?
2) Every time the user updates or saves their email, store it in
lowercase, and every time I lookup an email, pass in a lowercase
email.  This is somewhat of a bug farm because one might miss some
little spot in a piece of code where an email is compared or updated.

Is there any way to tell postgres to always store data in lowercase
form, or just have a flat out case-insensitive column?  Thanks!

Mike



There is citext in contrib, it makes case insensitive text columns.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Mike Christensen
Ah, I should probably upgrade to 8.4.  However, I'll probably just
wait for 9.0 to come out.  So it seems like citext will be about the
same as casting both sides to LOWER(), plus putting an index on the
lowercase version of the text.  I'd probably use that if it were out
of the box, but I'm trying to stay away from adding too many
dependencies..  I think I'll stick with my original approach of only
storing lowercase data in the DB, and perhaps put a CHECK constraint
on there to ensure no upper case letters sneak in.

Mike

On Thu, Jun 10, 2010 at 2:42 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen m...@kitchenpc.com wrote:
 From this site:

 http://developer.postgresql.org/pgdocs/postgres/citext.html

 I couldn't tell if you still had to create an index on the lower case
 value.  It seems that it basically mimics the WHERE LOWER(email) =
 LOWER(?) method.  Since this part is incredibly performance critical,
 maybe I'm better off storing my data all in lowercase and keeping the
 DB case sensitive.

 of course you'd still need an index.  whether you store it lower case
 in regular text or mixed case in a citext, the db would need an index
 for good performance.  But you wouldn't have to store a lower() index
 for citext, just an index.

 BTW, citext it new for 8.4, so it's probably not an option for you if
 you're on 8.3


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Scott Marlowe
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen m...@kitchenpc.com wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:

 1) Every time I lookup an email in the database, do a case-insensitive
 ilike, or cast both sides with LOWER().  I think both are slow,
 correct?
 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.  This is somewhat of a bug farm because one might miss some
 little spot in a piece of code where an email is compared or updated.

 Is there any way to tell postgres to always store data in lowercase
 form, or just have a flat out case-insensitive column?  Thanks!

The contrib module citext provides a case insensitive text type.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Scott Marlowe
On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen m...@kitchenpc.com wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:

 1) Every time I lookup an email in the database, do a case-insensitive
 ilike, or cast both sides with LOWER().  I think both are slow,
 correct?
 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.  This is somewhat of a bug farm because one might miss some
 little spot in a piece of code where an email is compared or updated.

 Is there any way to tell postgres to always store data in lowercase
 form, or just have a flat out case-insensitive column?  Thanks!

Note the other option is to store an index on lower(column)

create index mycaseinsensitiveindex on table ((lower(column));

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Mike Christensen
Where do I get info on installing this?

On Thu, Jun 10, 2010 at 2:15 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Thu, Jun 10, 2010 at 2:50 PM, Mike Christensen m...@kitchenpc.com wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:

 1) Every time I lookup an email in the database, do a case-insensitive
 ilike, or cast both sides with LOWER().  I think both are slow,
 correct?
 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.  This is somewhat of a bug farm because one might miss some
 little spot in a piece of code where an email is compared or updated.

 Is there any way to tell postgres to always store data in lowercase
 form, or just have a flat out case-insensitive column?  Thanks!

 The contrib module citext provides a case insensitive text type.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Steve Atkins


On 6/10/2010 3:50 PM, Mike Christensen wrote:
 I have a column called email that users login with, thus I need to
 be able to lookup email very quickly.  The problem is, emails are
 case-insensitive.  I want f...@bar.com to be able to login with
 f...@bar.com as well.  There's two ways of doing this, that I can see:
 
 1) Every time I lookup an email in the database, do a case-insensitive
 ilike, or cast both sides with LOWER().  I think both are slow,
 correct?


Use a functional index and they won't be.

create index email_lower_idx on foo (lower(email));

select * from foo where lower(email) = lower('f...@bar');

 2) Every time the user updates or saves their email, store it in
 lowercase, and every time I lookup an email, pass in a lowercase
 email.  This is somewhat of a bug farm because one might miss some
 little spot in a piece of code where an email is compared or updated.
 
 Is there any way to tell postgres to always store data in lowercase
 form, or just have a flat out case-insensitive column?  Thanks!
 
 Mike
 

Cheers,
  Steve


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Mike Christensen
From this site:

http://developer.postgresql.org/pgdocs/postgres/citext.html

I couldn't tell if you still had to create an index on the lower case
value.  It seems that it basically mimics the WHERE LOWER(email) =
LOWER(?) method.  Since this part is incredibly performance critical,
maybe I'm better off storing my data all in lowercase and keeping the
DB case sensitive.

On Thu, Jun 10, 2010 at 2:29 PM, Mike Christensen m...@kitchenpc.com wrote:
 Right now, I'm running 8.3.4 on Windows 2003, and I just used the MSI
 installer to install it.

 On Thu, Jun 10, 2010 at 2:18 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen m...@kitchenpc.com wrote:
 Where do I get info on installing this?

 Very much depends on OS and how you installed pgsql



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Scott Marlowe
On Thu, Jun 10, 2010 at 3:16 PM, Mike Christensen m...@kitchenpc.com wrote:
 Where do I get info on installing this?

Very much depends on OS and how you installed pgsql

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to store case-insensitive data?

2010-06-10 Thread Scott Marlowe
On Thu, Jun 10, 2010 at 3:34 PM, Mike Christensen m...@kitchenpc.com wrote:
 From this site:

 http://developer.postgresql.org/pgdocs/postgres/citext.html

 I couldn't tell if you still had to create an index on the lower case
 value.  It seems that it basically mimics the WHERE LOWER(email) =
 LOWER(?) method.  Since this part is incredibly performance critical,
 maybe I'm better off storing my data all in lowercase and keeping the
 DB case sensitive.

of course you'd still need an index.  whether you store it lower case
in regular text or mixed case in a citext, the db would need an index
for good performance.  But you wouldn't have to store a lower() index
for citext, just an index.

BTW, citext it new for 8.4, so it's probably not an option for you if
you're on 8.3

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general