Re: [GENERAL] match an IP address

2008-09-23 Thread Phoenix Kiula
My post at the bottom.



 On 9/23/08, Craig Ringer [EMAIL PROTECTED] wrote:

 
   Dodgy forum software. Lots of it uses an IP address as a fake username for
  unregistered users, rather than doing the sensible thing and tracking both
  IP address and (if defined) username.
 
   How I'd want to do this if I was designing the setup from scratch would
  probably be:
 
   --
   -- Track user identies
   --
   CREATE TABLE user (
user_id   SERIAL PRIMARY KEY,
user_name VARCHAR NOT NULL,
user_retired  BOOLEAN
-- and whatever else you want to keep track of about them
   );
 
   -- This index enforces unique user names across active users,
   -- permitting re-use of usernames for since deleted users without
   -- losing information about the previous user of that name's activity
   -- or losing the inability to differentiate between the old and new
   -- owners of that name.
   --
   -- I'd revoke DELETE rights on `user' or use a rule that rewrote DELETE
   -- to an UPDATE user SET user_retired = 't'.
   --
 
   CREATE UNIQUE INDEX user_active_name
   ON user(user_name)
   WHERE (NOT user_retired)
 
   -- Now, say your problem described in your post is with user activity
   -- logging:
 
   CREATE TABLE access_log (
-- blah blah
user_id   INTEGER REFERENCES user(user_id),
access_ip cidr NOT NULL
   );
 
   CREATE INDEX access_log_ip ON access_log(access_ip);
 
   CREATE INDEX access_user
   ON access_log(user_id)
   WHERE (user_id NOT NULL);
 
 
 
   In other words: always store the IP address, and if the user involved was a
  registered user store a reference to their user ID as well. Store a
  reference to a globally unique user identity number rather than the user
  name, and permit reuse of user names without losing information about
  distinct username owners.
 
   If you wanted you could use string user IDs and do away with the synthetic
  user_id key I've used above, but I suspect you'd regret it down the track.
 
   If you wanted to look up activity that might be identifed by IP address or
  by username, a query like this would do the trick and would simulate the
  behaviour your forum software is used to, including the ability of a user to
  create a username that's an IP address to throw the whole thing into chaos:
 
   SELECT * FROM access_user
   WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;
 
   --
   Craig Ringer
 





No. You have no idea what the design is for. Not forum crap.

 What happens when you need to store in a table the activity log?

   ACTIVITY_ID
   USER_STAMP  (currently user_id or ip for registered and unregistered resp.)

 The idea of storing IP for users is already being done. So what?
 Everytime they do something, you do not store their IP. Why would
 you? Just store their user id. For unregistered ones however, we store
 the IP because there is nothing else. There is no user ID for them.
 What's your logic for getting a user ID for unregistered guys --
 invent one automagically?

 Finally, this SQL:


WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;


Ever tried this crap on a table of 10 million records on a live
 website, where this query is happening at 3000 times per second? No
 such function schtick will match the raw speed of a simpler indexed
 query. Or did you mean my index should contain the COALESCE already?

Tino, I wasn't talking about sessions. This is not about session IDs.
A session ID is useless the next time a certain IP address does
something on the site. You want a commonality for non-registered
users across many different sessions. (Apart from the fact that
session IDs are usually long hashes which take up space in the table
and in the index)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Tino Wildenhain

Phoenix Kiula wrote:

My post at the bottom.


...


No. You have no idea what the design is for. Not forum crap.

 What happens when you need to store in a table the activity log?

   ACTIVITY_ID
   USER_STAMP  (currently user_id or ip for registered and unregistered resp.)


And here it gets wrong. Obviously you would store the session id
or if you have a lot of relations, use a sequence generated
key for session_id (compare with my design in the other post,
in this case session_id would be serial and you'd have a field
session_key text with the index for the cookies in the sessions
table instead)


 The idea of storing IP for users is already being done. So what?


Abandon this idea I'd say. Its based on the wrong asumption IP
addresses map to users in 1:1 relation.


 Everytime they do something, you do not store their IP. Why would
 you? Just store their user id. For unregistered ones however, we store
 the IP because there is nothing else. There is no user ID for them.
 What's your logic for getting a user ID for unregistered guys --
 invent one automagically?

 Finally, this SQL:


WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;


Ever tried this crap on a table of 10 million records on a live
 website, where this query is happening at 3000 times per second? No


You have 10 million people active the same time in your site?



 such function schtick will match the raw speed of a simpler indexed
 query. Or did you mean my index should contain the COALESCE already?

Tino, I wasn't talking about sessions. This is not about session IDs.


Well actually this is. You are just naming it differently.


A session ID is useless the next time a certain IP address does
something on the site. You want a commonality for non-registered
users across many different sessions. (Apart from the fact that
session IDs are usually long hashes which take up space in the table
and in the index)


Yes but only active ones.

btw, given IP is in every request, where is your username coming from?
Apart from basic auth, there is no way of having a userid tied to
the request directly, so how are you doing this?

Tino


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [GENERAL] match an IP address

2008-09-23 Thread hubert depesz lubaczewski
On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
 I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
 address
 192.168.90.3
 10.3.2.1
 any help please...

use this regular expression:

'^[0-9]{1,3}(.[0-9]{1,3}){3}$'

warning: do not use like or similar to.
proper way to use it:

select * from table where field ~ '^[0-9]{1,3}(.[0-9]{1,3}){3}$';

this regexp is not 100% fault proof - it will happily return rows like:
'300.999.998.7'

but for most of the cases it should be enough. if you need it to match
only ips, and filter out things like '300.999.999.999' - let me know.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Craig Ringer

Phoenix Kiula wrote:


Ever tried this crap on a table of 10 million records on a live
 website, where this query is happening at 3000 times per second? No
 such function schtick will match the raw speed of a simpler indexed
 query. Or did you mean my index should contain the COALESCE already?


Please forgive my attempt to help you based on a woefully insufficient 
description of your problem and situation. I will not make any attempt 
to do so again.


--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Joao Ferreira gmail
thank you depesz

it seems a pretty good fix for my problem. Actually yestreday I came up
with something similar but your's is better.

cheers

joao

On Tue, 2008-09-23 at 09:26 +0200, hubert depesz lubaczewski wrote:
 On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
  I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
  address
  192.168.90.3
  10.3.2.1
  any help please...
 
 use this regular expression:
 
 '^[0-9]{1,3}(.[0-9]{1,3}){3}$'
 
 warning: do not use like or similar to.
 proper way to use it:
 
 select * from table where field ~ '^[0-9]{1,3}(.[0-9]{1,3}){3}$';
 
 this regexp is not 100% fault proof - it will happily return rows like:
 '300.999.998.7'
 
 but for most of the cases it should be enough. if you need it to match
 only ips, and filter out things like '300.999.999.999' - let me know.
 
 Best regards,
 
 depesz
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Phoenix Kiula
  Please forgive my attempt to help you based on a woefully insufficient
 description of your problem and situation. I will not make any attempt to do
 so again.



Actually it was not my problem, this is a thread started by some one
else. I use Gmail so I see the entire thread as a conversation and
the context is maintained. You should try it. Anyway, sorry that you
feel bad.

To others: thanks for your suggestions, but this issue is not one of
session IDs, nor is it solved by storing IP addresses separately
(which does not assume 1:1 correlation between user and IP). We'll let
that be.

Let's just say that in *many* online situations it is vital for
querying speed to have the same column that stores users -- both
registered and unregistered. A query in SQL that matches against an IP
address regexp to identify the unregistered ones may work for some
with smaller databases, which is great, and if it doesn't (the ~
match is simply not practical for large busy websites), then consider
a small separate column that stores the registration status as a flag.

Thanks.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Craig Ringer
Phoenix Kiula wrote:

 Ever tried this crap on a table of 10 million records on a live
  website, where this query is happening at 3000 times per second? No
  such function schtick will match the raw speed of a simpler indexed
  query. Or did you mean my index should contain the COALESCE already?

Hmm. My previous response may have been overly grumpy.

The point I was *trying* to make is that shoving a username/id and an IP
address into a single field is probably not ideal. At least in my
experience you pay for this sort of optimisation (if it even works out
as an optimisation in the first place) down the track. I have the
misfortunate to have to administrate a system full of such multi-use
fields, and have developed a real loathing for the approach.

If you don't want to store IPs for registered users, I'd use:

user_id INTEGER,
ip cidr,
CONSTRAINT must_have_userstamp
CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)

... and yes, I'd use a functional index to look it up, or even a
trigger-maintained cache of the text representation if I had to. Then
again, I guess I'm lucky enough to work in environments where data
integrity and correctness is a priority and the resources available are
a good fit to the tasks the database needs to do.

--
Craig Ringe

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Phoenix Kiula
  If you don't want to store IPs for registered users, I'd use:

  user_id INTEGER,
  ip cidr,
  CONSTRAINT must_have_userstamp
  CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)

  ... and yes, I'd use a functional index to look it up, or even a
  trigger-maintained cache of the text representation if I had to. Then


Ok, this is an idea. And I do agree that multifunction fields are a
potential pain in the distant future.

My questions:

1. What extra tax will this constraint levy on an INSERT or UPDATE on
this table? There are about 100,000 inserts a day, and over three
times as many UPDATES. The concurrency is pretty high -- I mean
sometimes 1,000 users at the same time but no more than that. If the
additional cost of insertion/updating is not too heavy, I suppose this
could be a nice approach.

2. Why not have an INET field...why a CIDR? What's the benefit? It
stores those pesky .../8 type additional data which one has to mask
with functions. Would INET work just as well?

3. Storage wise does this add significantly? How much space does an
INET field take as opposed to, say, a VARCHAR field?

4. Most importantly, how would you structure the index for this? I
would much rather have a fast = in my sql's WHERE clause. No OR
etc. Any thoughts?

Thanks

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Tino Wildenhain

Hi,

Phoenix Kiula wrote:

 Please forgive my attempt to help you based on a woefully insufficient
description of your problem and situation. I will not make any attempt to do
so again.


To others: thanks for your suggestions, but this issue is not one of
session IDs, nor is it solved by storing IP addresses separately
(which does not assume 1:1 correlation between user and IP). We'll let
that be.

Let's just say that in *many* online situations it is vital for
querying speed to have the same column that stores users -- both
registered and unregistered. A query in SQL that matches against an IP


if not registered, where is the user coming from? The IP is clearly not
an identifier for a user. You (and the OP) should disregard that idea.


address regexp to identify the unregistered ones may work for some
with smaller databases, which is great, and if it doesn't (the ~
match is simply not practical for large busy websites), then consider
a small separate column that stores the registration status as a flag.


The user id itself would serve as that flag. If non NULL - user known,
otherwise unknown. Sounds easy, no? No regex at all! :)



Thanks.


Thx ;)

Tino


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [GENERAL] match an IP address

2008-09-23 Thread Marcus Engene

Phoenix Kiula wrote:

 If you don't want to store IPs for registered users, I'd use:

 user_id INTEGER,
 ip cidr,
 CONSTRAINT must_have_userstamp
 CHECK ( user_id IS NOT NULL OR ip IS NOT NULL)

 ... and yes, I'd use a functional index to look it up, or even a
 trigger-maintained cache of the text representation if I had to. Then




Ok, this is an idea. And I do agree that multifunction fields are a
potential pain in the distant future.

My questions:

1. What extra tax will this constraint levy on an INSERT or UPDATE on
this table? There are about 100,000 inserts a day, and over three
times as many UPDATES. The concurrency is pretty high -- I mean
sometimes 1,000 users at the same time but no more than that. If the
additional cost of insertion/updating is not too heavy, I suppose this
could be a nice approach.

2. Why not have an INET field...why a CIDR? What's the benefit? It
stores those pesky .../8 type additional data which one has to mask
with functions. Would INET work just as well?

3. Storage wise does this add significantly? How much space does an
INET field take as opposed to, say, a VARCHAR field?

4. Most importantly, how would you structure the index for this? I
would much rather have a fast = in my sql's WHERE clause. No OR
etc. Any thoughts?

Thanks

  
Use the best of two worlds - consider memcached and use the db only when 
you create/update an entry so that you can restore it if memcached 
(perhaps as a consequence of a server reboot) gets restarted.

http://www.slideshare.net/vishnu/livejournals-backend-a-history-of-scaling

best regards,
Marcus


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Craig Ringer
Phoenix Kiula wrote:

 1. What extra tax will this constraint levy on an INSERT or UPDATE on
 this table? There are about 100,000 inserts a day, and over three
 times as many UPDATES. The concurrency is pretty high -- I mean
 sometimes 1,000 users at the same time but no more than that. If the
 additional cost of insertion/updating is not too heavy, I suppose this
 could be a nice approach.

The best answer there is to do some testing. I wouldn't expect much of a
cost, but would recommend testing it to be sure.

 2. Why not have an INET field...why a CIDR? What's the benefit? It
 stores those pesky .../8 type additional data which one has to mask
 with functions. Would INET work just as well?

Yes, it would. I was just getting my types muddled.

 3. Storage wise does this add significantly? How much space does an
 INET field take as opposed to, say, a VARCHAR field?

AFAIK nulls are not stored, they're just flagged in the null bitmap. As
such, there should be no or almost no storage cost.

 4. Most importantly, how would you structure the index for this? I
 would much rather have a fast = in my sql's WHERE clause. No OR
 etc. Any thoughts?

I'd try a functional index first. If that didn't do the job, I'd use a
trigger-maintained column _purely_ as an optimisation (ie I could drop
it and lose no data) that stored text representations of the data.
Honestly, though, I expect the functional index would be more than good
enough and probably wouldn't have much of an INSERT/UPDATE cost.

Again, of course, I'd test before setting anything in stone.

--
Craig Ringer

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Tino Wildenhain

Craig Ringer wrote:

Phoenix Kiula wrote:


1. What extra tax will this constraint levy on an INSERT or UPDATE on
this table? There are about 100,000 inserts a day, and over three
times as many UPDATES. The concurrency is pretty high -- I mean
sometimes 1,000 users at the same time but no more than that. If the
additional cost of insertion/updating is not too heavy, I suppose this
could be a nice approach.



...

I'd try a functional index first. If that didn't do the job, I'd use a
trigger-maintained column _purely_ as an optimisation (ie I could drop
it and lose no data) that stored text representations of the data.
Honestly, though, I expect the functional index would be more than good
enough and probably wouldn't have much of an INSERT/UPDATE cost.

Again, of course, I'd test before setting anything in stone.


And do not forget you know beforehand if you are going to lookup a user
or an IP. So I fail to see what problem needs to be solved here :(

Maybe we can get an enlightenment on where every data comes from and
whats going to be its usage? Maybe we can then work out a better
solution at all? Peephole optimizing is great but sometimes the effect
is better if you just open the door :-)

Tino


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [GENERAL] match an IP address

2008-09-23 Thread Steve Atkins


On Sep 23, 2008, at 12:26 AM, hubert depesz lubaczewski wrote:


On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
I'm unable to build a LIKE or SIMILAR TO expression for matching  
and ip

address
192.168.90.3
10.3.2.1
any help please...


use this regular expression:

'^[0-9]{1,3}(.[0-9]{1,3}){3}$'

warning: do not use like or similar to.
proper way to use it:

select * from table where field ~ '^[0-9]{1,3}(.[0-9]{1,3}){3}$';

this regexp is not 100% fault proof - it will happily return rows  
like:

'300.999.998.7'


It'll also return rows like 3L33T0o7 - which normally I'd not be too  
bothered about, but on a forum that's a fairly typical username. :)


Gotta escape that period:

^[0-9]{1,3}(?:\.[0-9]{1,3}){3}$


Cheers,
  Steve


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-23 Thread Phoenix Kiula
...snip...

 I'd try a functional index first. If that didn't do the job, I'd use a
  trigger-maintained column _purely_ as an optimisation (ie I could drop
  it and lose no data) that stored text representations of the data.
  Honestly, though, I expect the functional index would be more than good
  enough and probably wouldn't have much of an INSERT/UPDATE cost.
.../snip...


I made a test table and tried a functional index (coalescing the two
columns). Works!

Now let me try that on a mirror on my full real table with millions of
rows and report back. Sounds like a cleaner design so I may be on to
something!

Hope this helps the original poster with some ideas too.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread hubert depesz lubaczewski
On Mon, Sep 22, 2008 at 05:59:25PM +0100, Joao Ferreira gmail wrote:
 I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
 address
 192.168.90.3
 10.3.2.1
 any help please...

any reason not to use standard inet datatype? which does the validation.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Raymond O'Donnell
On 22/09/2008 17:59, Joao Ferreira gmail wrote:

 I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
 address

There are built-in types in PG for handling IP addresses - are they any
use to you?

If not, there's a useful site here which may get you started:

http://regexlib.com/default.aspx


Ray.

--
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
[EMAIL PROTECTED]
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
--


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Scott Marlowe
On Mon, Sep 22, 2008 at 10:59 AM, Joao Ferreira gmail
[EMAIL PROTECTED] wrote:
 hello all,

 I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
 address

 192.168.90.3
 10.3.2.1

As already mentioned inet / cidr types should work.  Example:

postgres=# create table inettest (a inet);
CREATE TABLE
postgres=# insert into inettest values
('192.168.0.1'),('192.168.1.1'),('10.0.0.1');
INSERT 0 3
postgres=# select a from inettest where '192.168.0.1/16'  a;
  a
-
 192.168.0.1
 192.168.1.1
(2 rows)
postgres=# select a from inettest where '192.168.0.1/24'  a;
  a
-
 192.168.0.1
(1 row)
postgres=# select a from inettest where '192.168.0.1/0'  a;
  a
-
 192.168.0.1
 192.168.1.1
 10.0.0.1
(3 rows)

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Joao Ferreira gmail
well...

my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses



:(

joao


On Mon, 2008-09-22 at 11:13 -0600, Scott Marlowe wrote:
 On Mon, Sep 22, 2008 at 10:59 AM, Joao Ferreira gmail
 [EMAIL PROTECTED] wrote:
  hello all,
 
  I'm unable to build a LIKE or SIMILAR TO expression for matching and ip
  address
 
  192.168.90.3
  10.3.2.1
 
 As already mentioned inet / cidr types should work.  Example:
 
 postgres=# create table inettest (a inet);
 CREATE TABLE
 postgres=# insert into inettest values
 ('192.168.0.1'),('192.168.1.1'),('10.0.0.1');
 INSERT 0 3
 postgres=# select a from inettest where '192.168.0.1/16'  a;
   a
 -
  192.168.0.1
  192.168.1.1
 (2 rows)
 postgres=# select a from inettest where '192.168.0.1/24'  a;
   a
 -
  192.168.0.1
 (1 row)
 postgres=# select a from inettest where '192.168.0.1/0'  a;
   a
 -
  192.168.0.1
  192.168.1.1
  10.0.0.1
 (3 rows)
 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Scott Marlowe
On Mon, Sep 22, 2008 at 11:16 AM, Joao Ferreira gmail
[EMAIL PROTECTED] wrote:
 well...

 my IP addresses are stored in a TEXT type field. that field can actually
 contain usernames like 'joao' or 'scott' and it can contain IP
 addresses

Then cast them to inet and use the method I showed above:

postgres=# create table b as select a::text from inettest ;
SELECT
postgres=# select * from b;
   a

 192.168.0.1/32
 192.168.1.1/32
 10.0.0.1/32
(3 rows)

postgres=# select a from b where '192.168.0.1/0'  a::inet;
   a

 192.168.0.1/32
 192.168.1.1/32
 10.0.0.1/32
(3 rows)

postgres=# select a from b where '192.168.0.1/24'  a::inet;
   a

 192.168.0.1/32

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Tino Wildenhain

Hi,

Joao Ferreira gmail wrote:

well...

my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses


Any reason not to change this in the first place?

For a quick fix you could use regex to find the records
likely containing an ip-address notation.

Tino


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [GENERAL] match an IP address

2008-09-22 Thread Phoenix Kiula
  my IP addresses are stored in a TEXT type field. that field can actually
  contain usernames like 'joao' or 'scott' and it can contain IP
  addresses



I think this is common DB design on many websites that have registered
user IDs.

My humble suggestion would be to make another column in the table
called user_registered or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.

To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.

Next, include the columns user_id and user_registered in the same
index. This won't increase the size of the index too much, as the
user_registered field is a small INT number. But from now on your SQL
queries:

   ...WHERE user_id = 'testuser' AND user_registered = 1

will return much faster. I have found this to the most convenient and
fastest solution in PGSQL instead of haivng INET in the WHERE clauses.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Tino Wildenhain

Phoenix Kiula wrote:

my IP addresses are stored in a TEXT type field. that field can actually
contain usernames like 'joao' or 'scott' and it can contain IP
addresses




I think this is common DB design on many websites that have registered
user IDs.


Is it? Name one! Sounds like crappy design to me.


My humble suggestion would be to make another column in the table
called user_registered or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.


Maybe registration (aka authentication) is mixed up with simple
session handling?


To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.


It could also just be a fdn (hostname) - still I'm a bit puzzled
how a username and an IP-address can get into the same field.


Next, include the columns user_id and user_registered in the same
index. This won't increase the size of the index too much, as the
user_registered field is a small INT number. But from now on your SQL
queries:

   ...WHERE user_id = 'testuser' AND user_registered = 1

will return much faster. I have found this to the most convenient and
fastest solution in PGSQL instead of haivng INET in the WHERE clauses.



Can you give example on where the inet entry is going to be used?

Cheers
Tino


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [GENERAL] match an IP address

2008-09-22 Thread Phoenix Kiula
  I think this is common DB design on many websites that have registered
   user IDs.
  
 
   Is it? Name one! Sounds like crappy design to me.
 




It might sound crappy design to you, but for websites that allow users
 to do something while they are registered OR unregistered, will choose
 to have this for query speed. (Registered user goes in as testuser
 while an unregistered one goes as his IP address--some websites also
 use cookies but they're less reliable.)

 We can make this very relationally sound and whatnot, but having one
 field to go to, whether you are registered or not, makes it much
 simpler.

 Most websites have to allow for dots in their user ID these days as
 people prefer to have their email address as user ID. Which means that
 the dot checking of an IP address may not work to distinguish IP
 addresses (unregistered user) from registered user IDs.

 In this scenario, for query speed, again, if there is a column that
 tells us whether this user is registered or not it helps a great deal.
 The INET match condition is not good enough for speed for most modern
 websites with any sizeable traffic. I even wrote a function that
 converts IP to INET integer and vice versa, but no great boost in
 query speed that could compare to an indexed query on user_id and
 user_reg.

 Welcome your thoughts on how you would do it.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] match an IP address

2008-09-22 Thread Craig Ringer

Tino Wildenhain wrote:

Phoenix Kiula wrote:
my IP addresses are stored in a TEXT type field. that field can 
actually

contain usernames like 'joao' or 'scott' and it can contain IP
addresses




I think this is common DB design on many websites that have registered
user IDs.


Is it? Name one! Sounds like crappy design to me.


My humble suggestion would be to make another column in the table
called user_registered or something. Make this an int(1). If the
user is registered, value is 1, otherwise 0.


Maybe registration (aka authentication) is mixed up with simple
session handling?


To update your existing data onetime, run an SQL update query looking
for IP pattern with 3 dots (which is likely IP address and not user
id) or by the INET conditions in previous suggestions.


It could also just be a fdn (hostname) - still I'm a bit puzzled
how a username and an IP-address can get into the same field.


Dodgy forum software. Lots of it uses an IP address as a fake username 
for unregistered users, rather than doing the sensible thing and 
tracking both IP address and (if defined) username.


How I'd want to do this if I was designing the setup from scratch would 
probably be:


--
-- Track user identies
--
CREATE TABLE user (
  user_id   SERIAL PRIMARY KEY,
  user_name VARCHAR NOT NULL,
  user_retired  BOOLEAN
  -- and whatever else you want to keep track of about them
);

-- This index enforces unique user names across active users,
-- permitting re-use of usernames for since deleted users without
-- losing information about the previous user of that name's activity
-- or losing the inability to differentiate between the old and new
-- owners of that name.
--
-- I'd revoke DELETE rights on `user' or use a rule that rewrote DELETE
-- to an UPDATE user SET user_retired = 't'.
--

CREATE UNIQUE INDEX user_active_name
ON user(user_name)
WHERE (NOT user_retired)

-- Now, say your problem described in your post is with user activity
-- logging:

CREATE TABLE access_log (
  -- blah blah
  user_id   INTEGER REFERENCES user(user_id),
  access_ip cidr NOT NULL
);

CREATE INDEX access_log_ip ON access_log(access_ip);

CREATE INDEX access_user
ON access_log(user_id)
WHERE (user_id NOT NULL);



In other words: always store the IP address, and if the user involved 
was a registered user store a reference to their user ID as well. Store 
a reference to a globally unique user identity number rather than the 
user name, and permit reuse of user names without losing information 
about distinct username owners.


If you wanted you could use string user IDs and do away with the 
synthetic user_id key I've used above, but I suspect you'd regret it 
down the track.


If you wanted to look up activity that might be identifed by IP address 
or by username, a query like this would do the trick and would simulate 
the behaviour your forum software is used to, including the ability of a 
user to create a username that's an IP address to throw the whole thing 
into chaos:


SELECT * FROM access_user
WHERE COALESCE(user_id, to_char(access_ip)) = matchstring;

--
Craig Ringer

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general