[SQL] Question on COUNT performance

2010-07-14 Thread Anders Østergaard Jensen
Hello mailing list,

I have a performance problem with my postgres 8.4.4 database. The query is
the following:

SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
plan_events.id))

   QUERY PLAN


 Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual
time=4641.720..4641.720 rows=1 loops=1)
   ->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0)
(actual time=32.821..4640.116 rows=2669 loops=1)
 Filter: f_plan_event_acl(17, id)
 Total runtime: 4641.753 ms
(4 rows)



What can I do to improve the performance? The table has around 3000+ rows,
so the data size is really limited.

The table has the following definition:


-

metabase=# \d plan_events;
   Table "public.plan_events"
  Column  |   Type   |
Modifiers
--+--+---
 id   | integer  | not null default
nextval(('"plan_event_id_seq"'::text)::regclass)
 description  | text | not null
 status   | text | not null
 pct_completed| integer  | default 0
 due  | timestamp with time zone | not null
 due_to   | timestamp with time zone | not null
 priority | integer  | not null default 1
 created  | timestamp with time zone | not null
 user_id_created  | integer  | not null
 plan_type_id | integer  | not null
 finished | boolean  | not null default false
 duration | double precision | not null default 0.0
 search_idx   | tsvector |
 org_id   | integer  | not null default 1
 personal_user_id | integer  |
 place| text |
 contact_log_id   | integer  |
Indexes:
"plan_events_pkey" PRIMARY KEY, btree (id)
"plan_event_contact_log_idx" btree (contact_log_id)
"plan_event_search_idx" gin (search_idx)
"plan_events_created_idx" btree (created)
"plan_events_due_idx" btree (due)
"plan_events_org_idx" btree (org_id)
"plan_events_personal_user_idx" btree (personal_user_id)
"plan_events_plan_type_id_idx" btree (plan_type_id)
"plan_events_user_id_created_idx" btree (user_id_created)
Foreign-key constraints:
"plan_events_contact_log_id_fkey" FOREIGN KEY (contact_log_id)
REFERENCES contact_logs(id)
"plan_events_org_id_fkey" FOREIGN KEY (org_id) REFERENCES orgs(id)
"plan_events_personal_user_id_fkey" FOREIGN KEY (personal_user_id)
REFERENCES users(id)
"plan_events_plan_type_id_fkey" FOREIGN KEY (plan_type_id) REFERENCES
plan_types(id)
"plan_events_user_id_created_fkey" FOREIGN KEY (user_id_created)
REFERENCES users(id)
Referenced by:
TABLE "contact_plan_events" CONSTRAINT
"contact_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id)
REFERENCES plan_events(id)
TABLE "custom_values" CONSTRAINT "custom_values_plan_event_id_fkey"
FOREIGN KEY (plan_event_id) REFERENCES plan_events(id)
TABLE "customer_plan_events" CONSTRAINT
"customer_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id)
REFERENCES plan_events(id)
TABLE "generic_comments" CONSTRAINT
"generic_comments_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES
plan_events(id)
TABLE "mail_queue_items" CONSTRAINT
"mail_queue_items_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES
plan_events(id)
TABLE "plan_event_notifications" CONSTRAINT
"plan_event_notifications_plan_event_id_fkey" FOREIGN KEY (plan_event_id)
REFERENCES plan_events(id)
TABLE "project_plan_events" CONSTRAINT
"project_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id)
REFERENCES plan_events(id)
TABLE "to_do_list_events" CONSTRAINT
"to_do_list_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id)
REFERENCES plan_events(id)
TABLE "user_plan_events" CONSTRAINT
"user_plan_events_plan_event_id_fkey" FOREIGN KEY (plan_event_id) REFERENCES
plan_events(id)
Triggers:
plan_events_update BEFORE INSERT OR UPDATE ON plan_events FOR EACH ROW
EXECUTE PROCEDURE tsvector_update_trigger('search_idx', 'pg_catalog.danish',
'description', 'status')
--

And the f_plan_event function has the following definition:

metabase=# \df+ f_plan_event_acl

 List of
functions
 Schema |   Name   | Result data type | Argument data types
 |  Type  | Volatility |  Owner   | Language |
  Source code
  |
Description
+--+--+---

Re: [SQL] Question on COUNT performance

2010-07-14 Thread Reinoud van Leeuwen
On Wed, Jul 14, 2010 at 09:58:10PM +1000, Anders ??stergaard Jensen wrote:
> Hello mailing list,
> 
> I have a performance problem with my postgres 8.4.4 database. The query is
> the following:
> 
> SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
> plan_events.id))
> 
>QUERY PLAN
> 
> 
>  Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual
> time=4641.720..4641.720 rows=1 loops=1)
>->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0)
> (actual time=32.821..4640.116 rows=2669 loops=1)
>  Filter: f_plan_event_acl(17, id)
>  Total runtime: 4641.753 ms
> (4 rows)
> 
> 
> 
> What can I do to improve the performance? 

Have you tried 'select count (1)..."?

Reinoud


-- 
__
"Nothing is as subjective as reality"
Reinoud van Leeuwenreinou...@n.leeuwen.net
http://reinoud.van.leeuwen.net kvk 27320762
__

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] subtract two dates to get the number of days

2010-07-14 Thread Jean-David Beyer
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Campbell, Lance wrote:
> I want to subtract to dates to know the number of days different.
> 
> 
> 
> Example:
> 
> 01/02/2010 - 01/01/2010 = 1 day
> 
> 08/01/2010 - 07/31/2010 = 1 day
> 
> 
> 
> How do I do this?
> 
Others have posted SQL answers to this. Which is fine if you need to do
it in SQL.

My dates are of the form -mm-dd and such. And I want to do things
like adding or subtracting days, months, or years to it or from it. Also
the logical comparisons.

Years ago, I made a C++ data type that allowed a date datatype where I
could add, subtract, and so on.

I use it in programs that do not necessarily use a database, but also in
programs that do when the computations are the big part of the cpu load,
as contrasted to just "gentle" massaging of existing data.


- --
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jerseyhttp://counter.li.org
 ^^-^^ 08:20:01 up 6 days, 17:06, 4 users, load average: 4.77, 4.78, 4.87
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with CentOS - http://enigmail.mozdev.org/

iD8DBQFMPa9yPtu2XpovyZoRAnuDAJ9U9yghDl8NkGNv1pWSxIwXsDBTXwCgiv1L
INK1dzbUQnWBjhXXrQu6ZsM=
=lyR9
-END PGP SIGNATURE-

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 02:30:29PM +0200, Reinoud van Leeuwen wrote:
> On Wed, Jul 14, 2010 at 09:58:10PM +1000, Anders ??stergaard Jensen wrote:
> > SELECT count(*) AS count_all FROM "plan_events" WHERE (f_plan_event_acl(17,
> > plan_events.id))
> > 
> >QUERY PLAN
> > 
> > 
> >  Aggregate  (cost=2859.77..2859.78 rows=1 width=0) (actual
> > time=4641.720..4641.720 rows=1 loops=1)
> >->  Seq Scan on plan_events  (cost=0.00..2851.44 rows=3331 width=0)
> > (actual time=32.821..4640.116 rows=2669 loops=1)
> >  Filter: f_plan_event_acl(17, id)
> >  Total runtime: 4641.753 ms
> > (4 rows)
> > 
> > 
> > 
> > What can I do to improve the performance? 
> 
> Have you tried 'select count (1)..."?

If this helps at all, it's unlikely to help much. I remember having seen
discussion somewhere that there's an optimization such that count(*) and
count(1) do the same thing anyway, but I can't find it in the code
immediately. In any case, if your WHERE clause frequently includes this
function with 17 and id as arguments, and if f_plan_event_acl is immutable,
you can create an index:

CREATE INDEX foo_ix ON plan_events (f_plan_event_acl(17, id));

If PostgreSQL thinks that function will be true for a sufficiently small
proportion of the rows in the table, it will use the index instead of a
sequential scan, which might end up being faster. But the index won't help you
when you want to check values other than 17, e.g.

SELECT count(*) FROM plan_events WHERE f_plan_event_acl(42, id)

Another option might be to precalculate these data, if you have a relatively
small set of values you pass to f_plan_event_acl(). A table somewhere could
store the f_plan_events_acl() argument (17, as well as any other values you
want to precalculate), and a count of plan_events rows where
f_plan_events_acl() returns true with that argument. A set of triggers would
ensure that whenever someone INSERTs, UPDATEs, or DELETEs a plan_events row,
these counts are updated accordingly. Then you can refer to that table when
you need a count. If values in other tables can change the results of
f_plan_events_acl(), you'd need triggers there, too (and this method would
probably start to become unworkably complicated).

As an alternative to the precalculation option, you could also cache the
results of this query somewhere, and presumably invalidate that cache using a
trigger on the plan_events table.

Finally, you can try to improve performance of your function itself. If it's
taking 4.6 sec. to read and process 2669 rows, either you're reading awfully
slowly (VACUUM problems, perhaps?) or your f_plan_events_acl() function takes
a long time to run.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Reinoud van Leeuwen
On Wed, Jul 14, 2010 at 07:30:39AM -0600, Joshua Tolley wrote:

> > Have you tried 'select count (1)..."?
> 
> If this helps at all, it's unlikely to help much. I remember having seen
> discussion somewhere that there's an optimization such that count(*) and
> count(1) do the same thing anyway, but I can't find it in the code
> immediately. 

oops, I was thinking too much about Sybase (where is makes a huge 
difference, since the optimiser sees that everything it needs can be found 
in the index so the table does not have to be read at all).

Reinoud

-- 
__
"Nothing is as subjective as reality"
Reinoud van Leeuwenreinou...@n.leeuwen.net
http://reinoud.van.leeuwen.net kvk 27320762
__

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] subtract two dates to get the number of days

2010-07-14 Thread Thomas Kellerer

Jean-David Beyer wrote on 14.07.2010 14:37:

My dates are of the form -mm-dd and such.

Storing a date as a string is never a good idea.


And I want to do things like adding or subtracting days, months, or years to it 
or from it.
Also the logical comparisons.

Which is all a piece of cake when you use the proper datatype
 

Years ago, I made a C++ data type that allowed a date datatype where I
could add, subtract, and so on.
I use it in programs that do not necessarily use a database,

To be honest: I expect the programming language to support those things.


but also in programs that do when the computations are the big part of the cpu 
load,
as contrasted to just "gentle" massaging of existing data.

I would expect doing "date maths" with strings is wasting more CPU than using a 
native date datatype.

Just my €0.02
Thomas


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Lee Hachadoorian
The first statement of the function

> : select into user *
> from users where id = uid;

appears to be a useless drag, as I don't see the user record referred
to anywhere else in the function. There appears to be other
unnecessary statements. For example :

> : select into pcount_rel
> COUNT(*) as acl_count FROM project_plan_events ppe WHERE ppe.plan_event_id =
> plan_id;
>
> : if
> (pcount_rel.acl_count > 0) then
>
>
>
> : SELECT INTO
> project * FROM projects WHERE id IN (SELECT project_id FROM
> project_plan_events ppe2 WHERE ppe2.plan_event_id = plan_id) OFFSET 0 LIMIT
> 1;
>
> : return
> f_project_acl(uid, project.id);
>
>
> : end if;
>

This appears to check whether the plan_id exists in a link table, find
an associated project_id, and run some function on project_id.

This could instead be done as:

FOR project_record IN SELECT project_id FROM project_plan_events WHERE
plan_event_id = plan_id LIMIT 1 LOOP
return f_project_acl(uid, project_record.project_id)
END LOOP;

If plan_id doesn't exist in project_plan_events, the LOOP just gets skipped.

The same could be done for the next IF block in they query which
checks to see whether plan_id has a matching customer_id in a link
table.

Note that your LIMIT 1 (which I have retained) strongly implies a
1-to-1 relationship between project_id and plan_id. If not, this
function gets applied to an arbitrary project_id from among all
matching project_ids. (Same goes for customer_id.)

Assuming f_project_acl and f_customer_acl return TRUE if successful,
the whole thing (from the original SELECT COUNT(*) looks like it can
be summarized as:

Call a function with a plan_id
If a matching project_id exists
Do some function on the project_id
count +1
Else If a matching customer_id exists
Do some function on the customer_id
count +1
Else
count +1

Return count, which, since the function gets called once for each row
in plan_events, count should always equal the number of rows in plan
events.

I would be inclined to replace the whole thing with something like this:

SELECT newfunc(uid);

CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
DECLARE
plan_record record;
i int := 0;
BEGIN
FOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
JOIN customer_plan_events USING (plan_id)  LOOP
IF plan_record.project_id IS NOT NULL THEN
PERFORM f_project_acl(uid, plan_record.project_id);
ELSEIF plan_record.customer_id IS NOT NULL THEN
PERFORM f_customer_acl(uid, plan_record.customer_id);
END IF;
i := i + 1; 
END LOOP ;
RETURN i;
END;
$$ LANGUAGE plpgsql;

If I understand what's going on in your function, I *think* this would
reduce 9000-12,000 SELECT statements to 1 SELECT statement.

Obviously, not tested. Hope this is helpful.

--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Lee Hachadoorian
In retrospect, it's a big assumption whether f_project_acl() or
f_customer_acl() always return TRUE. If they can return FALSE, you
probably want to replace the statements inside the FOR..LOOP with


>IF plan_record.project_id IS NOT NULL THEN
>IF f_project_acl(uid, plan_record.project_id) THEN i 
> := i + 1; END IF;
>ELSEIF plan_record.customer_id IS NOT NULL THEN
>IF f_customer_acl(uid, plan_record.customer_id) THEN i 
> := i + 1; END IF;
>ELSE
>i := i + 1;
>END IF;

This would mimic the results of your original query, although I must
confess I don't understand the usefulness of the count results, as a
number less that the number of rows in plan_events has an ambiguous
meaning. Either

(1) there is a matching event but f_project_acl returned FALSE

OR

(2) there is no matching event, there IS a matching customer, but
f_customer_acl returned FALSE

And of course you don't know which plan_ids these might be true of.

--Lee

2010/7/14 Lee Hachadoorian :
> SELECT newfunc(uid);
>
> CREATE FUNCTION newfunc(uid int) RETURNS int AS $$
> DECLARE
>        plan_record record;
>        i int := 0;
> BEGIN
>        FOR plan_record IN SELECT DISTINCT plan_id, project_id, customer_id
> FROM plan_events LEFT JOIN project_plan_events USING (plan_id) LEFT
> JOIN customer_plan_events USING (plan_id)  LOOP
>                IF plan_record.project_id IS NOT NULL THEN
>                        PERFORM f_project_acl(uid, plan_record.project_id);
>                ELSEIF plan_record.customer_id IS NOT NULL THEN
>                        PERFORM f_customer_acl(uid, plan_record.customer_id);
>                END IF;
>                i := i + 1;
>        END LOOP ;
>        RETURN i;
> END;
> $$ LANGUAGE plpgsql;

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] subtract two dates to get the number of days

2010-07-14 Thread Jean-David Beyer
Thomas Kellerer wrote:
> Jean-David Beyer wrote on 14.07.2010 14:37:
>> My dates are of the form -mm-dd and such.
> Storing a date as a string is never a good idea.

I started this long ago, when postgreSQL did not really work very well
(1998?). One version of it would not do views, and another had trouble
with primary keys, IIRC. So I first used Informix, until it would not
work any more. It did not really support embedded SQL in C++, only in C,
so that was a pain. But it quit working when Red Hat updated from
release 5 to release 6.

I then moved to IBM's DB2, and that worked very well, but it got too
expensive to keep it when I went from one release of my OS to another
for just my own workstation use. Somewhere around 2004, or a little
before, I decided to give postgreSQL another chance, and it works just fine.


I just looked them up in my data definitions. Dates are _stored_ as type

DATE NOT NULL

so I store them OK. It is just when I want to compute with them that it
gets a bit tricky. Or it did way back when I wrote that stuff in the
late 1990s.
> 
>> And I want to do things like adding or subtracting days, months, or years to 
>> it or from it.
>> Also the logical comparisons.
> Which is all a piece of cake when you use the proper datatype

Yes, if the data happen to be stored at all. But when a program
generates the dates dynamically and wants to produce queries from them,
it is easier to use the C++ class to generate the dates.
>   
>> Years ago, I made a C++ data type that allowed a date datatype where I
>> could add, subtract, and so on.
>> I use it in programs that do not necessarily use a database,

> To be honest: I expect the programming language to support those things.

I would love it. For all I know, the C++ Standard Library supports it
now, but I do not believe it did when I wrote that class.
> 
>> but also in programs that do when the computations are the big part of the 
>> cpu load,
>> as contrasted to just "gentle" massaging of existing data.
> I would expect doing "date maths" with strings is wasting more CPU than using 
> a native date datatype.

My class pretty much does not do it as strings, but as integers (internally)
> 
> Just my €0.02
> Thomas
> 
Well, €0.02 is still more than my US$0.02, I believe.


-- 
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jerseyhttp://counter.li.org
 ^^-^^ 12:45:01 up 6 days, 21:31, 4 users, load average: 4.65, 4.69, 4.71

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Anders Østergaard Jensen
Hi all,

Thank you so much for your kind replies. It has all been a great help.

I tried the SELECT COUNT(1) but that didn't yield any improvement, sorry.

Doing the index on f_plan_event_acl( ... ) wont work, as the parameters are
frequently shifted (the second parameter denotes the id of a user in another
table).

As Mr. Leeuwen rightfully points out, there might be some performance
problems in my acl functions (these are basic functions that determine
wether or not a user has got access to a certain row in a table or not---fx
f_customer_acl(customer_id, user_id) will return true if the user has access
to the customer with ID customer_id etc, the same for f_project_acl on
projects etc).. I  am not great at optimising PL/pgSQL, though I have the
assumption that the speed of the procedural language might have a great
impact here.

Before I start changing the content of the function that Mr. Leeuwen kindly
provided above, can I pleas ask for help on how to optimise the other acl
functions first?

CREATE OR REPLACE FUNCTION f_contact_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user record;
  contact record;
  customer record;
begin
  SELECT INTO customer cust.* FROM contacts
JOIN customer_contacts cc ON cc.contact_id = contacts.id
JOIN customers cust ON cust.id = cc.customer_id
WHERE contacts.id = cid;

SELECT INTO user * FROM users WHERE id=uid;

if (customer.org_id != user.org_id) then
  return false;
end if;

return true;
end
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_customer_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user_id integer;
  customer_id integer;
  user record;
  customer record;
begin
  user_id = $1;
  customer_id = $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO customer * FROM customers WHERE id=customer_id;

  -- Assert that org_id matches:
  if (customer.org_id != user.org_id) then
return false;
  end if;

  -- Nothing more to check for:
  return true;
end;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_doc_acl(uid integer, did integer)
RETURNS BOOL AS $$
DECLARE
  user_id   integer;
  doc_idinteger;
  user  record;
  doc   record;
  proj_rel  record;
BEGIN
  user_id := $1;
  doc_id  := $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO doc  * FROM documents WHERE id=doc_id;

  -- Check that org_id matches
  if (doc.org_id != user.org_id) then
return false;
  end if;

  -- If document was created by user, accept it
  if (doc.user_id_created = user_id) then
return true;
  end if;  -- if document is public, accept it
  if (doc.is_public) then
return true;
  end if;

  -- else, check the project-document relations -- is the
  -- user member of a project that allows access to the document?
  SELECT INTO proj_rel COUNT(*) AS acl_count FROM project_users
JOIN projects ON project_users.project_id = projects.id
JOIN project_documents ON projects.id = project_documents.project_id
JOIN documents ON project_documents.document_id = documents.id
WHERE documents.id = doc_id
  AND project_users.user_id = $1;

  -- acl_count returns the number of allowed relationships to exactly
  -- this document
  return proj_rel.acl_count > 0;
END;
$$ LANGUAGE 'plpgsql';

Would it be more beneficial to drop the functions and rewrite my basic
queries first? However, it is a nice feature having all security checks
wrapped into a three-four basic functions.
If my design is completely flawed, I am also open to other design
suggestions on how to do proper row-based access control.

I am not asking for the complete solution but a few pointers on how to speed
this up would be really great. Thanks!


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Anders Østergaard Jensen
Hi all,

Thank you so much for your kind replies. It has all been a great help.

I tried the SELECT COUNT(1) but that didn't yield any improvement, sorry.

Doing the index on f_plan_event_acl( ... ) wont work, as the parameters are
frequently shifted (the second parameter denotes the id of a user in another
table).

As Mr. Leeuwen rightfully points out, there might be some performance
problems in my acl functions (these are basic functions that determine
wether or not a user has got access to a certain row in a table or not---fx
f_customer_acl(customer_id, user_id) will return true if the user has access
to the customer with ID customer_id etc, the same for f_project_acl on
projects etc).. I  am not great at optimising PL/pgSQL, though I have the
assumption that the speed of the procedural language might have a great
impact here.

Before I start changing the content of the function that Mr. Leeuwen kindly
provided above, can I pleas ask for help on how to optimise the other acl
functions first?

CREATE OR REPLACE FUNCTION f_contact_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user record;
  contact record;
  customer record;
begin
  SELECT INTO customer cust.* FROM contacts
JOIN customer_contacts cc ON cc.contact_id = contacts.id
JOIN customers cust ON cust.id = cc.customer_id
WHERE contacts.id = cid;

SELECT INTO user * FROM users WHERE id=uid;

if (customer.org_id != user.org_id) then
  return false;
end if;

return true;
end
$$ LANGUAGE 'plpgsql';
- Hide quoted text -

CREATE OR REPLACE FUNCTION f_customer_acl(uid integer, cid integer)
RETURNS BOOL AS $$
declare
  user_id integer;
  customer_id integer;
  user record;
  customer record;
begin
  user_id = $1;
  customer_id = $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO customer * FROM customers WHERE id=customer_id;

  -- Assert that org_id matches:
  if (customer.org_id != user.org_id) then
return false;
  end if;

  -- Nothing more to check for:
  return true;
end;
$$ LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION f_doc_acl(uid integer, did integer)
RETURNS BOOL AS $$
DECLARE
  user_id   integer;
  doc_idinteger;
  user  record;
  doc   record;
  proj_rel  record;
BEGIN
  user_id := $1;
  doc_id  := $2;
  SELECT INTO user * FROM users WHERE id=user_id;
  SELECT INTO doc  * FROM documents WHERE id=doc_id;

  -- Check that org_id matches
  if (doc.org_id != user.org_id) then
return false;
  end if;

  -- If document was created by user, accept it
  if (doc.user_id_created = user_id) then
return true;
  end if;  -- if document is public, accept it
  if (doc.is_public) then
return true;
  end if;

  -- else, check the project-document relations -- is the
  -- user member of a project that allows access to the document?
  SELECT INTO proj_rel COUNT(*) AS acl_count FROM project_users
JOIN projects ON project_users.project_id = projects.id
JOIN project_documents ON projects.id = project_documents.project_id
JOIN documents ON project_documents.document_id = documents.id
WHERE documents.id = doc_id
  AND project_users.user_id = $1;

  -- acl_count returns the number of allowed relationships to exactly
  -- this document
  return proj_rel.acl_count > 0;
END;
$$ LANGUAGE 'plpgsql';

Would it be more beneficial to drop the functions and rewrite my basic
queries first? However, it is a nice feature having all security checks
wrapped into a three-four basic functions.
If my design is completely flawed, I am also open to other design
suggestions on how to do proper row-based access control.

I am not asking for the complete solution but a few pointers on how to speed
this up would be really great. Thanks!


Re: [SQL] Question on COUNT performance

2010-07-14 Thread Lee Hachadoorian




It appears that the acl functions use more SELECTs than necessary. For
f_customer_acl(uid
integer, cid integer), I might use:

PERFORM 1 FROM customers JOIN users USING (org_id) WHERE customer_id =
cid and user_id = uid;
RETURN FOUND;

This still requires one call to f_customer_acl() (and therefore one
SELECT) for each row in plan_events (since that's the way the calling
function is written). If the goal is to count the number of plan_events
a specific user has access rights to, I'm sure you can write a query
that would accomplish that directly. It's beyond my knowledge whether
it makes more sense to do this via these function calls.

--Lee


-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center