Hi Alban

Many thanks for your answers.

You answered:
>> 1. Filter out all SQL commands which are *not* read-only (no DROP
> Most people do this using permissions.

Oh, yes: forgot to mention that; that's obvious. What I also looked
for was the PL/pgSQL's "EXECUTE command-string".

>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).

Ok; again keep in mind that I have a read-only database. Therefore the
statistics should be up-to-date (after a vacuum analyse). What I have
in mind is exposing the database to the 'public' for exercising and
testing in a way similar to the following (try a query like "SELECT
ST_AsText(ST_GeometryFromText('POINT(8.5 47.7)', 4326));"):
http://openlayers-buch.de/beispiele/chapter-09/postgis-terminal.html

See below my attempt to write such a function I called
"secure_execute(text)". It's still not functioning and I have
indicated two problems there. What do you think?
I like the idea letting abandon the query if it's obviously(!) wrong
or if the planner alerts me about very high costs?
Or should I rather abandon the idea of such a function and simply rely
on read-only privileges and a session statement_timeout?

Yours, S.

--
-- Executes a query. Aborts if it contains ";" or does take too long.
-- Returns: TABLE
--
DROP FUNCTION secure_execute(text);
--
CREATE OR REPLACE FUNCTION secure_execute(text)
RETURNS SETOF real  -- << PROBLEM 1: Want to return the resultset of
the query here as table (SETOF RECORD?) .
LANGUAGE 'plpgsql' STRICT
AS $$
DECLARE
  query text := $1;
  cost_estimate_txt text;
  max_cost_estimate integer;
  rec RECORD;
BEGIN
  -- Abort if ";" is in query
  -- tbd.

  -- Get max_cost_estimate:
  EXECUTE 'EXPLAIN' || ' ' || query INTO cost_estimate_txt;
  SET statement_timeout TO DEFAULT;
  -- txt example: 'Function Scan on generate_series id
(cost=0.00..12.50 rows=1000 width=0)'
  max_cost_estimate := round(CAST(substring(cost_estimate_txt,50,6) AS
numeric),0);

  -- Execute query (abort if too time consuming)!
  IF max_cost_estimate < 100 THEN  -- in units (production config.:
depends on machine)
    SET statement_timeout TO 10; -- in ms (production config.: set to
a minute = 60000ms)
    EXECUTE $1 INTO rec;
    SET statement_timeout TO DEFAULT;
  END IF;
  RETURN; -- << PROBLEM 2: want to return "rec" here.

  -- Error handling: Catch all
  EXCEPTION WHEN OTHERS THEN
    SET statement_timeout TO DEFAULT;
    RAISE NOTICE 'ERROR';
    RETURN;
END;
$$
-- Test (positive):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id');
-- Test (not ok):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100000)
AS id');  -- timeout
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;'SELECT * FROM generate_series(1, 100)); -- two commands
SELECT secure_execute('DROP TABLE IF EXISTS dummy');  -- malicious!
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS
id;DROP TABLE IF EXISTS dummy'); -- two commands, one malicious


2010/12/20 Alban Hertroys <dal...@solfertje.student.utwente.nl>:
> On 20 Dec 2010, at 10:05, Stefan Keller wrote:
>
>> I'd like to guard postgres from overcharged and/or malicious queries.
>>
>> The queries are strinctly read-only (from a SQL users perspective).
>> For doing this I'd like to code two functions (preferrably pl/pgsql):
>>
>> 1. Filter out all SQL commands which are *not* read-only (no
>> DROP/DELETE/UPDATE/TRUNCATE).
>
> Most people do this using permissions.
>
>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
>
>
> I don't think there's a way to do that directly, not without hacking the 
> source.
>
> What you can do is to have all users go through a SECURITY DEFINER type of 
> function that does this for them. That function can then read the output of 
> EXPLAIN <query> for its estimates. Those aren't exactly times, but cost 
> estimates. The actual time taken depends on your hardware, you would need to 
> do some measurements to see how planned costs and actual time relate.
>
> I'm not sure this is a good idea though.
> Firstly, testing the query plan adds a little overhead to every query coming 
> in. It's not a lot, but if someone fires a lot of small fast queries it could 
> become a problem. You would be hurting the people who're using your database 
> correctly, instead of the people who're "abusing" it.
>
> Secondly, you could achieve a similar effect by limiting the amount of time a 
> query is allowed to run. I'm pretty sure there are configuration options that 
> cause long-running queries to get killed after a set time.
>
> Thirdly... Reliable estimates??? Lol!
> Seriously, as hard as Postgres tries to keep statistics that make sense, I 
> don't think they can ever be considered entirely reliable. You may not be 
> vacuuming frequently enough, your statistics target may be too small or your 
> data might not be suitable for statistical analysis (random numbers and very 
> unbalanced distributions are good examples).
> Therefore, if you plan to rely on the estimated time a query takes, you're 
> going to be wrong sometimes.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>
>
> !DSPAM:1205,4d0f4177802651300117526!
>
>
>

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to