Hi Alban
Many thanks for your answers.
You answered:
>> 1. Filter out all SQL commands which are *not* read-only (no DROP
> Most people do this using permissions.
Oh, yes: I forgot to mention that; that part is obvious. What I was
also looking at is PL/pgSQL's "EXECUTE command-string".
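For completeness, this is roughly the permissions setup I mean (role
and database names are examples only; the "ON ALL TABLES" form needs
PostgreSQL 9.0):
--
CREATE ROLE webuser LOGIN PASSWORD 'secret'; -- example role
REVOKE ALL ON DATABASE mydb FROM PUBLIC;
GRANT CONNECT ON DATABASE mydb TO webuser;
GRANT USAGE ON SCHEMA public TO webuser;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO webuser; -- 9.0+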
>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
Ok; again, keep in mind that I have a read-only database, so the
statistics should be up to date (after a VACUUM ANALYZE). What I have
in mind is exposing the database to the public for exercising and
testing, in a way similar to the following (try a query like "SELECT
ST_AsText(ST_GeometryFromText('POINT(8.5 47.7)', 4326));"):
http://openlayers-buch.de/beispiele/chapter-09/postgis-terminal.html
See below for my attempt at such a function, which I called
"secure_execute(text)". It's still rough; I have marked the two
problem spots I originally hit and my tentative fixes (SETOF RECORD
plus RETURN QUERY EXECUTE). What do you think?
I like the idea of abandoning the query if it's obviously(!) wrong or
if the planner alerts me to very high costs.
Or should I rather abandon the idea of such a function and simply rely
on read-only privileges and a session statement_timeout?
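By "session statement_timeout" I mean roughly this (role name again
just an example):
--
SET statement_timeout TO '60s'; -- current session only
ALTER ROLE webuser SET statement_timeout = '60s'; -- per-role default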
Yours, S.
--
-- Executes a query. Aborts if it contains ";" or takes too long.
-- Returns: SETOF RECORD (the caller supplies a column definition list)
--
DROP FUNCTION IF EXISTS secure_execute(text);
--
CREATE OR REPLACE FUNCTION secure_execute(query text)
RETURNS SETOF RECORD -- << PROBLEM 1 solved: SETOF RECORD together with
                     -- RETURN QUERY EXECUTE below (needs 8.4+).
LANGUAGE plpgsql STRICT
AS $$
DECLARE
  cost_estimate_txt text;
  max_cost_estimate numeric;
BEGIN
  -- Abort if ";" is in the query (crude guard against chained commands):
  IF position(';' IN query) > 0 THEN
    RAISE EXCEPTION 'query must not contain ";"';
  END IF;
  -- Get max_cost_estimate from the first line of the plan, e.g.
  -- 'Function Scan on generate_series id  (cost=0.00..12.50 rows=1000 width=0)':
  EXECUTE 'EXPLAIN ' || query INTO cost_estimate_txt;
  -- Parse the upper cost bound with a regexp instead of fixed offsets:
  max_cost_estimate := substring(cost_estimate_txt
    FROM 'cost=[0-9.]+[.][.]([0-9.]+)')::numeric;
  -- Execute query (abort if too time consuming)!
  IF max_cost_estimate < 100 THEN -- in units (production config.: depends on machine)
    -- Caveat: SET statement_timeout from inside an already-running
    -- statement may not interrupt it; a session or per-role timeout
    -- is more reliable.
    SET statement_timeout TO 10; -- in ms (production config.: a minute = 60000 ms)
    RETURN QUERY EXECUTE query; -- << PROBLEM 2 solved: returns the result set
    SET statement_timeout TO DEFAULT;
  END IF;
  RETURN;
-- Error handling: Catch all
EXCEPTION WHEN OTHERS THEN
  SET statement_timeout TO DEFAULT;
  RAISE NOTICE 'ERROR: %', SQLERRM;
  RETURN;
END;
$$;
-- Test (positive):
SELECT * FROM secure_execute('SELECT random() FROM generate_series(1, 100) AS id')
  AS t(r double precision);
-- Tests (not ok):
SELECT * FROM secure_execute('SELECT random() FROM generate_series(1, 100000) AS id')
  AS t(r double precision); -- timeout
SELECT * FROM secure_execute('SELECT random() FROM generate_series(1, 100) AS id;SELECT * FROM generate_series(1, 100)')
  AS t(r double precision); -- two commands
SELECT * FROM secure_execute('DROP TABLE IF EXISTS dummy') AS t(r text); -- malicious!
SELECT * FROM secure_execute('SELECT random() FROM generate_series(1, 100) AS id;DROP TABLE IF EXISTS dummy')
  AS t(r double precision); -- two commands, one malicious
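And a quick stand-alone check of the cost parsing (the regexp is my
attempt at something more robust than fixed substring offsets):
SELECT substring(
  'Function Scan on generate_series id  (cost=0.00..12.50 rows=1000 width=0)'
  FROM 'cost=[0-9.]+[.][.]([0-9.]+)')::numeric AS max_cost; -- 12.50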
2010/12/20 Alban Hertroys <[email protected]>:
> On 20 Dec 2010, at 10:05, Stefan Keller wrote:
>
>> I'd like to guard postgres from overcharged and/or malicious queries.
>>
>> The queries are strictly read-only (from a SQL user's perspective).
>> For doing this I'd like to code two functions (preferably PL/pgSQL):
>>
>> 1. Filter out all SQL commands which are *not* read-only (no
>> DROP/DELETE/UPDATE/TRUNCATE).
>
> Most people do this using permissions.
>
>> 2. Get the estimated time (units) from PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
>
>
> I don't think there's a way to do that directly, not without hacking the
> source.
>
> What you can do is to have all users go through a SECURITY DEFINER type of
> function that does this for them. That function can then read the output of
> EXPLAIN <query> for its estimates. Those aren't exactly times, but cost
> estimates. The actual time taken depends on your hardware, you would need to
> do some measurements to see how planned costs and actual time relate.
>
> I'm not sure this is a good idea though.
> Firstly, testing the query plan adds a little overhead to every query coming
> in. It's not a lot, but if someone fires a lot of small fast queries it could
> become a problem. You would be hurting the people who're using your database
> correctly, instead of the people who're "abusing" it.
>
> Secondly, you could achieve a similar effect by limiting the amount of time a
> query is allowed to run. I'm pretty sure there are configuration options that
> cause long-running queries to get killed after a set time.
>
> Thirdly... Reliable estimates??? Lol!
> Seriously, as hard as Postgres tries to keep statistics that make sense, I
> don't think they can ever be considered entirely reliable. You may not be
> vacuuming frequently enough, your statistics target may be too small or your
> data might not be suitable for statistical analysis (random numbers and very
> unbalanced distributions are good examples).
> Therefore, if you plan to rely on the estimated time a query takes, you're
> going to be wrong sometimes.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.
>