Hi Alban,

Many thanks for your answers.
You answered:

>> 1. Filter out all SQL commands which are *not* read-only (no DROP)
> Most people do this using permissions.

Oh yes, I forgot to mention that; that's obvious. What I was also looking for was PL/pgSQL's "EXECUTE command-string".

>> 2. Get the estimated time (units) from the PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).

OK; again, keep in mind that I have a read-only database. Therefore the statistics should be up to date (after a VACUUM ANALYZE).

What I have in mind is exposing the database to the 'public' for exercising and testing, in a way similar to the following (try a query like "SELECT ST_AsText(ST_GeometryFromText('POINT(8.5 47.7)', 4326));"):
http://openlayers-buch.de/beispiele/chapter-09/postgis-terminal.html

See below for my attempt at such a function, which I called "secure_execute(text)". It is not working yet, and I have marked two problems in it. What do you think? I like the idea of abandoning the query if it is obviously(!) wrong or if the planner alerts me to very high costs. Or should I rather abandon the idea of such a function and simply rely on read-only privileges and a session statement_timeout?

Yours, S.

--
-- Executes a query. Aborts if it contains ";" or takes too long.
-- Returns: TABLE
--
DROP FUNCTION secure_execute(text);

CREATE OR REPLACE FUNCTION secure_execute(text)
RETURNS SETOF real  -- << PROBLEM 1: want to return the result set of the query here as a table (SETOF RECORD?).
LANGUAGE plpgsql STRICT AS
$$
DECLARE
    query text := $1;
    cost_estimate_txt text;
    max_cost_estimate integer;
    rec RECORD;
BEGIN
    -- Abort if ";" is in the query:
    -- tbd.

    -- Get max_cost_estimate:
    EXECUTE 'EXPLAIN ' || query INTO cost_estimate_txt;
    SET statement_timeout TO DEFAULT;
    -- txt example: 'Function Scan on generate_series id  (cost=0.00..12.50 rows=1000 width=0)'
    max_cost_estimate := round(CAST(substring(cost_estimate_txt, 50, 6) AS numeric), 0);

    -- Execute the query (abort if too time-consuming)!
    IF max_cost_estimate < 100 THEN   -- in units (production config.: depends on machine)
        SET statement_timeout TO 10;  -- in ms (production config.: set to a minute = 60000 ms)
        EXECUTE $1 INTO rec;
        SET statement_timeout TO DEFAULT;
    END IF;
    RETURN;  -- << PROBLEM 2: want to return "rec" here.

-- Error handling: catch all
EXCEPTION WHEN OTHERS THEN
    SET statement_timeout TO DEFAULT;
    RAISE NOTICE 'ERROR';
    RETURN;
END;
$$;

-- Test (positive):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id');

-- Tests (not OK):
SELECT secure_execute('SELECT random() FROM generate_series(1, 100000) AS id');  -- timeout
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id;SELECT * FROM generate_series(1, 100)');  -- two commands
SELECT secure_execute('DROP TABLE IF EXISTS dummy');  -- malicious!
SELECT secure_execute('SELECT random() FROM generate_series(1, 100) AS id;DROP TABLE IF EXISTS dummy');  -- two commands, one malicious
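P.S. For the two marked problems, the direction I am currently considering looks roughly like the sketch below. It is untested: the function name is made up, it relies on RETURN QUERY EXECUTE (PostgreSQL 8.4 or later), and I am not sure whether SET statement_timeout inside the function affects the statement that is already running:

-- Untested sketch; "secure_execute_sketch" is a made-up name.
-- Returning SETOF RECORD means the caller must supply a column
-- definition list (see the example call at the end).
CREATE OR REPLACE FUNCTION secure_execute_sketch(query text)
RETURNS SETOF RECORD
LANGUAGE plpgsql STRICT AS
$func$
DECLARE
    plan_line text;
    max_cost  numeric;
BEGIN
    -- Crude guard against multiple statements; note it also rejects
    -- a harmless ";" inside a string literal.
    IF position(';' IN query) > 0 THEN
        RAISE EXCEPTION 'multiple statements are not allowed';
    END IF;

    -- First EXPLAIN line looks like:
    -- 'Function Scan on generate_series id  (cost=0.00..12.50 rows=1000 width=0)'
    -- Pull out the total (upper) cost instead of relying on a fixed offset.
    EXECUTE 'EXPLAIN ' || query INTO plan_line;
    max_cost := substring(plan_line FROM 'cost=[0-9.]+[.][.]([0-9.]+)')::numeric;

    IF max_cost IS NULL OR max_cost >= 100 THEN
        RAISE EXCEPTION 'estimated cost % exceeds the limit', max_cost;
    END IF;

    -- PROBLEM 2: RETURN QUERY EXECUTE (8.4+) hands the rows back.
    -- Unsure whether SET statement_timeout here affects the statement
    -- already in progress; a per-session setting may be needed instead.
    SET statement_timeout TO 10;
    RETURN QUERY EXECUTE query;
    SET statement_timeout TO DEFAULT;
EXCEPTION WHEN OTHERS THEN
    SET statement_timeout TO DEFAULT;
    RAISE;
END;
$func$;

-- PROBLEM 1: with SETOF RECORD the caller names the result columns:
SELECT * FROM secure_execute_sketch('SELECT random() FROM generate_series(1, 100) AS id')
    AS t(r double precision);

Returning SETOF RECORD avoids hard-coding the result type in the function, at the price of forcing callers to spell out the columns on every call.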
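And the alternative without any wrapper function, plain read-only privileges plus a timeout, would be roughly this (again only a sketch; the role name "web_readonly" is made up, and GRANT ... ON ALL TABLES requires 9.0):

-- Sketch of the permissions-only alternative (names are examples):
CREATE ROLE web_readonly LOGIN PASSWORD 'secret';
GRANT USAGE ON SCHEMA public TO web_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO web_readonly;  -- 9.0+

-- Kill long-running statements for this role:
ALTER ROLE web_readonly SET statement_timeout = '60s';

-- ... or per session:
SET statement_timeout TO 60000;  -- in ms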
2010/12/20 Alban Hertroys <dal...@solfertje.student.utwente.nl>:
> On 20 Dec 2010, at 10:05, Stefan Keller wrote:
>
>> I'd like to guard postgres from overcharged and/or malicious queries.
>>
>> The queries are strictly read-only (from a SQL user's perspective).
>> For doing this I'd like to code two functions (preferably pl/pgsql):
>>
>> 1. Filter out all SQL commands which are *not* read-only (no
>> DROP/DELETE/UPDATE/TRUNCATE).
>
> Most people do this using permissions.
>
>> 2. Get the estimated time (units) from the PostgreSQL planner in a
>> reliable way (if possible standard/ANSI).
>
> I don't think there's a way to do that directly, not without hacking the
> source.
>
> What you can do is have all users go through a SECURITY DEFINER type of
> function that does this for them. That function can then read the output of
> EXPLAIN <query> for its estimates. Those aren't exactly times, but cost
> estimates. The actual time taken depends on your hardware; you would need to
> do some measurements to see how planned costs and actual time relate.
>
> I'm not sure this is a good idea, though.
> Firstly, testing the query plan adds a little overhead to every query coming
> in. It's not a lot, but if someone fires a lot of small, fast queries it could
> become a problem. You would be hurting the people who are using your database
> correctly, instead of the people who are "abusing" it.
>
> Secondly, you could achieve a similar effect by limiting the amount of time a
> query is allowed to run. I'm pretty sure there are configuration options that
> cause long-running queries to get killed after a set time.
>
> Thirdly... Reliable estimates??? Lol!
> Seriously, as hard as Postgres tries to keep statistics that make sense, I
> don't think they can ever be considered entirely reliable. You may not be
> vacuuming frequently enough, your statistics target may be too small, or your
> data might not be suitable for statistical analysis (random numbers and very
> unbalanced distributions are good examples).
> Therefore, if you plan to rely on the estimated time a query takes, you're
> going to be wrong sometimes.
>
> Alban Hertroys
>
> --
> If you can't see the forest for the trees,
> cut the trees and you'll see there is no forest.