[HACKERS] Avoiding execution of some functions by query rewriting

2012-05-16 Thread Thomas Girault
Hello,

I would like to allow the execution of a function (my_function) only if its
argument (my_table.x) belongs to a predefined interval (e.g. [100,1000]).

Let's take the following query example :
(q)  SELECT * FROM my_table WHERE my_function(mytable.x);

I would like this query automatically rewrites itself to check whether
mytable.x belong to the interval [100,1000] :
(q')  SELECT * FROM my_table WHERE (my_table.x BETWEEN 100 AND 1000) AND
my_function(my_table.x);

The command EXPLAIN ANALYSE shows that the second query is really faster
than the first one.

How can I change the query execution plan in order to automate the process
of query rewriting (q into q') ?
Where can I store suitably the metadata about the interval [100,1000]
associated to my_function ?

Thanks by advance,

Thomas Girault


Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Thomas Girault
Hello,

Thank you for your answer Robert.

 Well, SQL, our our dialect of it anyway, doesn't have global
 variables.  So I think the above is going to throw a syntax error.
 You may have global variables in your C code, but those won't be
 visible from the SQL level.

I was wrong in the definition of  filtered_employees view.
The correct one is :

CREATE OR REPLACE VIEW filtered_employees AS
SELECT *, get_mu() as mu
FROM employees
ORDER BY mu DESC
LIMIT get_k();


Note that the access to global C variables (K, MU and ALPHA) from SQL is
working well with my definitions of get_k(), get_mu()...  There is no syntax
error here, however the view is not working because it has side effects on
the value of MU.

I have also tried two other alternatives to the LIMIT keyword but it doesn't
work yet :
1) SQL instruction :  RANK() OVER(ORDER BY get_mu()) as sqlf_rank ;
2) C instruction : SPI_exec(query, K) ... it leads to a segmentation fault.

 In general, I think you'd be better off not relying on C global
 variables either,

I don't understand how I could avoid using global variables in some cases.
For instance, I must store the float value $1 corresponding to a fuzzy
predicate degree in the following fuzzy2bool cast operation :

CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT)
RETURNS BOOLEAN LANGUAGE SQL AS
'SELECT set_mu($1); -- Here $1 must be stored in MU for each record observed
in the view;
SELECT $1  get_alpha()'; -- Then $1 is converted to Boolean according to
ALPHA global value


 and instead passing the values you need as function
 arguments.
Do you mean that I should define a function filter(table_name TEXT, k
INTEGER, alpha FLOAT) ?

Thanks again for your help,

Thomas Girault


Re: [HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-15 Thread Thomas Girault
Hello Florian,

It seems dangerous for a cast to modify global state such a MU. The
 evaluation
 order of functions during query execute isn't always easy to guess, and may
 change depending on the execution plan.


I supposed that fuzzy2bool is called just before the terminal evaluation of
the WHERE clause which needs a Boolean. My first tests showed that this
hypothesis is right but it might be wrong in the case of alternative
execution plans.


  With this implicit cast, the query
   SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
  is equivalent to
   SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
  fuzzy2bool(young(age));

 Those set_alpha() calls seem equally dangerous. If this alpha is supposed
 to be a global parameter, why not set it *before* issuing the query?


Alternatively, we could also set the alpha value before the query :


SELECT set_alpha(0.1); SELECT age, young(age) FROM employees WHERE
young(age);


I would be very interested to know if there is smarter way to set global
variables.

 I can sort the results in the view 'sorted_employees' according to
  value MU of a fuzzy predicate thanks to fuzzy2bool cast function.
 
  CREATE OR REPLACE VIEW sorted_employees AS
 SELECT *, get_mu() as mu
 FROM employees
 ORDER BY mu DESC;

 Are you aware that an ORDER BY clause in a VIEW is only going to work
 if you do SELECT .. .FROM view.


I It really the first time I am using views, I didn't know that !


 If the outer query is more complex
 than that, I wouldn't bet on the results coming back in the expected order.


I don't mind if the ordering is wrong : it is just a way to process
filtering according to K and ALPHA.

Usually, you'd attach ORDER BY to the outermost query (or to subqueries
 containing a LIMIT clause).

  The following query
   SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees

 Again, you seem to rely on these set_k(), set_alpha() calls being
 done before the query is executed. But postgres doesn't guarantee
 that.

  WHERE young(age);
  gives the results :
   age  mu
   24   1
   16   1
  instead of :
   age  mu
   16   1
   21   0.89976158142
   24   0.60023841858
   26   0.40005960464
   26   0.40005960464
 
  It seems that the 'LIMIT K' instruction have side effects on the MU
 value.

 The execution plan may very well vary depending on the limit. Off-hand,
 I'd guess that with a small K, one of these set_whatever() calls in one
 of your FROM clauses gets executed after the computation it'd supposed
 to affect has already happened.

  Why is it not working ? How to fix this issue ?

 Don't rely on the execution order of function calls in a SELECT statement.
 Divide your functions into two classes.

 The ones which have side-effects (i.e. change global state). These should
 *never* be called from SQL statements, except in the trivial case of
 SELECT my_func(...);. Also, they should be marked with VOLATILE

 And the ones without side effects. Those should be marked with IMMUTABLE,
 STABLE of VOLATILE, depending on how they're influenced by global state
 changes. Read the documentation on these flags.

 Finally, don't assume that ORDER BY inside a view influences the output
 order of queries using the view. (Except in the trivial case of
 SELECT * FROM view). As a rule of thumb, ORDER BY in a view only makes
 sense if there's also a LIMIT clause. You are then guaranteed that the
 view returns the first limit rows according to the specified order.
 Don't assume they're necessarily returned in ascending order, though.

 best regards,
 Florian Pflug


Thank you very much ! I will try to follow your advices.

Thomas


[HACKERS] LIMITing number of results in a VIEW with global variables

2011-10-14 Thread Thomas Girault
Hello,

I am writing an extension to easily execute queries with conditions
expressing constraints in fuzzy logics.

I wrote some C functions that get or set global variables in C.
The variables are MU (FLOAT : degree of a fuzzy predicate), ALPHA
(FLOAT : threshold for filtering predicates) and K (INTEGER : limits
the number of results).
Here is an example for the variable ALPHA :

  /*--- sqlf.c ---*/
  static float8 ALPHA;
  Datum get_alpha(PG_FUNCTION_ARGS);
  Datum get_alpha(PG_FUNCTION_ARGS){
  PG_RETURN_FLOAT8(ALPHA);
  }
  Datum set_alpha(PG_FUNCTION_ARGS);
  Datum set_alpha(PG_FUNCTION_ARGS){
  ALPHA = PG_GETARG_FLOAT8(0);
  PG_RETURN_FLOAT8(ALPHA);
  }
  /*--- sqlf.sql ---*/
  CREATE OR REPLACE FUNCTION set_alpha(alpha FLOAT) RETURNS FLOAT
AS '$libdir/sqlf', 'set_alpha'
LANGUAGE C STRICT;

These variables are parameters for filtering and sorting results.
The following cast operations are using MU and ALPHA.

  CREATE OR REPLACE FUNCTION fuzzy2bool(FLOAT)
  RETURNS BOOLEAN LANGUAGE SQL AS 'SELECT set_mu($1);SELECT $1  get_alpha()';

  CREATE CAST (FLOAT AS BOOLEAN)
  WITH FUNCTION fuzzy2bool(FLOAT) AS IMPLICIT;

With this implicit cast, the query
  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE young(age);
is equivalent to
  SELECT age, young(age) FROM set_alpha(0.1), employees WHERE
fuzzy2bool(young(age));

Here, young(age) is a fuzzy predicate returning a float value in [0,1].
The queries keep results satisfying young(age)  alpha :

  age   young(age)
  161
  240.6
  260.4
  210.9
  260.4


I can sort the results in the view 'sorted_employees' according to
value MU of a fuzzy predicate thanks to fuzzy2bool cast function.

CREATE OR REPLACE VIEW sorted_employees AS
SELECT *, get_mu() as mu
FROM employees
ORDER BY mu DESC;

The query
  SELECT age, mu FROM set_alpha(0.1), sorted_employees WHERE young(age);
gives the following results :
  age   mu
  161
  210.89976158142
  240.60023841858
  260.40005960464
  260.40005960464

I am now trying to limit the number of results in the view according
to the global value K :

CREATE OR REPLACE VIEW filtered_employees AS
SELECT *, get_mu() as mu
FROM employees
ORDER BY mu DESC
LIMIT K;

The following query
  SELECT age, mu FROM set_k(5), set_alpha(0.1), filtered_employees
WHERE young(age);
gives the results :
  age   mu
  241
  161
instead of :
  age   mu
  161
  210.89976158142
  240.60023841858
  260.40005960464
  260.40005960464

It seems that the 'LIMIT K' instruction have side effects on the MU value.

Why is it not working ? How to fix this issue ?


Thanks by advance,

Thomas Girault

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Module extension for parsing and rewriting functions with infixed syntax

2011-08-06 Thread Thomas Girault
Hello,

I am working on a PostgreSQL extension module which defines new
grammar rules completing the classical SQL syntax defined in the
src/backend/parser/gram.y file.

Basically, I want to handle predicates having an infixed syntax { X IS
Y } and rewrite them as classical Boolean functions with prefixed
syntax { Y(X) }.

For instance, the following query :
SELECT * FROM cars WHERE cars.color IS yellow;
would be rewritten into :
SELECT * FROM cars WHERE yellow(cars.color);

The new predicate could be rewritten as a plpgsql Boolean function
with an unique argument (cars.color IS yellow -- yellow(cars.color)).
I have then added the following rule to the func_expr definition
(see gram.y:10280 in postgresql-9.1beta3 source code) :

func_expr:
...
| func_arg_expr IS func_name over_clause
{
FuncCall *n = makeNode(FuncCall);
n-funcname = $3;
n-args = list_make1($1);
n-agg_order = NIL;
n-agg_star = FALSE;
n-agg_distinct = FALSE;
n-func_variadic = FALSE;
n-over = $4;
n-location = @1;
$$ = (Node *)n;
}
...

However, my first attempt leads to the following errors :
/usr/bin/bison -d  -o gram.c gram.y
gram.y: conflicts: 84 shift/reduce, 807 reduce/reduce
gram.y: expected 0 shift/reduce conflicts
gram.y: expected 0 reduce/reduce conflicts

How can I avoid this kind of errors without changing the entire grammar?

In addition, I would rather making this new functionality independent
of the original PostgreSQL source code.
Ideally, the new defined bison rules would be defined in an autonomous
module extension.
I have seen that some contrib modules (such as SEG or CUBE) define
separate bison grammar rules.
However, I don't understand yet how such rules integrate with the
gram.y file without any conflicts.

Can I define my new bison rules separately of the gram.y file?
Can I use the new functionality dynamically after loading an extension
module (LOAD 'MY_EXTENSION';)?


I am new in the PostgreSQL community and any ideas for solving these
problems would be very helpful.

Thanks by advance,

Thomas Girault

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers