Re: [SQL] Create custom aggregate function and custom sfunc

2009-07-02 Thread nha

Hello,

On 2/07/09 23:21, Greg Stark wrote:

On Thu, Jul 2, 2009 at 3:48 PM, Jasmin
Dizdarevic  wrote:

customer ; seg
111 ; L1
111 ; L2
111 ; L1
222 ; L3
222 ; L3
222 ; L2

the result should look like this:

111: L1 - because L1 is higher than L2 and ratio of L1 : L2 is 2 : 1
222: L3 - because L3 is higher than L2 and ratio of L3 : L2 is 2 : 1


Are you just looking for the most frequent seg for each customer?

select distinct on (customer) customer,seg
from (select customer, seg, count(*) as n from tab group by seg)
   order by customer, n desc

That doesn't give the ratios though.



The query above would fail, though, starting inside the subquery: the 
customer column is projected but is neither a grouped column nor 
functionally determined by seg (seg is assumed not to be a primary key 
here, judging by the sample values), and PostgreSQL also requires an 
alias on such a derived table. Grouping by both customer and seg fixes 
the first point.


Nevertheless, in the same direction as Greg Stark, a query along these 
lines approaches the target result:


SELECT T5A.customer, T5A.seg
FROM (
  SELECT T2.customer, T2.seg, COUNT(*) AS nb
  FROM cst T2
  GROUP BY T2.customer, T2.seg
) T5A
INNER JOIN (
  SELECT T4.customer, MAX(T4.nb) AS maxNb
  FROM (
    SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
    FROM cst T2B
    GROUP BY T2B.customer, T2B.seg
  ) T4
  GROUP BY T4.customer
) T5B
  ON T5A.customer = T5B.customer
  AND T5A.nb = T5B.maxNb

where 'cst' is the reference table containing the 'customer' and 'seg' 
columns. The query joins two derived tables:
- the first, aliased T5A, counts the multiplicity of each (customer, 
seg) couple;
- the second, aliased T5B, extracts the highest such multiplicity for 
each customer.
The join then retains only the (customer, seg) couples whose 
multiplicity equals the highest one for that customer.


Each multiplicity can easily be added to the resulting records by 
including the T5A.nb column in the select list. A kind of ratio can 
also be computed: first add an aggregated column SUM(T4.nb) to the 
aggregation over T4 -- that is, the total multiplicity per customer -- 
then let the join compute the ratio between the highest multiplicity 
and that total, alongside the seg (or list of segs) whose multiplicity 
is the highest.


Here is the modified query:

SELECT T5A.customer, T5A.seg, T5A.nb::numeric / T5B.sumNb AS ratio
FROM (
  SELECT T2.customer, T2.seg, COUNT(*) AS nb
  FROM cst T2
  GROUP BY T2.customer, T2.seg
) T5A
INNER JOIN (
  SELECT T4.customer, MAX(T4.nb) AS maxNb, SUM(T4.nb) AS sumNb
  FROM (
    SELECT T2B.customer, T2B.seg, COUNT(*) AS nb
    FROM cst T2B
    GROUP BY T2B.customer, T2B.seg
  ) T4
  GROUP BY T4.customer
) T5B
  ON T5A.customer = T5B.customer
  AND T5A.nb = T5B.maxNb

(The cast to numeric matters: both counts are integers, so T5A.nb/T5B.sumNb 
alone would be integer division and truncate the ratio to 0 or 1.)

As is, this query may return several (customer, seg) couples for the 
same customer when two or more seg values tie for the highest count. 
This "edge effect" can be avoided in several ways, depending on the 
original purpose.
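As a concrete check of both the ratio computation and the tie behavior, here is the ratio query exercised against the poster's sample data in an in-memory SQLite database (chosen only so the sketch is self-contained; the 1.0 * factor plays the role of the numeric cast needed to avoid integer division):

```python
import sqlite3

# Build the thread's sample table: customer 111 has L1 twice and L2 once,
# customer 222 has L3 twice and L2 once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cst (customer INTEGER, seg TEXT)")
conn.executemany("INSERT INTO cst VALUES (?, ?)",
                 [(111, "L1"), (111, "L2"), (111, "L1"),
                  (222, "L3"), (222, "L3"), (222, "L2")])

# The two-subquery join: per-couple counts vs. per-customer max and sum.
rows = conn.execute("""
    SELECT T5A.customer, T5A.seg, 1.0 * T5A.nb / T5B.sumNb AS ratio
    FROM (SELECT customer, seg, COUNT(*) AS nb
          FROM cst GROUP BY customer, seg) AS T5A
    INNER JOIN (SELECT customer, MAX(nb) AS maxNb, SUM(nb) AS sumNb
                FROM (SELECT customer, seg, COUNT(*) AS nb
                      FROM cst GROUP BY customer, seg) AS T4
                GROUP BY customer) AS T5B
      ON T5A.customer = T5B.customer AND T5A.nb = T5B.maxNb
    ORDER BY T5A.customer
""").fetchall()
# rows -> [(111, 'L1', 0.666...), (222, 'L3', 0.666...)]
```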


Regards.

--
nha / Lyon / France.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Recording how a table is used

2009-07-08 Thread nha
Hello,

On 30/06/09 8:47, Daniel Gordon wrote:
> I'm trying to record the results of a select statement into a separate
> table.  I need the information selected, the column it was stored in,
> the table it was stored in, and the query that selected it.
> [...]
> Here is the table I'm trying to fill, in case it is useful
> 
> create table sql_query_data_log (
>  id serial,
>  create_ time timestamp DEFAULT now(),
>  query varchar,
>  table text,
>  column text,
> 
>  data varchar
>  );
> 
As you said, the usual solution is to handle this in a middle-tier
language (Perl, PHP, or another) on the application (server) side --
the side that submits the query to the database, i.e. the one that
knows the query text and is better suited to parsing it and extracting
the target information, such as the tables and columns read or written.

On the DBMS side, this information can be stored -- and later
collected -- in log files configured at startup by the DBMS admin. An
example of importing a query log file is suggested here:
http://www.postgresql.org/docs/8.3/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

However, because the query is retrieved in raw form (the 'query' field
is plain text), some parsing is still needed to identify the objects
operated on (tables, columns). Moreover, this assumes one knows the
correct path to the target log file and has the privileges to import
data from it.

Another, cruder alternative is based on triggers, as you mentioned.
Triggers can be created on each table for each relevant firing time
(BEFORE, AFTER). The list of tables is available from
information_schema.tables (and the list of columns from
information_schema.columns). In a BEFORE trigger, the current (old)
value of each column can be stored somewhere; in an AFTER trigger, old
and new values can be compared, and where they differ the altered
columns can be inferred.
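As an illustrative sketch of that old/new comparison idea (table and trigger names are hypothetical; SQLite trigger syntax is used so the demo is self-contained, whereas a PostgreSQL version would be written in PL/pgSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, c1 INTEGER, c2 INTEGER);
    CREATE TABLE change_log (tbl TEXT, col TEXT, old_val, new_val);

    -- one audit trigger per column: log old/new only on a real change
    CREATE TRIGGER t1_audit_c1 AFTER UPDATE OF c1 ON t1
    WHEN OLD.c1 <> NEW.c1
    BEGIN
        INSERT INTO change_log VALUES ('t1', 'c1', OLD.c1, NEW.c1);
    END;
    CREATE TRIGGER t1_audit_c2 AFTER UPDATE OF c2 ON t1
    WHEN OLD.c2 <> NEW.c2
    BEGIN
        INSERT INTO change_log VALUES ('t1', 'c2', OLD.c2, NEW.c2);
    END;

    INSERT INTO t1 VALUES (1, 1, 2);
    UPDATE t1 SET c1 = 4, c2 = 2 WHERE id = 1;  -- c2 set, but unchanged
""")
logged = conn.execute("SELECT col, old_val, new_val FROM change_log").fetchall()
# only c1 actually changed value, so only c1 appears in the log
```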

Both approaches leave some issues open:
- log import: the query text can be retrieved, but the altered tables
and columns are hard to identify (parsing required). The current query
is also not easy to single out in the imported data, unless some
judicious trigger helps identify the right entry;
- triggers: the altered table and column are easy enough to retrieve,
but the query text is unknown (except by inference from the modified
object, which is obviously approximate). Moreover, relating a given
query to the modified object is far from trivial when multiple
concurrent transactions access the same object.

Application-side, middle-tier programming therefore looks like the
least complicated way to address your purpose, since both the query and
the objects it touches (tables and columns) are directly known there
and can be written out as a log entry.

Hoping these ideas help you go on investigating.

Regards.

--
nha / Lyon / France.



Re: [SQL] Recording how a table is used

2009-07-08 Thread nha
Hello,

On 8/07/09 11:30, nha wrote:
> Hello,
> 
> On 30/06/09 8:47, Daniel Gordon wrote:
>> I'm trying to record the results of a select statement into a separate
>> table.  I need the information selected, the column it was stored in,
>> the table it was stored in, and the query that selected it.
>> [...]
>> Here is the table I'm trying to fill, in case it is useful
>>
>> create table sql_query_data_log (
>>  id serial,
>>  create_ time timestamp DEFAULT now(),
>>  query varchar,
>>  table text,
>>  column text,
>>
>>  data varchar
>>  );
>> [...]
> Another crude alternative may be based on triggers as you mentioned.
> Triggers may be created on each table for each possible triggered event
> (before, after).
> [...]

Rules may also be appropriate (perhaps even more so than triggers?).
This page gives some examples:
http://www.postgresql.org/docs/8.3/interactive/rules-update.html

Regards.

--
nha / Lyon / France.




Re: [SQL] how to tell if column set on update

2009-07-20 Thread nha
Hello,

On 20/07/09 15:19, chester c young wrote:
> within a trigger need to know if the UPDATE statement set a column.  the 
> column might be set to the old value or a different value.
> 
> (want to make sure the app is sending all necessary values)
> 
> thanks
> 

If the column to test is known -- e.g. column MyCol -- then NEW.MyCol
and OLD.MyCol -- the values of MyCol after and before the UPDATE,
respectively -- can be compared.

I may have misread if you are looking for the column(s) that were
actually updated, rather than whether a given column was updated. In
that case, each column of the target table can be tested the same way,
comparing its new and old values.
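A small self-contained demonstration of what the comparison approach can and cannot see (SQLite in-memory, illustrative names): comparing NEW and OLD detects a value change, but not whether a column appeared in the SET list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER, c2 INTEGER);
    CREATE TABLE touched (col TEXT);

    -- fires only when c2's value really differs (NULL-safe comparison)
    CREATE TRIGGER t1_c2_changed AFTER UPDATE ON t1
    WHEN OLD.c2 <> NEW.c2 OR (OLD.c2 IS NULL) <> (NEW.c2 IS NULL)
    BEGIN
        INSERT INTO touched VALUES ('c2');
    END;

    INSERT INTO t1 VALUES (1, 2);
    UPDATE t1 SET c1 = 4, c2 = 2 WHERE c1 = 1;  -- c2 set to its old value
""")
n = conn.execute("SELECT COUNT(*) FROM touched").fetchone()[0]
# n stays 0: value comparison cannot tell that c2 appeared in the SET list
```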

Regards.
--
nha / Lyon / France.



Re: [SQL] unused columns in copy

2009-07-20 Thread nha
Hello,

On 21/07/09 0:59, Marcin Stępnicki wrote:
> On Mon, Jul 20, 2009 at 3:22 PM, chester c young 
> wrote:
>> is there a way for COPY FROM to ignore unused columns in CSV?
>>
>> in other words, if table t1 has columns c1, c2, and if csv has columns c1, 
>> c3, c2, could I do something like
>>
>> COPY t1( c1, null, c2 ) FROM 'file.csv'
> 
> No, but you can use pgloader which can easily deal with your problem
> and much more. http://pgfoundry.org/projects/pgloader/
> 

Depending on the PostgreSQL version, COPY accepts a list of target
columns (see the PostgreSQL 8.4 documentation):
http://www.postgresql.org/docs/8.4/interactive/sql-copy.html

Note, however, that for COPY FROM the column list only says which table
columns get filled: the file must still contain exactly the listed
columns, in that order. A statement such as:
COPY t1 (c1, c2) FROM '/path/to/file.csv' WITH CSV HEADER;
therefore works only if the file holds just c1 and c2; it cannot skip a
c3 column sitting between them, which brings us back to pgloader or to
preprocessing the file.
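If the file does carry an extra column that COPY will not skip, one portable workaround is to project the CSV down client-side before loading. A minimal Python sketch (file contents and names are illustrative):

```python
import csv
import io

# Source CSV has columns c1, c3, c2; the table only wants c1 and c2.
# io.StringIO stands in for real files to keep the sketch self-contained.
src = io.StringIO("c1,c3,c2\n10,x,20\n30,y,40\n")
reader = csv.DictReader(src)

out = io.StringIO()
# extrasaction="ignore" silently drops columns not in fieldnames (c3 here)
writer = csv.DictWriter(out, fieldnames=["c1", "c2"], extrasaction="ignore")
writer.writeheader()
writer.writerows(reader)

# out.getvalue() now holds only c1 and c2, ready for e.g.:
#   COPY t1 (c1, c2) FROM '/path/to/filtered.csv' WITH CSV HEADER;
```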

Regards.
--
nha / Lyon / France.



Re: [SQL] Storing/sorting hex data in PostgreSQL

2009-07-21 Thread nha
Hello,

On 20/07/09 21:59, JJ wrote:
> [...] 
> I want to sort and query the database based on the
> hex addresses [...]
> How can I store hex values in PostgreSQL and sort and query them?
> 

PostgreSQL (since version 8.3, and perhaps earlier) provides string
functions for encoding/decoding strings to/from their hexadecimal
representation. The following page presents these functions:
http://www.postgresql.org/docs/8.3/interactive/functions-string.html

For example:

# SELECT decode('6d61702064617461', 'hex');
returns the bytea value displayed as: map data
# SELECT encode('map data', 'hex');
returns the hexadecimal representation of "map data": 6d61702064617461
(each character -- each byte, in fact -- is replaced by its hexadecimal
representation; the space " " becomes "20" in hex, 32 in decimal)

A (simpler?) alternative is to keep hexadecimal values as plain text
with an ordinary cast:
# SELECT '6d61'::text;
returns the text value: 6d61
Bear in mind, though, that text sorts lexicographically, not by numeric
value.

For conversion to and from integers, the built-in to_hex(integer)
covers the integer-to-hex direction (it is listed on the page above);
the reverse direction has no dedicated built-in and may be user-created.
Sorting speed will then depend on how well the integer values reflect
the ordering that is meaningful for your data.
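On the client side, such conversions amount to base-16 parsing and formatting. A Python sketch (the addresses are made-up examples) that also shows why sorting hex as text differs from sorting by numeric value:

```python
# hex text -> integer, suitable for numeric sorting
addr = "6d61"
as_int = int(addr, 16)

# integer -> hex text (the to_hex direction)
back = format(as_int, "x")

# Lexicographic order and numeric order disagree once lengths vary:
addrs = ["ff", "9", "100"]
as_text = sorted(addrs)                            # ['100', '9', 'ff']
by_value = sorted(addrs, key=lambda a: int(a, 16)) # ['9', 'ff', '100']
```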

Regards.
--
nha / Lyon / France.



Re: [SQL] Double aggregate problem

2009-07-22 Thread nha
Hello,

On 22/07/09 18:16, David Weilers wrote:
> Dear anyone,
> 
> I have the following query:
> 
> select v.id, array_to_string(array_accum(s.name),', ') as sector , 
> array_to_string(array_accum(p.name),', ') as provincie from tblvacature 
> v, tblaccount a , tblvacaturesector vs, tblsector s , 
> tblvacatureprovincie vp, tblprovincie p where v.id = 11 and v.account = 
> a.id and vs.vacature = v.id and s.id = vs.sector and vp.vacature = v.id 
> and p.id = vp.provincie group by v.id, v.inserted order by v.inserted 
> desc
> [...]
> If i leave out one aggregate, the result is as i expect (if I leave out 
> 'provincie', sector gives):
> [...] 
> I would like both array_accum returning only what they should and not 
> doubles.
> [...]

According to your observation, the following query may match your need:

SELECT
t1.id, t1.sector, t2.provincie
FROM
(
SELECT
v.id, v.inserted,
array_to_string(array_accum(s.name),', ') AS sector
FROM
tblvacature v, tblaccount a, tblvacaturesector vs, tblsector s,
tblvacatureprovincie vp, tblprovincie p
WHERE
v.account = a.id and vs.vacature = v.id and s.id = vs.sector
and vp.vacature = v.id and p.id = vp.provincie
GROUP BY v.id, v.inserted
) AS t1
INNER JOIN
(
SELECT
v2.id,
array_to_string(array_accum(p2.name),', ') AS provincie
FROM
tblvacature v2, tblaccount a2, tblvacaturesector vs2, tblsector s2,
tblvacatureprovincie vp2, tblprovincie p2
WHERE
v2.account = a2.id and vs2.vacature = v2.id and s2.id = vs2.sector
and vp2.vacature = v2.id and p2.id = vp2.provincie
GROUP BY v2.id, v2.inserted
) AS t2
ON t1.id = t2.id
WHERE t1.id = 11
ORDER BY t1.inserted DESC

The query has been rewritten as an inner join between two subqueries
derived from the original one. The WHERE and ORDER BY clauses have been
pulled up so that the subqueries are not broken down more than
necessary. Further optimization can be applied (e.g. using explicit
joins between the different tables and checking that appropriate
indexes are set up).
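The underlying duplication problem, and why aggregating in two separate subqueries fixes it, can be reproduced on a toy schema. The sketch below uses SQLite's group_concat as a stand-in for array_to_string(array_accum(...)), with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vac_sector (vac INTEGER, sname TEXT);
    CREATE TABLE vac_prov   (vac INTEGER, pname TEXT);
    INSERT INTO vac_sector VALUES (11, 'IT'), (11, 'Sales');
    INSERT INTO vac_prov   VALUES (11, 'Antwerpen'), (11, 'Limburg');
""")

# Aggregating over the joined product: 2 sectors x 2 provincies = 4 rows,
# so every name is accumulated twice.
naive = conn.execute("""
    SELECT group_concat(sname), group_concat(pname)
    FROM vac_sector s JOIN vac_prov p ON s.vac = p.vac
    WHERE s.vac = 11
""").fetchone()

# Aggregating each list in its own subquery, then joining, avoids doubles.
fixed = conn.execute("""
    SELECT t1.sectors, t2.provs
    FROM (SELECT vac, group_concat(sname) AS sectors
          FROM vac_sector GROUP BY vac) t1
    JOIN (SELECT vac, group_concat(pname) AS provs
          FROM vac_prov GROUP BY vac) t2 ON t1.vac = t2.vac
""").fetchone()
```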

Regards.

--
nha / Lyon / France.



Re: [SQL] Double aggregate problem

2009-07-22 Thread nha
Hello,

On 22/07/09 20:04, Peter Eisentraut wrote:
> On Wednesday 22 July 2009 19:16:21 David Weilers wrote:
>> I have the following query:
>>
>> select v.id, array_to_string(array_accum(s.name),', ') as sector ,
>> array_to_string(array_accum(p.name),', ') as provincie from tblvacature
>> v, tblaccount a , tblvacaturesector vs, tblsector s ,
>> tblvacatureprovincie vp, tblprovincie p where v.id = 11 and v.account =
>> a.id and vs.vacature = v.id and s.id = vs.sector and vp.vacature = v.id
>> and p.id = vp.provincie group by v.id, v.inserted order by v.inserted
>> desc
>>
>> That currently produces the following output:
> 
> No one is going to be able to reproduce that without the table definitions 
> and 
> data.
> 

Peter Eisentraut is right... The rewritten query I suggested is derived
only from your original query and the data definitions it implies
(including constraints).

Two points about that rewritten query are worth noting:
- the column 'inserted' is likely missing from the 2nd subquery and
from the global join condition;
- duplicated or even multiplied values of s.name and/or p.name may
result, depending on the keys of the tables (as Peter Eisentraut
implicitly pointed out).

Regards.
--
nha / Lyon / France.



Re: [SQL] how to tell if column set on update

2009-07-22 Thread nha
Hello,

On 22/07/09 13:42, Frank Bax wrote:
> chester c young wrote:
>>> On 20/07/09 15:19, chester c young wrote:
>>>> within a trigger need to know if the UPDATE statement
>>> set a column.  the column might be set to the old value
>>> or a different value. [...]
>>>>
>>> If the column to test is known -- e.g. column MyCol --,
>>> NEW.MyCol and OLD.MyCol [...] can be compared.
>>
>> for example,
>> create table t1( c1 int, c2 int );
>> insert into t1 values( 1, 2 );
>>
>> 1) update t1 set c1=4 where c1=1;
>> 2) update t1 set c1=4, c2=2 where c1=1;
>>
>> each update results in the same row, but in the second update c2 was
>> actually set.
>> a trigger on the update - how do we know if c2 has been actually set
>> or not?
> 
> You cannot do that.  You can only detect that a value has changed.
> 

There would be one way to record that a column was set, but it belongs
more to application design than to data manipulation: for column c2,
the application can raise a flag whenever it includes c2 in an UPDATE
statement. Such a flag may be stored in a separate table (or even added
to table t1) that the application maintains alongside the update
itself.

Regards.
--
nha / Lyon / France.



Re: [SQL] using count in other column

2009-07-24 Thread nha
Hello,

On 23/07/09 11:59, bartjoosen wrote:
> Hi,
> 
> I made up a query to make a count for each item for each month/year:
> SELECT"Artnr_ID", to_char("Date_plan","") AS "Jaar",
> to_char("Date_plan","MM") AS "Maand", Count("tblArtnrs"."Artikelnr") AS
> "Monthly_count", "val1","val2","val3"
> FROM (("tblAnalyses" INNER JOIN "tblStudies" ON "tblAnalyses"."Studie_ID" =
> "tblStudies"."Studie_ID") INNER JOIN "tblFichenr" ON
> "tblStudies"."ID_fichenr" = "tblFichenr"."ID") INNER JOIN "tblArtnrs" ON
> "tblFichenr"."ID_Art_nrs" = "tblArtnrs"."Artnr_ID"
> GROUP BY "tblArtnrs"."Artnr_ID", to_char("Date_plan","") ,
> to_char("Date_plan","MM"), "val1","val2","val3";
> 
> Now I want to use the "Monthly_count" value for further calculations with
> other columns.
> I tried to use 
> "Monthly_count"+ "val1" + "Monthly_count" * "val2" + "Monthly_count" *"val3"
> But "Monthly_count" is not recognised in my calculations.
> 
> How can this be solved?
> 
> Thanks
> 
> Bart
> 

The error message you got is missing from your report, although it
would surely help pin the problem down. My guess, however, is that it
concerns using an alias (like "Monthly_count") inside other select-list
expressions (like the one you tried, which failed). Column aliases
rename the displayed columns; they cannot be referenced by other
projected expressions, although they can be used in GROUP BY clauses.
For example, the given GROUP BY clause can be rewritten more simply as:
GROUP BY "Artnr_ID", "Jaar", "Maand", "val1", "val2", "val3"
and even as:
GROUP BY "Artnr_ID", "Date_plan", "val1", "val2", "val3"
since "Jaar" and "Maand" both derive from "Date_plan", which is not
itself passed to an aggregate function but is effectively grouped.

Hoping this puts you on the right track.

Regards.

--
nha / Lyon / France.



Re: [SQL] using count in other column

2009-07-24 Thread nha
Hello again,

On 25/07/09 0:41, nha wrote:
> Hello,
> 
> On 23/07/09 11:59, bartjoosen wrote:
>> Hi,
>>
>> I made up a query to make a count for each item for each month/year:
>> SELECT"Artnr_ID", to_char("Date_plan","") AS "Jaar",
>> to_char("Date_plan","MM") AS "Maand", Count("tblArtnrs"."Artikelnr") AS
>> "Monthly_count", "val1","val2","val3"
>> FROM (("tblAnalyses" INNER JOIN "tblStudies" ON "tblAnalyses"."Studie_ID" =
>> "tblStudies"."Studie_ID") INNER JOIN "tblFichenr" ON
>> "tblStudies"."ID_fichenr" = "tblFichenr"."ID") INNER JOIN "tblArtnrs" ON
>> "tblFichenr"."ID_Art_nrs" = "tblArtnrs"."Artnr_ID"
>> GROUP BY "tblArtnrs"."Artnr_ID", to_char("Date_plan","") ,
>> to_char("Date_plan","MM"), "val1","val2","val3";
>> 
>> Now I want to use the "Monthly_count" value for further calculations with
>> other columns.
>> I tried to use 
>> "Monthly_count"+ "val1" + "Monthly_count" * "val2" + "Monthly_count" *"val3"
>> But "Monthly_count" is not recognised in my calculations.
>>
>> How can this be solved?
>>
>> Thanks
>>
>> Bart
>>
> The error message you meet is missing in your report although it could
> surely help in accurate analysis. However I guess it is about using
> alias (like "Monthly_count") for defining project columns (like the one
> you tried and failed). [...]

I forgot to mention a solution for using the value aliased as
"Monthly_count". One workable way is to repeat the whole aliased
expression, i.e. count("tblArtnrs"."Artikelnr") here.

For example, the following query is wrong:
SELECT
"Artnr_ID",
to_char("Date_plan", '') AS "Jaar",
to_char("Date_plan", 'MM') AS "Maand",
count("tAr"."Artikelnr") AS "Monthly_count",
"val1", "val2", "val3",
"Monthly_count" + "val1" + "Monthly_count" * "val2" + "Monthly_count" *
"val3"
FROM (
(
"tblAnalyses" AS "tAn"
INNER JOIN "tblStudies" AS "tS" ON "tAn"."Studie_ID" = "tS"."Studie_ID"
)
INNER JOIN "tblFichenr" AS "tF" ON "tS"."ID_fichenr" = "tF"."ID"
)
INNER JOIN "tblArtnrs" AS "tAr" ON "tF"."ID_Art_nrs" = "tAr"."Artnr_ID"
GROUP BY "Artnr_ID", "Date_plan", "val1", "val2", "val3"

It can be rewritten as follows:
SELECT
"Artnr_ID",
to_char("Date_plan", '') AS "Jaar",
to_char("Date_plan", 'MM') AS "Maand",
count("tAr"."Artikelnr") AS "Monthly_count",
"val1", "val2", "val3",
count("Artikelnr") + "val1" + count("Artikelnr") * "val2" +
count("Artikelnr") * "val3"
FROM (
(
"tblAnalyses" AS "tAn"
INNER JOIN "tblStudies" AS "tS" ON "tAn"."Studie_ID" = "tS"."Studie_ID"
)
INNER JOIN "tblFichenr" AS "tF" ON "tS"."ID_fichenr" = "tF"."ID"
)
INNER JOIN "tblArtnrs" AS "tAr" ON "tF"."ID_Art_nrs" = "tAr"."Artnr_ID"
GROUP BY "Artnr_ID", "Date_plan", "val1", "val2", "val3"

Aliases have been used to make the overall query expression clearer and
shorter.
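Another common workaround, not shown above, is to wrap the grouped query in a derived table: there the alias becomes a real column name that outer expressions may reference freely. A sketch on a simplified, invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE art (artnr INTEGER, val1 INTEGER);
    INSERT INTO art VALUES (1, 10), (1, 10), (2, 5);
""")

# Inner query computes the count once under its alias; the outer query
# can then reuse "monthly_count" like any ordinary column.
rows = conn.execute("""
    SELECT artnr, monthly_count, monthly_count + val1 AS total
    FROM (SELECT artnr, val1, COUNT(*) AS monthly_count
          FROM art GROUP BY artnr, val1) AS g
    ORDER BY artnr
""").fetchall()
# rows -> [(1, 2, 12), (2, 1, 6)]
```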

Regards.
--
nha / Lyon / France.



Re: [SQL] Need magical advice for counting NOTHING

2009-07-24 Thread nha
Hello,

On 23/07/09 10:23, Glenn Maynard wrote:
> On Thu, Jul 23, 2009 at 1:01 AM, Andreas wrote:
>> SELECT user_name, log_type_fk, COUNT(log_type_fk)
>> FROM log
>> JOIN users ON (user_id = user_fk)
>> WHERE (ts IS BETWEEN  sometime  AND   another)
>> GROUP BY user_name, log_type_fk
>> ORDER BY user_name, log_type_fk
> [...] 
> SELECT user_name, log_type.log_type, sum((log_type_fk IS NOT
> NULL)::integer) AS count
> FROM users JOIN log_type ON (true) LEFT JOIN log ON (user_id = user_fk
> AND log.log_type_fk = log_type.log_type)
> GROUP BY user_name, log_type.log_type
> ORDER BY user_name, log_type.log_type;
> [...]

In the same direction as Andreas Kretschmer and Glenn Maynard (thanks
to the latter for the assumed DDL statements), the following query
should also suit:

SELECT user_name, log_type, COUNT(log_type_fk)
FROM (users CROSS JOIN log_type)
LEFT JOIN log ON (user_id = user_fk AND log_type_id = log_type_fk
                  AND ts BETWEEN sometime AND another)
GROUP BY user_name, log_type
ORDER BY user_name, log_type

(The timestamp condition sits in the ON clause rather than in WHERE: a
WHERE filter on ts would discard the NULL-extended rows and lose
exactly the zero counts the query is meant to report. Note also that
"IS BETWEEN", as in the original, is not valid SQL; plain BETWEEN is.)

It stays syntactically close to the original query and needs no cast.
The two main ideas:
- build every possible couple of user name and log type by
cross-joining the users and log_type tables;
- count the rows of the log table matching each (user, log_type) couple;
the LEFT JOIN guarantees that every couple from the cross join is kept,
yielding a count of zero when no log row matches.

While it is formally assumed that user_id and log_type_id are the
respective keys of the users and log_type tables, it is also
semantically assumed that user_name identifies user_id in users and
that log_type identifies log_type_id in log_type.
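The cross-join-plus-left-join pattern can be checked on toy data (SQLite in-memory; the ts filter is omitted for brevity, and in the full query it should live in the LEFT JOIN's ON clause so the zero-count rows survive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (user_id INTEGER, user_name TEXT);
    CREATE TABLE log_type (log_type_id INTEGER, log_type TEXT);
    CREATE TABLE log      (user_fk INTEGER, log_type_fk INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO log_type VALUES (10, 'login'), (20, 'error');
    INSERT INTO log VALUES (1, 10), (1, 10), (2, 20);
""")

# COUNT(log_type_fk) counts only non-NULL values, so couples with no
# matching log row come out as 0 instead of disappearing.
rows = conn.execute("""
    SELECT user_name, log_type, COUNT(log_type_fk) AS n
    FROM users CROSS JOIN log_type
    LEFT JOIN log ON user_id = user_fk AND log_type_id = log_type_fk
    GROUP BY user_name, log_type
    ORDER BY user_name, log_type
""").fetchall()
# rows -> [('ann','error',0), ('ann','login',2), ('bob','error',1), ('bob','login',0)]
```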

Regards.
--
nha / Lyon / France.



Re: [SQL] sql-porting-problem oracle to postgresql with UPDATE/IS NOT NULL

2009-07-28 Thread nha
Hello,

On 28/07/09 14:25, Daryl Richter wrote:
> 
> On Jul 28, 2009, at 5:58 AM, Gau, Hans-Jürgen wrote:
> 
>> hello list,
>> i have some problems with an sql-statement which runs on oracle but
>> not on postgresql (i want update only if result of SELECT is not
>> empty, the SELECT-queries are identical):
>>
>> UPDATE table1 t1
>> SET (t1.id) =
>> (SELECT h.id FROM table2 t2,table3 t3, table1 t1
>> WHERE t3.field = t2.field
>> AND t2.id = t1.id
>> AND t1.id <> t3.id)
>> WHERE
>> (SELECT h.id FROM table2 t2,table3 t3, table1 t1
>> WHERE t3.field = t2.field
>> AND t2.id = t1.id
>> AND t1.id <> t3.id) IS NOT NULL;
>>
> Try this:
> 
> UPDATE table1 t1 [...]
> WHERE
> EXISTS (SELECT 1 FROM table2 t2,table3 t3, table1 t1
> WHERE t3.field = t2.field
> AND t2.id = t1.id
> AND t1.id <> t3.id
> 
> AND h.id IS NOT NULL);
> 

Beyond the solution brought by Daryl Richter, note that "h" is an
unbound alias in the original query (and in the suggested one as well).
Some clarification would help further investigation.

Regards.

--
nha / Lyon / France.



Re: WG: [SQL] sql-porting-problem oracle to postgresql with UPDATE/IS NOT NULL

2009-07-29 Thread nha
Hello,

On 29/07/09 11:48, Gau, Hans-Jürgen wrote:
> Sorry, it's a lapse by copying and simplification the original version. that
> is correct:
> 
> UPDATE table1 t1 
>   SET (t1.id) = 
>   (SELECT t3.id FROM table2 t2,table3 t3, table1 t1 
>   WHERE t3.field = t2.field 
>   AND t2.id = t1.id 
>   AND t1.id <> t3.id) 
>   WHERE 
>   (SELECT t3.id FROM table2 t2,table3 t3, table1 t1 
>   WHERE t3.field = t2.field 
>   AND t2.id = t1.id 
>   AND t1.id <> t3.id) IS NOT NULL; 
> 
> or in very simplified form:
> 
> UPDATE table t1
>   SET (t1.id)=(SELECT expression)
>   WHERE   (SELECT expression) IS NOT NULL;
> 
> The SELECT expressions are identical.
> 
> this syntax is allowed on postgresql?
> 
> the solution brought by Daryl Richter has no effect.
>   
> Regards, Hans
> [...]

Syntactically there is one mistake, according to the PostgreSQL
specification of the UPDATE statement: the column named in the SET
clause cannot be qualified with the alias of the updated table. See
also:
http://www.postgresql.org/docs/8.4/interactive/sql-update.html

That point aside, the query is valid, although I am not sure the result
is always what you expect, because nothing relates the row being
updated to the SELECT expressions: table1 is listed again in each
subquery (in effect, an independent copy of the table), so none of its
columns is bound to the current row of the updated table1. I may be
mistaken...

Regards.

--
nha / Lyon / France.



Re: WG: [SQL] sql-porting-problem oracle to postgresql with UPDATE/IS NOT NULL

2009-07-29 Thread nha
Hello again,

On 29/07/09 12:21, nha wrote:
> Hello,
> 
> On 29/07/09 11:48, Gau, Hans-Jürgen wrote:
>> Sorry, it's a lapse by copying and simplification the original version. that
>> is correct:
>>
>> UPDATE table1 t1 
>>  SET (t1.id) = 
>>  (SELECT t3.id FROM table2 t2,table3 t3, table1 t1 
>>  WHERE t3.field = t2.field 
>>  AND t2.id = t1.id 
>>  AND t1.id <> t3.id) 
>>  WHERE 
>>  (SELECT t3.id FROM table2 t2,table3 t3, table1 t1 
>>  WHERE t3.field = t2.field 
>>  AND t2.id = t1.id 
>>  AND t1.id <> t3.id) IS NOT NULL; 
>> [...]
>> the solution brought by Daryl Richter has no effect.
> 
> There is one mistake [...]

To be concrete, a valid version would be:
UPDATE table1 t1
 SET id = (something)
 WHERE (anotherthing);

> Except this point, the query is valid although I am not sure the result
> is always what you may expect [...]

To "join" table1 column (assumed: id) to the subquery (sub-select here)
expression, table1 recall is not relevant in the subquery. Moreover an
alias "t1" is already defined to table1 (at the beginning of the statement).

A more suitable version would be:
UPDATE table1 t1
 SET id = (SELECT t3.id FROM table2 t2, table3 t3
  WHERE t3.field = t2.field AND t2.id = t1.id AND t1.id <> t3.id)
 WHERE (SELECT t3.id FROM table2 t2, table3 t3
  WHERE t3.field = t2.field AND t2.id = t1.id AND t1.id <> t3.id) IS NOT
NULL;

A PostgreSQL-compliant version could also be:
UPDATE table1 t1
 SET id = t3.id
 FROM table2 t2, table3 t3
 WHERE t3.field = t2.field AND t2.id = t1.id AND t1.id <> t3.id AND
t3.id IS NOT NULL;

A PostgreSQL-compliant and faster version may be:
UPDATE table1 t1
 SET id = t3.id
 FROM table2 t2
  INNER JOIN
   (SELECT t.id, t.field FROM table3 t WHERE t.id IS NOT NULL) t3
  ON t3.field = t2.field
  WHERE t2.id = t1.id AND t3.id <> t1.id;
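The correlated-subquery form (the "more suitable version" above, in its spirit) is portable enough to demonstrate outside PostgreSQL; a sketch with minimal invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER, field TEXT);
    CREATE TABLE table3 (id INTEGER, field TEXT);
    INSERT INTO table1 VALUES (1), (5);
    INSERT INTO table2 VALUES (1, 'a');
    INSERT INTO table3 VALUES (9, 'a');
""")

# Only rows for which the correlated subquery yields a non-NULL id are
# updated; the subquery correlates on table1.id, without re-listing
# table1 in its FROM clause.
conn.executescript("""
    UPDATE table1
    SET id = (SELECT t3.id FROM table2 t2, table3 t3
              WHERE t3.field = t2.field AND t2.id = table1.id
                AND table1.id <> t3.id)
    WHERE (SELECT t3.id FROM table2 t2, table3 t3
           WHERE t3.field = t2.field AND t2.id = table1.id
             AND table1.id <> t3.id) IS NOT NULL;
""")
ids = sorted(r[0] for r in conn.execute("SELECT id FROM table1"))
# the row with id 1 becomes 9 (matched via field 'a'); id 5 is untouched
```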

Hoping one of these turns out to be a satisfying solution.

--
nha / Lyon / France.



Re: [SQL] SELECT max(time) group by problem

2009-07-30 Thread nha
Hello,

On 30/07/09 11:38, Heigo Niilop wrote:
> hi,
> 
> I have table
> 
> CREATE TABLE table
> (
>   id integer NOT NULL,
>   timest timestamp with time zone NOT NULL,
>   db_time timestamp with time zone NOT NULL DEFAULT now(),
>   "values" text[],
>   CONSTRAINT table_pkey PRIMARY KEY (id, timest)
> )
> [...] 
> SELECT MAX(table.timest)  FROM table, table1  WHERE
> table.id=table1.id and table1.id in (1,2,3) GROUP BY table.id
> 
> [...] it is terrible slow,
> when I use strange syntax
> 
> SELECT table.timest  FROM table,table1 WHERE
> table.id=table1.id and table1.id in(1,2,3) and table.timest=
> (SELECT max(timest) FROM table WHERE table.id=table1.id)  
> 
> I receive all needed data very fast.
> 
> My questions are
> 1)  why this first query is slow and what I can do to make it faster
> (some more indexes??)?
> 2)  what kind of danger I have with second query (so far I have
> right data)?   
> 
> I have Postgres 8.3 and table have over million rows. [...]

Q1) EXPLAIN ANALYZE output would likely shed (at least partial) light
on the observed performance.

Given the definition of "table", the implicit index created for
"table_pkey" cannot be used efficiently by the 1st query, because the
aggregation is on column "id" alone (i.e. only part of the key). A full
scan of "table" is to be expected for the join, despite an index scan
on "table1" via its "id" index. Each "table1" row is then joined
against the million-plus rows of "table" before the WHERE clauses
filter the result (since they apply to every row produced by the join).
Slowness is expected.

3 ideas (usable separately or combined) come to mind at this point:
ID1- define an explicit index on "table.id": this would slightly speed
up both the aggregation by "id" and the join on "id";
ID2- aggregate on "table1.id" instead of "table.id": since "table.id"
is declared to reference "table1.id", "table1.id" is presumably a
(primary) key of "table1" (and hence backed by an index, which speeds
up the aggregation);
ID3- push the WHERE clause "table1.id IN (1,2,3)" into a subquery on
"table1" (this filter does not depend on "table") and join against that
subquery instead of "table1" itself. This shrinks the table joined to
"table" and thus reduces the number of join rows at an earlier stage of
query execution.

A possible rewritten query would express as follows:
SELECT MAX(t.timest)
FROM table t
INNER JOIN (SELECT id FROM table1 WHERE id IN (1,2,3)) t1
ON t.id = t1.id
GROUP BY t1.id;
(In this case, an index on "table1.id" is not strictly necessary,
although recommended.)

Inviting you to assess this proposal (or a variant combining ideas 1,
2, and/or 3).

A 4th idea is to look up "table.id IN (1,2,3)" directly, if those 3
values are known for certain to appear in column "table1.id". That
approach makes an index declared on "table.id" all the more effective,
so the lookup is fast.

The query would look like the following:
SELECT MAX(timest)
FROM table
WHERE id IN (1,2,3)
GROUP BY id;
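This direct form is easy to sanity-check on a small data set (SQLite in-memory; id is added to the select list to label each group, and the table is renamed t since "table" is a reserved word):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key (id, timest), as in the poster's DDL; its index
# supports both the IN lookup and the per-id MAX.
conn.execute("CREATE TABLE t (id INTEGER, timest TEXT,"
             " PRIMARY KEY (id, timest))")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "2009-07-01"), (1, "2009-07-30"),
                  (2, "2009-07-15"), (4, "2009-07-02")])

rows = conn.execute("""
    SELECT id, MAX(timest)
    FROM t
    WHERE id IN (1, 2, 3)
    GROUP BY id
    ORDER BY id
""").fetchall()
# rows -> [(1, '2009-07-30'), (2, '2009-07-15')]; id 4 is filtered out,
# id 3 simply has no rows
```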

Q2) The 2nd query, on the other hand, seems to take advantage of the
"table" index on its primary key (id, timest), since both columns are
explicitly referenced in the WHERE clauses. The sub-select could be a
bottleneck, but its WHERE clause correlates column "id" with a handful
of known values, so the database engine can evaluate it quickly through
index lookups.

In the end, I do not see what "danger" you have in mind for this 2nd
query. Maybe I missed some elements.

Regards.
--
nha / Lyon / France.
