[SQL] Removing redundant itemsets

2008-03-31 Thread Allan Kamau

Hi all,
I have a list of purchases (market basket) and I would like to select
non redundant longest possible patterns by eliminating
(creating/populating other table to contain only non redandant itemsets)
purchases having item lists which are fully included in at least one
other purchase.

(Am assuming all the items of all the purchases have met the minimum
support currently set at 1)

Below is a sample case, table schema and data(DDL and DML)

Transaction   Itemset
'100'   'a','b','c','d'
'200'   'c','d'
'300'   'a','c','e'
'400'   'e','d'

On successful removal out of 'redanduant' or smaller purchases having
items contained in totality by at least one other purchase, the
purchase '200' would be weeded out as it's itemset {'c','d'} is
contained in '100' {'a','b','c','d'} purchase.


drop sequence if exists togo_seq cascade;
create sequence togo_seq;
drop table if exists togo cascade;
create table togo
(
id integer not null default nextval('togo_seq')
,tid char(3) not null
,item char(1) not null
,primary key(id)
,unique(tid,item)
)
;
insert into togo(tid,item)values('100','b');
insert into togo(tid,item)values('100','a');
insert into togo(tid,item)values('100','c');
insert into togo(tid,item)values('100','d');
insert into togo(tid,item)values('200','c');
insert into togo(tid,item)values('200','d');
insert into togo(tid,item)values('300','a');
insert into togo(tid,item)values('300','c');
insert into togo(tid,item)values('300','e');
insert into togo(tid,item)values('400','e');
insert into togo(tid,item)values('400','d');

Allan.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Removing redundant itemsets

2008-03-31 Thread Craig Ringer
Allan Kamau wrote:
> Hi all,
> I have a list of purchases (market basket) and I would like to select
> non redundant longest possible patterns by eliminating
> (creating/populating other table to contain only non redandant itemsets)
> purchases having item lists which are fully included in at least one
> other purchase.

Here's a possibly slow and surely ugly solution (I think it's right,
though I haven't done more than passing testing):



CREATE VIEW togo_as_arr AS
  SELECT a.tid,
ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
AS items
  FROM togo a GROUP BY tid;

SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;



(the view isn't necessary, but does improve the readability of the query).

It groups the purchases up with item lists as arrays, then finds any
purchases with items arrays wholly contained by other item arrays from
other purchases.

I'm *sure* there's a smarter way to do this that avoids the use of
arrays, but I don't seem to be able to come up with one right now. It's
interesting, though, so I might keep fiddling.

--
Craig Ringer

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Removing redundant itemsets

2008-03-31 Thread Craig Ringer
Craig Ringer wrote:
> Allan Kamau wrote:
>> Hi all,
>> I have a list of purchases (market basket) and I would like to select
>> non redundant longest possible patterns by eliminating
>> (creating/populating other table to contain only non redandant itemsets)
>> purchases having item lists which are fully included in at least one
>> other purchase.
> 
> Here's a possibly slow and surely ugly solution (I think it's right,
> though I haven't done more than passing testing):
> 
> 
> 
> CREATE VIEW togo_as_arr AS
>   SELECT a.tid,
> ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
> AS items
>   FROM togo a GROUP BY tid;
> 
> SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
> FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
> WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;

Alternately:

-- Helps with the massively repeated subquery below
CREATE INDEX togo_by_tid_and_item ON togo(tid,item);

-- Find any `a' for which `item_from_a_is_in_b' is
-- true for all items in `a'
SELECT a_tid AS is_redundant, b_tid AS contained_by
FROM (
  -- For every item in every pair of purchases,
  -- determine whether the item in purchase `a'
  -- was also in purchase `b'.
  SELECT
a.tid AS a_tid,
b.tid AS b_tid,
a.item AS item,
EXISTS(
  -- Was this item from `a' also in the `b' purchase?
  SELECT 1 FROM togo x WHERE x.tid = b.tid AND x.item = a.item
) AS item_from_a_is_in_b
  FROM togo a INNER JOIN togo b ON (a.tid <> b.tid)
  GROUP BY a.tid, b.tid, a.item) AS item_containment
GROUP BY a_tid, b_tid
HAVING every(item_from_a_is_in_b);


... which avoids the array building, but is actually considerably slower
on the trivial test data. That's not too surprising given that this
approach requires a subquery:

   SELECT 1 FROM togo x WHERE x.tid = b.tid AND x.item = a.item

for EVERY item to be tested. Twice, actually, as each record appears in
both `a' and `b' positions.

I'd be very interested to see what happened on real world test data,
especially compared to doing the array accumulation based query off a
temporary table instead of a view.

I suspect it'll depend on the average number of items per purchase -
lots of items per purchase and the array building cost will dominate.
That's really just a guess, though.

I'm sure there's a properly smart way to do this that I just can't
figure out, but this is the best I've come up with so far.

--
Craig Ringer

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Removing redundant itemsets

2008-03-31 Thread Craig Ringer
> -- Find any `a' for which `item_from_a_is_in_b' is
> -- true for all items in `a'
> SELECT a_tid AS is_redundant, b_tid AS contained_by
> FROM (
>   -- For every item in every pair of purchases,
>   -- determine whether the item in purchase `a'
>   -- was also in purchase `b'.
>   SELECT
> a.tid AS a_tid,
> b.tid AS b_tid,
> a.item AS item,
> EXISTS(
>   -- Was this item from `a' also in the `b' purchase?
>   SELECT 1 FROM togo x WHERE x.tid = b.tid AND x.item = a.item
> ) AS item_from_a_is_in_b
>   FROM togo a INNER JOIN togo b ON (a.tid <> b.tid)
>   GROUP BY a.tid, b.tid, a.item) AS item_containment
> GROUP BY a_tid, b_tid
> HAVING every(item_from_a_is_in_b);

That really should've been written as:

SELECT
  a.tid AS is_redundant,
  b.tid AS contained_by
FROM togo a INNER JOIN togo b ON (a.tid <> b.tid)
GROUP BY a.tid, b.tid
HAVING
  EVERY(EXISTS(
SELECT 1 FROM togo x WHERE x.tid = b.tid AND x.item = a.item
  ));

... but I'm a bit of an idiot, and couldn't figure out why the
EVERY(EXISTS(subq)) wasn't working when testing it before.

Sorry for all the noise.

--
Craig Ringer

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Removing redundant itemsets

2008-03-31 Thread Allan Kamau

Craig,
Thank you so much for the solution. I have spent many hours since 
Thursday last week including the weekend (and it took you just a few 
minutes) trying to figure out a solution not involving procedural 
programming and looping (as the size of the items and even the number of 
"purchases" in the datasets I may be working with may be large), I was 
looking for a solution that may take (almost) polynomial time (and 
resources) and also make use of Postgresql refined and efficient engine. 
Your solution satisfies these requirements. Thanks.


Allan.

Craig Ringer wrote:

Allan Kamau wrote:
  

Hi all,
I have a list of purchases (market basket) and I would like to select
non redundant longest possible patterns by eliminating
(creating/populating other table to contain only non redandant itemsets)
purchases having item lists which are fully included in at least one
other purchase.



Here's a possibly slow and surely ugly solution (I think it's right,
though I haven't done more than passing testing):



CREATE VIEW togo_as_arr AS
  SELECT a.tid,
ARRAY(SELECT item FROM togo b WHERE b.tid = a.tid ORDER BY item)
AS items
  FROM togo a GROUP BY tid;

SELECT arr_a.tid AS redundant_tid, arr_b.tid AS contained_by
FROM togo_as_arr arr_a CROSS JOIN togo_as_arr arr_b
WHERE arr_a.tid <> arr_b.tid AND arr_a.items <@ arr_b.items;



(the view isn't necessary, but does improve the readability of the query).

It groups the purchases up with item lists as arrays, then finds any
purchases with items arrays wholly contained by other item arrays from
other purchases.

I'm *sure* there's a smarter way to do this that avoids the use of
arrays, but I don't seem to be able to come up with one right now. It's
interesting, though, so I might keep fiddling.

--
Craig Ringer
  



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] drop table where tableName like 'backup_2007%' ?

2008-03-31 Thread Emi Lu

Good morning,


Is there a command to drop tables whose name begins a specific string?


For example, all backup tables begins with backup_2007:
Drop table where tableName like 'backup_2007%'?


Thanks a lot!



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] drop table where tableName like 'backup_2007%' ?

2008-03-31 Thread Tom Lane
Emi Lu <[EMAIL PROTECTED]> writes:
> Is there a command to drop tables whose name begins a specific string?

No.  The standard answer to this type of problem is to write a little
plpgsql function that scans the appropriate catalog and issues commands
constructed with EXECUTE.

for r in select relname from pg_class where relname like 'backup_2007%'
loop
execute 'DROP TABLE ' || quote_ident(r);
end loop;

Note that the above is overly simplistic --- it doesn't pay attention
to schemas, for example.

Some people prefer to just print out the constructed commands into a
file, so they can eyeball them before actually executing them.

regards, tom lane

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] drop table where tableName like 'backup_2007%' ?

2008-03-31 Thread Richard Huxton

Tom Lane wrote:

Note that the above is overly simplistic --- it doesn't pay attention
to schemas, for example.


These are what I use.


BEGIN;

CREATE SCHEMA util;

CREATE OR REPLACE FUNCTION util.exec_all(objtype name, schname name, 
objname name, cmd TEXT) RETURNS text AS $$

DECLARE
r   RECORD;
sql TEXT;
out TEXT;
BEGIN
out := cmd || ': ';

-- Tables includes views
IF objtype = 'tables' THEN
FOR r IN SELECT table_name AS nm FROM information_schema.tables
WHERE table_schema=schname AND table_name LIKE objname
LOOP
			sql := regexp_replace(cmd, E'\\?', quote_ident(schname) || '.' || 
quote_ident(r.nm));

EXECUTE sql;
out := out || r.nm || ' ';
END LOOP;
ELSIF objtype = 'sequences' THEN
FOR r IN SELECT sequence_name AS nm FROM 
information_schema.sequences
WHERE sequence_schema=schname AND sequence_name LIKE objname
LOOP
			sql := regexp_replace(cmd, E'\\?', quote_ident(schname) || '.' || 
quote_ident(r.nm));

EXECUTE sql;
out := out || r.nm || ' ';
END LOOP;
END IF;

RETURN out;
END;
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION util.grant_all(objtype name, schname name, 
objname name, perms TEXT, roles TEXT) RETURNS text AS $$

DECLARE
r   RECORD;
sql TEXT;
g   TEXT;
BEGIN
g := perms || ' ON ' || schname || '( ';

-- Tables includes views
IF objtype = 'tables' THEN
FOR r IN SELECT table_name AS nm FROM information_schema.tables
WHERE table_schema=schname AND table_name LIKE objname
LOOP
			sql := 'GRANT ' || perms || ' ON TABLE ' || quote_ident(schname) || 
'.' || quote_ident(r.nm) || ' TO ' || roles;

-- RAISE NOTICE 'granting: %', sql;
EXECUTE sql;
g := g || r.nm || ' ';
END LOOP;
ELSIF objtype = 'sequences' THEN
FOR r IN SELECT sequence_name AS nm FROM 
information_schema.sequences
WHERE sequence_schema=schname AND sequence_name LIKE objname
LOOP
			sql := 'GRANT ' || perms || ' ON SEQUENCE ' || quote_ident(schname) 
|| '.' || quote_ident(r.nm) || ' TO ' || roles;

-- RAISE NOTICE 'granting: %', sql;
EXECUTE sql;
g := g || r.nm || ' ';
END LOOP;
END IF;
g := g || ') TO ' || roles;

RETURN g;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION util.grant_all(objtype name, schname name, objname 
name, perms TEXT, roles TEXT) IS

$$Pattern-matching permission granter.
objtype  - (tables|sequences) where "tables" includes views
schname  - target schema (NOT wildcarded)
objname  - wildcard (_%) name to match
perms- permissions to grant
roles- comma-separated list of roles to grant perms to.
$$;


CREATE OR REPLACE FUNCTION util.revoke_all(objtype name, schname name, 
objname name, perms TEXT, roles TEXT) RETURNS text AS $$

DECLARE
r   RECORD;
sql TEXT;
g   TEXT;
BEGIN
g := perms || ' ON ' || schname || '( ';

-- Tables includes views
IF objtype = 'tables' THEN
FOR r IN SELECT table_name AS nm FROM information_schema.tables
WHERE table_schema=schname AND table_name LIKE objname
LOOP
			sql := 'REVOKE ' || perms || ' ON TABLE ' || quote_ident(schname) || 
'.' || quote_ident(r.nm) || ' FROM ' || roles;

-- RAISE NOTICE 'granting: %', sql;
EXECUTE sql;
g := g || r.nm || ' ';
END LOOP;
ELSIF objtype = 'sequences' THEN
FOR r IN SELECT sequence_name AS nm FROM 
information_schema.sequences
WHERE sequence_schema=schname AND sequence_name LIKE objname
LOOP
			sql := 'REVOKE ' || perms || ' ON SEQUENCE ' || quote_ident(schname) 
|| '.' || quote_ident(r.nm) || ' FROM ' || roles;

-- RAISE NOTICE 'granting: %', sql;
EXECUTE sql;
g := g || r.nm || ' ';
END LOOP;
END IF;
g := g || ') TO ' || roles;

RETURN g;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION util.revoke_all(objtype name, schname name, objname 
name, perms TEXT, roles TEXT) IS

$$Pattern-matching permission revoker. See grant_all(...) for details.
$$;


CREATE OR REPLACE FUNCTION util.drop_all_roles(pattern name) RETURNS 
TEXT AS $$

DECLARE
rRECORD;
sql  TEXT;
res  TEXT;
BEGIN
res := 'Dropped: ';
FOR r IN SELECT rolname FROM pg_roles WHERE rolname LIKE pattern
LO

[SQL] specifying wal file size

2008-03-31 Thread gherzig
Hi dudes. Im trying to find out how to deal with this. I just receive a
1.2 MB wal file (may some 'windows home version' do that?), and obviosly,
when i try to recovery from there, get the 'incorrect file size' error.
Acording to the docs, wal file size can be changed at compile time. Not so
bad, i can do that. But where is that code?

BTW: 1163398 bytes seems like a wrong size for any wal file, isnt?
The .backup file may have been created under windows, and im working on
Linux. Could that be a problem?

Thanks!
Gerardo


-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] specifying wal file size

2008-03-31 Thread Tom Lane
[EMAIL PROTECTED] writes:
> Hi dudes. Im trying to find out how to deal with this. I just receive a
> 1.2 MB wal file (may some 'windows home version' do that?), and obviosly,
> when i try to recovery from there, get the 'incorrect file size' error.
> Acording to the docs, wal file size can be changed at compile time. Not so
> bad, i can do that. But where is that code?

> BTW: 1163398 bytes seems like a wrong size for any wal file, isnt?

WAL files must certainly be a power of 2 in size.  I think you're
looking at a file-transfer error, or else the file got corrupted
on the source system.  I rather doubt that your source knows how
to change the intended file size from 16MB anyway ;-)

regards, tom lane

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql