Re: [SQL] bibliographic schema

2004-10-25 Thread James Cloos
Try googling for:

bibtex sql

That should provide some pointers that may help produce an effective
SQL schema for storing bibliographic references.

-JimC
-- 
James H. Cloos, Jr. <[EMAIL PROTECTED]> 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[SQL] Two tables or three?

2005-10-29 Thread James Cloos
I'm designing a schema for an app that requires two large blobs per
row.  Given that querying does not hit those, I presume it is
beneficial to move the blobs out of the main table.

But should each blob be in its own table, or should they go into a
single, three column table?

The main table does need a sequential primary key column, so I expect
to use that value to reference the blobs by using it as a primary key
in the blobs' table(s).

Each row will have an associated first_blob; virtually all of them
will also have an associated second_blob.

The usage scenario has the user doing some queries on the data in
the main table and manually selecting one or more rows at a time;
upon selection the app will need to grab the blobs and show those.
Most of the time both will be grabbed by interactive users, but
noninteractive clients may only care about the first blob.

The blob table(s) will need to CASCADE row deletions from the
first table.  I expect to have the UI flag deletable rows and
use a vacuum-like process to do the actual deletions.  (In part
this will give some opportunity to 'un-delete', but I also presume
a performance benefit.  Yes?)
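The cascade-plus-reaper arrangement described above could be sketched
like this; all table and column names here are illustrative, not taken
from the actual app:

```sql
-- Hypothetical sketch of the split layout.
CREATE TABLE main (
    id      serial PRIMARY KEY,
    -- ... the queryable columns ...
    deleted boolean NOT NULL DEFAULT false  -- flagged by the UI, reaped later
);

CREATE TABLE blobs (
    id          integer PRIMARY KEY REFERENCES main (id) ON DELETE CASCADE,
    first_blob  bytea NOT NULL,
    second_blob bytea                       -- NULL for the rare rows without one
);

-- The vacuum-like reaper; the CASCADE removes the matching blobs row.
DELETE FROM main WHERE deleted;
```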

Given that usage pattern, is there any benefit for one or two
blob-specific table(s)?  Or for that matter any benefit for
splitting them out at all?

This will be on 8.1.  (Devel work is being done on the current
beta.)

Thanks,

-JimC
-- 
James H. Cloos, Jr. <[EMAIL PROTECTED]>



Re: [SQL] Two tables or three?

2005-10-29 Thread James Cloos
> "Tom" == Tom Lane <[EMAIL PROTECTED]> writes:

Tom> You shouldn't contort your schema artificially in order to do that.

Tom> Postgres automatically stores wide fields out-of-line ...

Cool.  Good to know it is even better than I thought. :)

-JimC
-- 
James H. Cloos, Jr. <[EMAIL PROTECTED]>



[SQL] counting related rows

2010-10-08 Thread James Cloos
I have a table which includes a text column containing posix-style
paths.  Ie, matching the regexp "^[^/]+(/[^/]+)*$".

I need to do a query of a number of columns from that table, plus the
count of rows which are "children" of the current row.

The query:

  SELECT count(*) AS nch FROM m WHERE o = (SELECT o FROM m WHERE id=30016)
AND name ILIKE (SELECT name || '/%' FROM m WHERE id=30016);

selects that extra column given the id.

A view containing all of the columns from m plus a column matching the
above select would cover my needs well.

But I haven't been able to get the syntax right.

The current code (which I did not write) uses one select to get the
values of id it wants, and then iterates through them selecting four
columns from the table and then the above.  As you may imagine, two
selects per row is *slow*.  Maybe 20 rows per second.  I expect a
single, complete select to take < 10 ms.

Again, to be clear, for each row I need the count of other rows which
have the same value for column o and whose name is a child path of the
current row's name.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] counting related rows

2010-10-09 Thread James Cloos
>>>>> "FB" == Frank Bax  writes:

FB> It would help if you provided:
FB> a) statements to create sample data
FB> b) expected results from sample data

FB> Does this do what you want?

FB> select * from m, (SELECT count(*) AS nch FROM m WHERE o = (SELECT o
FB> FROM m WHERE id=30016) AND name ILIKE (SELECT name || '/%' FROM m
FB> WHERE id=30016)) om;

That is almost right, except that it uses id=30016's nch value for every
row in the result, rather than computing each row's own nch.

As an example:

create TABLE m ( id integer primary key, o integer, name text,
 f1 integer, f2 integer, f3 integer);

insert into m values (1, 3, 'a', 0, 1, 1);
insert into m values (2, 3, 'a/short', 1, 0, 1);
insert into m values (3, 3, 'a/short/path', 1, 0, 0);
insert into m values (4, 4, 'nothing', 0, 0, 1);
insert into m values (5, 2, 'nothing', 0, 1, 0);
insert into m values (6, 2, 'nothing/of', 1, 0, 0);
insert into m values (7, 2, 'nothing/of/value', 0, 0, 0);

The select should result in something like:

 id | o |       name       | f1 | f2 | f3 | nch
----+---+------------------+----+----+----+-----
  1 | 3 | a                |  0 |  1 |  1 |   2
  2 | 3 | a/short          |  1 |  0 |  1 |   1
  3 | 3 | a/short/path     |  1 |  0 |  0 |   0
  4 | 4 | nothing          |  0 |  0 |  1 |   0
  5 | 2 | nothing          |  0 |  1 |  0 |   2
  6 | 2 | nothing/of       |  1 |  0 |  0 |   1
  7 | 2 | nothing/of/value |  0 |  0 |  0 |   0

since rows 2 and 3 are children of row 1, row 3 is also a child of
row 2, rows 6 and 7 are children of row 5 and row 7 is also a child
of row 6.
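A correlated subquery computes each row's own count rather than reusing
one row's value; an untested sketch against the sample table, which
should match the expected results above:

```sql
SELECT m.*,
       (SELECT count(*)
          FROM m AS c
         WHERE c.o = m.o
           AND c.name ILIKE m.name || '/%') AS nch
  FROM m
 ORDER BY id;
```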

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6



Re: [SQL] counting related rows

2010-10-15 Thread James Cloos
After further thought, I tried using a function:

CREATE OR REPLACE FUNCTION children ( ow integer, parent text) returns integer 
AS $$
select count(*) as children from m where o = $1 and name ilike $2 || '/%';
$$
LANGUAGE sql;

An example call is:

select o, name, f1, f2, (select children(o,name) as children) from m where o=3;

Which worked, but was no faster than selecting all of the matching ids
and iterating through them on the client, finding each row's child count
in its own select.  Ie, it took about 1 ks (1000 s) for about 20k rows.

So it looks like the real solution is to add a column to track the
number of children and update it, for each "parent" row via a trigger
whenever a row is added, removed or the path column of a row is changed.
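Such a trigger might look roughly like the following; this is an
untested sketch (the nch column, function, and trigger names are mine),
with a guard so the counter updates it issues do not re-fire the trigger:

```sql
ALTER TABLE m ADD COLUMN nch integer NOT NULL DEFAULT 0;

-- Populate once; the guard below makes this UPDATE a no-op for the trigger.
UPDATE m SET nch = (SELECT count(*) FROM m c
                     WHERE c.o = m.o AND c.name ILIKE m.name || '/%');

CREATE OR REPLACE FUNCTION m_adjust_nch() RETURNS trigger AS $$
BEGIN
    -- Updates that touch neither o nor name (e.g. the nch updates below)
    -- cannot change any counts, so bail out and avoid recursion.
    IF TG_OP = 'UPDATE' AND NEW.o = OLD.o AND NEW.name = OLD.name THEN
        RETURN NULL;
    END IF;
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE m SET nch = nch - 1             -- old ancestors lose a child
         WHERE o = OLD.o AND OLD.name ILIKE name || '/%';
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE m SET nch = nch + 1             -- new ancestors gain a child
         WHERE o = NEW.o AND NEW.name ILIKE name || '/%';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER m_nch AFTER INSERT OR UPDATE OR DELETE ON m
    FOR EACH ROW EXECUTE PROCEDURE m_adjust_nch();
```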

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6



Re: [SQL] large xml database

2010-10-30 Thread James Cloos
>>>>> "VB" == Viktor Bojović  writes:

VB> i have very big XML documment which is larger than 50GB and want to
VB> import it into databse, and transform it to relational schema.

Were I doing such a conversion, I'd use Perl to convert the XML into
something which COPY can grok.  Any other language, script or compiled,
would work just as well.  The goal is to avoid having to slurp the whole
XML structure into memory.
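On the database side, the converted stream can then be bulk-loaded
directly; a hypothetical sketch (the table and file names are
illustrative only):

```sql
-- Hypothetical target table; the converter emits one tab-separated
-- line per row, which COPY streams in without any server-side XML work.
CREATE TABLE entries (
    id    integer PRIMARY KEY,
    name  text,
    value text
);

COPY entries (id, name, value) FROM '/path/to/entries.tsv';
```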

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6



[SQL] A more efficient way?

2010-10-31 Thread James Cloos
I've a third-party app which is slower than it ought to be.

It does one operation in particular by doing a query to get a list of
rows, and then iterates through them to confirm whether it actually wants
that row.  As you can imagine that is very slow.

This query gets the correct data set in one go, but is also slow:

 select p from m where s = 7 and p not in (select p from m where s != 7);

The p column is not unique; the goal is the set of p for which *every*
row with a matching p also has s=7.  (s and p are both integer columns,
if that matters.)

That takes about 38 seconds with this (explain analyze) plan:

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using m_5 on m  (cost=8141.99..11519.73 rows=1 width=4)
                            (actual time=564.689..37964.163 rows=243 loops=1)
   Index Cond: (s = 7)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=8141.99..13586.24 rows=523955 width=4)
                      (actual time=0.003..80.734 rows=523948 loops=243)
           ->  Seq Scan on m  (cost=0.00..7413.34 rows=523955 width=4)
                              (actual time=0.023..259.901 rows=523948 loops=1)
                 Filter: (s <> 7)
 Total runtime: 38121.781 ms

The index m_5 is btree (s).  

Can that be done in a way which requires only one loop?

My understanding is that:

 select p from m where s = 7  except select p from m where s != 7;

will return the same results as a simple 'select p from m where s = 7', yes?

Is there a way to do it with a self join which requires just a single loop?
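For reference, two formulations that avoid the per-row subplan entirely;
both are untested sketches, not tuned against the actual data:

```sql
-- Aggregate once over the whole table: keep p only if every row
-- carrying that p has s = 7.
SELECT p FROM m GROUP BY p HAVING bool_and(s = 7);

-- Or an anti-join, which the planner can often satisfy in one pass;
-- DISTINCT because p is not unique.
SELECT DISTINCT a.p
  FROM m a
 WHERE a.s = 7
   AND NOT EXISTS (SELECT 1 FROM m b WHERE b.p = a.p AND b.s <> 7);
```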

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6



[SQL] where clause subqueries vs multiple rows results

2006-09-10 Thread James Cloos
I've a query which I'd have liked to word akin to:

  SELECT guid FROM child WHERE the_fkey =
 ( SELECT id FROM parent WHERE name ~ 'some_regex' )
 ORDER BY the_fkey, my_pkey;

I got around it by doing the SELECT id first, and then doing a SELECT
guid for each row returned, appending the results together.

Can that be done in a single query, instead of 1+n queries?

Thanks,

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 0xED7DAEA6



Re: [SQL] where clause subqueries vs multiple rows results

2006-09-11 Thread James Cloos
[SIGH]

I was getting the syntax wrong.  Just using ON rather than = fixed
the mistake.
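Presumably the working forms look something like these sketches against
the column names from my original mail (an IN would also handle the
multi-row subquery):

```sql
-- The join form, using ON:
SELECT child.guid
  FROM child JOIN parent ON child.the_fkey = parent.id
 WHERE parent.name ~ 'some_regex'
 ORDER BY child.the_fkey, child.my_pkey;

-- Or keep the subquery, with IN rather than =:
SELECT guid FROM child
 WHERE the_fkey IN (SELECT id FROM parent WHERE name ~ 'some_regex')
 ORDER BY the_fkey, my_pkey;
```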

Time to crash for the night (day?), it seems.

Thanks for the replies; if I hadn't figured it out myself, they
would have pushed me in the right direction.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 0xED7DAEA6



[SQL] Escape sequence for UTF-8 Character Literals?

2008-01-10 Thread James Cloos
I’ve a data set in a text file which uses the U+ syntax for UCS
characters, which I want to enter into a (utf8) db using the actual
characters rather than the codepoint names.

The docs give the impression that eg E'\x91D1' ought to be the same
as '金', but my tests show that \x only accepts 2 hex digits.

Is doing the conversion client side the only way to do this?

I’m on 8.2, if the answer is version-dependent.
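For what it's worth, on 8.2 the conversion probably does have to happen
client-side; later releases offer server-side options.  Untested
sketches, assuming a UTF8 database:

```sql
-- From 8.3 on, chr() takes a Unicode code point in a UTF8 database,
-- so U+91D1 ('金') is:
SELECT chr(37329);   -- 37329 = 0x91D1

-- 9.0 adds Unicode-escape string syntax:
SELECT U&'\91D1';
```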

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6


