[GENERAL] array_agg problem

2011-08-19 Thread TJ O'Donnell
array_agg says it can take any type of arg, but this seems not to work
when the arg is an integer array.

create temp table x(name text, val integer[]);
insert into x values('a',array[1,2,3]);
insert into x values('b',array[3,4,5]);
select * from x;
select max(val), min(val) from x;
select array_agg(val) from x;

Well, everything works except the last statement.  In 8.4 I get
ERROR:  could not find array type for data type integer[]

I was hoping for [1,2,3,4,5] or at least [1,2,3,3,4,5] or even [[1,2,3],[3,4,5]]
Am I not understanding something?
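For reference, one way to get the flat {1,2,3,3,4,5} result on 8.4 is a custom aggregate built on array_cat, which concatenates the input arrays instead of trying to build an array of arrays (a sketch; the aggregate name array_accum is invented):

```sql
-- Concatenate arrays as they are aggregated, yielding one flat array.
CREATE AGGREGATE array_accum (anyarray) (
    SFUNC = array_cat,
    STYPE = anyarray,
    INITCOND = '{}'
);

SELECT array_accum(val) FROM x;  -- {1,2,3,3,4,5}
```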

Thanks,
TJ O'Donnell

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] plpython returns integer[] fails for multi-dimensional array

2010-12-21 Thread TJ O'Donnell
In postgresql-9.0.1 I have to modify my plpython functions that return arrays.
It seems one-dimensional arrays are handled properly, but not
2-dimensional arrays.

create or replace function atest() returns integer[] as $eopy$
 a = list()
 a.append(1)
 a.append(2)
 a.append(3)
 #return a works fine
 b = list()
 b.append(a)
 b.append(a)
 # error
 return b
$eopy$ language plpythonu;


select atest() gives
obtest=# select atest();
ERROR:  invalid input syntax for integer: [1, 2, 3]
CONTEXT:  while creating return value
PL/Python function atest

How can I return multi-dimensional arrays in plpython?
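Until multi-dimensional support arrives, one workaround (an untested sketch; the function and helper names are mine) is to declare the function RETURNS text, build the SQL array literal yourself, and cast at the call site with select atest_text()::integer[]. The literal-building part, as plain Python:

```python
# Build a PostgreSQL array literal (e.g. {{1,2,3},{1,2,3}}) from a
# nested list of ints.  Inside a plpythonu function declared
# RETURNS text, you would `return pg_array_literal(b)` and cast the
# result back to integer[] at the call site.
def pg_array_literal(rows):
    return "{" + ",".join(
        "{" + ",".join(str(v) for v in row) + "}" for row in rows
    ) + "}"

a = [1, 2, 3]
print(pg_array_literal([a, a]))  # {{1,2,3},{1,2,3}}
```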

TJ O'Donnell



Re: [GENERAL] plpython returns integer[] fails for multi-dimensional array

2010-12-21 Thread TJ O'Donnell
In previous versions (8.x), for a plpython fn returning integer[],
I created (had to create) a string in the proper SQL format,
{ {1,2,3}, {4,5,6} }, and returned that.  It worked fine.

I LIKE the ability to not have to do that in 9.0,
but I CAN'T return a string like { {1,2,3}, {4,5,6} } for a fn that
returns integer[],
AND I can't return a two-dimensional array.  Not a happy 9.0 camper.

Anyone know of any plans for 9.0 plpython to support multi-dimensional arrays?

TJ O'Donnell

On Tue, Dec 21, 2010 at 4:02 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Tuesday 21 December 2010 3:25:48 pm Peter Geoghegan wrote:
 On 21 December 2010 23:17, Thom Brown t...@linux.com wrote:
  Are you sure that a returns okay in that scenario?  You're using a
  list.  Shouldn't you be using an array?  Like: a = []

 a = [] actually declares an empty list in Python. You can return a list
 or a tuple from a pl/python function in 9.0 and it will be interpreted
 as an array at the SQL call site. You cannot in prior versions.

 --
 Regards,
 Peter Geoghegan

 Digging into the source for plpython seems to show it only supports one
 dimensional arrays. When I tried my previous example on a 9.0.1 instance it
 kept changing integer[][] to integer[].

 --
 Adrian Klaver
 adrian.kla...@gmail.com




Re: [GENERAL] [pgsql-sql] Daily digest v1.3328 (5 messages)

2010-09-23 Thread TJ O'Donnell
Yes indeed, pg_query returns 't' or 'f' as strings when selecting
boolean columns.
I was able to switch over to PDO with only an afternoon's work and it works
perfectly for boolean columns, returning values that are properly interpreted
as true and false by php, and by json_encode.  I'm all set now.
Thanks for the help!

TJ O'Donnell

On Thu, Sep 23, 2010 at 1:18 AM, Raymond O'Donnell r...@iol.ie wrote:
 On 23/09/2010 02:00, Adrian Klaver wrote:

 On Wednesday 22 September 2010 5:40:55 pm David Wilson wrote:

 On Wed, Sep 22, 2010 at 8:30 PM, Adrian
 Klaveradrian.kla...@gmail.comwrote:

  From here;

 http://www.postgresql.org/docs/8.4/interactive/datatype-boolean.html

 I believe the question relates to changing the string *output* of the
 database to something other than 't' and 'f', not an issue with input
 formats.

 Oops missed that. This would seem to be an adapter problem. I work with
 Python and psycopg2 adapts the boolean values to True and False. Not sure
 about the PHP solution.

 AFAIK the native functions (pg_query() etc) output 't' and 'f', but PDO
 (which I've been using in the context of Zend Framework) output real true
 and false values.

 Ray.

 --
 Raymond O'Donnell :: Galway :: Ireland
 r...@iol.ie




Re: [GENERAL] [pgsql-sql] Daily digest v1.3328 (5 messages)

2010-09-22 Thread TJ O'Donnell
Many frameworks do not natively recognize 't' as true and 'f' as false.
I'm using php, json_encode and extjs.
Is it possible to cause the default output of boolean values to be something
other than 't' and 'f', say true and false or even 1 and 0?
Of course I can do this for an individual query using appropriate functions, but
it would help me to be able to modify the default behavior.
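For the per-query route, an explicit cast changes what the driver sees (a sketch with hypothetical table and column names):

```sql
-- integer 1/0 instead of 't'/'f'
SELECT id, active::int AS active FROM users;

-- or the words true/false, as text
SELECT id, CASE WHEN active THEN 'true' ELSE 'false' END AS active FROM users;
```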

TJ



Re: [GENERAL] Lost password

2008-12-16 Thread TJ O'Donnell

It seems you haven't done anything you need to save, so why
not save time and just reinstall postgres?

TJ

Hi 
 
I started the installation of postgres, got distracted, and then started again, but forgot my password.





Re: [GENERAL] options for launching sql script asynchronously from web app

2008-11-08 Thread TJ O'Donnell

The only way to launch an ~asynchronous~ request from a web app is using
XMLHttpRequest.  That would be answered by a web server of
some sort.  The server can launch any app you want - to run a sql
script, say.  Authentication can be done with passwords, as usual.
Security may be an issue - you should use https, and not hard-code
passwords...all the usual security precautions.

Anything ready?  Hard to say without knowing more about your
specific needs.

TJ O'Donnell
http://www.gnova.com


I'd like to launch some sql script asynchronously from a web app and
have some kind of feedback later.

Some form of authentication would be a plus.

Is there anything ready?





[GENERAL] cool code_swarm animation of PostgreSQL development since 1996

2008-06-16 Thread TJ O'Donnell

This is a very cool animation for your
amusement, amazement and edification.

http://www.vimeo.com/1081680

TJ O'Donnell
http://www.gnova.com/



Re: [GENERAL] PostgreSQL Array Use

2008-03-03 Thread TJ O'Donnell

I use arrays of integers, double or numeric, and text.
They are 1 or 2 dimensional.
They are usually limited to 100 elements in
one dimension and 10 in the other.

TJ O'Donnell
http://www.gnova.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Function to convert from TEXT to BYTEA?

2007-12-12 Thread TJ O'Donnell

I think decode('your string', 'escape') might be what you need.
It returns bytea when 'your string' is type text or varchar.

TJ
http://www.gnova.com

Is there a function that will do what I want to convert the datatype 
without having to create a CAST that PostgreSQL doesn't have natively? 
How else are you supposed to use the ENCRYPT and DECRYPT functions?





[GENERAL] a provocative question?

2007-09-06 Thread TJ O'Donnell

I am getting in the habit of storing much of my day-to-day
information in postgres, rather than flat files.
I have not had any problems of data corruption or loss,
but others have warned me against abandoning files.
I like the benefits of enforced data types, powerful searching,
data integrity, etc.
But I worry a bit about the safety of my data, residing
in a big scary database, instead of a simple friendly
folder-based files system.

I ran across this quote on Wikipedia at
http://en.wikipedia.org/wiki/Eudora_%28e-mail_client%29

"Text files are also much safer than databases, in that should disk
corruption occur, most of the mail is likely to be unaffected, and any
that is damaged can usually be recovered."


How naive (optimistic?) is it to think that the database can
replace the filesystem?

TJ O'Donnell
http://www.gnova.com/



Re: [GENERAL] Functions on tables

2006-12-16 Thread TJ O'Donnell

I wholeheartedly support the approach BJ is advocating.
The notion that methods (functions) and variables (tables)
can be defined together is a very useful OO approach.
I too find it difficult to recall which functions belong
to which tables.  Of course, some of my functions are very
generic and wouldn't appropriately belong to any one table,
but many are meant solely to operate on data in one type of
object (table).

I've taken to using schemas to collect together functions and tables
that belong together.  This requires the use of the schema name,
as you say BJ,

... so I'm not passionately attached to the idea of being able to call
the method without prefixing the table name.


In my approach, the schema name becomes the object name and the 
functions belong to the schema.  Most OO approaches only allow one

definition of variables (tables), and I can easily allow each schema
to have only one table.  But I can also use multiple tables.
The extra tables can be instances, much like BJ's use of rows as
instances.  Using separate tables allows me to have groups of
instances that are grouped together for some reason.  I can also
have tables that are sub-classes of the original table.

TJ
http://www.gnova.com/



Re: [GENERAL] R and postgres

2006-11-07 Thread TJ O'Donnell

I use RODBC which is available from http://cran.r-project.org/
I'm not sure if this will do what you want, or whether it has
the features of Rdbi you need, but it gets the job done for me.
I can open a channel, execute a SQL statement (typically a Select)
and read the results back into a R dataframe.
There is a nice way to dump a dataframe back into a table which
is created for you, with columns and datatypes as appropriate.

Hope this helps.

TJ O'Donnell
http://www.gnova.com/


I'd like to get R to talk to postgresql, but my head's spinning among a
web of broken links, way outdated web pages, project deprecation
announcements and a terrible signal:link ratio.

Rdbi and RdbiPgSQL seem to be the answer, despite both being apparently
defunct projects.

What is the Right Thing for a guy who wants R to talk to postgresql? 




Re: [GENERAL] Changing encoding of a database

2006-06-23 Thread TJ O'Donnell
 We've PostgreSQL database, with SQL_ASCII or LATIN1 encoding. We would
 like to migrate them to UNICODE. Is there some contributed/available
 script, or this is something we should do at hand?
I had a similar problem migrating from 7.4 to 8.1 and wanting to
go from sql_ascii to utf8.  I did the following:

pg_dump -p 5433 --encoding ISO_8859_7 -t cas tj |psql tj

where the dump connected to 7.4 (port 5433) and interpreted the
cas data using ISO_8859_7.  psql connected to 8.1
I had to experiment to find that ISO_8859_7 was the proper
encoding - I had some Greek (math and chemistry) letters which
were accommodated by sql_ascii, but not quite properly.
The output from pg_dump above properly converts to utf8,
which 8.1 (I set the default encoding to utf8) accepts without complaint.

See http://www.postgresql.org/docs/8.1/static/multibyte.html
for all the other encodings.

I don't think the above will convert a table in place, but could be
used to create a copy with changed encoding.
Hope this helps.

TJ





[GENERAL] aggregate of bitstrings

2006-06-22 Thread TJ O'Donnell

AHA!  I hadn't encountered any null values in my bitstrings,
and having to include 'where xxx is not null' is rather
inconvenient and easy to forget.

Indeed, bitor(B'1000', null) returns null.

But as a test, I nulled one row's data (the bitstring column portion only),
and my orsum without INITCOND returned the proper OR of the remaining values
when including all rows in the aggregate.  It did not return null.

Maybe the aggregator handles null args differently,
not calling the func when the arg is null?  (That is in fact what happens
when the transition function is strict: null input rows are simply skipped.)

pg8.1.3

TJ


Florian G. Pflug wrote:

TJ O'Donnell wrote:


create aggregate or_bit64(
   basetype=bit,
   sfunc=bitor,
   stype=bit,
   initcond=''
);

I'm using this in production, and it works fine. I didn't find a way to
make this length-agnostic, so I defined this for all lengths of
bitstrings that my app uses (32 and 64).


greetings, Florian Pflug



I've created a similar aggregate using:
CREATE AGGREGATE gnova.orsum (
BASETYPE = bit,
SFUNC = bitor,
STYPE = bit
);
Notice, not using INITCOND allows bit of any length.  While it may be
poor programming practice to not initialize, the docs say
"If it is not supplied then the state value starts out null.",
which is good enough for this old programmer.   AND it works :)


The problem was, as far as I remember, that bitor returns NULL if any
of its arguments is null. So not specifying an INITCOND makes the
aggregate work for any length, but always returns null then...

greetings, Florian Pflug




[GENERAL] recompiling c-language functions with new releases of postgres

2006-05-24 Thread TJ O'Donnell

I have several c-language extensions to postgres that I
distribute.  I generally have to recompile my code
with a new release of postgres, although sometimes it
appears that it functions properly (passes my tests at least)
without a recompile.  Since my users are all on various
releases of postgres, from 7.4.x to 8.1.x, I would like
to minimize the number of versions of my code that I
need to maintain and distribute.

Presumably, the only reason I would HAVE TO recompile
is when some header file changes.  Is there any guarantee
that header files DO NOT change, for example from
7.4.5 to 7.4.8 or even 7.4.12?  Can I assume that header
file changes only occur between major pg changes, such as
7.3 to 7.4, or 8.0 to 8.1?

TJ



Re: [GENERAL] recompiling c-language functions with new releases of postgres

2006-05-24 Thread TJ O'Donnell




TJ O'Donnell [EMAIL PROTECTED] writes:


Presumably, the only reason I would HAVE TO recompile
is when some header file changes.  Is there any guarantee
that header files DO NOT change, for example from
7.4.5 to 7.4.8 or even 7.4.12?  Can I assume that header
file changes only occur between major pg changes, such as
7.3 to 7.4, or 8.0 to 8.1?



Uh, no, not really; see complaint from Thomas Hallgren in -hackers
just yesterday.  We don't normally change internal APIs in patch
releases; in fact we don't change anything we don't have to.  But
we will change 'em if needed to fix a bug.

You might want to eyeball the proposed magic block for loadable
modules:
http://archives.postgresql.org/pgsql-patches/2006-05/msg00124.php

regards, tom lane


I understand and appreciate bug fixes, but isn't one of the purposes of
major releases to provide some stability (say of API) within
the major release?  I know in some software systems (and users complain
about this) some bug fixes which would require API, or other major changes
are postponed until the next major release.  Maybe the changes Thomas
Hallgren was pointing out in 8.1.4 are quite rare and we both realized
at the same time that we were not in Utopia.

As far as I can see, the magic block stuff would only work BETWEEN
major releases, so this would not help me (much) or Thomas' 8.1.4+ woes.


It now only checks four things:

Major version number (7.4 or 8.1 for example)
NAMEDATALEN
FUNC_MAX_ARGS
INDEX_MAX_KEYS


TJ



[GENERAL] postgres vs. oracle for very large tables

2006-05-15 Thread TJ O'Donnell

I've written some extensions to postgres to implement
chemical structure searching.  I get inquiries about
the performance of postgres vs. oracle.  This is a huge
topic, with lots of opinions and lots of facts.  But,
today I got some feedback stating the opinion that:
Postgres performance diminishes with large tables
 (we’ll be going to upwards of hundreds of millions of rows).

Is this pure speculation, opinion, known fact?
Does anyone know of measured performance of postgres
vs. oracle, specifically with very large tables?

TJ O'Donnell
www.gnova.com



[GENERAL] efficiency of group by 1 order by 1

2006-03-17 Thread TJ O'Donnell

 Is there a way to eliminate the ugly repeated use of
 date_trunc('day',endtime)?


In this particular case you could say

... GROUP BY 1 ORDER BY 1;

I use a similar SQL, e.g.:
 select func(x) ... group by func(x) order by func(x)
but my func is rather expensive.  Is func(x) evaluated three times
in the above statement?  Would it be evaluated only once if I used
 select func(x) ... group by 1 order by 1
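For the date_trunc case quoted above, the positional form lets the expression appear just once in the statement (hypothetical table name):

```sql
SELECT date_trunc('day', endtime) AS day, count(*)
FROM events
GROUP BY 1
ORDER BY 1;
```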

TJ O'Donnell
www.gnova.com



[GENERAL] Five reasons why you should never use PostgreSQL -- ever

2006-03-15 Thread TJ O'Donnell

Slashdot had this today.

http://searchopensource.techtarget.com/originalContent/0,289142,sid39_gci1172668,00.html

TJ O'Donnell
www.gnova.com



Re: [GENERAL] I see this as the end of BDB in MySQL without a doubt.

2006-02-15 Thread TJ O'Donnell

Chad [EMAIL PROTECTED] wrote:
What we need now is an open source DB with clean APIs into various
places in the software stack (eg we need a Berkeley DB kind of API
under the hood into something like Postgres) A full bells and whistles
relational DB with these low level ACCESS APIs will be a powerful
thing in the future. PostgreSQL take note. If you don't already have it
you should begin exposing such a thing today in my humble opinion.

I am quite happy with the c-language API for postgres, as far as its
capabilities and access to low-level postgres go.  OK, the docs and examples could
be better.  Am I missing something in Chad's comments/warnings, or is he
missing something in not understanding pg better?
Chad, could you say more about what in the BDB/API is missing and needed in 
postgres?

Could it be that Oracle's recent company purchases were intended simply to
confuse people about the future of MySQL, and therefore encourage them to
select Oracle?

TJ O'Donnell
http://www.gnova.com/



Re: [GENERAL] Converting Text to Bytea

2006-02-01 Thread TJ O'Donnell

decode(your_string,'escape') will convert text to bytea
You can just use this as a sort of type cast, or:
1. Create a new bytea column in your table
2. Update the table, setting the newcolumn = decode(oldcolumn,'escape')
3. Drop the oldcolumn (or not)
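The three steps, as SQL (hypothetical table and column names):

```sql
ALTER TABLE mytable ADD COLUMN newcolumn bytea;
UPDATE mytable SET newcolumn = decode(oldcolumn, 'escape');
ALTER TABLE mytable DROP COLUMN oldcolumn;  -- step 3, optional
```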

TJ


Date: Wed, 01 Feb 2006 10:04:36 +
From: Howard Cole [EMAIL PROTECTED]
To: 'PgSql General' pgsql-general@postgresql.org
Subject: Converting Text to Bytea
Message-ID: [EMAIL PROTECTED]

Hi,

Can anyone tell me how to convert a text column to a bytea column?

Thanks.

Howard Cole
www.selestial.com



[GENERAL] Oracle DB Worm Code Published

2006-01-07 Thread TJ O'Donnell

A recent article about an Oracle worm:
http://www.eweek.com/article2/0,1895,1880648,00.asp
got me wondering.
Could a worm like this infect a PostgreSQL installation?
It seems to depend on default usernames and passwords -
and lazy DBAs, IMO.
Isn't it true that PostgreSQL doesn't have any default user/password?
Is this an issue we should be concerned about, at some level?

TJ O'Donnell



Re: [GENERAL] resetting superuser password

2005-11-09 Thread TJ O'Donnell

I would unix su, edit pg_hba.conf to allow open access
temporarily, connect to pg and change the postgres password.
Don't forget to change pg_hba.conf back again to
password-protect the db!

TJ



I have lost the superuser (user postgres) password, but I still have
the unix root password. Is there anything I can do to reset the
postgres user's password (short of reinstalling Postgres)?







[GENERAL] using new bitmap scans to index bit columns?

2005-11-09 Thread TJ O'Donnell

I like the new bitmap scans and I'm wondering if there is any way
I can utilize them for my bitmask column, defined as bit(1024).
I use this column as a sort of fingerprint to quickly scan my tables.
But it is a scan, not an index.  I have not figured out a way to
index the bitmask column.  Is there some way it can be used as
an index now that there are bitmap scans in 8.1?

Currently I do this:
Select * from mytable where contains(bitmask, fingerprint(user_data))
 and matches(datacolumn, user_data);

user_data is a string, like a regexp but with different semantics for chemical 
data.
bitmask is precomputed/stored as bit(1024) = fingerprint(datacolumn)
contains(a,b) returns bool as 'select b = (a & b);'

This works well because matches() is an expensive functions.
But it would work better if bitmask could be indexed, no?

TJ O'Donnell



Re: [GENERAL] new type question

2005-10-17 Thread TJ O'Donnell

I was needing something similar last week, not to throw an error,
but to catch an error when 99% of my column's data is real, but some
is not (e.g. '1.2-1.4' or '32.7').  I wanted to do it in pure
SQL, but settled on this.  Is there a way to do this in pure
SQL (so it will be faster)?

Declare x real;
Begin
  x = cast($1 as real);
  Return x;
Exception
  When OTHERS Then
Return NULL;
End;
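As for doing it in pure SQL: one sketch (function name invented, untested) screens with a regular expression before casting, so the cast can never raise; the pattern below covers plain integer and decimal forms only, not exponents:

```sql
CREATE OR REPLACE FUNCTION safe_real(text) RETURNS real AS $$
    SELECT CASE WHEN $1 ~ '^[-+]?[0-9]+([.][0-9]+)?$'
                THEN $1::real
                ELSE NULL::real
           END;
$$ LANGUAGE sql IMMUTABLE;
```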

Thanks,
TJ


On a side note, do you really want to punt to 0 when an invalid value
comes it? That sounds like something MySQL would do... ISTM you should
throw an error.

Also, you could have written that as a pure SQL function, which would
have been faster (assuming you could use something other than C for
this).

create function uint_in(val cstring) returns uint2 as

$$
declare thisval int4;
begin
 thisval := val::int4;
 if thisval between 0 and 65535 then
  return (thisval-32768)::int2;
 else
  return 0;
 end if;
end
$$ language 'plpgsql';







[GENERAL] dynamic loading of .so

2005-10-14 Thread TJ O'Donnell

I have begun to use some static variables in my c-language
functions to maintain state (3rd party licensing issues) during
the course of a session (postgres process, spawned by postmaster).
These are declared static outside the scope of any function.
(is global the correct term anymore?)
When I use dynamic loading of my .so,
each session is independent, with its own static variables.
Will the same be true if I were to load the .so once when
the database starts up?  Or will there be just one set of
static variables for all sessions?
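Each session keeps its own statics either way, because every backend is a separate forked process, and fork gives each process its own (copy-on-write) copy of file-scope data; loading the .so before the fork does not make the state shared. The same semantics can be sketched with a plain fork - here in Python rather than C, purely for brevity, with invented names:

```python
import os

def forked_static_demo():
    """Fork; the child mutates its copy of `state`; return the parent's copy."""
    state = {"counter": 1}        # stand-in for file-scope static state,
                                  # set before the fork (like preloading the .so)
    pid = os.fork()
    if pid == 0:
        state["counter"] += 10    # only the child's copy changes
        os._exit(0)
    os.waitpid(pid, 0)
    return state["counter"]       # parent's copy is unaffected

print(forked_static_demo())  # 1
```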

TJ





Re: [GENERAL] problem with bit(n) type

2005-10-12 Thread TJ O'Donnell

Binary data in bit(n) has no text equivalent, and so cannot be cast as text.
It has an external representation, e.g. B'10010101', but this is not
the same as a cast.  In some cases, where the bit(n) has (n mod 8) = 0
and the bitstring happens to be valid ascii in the text range, one could
say there is a text equivalent, but in general not.

I think you're looking for some kind of printf-type %b function?  Maybe
someone has written one, or maybe you'll have to do so.  If you do,
how about a hex representation of bit(n) - I could use that ;)

TJ O'Donnell
http://www.gnova.com/
-

But this is the big problem for me... Is there a way to cast bit(n) to
text or varchar?

thanks in advance.

Sergey Karin



[GENERAL] c-language function .h file changes between (major) versions

2005-09-26 Thread TJ O'Donnell

I've been developing c-language functions in 7.4.5 and 7.4.8.
I had not encountered a need to recompile, because of pg .h file,
or other differences.  I expected a need to do so when upgrading
to 8.0.3.  But I forgot to do so and it worked!!  I figured I
lucked out, a suspicion which was confirmed when my functions
failed with 8.1beta1, but only some functions.  A recompile
restored correct operation under 8.1beta1.

So, my question is: Is there some hard and fast rule about when
one needs to recompile c-language functions from one pg version to
another?  Always for a major version change?  minor?  I suppose
the wise thing is to recompile with ANY pg version change.

Thanks,
TJ



Re: [GENERAL] do separate databases have any impact each other?

2005-08-16 Thread TJ O'Donnell



Alvaro Herrera wrote:

On Mon, Aug 15, 2005 at 01:15:03PM -0700, TJ O'Donnell wrote:

While writing installation instructions for my new PostgreSQL product,
I found myself writing the following sentence:
For first time users, we recommend building the gnova database,
since it has no impact on other databases.



A problem not mentioned already is that if your database contains a C
function and it has a bug which results in a crash, your whole
production database will go down.  This may not be desirable.


Well I have the perfect solution for that grin
I just won't write any buggy code!
Seriously, this could be a problem, especially since there is
3rd party software involved, too.  Guess I've opened myself up for some
serious testing procedures.  That's a good thing.
Since I'm writing in C++ with a thin interface to C for pg,
I catch a lot of nasty situations which used to crash my C programs.

TJ




[GENERAL] do separate databases have any impact each other?

2005-08-15 Thread TJ O'Donnell

While writing installation instructions for my new PostgreSQL product,
I found myself writing the following sentence:
For first time users, we recommend building the gnova database,
since it has no impact on other databases.

Is this really true?  Of course, my gnova database will take some disk space,
but other than that can y'all think of other concerns a big production database
admin might have about installing a new (smallish) database?
Is it true that if no one has begun a session involving my database, it simply
sits on disk and cannot possibly interfere with other databases/sessions?
When a session ends that had opened my database, do all traces of it disappear,
except its life on disk?  How about .so's it might have caused to be loaded?

TJ




[GENERAL] append to the current search_path

2005-08-12 Thread TJ O'Donnell

How can I append to the current search_path?
How can I put $user (back) into the search path?
These commands all fail:
 set search_path  $user,public,gnova;
 set search_path \$user,public,gnova;
 set search_path '$user',public,gnova;

Ideally, I would like something like PATH=${PATH}:/usr/local/bin
where I can get the current value and just append to it.
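Two things that do work (a sketch): double-quote "$user" so it is kept as the per-user schema marker, and combine current_setting with set_config to append to whatever is current:

```sql
-- "$user" must be double-quoted to survive as the per-user schema marker
SET search_path TO "$user", public, gnova;

-- append to the current value, PATH-style
SELECT set_config('search_path',
                  current_setting('search_path') || ', gnova',
                  false);
```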

TJ



Re: [GENERAL] best way to reference tables

2005-08-10 Thread TJ O'Donnell

Alvaro Herrera [EMAIL PROTECTED] writes:


On Tue, Aug 09, 2005 at 04:01:33PM -0400, Tom Lane wrote:



Yeah, you can store the pg_class OID of the table,




Maybe it is possible to use a column of type regclass to store it.  Not
sure exactly what advantage that would give, but it's an idea.


That is a great idea!  And you examples were very helpful.  I think this
will get me what I need.
Thanks,
TJ



Re: [GENERAL] best way to reference tables

2005-08-09 Thread TJ O'Donnell
 TJ O'Donnell wrote:
 I have many different tables that I want to keep track of.
 So, I thought of a master table with those table names in it.
 But, to maintain this (suppose a table changes
 its name, gets dropped) I want to have some kind of referential
 integrity - the way foreign keys and constraints do.
 What could I use that would disallow inserting a name into
 the master table unless another table by that name already exists? And what 
 could ensure that
 a table would not be renamed or dropped unless the master table is changed?

 You can write your own triggers that would stop you from adding a  
 non-existent table to your
 master-table. You can't fire triggers on  changes to system tables though, so 
 you can't stop
 someone adding a table.

 In any case, you presumably want to track ALTER TABLE ADD COLUMN too so  
 you'll need something a
 little smarter.

 Have you considered pg_dump --schema-only along with suitable
 version-control software (CVS/Subversion/Arch etc)? Or are you trying to  
 track something
 specific?

 --
Richard Huxton
Archonet Ltd

My tables are a subset of all the tables - I'm not trying to keep track of 
everything!
So, I think I'll create a schema for the tables I need to keep track of, and 
create
some procedures to properly create/modify tables therein.  I don't need to
keep track of all the innards of each table - ADD COLUMN, etc.

I guess what I'm asking is: Is there a better way to keep track of a table once
it's registered in my master table than just to put its name in my master table?
Some system oid type thing that stays fixed in spite of renames or other
tables mods?

TJ
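
The trigger Richard described could be sketched roughly as below (hypothetical names; it validates only at insert/update time, so it still cannot catch a later DROP or RENAME of a registered table):

```sql
-- hypothetical sketch: refuse to register a name that is not an existing table
CREATE TABLE master (tablename name PRIMARY KEY);

CREATE FUNCTION check_table_exists() RETURNS trigger AS $$
BEGIN
  IF NOT EXISTS (SELECT 1 FROM pg_class
                 WHERE relname = NEW.tablename AND relkind = 'r') THEN
    RAISE EXCEPTION 'no such table: %', NEW.tablename;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER master_check BEFORE INSERT OR UPDATE ON master
FOR EACH ROW EXECUTE PROCEDURE check_table_exists();
```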





[GENERAL] user's groups

2005-08-02 Thread TJ O'Donnell
I know of the four user's groups listed at http://pugs.postgresql.org/
I'm interested in starting one in the San Diego area.
If there are others in San Diego who are interested,
please get in touch with me.

Thanks,
TJ O'Donnell
[EMAIL PROTECTED]





[GENERAL] dynamic loading of c-functions

2005-07-21 Thread TJ O'Donnell
It is clear from the manual that c-functions can cause the dynamic loading
of .so's and that subsequent usage in the same database session will use
the already loaded function.  But, when the session is done, will the loaded
function remain in the server for subsequent sessions?
When/how does a dynamically loaded .so get unloaded?
I want to avoid preloading .so's at server startup, but also avoid dynamically
loading them for ~every~ session.

TJ





Re: [GENERAL] [SQL] dynamically loaded functions

2005-07-13 Thread TJ O'Donnell


 It sounds like you need to link gnova.so against the other shared
 objects so the runtime linker can find them.  For examples, see the
 Makefiles used by contributed modules like dblink, xml2, and a few
 others that link against external libraries.

That approach is working, but only after much troubles.
I have several 3rd party libraries, some of which call each other.
It appears I need to carefully order the libraries in the
link step, or some needed functions do not get included in the
final .so.

This problem used to crop up all the time way back 20 years ago
with linkers.  I thought all this name resolution stuff was all worked
out with modern linkers.  I'm linking with (linux redhat)
 gcc -shared -o my.so my.o my2.o their.a their2.a their3.a
When function x() in their2.a calls something in their.a
 (or is it the other way around?)
I get an error from postmaster that my.so cannot be loaded because
function x cannot be found.
If I reverse their.a their2.a in the link command, all is well.
Note: I never use, nor even knew about the existence of function x() - they do.

Any help on how to make this more pain-free?

TJ


Michael Fuhr wrote:

On Mon, Jul 11, 2005 at 08:16:17PM -0700, TJ O'Donnell wrote:


CREATE or REPLACE FUNCTION cansmiles(varchar) RETURNS varchar
  AS 'gnova', 'oe_cansmiles' LANGUAGE 'c' IMMUTABLE STRICT;
requires preloading of oe_chem.so to work.

Is there any way I can associate oe_cansmiles with 2 .so's without
preloading?



It sounds like you need to link gnova.so against the other shared
objects so the runtime linker can find them.  For examples, see the
Makefiles used by contributed modules like dblink, xml2, and a few
others that link against external libraries.





[GENERAL] user privilege to create c function

2005-07-12 Thread TJ O'Donnell
Until now I have been content to have the superuser CREATE FUNCTION...LANGUAGE 
'C'
because I noticed that ordinary users could not:

ERROR:  permission denied for language c

I would like to allow a user to create C language functions, but can't
find just which privilege I need to grant.  The user owns the
database, by the way, but still can't create the function.  The user only
needs to create C functions for her own database(s).

TJ






[GENERAL] varbit functions

2005-04-02 Thread TJ O'Donnell
I'm using bit and bit varying datatypes to store chemical
fingerprints and need various functions to operate on
these datatypes.  The logical operators (and, or, not)
and the shift operators, as well as length and octet-length
all work fine on bit types.  I was hoping for a manual entry
that specifically discusses functions for bit and bit varying,
like the one for dates, strings, etc.  Does anyone know of such
a manual entry?  I see bits (no pun intended) of info scattered
around.  \df helps some too.  I'll consider writing such an entry
unless someone can point me to something existing.
So, the functions I need are (at least):
int  nbits_set(bit);to count # of bits set in the string.
void bit_set(bit, int) ;to set a particular bit in a string.
bool isbit_set(bit);to test whether a particular bit is set.
I've already written nbits_set (in c) and could write the others,
but I want to ask if functions like these already exist (and I
just can't find them)?  Does anyone have any ideas about how to
use combinations of existing functions (shifts, ands, etc.) to do
these functions?  The bit strings I am interested in range from
20 to maybe 1024.
TJ
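
For the record, the three functions asked for can be approximated in plain SQL, at some speed cost next to a C version (sketches only; the text round-trips assume a release where explicit text/varbit casts work, and PostgreSQL 14+ ships a built-in bit_count for bit strings):

```sql
-- count of set bits
CREATE FUNCTION nbits_set(b bit varying) RETURNS integer AS $$
  SELECT length(replace(b::text, '0', ''));
$$ LANGUAGE sql IMMUTABLE;

-- test a single bit, 0-based from the left
CREATE FUNCTION isbit_set(b bit varying, n integer) RETURNS boolean AS $$
  SELECT substring(b::text FROM n + 1 FOR 1) = '1';
$$ LANGUAGE sql IMMUTABLE;

-- "set" a bit by returning a modified copy (SQL functions cannot modify in place)
CREATE FUNCTION bit_set(b bit varying, n integer) RETURNS bit varying AS $$
  SELECT overlay(b::text PLACING '1' FROM n + 1)::bit varying;
$$ LANGUAGE sql IMMUTABLE;

SELECT nbits_set(B'10110');      -- 3
SELECT isbit_set(B'10110', 2);   -- true
```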


[GENERAL] bit varying(512) vs bytea(64)

2005-03-21 Thread TJ O'Donnell
I have N-bit data pairs.  I want to write a c-language function
which compares bits set in each.  N is typically 512, but could be other.
I can store this as bit varying(512) or bytea(64).  I can't decide which.
Here are the questions that concern me.
1) will each take the same storage?
2) can I pass bit varying data to a c-language function?  I can't find any
   docs or examples of that.
3) are bit strings stored as actual bits or as character strings of 0 and 1?
Thanks,
TJ
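
On questions 1 and 3: bit varying is stored as packed bits plus a small length header, not as '0'/'1' characters, so 512 bits and a 64-byte bytea come out at about the same size. One way to check directly, on releases that have pg_column_size (8.1+); the text-to-varbit cast is assumed available:

```sql
SELECT pg_column_size(repeat('10', 256)::bit varying) AS varbit_512_bits,
       pg_column_size(decode(repeat('00', 64), 'hex')) AS bytea_64_bytes;
```

On question 2: yes, a C function can take bit varying; the server passes it as a VarBit pointer (src/backend/utils/adt/varbit.c in the source tree shows the macros for getting at the bits and length).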


[GENERAL] does the planner learn?

2005-02-07 Thread TJ O'Donnell
I understand the value of indexes and of ANALYZE for the efficient use of them.
In the following statement, you can see that the index scan is being used.
Even though it takes 80 seconds (for a 1.25 million row table), it is
much faster than without the index.
But, if I repeat this search, it speeds up by more than a factor of 2!
I love it, but I'd sure like to understand why.  When I do it a third time,
it speeds up again.  A fourth try does not speed it up more.
Is this speedup due to some memory/disk buffering from which I'm
benefiting?  I'm using linux (actually under VMware on WinXP, so it's even less
efficient than it could be on its own).  Or is the planner learning
something from previous runs of this search?  It appears not, since the
rows it thinks it needs to search are the same in the EXPLAIN ANALYZE
outputs below.  Can someone help me understand why my searches are speeding
up so I can make it happen the first time, if possible?
Thanks,
TJ
Try #1:
Select distinct smiles from structure where (_c, _o, _arom_c, _c_double, _o_double, _n, _ring_c, _ring_hetero, _halogen, 
_n_double, _arom_n, _s, _s_double, _other_atoms, _c_triple, _n_triple, _p, _arom_s, _p_double, _arom_o) >= 
(4,2,6,2,2,1,4,1,0,0,0,0,0,0,0,0,0,0,0,0) and oe_matches(smiles,'c1ccc(cc1)C2CC(=O)NC2=O') limit 500

Limit  (cost=92649.53..92652.35 rows=500 width=49) (actual time=81544.566..81545.522 rows=117 loops=1)
  -  Unique  (cost=92649.53..92688.60 rows=6924 width=49) (actual time=81544.561..81545.174 rows=117 loops=1)
-  Sort  (cost=92649.53..92669.06 rows=7813 width=49) (actual time=81544.553..81544.726 rows=117 loops=1)
  Sort Key: smiles
  -  Index Scan using fingerprint on structure  (cost=0.00..92144.36 rows=7813 width=49) (actual 
time=36.179..81533.872 rows=117 loops=1)
                Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) 
AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND 
(_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND 
(_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 81545.903 ms

Try #2:
Limit  (cost=92649.53..92652.35 rows=500 width=49) (actual time=36924.436..36925.450 rows=117 loops=1)
  -  Unique  (cost=92649.53..92688.60 rows=6924 width=49) (actual time=36924.431..36925.051 rows=117 loops=1)
-  Sort  (cost=92649.53..92669.06 rows=7813 width=49) (actual time=36924.423..36924.596 rows=117 loops=1)
  Sort Key: smiles
  -  Index Scan using fingerprint on structure  (cost=0.00..92144.36 rows=7813 width=49) (actual 
time=14.591..36891.589 rows=117 loops=1)
                Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) 
AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND 
(_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND 
(_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 36925.820 ms

Try #3:
Limit  (cost=92649.53..92652.35 rows=500 width=49) (actual time=23712.435..23713.394 rows=117 loops=1)
  -  Unique  (cost=92649.53..92688.60 rows=6924 width=49) (actual time=23712.430..23713.046 rows=117 loops=1)
-  Sort  (cost=92649.53..92669.06 rows=7813 width=49) (actual time=23712.422..23712.599 rows=117 loops=1)
  Sort Key: smiles
  -  Index Scan using fingerprint on structure  (cost=0.00..92144.36 rows=7813 width=49) (actual 
time=17.548..23631.915 rows=117 loops=1)
                Index Cond: ((_c >= 4) AND (_o >= 2) AND (_arom_c >= 6) AND (_c_double >= 2) AND (_o_double >= 2) 
AND (_n >= 1) AND (_ring_c >= 4) AND (_ring_hetero >= 1) AND (_halogen >= 0) AND (_n_double >= 0) AND (_arom_n >= 0) AND 
(_s >= 0) AND (_s_double >= 0) AND (_other_atoms >= 0) AND (_c_triple >= 0) AND (_n_triple >= 0) AND (_p >= 0) AND 
(_arom_s >= 0) AND (_p_double >= 0) AND (_arom_o >= 0))
Filter: oe_matches(smiles, 'c1ccc(cc1)C2CC(=O)NC2=O'::character varying)
Total runtime: 23713.765 ms
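
The speedup is almost certainly caching (the OS page cache plus shared_buffers), not planner learning: the plan and the row estimates are identical in all three runs; only the actual times drop. On releases much newer than this thread it can be confirmed directly (the query below is a stand-in):

```sql
-- EXPLAIN (ANALYZE, BUFFERS) is 9.0+: "shared hit" blocks came from cache,
-- "read" blocks came from disk
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM structure WHERE _c >= 30;

-- pg_prewarm (9.4+) loads a relation into cache before the first query
CREATE EXTENSION pg_prewarm;
SELECT pg_prewarm('structure');
```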



Re: [GENERAL] Does indexing help >= as well as = for integer columns?

2005-02-02 Thread TJ O'Donnell
I had thought that the Creation of the Index would do something
equivalent to Analyze.  I tried Analyze Verbose and it improved
the scanner's ability to predict when an index would be useful.
Last week, I asked about visualizing B-tree coverage.  I think
I meant "Can I see the histograms that Analyze creates?"
Are they available anywhere?  The docs mention them (bins) and I
was hoping Analyze Verbose would show them to me.
TJ
Tom Lane wrote:
TJ O'Donnell [EMAIL PROTECTED] writes:
This I don't get.  Why is an index scan not used?  Isn't an index supposed
to help when using < > <= >= too?
Explain Analyze Select count(smiles) from structure where _c >= 30
Aggregate  (cost=196033.74..196033.74 rows=1 width=32) (actual 
time=42133.432..42133.434 rows=1
loops=1)
 -  Seq Scan on structure  (cost=0.00..191619.56 rows=1765669 width=32) (actual
time=8050.437..42117.062 rows=1569 loops=1)
    Filter: (_c >= 30)

Have you ANALYZEd the table lately?  That rowcount estimate is off by
about three orders of magnitude :-(
			regards, tom lane
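
The histograms ANALYZE builds are in fact visible, by the way - not through ANALYZE VERBOSE, but through the pg_stats system view:

```sql
-- per-column statistics gathered by ANALYZE
SELECT n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'structure' AND attname = '_c';
```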


[GENERAL] Does indexing help >= as well as = for integer columns?

2005-02-01 Thread TJ O'Donnell
I have a table of about 5 million rows, 24 columns.
Integer column _c is BTREE indexed (as is _n, _o and 3 others).

This I understand and like:
Explain Analyze Select count(smiles) from structure where _c = 30
Aggregate  (cost=105595.11..105595.11 rows=1 width=32) (actual 
time=17.722..17.724 rows=1 loops=1)
  -  Index Scan using Nc on structure  (cost=0.00..105528.89 rows=26486 
width=32) (actual
time=0.098..16.095 rows=734 loops=1)
Index Cond: (_c = 30)
Total runtime: 18.019 ms

This I don't get.  Why is an index scan not used?  Isn't an index supposed
to help when using < > <= >= too?
Explain Analyze Select count(smiles) from structure where _c >= 30
Aggregate  (cost=196033.74..196033.74 rows=1 width=32) (actual 
time=42133.432..42133.434 rows=1
loops=1)
  -  Seq Scan on structure  (cost=0.00..191619.56 rows=1765669 width=32) 
(actual
time=8050.437..42117.062 rows=1569 loops=1)
Filter: (_c >= 30)
Total runtime: 42133.746 ms


TJ





Re: [GENERAL] visualizing B-tree index coverage

2005-01-27 Thread TJ O'Donnell
I realize that using OR will not result in an index scan.
I will never be interested in a OR condition for the kinds
of searches I use.  In my Select statements, I always name
every column of the multi-column index in the same order that
they were named when creating the index.  I always use
the = condition and, very rarely, the >= condition.
However, I am concerned that I must place
the most selective column first in my index.  I cannot tell,
a priori, which column will be most selective.  That depends on the
nature of search, which can vary widely each time.
Are you saying that if my first column is not selective, even though the 
remaining
columns are, the planner may choose not to use the index after
seeing that the first column is not very selective?
That seems like an oversight, IMHO.  Shouldn't the overall effect of
using all the columns be considered before choosing not to use an
index scan?
Since I'm using every column of my multi-column index for every search,
and I always use =, Explain Analyze always shows that every column
is considered in the index scan.  However, that is only when the
index scan is used.  Sometimes, Explain Analyze shows it is not used.
That appears to happen when my search condition is very general.
This it to be expected, so I am not worried.  Most of my searches will
be intermediate, namely not VERY selective, but also not VERY general.
So the idea of the multi-column index is to characterize each row
sufficiently, even when it is a perfectly ordinary row with no ONE
feature being distinctive, but rather several features together giving
it it's distinctive character.  That is my interpretation of the
multi-column index.
TJ
PFC wrote:
I think you missed an important feature of multicolumn indexes:
you'd better not use 'OR' in your expressions. You seem to want only
to use '=' so this should be OK.

Suppose you have 3 columns a,z,e containing values linearly 
distributed  between ...

select min(a),max(a),min(z),max(z),min(e),max(e) from test;
 min | max | min | max | min | max
-----+-----+-----+-----+-----+-----
   0 |  13 |   0 |  99 |   0 |  99
For instance the following query is indexed :
explain analyze select * from test where a>=0 and z>=90 and e>=0;
   QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using testa on test  (cost=0.00..1637.56 rows=11345 
width=16)  (actual time=0.085..51.441 rows=13000 loops=1)
   Index Cond: ((a >= 0) AND (z >= 90) AND (e >= 0))
 Total runtime: 56.307 ms

The following is only partially indexed :
explain analyze select * from test where (a=1 or a=2) and (z=1 or z=8) and e>=0;
 QUERY PLAN
 

 Index Scan using testa, testa on test  (cost=0.00..3269.06 rows=346  
width=16) (actual time=0.328..52.961 rows=400 loops=1)
   Index Cond: ((a = 1) OR (a = 2))
   Filter: (((z = 1) OR (z = 8)) AND (e >= 0))
 Total runtime: 53.297 ms

You see the 'index cond' field which is what determines the fetched 
rows,  which are then fetched and filtered with the 'filter' expression. 
Having  the most selective index cond is important because it will 
diminish the  number of rows to be fetched. However, in your case the 
filter expression  is also beneficial because any row eliminated by the 
filter will not need  to go through your expensive matching function.

In this case :
SELECT count(*) FROM test;
=> 131072
SELECT count(*) FROM test WHERE ((a = 1) OR (a = 2));
=> ~20000
SELECT count(*) FROM test WHERE (a=1 or a=2) and (z=1 or z=8) and e>=0;
=> 400
In this case the index fetches 20k rows out of 131072 but only 400 are  
used...

If you don't use OR, index use is more likely :
explain analyze select * from test where (a,z,e) >= (0,50,80);
   QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using testa on test  (cost=0.00..1669.78 rows=12627 
width=16)  (actual time=0.087..58.316 rows=13000 loops=1)
   Index Cond: ((a >= 0) AND (z >= 50) AND (e >= 80))
 Total runtime: 63.049 ms

Here you have a full index scan.
To determine the efficiency of your indexes, you can thus use this 
method,  and look at the 'index cond' and 'filter' expressions, and 
counting the  rows matched by each.



particular number of columns
for indexing.  I don't want to use too many, nor too few columns.  I also
want to optimize the nature(which atom types, bond types, etc.)
of the count columns.  While I could do this
and use the speedup as the measure of success, I think
that if my B-tree were covering the data well, I would get the best  
results.
Covering means finding that optimal situation 

[GENERAL] visualizing B-tree index coverage

2005-01-25 Thread TJ O'Donnell
Does anyone know of a tool that allows one to visualize
the tree created by a multi-column B-tree index?
A picture of a tree with branches, showing how branchy the
tree is would be great.
I'm wondering how well I've clustered the data in my table
using the multi-column index.  In other words, do my
multi-columns sufficiently but not overly discriminate rows from each other?
Do I have too many with the same index? (not enough branches)
Do I have a unique index for each row? (way too many branches)

Thanks,
TJ
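
There is no graphical tree viewer in core, but two things come close to answering the question (column and index names below are from this thread and otherwise hypothetical). pgstatindex, in the pgstattuple contrib module on modern releases, reports the btree's depth and page counts, and a DISTINCT count over the indexed columns shows whether they over- or under-discriminate:

```sql
CREATE EXTENSION pgstattuple;
SELECT tree_level, internal_pages, leaf_pages, avg_leaf_density
FROM pgstatindex('fingerprint');

-- one distinct combination per row = "way too many branches";
-- only a handful of combinations = "not enough branches"
SELECT count(*) AS total_rows,
       count(DISTINCT (_c, _n, _o)) AS distinct_key_combinations
FROM structure;
```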





Re: [GENERAL] visualizing B-tree index coverage

2005-01-25 Thread TJ O'Donnell
Since I'm using a multi-column index, I can greatly influence
the nature of the index created, depending on which columns I use
and how many.  I'm searching for an optimal set
of columns that creates an index that, for sure does not have
every value the same, nor only two values.  Instead, I want to see
how well I've spread the index out over the data (if that phrasing makes sense).

More specifically, I have character data representing molecular structures.
I've written (rather slow) search functions.  I can create any number of
columns that fingerprint each structure, e.g. # Carbon atoms, # N atoms,
# single bonds, etc.  I expect my fingerprints will not be unique (fingerprint 
may
be a poor analogy), but rather will classify similar structures together.
I create a multi-column index on these counts and
get about 2-3 times speedup using 13 columns right now.
For example:

select count(smiles) from structure where  
oe_matches(smiles,'c1c1CC(=O)NC')  about 15 sec.

select count(smiles) from structure where
 (_c, _n, _o, _s, _p, _halo,
  _arom_c, _arom_n, _arom_o, _arom_s,
  _atoms, _single_bonds, _other_bonds)  >=
 ( 3,1,1,0,0,0, 6,0,0,0, 11,4,7 )
 and oe_matches(smiles,'c1c1CC(=O)NC')   about 6 seconds
when the (_c, etc.) is a multi-column index.

The data isn't inherently structured in any way that invites some particular 
number of columns
for indexing.  I don't want to use too many, nor too few columns.  I also
want to optimize the nature(which atom types, bond types, etc.)
of the count columns.  While I could do this
and use the speedup as the measure of success, I think
that if my B-tree were covering the data well, I would get the best results.
Covering means finding that optimal situation where there is not one index for 
all rows
and also not a unique index for every row - something in between would be ideal,
or is that basically a wrong idea?

TJ



 Useful explanation of PostgreSQL index format:
 http://www.faqs.org/docs/ppbook/c13329.htm

 I think you are aiming for the wrong thing.
 The worst possible index is one with every value the same.
 The second worst (still basically useless) is one with only two values. The 
 greater the
 differentiation of the data, the more workload is
 reduced on a search.

 Since it isn't a straight binary tree, I don't think that having highly 
 dissimilar data in the
 index should be a problem.

 Do you have data or experience that shows otherwise?

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of TJ O'Donnell Sent: Tuesday, January 25,
 2005 2:19 PM
 To: pgsql-general@postgresql.org
 Cc: [EMAIL PROTECTED]
 Subject: [GENERAL] visualizing B-tree index coverage

 Does anyone know of a tool that allows one to visualize
 the tree created by a multi-column B-tree index?
 A picture of a tree with branches, showing how branchy the
 tree is would be great.
 I'm wondering how well I've clustered the data in my table
 using the multi-column index.  In other words, do my
 multi-columns sufficiently but not overly discriminate rows from each other?
 Do I have too many with the same index? (not enough branches)
 Do I have a unique index for each row? (way too many branches)

 Thanks,
 TJ









Re: [GENERAL] how to optimize my c-extension functions

2005-01-10 Thread TJ O'Donnell
I was not hoping that indexing, per se, would help me.
In fact, indexing smiles would be of virtually no use
to me, except for exact matches, e.g. where smiles = 'CCCOC';
I was only trying to subvert the use of indexing for
my own purposes, to store the parsed smiles somewhere
automatic for the sql user, yet transparently available to my
functions for quick searches.
I think I've thought about this enough and gotten enough advice
to realize I should do this the straightforward way.
I should store the parsed smiles in a separate column,
have a trigger to keep it up to date, and require the
user to pass me the parsed_smiles column for quick searches.
And the user could maintain the parsed_smiles in a separate
table, if he so desired, with foreign key relations.
Thanks to everyone for all your advice.  This is my first
postgresql project and I'm liking what I've seen so far.
TJ
Tom Lane wrote:
TJ O'Donnell [EMAIL PROTECTED] writes:
The only type of search will be of the type:

Select smiles,id from structure where  oe_matches(smiles,'c1c1C(=O)N');

You haven't really said much about how you expect an index to be able to
help you with this, but I think if any index type can help you it will
be GiST.  What you would do is define an operator on top of the
oe_matches function, so that the above query is written say
Select smiles,id from structure where smiles ~~ 'c1c1C(=O)N';
and then construct a GiST operator class that accepts ~~ as an
indexable operator.  There's not a huge amount of
plain-old-documentation about GiST but there are quite a few examples
available in the contrib/ tree.
I don't think you can completely hide the existence of the parsed
version of the smiles data.  The easiest way to go at it would be to
write the queries like
Select smiles,id from structure where smiles_parsed ~~ 'c1c1C(=O)N';
where smiles_parsed is the extra column holding the parsed data, and
the ~~ operator is grabbed by a GiST index over that column.
Plan B would be to construct the index as a functional index and write
Select smiles,id from structure where parsed(smiles) ~~ 'c1c1C(=O)N';
However plan B doesn't readily support applying any other operations to
the parsed data, since it doesn't exist anywhere except inside the
index.  Since you mentioned having other things you wanted to do with it,
I think you'll end up wanting the separate column.
			regards, tom lane
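
The straightforward approach settled on above - a separate parsed column kept current by a trigger - can be sketched as follows; parse_smiles and oe_matches_parsed stand in for the C functions and are hypothetical:

```sql
ALTER TABLE structure ADD COLUMN smiles_parsed bytea;

CREATE FUNCTION structure_parse() RETURNS trigger AS $$
BEGIN
  NEW.smiles_parsed := parse_smiles(NEW.smiles);  -- hypothetical C parser
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER structure_parse_trg
BEFORE INSERT OR UPDATE ON structure
FOR EACH ROW EXECUTE PROCEDURE structure_parse();

-- searches then pass the preparsed column:
-- SELECT smiles, id FROM structure WHERE oe_matches_parsed(smiles_parsed, 'CCOC');
```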


Re: [GENERAL] how to optimize my c-extension functions

2005-01-09 Thread TJ O'Donnell
Yes, my c function and its sql counterpart, oe_matches(smiles)
uses two steps (1) parse smiles (2) search parsed smiles.
Parsing is expensive.  The smiles has an external string representation,
which is stored in a smiles column, but only the parsed form is actually 
searchable.
The smiles representation is never searched in a traditional string
manner, except perhaps for a direct lookup (string equals).
LIKE has no meaning for smiles strings, similarly < or > are
meaningless.

Smiles is parsed into atom and bond representations using
3rd party c++ code/methods.  I simply call their methods
to parse and search.  A binary string can be got from them
for persistent storage in a postgres column.  It can then be
restored into a c++ object for searching, thus avoiding the
parsing stage, except for the initial creation of a row with
a smiles column.
My goal here is to optimize the search by storing the parsed smiles,
YET HIDE THIS FROM THE USER.  I thought I might be able to store
the parsed smiles in an index for me to use while searching, even
though it would not be used for indexing in the traditional manner.
This would mean creating a new indexing method.  I've read up on this
and it seems a daunting task.  Am I perverting the index method if
I try to do this?
So, aside from having the user be responsible for maintaining a
column of parsed smiles (and possibly other related columns which
would speed up the search), is there a way I can create and maintain
a table related to the table containing the smiles - and all
behind the scenes so the sql user is unaware of this.
My thought was that an index is like that and I might borrow some
of the internal uses of indexing for my purposes.
TJ O'Donnell
[EMAIL PROTECTED]
Pierre-Frédéric Caillaud wrote:
I gather your program uses two steps, let's call them :
- parse( smiles ) -> data
- search( data ) -> result
You can create a functional index on your smiles column, but I don't 
know  if this will help you ; you can do things like CREATE INDEX ... 
ON  mytable( lower( myfield )), then SELECT ... FROM mytable WHERE  
lower(myfield) = something, BUT in your case I gather your search 
function  which processes the parsed data does a lot more than just dome 
character  match, so creating a functional index on parse(smile) would 
be useless for  selecting on search(parse(smile))...

So, in any case, if the parsing phase is slow, you can store the  
preparsed data in a text or binary field and search it directly, but 
this  will not be indexed.

If you can map a subset of your searchable properties to simple  
datatypes, you could do a first search for related matches, as you said.

You say nothing about how your system works internally, what kind 
of  representation is it and what kind of searches do you actually do ?


On Sat, 08 Jan 2005 15:50:06 -0800, TJ O'Donnell [EMAIL PROTECTED] wrote:
I've written a c-language extension to postgresql to implement a  
chemical search of a varchar column (named smiles, typically).
It might be called as:
  oe_matches(smiles,'COCC') where 'COCC' is a typical search string.
This uses 3rd party functions to parse the varchar inputs into c++  
objects.  In order to optimize this search, I'd like to parse the 
whole  table containing smiles just once, store it and use the parsed 
data  instead of the varchar smiles.

So, I could create another column, say smiles_ob and have the
oe_matches function take that column.  I'd have to be sure the 
smiles_ob  column was related (by a trigger?) to the smiles column.
But I was thinking I might be able to hide these parsed objects from 
the  user by somehow incoporating the parsed objects into a type of 
index.
I'd like also to use additional columns (like molecular formula) in
the match function to triage the table to quickly rule out impossible
matches, before doing a full search/match.

Am I way off the track here?  Is it a bad idea to clutter the index
with things like this?  Is it possible?   Is there another
approach that could hide some of these details from the user - meaning
they would not have to create/update these additional columns?
Thanks,
TJ O'Donnell




Re: [GENERAL] how to optimize my c-extension functions

2005-01-09 Thread TJ O'Donnell
To add to my last followup posting, the only way I use
oe_matches(smiles) is in something like the following:
Select smiles,id from structure where oe_matches(smiles,'CCOC');
The match string 'CCOC' in this case, varies widely according to
the needs of the user during that session.
It is analogous to regexp matching, except that the semantics
of oe_matches is nothing at all like string matching, even though
smiles is actually a character string.  Character string smiles
are simply an external representation of a more complex c++ molecular
structure object.
TJ
Pierre-Frédéric Caillaud wrote:
 You say nothing about how your system works internally, what kind
 of  representation is it and what kind of searches do you actually do ?




Re: [GENERAL] how to optimize my c-extension functions

2005-01-09 Thread TJ O'Donnell
Let me first say that I will routinely be dealing with
one million+ rows, so I want to take care to optimize
my work as much as possible, and to consider carefully
my design decisions.
The only type of search will be of the type:
Select smiles,id from structure where  oe_matches(smiles,'c1c1C(=O)N');
or joins with other tables e.g.
Select id,smiles,clogp from structure,properties where
oe_matches(smiles,'c1c1C(=O)N') and id = prop_id;
with id being a sequence (hence unique) and foreign
key prop_id column of properties.
There may be other useful functions of smiles, e.g.
int oe_count_matches(smiles,'CCC'),
and these would also prefer to use the pre-parsed smiles
c++ object.
After I parse the smiles,
the character string smiles is really of no use anymore.
It might be output, for example to an external program such as 
smiles_to_jpeg which re-parses the smiles and need not be fast.
So, there is no real use for indexing smiles.  So I want to
borrow the internal tables normally used for indexing to store
my parsed smiles and use the parsed smiles in oe_matches and other
functions.
If I do this, maybe I would have to use operators (=, <, >, LIKE?)
to do the matching.  A c-function is simply called with data and
would have no access to indexes, correct?

TJ
Pierre-Frédéric Caillaud wrote:
Well, first and easy thing you can do is create a column to store 
the  parsed representation and update it via a trigger when the 
original,  unparsed column is updated or inserted.
Is this sufficiently hidden from the user for you ? I know it's 
not  really hidden, but the fact that updating is automatic could be 
good  enough.
You could also store this column in another table and join with the 
main  table.

What are the kinds of searches you do more often ? Can you give a 
few  examples ?


Yes, my c function and its sql counterpart, oe_matches(smiles)
uses two steps (1) parse smiles (2) search parsed smiles.
Parsing is expensive.  The smiles has an external string representation,
which is stored in a smiles column, but only the parsed form is 
actually  searchable.
The smiles representation is never searched in a traditional string
manner, except perhaps for a direct lookup (string equals).
LIKE has no meaning for smiles strings, similarly < or > are
meaningless.

Smiles is parsed into atom and bond representations using
3rd party c++ code/methods.  I simply call their methods
to parse and search.  A binary string can be obtained from them
for persistent storage in a postgres column.  It can then be
restored into a c++ object for searching, thus avoiding the
parsing stage, except for the initial creation of a row with
a smiles column.
My goal here is to optimize the search by storing the parsed smiles,
YET HIDE THIS FROM THE USER.  I thought I might be able to store
the parsed smiles in an index for me to use while searching, even
though it would not be used for indexing in the traditional manner.
This would mean creating a new indexing method.  I've read up on this
and it seems a daunting task.  Am I perverting the index method if
I try to do this?
So, aside from having the user be responsible for maintaining a
column of parsed smiles (and possibly other related columns which
would speed up the search), is there a way I can create and maintain
a table related to the table containing the smiles, all
behind the scenes so the SQL user is unaware of it?
My thought was that an index is like that and I might borrow some
of the internal uses of indexing for my purposes.
TJ O'Donnell
[EMAIL PROTECTED]
Pierre-Frédéric Caillaud wrote:
I gather your program uses two steps, let's call them:
- parse(smiles) -> data
- search(data) -> result
You can create a functional index on your smiles column, but I
don't know if this will help you; you can do things like CREATE
INDEX ... ON mytable( lower(myfield) ), then SELECT ... FROM
mytable WHERE lower(myfield) = something, BUT in your case I gather
your search function which processes the parsed data does a lot
more than just some character match, so creating a functional index
on parse(smiles) would be useless for selecting on
search(parse(smiles))...
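The functional-index pattern mentioned above, spelled out on a toy
table (the table and column names are illustrative):

```sql
-- A functional (expression) index: the index stores lower(myfield),
-- and the planner can use it only when the query repeats the very
-- same expression.
CREATE INDEX mytable_lower_idx ON mytable ( lower(myfield) );

-- This lookup can use the index:
SELECT * FROM mytable WHERE lower(myfield) = 'cocc';

-- A search through an opaque match function gives the index nothing
-- to work with, which is the point made above:
-- SELECT * FROM mytable WHERE search(parse(myfield)) ...
```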
So, in any case, if the parsing phase is slow, you can store the
preparsed data in a text or binary field and search it directly, but
this will not be indexed.
If you can map a subset of your searchable properties to simple
datatypes, you could do a first search for related matches, as you
said.
You say nothing about how your system works internally: what kind
of representation is it, and what kind of searches do you actually
do?
On Sat, 08 Jan 2005 15:50:06 -0800, TJ O'Donnell [EMAIL PROTECTED]
wrote:

[GENERAL] how to optimize my c-extension functions

2005-01-08 Thread TJ O'Donnell
I've written a c-language extension to postgresql to implement a 
chemical search of a varchar column (named smiles, typically).
It might be called as:
 oe_matches(smiles,'COCC') where 'COCC' is a typical search string.
This uses 3rd-party functions to parse the varchar inputs into C++
objects.  In order to optimize this search, I'd like to parse the
whole table containing smiles just once, store it, and use the parsed
data instead of the varchar smiles.

So, I could create another column, say smiles_ob, and have the
oe_matches function take that column.  I'd have to be sure the
smiles_ob column was related (by a trigger?) to the smiles column.
But I was thinking I might be able to hide these parsed objects from
the user by somehow incorporating them into a type of index.
I'd like also to use additional columns (like molecular formula) in
the match function to triage the table to quickly rule out impossible
matches, before doing a full search/match.
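The triage idea in the paragraph above might look like this in a
query.  The mols table, the mf (molecular formula) column, and its
upkeep are assumed for illustration, not taken from the post:

```sql
-- Cheap, indexable pre-filter first; the expensive match runs only
-- on the rows that survive.  The formula test here stands in for
-- whatever cheap property rules out impossible matches.
SELECT name, smiles
FROM   mols
WHERE  mf = 'C3H8O'                  -- fast, can use a btree index
AND    oe_matches(smiles, 'COCC');   -- slow, parses and searches
```

The planner will normally evaluate the cheap indexed condition before
calling the expensive function, so most rows never reach oe_matches.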

Am I way off the track here?  Is it a bad idea to clutter the index
with things like this?  Is it possible?  Is there another
approach that could hide some of these details from the user - meaning
they would not have to create/update these additional columns?
Thanks,
TJ O'Donnell
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match