[GENERAL] postgres user with password, read-only user without?
Is there a way to configure authentication so that some users require a password and other users don't? Ted
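For illustration, a sketch of the kind of pg_hba.conf entries this involves, with hypothetical names: pg_hba.conf is matched top to bottom, so a more specific user line can use a different auth method than the catch-all line below it.

# TYPE  DATABASE  USER           ADDRESS         METHOD
host    mydb      report_reader  192.168.0.0/24  trust
host    mydb      all            192.168.0.0/24  md5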
[GENERAL] 9.5rc1 RLS select policy on insert?
I see the INSERT policy's CHECK expression running, but the SELECT policy's USING expression is also run on INSERT. I don't understand why the SELECT policy is being run. Could it possibly be related to using a sequence on the table? Ted
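A minimal way to observe which policies fire, assuming a scratch table and logging helper functions (all names hypothetical):

CREATE FUNCTION log_sel() RETURNS boolean AS
$$ BEGIN RAISE NOTICE 'SELECT policy USING ran'; RETURN true; END $$ LANGUAGE plpgsql;

CREATE FUNCTION log_ins() RETURNS boolean AS
$$ BEGIN RAISE NOTICE 'INSERT policy CHECK ran'; RETURN true; END $$ LANGUAGE plpgsql;

CREATE TABLE t (id serial PRIMARY KEY, v text);
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
ALTER TABLE t FORCE ROW LEVEL SECURITY;
CREATE POLICY t_sel ON t FOR SELECT USING (log_sel());
CREATE POLICY t_ins ON t FOR INSERT WITH CHECK (log_ins());

INSERT INTO t (v) VALUES ('x');  -- watch which NOTICEs appear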
[GENERAL] RLS 9.5rc1 configuration changes?
I had been using Crunchy Data's 9.4 with backported RLS, but since my ultimate target is 9.5 I decided to update to it. However, now the function called for the SELECT policy is not being called. \dt shows the policy, but EXPLAIN ANALYZE of a SELECT doesn't show the filter. When I turn up debug in postgresql.conf, I see in pgstartup.log that the library is loaded and the _PG_init function is called, and in the daily log I see the client auth function called each time I run psql. The only changes I made for 9.5 were to no longer set row_security to 'force' in postgresql.conf and to add: ALTER TABLE ... FORCE ROW LEVEL SECURITY; in addition to the: ALTER TABLE ... ENABLE ROW LEVEL SECURITY; which I was already doing, since I want RLS to be applied even to the owner of the table. Are there any other configuration changes needed to get RLS working again? Ted
Re: [GENERAL] RLS 9.5rc1 configuration changes?
On Mon, Jan 4, 2016 at 4:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ted Toth <txt...@gmail.com> writes:
>> I had been using Crunchy Data's 9.4 with backported RLS but I decided
>> since my ultimate target is 9.5 that I update to it. However now the
>> function called for the SELECT policy is not being called. \dt shows
>> the policy but EXPLAIN ANALYZE of a select doesn't show the filter.
>
> I'm not sure how Crunchy's 9.4 version behaves, but I'd expect the
> policy USING condition to be visible in EXPLAIN in 9.5.
>
> Are you perhaps testing this as a superuser? Superusers bypass RLS
> even with FORCE ROW LEVEL SECURITY.

Yes, I was a superuser, but without 'Bypass RLS'. So there's no way to enforce RLS for all users/roles?

>> ... The only changes I made for 9.5 were to no longer set row_security
>> to 'force' in postgresql.conf
>
> What did you set it to instead?

row_security=on. Maybe 'force' did what I wanted in Crunchy's 9.4 version :(

> regards, tom lane
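One way to see the policy actually applied is to test as a role that cannot bypass RLS; a sketch with a hypothetical role and table name (NOBYPASSRLS is available in 9.5):

CREATE ROLE rls_test LOGIN NOBYPASSRLS;
GRANT SELECT ON my_table TO rls_test;
SET ROLE rls_test;
EXPLAIN SELECT * FROM my_table;  -- the policy's USING filter should now appear
RESET ROLE;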
[GENERAL] native api or odbc?
What are people's experiences with either/both? Pluses/minuses? Ted
[GENERAL] RLS policy issue
I'm working on understanding and implementing RLS. Since I work on systems using SELinux (MLS policy), I'm using the sepgsql module, which I've modified slightly: I've added a function named sepgsql_check_row_perm that I'm using in the policies. For example, I have a 'reports' table that looks like:

                                      Table "public.reports"
     Column     |      Type       |                       Modifiers                       | Storage  | Stats target | Description
----------------+-----------------+-------------------------------------------------------+----------+--------------+-------------
 id             | integer         | not null default nextval('reports_id_seq'::regclass)  | plain    |              |
 report         | json            |                                                        | extended |              |
 message_id     | integer         | not null                                               | plain    |              |
 location       | geometry(Point) |                                                        | main     |              |
 security_label | text            | default sepgsql_getcon()                               | extended |              |
Policies:
    POLICY check_report_delete_selinux FOR DELETE
        USING sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'delete'::text)
    POLICY check_report_insert_selinux FOR INSERT
        WITH CHECK sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'insert'::text)
    POLICY check_report_select_selinux FOR SELECT
        USING sepgsql_check_row_perm(sepgsql_getcon(), security_label, 'select'::text)
    POLICY check_report_update_selinux FOR UPDATE
        USING sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'update'::text)
        WITH CHECK sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'update'::text)

When I do a SELECT, I expect sepgsql_check_row_perm to be called and at least output the elog message I added. Here's part of the patch I apply to add the sepgsql_check_row_perm function to the module:

+/*
+ * BOOL sepgsql_check_row_perm(TEXT, TEXT, TEXT)
+ *
+ * Check if perm is allowed for the tuple.
+ * This is a variant of sepgsql_avc_check_perms_label which allows
+ * specifying both the source and target contexts. For MLS
+ * (write up, read down) dominance purposes, in the case of
+ * INSERT/UPDATE/DELETE (write) the source is the tuple's context
+ * and it must dominate the peer's context; in the case of
+ * SELECT (read) the source is the peer's context and it must dominate
+ * the tuple's context.
+ */
+PG_FUNCTION_INFO_V1(sepgsql_check_row_perm);
+Datum
+sepgsql_check_row_perm(PG_FUNCTION_ARGS)
+{
+	const char *scontext;
+	const char *tcontext;
+	const char *perm_name;
+	access_vector_t av_perm;
+
+	elog(DEBUG1, "sepgsql_check_row_perm");

I'd also expect that the rewrite would have added the SELECT policy's USING clause to the query, but I don't see any indication of that in the details that follow (note the empty :securityQuals):

2015-05-21 16:59:39.030 CDT STATEMENT: select * from reports
2015-05-21 16:59:39.030 CDT LOG: rewritten parse tree:
2015-05-21 16:59:39.030 CDT DETAIL: (
   {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt <>
    :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasSubLinks false
    :hasDistinctOn false :hasRecursive false :hasModifyingCTE false
    :hasForUpdate false :hasRowSecurity true :cteList <>
    :rtable (
       {RTE :alias <> :eref {ALIAS :aliasname reports :colnames (id report
        message_id location security_label)} :rtekind 0 :relid 19116
        :relkind r :lateral false :inh true :inFromCl true :requiredPerms 2
        :checkAsUser 0 :selectedCols (b 9 10 11 12 13) :modifiedCols (b)
        :securityQuals <>})
    :jointree {FROMEXPR :fromlist ({RANGETBLREF :rtindex 1}) :quals <>}
    :targetList (
       {TARGETENTRY :expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
        :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 7}
        :resno 1 :resname id :ressortgroupref 0 :resorigtbl 19116 :resorigcol 1
        :resjunk false}
       {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 114 :vartypmod -1
        :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 7}
        :resno 2 :resname report :ressortgroupref 0 :resorigtbl 19116
[GENERAL] 9.5 RLS 'check policy' function arguments
I'm contemplating writing a function for use in a policy's WITH CHECK clause (CREATE POLICY ... WITH CHECK). Where can I find documentation describing the arguments that will be passed to the function? Ted
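For what it's worth, the WITH CHECK expression of CREATE POLICY is an ordinary boolean expression evaluated against the candidate row, so a helper function receives exactly the arguments written into the policy; nothing is passed implicitly. A sketch with hypothetical names:

CREATE FUNCTION label_ok(lbl text) RETURNS boolean
    LANGUAGE sql STABLE AS $$ SELECT lbl IS NOT NULL $$;

CREATE POLICY docs_ins ON docs FOR INSERT WITH CHECK (label_ok(security_label));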
Re: [GENERAL] sepgsql where are the security labels
This table maintains information about the security context of PostgreSQL objects, not the data in tables.

On Wed, Nov 12, 2014 at 5:56 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 11/12/2014 02:45 PM, Ted Toth wrote:
>> I'm running the SELinux MLS policy, I've got labeled IPsec working, and my
>> PostgreSQL is configured to load sepgsql. I've created a db, run the
>> sepgsql.sql script on it, created tables and inserted data. How do I query
>> the security labels on the data?
>
> I do not use SECURITY LABELS, but it seems they can be queried here:
> http://www.postgresql.org/docs/9.3/interactive/view-pg-seclabels.html
>
>> As best I can tell there is no security_context column on either of the
>> tables I've created. How does the system column security_context get added
>> to tables? I've read everything I can find on the web but a lot of it is
>> dated. Here's how I'm creating my db and tables:
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
Re: [GENERAL] sepgsql where are the security labels
Exactly what I was talking about ... but unfortunately that appears to have been based on KaiGai's branch and is not in 9.3. The current discussion/work is around row-level security, with patches against 9.5, which is not much help to me now :(

On Thu, Nov 13, 2014 at 9:26 AM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 11/13/2014 05:58 AM, Ted Toth wrote:
>> This table maintains information about the security context of PostgreSQL
>> objects, not the data in tables.
>
> http://www.slideshare.net/kaigai/label-based-mandatory-access-control-on-postgresql
> Slide 23
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
[GENERAL] sepgsql where are the security labels
I'm running the SELinux MLS policy, I've got labeled IPsec working, and my PostgreSQL is configured to load sepgsql. I've created a db, run the sepgsql.sql script on it, created tables, and inserted data. How do I query the security labels on the data? As best I can tell there is no security_context column on either of the tables I've created. How does the system column security_context get added to tables? I've read everything I can find on the web, but a lot of it is dated. Here's how I'm creating my db and tables:

CREATE DATABASE contacts
    WITH OWNER = jcdx
         ENCODING = 'UTF8'
         TABLESPACE = pg_default
         LC_COLLATE = 'en_US.UTF-8'
         LC_CTYPE = 'en_US.UTF-8'
         CONNECTION LIMIT = -1;

SECURITY LABEL FOR selinux ON DATABASE contacts IS 'user_u:object_r:sepgsql_db_t:s0';

-- PostgreSQL database dump

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';

CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public;
COMMENT ON EXTENSION postgis IS 'PostGIS geometry, geography, and raster spatial types and functions';

CREATE EXTENSION IF NOT EXISTS pgrouting WITH SCHEMA public;
COMMENT ON EXTENSION pgrouting IS 'pgRouting Extension';

SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;

CREATE TABLE messages (
    id integer NOT NULL,
    message json
);
SECURITY LABEL FOR selinux ON TABLE messages IS 'user_u:object_r:sepgsql_table_t:s0';
ALTER TABLE public.messages OWNER TO jcdx;

CREATE SEQUENCE messages_id_seq
    START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1;
SECURITY LABEL FOR selinux ON SEQUENCE messages_id_seq IS 'user_u:object_r:sepgsql_seq_t:s0';
ALTER TABLE public.messages_id_seq OWNER TO jcdx;
ALTER SEQUENCE messages_id_seq OWNED BY messages.id;

CREATE TABLE reports (
    id integer NOT NULL,
    report json,
    message_id integer NOT NULL,
    location geometry(Point)
);
SECURITY LABEL FOR selinux ON TABLE reports IS 'user_u:object_r:sepgsql_table_t:s0';
ALTER TABLE public.reports OWNER TO jcdx;

CREATE SEQUENCE reports_id_seq
    START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1;
SECURITY LABEL FOR selinux ON SEQUENCE reports_id_seq IS 'user_u:object_r:sepgsql_seq_t:s0';
ALTER TABLE public.reports_id_seq OWNER TO jcdx;
ALTER SEQUENCE reports_id_seq OWNED BY reports.id;

ALTER TABLE ONLY messages ALTER COLUMN id SET DEFAULT nextval('messages_id_seq'::regclass);
ALTER TABLE ONLY reports ALTER COLUMN id SET DEFAULT nextval('reports_id_seq'::regclass);
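Object-level labels (database, table, sequence, column) can at least be inspected through the pg_seclabels view, e.g.:

SELECT objtype, objname, provider, label
FROM pg_seclabels
WHERE objname IN ('messages', 'reports');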
Re: [GENERAL] does postgresql works on distributed systems?
> Microsoft SQL Server 2005 cannot be shared on multiple systems, i.e., in a
> network environment; when it is installed on one system it cannot be
> accessed from other systems.

Nonsense! Where did you get this stuff? I have even played with MS SQL Server 2005 Express, and it is not crippled in the way you describe. I am not a big fan of MS, but I have worked in shops where we used MS SQL Server 2005, and once the DB was set up, we could access it from anywhere. Since I often develop for it, I even have this capability, with MS SQL Server 2005, set up on the little LAN in my home office.

> One can access it only from a system where it is already installed, but not
> from a system where there is no SQL Server. Is PostgreSQL similar to SQL
> Server, or does it support network sharing, i.e., can one access PostgreSQL
> from any system irrespective of which system it is installed on?

You can do this with any RDBMS I have seen. An RDBMS is of little commercial utility if you can't access it from other machines in a network. Mind you, I have worked with systems where the RDBMS was configured to respond only to apps on localhost, or a specific IP on the LAN, with access to the DB mediated through middleware. You should probably look at a) how your server is configured and b) how your client is configured (including whether or not you actually have client software on your client machine).

Cheers

Ted
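Concretely, remote access in PostgreSQL is controlled by two settings; a sketch with a hypothetical LAN subnet:

# postgresql.conf -- listen on all interfaces, not just localhost
listen_addresses = '*'

# pg_hba.conf -- allow password-authenticated connections from the LAN
host  all  all  192.168.0.0/24  md5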
Re: [GENERAL] No Return??
--- Bob Pawley <[EMAIL PROTECTED]> wrote:

> I'm getting a little frustrated with this problem. Can anyone tell me what
> is wrong with the following code. I have tested the portions separately and
> they all work. When I try it as a whole I get the message "control reached
> end of trigger procedure without RETURN". Any help greatly appreciated.
> Bob
>
> Declare
>     pumpnumber integer;
> Begin
>     Select count(*) Into pumpnumber
>     From p_id.devices, p_id.processes
>     Where device_number = '11'
>       and p_id.devices.fluid_id = p_id.processes.fluid_id
>       and p_id.processes.ip_op_equipment = 'op';
>     If pumpnumber = 1 then
>         Update p_id.devices Set number = '#1'
>         From p_id.processes
>         Where p_id.devices.number is null
>           and p_id.devices.device_number = '11'
>           and p_id.devices.fluid_id = p_id.processes.fluid_id
>           and p_id.processes.ip_op_equipment = 'op';
>     Else
>         If pumpnumber = 2 Then
>             Update p_id.devices Set number = '#2'
>             From p_id.processes
>             Where p_id.devices.number is null
>               and p_id.devices.device_number = '11'
>               and p_id.devices.fluid_id = p_id.processes.fluid_id
>               and p_id.processes.ip_op_equipment = 'op';
>         End If;
>         RETURN NULL;
>     End If;
> END;
>
> I have tried 'Return New' and 'Return Result' without luck, and if I leave
> off either of the two 'End If' statements the procedure returns an error.

Look at your flow control! Your RETURN is inside a conditional block: it is only reached in the ELSE branch. When pumpnumber = 1, flow skips the ELSE, goes to the very end of the function, and reaches the end without encountering a return statement.

Cheers,

Ted
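A sketch of the corrected flow, using the original statements: ELSIF keeps everything in one IF block, and a RETURN placed after END IF is reached on every path (including pumpnumber values other than 1 and 2):

IF pumpnumber = 1 THEN
    UPDATE p_id.devices SET number = '#1'
    FROM p_id.processes
    WHERE p_id.devices.number IS NULL
      AND p_id.devices.device_number = '11'
      AND p_id.devices.fluid_id = p_id.processes.fluid_id
      AND p_id.processes.ip_op_equipment = 'op';
ELSIF pumpnumber = 2 THEN
    UPDATE p_id.devices SET number = '#2'
    FROM p_id.processes
    WHERE p_id.devices.number IS NULL
      AND p_id.devices.device_number = '11'
      AND p_id.devices.fluid_id = p_id.processes.fluid_id
      AND p_id.processes.ip_op_equipment = 'op';
END IF;
RETURN NULL;  -- reached on every path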
Re: [GENERAL] (FAQ?) JOIN condition - 'WHERE NULL = NULL'
--- Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:

> 'transform_null_equals' won't help you at all here since it only helps in
> the very specific case of comparing with a constant. The easiest is to
> think of NULL as meaning 'unknown'. Clearly you can't compare that usefully
> with anything.

Not even a null in another record ... (hence my question below). If the value is unknown, then it could be anything, and (thinking as a mathematician considering real numbers) the probability that two records holding NULL have true, but unknown, values that are the same is indistinguishable from 0 (with integers or decimal or floating point numbers, that would be qualified with "for practical purposes" :).

> Perhaps you can use a marker like -1 to achieve the effect you want?

Is that really valid, though, especially in a join? I mean, if the column in question has multiple nulls in each of the tables, then how do you, or the DB, figure out which of the rows containing nulls in the one table match up with rows in the other table containing nulls? Or is the resultset supposed to be the product of the two sets (match each row with a null in the one table with each row with a null in the other)? That, for me, creates a nightmare situation where some of my tables have tens of millions of rows; if even 1% of the rows contains null in the relevant column, I don't even want to think about processing the resultset that would be produced from such an idea using these tables. My joins always involve only primary keys, or indexes on columns that prohibit nulls, so this problem doesn't crop up in my code. But would I be far from right in expecting that the rational thing to do when creating a join on columns that allow nulls is to exclude ALL rows, in either table, where the columns involved are null?

Cheers,

Ted
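To make the semantics concrete:

SELECT NULL = NULL;                     -- yields NULL, not true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- yields true

So a join that deliberately treats two NULLs as a match can be written with ... ON a.col IS NOT DISTINCT FROM b.col, though, as discussed above, on a nullable column that can blow up into a near cross-product among the NULL rows.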
Re: [GENERAL] (FAQ?) JOIN condition - 'WHERE NULL = NULL'
--- Ian Sillitoe <[EMAIL PROTECTED]> wrote:

> I completely take your points - so maybe I should be asking for advice on
> database design instead. We are annotating nodes on a hierarchical
> structure where NULL implied an

I don't mean to be rude, but yuck. Why provide a record for data that isn't there?

I recently put together a database (at present at a very early prototype stage) to handle biological data. Considering ONLY the taxonomic portion of it, I opted for a general hierarchical model. Maybe not the most efficient, yet, but no waste, yet. In what is an oversimplification, I created a taxon table, with columns for a unique ID number and taxonomic level (species, genus, etc., with all the glorious subcategories taxonomists of various stripes are wont to create/define). The taxonomic levels are predefined (taken from my references that deal with such matters) in a lookup table. Then I have a column to hold the parent taxon's ID number. Of course there are, in a middle layer, constraints that prevent recording a species as a parent of a genus, and other silliness (no linking a species epithet directly to a class or order). But you get the idea. An object-oriented programming metaphor might be that of a singly linked list. And of course, I have deliberately obfuscated the complexity arising from having to handle synonyms both usefully and gracefully, but the core idea is simple, and there are no nulls, except for taxa representing a whole kingdom. Last I checked, there were no taxa more general than the kingdom, and there's only a handful of kingdoms. If you don't have data on subclass or superfamily or subspecies, you just don't put it in. Therefore no nulls! I have no idea if this model would work for you, but maybe it will help.

Cheers,

Ted
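Hypothetical DDL for the design described: a lookup table of levels plus a self-referencing taxon table, so absent ranks simply have no row at all.

CREATE TABLE taxon_level (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE            -- kingdom, phylum, class, order, ...
);

CREATE TABLE taxon (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    level_id  integer NOT NULL REFERENCES taxon_level(id),
    parent_id integer REFERENCES taxon(id)  -- NULL only for kingdoms
);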
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
--- Webb Sprague <[EMAIL PROTECTED]> wrote:

> ... linear algebra ... matrices and vectors ... Especially if some GiST or
> similar index could efficiently search for vectors close to other vectors ...

I see a potential problem here, in terms of how one defines "close" or similitude. I think, though, practical answers can be found in examples of applying quantitative methods in some subdisciplines of biology.

> Hmm. If I get some more interest on this list (I need just one LAPACK /
> BLAS hacker...), I will apply for a pgFoundry project and appoint myself
> head of the peanut gallery...

Someone pointed to the potential utility of PL/R. I would be interested at least in learning about your assessment of the two (PostGIS and PL/R). Alas, I don't have decent data I could use to experiment with either (except possibly for time series analysis, which is a completely different kettle of fish).

> ... and deal with a big database doing lots of similarity-based searches (a
> 6'2" guy with light brown hair being similar to a 6'1" guy with dark blond
> hair) - and am experimenting with modeling some of the data as vectors in
> postgres. Well, I bet a good linear algebra library would help. A lot. :)

If you're looking at similarity, and some practicality in the USE of quantitative procedures, you may want to look into the biogeography and numerical taxonomy literature, and to a lesser extent quantitative plant ecology. All three subdisciplines of biology have decades of experience, and copious literature, looking at similarity measures; in my experience they are much more practical or pragmatic than the 'pure' biostatistics literature, and infinitely more practical than any theoretical statistical or mathematical literature I have seen (trust me, I have a bookcase full of this stuff). A good linear algebra library would be useful, but there are a lot of nonlinear analyses that would be of interest, and there are nonparametric, yet quantitative, approaches that are of considerable interest in assessing similarity. I don't know of work applying things like discriminant function analysis, cluster analysis, or any of the many ordination analyses to searches in a database, but then I haven't looked at the question since I graduated. I am interested in the question, though, and would be interested in hearing about your experience with it.

If I can manage the time, I hope to start a project where I can store description data for specimens of plants and animals and use analyses including, but not limited to, ordination, clustering, discriminant functions, and canonical correlation to create a structure for comparing them and for identifying new specimens; or, at a minimum, if the specimen is truly something unknown, learn what known specimens or groups thereof it is most similar to, and how it is different.

I have managed to install PL/R, but I haven't had the time to figure out how best to analyze data stored in the database using it. The data I DO have changes daily, and some of the tables are well over 100MB, so I am a bit worried about how well it can handle such an amount of data, and how long it would take.

Cheers,

Ted
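As a toy sketch of the vectors-in-postgres idea (function name and data hypothetical), a Euclidean distance over float8 arrays can be written in plain SQL, no external library needed:

CREATE OR REPLACE FUNCTION vec_dist(float8[], float8[]) RETURNS float8
    LANGUAGE sql IMMUTABLE AS $$
    SELECT sqrt(sum(($1[i] - $2[i])^2))
    FROM generate_series(array_lower($1,1), array_upper($1,1)) AS i
$$;

SELECT vec_dist(ARRAY[74.0, 1.0], ARRAY[73.0, 2.0]);  -- e.g. height / hair-colour codes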
Re: [GENERAL] Sun acquires MySQL
--- Bill Moran <[EMAIL PROTECTED]> wrote:

> In response to dvanatta <[EMAIL PROTECTED]>:
>> What's up with 3 of the 7 being from Pennsylvania? What's the connection?
>
> Well, as everyone knows, Pennsylvania is a haven for brilliant people. In
> fact, simply living in Pennsylvania makes you smarter.

Does it count if I lived there for a year many many years ago? ;-)

Ted
Re: [GENERAL] logging arguments to prepared statements?
--- rihad <[EMAIL PROTECTED]> wrote:

> Dec 18 15:49:41 myhost postgres[29832]: [35-1] ERROR: 23505: duplicate key value violates unique constraint "foo_key"
> Dec 18 15:49:41 myhost postgres[29832]: [35-4] INSERT INTO foo
> Dec 18 15:49:41 myhost postgres[29832]: [35-5] (a,b,c)
> Dec 18 15:49:41 myhost postgres[29832]: [35-7] VALUES ($1,$2,$3)
> Dec 18 15:49:41 myhost postgres[29832]: [35-8]
>
> And that's it, leaving me wondering which value triggered the error. Any

Why? It seems simple enough. You have a table called foo, with at least three columns: a, b, and c. And you have a violation of your unique constraint. If it isn't that simple, you have left out useful information. You did not say, for example, which of your columns, if any, are involved in your unique constraint. If the answer to that is none, then you need to show how the constraint is defined. Which of the three columns are involved in a unique constraint? If none of the columns you use are involved in a unique constraint, there must be other columns that are, and that would imply that there is either a problem with your prepared statement (ignoring certain columns that can't be ignored), or a problem with how you set up the default values for another column that is involved in a unique constraint; or the table has grown so big that it is impossible to add a new record without violating the existing unique constraint (unlikely as that is in most cases, especially during development).

I could see creating a BEFORE INSERT trigger that stores the values to be inserted in a log table with a timestamp, but I don't see the profit in that. Doesn't such an error generate a SQL exception in your client? If so, the client code will know immediately what insert attempt failed, and therefore what values are involved in the problem. Using JDBC, for example, all of the JDBC functions that execute a prepared statement (or any other SQL) will throw a java.sql.SQLException. One therefore knows immediately when there is a problem of the sort you describe, and so you can determine quickly what values resulted in your error. If need be, that could be stored in your application's log. If one needed full audit functionality, one could create tables to store the details of every SQL statement, including who is responsible for the statement and a timestamp. But if you don't need to support that kind of detailed audit, why bother, when there are easier ways to address your issue?

HTH

Ted
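Since the error names the constraint (foo_key), its column list can be looked up directly:

SELECT indexdef FROM pg_indexes WHERE indexname = 'foo_key';
-- or, if it is a table-level constraint:
SELECT conname, pg_get_constraintdef(oid) FROM pg_constraint WHERE conname = 'foo_key';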
Re: [GENERAL] Efficiency vs. code bloat for SELECT wrappers
--- Colin Wetherbee <[EMAIL PROTECTED]> wrote:

> Sam Mason wrote:
>> On Sun, Dec 16, 2007 at 06:31:56PM -0500, Colin Wetherbee wrote:
>>> If I write one Perl sub for each operation on the table (e.g. one that
>>> gets the username and password hash, another that gets the last name and
>>> first name, etc.), there will be a whole lot of subs, each of which
>>> performs one very specific task.

Right. The first rule of software engineering is to keep functions as small as possible, focused on one thing wherever practicable. It doesn't matter if the language is Perl or C++ or Java, or a stored procedure in an RDBMS. One can always create additional driver functions that use the elemental simple functions to do more complex tasks (bearing in mind the complexities that will inevitably arise in multiple-user situations).

>>> If I write one larger Perl sub that grabs the whole row, and then I deal
>>> with the contents of the row in Perl, ignoring columns as I please, it
>>> will require fewer subs and, in turn, imply cleaner code.

Define "cleaner code". The more data, and the more complex that data, the more code you have to write, regardless of whether that is in one function or several. Either way, done badly, can be a problem for both maintenance and performance.

>> It sounds as though you're just treating the database as a relatively dumb
>> datastore. They can be used as this, and Tom's comments are as always good,
>> but relational databases come into their own when you're writing more
>> complex queries. When I'm writing my code I tend to put the SQL statements
>> directly in with the rest of the code; abstracting away from the database
>> tends to make things more complicated than they need to be.
>
> Because I know Perl a whole lot better than SQL, PostgreSQL, and even the
> Perl DBI, I'm always inclined to wrap the database stuff in a nice little
> package and forget about it. This method has worked well for me in the
> past, but the project I'm starting is much bigger in terms of database use
> than anything else I've written.

I routinely keep my SQL code distinct from my Perl, Java, or C++ code. When a client program needs to do something with the database, then either a child process executes a script I have written (if the client program doesn't need to do anything with data drawn from the database), or I have all the SQL code in one or more stored procedures and use the appropriate client interface to invoke the stored procedure(s). Whether the SQL is in a specific script or in a stored procedure, my SQL code is kept distinct from the client code, regardless of the language I have used for that. I find this even MORE useful as my projects get bigger.

> You say you write SQL directly in your application code, rather than
> writing task-oriented wrappers. I like that idea, but I can see that
> getting really messy. What is your opinion on how it affects code
> maintenance and things like that?

From what I have seen, even in small, almost trivial, client programs, this gets messy real quick. I therefore hate the idea of mixing SQL in with client code (it is all, in my view, application code). I like the use of middleware objects since, if well designed, they can make developing the required code all the simpler. It is only when badly done that an abstraction leads to complicated code that is a nightmare to maintain; worse if it is inadequately documented.

The whole purpose of abstraction, whether one is using object-oriented development of middleware, or generic programming, or structured programming, or functional programming, is to analyze a complex problem into more manageable parts. The parts themselves become easier to code (witness Java beans of various kinds, such as backing beans; my beans get their data either from the database or from the user interface, and in either case they make interaction between a web-based interface and the database back end MUCH simpler to code), and the relationships among the parts are easier to understand. Each kind of abstraction has its place. It is up to the analyst or architect to figure out how many layers and what abstractions are appropriate for a given project.

HTH

Ted
Re: [GENERAL] Efficiency vs. code bloat for SELECT wrappers
--- Sam Mason <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 17, 2007 at 01:29:10PM -0500, Ted Byers wrote:
>> I routinely keep my SQL code distinct from my Perl, Java or C++ code. [...]
>> I find this even MORE useful as my projects get bigger.
>
> Humm, this discussion is horribly domain specific. Abstractions are the
> bread and butter of programming, and designing them appropriately makes the
> difference between things turning into a mess later on or not. The only
> solid rule I stick to is that when I'm working to solve a problem I've not
> solved before, I will get it (i.e. the way I go about solving the problem;
> getting the right answer is generally pretty easy) wrong the first time and
> probably only get it reasonable the third or fourth time.

Right. But there is a quicker way. I like to work with people whose experience is different from mine, expecting that they will have seen a different suite of problems and found solutions I have come to rely on. That way, in discussing my current problems, I can learn something new, and much faster than by trial and error. Of course, there are always test or proof-of-concept programs, where I test ideas. For example, I have routinely been advised that left joins are faster than correlated subqueries. In the past month, I have found two problems requiring either a correlated subquery or a left join; in one case the correlated subquery was more than ten times faster, while in the other the left join was about twice as fast. In both cases, the results returned by the two approaches were identical, but there were significant differences in performance, most astonishing in the one case that proved to be quite different than expected. (See the sketch of the two forms after this message.)

>> Each kind of abstraction has its place. It is up to the analyst or
>> architect to figure out how many layers and what abstractions are
>> appropriate for a given project.
>
> Indeed. But to be able to choose a suitable set of abstractions, it helps
> for the designer to know the ins and outs of the tools being used. I was
> trying to encourage Colin to look at using databases in a different way;
> different styles of programming suit different applications, and hiding the
> database can have detrimental effects as well as positive effects.

I find the opportunity to look at problems in a different light priceless. But some practices generally turn out to be counterproductive. Producing overly long functions, or mixing code involving different languages in the same file, often leads to an unmaintainable mess. More often than not, such practices are a consequence of poor design. But even here there is no hard and fast rule, since some algorithms, such as numeric quadrature or QR factorization of general real matrices, cannot be written as a short, simple function, although they logically ought to be one function in the simplest cases; and even these admit enhancements that warrant more interesting data structures and ancillary functions.

But for these algorithms, which admittedly have nothing to do with databases, the top consideration is the speed of provably correct code. If that means it is too complex for junior or even intermediate programmers, so be it. I have seen and used library code for number crunching that only a specialist in numeric methods has a hope of easily understanding, and been glad I didn't have to write those functions myself. ;-)

For Colin's situation, what I would have suggested is a simple extension of what he normally does: look at an additional layer that gets the data from the database as efficiently as possible and caches it in that layer, providing it to the UI layer as required. Similarly, perhaps in a different layer, get user data from the interface and cache it, sending it to the database back end at the most appropriate time in the most efficient way. And perhaps, with all this, some form of database connection pooling; although this latter option will depend on how many concurrent users there may be in the worst-case scenario (or maybe that is the best-case scenario, since it would mean there is significant demand for what the app does ;-). I haven't attempted this in Perl, but it is dirt simple in Java/J2EE. In a web application based, e.g., on Tomcat, it is easy to set this up with as many layers as one can justify, especially with an IDE like NetBeans 6. Not having thought about how to do
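For reference, the two equivalent forms mentioned above, on hypothetical tables (find parents with no children):

-- correlated subquery form
SELECT p.id FROM parent p
WHERE NOT EXISTS (SELECT 1 FROM child c WHERE c.parent_id = p.id);

-- LEFT JOIN form
SELECT p.id FROM parent p
LEFT JOIN child c ON c.parent_id = p.id
WHERE c.parent_id IS NULL;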
Re: [GENERAL] Simpler dump?
Thanks Uwe

This is a great start. It reduces the dump from 2 MB down to 167K, but out of 6833 lines of SQL, 5744 relate to the public schema in the DB, and I didn't touch that. It has over a dozen types, 419 functions, etc., that were put there by PostgreSQL the moment I created the database. I'd expect the same stuff to be there the moment I issue the CREATE DATABASE directive on the host machine, so all I really want is the dozen sequences, two dozen tables, and the suite of constraints I created, all in the schema specific to my new DB. Is there a reason pg_dump dumps the stuff in public even though that stuff seems to be created, and therefore present, in every database I create on a given server instance? Isn't that duplication a waste of space, and its presence in the dump a waste of CPU cycles?

Thanks again.

Ted

--- Uwe C. Schroeder <[EMAIL PROTECTED]> wrote:
> pg_dump -x -O -s [databasename] > outfile.sql
> HTH
> Uwe
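If the objects you created all live in their own schema, one option (hypothetical names) is to restrict the dump to that schema, which skips everything in public:

pg_dump -s -n my_schema mydb > schema_only.sql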
Re: [GENERAL] Simpler dump?
--- Tom Lane <[EMAIL PROTECTED]> wrote:

> Ted Byers <[EMAIL PROTECTED]> writes:
>> Is there a way to tell pg_dump to just dump the SQL statements required to
>> create the tables, sequences, indexes, keys, etc.?
>
> pg_dump -s ?

Thanks Tom

>> I DON'T need to restore or recreate things like users, or most other kinds
>> of DB objects. Just routine DDL statements.
>
> pg_dump doesn't try to recreate users, and to most people DDL would include
> the creation commands for any type of DB object whatsoever. Your
> demarcation between stuff you want and stuff you don't seems far too
> vaguely stated.

Sorry, I just wanted the statements I need to recreate the tables, sequences, and constraints I created. When I create a database, I leave the default public schema alone. The tables, etc. I add are placed in a separate schema. It seems the public schema is automagically created by Postgres every time I create a new database on a given server, and it has over a dozen types, over 400 functions, etc. I don't really understand why it needs to be duplicated in every DB on a server, but that is another matter. In my database so far, I have created only a dozen sequences and two dozen tables, along with a suite of constraints. So when I look at the dump file, out of 6833 lines of SQL, the first 5744 relate to the default public schema and only the last 1100 or so relate to DB objects I created. I'd assume that the public schema would be created with the usual stuff when I create the database name on the production host anyway, so why recreate all that in the dump file?

Thanks again,

Ted
Re: [GENERAL] Simpler dump?
--- Tom Lane <[EMAIL PROTECTED]> wrote:

> Ted Byers <[EMAIL PROTECTED]> writes:
>> It seems the public schema is automagically created by Postgres every time
>> I create a new database on a given server, and it has over a dozen types,
>> over 400 functions, etc. I don't really understand why it needs to be
>> duplicated in every DB on a server, but that is another matter.
>
> What it sounds like to me is that you've managed to clutter your template1
> with a lot of stuff you don't actually want, and that's getting propagated
> into new databases by CREATE DATABASE. If so, it's not pg_dump's fault ---
> you need to clean out template1.

Thanks Tom,

Where will I find template1? When I look at the databases on the server, the only template I see is called template_postgis. Most of the extra stuff I see in all my databases relates to geometry, and I find it in this template. When I installed PostgreSQL, I enabled PostGIS because I need it for some, but not all, of my databases. Is it possible to have more than one template, and to specify which template to use when creating a new DB?

Thanks again,

Ted
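Yes; CREATE DATABASE takes an explicit template, so both kinds of database can coexist (database names hypothetical):

CREATE DATABASE gis_db   TEMPLATE template_postgis;  -- spatially enabled
CREATE DATABASE plain_db TEMPLATE template1;         -- plain default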
Re: [GENERAL] Simpler dump?
--- Richard Huxton <[EMAIL PROTECTED]> wrote:

> Tino Wildenhain wrote:
>> Hi Ted,
>> Ted Byers wrote:
>>> Thanks Uwe. This is a great start. It reduces the dump from 2 MB down to
>>> 167K, but out of 6833 lines of SQL, 5744 relate to the public schema in
>>> the DB, and I didn't touch that. It has over a dozen types, 419
>>> functions, etc., that were put there by postgresql the moment I created
>>> the database.
>>
>> Well, that's usually not the case unless you changed the default database
>> by accident. The database is called template1.
>
> See the manuals regarding CREATE DATABASE for details.

Thanks Richard. To date, I never worried about templates for my databases. I just created them, and when I needed to deploy them, I dumped them, put the dump file on a memory stick, and carried it physically to the production server to restore there. Anyway, the extra stuff I don't need for this specific database appears to be for PostGIS, which I'd enabled when I installed because I need GIS capability for another database. But it seems to be putting GIS support in all of the databases I create. I'd thought that, by enabling it, I'd be able to turn it on when I needed it. Anyway, when I look at the server using pgAdmin, I don't see either template0 or template1. I see only template_postgis. Should I be able to see template0 and template1 among the databases on the server, or are they normally hidden to minimise the chances of getting them screwed up? At this stage, how can I get a template I can use by default that doesn't include the PostGIS stuff, so I can use template_postgis only when I need it?

Thanks

Ted
Re: [GENERAL] Simpler dump?
OK. Now I am a bit confused. The manual doesn't say what to do if you don't see template1 or template0 on the server, or even whether or not you should be able to see them when using a tool like pgAdmin. But it does say:

    The postgres database is also created when a database cluster is
    initialized. This database is meant as a default database for users and
    applications to connect to. It is simply a copy of template1 and may be
    dropped and recreated if required.

Now, when I look at postgres, it is empty apart from one public schema, and all the items (aggregates, etc.) have nothing in them. The manual talks about creating a template from an existing database, but not about how to create a template from a template, apart from: "template1 is the default source database name for CREATE DATABASE. For example, one could drop template1 and recreate it from template0 without any ill effects."

Obviously, template1 must have been either blown away by the creation of template_postgis, or the one has been made an alias for the other, or the default template has been set to use template_postgis. And if postgres is a copy of template0, and template1 starts as a copy of template0, and all my databases have included the PostGIS stuff, then template_postgis is the template being used by default. So, how do I determine whether or not template1 really exists on my server and is a copy of template0 (as I'd infer from what I see in postgres) rather than template_postgis, and then modify things so that the default is the normal template1 rather than template_postgis, while leaving the latter in place so I can use it when I need it?

Thanks

Ted
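Whether template1 exists (a GUI may simply be hiding system objects) can be checked directly from any database:

SELECT datname, datistemplate, datallowconn FROM pg_database ORDER BY datname;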
Re: [GENERAL] Simpler dump?
Thanks Richard.

--- Richard Huxton <[EMAIL PROTECTED]> wrote:

> Ted Byers wrote:
>> Anyway, when I look at the server using pgAdmin, I don't see either
>> template0 or template1. I see only template_postgis. Should I be able to
>> see template0 and template1 among the databases on the server, or are they
>> normally hidden to minimise the chances of getting them screwed up?
>
> There'll be an option in pgadmin somewhere to show them. Not sure where I'm
> afraid, I mostly use the command-line.

Found it. Under the View menu, the last item is "System objects". That now shows template0 and template1. It won't let me examine the contents of template0, BUT ... it looks like template1 is a copy of template_postgis. I see the same stuff in the two.

>> At this stage, how can I get a template I can use by default that doesn't
>> include the postgis stuff, so I can use the template_postgis only when I
>> need it?
>
> You can just drop template1 and re-create it using template0 (which is
> read-only) as its template. Check the docs for CREATE DATABASE and google a
> little for examples.

OK. A worry. How is template_postgis constructed? Is it just a handy reference to template1? Or does it exist independently? I don't want to drop template1 only to find that breaks template_postgis. A metaphor might explain the origin of my worry. I use both C++ and Java. In C++, copy/assignment gives you two different objects with the same data and structure. Assignment in Java gives you two references to the same object, so changes made through one reference will be seen when examining the other (this is a common gotcha for beginning developers using both C++ and Java; the way to get C++ behaviour in Java is to use operator new and the class's copy constructor). So, in adding PostGIS support, is the template_postgis creation more analogous to assignment in C++ or to assignment in Java?

Thanks

Ted
Re: [GENERAL] Simpler dump?
Thanks All. I learned plenty this morning.

--- Douglas McNaught <[EMAIL PROTECTED]> wrote:

> On 12/10/07, Ted Byers <[EMAIL PROTECTED]> wrote:
>> OK. A worry. How is template_postgis constructed? Is it just a handy
>> reference to template1? Or does it exist independently? I don't want to
>> drop template1 only to find that breaks template_postgis.
>
> All databases are separate entities--the data copy only happens at creation
> time, based on the template you specify. The 'template_postgis' DB is not a
> standard PG database, so either it's created by the PostGIS install, or
> someone at your site set it up. It sounds like the thing for you to do is
> drop template1 (which will have no effect on template_postgis), create it
> again from template0, and use template_postgis when you need it (otherwise
> template1 will be used by default for new databases).

Great! Thanks for this. I appreciate it.

Ted
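The usual recipe for that, run as a superuser while connected to another database (e.g. postgres); the datistemplate flag must be cleared first because template databases cannot be dropped:

UPDATE pg_database SET datistemplate = false WHERE datname = 'template1';
DROP DATABASE template1;
CREATE DATABASE template1 TEMPLATE template0;
UPDATE pg_database SET datistemplate = true  WHERE datname = 'template1';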
Re: [GENERAL] SQL design pattern for a delta trigger?
Thanks all.

I tried the appended code in a trigger function, but PostgreSQL won't take it. It complains that assets.quantity is not a scalar. However, the WHERE clause in that SELECT statement guarantees that at most one record will be returned. An open position on a given kind of asset is represented by NULL in the end_valid_time field, and the combination of asset_type_id, portfolio_id, and end_valid_time is certain to be unique, if there is a record for that asset type in that portfolio at all. I thought I'd try checking for an open position first because the manual indicated that exception handling is quite expensive. But I must have missed something, because it doesn't like how I tried to define my trigger function.

I have four sequences, one each for four tables. Two of the tables are just lookup tables, for asset types and portfolios; trivial for a test case, with only an autoincrementing integer primary key and a name. The other two are the ones of interest. Assets is treated as read-only as far as the user is concerned. The user's data in the assets table is mediated through transactions inserted (and NEVER deleted or updated) into the transactions table. Assets has the minimal suite of columns: an autoincrementing integer primary key; asset_type_id and portfolio_id, both integers; quantity, which is a floating point number; and two dates, start_valid_time and end_valid_time. Transactions has only a transaction_id, portfolio_id, asset_type_id, quantity, and transaction_date. There are of course foreign keys connecting the assets and transactions tables to the lookup tables, and a composite index on assets to make looking up records based on portfolio_id, asset_type_id, and end_valid_time as quick as possible. It couldn't be simpler, conceptually! Yet I must have missed something, because PostgreSQL won't accept the function body I show below. If I can't get this working quickly, I may just resort to creating a stored procedure that takes the transaction details as arguments and processes both tables appropriately, without relying on a trigger. :-(

Thanks for everyone's help.

Ted

===

DECLARE
    id BIGINT;
    q DOUBLE PRECISION;
BEGIN
    SELECT assets.id INTO id, assets.quantity INTO q
    FROM assets
    WHERE assets.asset_type_id = NEW.asset_type_id
      AND assets.portfolio_id = NEW.portfolio_id
      AND assets.end_valid_time IS NULL;
    IF (id IS NULL) THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity, NEW.transaction_date, NULL);
    ELSE
        UPDATE assets SET end_valid_time = NEW.transaction_date WHERE id = id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, q + NEW.quantity, NEW.transaction_date, NULL);
    END IF;
END;
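For comparison, a corrected sketch of that function body: SELECT ... INTO takes a single comma-separated target list, the local variables are renamed so that WHERE id = id no longer compares the column with itself, and the trigger returns NEW:

DECLARE
    v_id  BIGINT;
    v_qty DOUBLE PRECISION;
BEGIN
    SELECT a.id, a.quantity INTO v_id, v_qty
    FROM assets a
    WHERE a.asset_type_id = NEW.asset_type_id
      AND a.portfolio_id = NEW.portfolio_id
      AND a.end_valid_time IS NULL;
    IF v_id IS NULL THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity, NEW.transaction_date, NULL);
    ELSE
        -- close the open position, then open a new one with the updated quantity
        UPDATE assets SET end_valid_time = NEW.transaction_date WHERE id = v_id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, v_qty + NEW.quantity, NEW.transaction_date, NULL);
    END IF;
    RETURN NEW;
END;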
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Vivek Khera <[EMAIL PROTECTED]> wrote:

> On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
>> For what it's worth, the real algorithm would be as follows. I hadn't had
>> enough coffee yet, and I forgot the UPDATE bit.
>>
>> IF (a query matching your old data returns rows) THEN
>>     UPDATE with your new data
>> ELSE
>>     INSERT your new data
>
> Still a race condition. Your race comes from testing existence, then
> creating/modifying data afterwards. You need to make the test/set atomic,
> else you have a race.

Yes, but how do you do that in a stored function or procedure, or in a trigger? It would be obvious to me if I were writing this in C++ or Java, but how do you do it using SQL in an RDBMS? I saw something about table locks, but that doesn't seem wise, WRT performance. The classic example of a race condition, involving a bank account, was used in the manual to introduce the idea of a transaction, but we can't use a transaction in a trigger, can we? It is one thing to point out a race condition, but a pointer to a solution that would work in the context of the problem at hand would be useful and appreciated.

Thanks all.

Ted
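The standard answer for PL/pgSQL (essentially the merge_db example in the PostgreSQL documentation) is a retry loop around an exception handler, so no table lock is needed; hypothetical table db(a, b). Each BEGIN ... EXCEPTION block acts as a subtransaction, which is what makes the test-and-set safe to retry:

LOOP
    UPDATE db SET b = data WHERE a = key;
    IF FOUND THEN
        RETURN;
    END IF;
    BEGIN
        INSERT INTO db (a, b) VALUES (key, data);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- another session inserted concurrently; loop back and retry the UPDATE
    END;
END LOOP;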
Re: [GENERAL] SQL design pattern for a delta trigger?
Thanks Erik

> In a stored procedure you'd just execute the UPDATE and then check the
> FOUND variable to see if it found a row to update:
>
> UPDATE table_name SET foo='bar' WHERE id=5;
> IF NOT FOUND THEN
>     INSERT INTO table_name (id, foo) VALUES (5, 'bar');
> END IF;

To be clear, if I understand you correctly, with your example, if there is no record where id=5, nothing happens except FOUND is set to false? Can I, then, declare a variable prior to your UPDATE statement, and then modify your UPDATE statement so that the value in a particular field of the row where id=5 can be captured? Bearing in mind this is to be in a row-level trigger after an insert into table_name, something like:

DECLARE
    q DOUBLE PRECISION;

UPDATE table_name SET foo='bar', q = table_name.quantity WHERE id=5;

And then follow that with something like:

IF FOUND THEN
    INSERT INTO another_table (baz, quantity) VALUES (foo, q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity) VALUES (foo, NEW.quantity);
END IF;

Thanks again,

Ted
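The value can indeed be captured, though with RETURNING rather than an assignment inside SET; a sketch (UPDATE ... RETURNING ... INTO is available in PL/pgSQL from 8.2 on):

UPDATE table_name SET foo = 'bar' WHERE id = 5 RETURNING quantity INTO q;
IF FOUND THEN
    INSERT INTO another_table (baz, quantity) VALUES ('bar', q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity) VALUES ('bar', NEW.quantity);
END IF;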
[GENERAL] Simpler dump?
Is there a way to tell pg_dump to just dump the SQL statements required to create the tables, sequences, indexes, keys, etc.? I DON'T need to restore or recreate things like users, or most other kinds of DB objects. Just routine DDL statements. Looking through a dump file for a small database, it looks like pg_dump is serious overkill, dumping a lot of stuff I don't need (since I'm just using defaults for them anyway).

I am developing a new DB app, to be deployed on a web-based host on the other side of the planet. There is, at present, no 'data', and the only information to be transferred consists of the various tables, indexes, etc. I am creating. Obviously, we don't want to put any of my test data on a server that will in due course be the production host when the app goes live (so once my colleagues on the other side of the planet have had a chance to play with what I've developed, we'll mostly empty the DB of test data, except for a small amount of data we've obtained). I expect that a few tiny CSV files I have here will be FTPed to the host and we'd use a simple script to bulk load them.

Another fly in the ointment is that the hosting company is still using v 8.1.9 and I am using 8.2.5 on my machine, so I am concerned that a regular dump and restore may be problematic: it hasn't worked so far, but then I've spent much of my time so far wrestling with phpPgAdmin. :-(

I'm just looking for something that will save me a little time. I've created the core of the DB already on my development machine, using pgAdmin, but I can recreate it in about a day using Emacs to create a SQL script that reproduces what I did in pgAdmin. Any information would be appreciated.

Thanks

Ted
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Erik Jones <[EMAIL PROTECTED]> wrote:

> On Dec 6, 2007, at 2:36 PM, Ted Byers wrote: [snip]
>
> What you want to do here for handling the update v. insert is called an
> UPSERT. Basically, what you do is run the update as if the row exists and
> catch the exception that is thrown if it doesn't, at which point you insert
> the record with the end date = now(). After that you can proceed normally
> with creating the new record with start date = now() and end date = NULL.

Thanks Erik. Do you know of a URL where this is discussed, or where I can find an example? None of my books discuss this, and my search using Google has so far produced only noise.

Thanks again.

Ted
[GENERAL] SQL design pattern for a delta trigger?
IS there such a thing? I can be the first to consider this.

What I am aiming for is a solution with a couple of coupled tables, one of which represents state through time and the other represents transactions, or deltas, on the state. With one field (a floating point number) in the state table (or should I say a number for each id), it obviously has a time interval for which it is valid: a start time and an end time.

What I am after is a situation where, the moment a record is inserted into the deltas table, a trigger function first looks to see if the id provided presently has a state in the state table; if not, then it creates one. Then the end date for that state record gets set to the current time, and a new record is inserted with the new state (computed by applying the delta to the value in the previous record for the state), the current date as the start date, and null for the end date.

This seems like an obvious thing to try, but I am floundering a little and am therefore wondering if anyone has seen an SQL design pattern that talks about this, and a URL where I can see such a discussion and, better, an example. The first concern is to ensure that every record inserted into the deltas table is immediately reflected in the state table, and the second is that the history of state can be reconstructed from a suitable query on the state table. I can do this easily in client code, but isn't this the sort of thing best suited to living within the database itself?

Thanks

Ted
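A hypothetical sketch of the wiring described (all names invented):

CREATE TABLE state (
    id         integer NOT NULL,
    value      double precision NOT NULL,
    start_time timestamp NOT NULL,
    end_time   timestamp              -- NULL = currently valid state
);

CREATE TABLE deltas (
    id    integer NOT NULL,
    delta double precision NOT NULL,
    at    timestamp NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION apply_delta_fn() RETURNS trigger AS $$
DECLARE
    cur double precision;
BEGIN
    SELECT value INTO cur FROM state WHERE id = NEW.id AND end_time IS NULL;
    IF cur IS NULL THEN
        cur := 0;                     -- no state yet for this id
    ELSE
        -- close the currently open interval
        UPDATE state SET end_time = NEW.at WHERE id = NEW.id AND end_time IS NULL;
    END IF;
    INSERT INTO state (id, value, start_time, end_time)
    VALUES (NEW.id, cur + NEW.delta, NEW.at, NULL);
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER apply_delta AFTER INSERT ON deltas
    FOR EACH ROW EXECUTE PROCEDURE apply_delta_fn();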
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Ted Byers [EMAIL PROTECTED] wrote: IS there such a thing? I can be the first to consider this. OOPS. The mind is faster than the fingers. That should have been I can NOT be the first to consider this. Ted
Re: [GENERAL] stored procedures and dynamic queries
? Is it different from what I normally do with regard to ensuring all the requisite data is available, properly normalized, with a suitable suite of indexes, keys, etc., and as simple as practicable? (That is, over-simplification is avoided.) I also tend to ensure that all user access to the data is through either a stored procedure or a read-only view (perhaps with a little paranoia thrown in ;). Ted
Re: [GENERAL] Need help with complicated SQL statement
Thanks Shane, It works reasonably well. It gets the right answer, but I guess my data set is much larger than your test. Please consider the appended data. The first two SQL statements are directly comparable. My LEFT JOIN is marginally simpler, as shown by EXPLAIN, and runs to completion in about 3 seconds (elapsed real time), as opposed to about 30 seconds for the two views. It makes a little sense, though, in that according to EXPLAIN, the LEFT JOIN needs to consider a dramatically smaller number of rows. What I find puzzling, though, is that it takes less time to get the report for 28 stocks at a given time than it does to get the report for 1. (Both take about 30 seconds, but for 28 stocks, it takes about 0.005 seconds less time ;-) This is a case where LEFT JOINs appear to be much faster than subqueries. I appreciate all your help, but I am struggling to figure out how best to adapt my LEFT JOINs to your VIEWs, so that the latter benefit from the speed of the JOINs. The heart of my problem is to figure out how to use a stock_id in the WHERE clause. One thing I am not certain of: is there a way to preserve the logic of the WHERE clauses by replacing the WHERE clause, which I use to sample the time series at 22 days ago, 66 days ago, 132 days ago, etc., with a GROUP BY clause, grouping by stock_id? If so, might that, along with an additional LEFT JOIN, get me the result I am after? I created a stored procedure that takes an id argument (and can usefully invoke it on any stock_id in the database), but the problem remains as to how to construct a record set by applying the procedure to each id in a set of ids returned, e.g., by SELECT stock_id FROM stocks; Ted

=== test data ===

EXPLAIN SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id;

 id | select_type | table       | type   | possible_keys | key     | key_len | ref  | rows  | Extra
----+-------------+-------------+--------+---------------+---------+---------+------+-------+-------------
  1 | PRIMARY     | <derived2>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived3>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived4>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived5>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived6>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  6 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  5 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  4 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  3 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  2 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
10 rows in set (0.08 sec)

EXPLAIN SELECT * FROM stock_price_history WHERE stock_id = 1;

 id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra
----+-------------+------------+------+---------------+------+---------+------+------+-------
  1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL |  494 | Using
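On the last question (applying a per-stock procedure to each id in a set): one way this can be written in PostgreSQL is to expand a composite-returning function over the id list; a sketch, where stock_report(integer) is a hypothetical stand-in for the stored procedure:

    SELECT (stock_report(s.stock_id)).*
      FROM stocks s
     WHERE s.stock_id IN (1, 2, 3);  -- or any subquery producing the ids

In PostgreSQL 9.3 and later the same idea reads more naturally as a LATERAL join:

    SELECT r.*
      FROM stocks s
     CROSS JOIN LATERAL stock_report(s.stock_id) AS r
     WHERE s.stock_id IN (1, 2, 3);

Either way the set of stocks is computed once, and the per-stock logic runs once per id, which is exactly the shape the post above is asking for.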
Re: [GENERAL] Need help with complicated SQL statement
--- Shane Ambler [EMAIL PROTECTED] wrote: Ted Byers wrote: Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. I would have these subselects as - UNION (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) I would expect that to give the same result but make the query plan a bit simpler and quicker using less memory. It gave apparently correct values, but for some reason, it insisted on returning thousands upon thousands of identical records. There is something awry there, but I can't place what. Yes, I know I could use SELECT DISTINCT, but I worry that it may be doing a full table scan, as opposed to the relatively direct lookup I came up with after looking at your statement. I don't yet know how long it would take because it is the slowest option I tried, and I gave up after it had returned over 10,000 rows and still showed no signs of finishing. I don't understand this, as EXPLAIN returned apparently much better results for yours than it did for mine. My latest is as follows: SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id; This still gives me the correct answer, but is faster still than anything I came up with before. Now that I have the correct result for one stock, I need to adapt it to apply to each stock individually, in some small selection from a large number of stocks. Thanks again Ted
[GENERAL] Need help with complicated SQL statement
Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. First, while this statement gets me the correct data, I need to obtain a single record with stock_id, the current price (that obtained from the first select statement in the union), and each of the prices returned by the subsequent select statements expressed as the current price minus the price at the previous date, with the result divided by the price at the previous date, expressed as a percentage. I do not yet know how to do this using SQL (it would be trivial if I exported the data to Java or C++ - but it isn't clear how to do it within SQL). To make things more difficult, suppose I have another select statement that returns a set of stock_ids. How do I apply the SQL logic I require to only those stocks in the set returned by a statement like SELECT stock_id FROM someTable WHERE ... The result of this extension would be that I have one record for each stock in the selected set of stocks. I do NOT want to have to recompute the set of stocks for each of the select statements in the above union (since that would be a waste because the resulting set of stocks would always be the same for the given criteria). Nor do I want to apply the SQL logic I need for the prices to all the stocks in the database. There could be thousands, or even tens of thousands, of stocks represented in the database and I'd need the gain/loss logic only for a few dozen at any given time! How do I make the two extensions I require? I expect the SQL I get to be eventually placed in a stored procedure, which may then be used to construct a view, but that is the easy part. Maybe I have been staring at this for too long to see the obvious solution, but I am exhausted and am not seeing the next step. If there IS an obvious next step, please at least give me a hint. Thanks Ted
Re: [GENERAL] Optimal time series sampling.
--- Gregory Stark [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: As a prelude to where I really want to go, please consider the following SELECT statement. SELECT close_price FROM stockprices A WHERE price_date = (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = id); I assume you're missing another stock_id = id on the outer query? Right. I think you'll have to post the actual explain analyze output you're getting and the precise schema you have. OK, it is challenging to present it in plain text, but here is the plan reported by MySQL Query Browser (Fri Nov 09 11:12:46 2007) for:

EXPLAIN SELECT price_date, `close` AS close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date), WEEKOFYEAR(B.price_date));

 id | select_type        | table | type | possible_keys | key     | key_len | ref   | rows  | Extra
----+--------------------+-------+------+---------------+---------+---------+-------+-------+-----------------------------------------------------------
  1 | PRIMARY            | A     | ref  | PRIMARY       | PRIMARY | 4       | const | 17442 | Using where
  2 | DEPENDENT SUBQUERY | B     | ref  | PRIMARY       | PRIMARY | 4       | const | 17442 | Using where; Using index; Using temporary; Using filesort

You might need an index on stock_id,price_date. That is the definition of the primary key. That is why both the main query and the subquery are shown using the primary key. That would be an entirely different ball of wax than trying to pull out a single stock's closing price. I suspect you're going to want to use Postgres's DISTINCT ON SQL extension. Something like: SELECT DISTINCT ON (stock_id,price_date) * FROM stockprices ORDER BY stock_id, price_date DESC And you may want an index on stock_id, price_date DESC That pair, as I mentioned, formed the primary key for the stockprices table. Here is my SQL for subsampling a time series: SELECT price_date,close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date)); This performs better than the outer join algorithm for getting only the last price for a couple dozen stocks. This particular statement crawls to completion in about 4 or 5 minutes, as compared to over ten to get just the last price for a couple dozen stocks. Not too surprisingly, EXPLAIN gives identical results for this query as it did for the simpler SELECT above. Do you care what happens if there were no trades for a given stock in the time period? The query you give above using MAX would still work but the query I described using DISTINCT ON would not emit a record for the stock at all. No. But then I haven't yet analyzed the data to learn what the prices for a given period really mean if there haven't been any trades within the period. I have yet to see a series of prices for which the volume is 0.
That may be an artifact of how my colleagues selected stocks and etfs to use to test our algorithm. I do not yet know if it will be a significant issue for us since I don't see how a stock or etf that can go for a while without any trades at all would be of interest given the kind of information we will be producing for our clients. I am presently focussed on developing ways of looking at the data, to help my colleagues better understand the data and what our algorithm does with it. One of the properties of interest, and relevance to our algorithm, is the common property that the series seem to be self-affine (as described most notably by B. Mandelbrot): hence the need to sample with different degrees of granularity. My colleagues have worked primarily with finance data (esp. commodities), while my background is more focussed on risk management in environmental science. Thanks Ted
Re: [GENERAL] Optimal time series sampling.
--- Tom Lane [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: OK, it is challenging to present it in plain text, but here is the HTML exported by MySQL Query Browser. Why are you asking this list for help with a MySQL performance problem? Because my question isn't really about MySQL, but rather about how best to construct the SQL required to get the job done, regardless of what database is used. I have seen some claims that it is better to use joins instead of correlated subqueries and others that say the opposite. And I do not, at this stage, know if there are other options in SQL that may or may not be better. At this time, the database in use is irrelevant (I want to stick as close to the ANSI standard as practicable so the rewriting required will be minimal should we decide to change the database later, for whatever reason). Thanks, Ted
Re: [GENERAL] Optimal time series sampling.
--- Scott Marlowe [EMAIL PROTECTED] wrote: On Nov 9, 2007 11:47 AM, Ted Byers [EMAIL PROTECTED] wrote: --- Tom Lane [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: [snip] Which is better depends largely on how your database is built. MySQL still uses loops for all subselects, so with large numbers of tuples in the subselect method, it will be slow. But they might fix this in a later release. Fairly recent versions of PostgreSQL could make some bad choices when doing joins for certain datasets that would be much faster with a correlated subquery (specifically the old left join where righttable.field is null trick made some pgsql versions choose an inefficient join method) So, the right way is a question of which db, and even which version of that DB you're on. My life just got so much more complicated. Oh well, I guess that will be useful when providing advice to management when they start seeing performance issues. Thanks ;-) Have you looked at version 5.0.45? I am always seeing the claim that the left join trick is so much faster than the correlated subquery, especially if a function like MAX() is used, but the numbers I am seeing with real stock price data have it running, on average, about three times faster than the left join. So I assumed I was doing something wrong in a manner that would get me the right answer the slowest way possible. At this time, the database in use is irrelevant (I want to stick as close to the ANSI standard as practicable so the rewriting required will be minimal should we decide to change the database later, for whatever reason). If you want to stick with ANSI, MySQL tends to be more divergent from the spec than pgsql and other rdbms. The books I am using, which describe the SQL language, don't seem to mention or illustrate much difference among any of the rdbms (including my references that talk about Oracle and MS SQL Server). The SQL I try from those books seems to work reasonably well in all of them (I can't check against Oracle, though, since I don't have that), and I try most of my SQL against MySQL, Postgres and MS SQL Server (the biggest divergences seem to be in how bulk loading of data happens). Maybe I haven't explored enough of the SQL language, with large enough datasets, to see the differences you mention; or perhaps things are improving with all of them. Most people would consider the correlated subquery the better method. But it's also likely to be the slowest on MySQL. Right now, with this particular query, the correlated subquery is the one that gets me the right answers about 3 times faster than any other method I have tried. But it still takes several minutes to get the results for only a few dozen stocks. And yet I can get several megabytes of data from the following query in about a quarter of the time. SELECT price_date,close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(B.price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date)); I had expected this to take many times longer than the simple select that gets only the last price for a given stock, but I was surprised to see it so much faster than the query that gets just the last prices for only a couple dozen stocks. Thanks a lot. I learned a lot from your reply. Ted
[GENERAL] Optimal time series sampling.
As a prelude to where I really want to go, please consider the following SELECT statement. SELECT close_price FROM stockprices A WHERE price_date = (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = id); stockprices has a primary key comprised of stock_id and price_date, and I tried the same query with an extra index on price_date (but that index made no difference in apparent performance as seen on the clock on the wall). I have been advised (on the MySQL board), to use the following (with the claim, unsupported as far as I can tell, that it is both correct and much faster - it appears to be correct, but it is certainly no faster): SELECT A.`close` AS close_price FROM stockprices A LEFT JOIN stockprices B ON A.stock_id = B.stock_id AND B.price_date > A.price_date WHERE B.price_date IS NULL AND A.stock_id = id; It appears to do the right thing. I certainly get the right answer, but I am not seeing a significant difference in performance. Worse, when I invoke something like it for a suite of about two dozen stocks, it takes about ten minutes to complete. (I may try a variant in which the last clause used in WHERE is replaced by IN followed by a trivial select that gets the same two dozen stock_ids, to see if that helps.) Now, I am concerned with performance because, and this is where I really want to go, I want to adapt this logic to create new time series of closing prices, but at the granularity of a week, a month or quarter, and there is no predicting a priori how long the series is. IBM's data goes back decades while I have data for other stocks that go back only a couple years. Now, a junior programmer here had suggested just doing a simple select, at least for weekly granularity, by selecting a value if its day of the week computes to Friday. That can't work correctly because in some weeks, there are statutory holidays that land on Fridays, resulting in the last actual trading day for that week being Thursday. His simple approach guarantees that many records that ought to be included will be ignored. I need a more dynamic and flexible approach which allows me to work on the basis that I have prices for all trading days for a given stock from the time my data for it begins. So I need a more complex select statement that will just select the most recent price for a given stock for each week (or month or quarter or year). Now, I can get the full time series for two dozen stocks, as slow and brain dead as doing a select for each stock ID, AND have my Java code construct and display a chart, in less than 20 seconds (and Java does not have a reputation for being fast). I need whatever solution I use to be that quick. Any thoughts about how best to attack this in order to get the correct results as fast as is possible? What options would you consider, WRT defining the SQL statements you would benchmark, in order to design your benchmark testing? Thanks, Ted
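For what it's worth, in PostgreSQL the weekly (or monthly, or quarterly) subsampling described here can be written directly with DISTINCT ON and date_trunc, which naturally picks the last actual trading day of each period, holidays included; a sketch against the stockprices layout described above:

    SELECT DISTINCT ON (stock_id, date_trunc('week', price_date))
           stock_id, price_date, close_price
      FROM stockprices
     ORDER BY stock_id, date_trunc('week', price_date), price_date DESC;
    -- change 'week' to 'month', 'quarter' or 'year' for coarser granularity

The ORDER BY puts the latest date first within each (stock, week) group, and DISTINCT ON keeps exactly that first row, so a Friday holiday simply means the Thursday row wins.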
Re: [GENERAL] young guy wanting (Postgres DBA) ammo
--- Greg Smith [EMAIL PROTECTED] wrote: On Fri, 2 Nov 2007, Kevin Hunter wrote: I don't have ammo to defend (or agree?) with my friend when he says that Postgres requires a DBA and MySQL doesn't so that's why they choose the latter. [snip] To step back for a second, the software industry as a whole is going through this phase right now where programmers are more empowered than ever to run complicated database-driven designs without actually having to be DBAs. It used to be that you needed a DBA for every job like this because they were the only people who knew how to setup the database tables at all, and once they were involved they also (if they were any good) did higher-level design planning, with scalabilty in mind, and worried about data integrity issues. Software frameworks like Ruby on Rails and Hibernate have made it simple for programmers to churn out code that operates on databases without having the slightest idea what is going on under the hood. From a programmer's perspective, the better database is the one that requires the least work to get running. This leads to projects where a system that worked fine in development crashes and burns once it reaches a non-trivial workload, because if you don't design databases with an eye towards scalability and integrity you don't magically get either. As one of these programmers, where is the best place to find the information I need to get it right? Finding information and finding good information are not the same thing, and I am wary of 99% of what I find using Google. Since you know what a DBA needs to know, I ask you where I can learn what you believe a good DBA needs to know. Or am I OK just relying on the documentation that comes with a given RDBMS (Postgres, MySQL, MS SQL, etc.)? Ted
Re: [GENERAL] Design Question (Time Series Data)
--- Andreas Strasser [EMAIL PROTECTED] wrote: Hello, I'm currently designing an application that will retrieve economic data (mainly time series) from different sources and distribute it to clients. It is supposed to manage around 20,000 different series with differing numbers of observations (some have only a few dozen observations, others several thousand) and I'm now faced with the decision where and how to store the data. If you really have such a disparity among your series, then it is a mistake to blend them into a single table. You really need to spend more time analyzing what the data means. If one data set is comprised of the daily close price of a suite of stocks or mutual funds, then it makes sense to include all such series in a given table, but if some of the series are daily close prices and others are monthly averages, then it is a mistake to combine them in a single table, and two or more would be warranted. Or if the data are from different data feed vendors, then you have to think very carefully whether or not the data can logically be combined. So far, I've come up with 3 possible solutions:
1) Storing the observations in one big table with fields for the series, position within the series and the value (float)
2) Storing the observations in an array (either in the same table as the series or in an extra data-table)
3) Storing the observations in CSV-files on the hard disk and only putting a reference to it in the database
I don't much like any of the above. When I have had to process data for financial consultants, I applied a few simple filters to ensure the data is clean (e.g. tests to ensure data hasn't been corrupted during transmission, proper handling of missing data, etc.), and then bulk loaded the data into a suite of tables designed specifically to match the vendor's definitions of what the data means. Only then did we apply specific analyses designed in consultation with the financial consultant's specialists; folk best qualified to help us understand how best to understand the data and especially how it can be combined in a meaningful way. If the data are stored in a suite of well defined tables, subsequent analyses are much more easily designed, implemented and executed. I do not know if PostgreSQL, or any other RDBMS, includes the ability to call on software such as R to do specific statistical analysis, but if I had to do some time series analysis, I would do it in a client application that retrieves the appropriate data from the database and either does the analysis in custom code I have written (usually in C++, as some of my favourite analyses have not made it into commonly available open source or commercial statistical software) or invokes the appropriate functions from statistical software I have at my disposal. The strategy I describe above makes the SQL required for much of this dirt simple. HTH Ted
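A sketch of what "a suite of well defined tables" might look like for the two kinds of series mentioned above (daily closes versus monthly averages); all names and types here are illustrative only:

    CREATE TABLE daily_close (
        series_id integer NOT NULL,
        obs_date  date    NOT NULL,
        obs_value numeric NOT NULL,
        PRIMARY KEY (series_id, obs_date)
    );

    CREATE TABLE monthly_average (
        series_id integer NOT NULL,
        obs_month date    NOT NULL,  -- by convention, the first day of the month
        obs_value numeric NOT NULL,
        PRIMARY KEY (series_id, obs_month)
    );

Keeping the two kinds of observation in separate tables, as the reply advocates, means each table's key and semantics stay unambiguous, and a query never has to guess what an observation means.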
Re: [GENERAL] Design Question (Time Series Data)
--- Michael Glaesemann [EMAIL PROTECTED] wrote: On Oct 4, 2007, at 9:30 , Ted Byers wrote: I do not know if PostgreSQL, or any other RDBMS, includes the ability to call on software such as R See PL/R: http://www.joeconway.com/plr/ Thanks. Good to know. Ted
Re: [GENERAL] How to avoid: 9.50184e+06
--- Stefan Schwarzer [EMAIL PROTECTED] wrote: Hi there, how can I avoid results like this: 9.50184e+06 Instead it should return the real value, as 950184. But 9.50184e+06 IS the real value! That is about nine and a half million, not nine hundred and fifty thousand, BTW. I do not see why you want the database back end to divide the real number by ten and then display the result as an integer. I have not checked the behaviour of the functions provided by PostgreSQL to convert numbers to strings, but I would be surprised if a function suitable for use in serializing a floating point number failed to show 9.50184e+06 as 9.50184e+06; to have it automagically convert it to an integer would be counter intuitive to me. Really, how a number is displayed is properly in the domain of the client application. If it is written in C, then you have functions like printf, with all their glorious format specifiers, to give you exactly what you want and expect. And you have similar control with IO streams in C++ and Java. ALL real programming languages provide support for producing formatted output, and give you absolute control over the format used. Whether you are using a thin client or a thick client, manipulating how floating point numbers are displayed really belongs in the interface layer of the client. If you want your numbers displayed as integers, then you used the wrong type (as others have also suggested). If the data really requires use of floating point numbers, then use the libraries provided by whatever language you're using to develop your client to produce the format you want. HTH Ted
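If the formatting does have to happen in the database after all, to_char gives explicit control over the rendering; a small illustration (the format mask is just an example):

    SELECT to_char(9.50184e+06, 'FM9999999990');
    -- yields '9501840'; the FM prefix suppresses leading padding

This confirms the point in the reply: the stored value really is nine and a half million, and only its textual presentation changes.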
Re: [GENERAL] Audit-trail engine inner-workings
--- Marcelo de Moraes Serpa [EMAIL PROTECTED] wrote: Hello list, [snip] * Each http request gets an instance of a session_id ? Or is it per-connection ? It depends. There is no necessary correlation between a session as defined within a web application and a session as defined in the RDBMS. I routinely set up web applications that may have multiple sessions as seen by the RDBMS. Consider the idiom of doing operations with the least privilege required to get the job done. The application might log into the database as one database user with very limited rights, to authenticate the user and pass data to the web application regarding what the user is permitted to do (all server side, on the application server). Then, the application may log in as a very different user with limited rights to perform some operation the user has initiated. So far, we have two sessions as far as the database is concerned and only one as far as the web application is concerned. If you're working with web applications, you must know that multiple http requests can share a web application session, as can multiple web applications (if written to do so using the single sign-on idiom), assuming you use technologies such as cookies or URL rewriting or hidden form fields, or the magic of ASP.NET, to set up and maintain sessions, and that the session is restricted to a single http request if you don't (plain old http/html is stateless, so there is no useful concept of session without help from other technologies). HTH Ted
Re: [GENERAL] [RESEND] Transaction auto-abort causes grief with Spring Framework
Sorry for top posting - but this is an annoying aspect of this web interface to email. :-( Isn't what you're doing here a misuse of the idea of a transaction? I don't claim to be an expert in this, but I thought the idea of a transaction was that you bundle a group of statements together that must all succeed or all fail together; so if any one of them fails, all the preceding work completed in the transaction gets rolled back. Did I misunderstand something here, then? If not, then the operations you describe for your first transaction really ought not be in the same transaction. Now, when I run a script of SQL statements in MS SQL Server, a statement that encounters an error will simply stop, and then control can be passed to the next SQL statement in the script, BUT THIS IS NOT IN A TRANSACTION!!! It is only a script, and in the context in which I was working, I didn't need to use transactions. And, I can provide additional options in the statements that will modify this behaviour, so the script aborts at the error condition rather than continuing with the next SQL statement in the script. With some statements, what I would regard as an error requiring the script to abort seems to be regarded as merely a warning by default. For example, in a bulk load of data into a table, if there is a problem with the data for the tenth field on the second row, the first row, which is correct, is stored, the statement aborts at the problem row, and control is passed to the next SQL statement. In my situations, I had other programs that would clean up the data if this sort of problem arises, so where the problem is seen by default as warranting only a warning, I could upgrade it to be regarded as an error. And of course, there are options for controlling how the script behaves when an error occurs. Perhaps that is involved in the behaviour you are reporting for MS SQL Server?? I haven't investigated this myself as I haven't had the problem you describe. I didn't quite understand your description, in another post, of how Spring treats your database statements. Am I to understand it puts all your SQL statements into a single transaction? If so, either they badly mishandle transactions or they are working with a very different concept of what a transaction is. One last question. You describe part of your problem as being correct addition of data to an audit table. If I haven't misunderstood what you're doing, isn't it incomplete if you record only statement failures? When I deal with audits, I put the logic into triggers whenever possible. And I'd maintain an independent error log from my application code, probably in a catch clause, and this either goes to a flat file or uses an independent connection to the database. And my practice is to use separate try/catch blocks for each statement that can throw an exception, to improve the granularity of error handling logic. That is the only way to have a chance of getting one statement to execute regardless of whether or not a preceding statement throws an exception. I have a special interest in this because I am just beginning to look at Spring (I downloaded it just a few days ago). Cheers, Ted Tyson Lloyd Thwaites [EMAIL PROTECTED] wrote: Our app uses system state. We scan filesystems and record file information in a database.
Here is one example:
- insert 250 files
- update some stats (MUST work even if insert fails)
- update agent last-contact time (also must work so we know it's not awol)
When the last insert has finished, the server will kick into summary mode:
- wait for asynchronous inserts to finish
- summarise data
- do some misc operations
- switch back to 'Idle' state
That last step is where the trick is: if anything goes wrong with the previous operations, we MUST be able to put the system back into an 'Idle' state. Currently we do this in a catch block, i.e.: try { } catch { } Now of course that won't work in pg. We use the same logic all through the system, so you can see my problem. For example, if the user deletes a drive record that still has referential links to it, we do this: try { } catch (ReferentialIntegrityException e) { } We rely on the fact that we can still do things and commit a transaction even if a single statement has failed. The above drive delete case is where I first noticed the problem when switching from MSSQL: instead of the server handling the exception and doing something else instead, the 'something else' also threw an exception (txn is aborted), which propagated to the client end. UPDATE: Webb has proposed that this behaviour may be MSSQL misbehaving. -- Tyson Lloyd Thwaites Lead Software Developer Allianza Services Pty Ltd M: 0404 456 858 P: (+618) 8232 5525 F: (+618) 8232 8252 W: www.allianza.com.au
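PostgreSQL's mechanism for carrying on past a failed statement inside a transaction is the savepoint. A minimal sketch of the drive-delete case, with hypothetical table and column names:

    BEGIN;
    SAVEPOINT try_delete;
    DELETE FROM drives WHERE drive_id = 42;   -- may raise a foreign key violation
    -- if, and only if, the DELETE fails, the client issues:
    ROLLBACK TO SAVEPOINT try_delete;
    -- the transaction is still live, so the must-happen work proceeds:
    UPDATE agents SET last_contact = now() WHERE agent_id = 7;
    COMMIT;

Most drivers expose this directly (JDBC's Connection.setSavepoint and rollback(Savepoint), for instance), so the try/catch structure above can stay as it is, with the catch block rolling back to the savepoint instead of relying on the server to forgive the error.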
Re: [GENERAL] How best to represent relationships in a database generically?
Yes, I'd love to have more computing power at my disposal, but that doesn't stop me from doing useful stuff now! Modern ecological models are, at present, much more limited by the availability of good data, both for parameterizing models and for validating models, than they are by available computing resources. HTH Ted Lincoln Yeoh [EMAIL PROTECTED] wrote: Hi, Sorry, this really isn't postgresql specific, but I figure there are lots of smarter people around here. Say I have lots of different objects (thousands or even millions?). Example: cow, grass, tiger, goat, fish, penguin. BUT I'm not so interested in defining things by linking them to categories or giving them names, I'm trying to figure out a way to define things by their relationships with other things, and more importantly do searches and other processing by those relationships. So, what would be the best way to store them so that a search for the relationship like grass is to cow, will also turn up cow is to tiger, and goat is to tiger, and fish is to penguin (and penguin is to bigger fish ;) ), and electricity is to computer. And a search for cow is to goat, could turn up tiger is to lion, and goat is to cow. Is the only way to store all the links explicitly? e.g. have a huge link table storing stuff like obj = cow, subj = grass, type = consumes, probability=90% (= means points/links to). Or even just have one table (links are objects too). Or is it possible to somehow put the objects in a multidimensional space (1000 dimensions?) and keep trying to arrange the objects so that their relationships/vectors with/from each other are fairly consistent/reasonable based on current knowledge? Trouble is in some cases the grass eventually eats the cow, so maybe that doesn't work at all ;). Or even do both? Maybe use the first as a cache, and the second for deeper stuff (flash of insight or got the punchline = figure out better arrangement/joining of disparate maps). My worry about the first approach is that the number of links might go up very much faster as you add more objects. But perhaps this won't be true in practice. The worry about the second approach is that it might get stuck, or run out of dimensions. Is there a better way to do this? There must be right? Wait for huge quantum computers and use qubits for each multidimensional coordinate? ;). Regards, Link.
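For the explicit-links option, the relational version is straightforward; a sketch with illustrative names only:

    CREATE TABLE things (
        thing_id serial PRIMARY KEY,
        name     text NOT NULL UNIQUE
    );

    CREATE TABLE links (
        subj_id     integer NOT NULL REFERENCES things,
        obj_id      integer NOT NULL REFERENCES things,
        link_type   text    NOT NULL,   -- e.g. 'consumes'
        probability numeric,            -- e.g. 0.90
        PRIMARY KEY (subj_id, obj_id, link_type)
    );

    -- "grass is to cow" (cow consumes grass, with 90% confidence):
    INSERT INTO links (subj_id, obj_id, link_type, probability)
    SELECT c.thing_id, g.thing_id, 'consumes', 0.90
      FROM things c, things g
     WHERE c.name = 'cow' AND g.name = 'grass';

Analogy searches ("grass is to cow" turning up "cow is to tiger") then become self-joins of links on link_type, which is cheap as long as link_type is part of the key or indexed.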
Re: [GENERAL] Will partial index creation use existing index?
I don't know about partial indexing, but in a similar situation to what you describe, I have resorted to special tables designed specifically to hold one day's worth of data and to support our daily processing. I'd put the daily data into specific tables, with all the requisite indexes, and then, as part of the daily processing and after it has been processed, copy the day's data into the main tables and empty the tables used to hold the data temporarily. It may not be elegant or pretty, but it serves to greatly simplify processing since you know a priori that these tables contain only the data you need to process, and you can be certain that eventually it finds its way into the main data tables. And you have your indexes only on the tables used for daily processing, rather than on your main tables. An additional concern I had was that if the process for getting the data is vulnerable to error (and in my case it was), you can apply your verification procedures to it before bad data gets into your main tables, thereby reducing the risk of bad data getting into the database. I thought this might be an option for you if you have trouble getting your partial indexes to work well for you. HTH Ted Steve Crawford [EMAIL PROTECTED] wrote: Does PostgreSQL use an existing index, if possible, when creating a partial index? By way of background, we have some nightly bulk processing that includes a couple of 15-30 million row tables. Most of the processing is only looking at prior-day data (up to ~200,000 rows) and for efficiency requires several indexes. Except for this one process, the indexes are useless and I'd rather not constantly maintain them.
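A sketch of that staging-table workflow; main_data and record_key are placeholders for the real table and key:

    -- a staging table with the same columns as the main table
    CREATE TABLE daily_staging (LIKE main_data INCLUDING DEFAULTS);
    CREATE INDEX daily_staging_key ON daily_staging (record_key);

    -- bulk load, verify, and process the day's data against daily_staging, then:
    INSERT INTO main_data SELECT * FROM daily_staging;
    TRUNCATE daily_staging;

The indexes needed only for the nightly processing live on the small staging table, so the 15-30 million row main tables never carry that maintenance cost.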
Re: [GENERAL] COPY to/from non-local file
I just checked the manual and it says nothing about being able to use SQL COPY to access network files. But you have an option that almost works for you. I am sure others here will have some suggestions, but if I were in your place, I'd gather more information about where the bottlenecks exist in a solution that works for me, albeit slowly. To check your performance problem, you may want to add benchmarking code to your Ruby program to see where it is spending its time: on getting the data from PostgreSQL or writing it to a file. That will tell you where your efforts are best spent. I don't work with Ruby, but if I was using C++ or Java or Perl, I'd use the same strategy: get the data en masse, and then write it out as efficiently as possible. In C++, for example, there is an iostream-based idiom that takes one line of code in the application program, and that line is as fast as you can make your I/O code without delving deeply into I/O processing and developing your own iostream classes. In both C++ and Java, you have stream classes, and significant control over how the data is written: one character at a time (if you want the process to take forever ;-) or in blocks of whatever size you want. But this involves being willing to develop your own stream classes to implement your preferred buffering strategy. In C++, you can save a lot of development time by having template policy classes that control how best to optimize disk I/O. In Perl, you can read a file en masse and then iterate through it a line at a time, but for this my preference at present is to use C++. Since I don't know Ruby, I don't know how well it supports these, and related, I/O programming idioms. If its support is poor, it may pay to use a more powerful and flexible language. If it has outstanding power and flexibility for supporting I/O optimization programming idioms, let me know and I'll invest the time to add Ruby to my list of languages in my development toolbox. :-) I would assume that what I have learned using the languages I know can be applied in some fashion to programming in Ruby. HTH Ted Jaime Silvela [EMAIL PROTECTED] wrote: I've been looking around for this functionality: Is it possible to use COPY with a file that doesn't reside in the DB's filesystem? I know there is sort of a solution in using COPY from stdin/ COPY to stdout, however that depends on calling the psql command, which breaks the flow of control of the programs I'm trying to write. Here's the story: I have a Ruby on Rails application which sits on server A, and connects to a Postgres Database running on B. Using the regular flow, the rails code writes into various tables, and then generates a table that needs to be exported into a file which will be used as input into a separate program, a simulator, and kept in the filesystem of the Rails server. Using the COPY command would entail either of a) Using COPY to generate the file on server B, then transfer to A - but how? ftp? I want to reduce this type of coupling b) Using COPY TO STDOUT from psql, called in the Rails code with a backtick, then gathering the output and filing it. - but this solution depends on having psql in the path of the Rails server, reintroducing the server credentials, and from a programming point of view is kind of ugly. c) The Postgres driver for Rails tries to give an interface to the COPY command using raw_connection, getline and endcopy, but it's quite brittle, so I'm avoiding it altogether.
At the moment I'm avoiding those solutions, and instead get the table into Rails space with a CURSOR on a SELECT, then simply write the file in Rails, one line at a time. However, the tables I'm writing are pretty big, and the performance is so much worse than with COPY... Any suggestions? Thanks Jaime
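For completeness: psql's client-side \copy writes to the client's filesystem rather than the server's, which addresses exactly this situation; a sketch (the table name is a placeholder, and \copy with a parenthesized query needs PostgreSQL 8.2 or later):

    \copy (SELECT * FROM simulation_input ORDER BY id) TO 'input.dat'

The same COPY ... TO STDOUT stream that \copy uses is exposed by most drivers, so in principle a program can consume it directly without shelling out to psql; whether the Ruby driver of the day does that robustly is, as Jaime notes, another question.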
Re: [GENERAL] Temporal Units
Rich, I would think that as an ecologist, you would have a better sense than most here of the kinds of things I'd be doing. After all, I am a mathematical ecologist by training and the majority of applications I have developed have been either for agricultural consultants or environmental consultants. As a very simplistic example of where I find the kind of auditing I mentioned priceless, consider a decision support tool for which the input data involves concentrations of various contaminants in water leaving some facility (at this point we don't care what the facility is). While I would prefer that the input come from a laboratory information system, and fully automated, imagine that the data is recorded and input by some lab tech as each test is completed. On April 1, 2006, there is a typo leading the decision maker to believe that the concentration of contaminant X is 1 mg/l, when this is an error and the true value is 1 g/l. The decision maker may decide to do nothing because 1 mg/l is considered safe. But doing nothing, some bad environmental effect occurs, and it is discovered days after the decision was made that the data entered is in error. Who is to blame? Where does responsibility lie and what corrective action is needed to ensure the problem does not recur? When there is an issue of accountability and responsibility involved in environmental issues, every scientific measurement taken becomes a candidate for the kind of multitemporal data I routinely use. Now, for lookup tables vs functions, there is always a tradeoff even within a given programming language (I am most familiar with FORTRAN, C++ and Java). The tradeoff is between memory use (including how that may affect performance if the table is large relative to available memory) and the speed or size of the function. In the case of the date and time functions I typically use in my Perl scripts, they are so short, I doubt the performance impact is significant or easily measurable. On a side note, how do you like Ruby and Python? How would you compare them to the other options (such as C++, Java, Perl, etc.)? Cheers, Ted Rich Shepard [EMAIL PROTECTED] wrote: On Mon, 30 Apr 2007, Ted Byers wrote: I am not sure I see why it would be good to do this using SQL, but I do know that I have used a number of Perl packages for this sort of thing. I am not arguing with you. I just want to know in what circumstances my schemas can be improved by a calendar table, and how it provides a benefit over my more usual Perl functions. Ted, Having never used such a table -- or having written an application that had such a heavy use of temporal data rather than scientific data -- I have no idea in what circumstances your schemas might be improved with a calendar table. I suspect, however, that a SQL table lookup may well be quicker than running a script (or compiled function) in another language, and the table is available for use in multiple apps. Isn't it faster or more efficient to run SELECT queries with table lookups rather than use stored procedures? For this web-based application, the UI and communications between client and server are being written in Ruby (with Rails) while the report generation is written in Python using ReportLab. If most of the queries can be done with SQL, I think it will be much easier to maintain, modify, and expand. Could be wrong, of course. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc.
| Accelerator(TM) Voice: 503-667-4517 Fax: 503-667-8863
Re: [GENERAL] Temporal Units
I am not sure I see why it would be good to do this using SQL, but I do know that I have used a number of Perl packages for this sort of thing. When I have done this in the past, I'd do the date and time calculations in Perl and feed the result to whatever RDBMS I happen to be using (PostgreSQL, MySQL, MS SQL Server 2005). I suppose that if it is necessary to do it within an SQL script, resort could be made to functions that in turn use the Perl packages. But a question: Why would any schema that includes temporal components need a calendar table? I use temporal components all the time and have yet to need a calendar table. In fact, some of my database applications are multitemporal, keeping track of edits to data that correct or update data, so that during an audit script, one can determine what a decision maker knew at the time he made a decision. This is so that a decision that was bad, but based on good data, can be distinguished from a decision that had been based on bad data, but which would have been a good decision had the data been correct. The first option warrants correction of the decision maker while the latter warrants examination of the data entry process. I have found my Perl scripts adequate for those few instances where use of my temporal data depended on a calendar. I am not arguing with you. I just want to know in what circumstances my schemas can be improved by a calendar table, and how it provides a benefit over my more usual Perl functions. Cheers, Ted Rich Shepard [EMAIL PROTECTED] wrote: On Mon, 30 Apr 2007, Brent Wood wrote: If I'm following this correctly, then interval extract timepart can be used to provide all the required functionality: Thanks, Brent. Your suggestions complete the approach I was considering. There is no need for real-time response, so checking after each shift or day -- or other time period -- will be sufficient. I wonder if a workweek/holiday calendar table for PostgreSQL already exists. If not I need to track down the procedure for creating one as Joe Celko references such a calendar in his books. I think that any schema that has temporal components needs such a table. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) Voice: 503-667-4517 Fax: 503-667-8863
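Building such a calendar table in PostgreSQL takes one statement with generate_series; a minimal sketch (the date range and the holiday column are illustrative, with holidays to be flagged afterward from a local list):

    CREATE TABLE calendar AS
    SELECT d::date              AS cal_date,
           EXTRACT(dow  FROM d) AS day_of_week,  -- 0 = Sunday ... 6 = Saturday
           EXTRACT(week FROM d) AS iso_week,
           false                AS is_holiday    -- to be updated from a holiday list
      FROM generate_series('2000-01-01'::timestamp,
                           '2020-12-31'::timestamp,
                           '1 day') AS g(d);

Once the holiday flags are in place, "last business day of the week" style questions become plain SELECTs against this table, which is the benefit Rich is after.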
Re: [GENERAL] Design / Implementation problem
procedures. Note, this should allow your customers to get their information almost instantly since the amount of data that would be processed for them would be very small. If they want to know how many points they have, and even how many points they have that will expire in the next day, they're looking at dirt simple queries with at most one subtraction. Note, I did not spend the time to refine this to minimize the total computational load or the data storage requirements. That could take days, or even weeks, depending on your attention to detail and how concerned you are about efficiency. I am sure there are some computational efficiencies that can be gained, with additional analysis, perhaps with some tradeoffs regarding the detail of the data stored and available for audit purposes, but I'll leave that as an exercise for you. :-) Note, I do not see, from what you'd written, how your proposed solution would ensure that customers used their oldest points first. Maybe it does, and you didn't describe that aspect well, but that is something you'll have to be careful about if you want to avoid upset customers. Unit testing, followed by integration tests, are your friends! HTH Ted
Re: [Bulk] Re: [GENERAL] quoted identifier behaviour
is broken and preserve what is not broken. HTH Ted
Re: [GENERAL] Setting week starting day
It is not hard to calculate, as you can see... but it would be nice if date_trunc('week', date) could do that directly. Even if it became date_trunc('week', date, 4) or date_trunc('week', date, 'Wednesday') it would be nice... :-) And that is what I was trying to ask ;-) Use date_trunc('week', current_day + 1) and date_trunc('dow', current_day + 1) to have a one day offset from the standard first day of the week. I believe there's more than that... Probably the +1 should be outside the date_trunc, anyway. It might help, but I still see the need to do calculations... Especially if it was Tuesday today... Out of curiosity, why does the database need to know this, or to be able to calculate it? There are lots of things that would be useful to me, if the RDBMS I'm using at the time supported them (particularly certain statistical functions - ANOVA, MANOVA, nonlinear least squares regression, time series analysis, etc.), but given that I can readily obtain these from other software I use, and can if necessary put the requisite code in a middleware component, I would rather have the PostgreSQL developers focus on issues central to having a good DB, such as ANSI standard compliance for SQL, or robust pooling, etc., and just leave me a mechanism for calling functions that are external to the database for the extra stuff I need. I would prefer a suite of applications that each does one thing well than a single application that does a mediocre job on everything it allegedly supports. What would be 'nice' and what is practical are often very different things. I know what you're after is simple, but remember the good folk responsible for PostgreSQL have only finite time available to work on it, and thus, when they're making choices about priorities, I'd rather they ignore even simple ancillary stuff and focus on what really matters. I just recently finished a project in which the data processing needed information similar to what you're after, but instead of doing it in the database, we opted to do it in the Perl script I wrote that fed data to the database. In fact, it wasn't so much the day of the week that mattered to the processing algorithm but the resulting dates for the immediately preceding business day and the immediately following business day. It was those dates we fed to the database rather than the weekday. There are several Perl packages (see CPAN) supporting this kind of calculation. These are generally outstanding (and would probably be useful if you want to create your own stored function implemented in Perl), but you may have to customize them by providing additional configuration information such as timezone and statutory and religious holidays if you need to determine business days in addition to just the day of the week. The day of the week can be obtained in Perl with a single function call! I just took a quick break to read about the date functions available within PostgreSQL, and while apparently nice, you have much greater flexibility, and many more functions, in these Perl packages I mentioned. If you just want a function call, I'd suggest you create a function that just dispatches a call to the Perl function that best meets your needs. In a sense, you are not really rolling your own. You're just dispatching the call to a function in a Perl package. Cheers Ted
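Since the exchange keeps circling back to the arithmetic, here is the shift spelled out; a small sketch for weeks that start on Sunday rather than date_trunc's standard Monday:

    SELECT date_trunc('week', current_date + 1)::date - 1 AS sunday_week_start;
    -- add 1 before truncating, subtract 1 after, so a Sunday lands
    -- at the start of its own week instead of the end of the previous one

Other starting days follow the same pattern with a different offset; for a Wednesday start the offsets would be - 2 and + 2 respectively.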
Re: [GENERAL] Setting week starting day
- Original Message - From: Omar Eljumaily [EMAIL PROTECTED] To: Ted Byers [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Friday, March 09, 2007 5:00 PM Subject: Re: [GENERAL] Setting week starting day Ted, my reason for asking the question that I believe precipitated this thread was that I wanted a single sql statement that aggregated time data by week. Yes, I could do the aggregation subsequently in my own client side code, but it's easier and less error prone to have it done by the server. I don't buy the suggestion that server side code is less error prone than client side code, but be that as it may, we're talking about a function that has one line of code. And given what you just said, you don't want the day of the week, you want a function that returns the week of the year. This can be had from the same Perl functions I mentioned before, with a minor alteration in how you call it. My suggestion would be to create that one line function that invokes the relevant Perl function, which can then be invoked in your select statement (presumably with a GROUP BY clause to avoid mixing data from different years). It should take about ten to fifteen minutes to write and test. Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
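For the aggregate-by-week goal Omar describes, a minimal pure-SQL sketch (the timesheet table and its columns are hypothetical names) that keeps the whole aggregation in one statement on the server:

    -- Total hours per calendar week; date_trunc('week', ...) starts weeks on Monday.
    SELECT date_trunc('week', worked_on)::date AS week_start,
           sum(hours) AS total_hours
    FROM timesheet
    GROUP BY date_trunc('week', worked_on)::date
    ORDER BY week_start;

Grouping on the truncated date rather than on extract(week ...) also avoids mixing data from different years, the caveat Ted raises.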
Data validation - was Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: omar [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Friday, March 09, 2007 10:40 PM Subject: Re: [GENERAL] OT: Canadian Tax Database Tom, I promise this isn't a political statement, even though it's on the same thread. I'm curious what people think about the following statement considering the database typing talk being brought up here. My experience is that more times than not I have to put data validation in my client code even when it's available on the server, if for no other reason than that users don't understand what foreign key violation, etc messages mean. It begs the question of whether it's really necessary on the server or not. SQLite seems to take the position that it isn't since there is no referential integrity and the following. To be honest, there's a lot of power in the ability to view everything as a string, with of course proper data validation. The risk of this is far too high. Treating everything as a string is, IMHO, a very bad idea. There are, especially for a web application, numerous forms of attack, so I routinely provide code for client side validation, server side validation (in a web app or in filters that process the data before providing it to whatever is going to do something useful with the data). This includes designing stored procedures to receive, and validate, data before the data is stored in the database. On the client side, the main benefit is to ensure the user doesn't miss anything that is necessary and that he enters valid data. If the user is malicious, and wants to try a SQL injection attack, nothing you do on the client side can prevent him from creating his own version of your page bypassing all of your client side validation code. And it is possible for a scoundrel to try a man in the middle attack (intercepting a transaction mid stream and trying, e.g., a SQL injection attack). So even with client side validation, server side validation is absolutely essential. I like Perl for that, but it can be done in your favourite programming language. And it can be done in .NET also, if you prefer. Maybe I am paranoid, but whether I am writing code to be run at the very back end, or the very front end, or anywhere between the two, my preference is to validate the data that specific object has received before I do anything with it. That is key in secure application development. You generally assume that your system, and any component therein, can be compromised, so you program on the assumption that it has been compromised somewhere and write code that minimizes or eliminates the damage that can be done if some component anywhere else in the system has been compromised. Just 'coz I'm paranoid doesn't mean they're not out to get me. ;-) I value really good system administrators who go the extra mile to make intranets and systems as secure as humanly possible, but as an application developer, I never assume they have not overlooked something. Instead, I assume the opposite and that therefore, they got everything wrong and that the intranet and every server in it either has been compromised or will soon be compromised, and I therefore try to minimize the risk of damage or violation of data confidentiality or security in a network or on a system that has been compromised. I know perfection is not possible, but I hope we can make it too expensive for a cyber criminal to get what he wants illegally. If we make his cost greater than his potential return, he should rationally move on to easier targets. 
Cheers Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
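As a concrete illustration of the server-side validation argued for above, a minimal sketch with hypothetical table, column, and function names: a plpgsql function that rejects malformed input before it ever reaches the table, so even a client that bypasses front-end checks cannot store junk.

    -- Hypothetical example: validate a social insurance number and birth date server-side.
    CREATE OR REPLACE FUNCTION add_taxpayer(p_sin text, p_birth date)
    RETURNS void AS $$
    BEGIN
      IF p_sin !~ '^[0-9]{9}$' THEN          -- a SIN is exactly nine digits
        RAISE EXCEPTION 'invalid SIN: %', p_sin;
      END IF;
      IF p_birth > current_date THEN
        RAISE EXCEPTION 'birth date % is in the future', p_birth;
      END IF;
      INSERT INTO taxpayer (sin, birth_date) VALUES (p_sin, p_birth);
    END;
    $$ LANGUAGE plpgsql;

A CHECK constraint on the table itself would catch the same errors even for inserts that bypass the function.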
Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: Joshua D. Drake [EMAIL PROTECTED] To: Richard Huxton dev@archonet.com Cc: pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 8:00 AM Subject: Re: [GENERAL] OT: Canadian Tax Database Richard Huxton wrote: http://www.thestar.com/News/article/189175 For instance, in some cases the field for the social insurance number was instead filled in with a birth date. Unbelievable. Sixty years of electronic computing, fifty years' use in business and the professionals who built the tax system for a wealthy democratic country didn't use data types. This is Unbelievable? This is commonplace. And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. This time CRA is embarrassed, but they don't care because the people that suffer are the taxpayers who ultimately paid for such shoddy work in the first place. There are no consequences for the bureaucratic peons really responsible for it. They probably even get paid obscene sums in overtime for the time they spend fixing the problem. More annoying, for me, are the scurrilous scoundrels that pass themselves off as competent software consultants who take advantage of such incompetence in their clients' staff. I couldn't begin to document all the cases I have seen where either the wrong software was used (imagine a spreadsheet being used as an RDBMS) or the right software was grossly abused (imagine forcing a data entry clerk to enter the same data four times because the developer was too damned lazy or incompetent to develop a simple form to collect the data once and then submit it to the four externally owned databases that needed to be queried using it, and then having to manually collate the results returned from the queries). And then businesses operated by capable folk get burned by such incompetent and unethical scoundrels and swear off custom software because they'd rather have a COTS product that gives an 80% fit than try for a 100% fit with a custom product that in the end doesn't work at all. I have been told by some of these folk that they have found it virtually impossible to find capable software developers. This is because these scoundrels I mention outnumber capable developers by several orders of magnitude (and the current state of the curricula at colleges 'training' programmers doesn't help). It is so easy to get cynical, and very discouraged, when I think about this. :-( Maybe I should have myself lobotomized and become one of the mindless grunts at Canada Post. Cheers Ted ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] OT: Canadian Tax Database
And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. What you describe is a hundred times better than the reality... most of them actually get _expensive_ junk with some kick-back ;-) I concede. You're right. I recall being told by one project manager I knew years ago who had an opportunity to create a bid for an RFP issued by Transport Canada (long long ago). He refused, so his employer prepared the bid. He refused because the RFP was a joke. There were absolutely no functional requirements, nor non-functional requirements, identified in the RFP. His company didn't get the contract, but being a bidder they did see the winning bid. It was just as ludicrous! It, too, failed to identify any requirements, and so it did not actually promise to deliver anything, working or not! They would have satisfied the terms of their contract if, after a few years, and hundreds of man-years, they walked away without delivering anything. That tragedy cost Canada hundreds of millions, if not billions, of dollars (I don't know if any final accounting was ever done - that would be opposed by the civil servants responsible lest they should be criticised for their incompetence), and ultimately nothing was delivered because the next elected government cancelled the project, refusing to throw more money into their opposition's money pit - they prefer, of course, to throw it into money pits created by their political supporters. The decisions to create the project, and to cancel it, were political, but the incompetence really responsible for it was lower down within the ranks of the civil service. The project could have delivered something good had the civil servants involved been competent! Similar nonsense happened with the firearms registry. For most of the early history of it, the software systems used were so bad, and inappropriate, that the clerks that had to use it could have been ten times more productive if they had the use of properly designed and implemented software. I cannot understand how it became so outrageously expensive when the real needs for it were so simple (relative to products I have worked on). I'll bet real, genuinely capable, software engineers could have delivered a gold and platinum plated product for just a few million dollars; nothing really relative to what it ended up costing us. I know contractors that refuse to do business with the government because of this sort of nonsense, and I know, from discussions with ex-civil servants, that such incompetence is the norm in government. I know engineers who have been burned by government by investing a fortune in new products or services, and then educating relevant civil servants WRT the new science or technology involved, only to find these same civil servants give contracts to provide the new product or service to incompetent bozos who didn't know the first thing about it. They just happened to be cheaper. Why waste time and money developing a product or service that is really relevant only to government when the risk of such unethical conduct by government is so high? 
I don't suppose anyone out there can describe a project or three where things were done right, to provide a cure for the depressing and discouraging nature of what this thread has turned out to be? Cheers Ted ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: Alan Hodgson [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 11:32 AM Subject: Re: [GENERAL] OT: Canadian Tax Database On Thursday 08 March 2007 08:15, Ted Byers [EMAIL PROTECTED] wrote: They would have satisfied the terms of their contract if, after a few years, and hundreds of man-years, they walked away without delivering anything. That tragedy cost Canada hundreds of millions, if not billions, of dollars It didn't happen to be a gun owners' registry, perhaps? (fellow Canadians will understand :) No. This predated that fiasco by more than ten years. In fact, had it been done right, it would have been a much much larger project than the registry. Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] OT: Canadian Tax Database
Richard Huxton wrote: http://www.thestar.com/News/article/189175 For instance, in some cases the field for the social insurance number was instead filled in with a birth date. Unbelievable. Sixty years of electronic computing, fifty years' use in business and the professionals who built the tax system for a wealthy democratic country didn't use data types. This is Unbelievable? This is commonplace. And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. Come on, they don't hire incompetent fools. They hire the people You CAN'T be serious! Have you ever dealt with them or with the consequences of their incompetence? they need to fill their quota regardless of how well trained and experienced they are. I am not saying that non white males are in any way less competent than white males, but removing them from the pool does not make things better. The biggest problem with quotas is not hiring less qualified staff, it is that less qualified staff know why they were hired and know that they are very unlikely to be fired, so they have little incentive to work hard or attempt to do their best, they can always fail upwards. What does this have to do with anything? No one here, except you, has said anything about the profile of the people involved WRT race, gender, religion, &c. Nor has anyone said anything about qualifications. The only thing that has been said is that, based on what is seen in the work, the people responsible for that work must be incompetent. It is an inference based on what is seen in what has been done and has nothing to do with any of the prohibited grounds for discrimination used as excuses for affirmative action. And yes, I have seen cases where less qualified, even unqualified, people have been hired as a result of these affirmative action initiatives (and I have been told, by HR personnel in government, that certain favoured groups are deemed to be superior to white men, even if the favoured party has no education nor experience and the latter have earned doctorates and decades of experience), but no one has said anything about such people being employed on the projects to which I referred. But this is an aspect of our present society that is bound to degenerate into a flame war, launched by the politically correct, so we ought to say little, or even leave it alone. Those in power tend to be vicious, especially when there are no effective checks on their conduct and no consequences for what they do. Cheers Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] security permissions for functions
- Original Message - From: Bill Moran [EMAIL PROTECTED] To: Rikard Pavelic [EMAIL PROTECTED] Cc: Tom Lane [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 3:18 PM Subject: Re: [GENERAL] security permissions for functions In response to Rikard Pavelic [EMAIL PROTECTED]: Tom Lane wrote: No, it's operating as designed. Per the GRANT reference page: : Depending on the type of object, the initial default privileges may : include granting some privileges to PUBLIC. The default is no public : access for tables, schemas, and tablespaces; CONNECT privilege and TEMP : table creation privilege for databases; EXECUTE privilege for functions; : and USAGE privilege for languages. The object owner may of course revoke : these privileges. (For maximum security, issue the REVOKE in the same : transaction that creates the object; then there is no window in which : another user may use the object.) This seems clear enough. You'll need to revoke the default public EXECUTE privilege on any functions you don't want to be callable. As does this. Hmm, so the answer to my question How can I assign execute permission to a role for a single function inside schema. is I can't? How did you interpret do it like this to mean you can't do it? I too cannot understand how he came to this conclusion, unless it wasn't obvious to him how to grant certain permissions to roles. I am curious, though. I shape my understanding of this using a metaphor of private, protected and public member functions in C++ classes. There is, of course, no point in having a function that can't be called under any circumstances, but it is often useful to have a variety of protected and private functions (and other members) that can only be called by other member functions or member functions of derived classes. Does the permission granting procedure for functions work in a similar fashion? Can I make a function as a part of a schema that is executable only by the owner and other functions in the schema, and no-one else, and still have a publicly callable function in that schema invoke the private function? Or is my C++ based understanding leading me astray here? REVOKE ALL ON function name FROM PUBLIC; So this basically means that I can't fine tune the permissions through functions, but I can through views and tables? This looks like a bug in design to me ;( Relax. You (somehow) misunderstood Tom. I wonder if he's worried about granting permissions to roles or groups rather than to individual users. I mean the obvious statement, for the fine tuning he appears to me to want to do, would be to follow the REVOKE statement you show with a GRANT statement for a specific user. At least that is what I'd infer from what you and Tom wrote. Did I misunderstand either of you, or what Rikard is worried about? Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
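For what it's worth, the fine-tuning Rikard appears to want is exactly the REVOKE-then-GRANT sequence implied above; a minimal sketch with hypothetical schema, function, and role names (note that the argument types are part of the function's identity):

    REVOKE ALL ON FUNCTION myschema.sensitive_fn(integer) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION myschema.sensitive_fn(integer) TO reporting_role;

Any member of reporting_role may then execute the function; everyone else is refused.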
Re: [GENERAL] security permissions for functions
Functions are controlled by the same ACL mechanism that tables and everything else follows. Thus you have the idea of user id X may do Y with object Z i.e. user barbara may execute function somefunction(). But there's no real way to alter those permissions outside of changing the user ID context. So, I should be able to have user barbara execute function somefunction(), but, though barbara must not have access to object alpha, let's say for data security reasons (and user sarah does), I could have function somefunction invoke another function that stores information about barbara's action to object alpha by changing user context temporarily and without barbara's knowledge; basically saying within function somefunction() something like execute function 'someotherfunction()' impersonating sarah and stop impersonating sarah once someotherfunction returns. Much like the way I can log in to Windows or Linux as one user and temporarily impersonate another while executing a particular program or administrative function (e.g., log into Linux as a mere mortal, start a bash shell providing credentials for an admin account, do my admin type stuff and then close the shell). Or have I misunderstood you here WRT user ID context? Ted ---(end of broadcast)--- TIP 6: explain analyze is your friend
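PostgreSQL's closest analogue to the temporary impersonation described above is a SECURITY DEFINER function, which executes with the privileges of its owner rather than its caller; a minimal sketch, with hypothetical names (alpha_log standing in for "object alpha"):

    -- Owned by a role (say sarah) that may write to alpha_log.
    CREATE OR REPLACE FUNCTION log_action(p_note text)
    RETURNS void AS $$
    BEGIN
      -- session_user records the real invoker; inside a SECURITY DEFINER
      -- function, current_user would report the owner instead.
      INSERT INTO alpha_log (actor, note, logged_at)
      VALUES (session_user, p_note, now());
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    REVOKE ALL ON alpha_log FROM barbara;
    GRANT EXECUTE ON FUNCTION log_action(text) TO barbara;

barbara can now call log_action() even though she cannot touch alpha_log directly, much like a setuid program on Unix.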
Re: [GENERAL] php professional
Mark Similar issues with MySQL. It's faster, but it doesn't matter *how* fast you get the *wrong* answer. :) I thought one of the first rules of software engineering was First make it right and only then make it fast! Granted, most of my experience has more to do with number crunching and scientific computing, but for the past 27 years, I always first ensured the code was provably correct before worrying about optimization. And this always involved a test suite that applied every possible mathematical test with randomly generated test data. For example, my code for matrix algebra problems generally came with test code and a driver that would exercise the code with tens of millions of random matrices and vectors. One failure, even in millions of test cases, means the code is somehow broken. I have seen, and fixed, such cases before. One of my problems with database development is how to construct analogously strong test cases in order to prove the code correct. And it seems to get harder as the database gets larger. :-( In other words, I know how to do this with algebra and calculus, and C++/Java/FORTRAN programming, but not with sets and SQL programming. Cheers, Ted ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] backup database by cloning itself
On Feb 14, 2:14 am, filippo [EMAIL PROTECTED] wrote: On 13 Feb, 14:54, filippo [EMAIL PROTECTED] wrote: My target is to have the backup operation not affecting the users, so I want to be able to copy a database even if the database is used by someone. I could use pg_dump/pg_restore. pg_dump doesn't have to have exclusive access to the database to perform the operation. My only problem is that pg_dump creates a backup in a file; the best to me would be to have a perfect clone (users/data, etc.) of the original database ready to be used just after the cloning. Is it possible? Thanks, Filippo Well, I could see you writing a client application that creates a clone by first recreating all the schemas in your database and then copying the data to the clone, and probably quite a bit more. In such a case, since you have absolute control over your client code, you can do anything you want. I am not sure, though, that that is the best use of your time and hardware resources, especially if all you're after is a backup. Just think of all the overhead involved in creating a new clone, and everything that implies, every hour. But why not further explore your backup options if all you're concerned about is a reliable backup. You may find 23.3. On-line backup and point-in-time recovery (PITR) in the postgresql documentation useful. You haven't given any information about why it might not be appropriate in your situation. If you're really doing what it looks to me like you're doing, then you may be in the beginning stages of reinventing Postgresql's PITR capability. The builtin support for PITR in Postgresql strikes me as sufficient for what you say you need. If you require more, which would imply you want more than the simple backup you say you're after, then defining a suitable suite of triggers and audit tables may serve. Neither should adversely affect your users, especially if your database is not very big. HTH Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
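A minimal sketch of the pg_dump route mentioned above (database names are illustrative): pg_dump takes a consistent snapshot without blocking other users, and pg_restore rebuilds the clone from the archive.

    pg_dump -Fc sourcedb -f sourcedb.dump   # custom-format archive, consistent snapshot
    createdb sourcedb_clone
    pg_restore -d sourcedb_clone sourcedb.dump

Role definitions live outside any one database, so carrying the "users" over as well would additionally need pg_dumpall -g (globals only).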
Re: [GENERAL] Having a problem with my stored procedure
As suggested earlier, it is probably cleaner to define separate triggers on insert and on update. That is possible, but they can't have the same names. You probably want to name them accordingly too, or you'll get naming conflicts. I suggest: DROP TRIGGER archive_articles ON news_content; While this is something I'd do if I had absolute control over all the software, in the OP's case, there is a third party application involved that can't be modified. I would therefore worry that the developer of that software may have also placed triggers of that name on that table. This would suggest a little slop in the practices of that developer (i.e. of the third party application, not the OP) since a decent naming convention would make a name collision between that developer's code and the OP's code highly unlikely, but that is another issue. If the OP is getting name collision when trying to create these triggers, the implication is that the developer of the third party app in fact defined triggers of the same names, so dropping previously created triggers may well break that app. It seems to me that dropping something I haven't created is a high risk action. A naming convention similar to what I use would solve that problem without the risk associated with dropping something someone else has developed. If I am working on something to be distributed, I use a naming scheme that prepends a very short string that makes it clear the code was developed by myself or one of my staff, and in languages that support a namespace, such as C++, I make certain there is a namespace ID unique to my organization. This eliminates the risk of a name collision unless some developer actually tries to impersonate me. In some cases, where I am working as part of a team, my preference is to do the same with the developer's ID (since the developer who wrote a given piece of code is always responsible for fixing any bugs in it wherever possible). Of course, the fact is always fully documented, both in the code and in design documents provided to the client. This is a discipline I impose on myself, as a courtesy to those who come after me, and it involves considerations any library developer necessarily worries about. It is not something I want to impose on those who come after me, but which I would require of those who develop libraries or databases or tools I need to use in order to be productive. Damn. It just occurred to me that the OP had to be able to see the structure of the DB using a tool like pgAdmin, in order to just get the names of the tables and columns. Therefore, the OP should have also been able to see the definitions of any existing triggers and trigger functions. Further, the OP should be able to create more functions that could be called at the end of any existing trigger functions, thereby obtaining the functionality desired without compromising the third party app. You can edit these functions from within pgAdmin. I have done so myself on trigger functions I created myself. This would make the OP's task almost trivially simple. Don't you just hate it when you see the obvious solution only after spending time on other options? ;-) Cheers, Ted ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] backup database by cloning itself
Maybe his real goal, "all the backups readily available to be read by my program (opening the backup read only)", is to have a historical record of what certain records looked like in the past. There are other ways of doing that, though. If your speculation is right, perhaps the OP ought to explain a little more fully why he needs 24 snapshots a day, or indeed any at all. It seems to me that if you really want a historical record of what certain tables looked like in the past, it would be smarter and more accurate to create triggers, for each possible operation, that store the relevant details in an audit table including especially who made the edits and when. This strikes me as being much less work than developing code that processes so many backups. Cheers Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
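A minimal sketch of the trigger-plus-audit-table idea just suggested (all names hypothetical); the audited table's own row type is reused to store full row images:

    CREATE TABLE orders_audit (
      op         text        NOT NULL,
      changed_by text        NOT NULL DEFAULT current_user,
      changed_at timestamptz NOT NULL DEFAULT now(),
      old_row    orders,   -- composite type of the audited table
      new_row    orders
    );

    CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'DELETE' THEN
        INSERT INTO orders_audit (op, old_row) VALUES (TG_OP, OLD);
      ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO orders_audit (op, old_row, new_row) VALUES (TG_OP, OLD, NEW);
      ELSE
        INSERT INTO orders_audit (op, new_row) VALUES (TG_OP, NEW);
      END IF;
      RETURN NULL;  -- the result of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_audit_fn();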
Re: [GENERAL] backup database by cloning itself
- Original Message - From: Ron Johnson [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, February 14, 2007 3:46 PM Subject: Re: [GENERAL] backup database by cloning itself It seems to me that if you really want a historical record of what certain tables looked like in the past, it would be smarter and more accurate to create triggers, for each possible operation, that store the relevant details in an audit table including especially who made the edits and when. This strikes me as being much less work than developing code that processes so many backups. I dunno about that. We use triggers to populate log tables that get extracted, truncated then loaded into a reporting db every night. Because of the night time batch cycle, there is no quiescent time to do this, so we have 2 log tables, and the triggers alternate which log table to insert into, depending on whether the day number is even or odd. That's in addition to the trigger logic to insert into history tables. It's a royal pain. Sounds painful, but not as painful as trying to do the same thing with a suite of backups and client code to read the backups. Or maybe the latter task is easier than it looks at first glance. Doesn't the amount of pain depend on the amount of traffic your servers see, the capability of the hardware you can devote to the task, what you're doing with the history tables, &c.? When I've worked on this kind of problem, everything was in a single DB. The traffic, though, was low enough that for daily reports a simple view of the log selecting only the previous day's data, was sufficient, but only occasionally used. The principal reason for the log was to support accountability, to know who made changes, and when, and who knew what when. Historical reconstructions were required, e.g., only when something went awry and there was a need to know if a good decision was made based on bad data or a bad decision was made based on good data, or, e.g., during an audit of the business processes and decision support systems. Cheers, Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Having a problem with my stored procedure
Would it not be simpler to just create two trigger functions, one that acts on insert operations and a second that acts on update operations? A 30 second glance at the Postgresql documentation showed me that it is possible to have more than one row level trigger for a given table, which implies the simpler option is possible. This would make for a much simpler design and avoid a conditional block that would then be unnecessary. This extra cost is, of course, trivial if only a handful of records are modified or created, but if the number is large, it could become significant. Or is there something in how an RDBMS handles triggers that would make it preferable to have a single trigger for all possible operations on a record? Something an old C++ programmer would miss if not informed about the peculiarities of database development. Did I miss something critical? My usual approach is to have functions remain as simple as practicable and do only one thing, unless there is a very good reason to have them more complex (in which case a driver function that calls a number of simple functions may be preferable to one that tries to do everything). Simple functions are easy to validate, and once validated make validation of more complex driver functions easier. Why bother with so many temporaries? Isn't that a waste of both development time (lots of extra typing and opportunity for errors such as typos) and runtime CPU cycles? Why not just insert or update values directly from the NEW or OLD record into the target table rather than copying the values first into the temporaries and then from the temporaries into their final destination? (A sketch of this simpler approach follows the quoted code below.) HTH Ted - Original Message - From: William Leite Araújo To: Laura McCord Cc: pgsql-general@postgresql.org Sent: Tuesday, February 13, 2007 12:19 PM Subject: Re: [GENERAL] Having a problem with my stored procedure 2007/2/13, Laura McCord [EMAIL PROTECTED]: To make a long story short, I am archiving data from an original table to a table I created. This is a third party web application that I am doing this with, so I can't revise the structure/code of this application. With this said, if the original table goes through an insert or update action I want to replicate the information to my archive table. I don't want to delete any articles from my archive table so this is why I am not wanting to do anything based on a delete action. The only problem that I am facing is how to tell the function that I want to perform an update if an update occurred and an insert if an insert action occurred. I want to have different actions occur depending on if the trigger was based on an insert or update. Help, I've been stumped for two days. Thanks in advance. 
This is what I have so far: CREATE TRIGGER archive_articles AFTER INSERT OR UPDATE ON news_content EXECUTE PROCEDURE su_archive_articles(); CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS TRIGGER LANGUAGE plpgsql AS ' DECLARE tmp_news_id CHARACTER varying(48); tmp_title CHARACTER varying(100); tmp_abstract CHARACTER varying(300); tmp_news_story TEXT; tmp_topic_id CHARACTER varying(10); tmp_create_date DATE; tmp_author CHARACTER varying(50); tmp_begin_date DATE; tmp_end_date DATE; tmp_priority CHARACTER(1); tmp_image_name CHARACTER varying(512); tmp_image_mime_type CHARACTER varying(50); tmp_layout_type CHARACTER varying(10); BEGIN SELECT INTO tmp_news_id news_id from news_content where last_inserted(news_id); SELECT INTO tmp_title title from news_content where last_inserted(news_id); SELECT INTO tmp_abstract abstract from news_content where last_inserted(news_id); SELECT INTO tmp_news_story news_story from news_content where last_inserted(news_id); SELECT INTO tmp_topic_id topic_id from news_content where last_inserted(news_id); SELECT INTO tmp_create_date create_date from news_content where last_inserted(news_id); SELECT INTO tmp_author author from news_content where last_inserted(news_id); SELECT INTO tmp_begin_date begin_date from news_content where last_inserted(news_id); SELECT INTO tmp_end_date end_date from news_content where last_inserted(news_id); SELECT INTO tmp_priority priority from news_content where last_inserted(news_id); SELECT INTO tmp_image_name image_name from news_content where last_inserted(news_id); SELECT INTO tmp_image_mime_type image_mime_type from news_content where last_inserted(news_id); SELECT INTO tmp_layout_type layout_type from news_content where last_inserted(news_id); IF TG_OP = 'INSERT' THEN //This is to be done if an INSERT action was done on the table INSERT INTO su_archives(news_id, title, abstract, news_story, topic_id, create_date, author, begin_date, end_date, priority, image_name
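A minimal sketch of the simplification Ted proposes, using the names from the quoted code (the remaining columns follow the same pattern and are elided here): read fields straight from NEW, branch on TG_OP, and note the FOR EACH ROW clause, which the quoted CREATE TRIGGER omits and without which NEW is not available to the function:

    CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'INSERT' THEN
        INSERT INTO su_archives (news_id, title, abstract, news_story)
        VALUES (NEW.news_id, NEW.title, NEW.abstract, NEW.news_story);
      ELSE  -- UPDATE
        UPDATE su_archives
           SET title = NEW.title,
               abstract = NEW.abstract,
               news_story = NEW.news_story
         WHERE news_id = NEW.news_id;
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER archive_articles
    AFTER INSERT OR UPDATE ON news_content
    FOR EACH ROW EXECUTE PROCEDURE su_archive_articles();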
Re: [GENERAL] Having a problem with my stored procedure
On Feb 13, 1:20 pm, [EMAIL PROTECTED] (Laura McCord) wrote: I tried doing two different triggers as you suggested but I kept getting an error stating: psql:archive_news_articles.sql:75: ERROR: trigger archive_articles for relation news_content already exists psql:archive_news_articles.sql:80: ERROR: trigger update_archives for relation news_content already exists So, I thought perhaps it couldn't be done. Ted Byers wrote: Would it not be simpler to just create two trigger functions, one that acts on insert operations and a second that acts on update operations? A 30 second glance at the Postgresql documentation showed me that it is possible to have more than one row level trigger for a given table, which implies the simpler option is possible. This would make for a much simpler design and avoid a conditional block that would then be unnecessary. This extra cost is, of course, trivial if only a handful of records are modified or created, but if the number is large, it could become significant. Or is there something in how an RDBMS handles triggers that would make it preferable to have a single trigger for all possible operations on a record? Something an old C++ programmer would miss if not informed about the peculiarities of database development. Did I miss something critical? My usual approach is to have functions remain as simple as practicable and do only one thing, unless there is a very good reason to have them more complex (in which case a driver function that calls a number of simple functions may be preferable to one that tries to do everything). Simple functions are easy to validate, and once validated make validation of more complex driver functions easier. Why bother with so many temporaries? Isn't that a waste of both development time (lots of extra typing and opportunity for errors such as typos) and runtime CPU cycles? Why not just insert or update values directly from the NEW or OLD record into the target table rather than copying the values first into the temporaries and then from the temporaries into their final destination? HTH Ted - Original Message - From: William Leite Araújo To: Laura McCord Cc: pgsql-general@postgresql.org Sent: Tuesday, February 13, 2007 12:19 PM Subject: Re: [GENERAL] Having a problem with my stored procedure 2007/2/13, Laura McCord [EMAIL PROTECTED]: To make a long story short, I am archiving data from an original table to a table I created. This is a third party web application that I am doing this with, so I can't revise the structure/code of this application. With this said, if the original table goes through an insert or update action I want to replicate the information to my archive table. I don't want to delete any articles from my archive table so this is why I am not wanting to do anything based on a delete action. The only problem that I am facing is how to tell the function that I want to perform an update if an update occurred and an insert if an insert action occurred. I want to have different actions occur depending on if the trigger was based on an insert or update. Help, I've been stumped for two days. Thanks in advance. 
This is what I have so far: CREATE TRIGGER archive_articles AFTER INSERT OR UPDATE ON news_content EXECUTE PROCEDURE su_archive_articles(); CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS TRIGGER LANGUAGE plpgsql AS ' DECLARE tmp_news_id CHARACTER varying(48); tmp_title CHARACTER varying(100); tmp_abstract CHARACTER varying(300); tmp_news_story TEXT; tmp_topic_id CHARACTER varying(10); tmp_create_date DATE; tmp_author CHARACTER varying(50); tmp_begin_date DATE; tmp_end_date DATE; tmp_priority CHARACTER(1); tmp_image_name CHARACTER varying(512); tmp_image_mime_type CHARACTER varying(50); tmp_layout_type CHARACTER varying(10); BEGIN SELECT INTO tmp_news_id news_id from news_content where last_inserted(news_id); SELECT INTO tmp_title title from news_content where last_inserted(news_id); SELECT INTO tmp_abstract abstract from news_content where last_inserted(news_id); SELECT INTO tmp_news_story news_story from news_content where last_inserted(news_id); SELECT INTO tmp_topic_id topic_id from news_content where last_inserted(news_id); SELECT
Re: [GENERAL] Having a problem with my stored procedure
On Feb 13, 1:45 pm, [EMAIL PROTECTED] (Laura McCord) wrote: About your last comment, I can't do any revisions of the third party application where the inserts and updates are occurring. Plus, this whole idea came from a workaround based on a glitch in the software where the expiration of articles is not occurring, therefore I have to delete articles to prevent them from being displayed on the web interface. Also, I don't entirely want to get rid of them completely and that is why I am saving records in an archive table to be used in another application that I need to write in the future. So, this is the reason for the redundancy. It's a long story. -Laura I don't think I was suggesting you do anything with the original application. Rather, I was suggesting you alter what you were already trying to do. You can apparently create as many trigger functions as you like (implied by what I have read in the PostgreSQL documentation), presumably without name collisions though, and you have absolute control of what you do within the functions you develop, as long as it is syntactically correct. So far, I see nothing preventing you from writing two trigger functions or requiring you to use so many temporaries. How does not touching the third party application affect the triggers you add to the database? Cheers Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postgres SQL Syntax
to assume that the tools we like now will still exist ten years from now. Anyone remember IBM's OS/2? That was my favourite OS, and it was arguably better than any flavour of Windows available at the time. But even though I used it exclusively ten years ago, it is no longer a viable option for any machine I now use. I am sure anyone reading this who is of my generation could produce a long list of products that they used and liked which no longer exist or which are now no longer commercially viable. C++ is still a good option for many kinds of application development, but I don't think there are very many people using Watcom's compiler to produce commercial applications. SQL will likely exist ten years from now, but will our favourite RDBMS? I won't make that prediction, but I think I can minimize the risk by creating SQL scripts that use the highest common factor across the RDBMS products I have at my disposal. Therefore, I should be able to use my scripts, with minimal pain, regardless of what RDBMS products are available at that time. I do much the same for my C++ coding. I try to use the highest common factor in the language, as implemented by the suite of compilers I use, and keep the implementation specific stuff to a minimum in completely separate compilation units (and even in different directories). I just see it as inevitable that significant refactoring will be required, especially when migrating from one suite of tools to another, so I plan an architecture for my code that should make it as easy as possible, and similarly, for moving data around, I developed my approach to be as easy and reliable as possible, even if it is not the most elegant or efficient. Cheers, Ted ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] SQL textbook
- Original Message - From: Michael Schmidt To: Jan Mura ; pgsql-general@postgresql.org Sent: Thursday, February 08, 2007 12:54 PM Subject: Re: [GENERAL] SQL textbook Many books are available for download. Joe Celko is a well respected author and just about anything with his name on it is good. Here are some specific suggestions. Do you have any URLs for the books that are available for download? Cheers Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Problem with result ordering
Hi Ted, Hi Thorsten, Am Donnerstag, 25. Januar 2007 19:53 schrieb Ted Byers: The question I'd ask before offering a solution is, Does the order of the id data matter, or is it a question of having all the results for a given id together before proceeding to the next id? The answer to this will determine whether or not adding either a group by clause or an order by clause will help. Is there a reason your client app doesn't submit a simple select for each of the required ids? You'd have to do some checking to see whether it pays to have the ordering or grouping operation handled on the server or client. Other options to consider, perhaps affecting performance and security, would be parameterized queries or stored procedures. Yes, the reason is that a large list of ids is generated from a user's request, coming from outside our application (for example as an EXCEL sheet), and the output must be ordered by the ordering in this list. Surely we can handle this in our client application (Java code), but I think it would be more sexy to have this possibility in the database logic, since our client-application should not do much more than retrieve data from the db and then present it. To be honest, your rationale here makes no sense to me, business or otherwise. I think I'd be remiss if I didn't tell you this. Of course, what you do is up to you, but I never concern myself with what is 'more sexy' when designing a distributed application. I can see a number of situations in which your approach would result in terrible performance. If you have a lot of users, and you're putting all the workload on your data server, all the client apps will end up spending a lot of time waiting for the server to do its thing. Ordering the display of data, while it can be helped by the database, is really a presentation issue and IMHO the best place for that, by default, is the client app (i.e. do it server side only if there is a very good reason to do so). If I understand you correctly, you have a Java application as the client app, and it receives your users' ID data, from whatever source. I'd assume, and hope, that you have built code to read the IDs from sources like your Excel spreadsheet, but that doesn't matter that much. For the purpose of this exchange, it wouldn't matter if you made your clients enter the data manually (though IMHO it would be sadistic to make users manually enter a long list of values when you can as easily have the program read them from a file of whatever format). The point is that you say our client-application should not do much more than retrieve data from the db and then present it, and this implies you get the IDs into your client application. You say you're getting a large list of ids coming from outside our application. If your database is large, and your list of IDs is long, you may be taking a huge performance hit by making the database perform either an ordering or a grouping, both of which would be unnecessary if you constructed a series of simple parameterized queries in your client app and executed them in the order you desire. Whether or not this matters will depend on just how large your large is, how many simultaneous users there'd be, and how powerful the server hardware is (but I'd be quite upset if one of my developers made me pay more for a faster server just because he or she thought it would be sexy to do all the work on the server rather than the client). 
Given what you've just said, if I were in your place, I'd be addressing the ordering issues in the client Java application first, and then look at doing it in the database only if doing this work in the client app presented problems that could be addressed by doing the work on the server. Equally importantly, if there is a reason not to take the obvious and simple option, I'd be doing some performance evaluation based on enough test data to reasonably simulate real world usage so that I'd have hard data on which option is to be preferred. Thanks for your comments Thorsten You're welcome. I hope you find this useful. Ted -- CappuccinoSoft Business Systems Hamburg ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Problem with result ordering
The question I'd ask before offering a solution is, Does the order of the id data matter, or is it a question of having all the results for a given id together before proceeding to the next id? The answer to this will determine whether or not adding either a group by clause or an order by clause will help. Is there a reason your client app doesn't submit a simple select for each of the required ids? You'd have to do some checking to see whether it pays to have the ordering or grouping operation handled on the server or client. Other options to consider, perhaps affecting performance and security, would be parameterized queries or stored procedures. Much depends on the design and implementation of your client app. I know, e.g., that in ASP.NET 2, and later, you can handle multiple resultsets from a single datasource, so a trivially simple SQL script that consists of the simplest SELECT statements might be a viable option. But it is hard to advise since you don't say if you have access to or control over the source code for the client app or what it is written in. In my experience, I always have to run some benchmarks for a given distributed application to figure out how best to distribute the workload, and there are always plenty of different ways to do things, with often big differences in performance and security. It seems never to be trivial to figure this out without some testing before a final decision. I can never just assume that it is best to do all the processing in the RDBMS backend to my apps. HTH Ted - Original Message - From: Thorsten Körner [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Thursday, January 25, 2007 10:45 AM Subject: [GENERAL] Problem with result ordering Hi, when I fire the following query: select m_id, m_u_id, m_title, m_rating from tablename where m_id in (26250, 11042, 16279, 42197, 672089); I will get the same results in the same order as in the next query: select m_id, m_u_id, m_title, m_rating from tablename where m_id in (11042,42197,672089,26250,16279); I wonder how it is possible to retrieve the results in the same order as queried in the list. The listed IDs are from an application outside the database. Version is PostgreSQL 8.2.1 Has anyone an idea how to do this, while PostgreSQL knows nothing about hints, like Oracle does? THX, Thorsten ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
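If the ordering must be done in the database after all, one trick that works on the 8.2 release Thorsten names is to join against a VALUES list carrying an explicit position column, then sort on it; a minimal sketch using the ids from the thread:

    SELECT t.m_id, t.m_u_id, t.m_title, t.m_rating
    FROM tablename t
    JOIN (VALUES (11042, 1), (42197, 2), (672089, 3), (26250, 4), (16279, 5))
         AS wanted(m_id, pos) ON t.m_id = wanted.m_id
    ORDER BY wanted.pos;

The client generates the VALUES list from the user's id list, so the database returns rows already in the requested order.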
Re: [GENERAL] MSSQL/ASP migration
Is the original application ASP or ASP.NET? It makes a difference, particularly if it was developed to take advantage of ASP.NET 2. It might conceivably be ASP.NET 3, but since that is brand new I can't see anyone paying to replace an ASP.NET 3 application that was just created. If it is ASP.NET 2, and you can't find a PostgreSQL provider class, your simplest approach after migrating the data might be to write your own provider (check a recent, decent reference on ASP.NET 2 for details - there are several). OTOH, if it is ASP.NET 1.x or the earlier ASP, your planned conversion to PHP might be worth comparing to developing it de novo with ASP.NET 3. I am not an MS advocate, and I don't like tying myself to one vendor, but for obvious commercial reasons I have to be aware of the options including MS options. I recently, therefore, started studying all things .NET, and comparing to other application frameworks I've worked with, MS seems to have done a decent job with ASP.NET 2 and 3. Therefore, if I have a client running mostly MS software, and time is of the essence, I would probably make .NET, ASP.NET 3 or a Windows .NET app, as the case may be, my first choice; that is unless I find a public domain framework in Perl or PHP that is competitive with .NET. That said, I've not had an opportunity to see how it performs in a production setting, so YMMV. HTH Ted - Original Message - From: Robert Fitzpatrick [EMAIL PROTECTED] To: PostgreSQL pgsql-general@postgresql.org Sent: Monday, January 22, 2007 11:05 AM Subject: [GENERAL] MSSQL/ASP migration I have a customer who wants to migrate his MSSQL database to PostgreSQL and we'll replace his application ASP with PHP. The issues should be limited as there are no stored procedures or triggers in MSSQL, just structure and data should be all that is needed to migrate. I have never migrated from MSSQL or to PostgreSQL, but have handled database migration in the past for other DB's. I know there is a mssql2pgsql script out there somewhere and I find lots of info on the subject. Of course, when we rebuild his application, some db structure will change; I was planning to do all the changes to structure after a successful migration of his current structure now w/o data. After the new application is done, then create a migration path for the data. Is that the best way to handle these types of migrations? The customer will want to continue working on the old system during the several months of development in the new. Any docs or other helpful info is welcome, just looking for some advice. Thanks in advance, -- Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Hardware related question: 3ware 9500S
- Original Message - From: Merlin Moncure [EMAIL PROTECTED] To: Janning Vygen [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 12:31 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S [snip] - I want to know if 3ware 9500 S is recommended or if its one of those controllers which sucks. escalade is a fairly full featured raid controller for the price. consider it the ford taurus of raid controllers, it's functional and practical but not sexy. Their S line is not native sata but operates over a pata-sata bridge. Stay away from raid 5. Hi Merlin Why? What's wrong with raid 5? I could well be wrong (given how little attention I have paid to hardware over the past few years because of a focus on developing software), but I was under the impression that of the raid options available, raid 5 with hot swappable drives provided good data protection and performance at a reasonably low cost. Is the problem with the concept of raid 5, or the common implementations? Do you have a recommendation regarding whether the raid array is built into the server running the RDBMS (in our case PostgreSQL), or located in a network appliance dedicated to storing the data managed by the RDBMS? If you were asked to design a subnet that provides the best possible performance and protection of the data, but without gold-plating anything, what would you do? How much redundancy would you build in, and at what granularity? Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Hardware related question: 3ware 9500S
- Original Message - From: Scott Marlowe [EMAIL PROTECTED] To: Ted Byers [EMAIL PROTECTED] Cc: Merlin Moncure [EMAIL PROTECTED]; Janning Vygen [EMAIL PROTECTED]; pgsql general pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 2:24 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S On Wed, 2006-04-12 at 13:10, Ted Byers wrote: - Original Message - From: Merlin Moncure [EMAIL PROTECTED] To: Janning Vygen [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 12:31 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S [snip] Why? What's wrong with raid 5? I could well be wrong (given how little attention I have paid to hardware over the past few years because of a focus on developing software), but I was under the impression that of the raid options available, raid 5 with hot swappable drives provided good data protection and performance at a reasonably low cost. Is the problem with the concept of raid 5, or the common implementations? Do you have a recommendation regarding whether the raid array is built into the server running the RDBMS (in our case PostgreSQL), or located in a network appliance dedicated to storing the data managed by the RDBMS? If you were asked to design a subnet that provides the best possible performance and protection of the data, but without gold-plating anything, what would you do? How much redundancy would you build in, and at what granularity? There have been NUMEROUS discussions of RAID-5 versus RAID 1+0 in the perform group in the last year or two. Short version: Interesting. I take it that RAID 1+0 refers to a combination of Raid 1 and RAID 0. What about RAID 10? I am curious because RAID 10 has come out since the last time I took a look at RAID technology. I am not sure what it actually does differently from RAID 5. This question of data security is becoming of increasing importance to me professionally since I will soon have to advise the company I'm working with regarding how best to secure the data managed by the applications I'm developing for them. I will need overall guidelines to produce a design that makes it virtually impossible for them to lose even one field in one record. The data is both sensitive and vital. Fortunately, I have a few months before we need to commit to anything. Also, fortunately, with one exception, the applications rely on a data feed that comes in once a day after normal working hours, so I won't have to worry about writes to the DB other than what my script does to load the datafeed into the DB. All other access is read only. This should make it easier to produce a strategy to protect the data from any kind of technology failure (software or hardware). Cost is a factor, but reliability is much much more important! Thanks, Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] [Slightly OT] data model books/resources?
On Thursday 30 March 2006 03:03, Aaron Glenn wrote: Anyone care to share the great books, articles, manifestos, notes, leaflets, etc. on data modelling they've come across? Ideally I'd like to find a great college-level book on data models, but I haven't come across one that even slightly holds definitive-resource status. I've heard that Relational Database Design (ISBN: 0123264251) is good for college-level introductory material, though the book I generally recommend most is Practical Issues in Database Management (ISBN: 0201485559). Feel free to reply off list to keep the clutter down - I'd be happy to summarize responses for the list. We're all about clutter :-) Well then, in that case, can I add to the clutter by asking a question about IT training? I was just asked today, by a vice president in the company I'm working with, to train one of his staff to become a database programmer and administrator. I have taught software engineering using UML, and programming in Java and C++. I have not taught database programming and administration, although I have done some of each for some of my own applications. My question: can the folk in this group help me develop a reading list and a list of competencies for this fellow to master? While I can easily develop a list of books dealing with databases in general and SQL in particular, it is not so easy to separate the wheat from the chaff, and I do not want to waste a pile of money on evaluating the range of books that are available. I'd therefore like accounts of books to avoid, and why, as well as books that are essential in any respectable collection, and why. I'm interested both in textbooks, with exercises, and reference books (both theoretical and practical). Thanks Ted
Re: [GENERAL] PostgreSQL's XML support comparison against other RDBMSes
May I ask a question about this? I will be working on an older database in which the original developer stored XML documents as a single variable-length text field. To process it, it has to be retrieved in full and parsed. But its structure is simple, in that it has an element for each field in the DB that replaced the earlier version. But people are still using the earlier one because they still need access to the old data, and no tool has yet been written by my predecessors to move the old data over to the new DB. Does the XML support you're all talking about make it less tedious to develop tools to parse these old XML files and put their data into the right field of the right table? I can develop the tool our users need using the resources I have at hand, but the process of parsing these XML files is certain to be very tedious; something I am not looking forward to. There is a reason I don't do much with XML even though I know how. Ted - Original Message - From: Steve Atkins [EMAIL PROTECTED] To: pgsql general pgsql-general@postgresql.org Sent: Wednesday, March 29, 2006 12:21 PM Subject: Re: [GENERAL] PostgreSQL's XML support comparison against other RDBMSes On Mar 29, 2006, at 2:57 AM, Ivan Zolotukhin wrote: On 3/29/06, Peter Eisentraut [EMAIL PROTECTED] wrote: Ivan Zolotukhin wrote: BTW, are there any plans for its improvement? What are the first steps for the people who want to contribute to PostgreSQL in this field? The first step would be making a detailed analysis of what XML support is supposed to mean. So let's make such an analysis here within the PostgreSQL community! There has been a lot of talk about XML support in Postgres, but it did not lead to any steady work in this direction. IMHO, it's time to start it now, no? The people who want it should explain what it is, ideally by pointing at the relevant standards documents. Cheers, Steve
Re: [Bulk] [GENERAL] General advice on database/web applications
I am developing a small web application. Currently, our web server is sitting outside our firewall (running its own firewall), and the application being developed would let users do things like place orders. My question is... what and where is the database for this? What do you mean when you say your web server is running its own firewall? I could well be wrong, but I am not aware of a web server that can run a firewall; web servers and firewalls are, as I understand them, quite different kinds of software, though I am aware of some hardware that has built-in firewalls. Your question, though, doesn't make sense. If, as you say explicitly in your first sentence, you're developing a small web application, then either you don't have a database and need to create it, or you have already created your database and know both where and what it is. If you haven't created it already, then you can create it, and you have absolute control over where to put it and what RDBMS to use. The only circumstance in which I could imagine you having a database back end for your application but not knowing about it is if you bought hosting services from a company that provides such services. But if that's the case, then you ought to be asking that company about it. And they probably already have a ready-made virtual store application for you to use, which makes developing your own unnecessary unless you're planning to do your own hosting - and that takes us back to you having complete control over what you use and where you put it. If I were to create such a web application as you describe, I'd create a database using PostgreSQL or something similar and have it live inside the firewall, configured to respond only to applications running behind the firewall. Under no circumstances would I want it to accept connections across the firewall. Similarly, I'd have my application server and my httpd server behind the firewall, configured to accept connections across the firewall but only from proxy servers set up in a DMZ. Since you are dealing with sensitive information such as financial data, you are going to have to design security into your application from start to finish, and then harden your entire network inside and out, including especially your firewall and each machine individually. You have some legal responsibilities to protect your clients' data. I'm told, by folk who ought to know, that you could face major problems if you fail to exercise due diligence in protecting your clients' data. Cheers, Ted
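For the "respond only to applications behind the firewall" setup described above, the usual knobs are listen_addresses in postgresql.conf and the host rules in pg_hba.conf. A minimal sketch follows; the database name, role, addresses, and subnet are invented placeholders, so adjust them to your own network:

    # postgresql.conf: bind only to the interface facing the internal network
    listen_addresses = '10.0.1.5'

    # pg_hba.conf: allow the app servers' subnet, reject everything else
    # TYPE  DATABASE   USER      CIDR-ADDRESS    METHOD
    host    orders_db  app_user  10.0.1.0/24     md5
    host    all        all       0.0.0.0/0       reject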
[GENERAL] Double checking my logic?
I have to deal with the following situation. I get a daily feed of data for several tables. This data is retrieved from the supplier by ftp in the form of CSV text files. I know I can load the files using "load data". The relational structure between the files and tables is simple: one file contains data for a lookup table. But the data in the lookup table is dynamic, potentially changing from one day to the next (though in practice, the rate of change is likely to be very slow). I will also need to maintain a history in such a way that any and all changes in the data can be fully audited. This means that, for auditing purposes, I will need to be able to extract a valid time for each item in the lookup table. The only likely change in the data will be that occasionally items will be added to or removed from the lookup table. None of the data already present in the database will be editable (data validation happens before I get the feed). The only analysis of which I have been told is done daily, as the feed is received. At present, no one looks at earlier data (although I expect that will change as I create a better structure for the database to support audits). I am trying to create a flexible design so that refactoring will be simple if and when the assumptions or practices need to be changed. I know how to handle all this using a brute-force approach, but I expect that approach will be painfully slow. So here is what I am planning.

1) create a suite of tables corresponding to the files in the feed, with the addition of a date/time stamp containing the date and time on which the data being processed was received
2) create a suite of temporary tables corresponding to the tables created in step 1 (but without the date/time stamp)
3) load the data into the temporary tables
4) analyse the data while it is in the temporary tables, storing the result of the analysis in new tables
5) copy the data into the permanent tables, and add the date and time stamp for the data (this date/time stamp is not present in the files retrieved)
6) free the temporary tables

Now, this second-last step is brute force, adequate for all but one of the tables: the lookup table. If I stick with the brute-force approach, the lookup table will waste a significant amount of space. This won't be much initially, but it is guaranteed to get worse as time passes, and I'd expect the lookup performance to degrade as the amount of data in the lookup table increases. Each record in the lookup table represents a product, and both the name and the product will have valid time intervals that may not be related. The name may change because the product has been renamed for whatever reason (and we don't care why), or the name may disappear altogether because the product has been discontinued. We can distinguish the two cases because each product has an ID that remains valid while the product exists, and the ID won't be in the data at all if the product is discontinued. I am considering creating an additional table just to look up product names, but with two date and time stamps. The first would represent the first time the product name appears in the data, and the last would represent the last time the product name is present in the data. The first of these would be edited only once, and that is on the first day for which we have data. I am torn between updating the last of these every day, until the name disappears, or leaving it null until the name disappears.
Leaving it null would save on space, but updating it with the current date and time should save time, since it would not be necessary to execute a complex conditional on every product ever included in the database. If we update it only for those items in today's data, those that were terminated before today will not have their date/time stamp updated, so the two fields will always represent the time interval for which the name is valid. A similar logic applies to the product ID. Have I overlooked anything that is obvious to you? Any gotchas I should be aware of? What opportunities for improving performance do you see? The ultimate objective is to have this fully automated from a shell script that is configured by our sysop to execute at a specific time every day, Monday through Friday. Can all of the steps I describe above be implemented using ANSI-standard SQL, and thus put into a stored procedure, or should I look at doing some of it in a Perl script or Java application? I suppose I will have to have at least a basic Java application, or Perl script, if only to connect to the database and invoke any functions I have created to do this. Any suggestions would be appreciated. Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Solutions http://www.randddecisionsupportsolutions.com/
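One way to make the two-timestamp idea concrete is sketched below. Table and column names are invented for illustration; the UPDATE is the daily "advance last_seen for names still in the feed" step, which touches only rows matched by today's staged data rather than running a conditional over every product ever seen:

    -- permanent name-history table
    CREATE TABLE product_names (
        product_id  integer   NOT NULL,
        name        text      NOT NULL,
        first_seen  timestamp NOT NULL,  -- written once, on first appearance
        last_seen   timestamp NOT NULL,  -- advanced each day the name is still present
        PRIMARY KEY (product_id, name, first_seen)
    );

    -- daily step: advance last_seen only for names present in today's load
    UPDATE product_names pn
       SET last_seen = now()
     WHERE EXISTS (SELECT 1
                     FROM staging_products sp
                    WHERE sp.product_id = pn.product_id
                      AND sp.name       = pn.name);

Names that have disappeared are simply never touched again, so (first_seen, last_seen) ends up bracketing the interval during which the name was valid, exactly as described above.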
Re: [Bulk] Re: [GENERAL] Tool
Bob, If you're interested in a free Java tool, you might want to look at NetBeans (now version 5). It is one of the best I have seen, and I have used a wide variety of such tools in a number of different programming languages. Understand, though, that you have a wide variety of options, ranging from a web interface (which may be preferred if your users will be accessing the database over a network) at one extreme to a thick client, perhaps started using Sun's Web Start, at the other. An advantage NetBeans offers is that it is easy to learn, and the user interface is quite intuitive, especially for GUI design and implementation. Eclipse is another one with a good reputation, but I don't like it so much. Its support for GUI design seems to me to be quite weak. What you haven't said in any of your posts is what programming languages you know. Nor do you indicate whether or not you have the skills required to work with an editor like emacs to create a very thin client (using a web server like Apache) with server-side scripting using Perl or PHP. It is hard to make a recommendation without knowing something about your skills. If you don't already have the skills of an intermediate Java programmer, then none of the tools mentioned in this thread will help you much. If you knew only C++ (again at least at an intermediate level), for example, the only tools I know of that would be suitable are commercial (Borland's C++ Builder and Microsoft's Visual C++). And I have no idea what the corresponding tools would be on Unix/Linux (if anyone knows of such tools for Linux, I'd appreciate hearing about them, as one of my computers will soon have Windows XP removed and replaced by SuSE Linux). Cheers, Ted - Original Message - From: Bob Pawley To: Wes Williams ; 'Postgresql' Sent: Friday, February 10, 2006 1:03 PM Subject: [Bulk] Re: [GENERAL] Tool This looks quite interesting. Thanks Wes Bob - Original Message - From: Wes Williams To: 'Bob Pawley' ; 'Postgresql' Sent: Friday, February 10, 2006 9:14 AM Subject: RE: [GENERAL] Tool You may wish to consider creating a web interface GUI. For example: Sun's free Java Studio Creator 2 http://developers.sun.com/prodtech/javatools/jscreator/index.jsp A little to learn up front, but then the GUI goodness comes through. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Bob Pawley Sent: Friday, February 10, 2006 12:02 PM To: Postgresql Subject: [GENERAL] Tool I am developing a database which needs testing prior to developing the host program. I am looking for an easy-to-learn tool. I want to develop an interface that I can use to enter and display information in a similar fashion to how a user would operate the database. A GUI tool would be ideal. I am running PostgreSQL v 8.0 on Windows XP. Bob Pawley
Re: [GENERAL] Primary keys for companies and people
or misidentified, and the cost to the owner or user of the application or database. Who will suffer if a mistake is made? Will, or can, bad things happen if a given person ends up with more than one ID? What is the cost, and who bears this cost, if more than one person can use the same ID? 3) Can we construct a suite of best practices from which we can select, given specific functional or non-functional constraints as developed for our application? Included with this question is consideration of protection of sensitive data in general, and protection of data that might conceivably be used by cyber-criminals in activity related to identity theft, or to use sensitive data to the harm of the person so identified. 4) How is biometric data best stored and searched for use in authentication processes within an arbitrary application? I guess this question assumes that biometric data needs to be used in an authentication request, and it occurs to me that for some applications it may be sufficient to use biometric data in the creation of a unique user ID, and subsequently it may be needed only for certain sensitive processes or resources. My own feeling is that some options are very easy, and some of these are adequate for some situations, but that there are others that may be needed depending on the sensitivity of the data in question or on the potential cost to one or more parties to a given business process. I expect to be considering these issues extensively over the next few years, since they are relevant to some of the web applications I am designing. Any insights you, or others, may have on these questions would be greatly appreciated. Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Solutions http://www.randddecisionsupportsolutions.com/
Re: [GENERAL] Logging statements and parameter values
On Mon, Jan 30, 2006 at 04:31:29PM -0500, Bruce Momjian wrote: I assume it is this TODO: * Allow protocol-level BIND parameter values to be logged --- Ted Powell wrote: Our development group needs to have the option of logging all SQL statements including substituted parameter values. [...] That's it! (I should have thought to look in the TODO.) Has any design work been done on this? -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
Re: [GENERAL] Logging statements and parameter values
On Mon, Jan 30, 2006 at 05:19:23PM -0500, Bruce Momjian wrote: [...] * Allow protocol-level BIND parameter values to be logged [...] That's it! (I should have thought to look in the TODO.) Has any design work been done on this? No. I am with Simon Riggs today at my house and I asked him, hoping he can get it done for 8.2. I don't think it is very hard. Various things have been pushed on my stack since I posted about this. When it gets near the top again, I'll check back. Thanks for the response. -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
[GENERAL] Logging statements and parameter values
Our development group needs to have the option of logging all SQL statements including substituted parameter values. Getting output in the form: ... WHERE contact.login_con = $1 AND company.login_co = $2 was no problem, but nothing that I tried turning on in the config file yielded values for $1 and $2. Digging into the source for 8.1.1 brought me to this code in .../backend/tcop/postgres.c (lines 1449+):

    /* We need to output the parameter values someday */
    if (log_statement == LOGSTMT_ALL)
        ereport(LOG,
                (errmsg("statement: BIND %s", portal_name)));

    /*
     * Fetch parameters, if any, and store in the portal's memory context.
     */
    if (numParams > 0)

It seems to me that a point near the bottom of the loop over parameters (lines 1564+):

        params[i].kind = PARAM_NUM;
        params[i].id = i + 1;
        params[i].ptype = ptype;
        params[i].isnull = isNull;
        i++;
    }

(params[i].value is set in a couple of places higher up in the loop) would be a good place to log each parameter, but... Has this not been done simply because nobody has gotten around to it, or are there pitfalls? Although I've been using PostgreSQL for several years, this is my first venture into its source code beyond watching it build. Also, the Datum params[i].value: does it necessarily hold displayable text, even when its content is the output of a binary input converter? Is there a utility log routine somewhere that I can simply feed a Datum to? -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
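For what it's worth, until that TODO item lands there is a workaround that gets parameter values into the log without patching the server: use SQL-level PREPARE/EXECUTE instead of protocol-level Bind, since with log_statement = all the EXECUTE statement is logged with its literal arguments. A sketch, borrowing the column names from the WHERE clause quoted above (the table names and values are guesses, not taken from the original post):

    PREPARE find_login(text, text) AS
        SELECT *
          FROM contact, company
         WHERE contact.login_con = $1
           AND company.login_co  = $2;

    EXECUTE find_login('jsmith', 'acme');  -- this line is logged verbatim, values included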
Re: [Bulk] Re: [GENERAL] Best programming language / connectivity for best performance
Nico, I do not believe there is anyone who can tell you the best way to go, as that is determined by far more than DB access. That said, I am inclined to agree with John. I would expand on that to say that if Delphi is the only language you know, then it is the best option for getting productive quickly. Otherwise, you have to resort to multiparadigm development and determine which of the languages you know best meets the requirements for your project. I use FORTRAN, C++, and Java, and I know VB, Perl, and PHP. If I require an outstanding object model, my choice reduces to C++ and Java, because the object model in FORTRAN is almost as seriously broken as that in VB. I generally don't use VB, except for trivial toys, and even there, it has been years since I used it. Perl and PHP are interesting, but I have yet to see an advantage they can offer over Java (including servlets and JSP) or C++. I use FORTRAN for specialty programs requiring extensive number crunching, but it is being displaced in my own work by C++, particularly because of the advantages provided by metaprogramming. The only languages I have used for commercial application development are C++ and Java. If there is significant networking and/or security, I lean towards Java because of the related libraries built into the SDK. OTOH, if I need real generic programming, I lean toward C++ because, while Java has made some progress in generics, it is still quite broken relative to the powerful support for generics in C++. And, if there is a need for metaprogramming based on generic programming, then C++ is the only choice I am aware of. You see, every language has its strengths and weaknesses, and you generally need to choose based on what you know about each language you know how to use, and the availability of libraries to support your problem domain. In my experience, you become a more productive and better programmer as you make a point of learning more programming languages and give thought to their relative strengths and weaknesses. Almost invariably, when my use of a DB is slow, it is because I have a problem with my queries. This happens more often than it should, but then I have been programming using languages like FORTRAN, Pascal, Basic and C++ (all closely related Algol-family languages) for well over two dozen years, and I started studying RDBMSs and SQL only a few years ago. Like John, when I'm using a RAD tool (and I should say I like Delphi almost as much as I like NetBeans and Borland's C++ Builder), I handle DB access in my own code and let the GUI handle the interface only. My experience with bound controls is similar to John's. They are a convenience for novice programmers, but once you start adding more professional features, such as data validation and formatting, they leave just about everything to be desired. In my work, they served as a convenience during proof-of-concept work to show a client where their project was heading, but in virtually every case, I replaced them for one reason or another. They just didn't provide the feature set I required, and it was just less work to use the basic controls than it was to fight with broken bound controls.
Cheers, Ted - Original Message - From: John McCawley [EMAIL PROTECTED] To: Nico Callewaert [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Tuesday, January 10, 2006 9:06 AM Subject: [Bulk] Re: [GENERAL] Best programming language / connectivity for best performance I really wouldn't take relative DB performance into much consideration when choosing a programming language for your project. I have found the actual language/API overhead to be pretty inconsequential in most of my projects. When my DB access is slow, I have found it is almost always a problem with my query, or the DB is just overloaded. There is one exception I can think of, and that's doing bound controls in VB through ODBC. I haven't done bound controls in VB for years and years, so I don't even know if it's still in the language, but they're crappy and you shouldn't use them. When I'm using a RAD tool like VB etc. I *never* use design-level tools to handle my DB IO. When I write a client-side app, I manually handle my own DB IO and let the GUI simply handle the user interface. Nico Callewaert wrote: Hi, I was wondering what is the best database connectivity and programming language to get the best performance with PostgreSQL. I'm currently working with Delphi, but I don't know if that is the best way to go?
[GENERAL] another problem with stored procedures
I have just encountered another problem. I am not sure if it is with my code, or with how I am working with Postgres/pgAdmin III. Here is another function, as created using the wizard/dialog box in pgAdmin III for creating functions:

    CREATE FUNCTION "People".get_pw(ea "varchar") RETURNS "varchar" AS
    $BODY$
        SELECT pword FROM "People".uids WHERE email_address = ea;
    $BODY$
    LANGUAGE 'sql' VOLATILE;

When I click OK to indicate that I am finished, I get an error message saying there is no column called "ea". Of course I know that; that is because it is a function parameter instead. What I don't understand is why pgAdmin would not put the "IN" qualifier on the function's only parameter, or why Postgres would think ea is a column when the code clearly identifies it as a function parameter. (BTW: replacing 'sql' with 'plpgsql' has no effect, except that the error message is even less informative.) Any ideas? What I am after is a simple select procedure returning the contents of pword in the record where the contents of email_address are the same as the contents of the parameter ea. I figure that if the result set returned to the calling Java/JDBC code is empty, the email address offered does not exist in the database, and that, if there is one record, I'll compare the string value returned with the password offered by the user in order to authenticate the user. Then, if authentication succeeds, I'll query a different database to see what resources the user is authorized to use. I have, through JDBC, used function calls that end up submitting something like the following to the RDBMS back end: SELECT pword FROM "People".uids WHERE email_address = 'ea_value'; and these all worked fine. It was just a little tedious to concatenate the various strings so that the contents of the SQL statement string looked like the above statement. I can't see a reason why I'd have trouble transforming the above select statement into a stored function. BTW: I know I can do this my old way of using prepared statements with JDBC and Java, but I read that I can make my distributed application more secure by putting all my SQL into stored, parameterized procedures. What are the SQL-related attacks that a web application is vulnerable to, and how effective is the approach of placing all my SQL into stored procedures at countering them? Are prepared statements any more, or less, useful in making a distributed application more secure? Of course, I'd have validation code on both the client side and within my servlet that processes user data. After all, I have bitten the bullet to learn about stored procedures and functions precisely because of my studies of ways to make distributed applications secure. Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
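For the record, the likely cause of the "no column called ea" error is that in SQL-language functions of this vintage, parameters cannot be referenced by name; the name in the signature is documentation only, and the body must use the positional form $1, $2, ... (plpgsql, by contrast, does resolve parameter names). A sketch of the same function with that one change (STABLE instead of VOLATILE is an optional improvement for a read-only lookup):

    CREATE FUNCTION "People".get_pw(ea varchar) RETURNS varchar AS
    $BODY$
        SELECT pword FROM "People".uids WHERE email_address = $1;
    $BODY$
    LANGUAGE sql STABLE;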
[GENERAL] Final stored procedure question, for now anyway
OK, I finally have Postgres accepting my stored procedure. Here is the relevant data in the history window of pgAdmin III's Query tool:

    -- Executing query:
    CREATE OR REPLACE FUNCTION addEntity (fn IN VARCHAR, ln IN VARCHAR, ivar IN VARCHAR,
                                          hi IN VARCHAR, pw IN VARCHAR, ea IN VARCHAR,
                                          ad IN VARCHAR, c IN VARCHAR, p IN VARCHAR,
                                          co IN VARCHAR, pc IN VARCHAR)
    RETURNS void AS $$
    DECLARE
        varID INTEGER;
    BEGIN
        SELECT INTO varID uid FROM uids WHERE email_address = ea;
        IF varID IS NOT NULL THEN
            INSERT INTO addy (uid, address, city, province, country, postal_code)
            VALUES (varID, ad, c, p, co, pc);
        ELSE
            INSERT INTO uids (family_name, first_name, initials, hid, pword, email_address)
            VALUES (ln, fn, ivar, hi, pw, ea);
            INSERT INTO addy (uid, address, city, province, country, postal_code)
            VALUES (currval('uids_uid_seq'), ad, c, p, co, pc);
        END IF;
    END;
    $$ LANGUAGE 'plpgsql' VOLATILE;

    Query returned successfully with no result in 0 ms.

So I now go into pgAdmin III and take a look, and I can't find it. When I execute the above SQL, where will Postgres store the function? I see, in pgAdmin's main window, several places where functions are stored, but none of them contain my function. It must be stored somewhere, since I get an error saying the function already exists if I attempt to modify it slightly and re-run it. For example, I noticed I used the wrong sequence in the last INSERT statement. To correct that, I had to add "OR REPLACE" after "CREATE" to make the correction. I am puzzled, though, that I can't find it when I go back to the pgAdmin main window and search through the whole database (after pressing F5 to refresh the contents of the windows). Thanks to all who helped me on this matter. Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
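When a function seems to have vanished, a direct catalog query will show which schema actually received it; note that addEntity, written without double quotes, is folded to the lower-case name addentity. A minimal lookup:

    SELECT n.nspname AS schema, p.proname AS function
      FROM pg_proc p
      JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE p.proname LIKE '%entity%';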
Re: [GENERAL] sending mail from Postgres
- Original Message - From: Vivek Khera [EMAIL PROTECTED] To: PG-General General pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 11:48 AM Subject: Re: [GENERAL] sending mail from Postgres On Dec 28, 2005, at 8:28 AM, Christopher Browne wrote: Some separate process (perhaps polling, perhaps using NOTIFY/LISTEN) would then grab messages from the table and submit them to [whatever is the communications layer]. That is a clean sort of design for that sort of thing. This is precisely how we build things. Throw in some good locking mechanisms and you can scale this to incredible levels. Here is a general question relating to this. The problem involves due diligence related to environmental protection. Consider a distributed application to support this. You have a facility with an environmental monitoring program. Samples are taken from soil, water and air and sent off-site for analysis. Each sample, when analyzed, results in a report sent to the facility's management for their records. However, if the concentration of some contaminant in a sample is above some threshold, a second report, containing the complete analysis results for the sample, is generated and sent to one or more people, both inside and outside the organisation (e.g. engineers within the organization responsible for fixing problems with the facility, and engineers at relevant regulatory agencies). One objective is to automate as much of the data management as possible and to ensure that, if a problem arises, everyone who needs to know about it is notified. The process has to be auditable, so that information about when each step in the process starts is stored in the database, as well as information about when messages are acknowledged (again automated - so when an engineer opens a message about a problem, an acknowledgement is sent to the database without his intervention). I suppose email might work as a means of sending messages, but I was thinking of Sun's JMS instead, working with triggers. I could then create my own thin client to display the reports, perhaps sorting them according to user-specified criteria. I can see how to do it within the web tier, or within the client tier (within the labs doing the analyses). The thing is, of the designs I have considered, the one involving triggers with JMS on the main supporting website (with a database back end) is the simplest in terms of deployment, since all interested parties could interact with the application through the internet using a very thin client (perhaps even with an applet within a web page), and I would not need to worry about deploying software to all relevant people/sites. If you faced this problem, what would you do, and why? Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
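A minimal sketch of the queue-table-plus-NOTIFY pattern Christopher and Vivek describe, with all names invented for illustration: reports that cross the threshold are inserted into a queue table, a trigger wakes the dispatcher (a JMS bridge, a mailer, etc.) with NOTIFY, and the acknowledgement column provides the audit trail:

    CREATE TABLE outbound_alerts (
        alert_id  serial PRIMARY KEY,
        sample_id integer   NOT NULL,
        body      text      NOT NULL,
        queued_at timestamp NOT NULL DEFAULT now(),
        acked_at  timestamp             -- filled in when the recipient opens the message
    );

    CREATE FUNCTION notify_alert() RETURNS trigger AS $$
    BEGIN
        NOTIFY outbound_alerts;  -- wake any process that has run LISTEN outbound_alerts
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER alerts_notify AFTER INSERT ON outbound_alerts
        FOR EACH ROW EXECUTE PROCEDURE notify_alert();

The dispatcher should still poll the table on connect (and periodically), since NOTIFY only signals that something changed; the rows themselves are the queue.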
Re: [GENERAL] Final stored procedure question, for now anyway
Did you find a fix for it? - Original Message - From: Jonel Rienton To: 'Ted Byers' ; pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 1:24 PM Subject: RE: [GENERAL] Final stored procedure question, for now anyway It's because of the mixed case you're using in the function name; I've had this issue last week myself and it seems to be the culprit. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ted Byers Sent: Wednesday, December 28, 2005 11:13 AM To: pgsql-general@postgresql.org Subject: [GENERAL] Final stored procedure question, for now anyway [snip - the original post, including the full function definition, is quoted above]
Re: [GENERAL] Final stored procedure question, for now anyway
Well, I just ran another test, and the result indicates that the cause of the problem lies elsewhere. I replaced the original function name with "add_entity" and the end result is the same. The function gets stored somewhere, but it is not displayed anywhere by pgAdmin. Any suggestions? - Original Message - From: Jonel Rienton To: 'Ted Byers' ; pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 4:16 PM Subject: Re: [GENERAL] Final stored procedure question, for now anyway Yup, just use lowercase all the time. From: Ted Byers [mailto:[EMAIL PROTECTED]] Sent: Wednesday, December 28, 2005 1:45 PM To: Jonel Rienton; pgsql-general@postgresql.org Subject: Re: [GENERAL] Final stored procedure question, for now anyway [snip - the rest of the thread, including the full function definition, is quoted above]
Re: [Bulk] Re: [GENERAL] Final stored procedure question, for now anyway
- Original Message - From: Frank L. Parks [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 7:27 PM Subject: [Bulk] Re: [GENERAL] Final stored procedure question, for now anyway Do you refresh your database after you add the function? It will not show up until after you refresh. I didn't realise that refreshing the display in pgAdmin would not be enough. I did that several times with no luck. I then closed the connection to the database and then connected again, and voila! The functions appear, entirely in lower case, in the public schema in my EntityDatabase. This database has two schemas: public, which was there the moment I created the database, and People, which I created. Now, this begs more questions. My function references tables in the People schema, but it does not qualify the references to those tables. In my view, the function properly belongs to the People schema. 1) Can the function be simply moved from the public schema to the People schema, or do I need to delete the function and recreate it within the People schema? If the latter is true, how would I specify, within the SQL, that the function is to be added to the People schema instead of public? 2) Do I need to qualify my references to my tables in the function to identify the schema in which the tables exist, or will Postgres find the right tables with the function in its present form? If I have to further qualify the table references, what is the correct qualification of the table references within the SQL in the function definition? Thanks, Ted
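On question 1: if memory serves, from 8.1 onward there is ALTER FUNCTION ... SET SCHEMA (on 8.0 the fallback is DROP and re-CREATE with a schema-qualified name). On question 2: either schema-qualify the table references inside the body, or make sure "People" is on the search_path when the function runs. A sketch, assuming the eleven-varchar signature shown earlier in the thread:

    -- move the function (8.1 or later; the argument list must match the signature)
    ALTER FUNCTION public.add_entity(varchar, varchar, varchar, varchar,
                                     varchar, varchar, varchar, varchar,
                                     varchar, varchar, varchar)
        SET SCHEMA "People";

    -- or create it in the right schema to begin with, qualifying the tables:
    --   CREATE OR REPLACE FUNCTION "People".add_entity(...) ...
    --   INSERT INTO "People".uids (...) VALUES (...);

    -- alternatively, put the schema first on the search path:
    SET search_path TO "People", public;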
[GENERAL] Stored procedure
I am learning how to create stored procedures using pgAdmin and Postgres. I have a couple of questions. 1) For all of my tables (admittedly not many, yet), I created columns with the type of integer to serve as indices. I have now learned (well, late last night) from your documentation that Postgres has additional types (serial, serial4, etc.) that are integers that are autoincremented. I tried, therefore, to change these columns to type serial, but received an error stating that type serial does not exist. Was the documentation I read mistaken, or is it a problem with pgAdmin, or did I make a mistake? Should I drop the columns in question and recreate them as having type serial (is that possible when the column is used as a foreign key in other tables)? 2) Suppose I have a lookup table with an autoincremented integer index column, used as a foreign key in a second table, and I want a stored procedure to insert data into the second table using the index from the first as a foreign key. Now, the stored procedure must: a) check the name passed for the second column of the first table to see if it exists there, and if not, insert it; b) whether the name provided for the second column had to be inserted or not, retrieve the index that corresponds to it; c) execute the insert into the second table using the index value retrieved from the first as the value for the foreign key column in the second table. Doing all this in Java or C++ is trivial, and I have done so when using a database that didn't have stored procedures, but it isn't clear to me how to do this using only SQL inside a stored procedure. I have just learned this morning that MySQL would allow the following inside a stored procedure:

    INSERT INTO foo (auto,text) VALUES (NULL,'text');             # generate ID by inserting NULL
    INSERT INTO foo2 (id,text) VALUES (LAST_INSERT_ID(),'text');  # use ID in second table

I have yet to figure out how to modify this to verify that 'text' isn't already in foo, and return its index for use in foo2 if it is, but that's another question (I'd want the values in the second column in foo to be unique). But I am curious to know if Postgres has something equivalent to "LAST_INSERT_ID()". Can one embed the first insert above in an if/else block inside a stored procedure, placing the index in a variable that has scope local to the procedure, and use that variable in the second insert? Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
Re: [GENERAL] Stored procedure
Hi Jaime, Thanks. I'd suggest the manual be edited by an educator, since it is a little dense for someone coming to it for the first time. Once I read your reply to me, and reread the manual, I understood. But on first reading, it is a little too dense and short on examples. Regarding serial: I now learned (well, late last night) in your documentation that Postgres has additional types (serial, serial4, etc.) that are integers that are autoincremented. serial is not a type; it is shorthand for integer with a default expression that retrieves the next value in a sequence... I tried, therefore, to change these columns to type serial but received an error stating that type serial does not exist. How did you try? What was the exact error you received? I tried:

    ALTER TABLE People.addy ALTER COLUMN aid TYPE serial

and the error I received is:

    ERROR: type serial does not exist

I understand this now, but it seems pgAdmin creates the illusion that serial can be treated like a genuine type by including serial along with all the other types in the drop-down list used to set the type when creating a new column. Regarding autoincrement: INSERT INTO foo (auto,text) VALUES (NULL,'text'); # generate ID by inserting NULL and this of course is bad... if I insert NULL I want the NULL to be inserted. In programming in C++, I often pass a value of null or 0 as an argument to a function; this is done to use the null value as a flag to control the behaviour of the function at a very fine degree of granularity. This is a commonly used and powerful idiom in C++ programming. It is curious, though, that on thinking about this, I have not used this idiom nearly as much when I am programming in Java. I can't explain why. The SQL-standard way of doing things is omitting the auto-incremental field altogether: INSERT INTO foo (text) VALUES ('text'); INSERT INTO foo2 (id,text) VALUES (LAST_INSERT_ID(),'text'); # use ID in second table INSERT INTO foo2 (id, text) VALUES (currval('seq'), 'text') On reading more about this, my inclination was to do as you did. However, I would point out that the sample code I showed here was taken directly from the MySQL reference manual. If it matters, I can provide the precise location in the manual. I guess some folk disagree with you about how good or bad it is. I'll reserve judgement until I have more experience working with databases. Assuming I have set up a sequence called 'seq', and set the default value of id in foo to be nextval('seq'), then the following is getting close to what I need (there seems to be only one thing left - what do I replace the question mark with in order to get the id value from the initial select and pass it to the insert in the first block):

    if exists (select id from foo where x = text) then
        INSERT INTO foo2 (foo_id, foo2_text) VALUES (?, more_text)
    else
        INSERT INTO foo (text) VALUES ('text')
        INSERT INTO foo2 (foo_id, foo2_text) VALUES (currval('seq'), more_text)
    end if;

The second block of the conditional statement looks like it ought to properly handle inserting new data into foo, autoincrementing id in foo, and providing the value of id to the insert into foo2. However, for the first block, there is no way to know where 'text' is located in the table, so it is necessary to get the value of id from the SQL statement used as the argument for exists() and pass it to the insert into foo2 (where the question mark is located). Thanks for your time. Ted
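In plpgsql, the question mark resolves naturally: SELECT INTO captures the id, and the FOUND flag drives the branch. A sketch following the names in the pseudocode above (the lookup column is called x here, and the sequence 'seq', as in the example; the function name and parameters are invented):

    CREATE OR REPLACE FUNCTION add_foo2(p_text text, p_more text) RETURNS void AS $$
    DECLARE
        v_id integer;
    BEGIN
        SELECT id INTO v_id FROM foo WHERE x = p_text;
        IF FOUND THEN
            -- the value already exists; reuse its id
            INSERT INTO foo2 (foo_id, foo2_text) VALUES (v_id, p_more);
        ELSE
            -- new value; insert it and pick up the generated id
            INSERT INTO foo (x) VALUES (p_text);
            INSERT INTO foo2 (foo_id, foo2_text) VALUES (currval('seq'), p_more);
        END IF;
    END;
    $$ LANGUAGE plpgsql;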
Re: [GENERAL] To Postgres or not
To everyone, Thanks very much for your replies, and if these responses are any indication of Postgres quality then I'd be in good hands. Ted
[GENERAL] To Postgres or not
Hello Everyone, I am considering moving a product to Solaris/Linux and trying to find a good DB. Oracle is out due to cost, so as far as I know the only reasonable alternatives are Postgres, Codebase, or MySQL. Does anyone here have any experience using Codebase or MySQL? If I stick with a true RDBMS then Codebase is out. So that leaves Postgres and MySQL. I'm very used to all the comforts of Oracle, so I think Postgres stacks up better, but maybe some veterans could shed some light. In other words, is Postgres really that much better than MySQL, and/or the other way around? Also, in my searches I ran across a company called EnterpriseDB and another like it. Basically they offer Postgres support. So I'm a little concerned that I'm just robbing Peter to pay Paul. Thanks
Re: [GENERAL] LISTEN/NOTIFY for lightweight replication
Richard, Thanks for the response. I'll look into both the dblink and iirc. Do you know of any extended examples of either? --Ted --- Richard Huxton [EMAIL PROTECTED] wrote: Ted Shab wrote: Hi, I'm trying to come up with a relatively simple multi-master replication solution. This is for multiple databases that need to be discrete, and change relatively infrequently (10-30 updates an hour), and almost never update each other's data (less than once a day). The TCL-based replication project for multi-master is troublesome to configure and seems to really impact performance. It can be assumed that the master-slave setup will not work for me, nor do we want to purchase a commercial solution, nor can we run this all from one central database. e. If there is a field-level conflict, raise an exception (TBD). Exception handling and failure recovery are what makes for all the work in replication. I don't think a pure listen/notify setup will be enough because iirc the system doesn't guarantee delivery of multiple notifications if >1 are queued. Have you looked into the possibility of using dblink to handle updates of each others' data? That would mean your problem reverting to one of single-master replication. -- Richard Huxton Archonet Ltd
Re: [GENERAL] LISTEN/NOTIFY for lightweight replication
Thanks. I was thinking iirc was the transport protocol :-) Looks like dblink is the best bet here. --Ted --- Richard Huxton [EMAIL PROTECTED] wrote: Ted Shab wrote: Richard, Thanks for the response. I'll look into both the dblink and iirc. Do you know of any extended examples of either? dblink is in the contrib/ folder of the source distribution, and possibly in your packaged version if you use such a thing. Never needed it myself, but the documentation looks clear enough. As for listen/notify possibly dropping duplicate notifications... Ah! It's in the SQL COMMANDS reference part of the manuals: NOTIFY behaves like Unix signals in one important respect: if the same condition name is signaled multiple times in quick succession, recipients may get only one notify event for several executions of NOTIFY. So it is a bad idea to depend on the number of notifies received. Instead, use NOTIFY to wake up applications that need to pay attention to something, and use a database object (such as a sequence) to keep track of what happened or how many times it happened. -- Richard Huxton Archonet Ltd
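For an extended example along the lines Ted asked about, a rough dblink session looks like this (contrib/dblink; the connection string, table, and values are placeholders):

    -- open a named connection to the peer database
    SELECT dblink_connect('peer', 'host=dbhost2 dbname=appdb user=repl');

    -- push a local change to the peer
    SELECT dblink_exec('peer', 'UPDATE widgets SET qty = 42 WHERE widget_id = 7');

    -- read remote rows back; a column definition list is required
    SELECT * FROM dblink('peer', 'SELECT widget_id, qty FROM widgets')
             AS t(widget_id integer, qty integer);

    SELECT dblink_disconnect('peer');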