[GENERAL] postgres user with password, read-only user without?
Is there a way to configure authentication so that some users require a password and other users don't? Ted
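For illustration, a sketch of the kind of pg_hba.conf entries this involves, with hypothetical names: pg_hba.conf is matched top to bottom, so a more specific user line can use a different auth method than the catch-all line below it.

# TYPE  DATABASE  USER           ADDRESS         METHOD
host    mydb      report_reader  192.168.0.0/24  trust
host    mydb      all            192.168.0.0/24  md5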
[GENERAL] 9.5rc1 RLS select policy on insert?
I see the INSERT policy's CHECK expression running, but the SELECT policy's USING expression is also run on INSERT. I don't understand why the SELECT policy is being run. Could it possibly be related to using a sequence on the table? Ted
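A minimal way to observe which policies fire, assuming a scratch table and logging helper functions (all names hypothetical):

CREATE FUNCTION log_sel() RETURNS boolean AS
$$ BEGIN RAISE NOTICE 'SELECT policy USING ran'; RETURN true; END $$ LANGUAGE plpgsql;

CREATE FUNCTION log_ins() RETURNS boolean AS
$$ BEGIN RAISE NOTICE 'INSERT policy CHECK ran'; RETURN true; END $$ LANGUAGE plpgsql;

CREATE TABLE t (id serial PRIMARY KEY, v text);
ALTER TABLE t ENABLE ROW LEVEL SECURITY;
ALTER TABLE t FORCE ROW LEVEL SECURITY;
CREATE POLICY t_sel ON t FOR SELECT USING (log_sel());
CREATE POLICY t_ins ON t FOR INSERT WITH CHECK (log_ins());

INSERT INTO t (v) VALUES ('x');  -- watch which NOTICEs appear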
[GENERAL] RLS 9.5rc1 configuration changes?
I had been using Crunchy Data's 9.4 with backported RLS, but since my ultimate target is 9.5 I decided to update to it. However, now the function called for the SELECT policy is not being called. \dt shows the policy, but EXPLAIN ANALYZE of a SELECT doesn't show the filter. When I turn up debug in postgresql.conf, I see in pgstartup.log that the library is loaded and the _PG_init function is called, and in the daily log I see the client auth function called each time I run psql. The only changes I made for 9.5 were to no longer set row_security to 'force' in postgresql.conf and to add: ALTER TABLE ... FORCE ROW LEVEL SECURITY; in addition to the: ALTER TABLE ... ENABLE ROW LEVEL SECURITY; which I was already doing, since I want RLS to be applied even to the owner of the table. Are there any other configuration changes needed to get RLS working again? Ted
Re: [GENERAL] RLS 9.5rc1 configuration changes?
On Mon, Jan 4, 2016 at 4:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ted Toth <txt...@gmail.com> writes:
>> I had been using Crunchy Data's 9.4 with backported RLS but I decided
>> since my ultimate target is 9.5 that I update to it. However now the
>> function called for the SELECT policy is not being called. \dt shows
>> the policy but EXPLAIN ANALYZE of a select doesn't show the filter.
>
> I'm not sure how Crunchy's 9.4 version behaves, but I'd expect the
> policy USING condition to be visible in EXPLAIN in 9.5.
>
> Are you perhaps testing this as a superuser? Superusers bypass RLS
> even with FORCE ROW LEVEL SECURITY.

Yes, I was a superuser, but without 'Bypass RLS'. So there's no way to enforce RLS for all users/roles?

>> ... The only changes I made for 9.5 were to no longer set row_security
>> to 'force' in postgresql.conf
>
> What did you set it to instead?

row_security=on. Maybe 'force' did what I wanted in Crunchy's 9.4 version :(

> regards, tom lane
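One way to see the policy actually applied is to test as a role that cannot bypass RLS; a sketch with a hypothetical role and table name (NOBYPASSRLS is available in 9.5):

CREATE ROLE rls_test LOGIN NOBYPASSRLS;
GRANT SELECT ON my_table TO rls_test;
SET ROLE rls_test;
EXPLAIN SELECT * FROM my_table;  -- the policy's USING filter should now appear
RESET ROLE;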
[GENERAL] native api or odbc?
What are people's experiences with either/both? Pluses/minuses? Ted
[GENERAL] RLS policy issue
I'm working on understanding and implementing RLS. Since I work on systems using SELinux (MLS policy), I'm using the sepgsql module, which I've modified slightly: I've added a function named sepgsql_check_row_perm that I'm using in the policies. For example, I have a 'reports' table that looks like:

                                      Table "public.reports"
     Column     |      Type       |                       Modifiers                       | Storage  | Stats target | Description
----------------+-----------------+-------------------------------------------------------+----------+--------------+-------------
 id             | integer         | not null default nextval('reports_id_seq'::regclass)  | plain    |              |
 report         | json            |                                                        | extended |              |
 message_id     | integer         | not null                                               | plain    |              |
 location       | geometry(Point) |                                                        | main     |              |
 security_label | text            | default sepgsql_getcon()                               | extended |              |
Policies:
    POLICY check_report_delete_selinux FOR DELETE
        USING sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'delete'::text)
    POLICY check_report_insert_selinux FOR INSERT
        WITH CHECK sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'insert'::text)
    POLICY check_report_select_selinux FOR SELECT
        USING sepgsql_check_row_perm(sepgsql_getcon(), security_label, 'select'::text)
    POLICY check_report_update_selinux FOR UPDATE
        USING sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'update'::text)
        WITH CHECK sepgsql_check_row_perm(security_label, sepgsql_getcon(), 'update'::text)

When I do a SELECT, I expect sepgsql_check_row_perm to be called and at least output the elog message I added. Here's part of the patch I apply to add the sepgsql_check_row_perm function to the module:

+/*
+ * BOOL sepgsql_check_row_perm(TEXT, TEXT, TEXT)
+ *
+ * Check if perm is allowed for the tuple.
+ * This is a variant of sepgsql_avc_check_perms_label which allows
+ * specifying both the source and target contexts. For MLS
+ * (write up, read down) dominance purposes, in the case of
+ * INSERT/UPDATE/DELETE (write) the source is the tuple's context
+ * and it must dominate the peer's context; in the case of
+ * SELECT (read) the source is the peer's context and it must dominate
+ * the tuple's context.
+ */
+PG_FUNCTION_INFO_V1(sepgsql_check_row_perm);
+Datum
+sepgsql_check_row_perm(PG_FUNCTION_ARGS)
+{
+	const char *scontext;
+	const char *tcontext;
+	const char *perm_name;
+	access_vector_t av_perm;
+
+	elog(DEBUG1, "sepgsql_check_row_perm");

I'd also expect that the rewrite would have added the SELECT policy's USING clause to the query, but I don't see any indication of that in the details that follow (note the empty :securityQuals):

2015-05-21 16:59:39.030 CDT STATEMENT: select * from reports
2015-05-21 16:59:39.030 CDT LOG: rewritten parse tree:
2015-05-21 16:59:39.030 CDT DETAIL: (
   {QUERY :commandType 1 :querySource 0 :canSetTag true :utilityStmt <>
    :resultRelation 0 :hasAggs false :hasWindowFuncs false :hasSubLinks false
    :hasDistinctOn false :hasRecursive false :hasModifyingCTE false
    :hasForUpdate false :hasRowSecurity true :cteList <>
    :rtable (
       {RTE :alias <> :eref {ALIAS :aliasname reports :colnames (id report
        message_id location security_label)} :rtekind 0 :relid 19116
        :relkind r :lateral false :inh true :inFromCl true :requiredPerms 2
        :checkAsUser 0 :selectedCols (b 9 10 11 12 13) :modifiedCols (b)
        :securityQuals <>})
    :jointree {FROMEXPR :fromlist ({RANGETBLREF :rtindex 1}) :quals <>}
    :targetList (
       {TARGETENTRY :expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
        :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 7}
        :resno 1 :resname id :ressortgroupref 0 :resorigtbl 19116 :resorigcol 1
        :resjunk false}
       {TARGETENTRY :expr {VAR :varno 1 :varattno 2 :vartype 114 :vartypmod -1
        :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 2 :location 7}
        :resno 2 :resname report :ressortgroupref 0 :resorigtbl 19116
[GENERAL] 9.5 RLS 'check policy' function arguments
I'm contemplating writing a function for use in a policy's WITH CHECK clause (CREATE POLICY ... WITH CHECK). Where can I find documentation describing the arguments that will be passed to the function? Ted
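For what it's worth, the WITH CHECK expression of CREATE POLICY is an ordinary boolean expression evaluated against the candidate row, so a helper function receives exactly the arguments written into the policy; nothing is passed implicitly. A sketch with hypothetical names:

CREATE FUNCTION label_ok(lbl text) RETURNS boolean
    LANGUAGE sql STABLE AS $$ SELECT lbl IS NOT NULL $$;

CREATE POLICY docs_ins ON docs FOR INSERT WITH CHECK (label_ok(security_label));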
Re: [GENERAL] sepgsql where are the security labels
This table maintains information about the security context of PostgreSQL objects, not the data in tables.

On Wed, Nov 12, 2014 at 5:56 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 11/12/2014 02:45 PM, Ted Toth wrote:
>> I'm running the SELinux MLS policy, I've got labeled IPsec working, and my
>> PostgreSQL is configured to load sepgsql. I've created a db, run the
>> sepgsql.sql script on it, created tables and inserted data. How do I query
>> the security labels on the data?
>
> I do not use SECURITY LABELS, but it seems they can be queried here:
> http://www.postgresql.org/docs/9.3/interactive/view-pg-seclabels.html
>
>> As best I can tell there is no security_context column on either of the
>> tables I've created. How does the system column security_context get added
>> to tables? I've read everything I can find on the web but a lot of it is
>> dated. Here's how I'm creating my db and tables:
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
Re: [GENERAL] sepgsql where are the security labels
Exactly what I was talking about ... but unfortunately that appears to have been based on KaiGai's branch and is not in 9.3. The current discussion/work is around row-level security, with patches against 9.5, which is not much help to me now :(

On Thu, Nov 13, 2014 at 9:26 AM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 11/13/2014 05:58 AM, Ted Toth wrote:
>> This table maintains information about the security context of PostgreSQL
>> objects, not the data in tables.
>
> http://www.slideshare.net/kaigai/label-based-mandatory-access-control-on-postgresql
> Slide 23
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
[GENERAL] sepgsql where are the security labels
I'm running the SELinux MLS policy, I've got labeled IPsec working, and my PostgreSQL is configured to load sepgsql. I've created a db, run the sepgsql.sql script on it, created tables, and inserted data. How do I query the security labels on the data? As best I can tell there is no security_context column on either of the tables I've created. How does the system column security_context get added to tables? I've read everything I can find on the web, but a lot of it is dated. Here's how I'm creating my db and tables:

CREATE DATABASE contacts
    WITH OWNER = jcdx
         ENCODING = 'UTF8'
         TABLESPACE = pg_default
         LC_COLLATE = 'en_US.UTF-8'
         LC_CTYPE = 'en_US.UTF-8'
         CONNECTION LIMIT = -1;

SECURITY LABEL FOR selinux ON DATABASE contacts IS 'user_u:object_r:sepgsql_db_t:s0';

-- PostgreSQL database dump

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';

CREATE EXTENSION IF NOT EXISTS postgis WITH SCHEMA public;
COMMENT ON EXTENSION postgis IS 'PostGIS geometry, geography, and raster spatial types and functions';

CREATE EXTENSION IF NOT EXISTS pgrouting WITH SCHEMA public;
COMMENT ON EXTENSION pgrouting IS 'pgRouting Extension';

SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;

CREATE TABLE messages (
    id integer NOT NULL,
    message json
);
SECURITY LABEL FOR selinux ON TABLE messages IS 'user_u:object_r:sepgsql_table_t:s0';
ALTER TABLE public.messages OWNER TO jcdx;

CREATE SEQUENCE messages_id_seq
    START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1;
SECURITY LABEL FOR selinux ON SEQUENCE messages_id_seq IS 'user_u:object_r:sepgsql_seq_t:s0';
ALTER TABLE public.messages_id_seq OWNER TO jcdx;
ALTER SEQUENCE messages_id_seq OWNED BY messages.id;

CREATE TABLE reports (
    id integer NOT NULL,
    report json,
    message_id integer NOT NULL,
    location geometry(Point)
);
SECURITY LABEL FOR selinux ON TABLE reports IS 'user_u:object_r:sepgsql_table_t:s0';
ALTER TABLE public.reports OWNER TO jcdx;

CREATE SEQUENCE reports_id_seq
    START WITH 1 INCREMENT BY 1 NO MINVALUE NO MAXVALUE CACHE 1;
SECURITY LABEL FOR selinux ON SEQUENCE reports_id_seq IS 'user_u:object_r:sepgsql_seq_t:s0';
ALTER TABLE public.reports_id_seq OWNER TO jcdx;
ALTER SEQUENCE reports_id_seq OWNED BY reports.id;

ALTER TABLE ONLY messages ALTER COLUMN id SET DEFAULT nextval('messages_id_seq'::regclass);
ALTER TABLE ONLY reports ALTER COLUMN id SET DEFAULT nextval('reports_id_seq'::regclass);
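Object-level labels (database, table, sequence, column) can at least be inspected through the pg_seclabels view, e.g.:

SELECT objtype, objname, provider, label
FROM pg_seclabels
WHERE objname IN ('messages', 'reports');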
Re: [GENERAL] does postgresql works on distributed systems?
> Microsoft SQL Server 2005 cannot be shared on multiple systems, i.e., in a
> network environment; when it is installed on one system it cannot be
> accessed from other systems.

Nonsense! Where did you get this stuff? I have even played with MS SQL Server 2005 Express, and it is not crippled in the way you describe. I am not a big fan of MS, but I have worked in shops where we used MS SQL Server 2005, and once the DB was set up, we could access it from anywhere. Since I often develop for it, I even have this capability, with MS SQL Server 2005, set up on the little LAN in my home office.

> One can access it only from a system where it is already installed, but not
> from a system where there is no SQL Server. Is PostgreSQL similar to SQL
> Server, or does it support network sharing, i.e., can one access PostgreSQL
> from any system irrespective of which system it is installed on?

You can do this with any RDBMS I have seen. An RDBMS is of little commercial utility if you can't access it from other machines in a network. Mind you, I have worked with systems where the RDBMS was configured to respond only to apps on localhost, or a specific IP on the LAN, with access to the DB mediated through middleware. You should probably look at a) how your server is configured and b) how your client is configured (including whether or not you actually have client software on your client machine).

Cheers

Ted
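Concretely, remote access in PostgreSQL is controlled by two settings; a sketch with a hypothetical LAN subnet:

# postgresql.conf -- listen on all interfaces, not just localhost
listen_addresses = '*'

# pg_hba.conf -- allow password-authenticated connections from the LAN
host  all  all  192.168.0.0/24  md5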
Re: [GENERAL] No Return??
--- Bob Pawley <[EMAIL PROTECTED]> wrote:

> I'm getting a little frustrated with this problem. Can anyone tell me what
> is wrong with the following code. I have tested the portions separately and
> they all work. When I try it as a whole I get the message "control reached
> end of trigger procedure without RETURN". Any help greatly appreciated.
> Bob
>
> Declare
>     pumpnumber integer;
> Begin
>     Select count(*) Into pumpnumber
>     From p_id.devices, p_id.processes
>     Where device_number = '11'
>       and p_id.devices.fluid_id = p_id.processes.fluid_id
>       and p_id.processes.ip_op_equipment = 'op';
>     If pumpnumber = 1 then
>         Update p_id.devices Set number = '#1'
>         From p_id.processes
>         Where p_id.devices.number is null
>           and p_id.devices.device_number = '11'
>           and p_id.devices.fluid_id = p_id.processes.fluid_id
>           and p_id.processes.ip_op_equipment = 'op';
>     Else
>         If pumpnumber = 2 Then
>             Update p_id.devices Set number = '#2'
>             From p_id.processes
>             Where p_id.devices.number is null
>               and p_id.devices.device_number = '11'
>               and p_id.devices.fluid_id = p_id.processes.fluid_id
>               and p_id.processes.ip_op_equipment = 'op';
>         End If;
>         RETURN NULL;
>     End If;
> END;
>
> I have tried 'Return New' and 'Return Result' without luck, and if I leave
> off either of the two 'End If' statements the procedure returns an error.

Look at your flow control! Your RETURN is inside a conditional block: it is only reached in the ELSE branch. When pumpnumber = 1, flow skips the ELSE, goes to the very end of the function, and reaches the end without encountering a return statement.

Cheers,

Ted
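A sketch of the corrected flow, using the original statements: ELSIF keeps everything in one IF block, and a RETURN placed after END IF is reached on every path (including pumpnumber values other than 1 and 2):

IF pumpnumber = 1 THEN
    UPDATE p_id.devices SET number = '#1'
    FROM p_id.processes
    WHERE p_id.devices.number IS NULL
      AND p_id.devices.device_number = '11'
      AND p_id.devices.fluid_id = p_id.processes.fluid_id
      AND p_id.processes.ip_op_equipment = 'op';
ELSIF pumpnumber = 2 THEN
    UPDATE p_id.devices SET number = '#2'
    FROM p_id.processes
    WHERE p_id.devices.number IS NULL
      AND p_id.devices.device_number = '11'
      AND p_id.devices.fluid_id = p_id.processes.fluid_id
      AND p_id.processes.ip_op_equipment = 'op';
END IF;
RETURN NULL;  -- reached on every path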
Re: [GENERAL] (FAQ?) JOIN condition - 'WHERE NULL = NULL'
--- Martijn van Oosterhout <[EMAIL PROTECTED]> wrote:

> 'transform_null_equals' won't help you at all here since it only helps in
> the very specific case of comparing with a constant. The easiest is to
> think of NULL as meaning 'unknown'. Clearly you can't compare that usefully
> with anything.

Not even a null in another record ... (hence my question below). If the value is unknown, then it could be anything, and (thinking as a mathematician considering real numbers) the probability that two records holding NULL have true, but unknown, values that are the same is indistinguishable from 0 (with integers or decimal or floating point numbers, that would be qualified with "for practical purposes" :).

> Perhaps you can use a marker like -1 to achieve the effect you want?

Is that really valid, though, especially in a join? I mean, if the column in question has multiple nulls in each of the tables, then how do you, or the DB, figure out which of the rows containing nulls in the one table match up with rows in the other table containing nulls? Or is the resultset supposed to be the product of the two sets (match each row with a null in the one table with each row with a null in the other)? That, for me, creates a nightmare situation where some of my tables have tens of millions of rows; if even 1% of the rows contains null in the relevant column, I don't even want to think about processing the resultset that would be produced from such an idea using these tables. My joins always involve only primary keys, or indexes on columns that prohibit nulls, so this problem doesn't crop up in my code. But would I be far from right in expecting that the rational thing to do when creating a join on columns that allow nulls is to exclude ALL rows, in either table, where the columns involved are null?

Cheers,

Ted
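To make the semantics concrete:

SELECT NULL = NULL;                     -- yields NULL, not true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- yields true

So a join that deliberately treats two NULLs as a match can be written with ... ON a.col IS NOT DISTINCT FROM b.col, though, as discussed above, on a nullable column that can blow up into a near cross-product among the NULL rows.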
Re: [GENERAL] (FAQ?) JOIN condition - 'WHERE NULL = NULL'
--- Ian Sillitoe <[EMAIL PROTECTED]> wrote:

> I completely take your points - so maybe I should be asking for advice on
> database design instead. We are annotating nodes on a hierarchical
> structure where NULL implied an

I don't mean to be rude, but yuck. Why provide a record for data that isn't there?

I recently put together a database (at present at a very early prototype stage) to handle biological data. Considering ONLY the taxonomic portion of it, I opted for a general hierarchical model. Maybe not the most efficient, yet, but no waste, yet. In what is an oversimplification, I created a taxon table, with columns for a unique ID number and taxonomic level (species, genus, etc., with all the glorious subcategories taxonomists of various stripes are wont to create/define). The taxonomic levels are predefined (taken from my references that deal with such matters) in a lookup table. Then I have a column to hold the parent taxon's ID number. Of course there are, in a middle layer, constraints that prevent recording a species as a parent of a genus, and other silliness (no linking a species epithet directly to a class or order). But you get the idea. An object-oriented programming metaphor might be that of a singly linked list. And of course, I have deliberately obfuscated the complexity arising from having to handle synonyms both usefully and gracefully, but the core idea is simple, and there are no nulls, except for taxa representing a whole kingdom. Last I checked, there were no taxa more general than the kingdom, and there's only a handful of kingdoms. If you don't have data on subclass or superfamily or subspecies, you just don't put it in. Therefore no nulls! I have no idea if this model would work for you, but maybe it will help.

Cheers,

Ted
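Hypothetical DDL for the design described: a lookup table of levels plus a self-referencing taxon table, so absent ranks simply have no row at all.

CREATE TABLE taxon_level (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE            -- kingdom, phylum, class, order, ...
);

CREATE TABLE taxon (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    level_id  integer NOT NULL REFERENCES taxon_level(id),
    parent_id integer REFERENCES taxon(id)  -- NULL only for kingdoms
);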
Re: [GENERAL] arrays of floating point numbers / linear algebra operations into the DB
--- Webb Sprague <[EMAIL PROTECTED]> wrote:

> ... linear algebra ... matrices and vectors ... Especially if some GiST or
> similar index could efficiently search for vectors close to other vectors ...

I see a potential problem here, in terms of how one defines "close" or similitude. I think, though, practical answers can be found in examples of applying quantitative methods in some subdisciplines of biology.

> Hmm. If I get some more interest on this list (I need just one LAPACK /
> BLAS hacker...), I will apply for a pgFoundry project and appoint myself
> head of the peanut gallery...

Someone pointed to the potential utility of PL/R. I would be interested at least in learning about your assessment of the two (PostGIS and PL/R). Alas, I don't have decent data I could use to experiment with either (except possibly for time series analysis, which is a completely different kettle of fish).

> ... and deal with a big database doing lots of similarity-based searches (a
> 6'2" guy with light brown hair being similar to a 6'1" guy with dark blond
> hair) - and am experimenting with modeling some of the data as vectors in
> postgres. Well, I bet a good linear algebra library would help. A lot. :)

If you're looking at similarity, and some practicality in the USE of quantitative procedures, you may want to look into the biogeography and numerical taxonomy literature, and to a lesser extent quantitative plant ecology. All three subdisciplines of biology have decades of experience, and copious literature, looking at similarity measures; in my experience they are much more practical or pragmatic than the 'pure' biostatistics literature, and infinitely more practical than any theoretical statistical or mathematical literature I have seen (trust me, I have a bookcase full of this stuff). A good linear algebra library would be useful, but there are a lot of nonlinear analyses that would be of interest, and there are nonparametric, yet quantitative, approaches that are of considerable interest in assessing similarity. I don't know of work applying things like discriminant function analysis, cluster analysis, or any of the many ordination analyses to searches in a database, but then I haven't looked at the question since I graduated. I am interested in the question, though, and would be interested in hearing about your experience with it.

If I can manage the time, I hope to start a project where I can store description data for specimens of plants and animals and use analyses including, but not limited to, ordination, clustering, discriminant functions, and canonical correlation to create a structure for comparing them and for identifying new specimens; or, at a minimum, if the specimen is truly something unknown, learn what known specimens or groups thereof it is most similar to, and how it is different.

I have managed to install PL/R, but I haven't had the time to figure out how best to analyze data stored in the database using it. The data I DO have changes daily, and some of the tables are well over 100MB, so I am a bit worried about how well it can handle such an amount of data, and how long it would take.

Cheers,

Ted
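As a toy sketch of the vectors-in-postgres idea (function name and data hypothetical), a Euclidean distance over float8 arrays can be written in plain SQL, no external library needed:

CREATE OR REPLACE FUNCTION vec_dist(float8[], float8[]) RETURNS float8
    LANGUAGE sql IMMUTABLE AS $$
    SELECT sqrt(sum(($1[i] - $2[i])^2))
    FROM generate_series(array_lower($1,1), array_upper($1,1)) AS i
$$;

SELECT vec_dist(ARRAY[74.0, 1.0], ARRAY[73.0, 2.0]);  -- e.g. height / hair-colour codes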
Re: [GENERAL] Sun acquires MySQL
--- Bill Moran <[EMAIL PROTECTED]> wrote:

> In response to dvanatta <[EMAIL PROTECTED]>:
>> What's up with 3 of the 7 being from Pennsylvania? What's the connection?
>
> Well, as everyone knows, Pennsylvania is a haven for brilliant people. In
> fact, simply living in Pennsylvania makes you smarter.

Does it count if I lived there for a year many many years ago? ;-)

Ted
Re: [GENERAL] logging arguments to prepared statements?
--- rihad <[EMAIL PROTECTED]> wrote:

> Dec 18 15:49:41 myhost postgres[29832]: [35-1] ERROR: 23505: duplicate key value violates unique constraint "foo_key"
> Dec 18 15:49:41 myhost postgres[29832]: [35-4] INSERT INTO foo
> Dec 18 15:49:41 myhost postgres[29832]: [35-5] (a,b,c)
> Dec 18 15:49:41 myhost postgres[29832]: [35-7] VALUES ($1,$2,$3)
> Dec 18 15:49:41 myhost postgres[29832]: [35-8]
>
> And that's it, leaving me wondering which value triggered the error. Any

Why? It seems simple enough. You have a table called foo, with at least three columns: a, b, and c. And you have a violation of your unique constraint. If it isn't that simple, you have left out useful information. You did not say, for example, which of your columns, if any, are involved in your unique constraint. If the answer to that is none, then you need to show how the constraint is defined. Which of the three columns are involved in a unique constraint? If none of the columns you use are involved in a unique constraint, there must be other columns that are, and that would imply that there is either a problem with your prepared statement (ignoring certain columns that can't be ignored), or a problem with how you set up the default values for another column that is involved in a unique constraint; or the table has grown so big that it is impossible to add a new record without violating the existing unique constraint (unlikely as that is in most cases, especially during development).

I could see creating a BEFORE INSERT trigger that stores the values to be inserted in a log table with a timestamp, but I don't see the profit in that. Doesn't such an error generate a SQL exception in your client? If so, the client code will know immediately what insert attempt failed, and therefore what values are involved in the problem. Using JDBC, for example, all of the JDBC functions that execute a prepared statement (or any other SQL) will throw a java.sql.SQLException. One therefore knows immediately when there is a problem of the sort you describe, and so you can determine quickly what values resulted in your error. If need be, that could be stored in your application's log. If one needed full audit functionality, one could create tables to store the details of every SQL statement, including who is responsible for the statement and a timestamp. But if you don't need to support that kind of detailed audit, why bother, when there are easier ways to address your issue?

HTH

Ted
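Since the error names the constraint (foo_key), its column list can be looked up directly:

SELECT indexdef FROM pg_indexes WHERE indexname = 'foo_key';
-- or, if it is a table-level constraint:
SELECT conname, pg_get_constraintdef(oid) FROM pg_constraint WHERE conname = 'foo_key';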
Re: [GENERAL] Efficiency vs. code bloat for SELECT wrappers
--- Colin Wetherbee <[EMAIL PROTECTED]> wrote:

> Sam Mason wrote:
>> On Sun, Dec 16, 2007 at 06:31:56PM -0500, Colin Wetherbee wrote:
>>> If I write one Perl sub for each operation on the table (e.g. one that
>>> gets the username and password hash, another that gets the last name and
>>> first name, etc.), there will be a whole lot of subs, each of which
>>> performs one very specific task.

Right. The first rule of software engineering is to keep functions as small as possible, focused on one thing wherever practicable. It doesn't matter if the language is Perl or C++ or Java, or a stored procedure in an RDBMS. One can always create additional driver functions that use the elemental simple functions to do more complex tasks (bearing in mind the complexities that will inevitably arise in multiple-user situations).

>>> If I write one larger Perl sub that grabs the whole row, and then I deal
>>> with the contents of the row in Perl, ignoring columns as I please, it
>>> will require fewer subs and, in turn, imply cleaner code.

Define "cleaner code". The more data, and the more complex that data, the more code you have to write, regardless of whether that is in one function or several. Either way, done badly, can be a problem for both maintenance and performance.

>> It sounds as though you're just treating the database as a relatively dumb
>> datastore. They can be used as this, and Tom's comments are as always good,
>> but relational databases come into their own when you're writing more
>> complex queries. When I'm writing my code I tend to put the SQL statements
>> directly in with the rest of the code; abstracting away from the database
>> tends to make things more complicated than they need to be.
>
> Because I know Perl a whole lot better than SQL, PostgreSQL, and even the
> Perl DBI, I'm always inclined to wrap the database stuff in a nice little
> package and forget about it. This method has worked well for me in the
> past, but the project I'm starting is much bigger in terms of database use
> than anything else I've written.

I routinely keep my SQL code distinct from my Perl, Java, or C++ code. When a client program needs to do something with the database, then either a child process executes a script I have written (if the client program doesn't need to do anything with data drawn from the database), or I have all the SQL code in one or more stored procedures and use the appropriate client interface to invoke the stored procedure(s). Whether the SQL is in a specific script or in a stored procedure, my SQL code is kept distinct from the client code, regardless of the language I have used for that. I find this even MORE useful as my projects get bigger.

> You say you write SQL directly in your application code, rather than
> writing task-oriented wrappers. I like that idea, but I can see that
> getting really messy. What is your opinion on how it affects code
> maintenance and things like that?

From what I have seen, even in small, almost trivial, client programs, this gets messy real quick. I therefore hate the idea of mixing SQL in with client code (it is all, in my view, application code). I like the use of middleware objects since, if well designed, they can make developing the required code all the simpler. It is only when badly done that an abstraction leads to complicated code that is a nightmare to maintain; worse if it is inadequately documented.

The whole purpose of abstraction, whether one is using object-oriented development of middleware, or generic programming, or structured programming, or functional programming, is to analyze a complex problem into more manageable parts. The parts themselves become easier to code (witness Java beans of various kinds, such as backing beans; my beans get their data either from the database or from the user interface, and in either case they make interaction between a web-based interface and the database back end MUCH simpler to code), and the relationships among the parts are easier to understand. Each kind of abstraction has its place. It is up to the analyst or architect to figure out how many layers and what abstractions are appropriate for a given project.

HTH

Ted
Re: [GENERAL] Efficiency vs. code bloat for SELECT wrappers
--- Sam Mason <[EMAIL PROTECTED]> wrote:

> On Mon, Dec 17, 2007 at 01:29:10PM -0500, Ted Byers wrote:
>> I routinely keep my SQL code distinct from my Perl, Java or C++ code. [...]
>> I find this even MORE useful as my projects get bigger.
>
> Humm, this discussion is horribly domain specific. Abstractions are the
> bread and butter of programming, and designing them appropriately makes the
> difference between things turning into a mess later on or not. The only
> solid rule I stick to is that when I'm working to solve a problem I've not
> solved before, I will get it (i.e. the way I go about solving the problem;
> getting the right answer is generally pretty easy) wrong the first time and
> probably only get it reasonable the third or fourth time.

Right. But there is a quicker way. I like to work with people whose experience is different from mine, expecting that they will have seen a different suite of problems and found solutions I have come to rely on. That way, in discussing my current problems, I can learn something new, and much faster than by trial and error. Of course, there are always test or proof-of-concept programs, where I test ideas. For example, I have routinely been advised that left joins are faster than correlated subqueries. In the past month, I have found two problems requiring either a correlated subquery or a left join; in one case the correlated subquery was more than ten times faster, while in the other the left join was about twice as fast. In both cases, the results returned by the two approaches were identical, but there were significant differences in performance, most astonishing in the one case that proved to be quite different than expected. (See the sketch of the two forms after this message.)

>> Each kind of abstraction has its place. It is up to the analyst or
>> architect to figure out how many layers and what abstractions are
>> appropriate for a given project.
>
> Indeed. But to be able to choose a suitable set of abstractions, it helps
> for the designer to know the ins and outs of the tools being used. I was
> trying to encourage Colin to look at using databases in a different way;
> different styles of programming suit different applications, and hiding the
> database can have detrimental effects as well as positive effects.

I find the opportunity to look at problems in a different light priceless. But some practices generally turn out to be counterproductive. Producing overly long functions, or mixing code involving different languages in the same file, often leads to an unmaintainable mess. More often than not, such practices are a consequence of poor design. But even here there is no hard and fast rule, since some algorithms, such as numeric quadrature or QR factorization of general real matrices, cannot be written as a short, simple function, although they logically ought to be one function in the simplest cases; and even these admit enhancements that warrant more interesting data structures and ancillary functions.

But for these algorithms, which admittedly have nothing to do with databases, the top consideration is the speed of provably correct code. If that means it is too complex for junior or even intermediate programmers, so be it. I have seen and used library code for number crunching that only a specialist in numeric methods has a hope of easily understanding, and been glad I didn't have to write those functions myself. ;-)

For Colin's situation, what I would have suggested is a simple extension of what he normally does: look at an additional layer that gets the data from the database as efficiently as possible and caches it in that layer, providing it to the UI layer as required. Similarly, perhaps in a different layer, get user data from the interface and cache it, sending it to the database back end at the most appropriate time in the most efficient way. And perhaps, with all this, some form of database connection pooling; although this latter option will depend on how many concurrent users there may be in the worst-case scenario (or maybe that is the best-case scenario, since it would mean there is significant demand for what the app does ;-). I haven't attempted this in Perl, but it is dirt simple in Java/J2EE. In a web application based, e.g., on Tomcat, it is easy to set this up with as many layers as one can justify, especially with an IDE like NetBeans 6. Not having thought about how to do
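For reference, the two equivalent forms mentioned above, on hypothetical tables (find parents with no children):

-- correlated subquery form
SELECT p.id FROM parent p
WHERE NOT EXISTS (SELECT 1 FROM child c WHERE c.parent_id = p.id);

-- LEFT JOIN form
SELECT p.id FROM parent p
LEFT JOIN child c ON c.parent_id = p.id
WHERE c.parent_id IS NULL;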
Re: [GENERAL] Simpler dump?
Thanks Uwe

This is a great start. It reduces the dump from 2 MB down to 167K, but out of 6833 lines of SQL, 5744 relate to the public schema in the DB, and I didn't touch that. It has over a dozen types, 419 functions, etc., that were put there by PostgreSQL the moment I created the database. I'd expect the same stuff to be there the moment I issue the CREATE DATABASE directive on the host machine, so all I really want is the dozen sequences, two dozen tables, and the suite of constraints I created, all in the schema specific to my new DB. Is there a reason pg_dump dumps the stuff in public even though that stuff seems to be created, and therefore present, in every database I create on a given server instance? Isn't that duplication a waste of space, and its presence in the dump a waste of CPU cycles?

Thanks again.

Ted

--- Uwe C. Schroeder <[EMAIL PROTECTED]> wrote:
> pg_dump -x -O -s [databasename] > outfile.sql
> HTH
> Uwe
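If the objects you created all live in their own schema, one option (hypothetical names) is to restrict the dump to that schema, which skips everything in public:

pg_dump -s -n my_schema mydb > schema_only.sql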
Re: [GENERAL] Simpler dump?
--- Tom Lane <[EMAIL PROTECTED]> wrote:

> Ted Byers <[EMAIL PROTECTED]> writes:
>> Is there a way to tell pg_dump to just dump the SQL statements required to
>> create the tables, sequences, indexes, keys, etc.?
>
> pg_dump -s ?

Thanks Tom

>> I DON'T need to restore or recreate things like users, or most other kinds
>> of DB objects. Just routine DDL statements.
>
> pg_dump doesn't try to recreate users, and to most people DDL would include
> the creation commands for any type of DB object whatsoever. Your
> demarcation between stuff you want and stuff you don't seems far too
> vaguely stated.

Sorry, I just wanted the statements I need to recreate the tables, sequences, and constraints I created. When I create a database, I leave the default public schema alone. The tables, etc. I add are placed in a separate schema. It seems the public schema is automagically created by Postgres every time I create a new database on a given server, and it has over a dozen types, over 400 functions, etc. I don't really understand why it needs to be duplicated in every DB on a server, but that is another matter. In my database so far, I have created only a dozen sequences and two dozen tables, along with a suite of constraints. So when I look at the dump file, out of 6833 lines of SQL, the first 5744 relate to the default public schema and only the last 1100 or so relate to DB objects I created. I'd assume that the public schema would be created with the usual stuff when I create the database name on the production host anyway, so why recreate all that in the dump file?

Thanks again,

Ted
Re: [GENERAL] Simpler dump?
--- Tom Lane <[EMAIL PROTECTED]> wrote:

> Ted Byers <[EMAIL PROTECTED]> writes:
>> It seems the public schema is automagically created by Postgres every time
>> I create a new database on a given server, and it has over a dozen types,
>> over 400 functions, etc. I don't really understand why it needs to be
>> duplicated in every DB on a server, but that is another matter.
>
> What it sounds like to me is that you've managed to clutter your template1
> with a lot of stuff you don't actually want, and that's getting propagated
> into new databases by CREATE DATABASE. If so, it's not pg_dump's fault ---
> you need to clean out template1.

Thanks Tom,

Where will I find template1? When I look at the databases on the server, the only template I see is called template_postgis. Most of the extra stuff I see in all my databases relates to geometry, and I find it in this template. When I installed PostgreSQL, I enabled PostGIS because I need it for some, but not all, of my databases. Is it possible to have more than one template, and to specify which template to use when creating a new DB?

Thanks again,

Ted
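Yes; CREATE DATABASE takes an explicit template, so both kinds of database can coexist (database names hypothetical):

CREATE DATABASE gis_db   TEMPLATE template_postgis;  -- spatially enabled
CREATE DATABASE plain_db TEMPLATE template1;         -- plain default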
Re: [GENERAL] Simpler dump?
--- Richard Huxton <[EMAIL PROTECTED]> wrote:

> Tino Wildenhain wrote:
>> Hi Ted,
>> Ted Byers wrote:
>>> Thanks Uwe. This is a great start. It reduces the dump from 2 MB down to
>>> 167K, but out of 6833 lines of SQL, 5744 relate to the public schema in
>>> the DB, and I didn't touch that. It has over a dozen types, 419
>>> functions, etc., that were put there by postgresql the moment I created
>>> the database.
>>
>> Well, that's usually not the case unless you changed the default database
>> by accident. The database is called template1.
>
> See the manuals regarding CREATE DATABASE for details.

Thanks Richard. To date, I never worried about templates for my databases. I just created them, and when I needed to deploy them, I dumped them, put the dump file on a memory stick, and carried it physically to the production server to restore there. Anyway, the extra stuff I don't need for this specific database appears to be for PostGIS, which I'd enabled when I installed because I need GIS capability for another database. But it seems to be putting GIS support in all of the databases I create. I'd thought that, by enabling it, I'd be able to turn it on when I needed it. Anyway, when I look at the server using pgAdmin, I don't see either template0 or template1. I see only template_postgis. Should I be able to see template0 and template1 among the databases on the server, or are they normally hidden to minimise the chances of getting them screwed up? At this stage, how can I get a template I can use by default that doesn't include the PostGIS stuff, so I can use template_postgis only when I need it?

Thanks

Ted
Re: [GENERAL] Simpler dump?
OK. Now I am a bit confused. The manual doesn't say what to do if you don't see template1 or template0 on the server, or even whether or not you should be able to see them when using a tool like pgAdmin. But it does say:

    The postgres database is also created when a database cluster is
    initialized. This database is meant as a default database for users and
    applications to connect to. It is simply a copy of template1 and may be
    dropped and recreated if required.

Now, when I look at postgres, it is empty apart from one public schema, and all the items (aggregates, etc.) have nothing in them. The manual talks about creating a template from an existing database, but not about how to create a template from a template, apart from: "template1 is the default source database name for CREATE DATABASE. For example, one could drop template1 and recreate it from template0 without any ill effects."

Obviously, template1 must have been either blown away by the creation of template_postgis, or the one has been made an alias for the other, or the default template has been set to use template_postgis. And if postgres is a copy of template0, and template1 starts as a copy of template0, and all my databases have included the PostGIS stuff, then template_postgis is the template being used by default. So, how do I determine whether or not template1 really exists on my server and is a copy of template0 (as I'd infer from what I see in postgres) rather than template_postgis, and then modify things so that the default is the normal template1 rather than template_postgis, while leaving the latter in place so I can use it when I need it?

Thanks

Ted
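Whether template1 exists (a GUI may simply be hiding system objects) can be checked directly from any database:

SELECT datname, datistemplate, datallowconn FROM pg_database ORDER BY datname;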
Re: [GENERAL] Simpler dump?
Thanks Richard.

--- Richard Huxton <[EMAIL PROTECTED]> wrote:

> Ted Byers wrote:
>> Anyway, when I look at the server using pgAdmin, I don't see either
>> template0 or template1. I see only template_postgis. Should I be able to
>> see template0 and template1 among the databases on the server, or are they
>> normally hidden to minimise the chances of getting them screwed up?
>
> There'll be an option in pgadmin somewhere to show them. Not sure where I'm
> afraid, I mostly use the command-line.

Found it. Under the View menu, the last item is "System objects". That now shows template0 and template1. It won't let me examine the contents of template0, BUT ... it looks like template1 is a copy of template_postgis. I see the same stuff in the two.

>> At this stage, how can I get a template I can use by default that doesn't
>> include the postgis stuff, so I can use the template_postgis only when I
>> need it?
>
> You can just drop template1 and re-create it using template0 (which is
> read-only) as its template. Check the docs for CREATE DATABASE and google a
> little for examples.

OK. A worry. How is template_postgis constructed? Is it just a handy reference to template1? Or does it exist independently? I don't want to drop template1 only to find that breaks template_postgis. A metaphor might explain the origin of my worry. I use both C++ and Java. In C++, copy/assignment gives you two different objects with the same data and structure. Assignment in Java gives you two references to the same object, so changes made through one reference will be seen when examining the other (this is a common gotcha for beginning developers using both C++ and Java; the way to get C++ behaviour in Java is to use operator new and the class's copy constructor). So, in adding PostGIS support, is the template_postgis creation more analogous to assignment in C++ or to assignment in Java?

Thanks

Ted
Re: [GENERAL] Simpler dump?
Thanks All. I learned plenty this morning.

--- Douglas McNaught <[EMAIL PROTECTED]> wrote:

> On 12/10/07, Ted Byers <[EMAIL PROTECTED]> wrote:
>> OK. A worry. How is template_postgis constructed? Is it just a handy
>> reference to template1? Or does it exist independently? I don't want to
>> drop template1 only to find that breaks template_postgis.
>
> All databases are separate entities--the data copy only happens at creation
> time, based on the template you specify. The 'template_postgis' DB is not a
> standard PG database, so either it's created by the PostGIS install, or
> someone at your site set it up. It sounds like the thing for you to do is
> drop template1 (which will have no effect on template_postgis), create it
> again from template0, and use template_postgis when you need it (otherwise
> template1 will be used by default for new databases).

Great! Thanks for this. I appreciate it.

Ted
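The usual recipe for that, run as a superuser while connected to another database (e.g. postgres); the datistemplate flag must be cleared first because template databases cannot be dropped:

UPDATE pg_database SET datistemplate = false WHERE datname = 'template1';
DROP DATABASE template1;
CREATE DATABASE template1 TEMPLATE template0;
UPDATE pg_database SET datistemplate = true  WHERE datname = 'template1';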
Re: [GENERAL] SQL design pattern for a delta trigger?
Thanks all.

I tried the appended code in a trigger function, but PostgreSQL won't take it. It complains that assets.quantity is not a scalar. However, the WHERE clause in that SELECT statement guarantees that at most one record will be returned. An open position on a given kind of asset is represented by NULL in the end_valid_time field, and the combination of asset_type_id, portfolio_id, and end_valid_time is certain to be unique, if there is a record for that asset type in that portfolio at all. I thought I'd try checking for an open position first because the manual indicated that exception handling is quite expensive. But I must have missed something, because it doesn't like how I tried to define my trigger function.

I have four sequences, one each for four tables. Two of the tables are just lookup tables, for asset types and portfolios; trivial for a test case, with only an autoincrementing integer primary key and a name. The other two are the ones of interest. Assets is treated as read-only as far as the user is concerned. The user's data in the assets table is mediated through transactions inserted (and NEVER deleted or updated) into the transactions table. Assets has the minimal suite of columns: an autoincrementing integer primary key; asset_type_id and portfolio_id, both integers; quantity, which is a floating point number; and two dates, start_valid_time and end_valid_time. Transactions has only a transaction_id, portfolio_id, asset_type_id, quantity, and transaction_date. There are of course foreign keys connecting the assets and transactions tables to the lookup tables, and a composite index on assets to make looking up records based on portfolio_id, asset_type_id, and end_valid_time as quick as possible. It couldn't be simpler, conceptually! Yet I must have missed something, because PostgreSQL won't accept the function body I show below. If I can't get this working quickly, I may just resort to creating a stored procedure that takes the transaction details as arguments and processes both tables appropriately, without relying on a trigger. :-(

Thanks for everyone's help.

Ted

===

DECLARE
    id BIGINT;
    q DOUBLE PRECISION;
BEGIN
    SELECT assets.id INTO id, assets.quantity INTO q
    FROM assets
    WHERE assets.asset_type_id = NEW.asset_type_id
      AND assets.portfolio_id = NEW.portfolio_id
      AND assets.end_valid_time IS NULL;
    IF (id IS NULL) THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity, NEW.transaction_date, NULL);
    ELSE
        UPDATE assets SET end_valid_time = NEW.transaction_date WHERE id = id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, q + NEW.quantity, NEW.transaction_date, NULL);
    END IF;
END;
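For comparison, a corrected sketch of that function body: SELECT ... INTO takes a single comma-separated target list, the local variables are renamed so that WHERE id = id no longer compares the column with itself, and the trigger returns NEW:

DECLARE
    v_id  BIGINT;
    v_qty DOUBLE PRECISION;
BEGIN
    SELECT a.id, a.quantity INTO v_id, v_qty
    FROM assets a
    WHERE a.asset_type_id = NEW.asset_type_id
      AND a.portfolio_id = NEW.portfolio_id
      AND a.end_valid_time IS NULL;
    IF v_id IS NULL THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, NEW.quantity, NEW.transaction_date, NULL);
    ELSE
        -- close the open position, then open a new one with the updated quantity
        UPDATE assets SET end_valid_time = NEW.transaction_date WHERE id = v_id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity, start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id, v_qty + NEW.quantity, NEW.transaction_date, NULL);
    END IF;
    RETURN NEW;
END;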
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Vivek Khera <[EMAIL PROTECTED]> wrote:

> On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
>> For what it's worth, the real algorithm would be as follows. I hadn't had
>> enough coffee yet, and I forgot the UPDATE bit.
>>
>> IF (a query matching your old data returns rows) THEN
>>     UPDATE with your new data
>> ELSE
>>     INSERT your new data
>
> Still a race condition. Your race comes from testing existence, then
> creating/modifying data afterwards. You need to make the test/set atomic,
> else you have a race.

Yes, but how do you do that in a stored function or procedure, or in a trigger? It would be obvious to me if I were writing this in C++ or Java, but how do you do it using SQL in an RDBMS? I saw something about table locks, but that doesn't seem wise, WRT performance. The classic example of a race condition, involving a bank account, was used in the manual to introduce the idea of a transaction, but we can't use a transaction in a trigger, can we? It is one thing to point out a race condition, but a pointer to a solution that would work in the context of the problem at hand would be useful and appreciated.

Thanks all.

Ted
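The standard answer for PL/pgSQL (essentially the merge_db example in the PostgreSQL documentation) is a retry loop around an exception handler, so no table lock is needed; hypothetical table db(a, b). Each BEGIN ... EXCEPTION block acts as a subtransaction, which is what makes the test-and-set safe to retry:

LOOP
    UPDATE db SET b = data WHERE a = key;
    IF FOUND THEN
        RETURN;
    END IF;
    BEGIN
        INSERT INTO db (a, b) VALUES (key, data);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        -- another session inserted concurrently; loop back and retry the UPDATE
    END;
END LOOP;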
Re: [GENERAL] SQL design pattern for a delta trigger?
Thanks Erik

> In a stored procedure you'd just execute the UPDATE and then check the
> FOUND variable to see if it found a row to update:
>
> UPDATE table_name SET foo='bar' WHERE id=5;
> IF NOT FOUND THEN
>     INSERT INTO table_name (id, foo) VALUES (5, 'bar');
> END IF;

To be clear, if I understand you correctly, with your example, if there is no record where id=5, nothing happens except FOUND is set to false? Can I, then, declare a variable prior to your UPDATE statement, and then modify your UPDATE statement so that the value in a particular field of the row where id=5 can be captured? Bearing in mind this is to be in a row-level trigger after an insert into table_name, something like:

DECLARE
    q DOUBLE PRECISION;

UPDATE table_name SET foo='bar', q = table_name.quantity WHERE id=5;

And then follow that with something like:

IF FOUND THEN
    INSERT INTO another_table (baz, quantity) VALUES (foo, q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity) VALUES (foo, NEW.quantity);
END IF;

Thanks again,

Ted
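The value can indeed be captured, though with RETURNING rather than an assignment inside SET; a sketch (UPDATE ... RETURNING ... INTO is available in PL/pgSQL from 8.2 on):

UPDATE table_name SET foo = 'bar' WHERE id = 5 RETURNING quantity INTO q;
IF FOUND THEN
    INSERT INTO another_table (baz, quantity) VALUES ('bar', q + NEW.quantity);
ELSE
    INSERT INTO another_table (baz, quantity) VALUES ('bar', NEW.quantity);
END IF;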
[GENERAL] Simpler dump?
Is there a way to tell pg_dump to just dump the SQL statements required to create the tables, sequences, indexes, keys, etc.? I DON'T need to restore or recreate things like users, or most other kinds of DB objects. Just routine DDL statements. Looking through a dump file for a small database, it looks like pg_dump is serious overkill, dumping a lot of stuff I don't need (since I'm just using defaults for them anyway).

I am developing a new DB app, to be deployed on a web-based host on the other side of the planet. There is, at present, no 'data', and the only information to be transferred consists of the various tables, indexes, etc. I am creating. Obviously, we don't want to put any of my test data on a server that will in due course be the production host when the app goes live (so once my colleagues on the other side of the planet have had a chance to play with what I've developed, we'll mostly empty the DB of test data, except for a small amount of data we've obtained). I expect that a few tiny CSV files I have here will be FTPed to the host and we'd use a simple script to bulk load them.

Another fly in the ointment is that the hosting company is still using v 8.1.9 and I am using 8.2.5 on my machine, so I am concerned that a regular dump and restore may be problematic: it hasn't worked so far, but then I've spent much of my time so far wrestling with phpPgAdmin. :-(

I'm just looking for something that will save me a little time. I've created the core of the DB already on my development machine, using pgAdmin, but I can recreate it in about a day using Emacs to create a SQL script that reproduces what I did in pgAdmin. Any information would be appreciated.

Thanks

Ted
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Erik Jones <[EMAIL PROTECTED]> wrote:

> On Dec 6, 2007, at 2:36 PM, Ted Byers wrote: [snip]
>
> What you want to do here for handling the update v. insert is called an
> UPSERT. Basically, what you do is run the update as if the row exists and
> catch the exception that is thrown if it doesn't, at which point you insert
> the record with the end date = now(). After that you can proceed normally
> with creating the new record with start date = now() and end date = NULL.

Thanks Erik. Do you know of a URL where this is discussed, or where I can find an example? None of my books discuss this, and my search using Google has so far produced only noise.

Thanks again.

Ted
[GENERAL] SQL design pattern for a delta trigger?
IS there such a thing? I can be the first to consider this.

What I am aiming for is a solution with a couple of coupled tables, one of which represents state through time and the other represents transactions, or deltas, on the state. With one field (a floating point number) in the state table (or should I say a number for each id), it obviously has a time interval for which it is valid: a start time and an end time.

What I am after is a situation where, the moment a record is inserted into the deltas table, a trigger function first looks to see if the id provided presently has a state in the state table; if not, then it creates one. Then the end date for that state record gets set to the current time, and a new record is inserted with the new state (computed by applying the delta to the value in the previous record for the state), the current date as the start date, and null for the end date.

This seems like an obvious thing to try, but I am floundering a little and am therefore wondering if anyone has seen an SQL design pattern that talks about this, and a URL where I can see such a discussion and, better, an example. The first concern is to ensure that every record inserted into the deltas table is immediately reflected in the state table, and the second is that the history of state can be reconstructed from a suitable query on the state table. I can do this easily in client code, but isn't this the sort of thing best suited to living within the database itself?

Thanks

Ted
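A hypothetical sketch of the wiring described (all names invented):

CREATE TABLE state (
    id         integer NOT NULL,
    value      double precision NOT NULL,
    start_time timestamp NOT NULL,
    end_time   timestamp              -- NULL = currently valid state
);

CREATE TABLE deltas (
    id    integer NOT NULL,
    delta double precision NOT NULL,
    at    timestamp NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION apply_delta_fn() RETURNS trigger AS $$
DECLARE
    cur double precision;
BEGIN
    SELECT value INTO cur FROM state WHERE id = NEW.id AND end_time IS NULL;
    IF cur IS NULL THEN
        cur := 0;                     -- no state yet for this id
    ELSE
        -- close the currently open interval
        UPDATE state SET end_time = NEW.at WHERE id = NEW.id AND end_time IS NULL;
    END IF;
    INSERT INTO state (id, value, start_time, end_time)
    VALUES (NEW.id, cur + NEW.delta, NEW.at, NULL);
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER apply_delta AFTER INSERT ON deltas
    FOR EACH ROW EXECUTE PROCEDURE apply_delta_fn();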
Re: [GENERAL] SQL design pattern for a delta trigger?
--- Ted Byers [EMAIL PROTECTED] wrote: IS there such a thing? I can be the first to consider this. OOPS. The mind is faster than the fingers. That should have been I can NOT be the first to consider this. Ted
Re: [GENERAL] stored procedures and dynamic queries
? Is it different from what I normally do with regard to ensuring all the requisite data is available, properly normalized, with a suitable suite of indexes, keys, etc., and as simple as practicable? (That is, over-simplification is avoided.) I also tend to ensure that all user access to the data is through either a stored procedure or a read-only view (perhaps with a little paranoia thrown in ;). Ted
Re: [GENERAL] Need help with complicated SQL statement
Thanks Shane, It works reasonably well. It gets the right answer, but I guess my data set is much larger than your test. Please consider the appended data. The first two SQL statements are directly comparable. My LEFT JOIN is marginally simpler, as shown by EXPLAIN, and runs to completion in about 3 seconds (elapsed real time), as opposed to about 30 seconds for the two views. It makes a little sense, though, in that according to EXPLAIN, the LEFT JOIN needs to consider a dramatically smaller number of rows. What I find puzzling, though, is that it takes less time to get the report for 28 stocks at a given time than it does to get the report for 1. (Both take about 30 seconds, but for 28 stocks, it takes about 0.005 seconds less time ;-) This is a case where LEFT JOINs appear to be much faster than subqueries. I appreciate all your help, but I am struggling to figure out how best to adapt my LEFT JOINs to your VIEWs, so that the latter benefit from the speed of the JOINs. The heart of my problem is to figure out how to use a stock_id in the WHERE clause. One thing I am not certain of: is there a way to preserve the logic of the WHERE clauses by replacing the WHERE clause, which I use to sample the time series at 22 days ago, 66 days ago, 132 days ago, etc., with a GROUP BY clause, grouping by stock_id? If so, might that, along with an additional LEFT JOIN, get me the result I am after? I created a stored procedure that takes an id argument (and can usefully invoke it on any stock_id in the database), but the problem remains as to how to construct a record set by applying the procedure to each id in a set of ids returned, e.g., by SELECT stock_id FROM stocks; Ted

=== test data ===

EXPLAIN SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id;

 id | select_type | table       | type   | possible_keys | key     | key_len | ref  | rows  | Extra
----+-------------+-------------+--------+---------------+---------+---------+------+-------+-------------
  1 | PRIMARY     | <derived2>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived3>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived4>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived5>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  1 | PRIMARY     | <derived6>  | system | NULL          | NULL    | NULL    | NULL |     1 |
  6 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  5 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  4 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  3 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
  2 | DERIVED     | stockprices | ref    | PRIMARY       | PRIMARY | 4       |      | 17442 | Using where
10 rows in set (0.08 sec)

EXPLAIN SELECT * FROM stock_price_history WHERE stock_id = 1;

 id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra
----+-------------+------------+------+---------------+------+---------+------+------+-------
  1 | PRIMARY     | <derived3> | ALL  | NULL          | NULL | NULL    | NULL |  494 | Using
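On the last question (applying a per-stock procedure to each id in a set): one way this can be written in PostgreSQL is to expand a composite-returning function over the id list; a sketch, where stock_report(integer) is a hypothetical stand-in for the stored procedure:

    SELECT (stock_report(s.stock_id)).*
      FROM stocks s
     WHERE s.stock_id IN (1, 2, 3);  -- or any subquery producing the ids

In PostgreSQL 9.3 and later the same idea reads more naturally as a LATERAL join:

    SELECT r.*
      FROM stocks s
     CROSS JOIN LATERAL stock_report(s.stock_id) AS r
     WHERE s.stock_id IN (1, 2, 3);

Either way the set of stocks is computed once, and the per-stock logic runs once per id, which is exactly the shape the post above is asking for.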
Re: [GENERAL] Need help with complicated SQL statement
--- Shane Ambler [EMAIL PROTECTED] wrote: Ted Byers wrote: Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. I would have these subselects as - UNION (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) I would expect that to give the same result but make the query plan a bit simpler and quicker using less memory. It gave apparently correct values, but for some reason, it insisted on returning thousands upon thousands of identical records. There is something awry there, but I can't place what. Yes, I know I could use SELECT DISTINCT, but I worry that it may be doing a full table scan, as opposed to the relatively direct lookup I came up with after looking at your statement. I don't yet know how long it would take because it is the slowest option I tried, and I gave up after it had returned over 10,000 rows and still showed no signs of finishing. I don't understand this, as EXPLAIN returned apparently much better results for yours than it did for mine. My latest is as follows: SELECT A1.stock_id, A1.price_date, A1.adjusted, A2.price_date AS pd22, 100.0 * (A1.adjusted - A2.adjusted)/A2.adjusted AS gl22pc, A3.price_date AS pd66, 100.0 * (A1.adjusted - A3.adjusted)/A3.adjusted AS gl66pc, A4.price_date AS pd132, 100.0 * (A1.adjusted - A4.adjusted)/A4.adjusted AS gl132pc, A5.price_date AS pd264, 100.0 * (A1.adjusted - A5.adjusted)/A5.adjusted AS gl264pc FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) AS A1 LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 22) AS A2 ON A1.stock_id = A2.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 66) AS A3 ON A1.stock_id = A3.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 132) AS A4 ON A1.stock_id = A4.stock_id LEFT JOIN (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1 OFFSET 264) AS A5 ON A1.stock_id = A5.stock_id; This still gives me the correct answer, but is faster still than anything I came up with before. Now that I have the correct result for one stock, I need to adapt it to apply to each stock individually, in some small selection from a large number of stocks. Thanks again Ted
[GENERAL] Need help with complicated SQL statement
Please consider the following statement (it becomes obvious if you remember the important thing about the table is that it has columns for each of stock_id, price_date, and price). (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 22) AS T2 ORDER BY T2.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 66) AS T3 ORDER BY T3.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 132) AS T4 ORDER BY T4.price_date ASC LIMIT 1) UNION (SELECT * FROM (SELECT * FROM stockprices WHERE stock_id = 1 ORDER BY price_date DESC LIMIT 264) AS T5 ORDER BY T5.price_date ASC LIMIT 1); This statement works flawlessly, and is blindingly fast relative to everything else I have tried. But I am stuck. First, while this statement gets me the correct data, I need to obtain a single record with stock_id, the current price (that obtained from the first select statement in the union), and each of the prices returned by the subsequent select statements expressed as the current price minus the price at the previous date, with the result divided by the price at the previous date, expressed as a percentage. I do not yet know how to do this using SQL (it would be trivial if I exported the data to Java or C++ - but it isn't clear how to do it within SQL). To make things more difficult, suppose I have another select statement that returns a set of stock_ids. How do I apply the SQL logic I require to only those stocks in the set returned by a statement like SELECT stock_id FROM someTable WHERE ... The result of this extension would be that I have one record for each stock in the selected set of stocks. I do NOT want to have to recompute the set of stocks for each of the select statements in the above union (since that would be a waste because the resulting set of stocks would always be the same for the given criteria). Nor do I want to apply the SQL logic I need for the prices to all the stocks in the database. There could be thousands, or even tens of thousands, of stocks represented in the database and I'd need the gain/loss logic only for a few dozen at any given time! How do I make the two extensions I require? I expect the SQL I get to be eventually placed in a stored procedure, which may then be used to construct a view, but that is the easy part. Maybe I have been staring at this for too long to see the obvious solution, but I am exhausted and am not seeing the next step. If there IS an obvious next step, please at least give me a hint. Thanks Ted
Re: [GENERAL] Optimal time series sampling.
--- Gregory Stark [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: As a prelude to where I really want to go, please consider the following SELECT statement. SELECT close_price FROM stockprices A WHERE price_date = (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = id); I assume you're missing another stock_id = id on the outer query? Right. I think you'll have to post the actual explain analyze output you're getting and the precise schema you have. OK, it is challenging to present it in plain text, but here is the plan reported by MySQL Query Browser (Fri Nov 09 11:12:46 2007) for:

EXPLAIN SELECT price_date, `close` AS close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date), WEEKOFYEAR(B.price_date));

 id | select_type        | table | type | possible_keys | key     | key_len | ref   | rows  | Extra
----+--------------------+-------+------+---------------+---------+---------+-------+-------+-----------------------------------------------------------
  1 | PRIMARY            | A     | ref  | PRIMARY       | PRIMARY | 4       | const | 17442 | Using where
  2 | DEPENDENT SUBQUERY | B     | ref  | PRIMARY       | PRIMARY | 4       | const | 17442 | Using where; Using index; Using temporary; Using filesort

You might need an index on stock_id,price_date. That is the definition of the primary key. That is why both the main query and the subquery are shown using the primary key. That would be an entirely different ball of wax than trying to pull out a single stock's closing price. I suspect you're going to want to use Postgres's DISTINCT ON SQL extension. Something like: SELECT DISTINCT ON (stock_id,price_date) * FROM stockprices ORDER BY stock_id, price_date DESC And you may want an index on stock_id, price_date DESC That pair, as I mentioned, formed the primary key for the stockprices table. Here is my SQL for subsampling a time series: SELECT price_date,close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date)); This performs better than the outer join algorithm for getting only the last price for a couple dozen stocks. This particular statement crawls to completion in about 4 or 5 minutes, as compared to over ten to get just the last price for a couple dozen stocks. Not too surprisingly, EXPLAIN gives identical results for this query as it did for the simpler SELECT above. Do you care what happens if there were no trades for a given stock in the time period? The query you give above using MAX would still work but the query I described using DISTINCT ON would not emit a record for the stock at all. No. But then I haven't yet analyzed the data to learn what the prices for a given period really mean if there haven't been any trades within the period. I have yet to see a series of prices for which the volume is 0.
That may be an artifact of how my colleagues selected stocks and etfs to use to test our algorithm. I do not yet know if it will be a significant issue for us since I don't see how a stock or etf that can go for a while without any trades at all would be of interest given the kind of information we will be producing for our clients. I am presently focussed on developing ways of looking at the data, to help my colleagues better understand the data and what our algorithm does with it. One of the properties of interest, and relevance to our algorithm, is the common property that the series seem to be self-affine (as described most notably by B. Mandelbrot): hence the need to sample with different degrees of granularity. My colleagues have worked primarily with finance data (esp. commodities), while my background is more focussed on risk management in environmental science. Thanks Ted
Re: [GENERAL] Optimal time series sampling.
--- Tom Lane [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: OK, it is challenging to present it in plain text, but here is the HTML exported by MySQL Query Browser. Why are you asking this list for help with a MySQL performance problem? Because my question isn't really about MySQL, but rather about how best to construct the SQL required to get the job done, regardless of what database is used. I have seen some claims that it is better to use joins instead of correlated subqueries and others that say the opposite. And I do not, at this stage, know if there are other options in SQL that may or may not be better. At this time, the database in use is irrelevant (I want to stick as close to the ANSI standard as practicable so the rewriting required will be minimal should we decide to change the database later, for whatever reason). Thanks, Ted
Re: [GENERAL] Optimal time series sampling.
--- Scott Marlowe [EMAIL PROTECTED] wrote: On Nov 9, 2007 11:47 AM, Ted Byers [EMAIL PROTECTED] wrote: --- Tom Lane [EMAIL PROTECTED] wrote: Ted Byers [EMAIL PROTECTED] writes: [snip] Which is better depends largely on how your database is built. MySQL still uses loops for all subselects, so with large numbers of tuples in the subselect method, it will be slow. But they might fix this in a later release. Fairly recent versions of PostgreSQL could make some bad choices when doing joins for certain datasets that would be much faster with a correlated subquery (specifically the old left join where righttable.field is null trick made some pgsql versions choose an inefficient join method) So, the right way is a question of which db, and even which version of that DB you're on. My life just got so much more complicated. Oh well, I guess that will be useful when providing advice to management when they start seeing performance issues. Thanks ;-) Have you looked at version 5.0.45? I am always seeing the claim that the left join trick is so much faster than the correlated subquery, especially if a function like MAX() is used, but the numbers I am seeing with real stock price data have it running, on average, about three times faster than the left join. So I assumed I was doing something wrong in a manner that would get me the right answer the slowest way possible. At this time, the database in use is irrelevant (I want to stick as close to the ANSI standard as practicable so the rewriting required will be minimal should we decide to change the database later, for whatever reason). If you want to stick with ANSI, MySQL tends to be more divergent from the spec than pgsql and other rdbms. The books I am using, which describe the SQL language, don't seem to mention or illustrate much difference among any of the rdbms (including my references that talk about Oracle and MS SQL Server). The SQL I try from those books seems to work reasonably well in all of them (I can't check against Oracle, though, since I don't have that), and I try most of my SQL against MySQL, Postgres and MS SQL Server (the biggest divergences seem to be in how bulk loading of data happens). Maybe I haven't explored enough of the SQL language, with large enough datasets, to see the differences you mention; or perhaps things are improving with all of them. Most people would consider the correlated subquery the better method. But it's also likely to be the slowest on MySQL. Right now, with this particular query, the correlated subquery is the one that gets me the right answers about 3 times faster than any other method I have tried. But it still takes several minutes to get the results for only a few dozen stocks. And yet I can get several megabytes of data from the following query in about a quarter of the time. SELECT price_date,close_price FROM stockprices A WHERE A.stock_id = 1 AND price_date IN (SELECT MAX(B.price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = 1 GROUP BY YEAR(B.price_date),WEEKOFYEAR(B.price_date)); I had expected this to take many times longer than the simple select that gets only the last price for a given stock, but I was surprised to see it so much faster than the query that gets just the last prices for only a couple dozen stocks. Thanks a lot. I learned a lot from your reply. Ted
[GENERAL] Optimal time series sampling.
As a prelude to where I really want to go, please consider the following SELECT statement. SELECT close_price FROM stockprices A WHERE price_date = (SELECT MAX(price_date) FROM stockprices B WHERE A.stock_id = B.stock_id AND A.stock_id = id); stockprices has a primary key comprised of stock_id and price_date, and I tried the same query with an extra index on price_date (but that index made no difference in apparent performance as seen on the clock on the wall). I have been advised (on the MySQL board), to use the following (with the claim, unsupported as far as I can tell, that it is both correct and much faster - it appears to be correct, but it is certainly no faster): SELECT A.`close` AS close_price FROM stockprices A LEFT JOIN stockprices B ON A.stock_id = B.stock_id AND B.price_date > A.price_date WHERE B.price_date IS NULL AND A.stock_id = id; It appears to do the right thing. I certainly get the right answer, but I am not seeing a significant difference in performance. Worse, when I invoke something like it for a suite of about two dozen stocks, it takes about ten minutes to complete. (I may try a variant in which the last clause used in WHERE is replaced by IN followed by a trivial select that gets the same two dozen stock_ids, to see if that helps.) Now, I am concerned with performance because, and this is where I really want to go, I want to adapt this logic to create new time series of closing prices, but at the granularity of a week, a month or quarter, and there is no predicting a priori how long the series is. IBM's data goes back decades while I have data for other stocks that go back only a couple years. Now, a junior programmer here had suggested just doing a simple select, at least for weekly granularity, by selecting a value if its day of the week computes to Friday. That can't work correctly because in some weeks, there are statutory holidays that land on Fridays, resulting in the last actual trading day for that week being Thursday. His simple approach guarantees that many records that ought to be included will be ignored. I need a more dynamic and flexible approach which allows me to work on the basis that I have prices for all trading days for a given stock from the time my data for it begins. So I need a more complex select statement that will just select the most recent price for a given stock for each week (or month or quarter or year). Now, I can get the full time series for two dozen stocks, as slow and brain dead as doing a select for each stock ID, AND have my Java code construct and display a chart, in less than 20 seconds (and Java does not have a reputation for being fast). I need whatever solution I use to be that quick. Any thoughts about how best to attack this in order to get the correct results as fast as is possible? What options would you consider, WRT defining the SQL statements you would benchmark, in order to design your benchmark testing? Thanks, Ted
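For what it's worth, in PostgreSQL the weekly (or monthly, or quarterly) subsampling described here can be written directly with DISTINCT ON and date_trunc, which naturally picks the last actual trading day of each period, holidays included; a sketch against the stockprices layout described above:

    SELECT DISTINCT ON (stock_id, date_trunc('week', price_date))
           stock_id, price_date, close_price
      FROM stockprices
     ORDER BY stock_id, date_trunc('week', price_date), price_date DESC;
    -- change 'week' to 'month', 'quarter' or 'year' for coarser granularity

The ORDER BY puts the latest date first within each (stock, week) group, and DISTINCT ON keeps exactly that first row, so a Friday holiday simply means the Thursday row wins.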
Re: [GENERAL] young guy wanting (Postgres DBA) ammo
--- Greg Smith [EMAIL PROTECTED] wrote: On Fri, 2 Nov 2007, Kevin Hunter wrote: I don't have ammo to defend (or agree?) with my friend when he says that Postgres requires a DBA and MySQL doesn't so that's why they choose the latter. [snip] To step back for a second, the software industry as a whole is going through this phase right now where programmers are more empowered than ever to run complicated database-driven designs without actually having to be DBAs. It used to be that you needed a DBA for every job like this because they were the only people who knew how to setup the database tables at all, and once they were involved they also (if they were any good) did higher-level design planning, with scalabilty in mind, and worried about data integrity issues. Software frameworks like Ruby on Rails and Hibernate have made it simple for programmers to churn out code that operates on databases without having the slightest idea what is going on under the hood. From a programmer's perspective, the better database is the one that requires the least work to get running. This leads to projects where a system that worked fine in development crashes and burns once it reaches a non-trivial workload, because if you don't design databases with an eye towards scalability and integrity you don't magically get either. As one of these programmers, where is the best place to find the information I need to get it right? Finding information and finding good information are not the same thing, and I am wary of 99% of what I find using Google. Since you know what a DBA needs to know, I ask you where I can learn what you believe a good DBA needs to know. Or am I OK just relying on the documentation that comes with a given RDBMS (Postgres, MySQL, MS SQL, etc.)? Ted
Re: [GENERAL] Design Question (Time Series Data)
--- Andreas Strasser [EMAIL PROTECTED] wrote: Hello, I'm currently designing an application that will retrieve economic data (mainly time series) from different sources and distribute it to clients. It is supposed to manage around 20,000 different series with differing numbers of observations (some have only a few dozen observations, others several thousand) and I'm now faced with the decision where and how to store the data. If you really have such a disparity among your series, then it is a mistake to blend them into a single table. You really need to spend more time analyzing what the data means. If one data set is comprised of the daily close price of a suite of stocks or mutual funds, then it makes sense to include all such series in a given table, but if some of the series are daily close prices and others are monthly averages, then it is a mistake to combine them in a single table, and two or more would be warranted. Or if the data are from different data feed vendors, then you have to think very carefully whether or not the data can logically be combined. So far, I've come up with 3 possible solutions:
1) Storing the observations in one big table with fields for the series, position within the series and the value (float)
2) Storing the observations in an array (either in the same table as the series or in an extra data-table)
3) Storing the observations in CSV-files on the hard disk and only putting a reference to it in the database
I don't much like any of the above. When I have had to process data for financial consultants, I applied a few simple filters to ensure the data is clean (e.g. tests to ensure data hasn't been corrupted during transmission, proper handling of missing data, etc.), and then bulk loaded the data into a suite of tables designed specifically to match the vendor's definitions of what the data means. Only then did we apply specific analyses designed in consultation with the financial consultant's specialists; folk best qualified to help us understand how best to understand the data and especially how it can be combined in a meaningful way. If the data are stored in a suite of well defined tables, subsequent analyses are much more easily designed, implemented and executed. I do not know if PostgreSQL, or any other RDBMS, includes the ability to call on software such as R to do specific statistical analysis, but if I had to do some time series analysis, I would do it in a client application that retrieves the appropriate data from the database and either does the analysis in custom code I have written (usually in C++, as some of my favourite analyses have not made it into commonly available open source or commercial statistical software) or invokes the appropriate functions from statistical software I have at my disposal. The strategy I describe above makes the SQL required for much of this dirt simple. HTH Ted
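A sketch of what "a suite of well defined tables" might look like for the two kinds of series mentioned above (daily closes versus monthly averages); all names and types here are illustrative only:

    CREATE TABLE daily_close (
        series_id integer NOT NULL,
        obs_date  date    NOT NULL,
        obs_value numeric NOT NULL,
        PRIMARY KEY (series_id, obs_date)
    );

    CREATE TABLE monthly_average (
        series_id integer NOT NULL,
        obs_month date    NOT NULL,  -- by convention, the first day of the month
        obs_value numeric NOT NULL,
        PRIMARY KEY (series_id, obs_month)
    );

Keeping the two kinds of observation in separate tables, as the reply advocates, means each table's key and semantics stay unambiguous, and a query never has to guess what an observation means.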
Re: [GENERAL] Design Question (Time Series Data)
--- Michael Glaesemann [EMAIL PROTECTED] wrote: On Oct 4, 2007, at 9:30 , Ted Byers wrote: I do not know if PostgreSQL, or any other RDBMS, includes the ability to call on software such as R See PL/R: http://www.joeconway.com/plr/ Thanks. Good to know. Ted
Re: [GENERAL] How to avoid: 9.50184e+06
--- Stefan Schwarzer [EMAIL PROTECTED] wrote: Hi there, how can I avoid results like this: 9.50184e+06 Instead it should return the real value, as 950184. But 9.50184e+06 IS the real value! That is about nine and a half million, not nine hundred and fifty thousand, BTW. I do not see why you want the database back end to divide the real number by ten and then display the result as an integer. I have not checked the behaviour of the functions provided by PostgreSQL to convert numbers to strings, but I would be surprised if a function suitable for use in serializing a floating point number failed to show 9.50184e+06 as 9.50184e+06; to have it automagically convert it to an integer would be counter intuitive to me. Really, how a number is displayed is properly in the domain of the client application. If it is written in C, then you have functions like printf, with all their glorious format specifiers, to give you exactly what you want and expect. And you have similar control with IO streams in C++ and Java. ALL real programming languages provide support for producing formatted output, and give you absolute control over the format used. Whether you are using a thin client or a thick client, manipulating how floating point numbers are displayed really belongs in the interface layer of the client. If you want your numbers displayed as integers, then you used the wrong type (as others have also suggested). If the data really requires use of floating point numbers, then use the libraries provided by whatever language you're using to develop your client to produce the format you want. HTH Ted
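If the formatting does have to happen in the database after all, to_char gives explicit control over the rendering; a small illustration (the format mask is just an example):

    SELECT to_char(9.50184e+06, 'FM9999999990');
    -- yields '9501840'; the FM prefix suppresses leading padding

This confirms the point in the reply: the stored value really is nine and a half million, and only its textual presentation changes.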
Re: [GENERAL] Audit-trail engine inner-workings
--- Marcelo de Moraes Serpa [EMAIL PROTECTED] wrote: Hello list, [snip] * Each http request gets an instance of a session_id ? Or is it per-connection ? It depends. There is no necessary correlation between a session as defined within a web application and a session as defined in the RDBMS. I routinely set up web applications that may have multiple sessions as seen by the RDBMS. Consider the idiom of doing operations with the least privilege required to get the job done. The application might log into the database as one database user with very limited rights, to authenticate the user and pass data to the web application regarding what the user is permitted to do (all server side, on the application server). Then, the application may log in as a very different user with limited rights to perform some operation the user has initiated. So far, we have two sessions as far as the database is concerned and only one as far as the web application is concerned. If you're working with web applications, you must know that multiple http requests can share a web application session, as can multiple web applications (if written to do so using the single sign-on idiom), assuming you use technologies such as cookies or URL rewriting or hidden form fields, or the magic of ASP.NET, to set up and maintain sessions, and that the session is restricted to a single http request if you don't (plain old http/html is stateless, so there is no useful concept of session without help from other technologies). HTH Ted
Re: [GENERAL] [RESEND] Transaction auto-abort causes grief with Spring Framework
Sorry for top posting - but this is an annoying aspect of this web interface to email. :-( Isn't what you're doing here a misuse of the idea of a transaction? I don't claim to be an expert in this, but I thought the idea of a transaction was that you bundle a group of statements together that must all succeed or all fail together; so if any one of them fails, all the preceding work completed in the transaction gets rolled back. Did I misunderstand something here, then? If not, then the operations you describe for your first transaction really ought not be in the same transaction. Now, when I run a script of SQL statements in MS SQL Server, a statement that encounters an error will simply stop, and then control can be passed to the next SQL statement in the script, BUT THIS IS NOT IN A TRANSACTION!!! It is only a script, and in the context in which I was working, I didn't need to use transactions. And, I can provide additional options in the statements that will modify this behaviour, so the script aborts at the error condition rather than continuing with the next SQL statement in the script. With some statements, what I would regard as an error requiring the script to abort seems to be regarded as merely a warning by default. For example, in a bulk load of data into a table, if there is a problem with the data for the tenth field on the second row, the first row, which is correct, is stored, the statement aborts at the problem row, and control is passed to the next SQL statement. In my situations, I had other programs that would clean up the data if this sort of problem arises, so where the problem is seen by default as warranting only a warning, I could upgrade it to be regarded as an error. And of course, there are options for controlling how the script behaves when an error occurs. Perhaps that is involved in the behaviour you are reporting for MS SQL Server?? I haven't investigated this myself as I haven't had the problem you describe. I didn't quite understand your description, in another post, of how Spring treats your database statements. Am I to understand it puts all your SQL statements into a single transaction? If so, either they badly mishandle transactions or they are working with a very different concept of what a transaction is. One last question. You describe part of your problem as being correct addition of data to an audit table. If I haven't misunderstood what you're doing, isn't it incomplete if you record only statement failures? When I deal with audits, I put the logic into triggers whenever possible. And I'd maintain an independent error log from my application code, probably in a catch clause, and this either goes to a flat file or uses an independent connection to the database. And my practice is to use separate try/catch blocks for each statement that can throw an exception, to improve the granularity of error handling logic. That is the only way to have a chance of getting one statement to execute regardless of whether or not a preceding statement throws an exception. I have a special interest in this because I am just beginning to look at Spring (I downloaded it just a few days ago). Cheers, Ted Tyson Lloyd Thwaites [EMAIL PROTECTED] wrote: Our app uses system state. We scan filesystems and record file information in a database.
Here is one example:
- insert 250 files
- update some stats (MUST work even if insert fails)
- update agent last-contact time (also must work so we know it's not awol)
When the last insert has finished, the server will kick into summary mode:
- wait for asynchronous inserts to finish
- summarise data
- do some misc operations
- switch back to 'Idle' state
That last step is where the trick is: if anything goes wrong with the previous operations, we MUST be able to put the system back into an 'Idle' state. Currently we do this in a catch block, i.e.: try { } catch { } Now of course that won't work in pg. We use the same logic all through the system, so you can see my problem. For example, if the user deletes a drive record that still has referential links to it, we do this: try { } catch (ReferentialIntegrityException e) { } We rely on the fact that we can still do things and commit a transaction even if a single statement has failed. The above drive delete case is where I first noticed the problem when switching from MSSQL: instead of the server handling the exception and doing something else instead, the 'something else' also threw an exception (txn is aborted), which propagated to the client end. UPDATE: Webb has proposed that this behaviour may be MSSQL misbehaving. -- Tyson Lloyd Thwaites Lead Software Developer Allianza Services Pty Ltd M: 0404 456 858 P: (+618) 8232 5525 F: (+618) 8232 8252 W: www.allianza.com.au
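PostgreSQL's mechanism for carrying on past a failed statement inside a transaction is the savepoint. A minimal sketch of the drive-delete case, with hypothetical table and column names:

    BEGIN;
    SAVEPOINT try_delete;
    DELETE FROM drives WHERE drive_id = 42;   -- may raise a foreign key violation
    -- if, and only if, the DELETE fails, the client issues:
    ROLLBACK TO SAVEPOINT try_delete;
    -- the transaction is still live, so the must-happen work proceeds:
    UPDATE agents SET last_contact = now() WHERE agent_id = 7;
    COMMIT;

Most drivers expose this directly (JDBC's Connection.setSavepoint and rollback(Savepoint), for instance), so the try/catch structure above can stay as it is, with the catch block rolling back to the savepoint instead of relying on the server to forgive the error.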
Re: [GENERAL] How best to represent relationships in a database generically?
Yes, I'd love to have more computing power at my disposal, but that doesn't stop me from doing useful stuff now! Modern ecological models are, at present, much more limited by the availability of good data, both for parameterizing models and for validating models, than they are by available computing resources. HTH Ted Lincoln Yeoh [EMAIL PROTECTED] wrote: Hi, Sorry, this really isn't postgresql specific, but I figure there are lots of smarter people around here. Say I have lots of different objects (thousands or even millions?). Example: cow, grass, tiger, goat, fish, penguin. BUT I'm not so interested in defining things by linking them to categories or giving them names, I'm trying to figure out a way to define things by their relationships with other things, and more importantly do searches and other processing by those relationships. So, what would be the best way to store them so that a search for the relationship like grass is to cow, will also turn up cow is to tiger, and goat is to tiger, and fish is to penguin (and penguin is to bigger fish ;) ), and electricity is to computer. And a search for cow is to goat, could turn up tiger is to lion, and goat is to cow. Is the only way to store all the links explicitly? e.g. have a huge link table storing stuff like obj = cow, subj = grass, type = consumes, probability=90% (= means points/links to). Or even just have one table (links are objects too). Or is it possible to somehow put the objects in a multidimensional space (1000 dimensions?) and keep trying to arrange the objects so that their relationships/vectors with/from each other are fairly consistent/reasonable based on current knowledge? Trouble is in some cases the grass eventually eats the cow, so maybe that doesn't work at all ;). Or even do both? Maybe use the first as a cache, and the second for deeper stuff (flash of insight or got the punchline = figure out better arrangement/joining of disparate maps). My worry about the first approach is that the number of links might go up very much faster as you add more objects. But perhaps this won't be true in practice. The worry about the second approach is that it might get stuck, or run out of dimensions. Is there a better way to do this? There must be right? Wait for huge quantum computers and use qubits for each multidimensional coordinate? ;). Regards, Link.
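For the explicit-links option, the relational version is straightforward; a sketch with illustrative names only:

    CREATE TABLE things (
        thing_id serial PRIMARY KEY,
        name     text NOT NULL UNIQUE
    );

    CREATE TABLE links (
        subj_id     integer NOT NULL REFERENCES things,
        obj_id      integer NOT NULL REFERENCES things,
        link_type   text    NOT NULL,   -- e.g. 'consumes'
        probability numeric,            -- e.g. 0.90
        PRIMARY KEY (subj_id, obj_id, link_type)
    );

    -- "grass is to cow" (cow consumes grass, with 90% confidence):
    INSERT INTO links (subj_id, obj_id, link_type, probability)
    SELECT c.thing_id, g.thing_id, 'consumes', 0.90
      FROM things c, things g
     WHERE c.name = 'cow' AND g.name = 'grass';

Analogy searches ("grass is to cow" turning up "cow is to tiger") then become self-joins of links on link_type, which is cheap as long as link_type is part of the key or indexed.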
Re: [GENERAL] Will partial index creation use existing index?
I don't know about partial indexing, but in a similar situation to what you describe, I have resorted to special tables designed specifically to hold one day's worth of data and to support our daily processing. I'd put the daily data into specific tables, with all the requisite indexes, and then, as part of the daily processing and after it has been processed, copy the day's data into the main tables and empty the tables used to hold the data temporarily. It may not be elegant or pretty, but it serves to greatly simplify processing since you know a priori that these tables contain only the data you need to process, and you can be certain that eventually it finds its way into the main data tables. And you have your indexes only on the tables used for daily processing, rather than on your main tables. An additional concern I had was that if the process for getting the data is vulnerable to error (and in my case it was), you can apply your verification procedures to it before bad data gets into your main tables, thereby reducing the risk of bad data getting into the database. I thought this might be an option for you if you have trouble getting your partial indexes to work well for you. HTH Ted Steve Crawford [EMAIL PROTECTED] wrote: Does PostgreSQL use an existing index, if possible, when creating a partial index? By way of background, we have some nightly bulk processing that includes a couple of 15-30 million row tables. Most of the processing is only looking at prior-day data (up to ~200,000 rows) and for efficiency requires several indexes. Except for this one process, the indexes are useless and I'd rather not constantly maintain them.
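A sketch of that staging-table workflow; main_data and record_key are placeholders for the real table and key:

    -- a staging table with the same columns as the main table
    CREATE TABLE daily_staging (LIKE main_data INCLUDING DEFAULTS);
    CREATE INDEX daily_staging_key ON daily_staging (record_key);

    -- bulk load, verify, and process the day's data against daily_staging, then:
    INSERT INTO main_data SELECT * FROM daily_staging;
    TRUNCATE daily_staging;

The indexes needed only for the nightly processing live on the small staging table, so the 15-30 million row main tables never carry that maintenance cost.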
Re: [GENERAL] COPY to/from non-local file
I just checked the manual and it says nothing about being able to use SQL COPY to access network files. But you have an option that almost works for you. I am sure others here will have some suggestions, but if I were in your place, I'd gather more information about where the bottlenecks exist in a solution that works for me, albeit slowly. To check your performance problem, you may want to add benchmarking code to your Ruby program to see where it is spending its time: on getting the data from PostgreSQL or writing it to a file. That will tell you where your efforts are best spent. I don't work with Ruby, but if I was using C++ or Java or Perl, I'd use the same strategy: get the data en masse, and then write it out as efficiently as possible. In C++, for example, there is an iostream-based idiom that takes one line of code in the application program, and that line is as fast as you can make your I/O code without delving deeply into I/O processing and developing your own iostream classes. In both C++ and Java, you have stream classes, and significant control over how the data is written: one character at a time (if you want the process to take forever ;-) or in blocks of whatever size you want. But this involves being willing to develop your own stream classes to implement your preferred buffering strategy. In C++, you can save a lot of development time by having template policy classes that control how best to optimize disk I/O. In Perl, you can read a file en masse and then iterate through it a line at a time, but for this my preference at present is to use C++. Since I don't know Ruby, I don't know how well it supports these, and related, I/O programming idioms. If its support is poor, it may pay to use a more powerful and flexible language. If it has outstanding power and flexibility for supporting I/O optimization programming idioms, let me know and I'll invest the time to add Ruby to my list of languages in my development toolbox. :-) I would assume that what I have learned using the languages I know can be applied in some fashion to programming in Ruby. HTH Ted Jaime Silvela [EMAIL PROTECTED] wrote: I've been looking around for this functionality: Is it possible to use COPY with a file that doesn't reside in the DB's filesystem? I know there is sort of a solution in using COPY from stdin/ COPY to stdout, however that depends on calling the psql command, which breaks the flow of control of the programs I'm trying to write. Here's the story: I have a Ruby on Rails application which sits on server A, and connects to a Postgres Database running on B. Using the regular flow, the rails code writes into various tables, and then generates a table that needs to be exported into a file which will be used as input into a separate program, a simulator, and kept in the filesystem of the Rails server. Using the COPY command would entail either of a) Using COPY to generate the file on server B, then transfer to A - but how? ftp? I want to reduce this type of coupling b) Using COPY TO STDOUT from psql, called in the Rails code with a backtick, then gathering the output and filing it. - but this solution depends on having psql in the path of the Rails server, reintroducing the server credentials, and from a programming point of view is kind of ugly. c) The Postgres driver for Rails tries to give an interface to the COPY command using raw_connection, getline and endcopy, but it's quite brittle, so I'm avoiding it altogether.
At the moment I'm avoiding those solutions, and instead get the table into Rails space with a CURSOR on a SELECT, then simply write the file in Rails, one line at a time. However, the tables I'm writing are pretty big, and the performance is so much worse than with COPY... Any suggestions? Thanks Jaime
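For completeness: psql's client-side \copy writes to the client's filesystem rather than the server's, which addresses exactly this situation; a sketch (the table name is a placeholder, and \copy with a parenthesized query needs PostgreSQL 8.2 or later):

    \copy (SELECT * FROM simulation_input ORDER BY id) TO 'input.dat'

The same COPY ... TO STDOUT stream that \copy uses is exposed by most drivers, so in principle a program can consume it directly without shelling out to psql; whether the Ruby driver of the day does that robustly is, as Jaime notes, another question.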
Re: [GENERAL] Temporal Units
Rich, I would think that as an ecologist, you would have a better sense than most here of the kinds of things I'd be doing. After all, I am a mathematical ecologist by training and the majority of applications I have developed have been either for agricultural consultants or environmental consultants. As a very simplistic example of where I find the kind of auditing I mentioned priceless, consider a decision support tool for which the input data involves concentrations of various contaminants in water leaving some facility (at this point we don't care what the facility is). While I would prefer that the input come from a laboratory information system, and fully automated, imagine that the data is recorded and input by some lab tech as each test is completed. On April 1, 2006, there is a typo leading the decision maker to believe that the concentration of contaminant X is 1 mg/l, when this is an error and the true value is 1 g/l. The decision maker may decide to do nothing because 1 mg/l is considered safe. But doing nothing, some bad environmental effect occurs, and it is discovered days after the decision was made that the data entered is in error. Who is to blame? Where does responsibility lie and what corrective action is needed to ensure the problem does not recur? When there is an issue of accountability and responsibility involved in environmental issues, every scientific measurement taken becomes a candidate for the kind of multitemporal data I routinely use. Now, for lookup tables vs functions, there is always a tradeoff even within a given programming language (I am most familiar with FORTRAN, C++ and Java). The tradeoff is between memory use (including how that may affect performance if the table is large relative to available memory) and the speed or size of the function. In the case of the date and time functions I typically use in my Perl scripts, they are so short, I doubt the performance impact is significant or easily measurable. On a side note, how do you like Ruby and Python? How would you compare them to the other options (such as C++, Java, Perl, etc.)? Cheers, Ted Rich Shepard [EMAIL PROTECTED] wrote: On Mon, 30 Apr 2007, Ted Byers wrote: I am not sure I see why it would be good to do this using SQL, but I do know that I have used a number of Perl packages for this sort of thing. I am not arguing with you. I just want to know in what circumstances my schemas can be improved by a calendar table, and how it provides a benefit over my more usual Perl functions. Ted, Having never used such a table -- or having written an application that had such a heavy use of temporal data rather than scientific data -- I have no idea in what circumstances your schemas might be improved with a calendar table. I suspect, however, that a SQL table lookup may well be quicker than running a script (or compiled function) in another language, and the table is available for use in multiple apps. Isn't it faster or more efficient to run SELECT queries with table lookups rather than use stored procedures? For this web-based application, the UI and communications between client and server are being written in Ruby (with Rails) while the report generation is written in Python using ReportLab. If most of the queries can be done with SQL, I think it will be much easier to maintain, modify, and expand. Could be wrong, of course. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc.
| Accelerator(TM) Voice: 503-667-4517 Fax: 503-667-8863
Re: [GENERAL] Temporal Units
I am not sure I see why it would be good to do this using SQL, but I do know that I have used a number of Perl packages for this sort of thing. When I have done this in the past, I'd do the date and time calculations in Perl and feed the result to whatever RDBMS I happen to be using (PostgreSQL, MySQL, MS SQL Server 2005). I suppose that if it is necessary to do it within an SQL script, resort could be made to functions that in turn use the Perl packages. But a question: Why would any schema that includes temporal components need a calendar table? I use temporal components all the time and have yet to need a calendar table. In fact, some of my database applications are multitemporal, keeping track of edits to data that correct or update data, so that during an audit script, one can determine what a decision maker knew at the time he made a decision. This is so that a decision that was bad, but based on good data, can be distinguished from a decision that had been based on bad data, but which would have been a good decision had the data been correct. The first option warrants correction of the decision maker while the latter warrants examination of the data entry process. I have found my Perl scripts adequate for those few instances where use of my temporal data depended on a calendar. I am not arguing with you. I just want to know in what circumstances my schemas can be improved by a calendar table, and how it provides a benefit over my more usual Perl functions. Cheers, Ted Rich Shepard [EMAIL PROTECTED] wrote: On Mon, 30 Apr 2007, Brent Wood wrote: If I'm following this correctly, then interval extract timepart can be used to provide all the required functionality: Thanks, Brent. Your suggestions complete the approach I was considering. There is no need for real-time response, so checking after each shift or day -- or other time period -- will be sufficient. I wonder if a workweek/holiday calendar table for PostgreSQL already exists. If not I need to track down the procedure for creating one as Joe Celko references such a calendar in his books. I think that any schema that has temporal components needs such a table. Rich -- Richard B. Shepard, Ph.D. | The Environmental Permitting Applied Ecosystem Services, Inc. | Accelerator(TM) Voice: 503-667-4517 Fax: 503-667-8863
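Building such a calendar table in PostgreSQL takes one statement with generate_series; a minimal sketch (the date range and the holiday column are illustrative, with holidays to be flagged afterward from a local list):

    CREATE TABLE calendar AS
    SELECT d::date              AS cal_date,
           EXTRACT(dow  FROM d) AS day_of_week,  -- 0 = Sunday ... 6 = Saturday
           EXTRACT(week FROM d) AS iso_week,
           false                AS is_holiday    -- to be updated from a holiday list
      FROM generate_series('2000-01-01'::timestamp,
                           '2020-12-31'::timestamp,
                           '1 day') AS g(d);

Once the holiday flags are in place, "last business day of the week" style questions become plain SELECTs against this table, which is the benefit Rich is after.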
Re: [GENERAL] Design / Implementation problem
procedures. Note, this should allow your customers to get their information almost instantly since the amount of data that would be processed for them would be very small. If they want to know how many points they have, and even how many points they have that will expire in the next day, they're looking at dirt simple queries with at most one subtraction. Note, I did not spend the time to refine this to minimize the total computational load or the data storage requirements. That could take days, or even weeks, depending on your attention to detail and how concerned you are about efficiency. I am sure there are some computational efficiencies that can be gained, with additional analysis, perhaps with some tradeoffs regarding the detail of the data stored and available for audit purposes, but I'll leave that as an exercise for you. :-) Note, I do not see, from what you'd written, how your proposed solution would ensure that customers used their oldest points first. Maybe it does, and you didn't describe that aspect well, but that is something you'll have to be careful about if you want to avoid upset customers. Unit testing, followed by integration tests, are your friends! HTH Ted
Re: [Bulk] Re: [GENERAL] quoted identifier behaviour
is broken and preserve what is not broken. HTH Ted
Re: [GENERAL] Setting week starting day
It is not hard to calculate, as you can see... but it would be nice if date_trunc('week', date) could do that directly. Even if it became date_trunc('week', date, 4) or date_trunc('week', date, 'Wednesday') it would be nice... :-) And that is what I was trying to ask ;-) Use date_trunc('week', current_day + 1) and date_trunc('dow', current_day + 1) to have a one day offset from the standard first day of the week. I believe there's more than that... Probably the +1 should be outside the date_trunc, anyway. It might help, but I still see the need to do calculations... Especially if it was Tuesday today... Out of curiosity, why does the database need to know this, or to be able to calculate it? There are lots of things that would be useful to me, if the RDBMS I'm using at the time supported them (particularly certain statistical functions - ANOVA, MANOVA, nonlinear least squares regression, time series analysis, etc.), but given that I can readily obtain these from other software I use, and can if necessary put the requisite code in a middleware component, I would rather have the PostgreSQL developers focus on issues central to having a good DB, such as ANSI standard compliance for SQL, or robust pooling, etc., and just leave me a mechanism for calling functions that are external to the database for the extra stuff I need. I would prefer a suite of applications that each does one thing well than a single application that does a mediocre job on everything it allegedly supports. What would be 'nice' and what is practical are often very different things. I know what you're after is simple, but remember the good folk responsible for PostgreSQL have only finite time available to work on it, and thus, when they're making choices about priorities, I'd rather they ignore even simple ancillary stuff and focus on what really matters. I just recently finished a project in which the data processing needed information similar to what you're after, but instead of doing it in the database, we opted to do it in the Perl script I wrote that fed data to the database. In fact, it wasn't so much the day of the week that mattered to the processing algorithm but the resulting dates for the immediately preceding business day and the immediately following business day. It was those dates we fed to the database rather than the weekday. There are several Perl packages (see CPAN) supporting this kind of calculation. These are generally outstanding (and would probably be useful if you want to create your own stored function implemented in Perl), but you may have to customize them by providing additional configuration information such as timezone and statutory and religious holidays if you need to determine business days in addition to just the day of the week. The day of the week can be obtained in Perl with a single function call! I just took a quick break to read about the date functions available within PostgreSQL, and while apparently nice, you have much greater flexibility, and many more functions, in these Perl packages I mentioned. If you just want a function call, I'd suggest you create a function that just dispatches a call to the Perl function that best meets your needs. In a sense, you are not really rolling your own. You're just dispatching the call to a function in a Perl package. Cheers Ted
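Since the exchange keeps circling back to the arithmetic, here is the shift spelled out; a small sketch for weeks that start on Sunday rather than date_trunc's standard Monday:

    SELECT date_trunc('week', current_date + 1)::date - 1 AS sunday_week_start;
    -- add 1 before truncating, subtract 1 after, so a Sunday lands
    -- at the start of its own week instead of the end of the previous one

Other starting days follow the same pattern with a different offset; for a Wednesday start the offsets would be - 2 and + 2 respectively.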
Re: [GENERAL] Setting week starting day
- Original Message - From: Omar Eljumaily [EMAIL PROTECTED] To: Ted Byers [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Friday, March 09, 2007 5:00 PM Subject: Re: [GENERAL] Setting week starting day Ted, my reason for asking the question that I believe precipitated this thread was that I wanted a single sql statement that aggregated time data by week. Yes, I could do the aggregation subsequently in my own client side code, but it's easier and less error prone to have it done by the server. I don't buy the suggestion that server side code is less error prone than client side code, but be that as it may, we're talking about a function that has one line of code. And given what you just said, you don't want the day of the week, you want a function that returns the week of the year. This can be had from the same Perl functions I mentioned before, with a minor alteration in how you call it. My suggestion would be to create that one line function that invokes the relevant Perl function, which can then be invoked in your select statement (presumably with a GROUP BY clause to avoid mixing data from different years). It should take about ten to fifteen minutes to write and test. Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
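For the aggregate-by-week goal Omar describes, a minimal pure-SQL sketch (the timesheet table and its columns are hypothetical names) that keeps the whole aggregation in one statement on the server:

    -- Total hours per calendar week; date_trunc('week', ...) starts weeks on Monday.
    SELECT date_trunc('week', worked_on)::date AS week_start,
           sum(hours) AS total_hours
    FROM timesheet
    GROUP BY date_trunc('week', worked_on)::date
    ORDER BY week_start;

Grouping on the truncated date rather than on extract(week ...) also avoids mixing data from different years, the caveat Ted raises.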
Data validation - was Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: omar [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Friday, March 09, 2007 10:40 PM Subject: Re: [GENERAL] OT: Canadian Tax Database Tom, I promise this isn't a political statement, even though it's on the same thread. I'm curious what people think about the following statement considering the database typing talk being brought up here. My experience is that more times than not I have to put data validation in my client code even when it's available on the server, if for no other reason than that users don't understand what foreign key violation, etc messages mean. It begs the question of whether it's really necessary on the server or not. SQLite seems to take the position that it isn't since there is no referential integrity and the following. To be honest, there's a lot of power in the ability to view everything as a string, with of course proper data validation. The risk of this is far too high. Treating everything as a string is, IMHO, a very bad idea. There are, especially for a web application, numerous forms of attack, so I routinely provide code for client side validation, server side validation (in a web app or in filters that process the data before providing it to whatever is going to do something useful with the data). This includes designing stored procedures to receive, and validate, data before the data is stored in the database. On the client side, the main benefit is to ensure the user doesn't miss anything that is necessary and that he enters valid data. If the user is malicious, and wants to try a SQL injection attack, nothing you do on the client side can prevent him from creating his own version of your page bypassing all of your client side validation code. And it is possible for a scoundrel to try a man in the middle attack (intercepting a transaction mid stream and trying, e.g., a SQL injection attack). So even with client side validation, server side validation is absolutely essential. I like Perl for that, but it can be done in your favourite programming language. And it can be done in .NET also, if you prefer. Maybe I am paranoid, but whether I am writing code to be run at the very back end, or the very front end, or anywhere between the two, my preference is to validate the data that specific object has received before I do anything with it. That is key in secure application development. You generally assume that your system, and any component therein, can be compromised, so you program on the assumption that it has been compromised somewhere and write code that minimizes or eliminates the damage that can be done if some component anywhere else in the system has been compromised. Just 'coz I'm paranoid doesn't mean they're not out to get me. ;-) I value really good system administrators who go the extra mile to make intranets and systems as secure as humanly possible, but as an application developer, I never assume they have not overlooked something. Instead, I assume the opposite and that therefore, they got everything wrong and that the intranet and every server in it either has been compromised or will soon be compromised, and I therefore try to minimize the risk of damage or violation of data confidentiality or security in a network or on a system that has been compromised. I know perfection is not possible, but I hope we can make it too expensive for a cyber criminal to get what he wants illegally. If we make his cost greater than his potential return, he should rationally move on to easier targets. 
Cheers Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
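As a concrete illustration of the server-side validation argued for above, a minimal sketch with hypothetical table, column, and function names: a plpgsql function that rejects malformed input before it ever reaches the table, so even a client that bypasses front-end checks cannot store junk.

    -- Hypothetical example: validate a social insurance number and birth date server-side.
    CREATE OR REPLACE FUNCTION add_taxpayer(p_sin text, p_birth date)
    RETURNS void AS $$
    BEGIN
      IF p_sin !~ '^[0-9]{9}$' THEN          -- a SIN is exactly nine digits
        RAISE EXCEPTION 'invalid SIN: %', p_sin;
      END IF;
      IF p_birth > current_date THEN
        RAISE EXCEPTION 'birth date % is in the future', p_birth;
      END IF;
      INSERT INTO taxpayer (sin, birth_date) VALUES (p_sin, p_birth);
    END;
    $$ LANGUAGE plpgsql;

A CHECK constraint on the table itself would catch the same errors even for inserts that bypass the function.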
Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: Joshua D. Drake [EMAIL PROTECTED] To: Richard Huxton dev@archonet.com Cc: pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 8:00 AM Subject: Re: [GENERAL] OT: Canadian Tax Database Richard Huxton wrote: http://www.thestar.com/News/article/189175 For instance, in some cases the field for the social insurance number was instead filled in with a birth date. Unbelievable. Sixty years of electronic computing, fifty years' use in business and the professionals who built the tax system for a wealthy democratic country didn't use data types. This is Unbelievable? This is commonplace. And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. This time CRA is embarrassed, but they don't care because the people that suffer are the taxpayers who ultimately paid for such shoddy work in the first place. There are no consequences for the bureaucratic peons really responsible for it. They probably even get paid obscene sums in overtime for the time they spend fixing the problem. More annoying, for me, are the scurrilous scoundrels that pass themselves off as competent software consultants who take advantage of such incompetence in their clients' staff. I couldn't begin to document all the cases I have seen where either the wrong software was used (imagine a spreadsheet being used as an RDBMS) or the right software was grossly abused (imagine forcing a data entry clerk to enter the same data four times because the developer was too damned lazy or incompetent to develop a simple form to collect the data once and then submit it to the four externally owned databases that needed to be queried using it, and then having to manually collate the results returned from the queries). And then businesses operated by capable folk get burned by such incompetent and unethical scoundrels and swear off custom software because they'd rather have a COTS product that gives an 80% fit than try for a 100% fit with a custom product that in the end doesn't work at all. I have been told by some of these folk that they have found it virtually impossible to find capable software developers. This is because these scoundrels I mention outnumber capable developers by several orders of magnitude (and the current state of the curricula at colleges 'training' programmers doesn't help). It is so easy to get cynical, and very discouraged, when I think about this. :-( Maybe I should have myself lobotomized and become one of the mindless grunts at Canada Post. Cheers Ted ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] OT: Canadian Tax Database
And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. What you describe is a hundred times better than the reality... most of them actually get _expensive_ junk with some kick-back ;-) I concede. You're right. I recall being told by one project manager I knew years ago who had an opportunity to create a bid for an RFP issued by Transport Canada (long long ago). He refused, so his employer prepared the bid. He refused because the RFP was a joke. There were absolutely no functional requirements, nor non-functional requirements, identified in the RFP. His company didn't get the contract, but being a bidder they did see the winning bid. It was just as ludicrous! It, too, failed to identify any requirements, and so it did not actually promise to deliver anything, working or not! They would have satisfied the terms of their contract if, after a few years, and hundreds of man-years, they walked away without delivering anything. That tragedy cost Canada hundreds of millions, if not billions, of dollars (I don't know if any final accounting was ever done - that would be opposed by the civil servants responsible lest they should be criticised for their incompetence), and ultimately nothing was delivered because the next elected government cancelled the project, refusing to throw more money into their opposition's money pit - they prefer, of course, to throw it into money pits created by their political supporters. The decisions to create the project, and to cancel it, were political, but the incompetence really responsible for it was lower down within the ranks of the civil service. The project could have delivered something good had the civil servants involved been competent! Similar nonsense happened with the firearms registry. For most of the early history of it, the software systems used were so bad, and inappropriate, that the clerks that had to use it could have been ten times more productive if they had the use of properly designed and implemented software. I cannot understand how it became so outrageously expensive when the real needs for it were so simple (relative to products I have worked on). I'll bet real, genuinely capable, software engineers could have delivered a gold and platinum plated product for just a few million dollars; nothing really relative to what it ended up costing us. I know contractors that refuse to do business with the government because of this sort of nonsense, and I know, from discussions with ex-civil servants, that such incompetence is the norm in government. I know engineers who have been burned by government by investing a fortune in new products or services, and then educating relevant civil servants WRT the new science or technology involved, only to find these same civil servants give contracts to provide the new product or service to incompetent bozos who didn't know the first thing about it. They just happened to be cheaper. Why waste time and money developing a product or service that is really relevant only to government when the risk of such unethical conduct by government is so high? 
I don't suppose anyone out there can describe a project or three where things were done right, to provide a cure for the depressing and discouraging nature of what this thread has turned out to be? Cheers Ted ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] OT: Canadian Tax Database
- Original Message - From: Alan Hodgson [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 11:32 AM Subject: Re: [GENERAL] OT: Canadian Tax Database On Thursday 08 March 2007 08:15, Ted Byers [EMAIL PROTECTED] wrote: They would have satisfied the terms of their contract if, after a few years, and hundreds of man-years, they walked away without delivering anything. That tragedy cost Canada hundreds of millions, if not billions, of dollars It didn't happen to be a gun owners' registry, perhaps? (fellow Canadians will understand :) No. This predated that fiasco by more than ten years. In fact, had it been done right, it would have been a much much larger project than the registry. Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] OT: Canadian Tax Database
Richard Huxton wrote: http://www.thestar.com/News/article/189175 For instance, in some cases the field for the social insurance number was instead filled in with a birth date. Unbelievable. Sixty years of electronic computing, fifty years' use in business and the professionals who built the tax system for a wealthy democratic country didn't use data types. This is Unbelievable? This is commonplace. And due at least in part to government (and other institutions operated by damned fools) opting for the least expensive provider rather than paying for someone who actually knows what they're doing. Just as buying cheap junk always comes back to get you, hiring incompetent fools that don't know their ass from a hole in the ground will come back to get you too. Come on, they don't hire incompetent fools. They hire the people You CAN'T be serious! Have you ever dealt with them or with the consequences of their incompetence? they need to fill their quota regardless of how well trained and experienced they are. I am not saying that non white males are in any way less competent than white males, but removing them from the pool does not make things better. The biggest problem with quotas is not hiring less qualified staff, it is that less qualified staff know why they were hired and know that they are very unlikely to be fired, so they have little incentive to work hard or attempt to do their best, they can always fail upwards. What does this have to do with anything? No one here, except you, has said anything about the profile of the people involved WRT race, gender, religion, &c. Nor has anyone said anything about qualifications. The only thing that has been said is that, based on what is seen in the work, the people responsible for that work must be incompetent. It is an inference based on what is seen in what has been done and has nothing to do with any of the prohibited grounds for discrimination used as excuses for affirmative action. And yes, I have seen cases where less qualified, even unqualified, people have been hired as a result of these affirmative action initiatives (and I have been told, by HR personnel in government, that certain favoured groups are deemed to be superior to white men, even if the favoured party has no education nor experience and the latter have earned doctorates and decades of experience), but no one has said anything about such people being employed on the projects to which I referred. But this is an aspect of our present society that is bound to degenerate into a flame war, launched by the politically correct, so we ought to say little, or even leave it alone. Those in power tend to be vicious, especially when there are no effective checks on their conduct and no consequences for what they do. Cheers Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] security permissions for functions
- Original Message - From: Bill Moran [EMAIL PROTECTED] To: Rikard Pavelic [EMAIL PROTECTED] Cc: Tom Lane [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Thursday, March 08, 2007 3:18 PM Subject: Re: [GENERAL] security permissions for functions In response to Rikard Pavelic [EMAIL PROTECTED]: Tom Lane wrote: No, it's operating as designed. Per the GRANT reference page: : Depending on the type of object, the initial default privileges may : include granting some privileges to PUBLIC. The default is no public : access for tables, schemas, and tablespaces; CONNECT privilege and TEMP : table creation privilege for databases; EXECUTE privilege for functions; : and USAGE privilege for languages. The object owner may of course revoke : these privileges. (For maximum security, issue the REVOKE in the same : transaction that creates the object; then there is no window in which : another user may use the object.) This seems clear enough. You'll need to revoke the default public EXECUTE privilege on any functions you don't want to be callable. As does this. Hmm, so the answer to my question How can I assign execute permission to a role for a single function inside schema. is I can't? How did you interpret do it like this to mean you can't do it? I too cannot understand how he came to this conclusion, unless it wasn't obvious to him how to grant certain permissions to roles. I am curious, though. I shape my understanding of this using a metaphor of private, protected and public member functions in C++ classes. There is, of course, no point in having a function that can't be called under any circumstances, but it is often useful to have a variety of protected and private functions (and other members) that can only be called by other member functions or member functions of derived classes. Does the permission granting procedure for functions work in a similar fashion? Can I make a function as a part of a schema that is executable only by the owner and other functions in the schema, and no-one else, and still have a publicly callable function in that schema invoke the private function? Or is my C++ based understanding leading me astray here? REVOKE ALL ON function name FROM PUBLIC; So this basically means that I can't fine tune the permissions through functions, but I can through views and tables? This looks like a bug in design to me ;( Relax. You (somehow) misunderstood Tom. I wonder if he's worried about granting permissions to roles or groups rather than to individual users. I mean the obvious statement, for the fine tuning he appears to me to want to do, would be to follow the REVOKE statement you show with a GRANT statement for a specific user. At least that is what I'd infer from what you and Tom wrote. Did I misunderstand either of you, or what Rikard is worried about? Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
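For what it's worth, the fine-tuning Rikard appears to want is exactly the REVOKE-then-GRANT sequence implied above; a minimal sketch with hypothetical schema, function, and role names (note that the argument types are part of the function's identity):

    REVOKE ALL ON FUNCTION myschema.sensitive_fn(integer) FROM PUBLIC;
    GRANT EXECUTE ON FUNCTION myschema.sensitive_fn(integer) TO reporting_role;

Any member of reporting_role may then execute the function; everyone else is refused.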
Re: [GENERAL] security permissions for functions
Functions are controlled by the same ACL mechanism that tables and everything else follows. Thus you have the idea of user id X may do Y with object Z i.e. user barbara may execute function somefunction(). But there's no real way to alter those permissions outside of changing the user ID context. So, I should be able to have user barbara execute function somefunction(), but, though barbara must not have access to object alpha, let's say for data security reasons (and user sarah does), I could have function somefunction invoke another function that stores information about barbara's action to object alpha by changing user context temporarily and without barbara's knowledge; basically saying within function somefunction() something like execute function 'someotherfunction()' impersonating sarah and stop impersonating sarah once someotherfunction returns. Much like the way I can log in to Windows or Linux as one user and temporarily impersonate another while executing a particular program or administrative function (e.g., log into Linux as a mere mortal, start a bash shell providing credentials for an admin account, do my admin type stuff and then close the shell). Or have I misunderstood you here WRT user ID context? Ted ---(end of broadcast)--- TIP 6: explain analyze is your friend
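PostgreSQL's closest analogue to the temporary impersonation described above is a SECURITY DEFINER function, which executes with the privileges of its owner rather than its caller; a minimal sketch, with hypothetical names (alpha_log standing in for "object alpha"):

    -- Owned by a role (say sarah) that may write to alpha_log.
    CREATE OR REPLACE FUNCTION log_action(p_note text)
    RETURNS void AS $$
    BEGIN
      -- session_user records the real invoker; inside a SECURITY DEFINER
      -- function, current_user would report the owner instead.
      INSERT INTO alpha_log (actor, note, logged_at)
      VALUES (session_user, p_note, now());
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;

    REVOKE ALL ON alpha_log FROM barbara;
    GRANT EXECUTE ON FUNCTION log_action(text) TO barbara;

barbara can now call log_action() even though she cannot touch alpha_log directly, much like a setuid program on Unix.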
Re: [GENERAL] php professional
Mark Similar issues with MySQL. It's faster, but it doesn't matter *how* fast you get the *wrong* answer. :) I thought one of the first rules of software engineering was First make it right and only then make it fast! Granted, most of my experience has more to do with number crunching and scientific computing, but for the past 27 years, I always first ensured the code was provably correct before worrying about optimization. And this always involved a test suite that applied every possible mathematical test with randomly generated test data. For example, my code for matrix algebra problems generally came with test code and a driver that would exercise the code with tens of millions of random matrices and vectors. One failure, even in millions of test cases, means the code is somehow broken. I have seen, and fixed, such cases before. One of my problems with database development is how to construct analogously strong test cases in order to prove the code correct. And it seems to get harder as the database gets larger. :-( In other words, I know how to do this with algebra and calculus, and C++/Java/FORTRAN programming, but not with sets and SQL programming. Cheers, Ted ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] backup database by cloning itself
On Feb 14, 2:14 am, filippo [EMAIL PROTECTED] wrote: On 13 Feb, 14:54, filippo [EMAIL PROTECTED] wrote: My target is to have the backup operation not affecting the users, so I want to be able to copy a database even if the database is used by someone. I could use pg_dump/pg_restore. pg_dump doesn't have to have exclusive access to the database to perform the operation. My only problem is that pg_dump creates a backup in a file; the best to me would be to have a perfect clone (users/data, etc.) of the original database ready to be used just after the cloning. Is it possible? Thanks, Filippo Well, I could see you writing a client application that creates a clone by first recreating all the schemas in your database and then copying the data to the clone, and probably quite a bit more. In such a case, since you have absolute control over your client code, you can do anything you want. I am not sure, though, that that is the best use of your time and hardware resources, especially if all you're after is a backup. Just think of all the overhead involved in creating a new clone, and everything that implies, every hour. But why not further explore your backup options if all you're concerned about is a reliable backup. You may find 23.3. On-line backup and point-in-time recovery (PITR) in the postgresql documentation useful. You haven't given any information about why it might not be appropriate in your situation. If you're really doing what it looks to me like you're doing, then you may be in the beginning stages of reinventing Postgresql's PITR capability. The builtin support for PITR in Postgresql strikes me as sufficient for what you say you need. If you require more, which would imply you want more than the simple backup you say you're after, then defining a suitable suite of triggers and audit tables may serve. Neither should adversely affect your users, especially if your database is not very big. HTH Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
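A minimal sketch of the pg_dump route mentioned above (database names are illustrative): pg_dump takes a consistent snapshot without blocking other users, and pg_restore rebuilds the clone from the archive.

    pg_dump -Fc sourcedb -f sourcedb.dump   # custom-format archive, consistent snapshot
    createdb sourcedb_clone
    pg_restore -d sourcedb_clone sourcedb.dump

Role definitions live outside any one database, so carrying the "users" over as well would additionally need pg_dumpall -g (globals only).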
Re: [GENERAL] Having a problem with my stored procedure
As suggested earlier, it is probably cleaner to define separate triggers on insert and on update. That is possible, but they can't have the same names. You probably want to name them accordingly too, or you'll get naming conflicts. I suggest: DROP TRIGGER archive_articles ON news_content; While this is something I'd do if I had absolute control over all the software, in the OP's case, there is a third party application involved that can't be modified. I would therefore worry that the developer of that software may have also placed triggers of that name on that table. This would suggest a little slop in the practices of that developer (i.e. of the third party application, not the OP) since a decent naming convention would make a name collision between that developer's code and the OP's code highly unlikely, but that is another issue. If the OP is getting name collision when trying to create these triggers, the implication is that the developer of the third party app in fact defined triggers of the same names, so dropping previously created triggers may well break that app. It seems to me that dropping something I haven't created is a high risk action. A naming convention similar to what I use would solve that problem without the risk associated with dropping something someone else has developed. If I am working on something to be distributed, I use a naming scheme that prepends a very short string that makes it clear the code was developed by myself or one of my staff, and in languages that support a namespace, such as C++, I make certain there is a namespace ID unique to my organization. This eliminates the risk of a name collision unless some developer actually tries to impersonate me. In some cases, where I am working as part of a team, my preference is to do the same with the developer's ID (since the developer who wrote a given piece of code is always responsible for fixing any bugs in it wherever possible). Of course, the fact is always fully documented, both in the code and in design documents provided to the client. This is a discipline I impose on myself, as a courtesy to those who come after me, and it involves considerations any library developer necessarily worries about. It is not something I want to impose on those who come after me, but which I would require of those who develop libraries or databases or tools I need to use in order to be productive. Damn. It just occurred to me that the OP had to be able to see the structure of the DB using a tool like pgAdmin, in order to just get the names of the tables and columns. Therefore, the OP should have also been able to see the definitions of any existing triggers and trigger functions. Further, the OP should be able to create more functions that could be called at the end of any existing trigger functions, thereby obtaining the functionality desired without compromising the third party app. You can edit these functions from within pgAdmin. I have done so myself on trigger functions I created myself. This would make the OP's task almost trivially simple. Don't you just hate it when you see the obvious solution only after spending time on other options? ;-) Cheers, Ted ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] backup database by cloning itself
Maybe his real goal, "all the backups readily available to be read by my program (opening the backup read only)", is to have a historical record of what certain records looked like in the past. There are other ways of doing that, though. If your speculation is right, perhaps the OP ought to explain a little more fully why he needs 24 snapshots a day, or indeed any at all. It seems to me that if you really want a historical record of what certain tables looked like in the past, it would be smarter and more accurate to create triggers, for each possible operation, that store the relevant details in an audit table including especially who made the edits and when. This strikes me as being much less work than developing code that processes so many backups. Cheers Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
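A minimal sketch of the trigger-plus-audit-table idea just suggested (all names hypothetical); the audited table's own row type is reused to store full row images:

    CREATE TABLE orders_audit (
      op         text        NOT NULL,
      changed_by text        NOT NULL DEFAULT current_user,
      changed_at timestamptz NOT NULL DEFAULT now(),
      old_row    orders,   -- composite type of the audited table
      new_row    orders
    );

    CREATE OR REPLACE FUNCTION orders_audit_fn() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'DELETE' THEN
        INSERT INTO orders_audit (op, old_row) VALUES (TG_OP, OLD);
      ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO orders_audit (op, old_row, new_row) VALUES (TG_OP, OLD, NEW);
      ELSE
        INSERT INTO orders_audit (op, new_row) VALUES (TG_OP, NEW);
      END IF;
      RETURN NULL;  -- the result of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_audit_fn();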
Re: [GENERAL] backup database by cloning itself
- Original Message - From: Ron Johnson [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, February 14, 2007 3:46 PM Subject: Re: [GENERAL] backup database by cloning itself It seems to me that if you really want a historical record of what certain tables looked like in the past, it would be smarter and more accurate to create triggers, for each possible operation, that store the relevant details in an audit table including especially who made the edits and when. This strikes me as being much less work than developing code that processes so many backups. I dunno about that. We use triggers to populate log tables that get extracted, truncated then loaded into a reporting db every night. Because of the night time batch cycle, there is no quiescent time to do this, so we have 2 log tables, and the triggers alternate which log table to insert into, depending on whether the day number is even or odd. That's in addition to the trigger logic to insert into history tables. It's a royal pain. Sounds painful, but not as painful as trying to do the same thing with a suite of backups and client code to read the backups. Or maybe the latter task is easier than it looks at first glance. Doesn't the amount of pain depend on the amount of traffic your servers see, the capability of the hardware you can devote to the task, what you're doing with the history tables, &c.? When I've worked on this kind of problem, everything was in a single DB. The traffic, though, was low enough that for daily reports a simple view of the log selecting only the previous day's data, was sufficient, but only occasionally used. The principal reason for the log was to support accountability, to know who made changes, and when, and who knew what when. Historical reconstructions were required, e.g., only when something went awry and there was a need to know if a good decision was made based on bad data or a bad decision was made based on good data, or, e.g., during an audit of the business processes and decision support systems. Cheers, Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Having a problem with my stored procedure
Would it not be simpler to just create two trigger functions, one that acts on insert operations and a second that acts on update operations? A 30 second glance at the Postgresql documentation showed me that it is possible to have more than one row level trigger for a given table, which implies the simpler option is possible. This would make for a much simpler design and avoid a conditional block that would then be unnecessary. This extra cost is, of course, trivial if only a handful of records are modified or created, but if the number is large, it could become significant. Or is there something in how an RDBMS handles triggers that would make it preferable to have a single trigger for all possible operations on a record? Something an old C++ programmer would miss if not informed about the peculiarities of database development. Did I miss something critical? My usual approach is to have functions remain as simple as practicable and do only one thing, unless there is a very good reason to have them more complex (in which case a driver function that calls a number of simple functions may be preferable to one that tries to do everything). Simple functions are easy to validate, and once validated make validation of more complex driver functions easier. Why bother with so many temporaries? Isn't that a waste of both development time (lots of extra typing and opportunity for errors such as typos) and runtime CPU cycles? Why not just insert or update values directly from the NEW or OLD record into the target table rather than copying the values first into the temporaries and then from the temporaries into their final destination? (A sketch of this simpler approach follows the quoted code below.) HTH Ted - Original Message - From: William Leite Araújo To: Laura McCord Cc: pgsql-general@postgresql.org Sent: Tuesday, February 13, 2007 12:19 PM Subject: Re: [GENERAL] Having a problem with my stored procedure 2007/2/13, Laura McCord [EMAIL PROTECTED]: To make a long story short, I am archiving data from an original table to a table I created. This is a third party web application that I am doing this with, so I can't revise the structure/code of this application. With this said, if the original table goes through an insert or update action I want to replicate the information to my archive table. I don't want to delete any articles from my archive table so this is why I am not wanting to do anything based on a delete action. The only problem that I am facing is how to tell the function that I want to perform an update if an update occurred and an insert if an insert action occurred. I want to have different actions occur depending on if the trigger was based on an insert or update. Help, I've been stumped for two days. Thanks in advance. 
This is what I have so far: CREATE TRIGGER archive_articles AFTER INSERT OR UPDATE ON news_content EXECUTE PROCEDURE su_archive_articles(); CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS TRIGGER LANGUAGE plpgsql AS ' DECLARE tmp_news_id CHARACTER varying(48); tmp_title CHARACTER varying(100); tmp_abstract CHARACTER varying(300); tmp_news_story TEXT; tmp_topic_id CHARACTER varying(10); tmp_create_date DATE; tmp_author CHARACTER varying(50); tmp_begin_date DATE; tmp_end_date DATE; tmp_priority CHARACTER(1); tmp_image_name CHARACTER varying(512); tmp_image_mime_type CHARACTER varying(50); tmp_layout_type CHARACTER varying(10); BEGIN SELECT INTO tmp_news_id news_id from news_content where last_inserted(news_id); SELECT INTO tmp_title title from news_content where last_inserted(news_id); SELECT INTO tmp_abstract abstract from news_content where last_inserted(news_id); SELECT INTO tmp_news_story news_story from news_content where last_inserted(news_id); SELECT INTO tmp_topic_id topic_id from news_content where last_inserted(news_id); SELECT INTO tmp_create_date create_date from news_content where last_inserted(news_id); SELECT INTO tmp_author author from news_content where last_inserted(news_id); SELECT INTO tmp_begin_date begin_date from news_content where last_inserted(news_id); SELECT INTO tmp_end_date end_date from news_content where last_inserted(news_id); SELECT INTO tmp_priority priority from news_content where last_inserted(news_id); SELECT INTO tmp_image_name image_name from news_content where last_inserted(news_id); SELECT INTO tmp_image_mime_type image_mime_type from news_content where last_inserted(news_id); SELECT INTO tmp_layout_type layout_type from news_content where last_inserted(news_id); IF TG_OP = 'INSERT' THEN //This is to be done if an INSERT action was done on the table INSERT INTO su_archives(news_id, title, abstract, news_story, topic_id, create_date, author, begin_date, end_date, priority, image_name
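A minimal sketch of the simplification Ted proposes, using the names from the quoted code (the remaining columns follow the same pattern and are elided here): read fields straight from NEW, branch on TG_OP, and note the FOR EACH ROW clause, which the quoted CREATE TRIGGER omits and without which NEW is not available to the function:

    CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'INSERT' THEN
        INSERT INTO su_archives (news_id, title, abstract, news_story)
        VALUES (NEW.news_id, NEW.title, NEW.abstract, NEW.news_story);
      ELSE  -- UPDATE
        UPDATE su_archives
           SET title = NEW.title,
               abstract = NEW.abstract,
               news_story = NEW.news_story
         WHERE news_id = NEW.news_id;
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER archive_articles
    AFTER INSERT OR UPDATE ON news_content
    FOR EACH ROW EXECUTE PROCEDURE su_archive_articles();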
Re: [GENERAL] Having a problem with my stored procedure
On Feb 13, 1:20 pm, [EMAIL PROTECTED] (Laura McCord) wrote: I tried doing two different triggers as you suggested but I kept getting an error stating: psql:archive_news_articles.sql:75: ERROR: trigger archive_articles for relation news_content already exists psql:archive_news_articles.sql:80: ERROR: trigger update_archives for relation news_content already exists So, I thought perhaps it couldn't be done. Ted Byers wrote: Would it not be simpler to just create two trigger functions, one that acts on insert operations and a second that acts on update operations? A 30 second glance at the Postgresql documentation showed me that it is possible to have more than one row level trigger for a given table, which implies the simpler option is possible. This would make for a much simpler design and avoid a conditional block that would then be unnecessary. This extra cost is, of course, trivial if only a handful of records are modified or created, but if the number is large, it could become significant. Or is there something in how an RDBMS handles triggers that would make it preferable to have a single trigger for all possible operations on a record? Something an old C++ programmer would miss if not informed about the peculiarities of database development. Did I miss something critical? My usual approach is to have functions remain as simple as practicable and do only one thing, unless there is a very good reason to have them more complex (in which case a driver function that calls a number of simple functions may be preferable to one that tries to do everything). Simple functions are easy to validate, and once validated make validation of more complex driver functions easier. Why bother with so many temporaries? Isn't that a waste of both development time (lots of extra typing and opportunity for errors such as typos) and runtime CPU cycles? Why not just insert or update values directly from the NEW or OLD record into the target table rather than copying the values first into the temporaries and then from the temporaries into their final destination? HTH Ted - Original Message - From: William Leite Araújo To: Laura McCord Cc: pgsql-general@postgresql.org Sent: Tuesday, February 13, 2007 12:19 PM Subject: Re: [GENERAL] Having a problem with my stored procedure 2007/2/13, Laura McCord [EMAIL PROTECTED]: To make a long story short, I am archiving data from an original table to a table I created. This is a third party web application that I am doing this with, so I can't revise the structure/code of this application. With this said, if the original table goes through an insert or update action I want to replicate the information to my archive table. I don't want to delete any articles from my archive table so this is why I am not wanting to do anything based on a delete action. The only problem that I am facing is how to tell the function that I want to perform an update if an update occurred and an insert if an insert action occurred. I want to have different actions occur depending on if the trigger was based on an insert or update. Help, I've been stumped for two days. Thanks in advance. 
This is what I have so far: CREATE TRIGGER archive_articles AFTER INSERT OR UPDATE ON news_content EXECUTE PROCEDURE su_archive_articles(); CREATE OR REPLACE FUNCTION su_archive_articles() RETURNS TRIGGER LANGUAGE plpgsql AS ' DECLARE tmp_news_id CHARACTER varying(48); tmp_title CHARACTER varying(100); tmp_abstract CHARACTER varying(300); tmp_news_story TEXT; tmp_topic_id CHARACTER varying(10); tmp_create_date DATE; tmp_author CHARACTER varying(50); tmp_begin_date DATE; tmp_end_date DATE; tmp_priority CHARACTER(1); tmp_image_name CHARACTER varying(512); tmp_image_mime_type CHARACTER varying(50); tmp_layout_type CHARACTER varying(10); BEGIN SELECT INTO tmp_news_id news_id from news_content where last_inserted(news_id); SELECT INTO tmp_title title from news_content where last_inserted(news_id); SELECT INTO tmp_abstract abstract from news_content where last_inserted(news_id); SELECT INTO tmp_news_story news_story from news_content where last_inserted(news_id); SELECT INTO tmp_topic_id topic_id from news_content where last_inserted(news_id); SELECT
Re: [GENERAL] Having a problem with my stored procedure
On Feb 13, 1:45 pm, [EMAIL PROTECTED] (Laura McCord) wrote: About your last comment, I can't do any revisions of the third party application where the inserts and updates are occurring. Plus, this whole idea came from a workaround based on a glitch in the software where the expiration of articles is not occurring, therefore I have to delete articles to prevent them from being displayed on the web interface. Also, I don't entirely want to get rid of them completely and that is why I am saving records in an archive table to be used in another application that I need to write in the future. So, this is the reason for the redundancy. It's a long story. -Laura I don't think I was suggesting you do anything with the original application. Rather, I was suggesting you alter what you were already trying to do. You can apparently create as many trigger functions as you like (implied by what I have read in the PostgreSQL documentation), presumably without name collisions though, and you have absolute control of what you do within the functions you develop, as long as it is syntactically correct. So far, I see nothing preventing you from writing two trigger functions or requiring you to use so many temporaries. How does not touching the third party application affect the triggers you add to the database? Cheers Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postgres SQL Syntax
to assume that the tools we like now will still exist ten years from now. Anyone remember IBM's OS/2? That was my favourite OS, and it was arguably better than any flavour of Windows available at the time. But even though I used it exclusively ten years ago, it is no longer a viable option for any machine I now use. I am sure anyone reading this who is of my generation could produce a long list of products that they used and liked which no longer exist or which are now no longer commercially viable. C++ is still a good option for many kinds of application development, but I don't think there are very many people using Watcom's compiler to produce commercial applications. SQL will likely exist ten years from now, but will our favourite RDBMS? I won't make that prediction, but I think I can minimize the risk by creating SQL scripts that use the highest common factor across the RDBMS products I have at my disposal. Therefore, I should be able to use my scripts, with minimal pain, regardless of what RDBMS products are available at that time. I do much the same for my C++ coding. I try to use the highest common factor in the language, as implemented by the suite of compilers I use, and keep the implementation specific stuff to a minimum in completely separate compilation units (and even in different directories). I just see it as inevitable that significant refactoring will be required, especially when migrating from one suite of tools to another, so I plan an architecture for my code that should make it as easy as possible, and similarly, for moving data around, I developed my approach to be as easy and reliable as possible, even if it is not the most elegant or efficient. Cheers, Ted ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] SQL textbook
- Original Message - From: Michael Schmidt To: Jan Mura ; pgsql-general@postgresql.org Sent: Thursday, February 08, 2007 12:54 PM Subject: Re: [GENERAL] SQL textbook Many books are available for download. Joe Celko is a well respected author and just about anything with his name on it is good. Here are some specific suggestions. Do you have any URLs for the books that are available for download? Cheers Ted ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Problem with result ordering
Hi Ted, Hi Thorsten, Am Donnerstag, 25. Januar 2007 19:53 schrieb Ted Byers: The question I'd ask before offering a solution is, Does the order of the id data matter, or is it a question of having all the results for a given id together before proceeding to the next id? The answer to this will determine whether or not adding either a group by clause or an order by clause will help. Is there a reason your client app doesn't submit a simple select for each of the required ids? You'd have to do some checking to see whether it pays to have the ordering or grouping operation handled on the server or client. Other options to consider, perhaps affecting performance and security, would be parameterized queries or stored procedures. Yes, the reason is that a large list of ids is generated from a user's request, coming from outside our application (for example as an EXCEL sheet), and the output must be ordered by the ordering in this list. Surely we can handle this in our client application (Java code), but I think it would be more sexy to have this possibility in the database logic, since our client-application should not do much more than retrieve data from the db and then present it. To be honest, your rationale here makes no sense to me, business or otherwise. I think I'd be remiss if I didn't tell you this. Of course, what you do is up to you, but I never concern myself with what is 'more sexy' when designing a distributed application. I can see a number of situations in which your approach would result in terrible performance. If you have a lot of users, and you're putting all the workload on your data server, all the client apps will end up spending a lot of time waiting for the server to do its thing. Ordering the display of data, while it can be helped by the database, is really a presentation issue and IMHO the best place for that, by default, is the client app (i.e. do it server side only if there is a very good reason to do so). If I understand you correctly, you have a Java application as the client app, and it receives your users' ID data, from whatever source. I'd assume, and hope, that you have built code to read the IDs from sources like your Excel spreadsheet, but that doesn't matter that much. For the purpose of this exchange, it wouldn't matter if you made your clients enter the data manually (though IMHO it would be sadistic to make users manually enter a long list of values when you can as easily have the program read them from a file of whatever format). The point is that you say our client-application should not do much more than retrieve data from the db and then present it, and this implies you get the IDs into your client application. You say you're getting a large list of ids coming from outside our application. If your database is large, and your list of IDs is long, you may be taking a huge performance hit by making the database perform either an ordering or a grouping, both of which would be unnecessary if you constructed a series of simple parameterized queries in your client app and executed them in the order you desire. Whether or not this matters will depend on just how large your large is, how many simultaneous users there'd be, and how powerful the server hardware is (but I'd be quite upset if one of my developers made me pay more for a faster server just because he or she thought it would be sexy to do all the work on the server rather than the client). 
Given what you've just said, if I were in your place, I'd be addressing the ordering issues in the client Java application first, and then look at doing it in the database only if doing this work in the client app presented problems that could be addressed by doing the work on the server. Equally importantly, if there is a reason not to take the obvious and simple option, I'd be doing some performance evaluation based on enough test data to reasonably simulate real world usage so that I'd have hard data on which option is to be preferred. Thanks for your comments Thorsten You're welcome. I hope you find this useful. Ted -- CappuccinoSoft Business Systems Hamburg ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Problem with result ordering
The question I'd ask before offering a solution is, Does the order of the id data matter, or is it a question of having all the results for a given id together before proceeding to the next id? The answer to this will determine whether or not adding either a group by clause or an order by clause will help. Is there a reason your client app doesn't submit a simple select for each of the required ids? You'd have to do some checking to see whether it pays to have the ordering or grouping operation handled on the server or client. Other options to consider, perhaps affecting performance and security, would be parameterized queries or stored procedures. Much depends on the design and implementation of your client app. I know, e.g., that in ASP.NET 2, and later, you can handle multiple resultsets from a single datasource, so a trivially simple SQL script that consists of the simplest SELECT statements might be a viable option. But it is hard to advise since you don't say if you have access to or control over the source code for the client app or what it is written in. In my experience, I always have to run some benchmarks for a given distributed application to figure out how best to distribute the workload, and there are always plenty of different ways to do things, with often big differences in performance and security. It seems never to be trivial to figure this out without some testing before a final decision. I can never just assume that it is best to do all the processing in the RDBMS backend to my apps. HTH Ted - Original Message - From: Thorsten Körner [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Thursday, January 25, 2007 10:45 AM Subject: [GENERAL] Problem with result ordering Hi, when I fire the following query: select m_id, m_u_id, m_title, m_rating from tablename where m_id in (26250, 11042, 16279, 42197, 672089); I will get the same results in the same order as in the next query: select m_id, m_u_id, m_title, m_rating from tablename where m_id in (11042,42197,672089,26250,16279); I wonder how it is possible to retrieve the results in the same order as queried in the list. The listed IDs are from an application outside the database. Version is PostgreSQL 8.2.1 Has anyone an idea how to do this, while PostgreSQL knows nothing about hints, like Oracle does? THX, Thorsten ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
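If the ordering must be done in the database after all, one trick that works on the 8.2 release Thorsten names is to join against a VALUES list carrying an explicit position column, then sort on it; a minimal sketch using the ids from the thread:

    SELECT t.m_id, t.m_u_id, t.m_title, t.m_rating
    FROM tablename t
    JOIN (VALUES (11042, 1), (42197, 2), (672089, 3), (26250, 4), (16279, 5))
         AS wanted(m_id, pos) ON t.m_id = wanted.m_id
    ORDER BY wanted.pos;

The client generates the VALUES list from the user's id list, so the database returns rows already in the requested order.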
Re: [GENERAL] MSSQL/ASP migration
Is the original application ASP or ASP.NET? It makes a difference, particularly if it was developed to take advantage of ASP.NET 2. It might conceivably be ASP.NET 3, but since that is brand new I can't see anyone paying to replace an ASP.NET 3 application that was just created. If it is ASP.NET 2, and you can't find a PostgreSQL provider class, your simplest approach after migrating the data might be to write your own provider (check a recent, decent reference on ASP.NET 2 for details - there are several). OTOH, if it is ASP.NET 1.x or the earlier ASP, your planned conversion to PHP might be worth comparing to developing it de novo with ASP.NET 3. I am not an MS advocate, and I don't like tying myself to one vendor, but for obvious commercial reasons I have to be aware of the options including MS options. I recently, therefore, started studying all things .NET, and comparing to other application frameworks I've worked with, MS seems to have done a decent job with ASP.NET 2 and 3. Therefore, if I have a client running mostly MS software, and time is of the essence, I would probably make .NET, ASP.NET 3 or a Windows .NET app, as the case may be, my first choice; that is unless I find a public domain framework in Perl or PHP that is competitive with .NET. That said, I've not had an opportunity to see how it performs in a production setting, so YMMV. HTH Ted - Original Message - From: Robert Fitzpatrick [EMAIL PROTECTED] To: PostgreSQL pgsql-general@postgresql.org Sent: Monday, January 22, 2007 11:05 AM Subject: [GENERAL] MSSQL/ASP migration I have a customer who wants to migrate his MSSQL database to PostgreSQL and we'll replace his application ASP with PHP. The issues should be limited as there are no stored procedures or triggers in MSSQL, just structure and data should be all that is needed to migrate. I have never migrated from MSSQL or to PostgreSQL, but have handled database migration in the past for other DB's. I know there is a mssql2pgsql script out there somewhere and I find lots of info on the subject. Of course, when we rebuild his application, some db structure will change; I was planning to do all the changes to structure after a successful migration of his current structure now w/o data. After the new application is done, then create a migration path for the data. Is that the best way to handle these types of migrations? The customer will want to continue working on the old system during the several months of development in the new. Any docs or other helpful info is welcome, just looking for some advice. Thanks in advance, -- Robert ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Hardware related question: 3ware 9500S
- Original Message - From: Merlin Moncure [EMAIL PROTECTED] To: Janning Vygen [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 12:31 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S [snip] - I want to know if 3ware 9500 S is recommended or if its one of those controllers which sucks. escalade is a fairly full featured raid controller for the price. consider it the ford taurus of raid controllers, it's functional and practical but not sexy. Their S line is not native sata but operates over a pata-sata bridge. Stay away from raid 5. Hi Merlin Why? What's wrong with raid 5? I could well be wrong (given how little attention I have paid to hardware over the past few years because of a focus on developing software), but I was under the impression that of the raid options available, raid 5 with hot swappable drives provided good data protection and performance at a reasonably low cost. Is the problem with the concept of raid 5, or the common implementations? Do you have a recommendation regarding whether the raid array is built into the server running the RDBMS (in our case PostgreSQL), or located in a network appliance dedicated to storing the data managed by the RDBMS? If you were asked to design a subnet that provides the best possible performance and protection of the data, but without gold-plating anything, what would you do? How much redundancy would you build in, and at what granularity? Ted ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Hardware related question: 3ware 9500S
- Original Message - From: Scott Marlowe [EMAIL PROTECTED] To: Ted Byers [EMAIL PROTECTED] Cc: Merlin Moncure [EMAIL PROTECTED]; Janning Vygen [EMAIL PROTECTED]; pgsql general pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 2:24 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S On Wed, 2006-04-12 at 13:10, Ted Byers wrote: - Original Message - From: Merlin Moncure [EMAIL PROTECTED] To: Janning Vygen [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Wednesday, April 12, 2006 12:31 PM Subject: Re: [GENERAL] Hardware related question: 3ware 9500S [snip] Why? What's wrong with raid 5? I could well be wrong (given how little attention I have paid to hardware over the past few years because of a focus on developing software), but I was under the impression that of the raid options available, raid 5 with hot swappable drives provided good data protection and performance at a reasonably low cost. Is the problem with the concept of raid 5, or the common implementations? Do you have a recommendation regarding whether the raid array is built into the server running the RDBMS (in our case PostgreSQL), or located in a network appliance dedicated to storing the data managed by the RDBMS? If you were asked to design a subnet that provides the best possible performance and protection of the data, but without gold-plating anything, what would you do? How much redundancy would you build in, and at what granularity? There have been NUMEROUS discussions of RAID-5 versus RAID 1+0 in the perform group in the last year or two. Short version: Interesting. I take it that RAID 1+0 refers to a combination of Raid 1 and RAID 0. What about RAID 10? I am curious because RAID 10 has come out since the last time I took a look at RAID technology. I am not sure what it actually does differently from RAID 5. This question of data security is becoming of increasing importance to me professionally since I will soon have to advise the company I'm working with regarding how best to secure the data managed by the applications I'm developing for them. I will need overall guidelines to produce a design that makes it virtually impossible for them to lose even one field in one record. The data is both sensitive and vital. Fortunately, I have a few months before we need to commit to anything. Also, fortunately, with one exception, the applications rely on a data feed that comes in once a day after normal working hours, so I won't have to worry about writes to the DB other than what my script does to load the datafeed into the DB. All other access is read only. This should make it easier to produce a strategy to protect the data from any kind of technology failure (software or hardware). Cost is a factor, but reliability is much much more important! Thanks, Ted ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] [Slightly OT] data model books/resources?
On Thursday 30 March 2006 03:03, Aaron Glenn wrote: Anyone care to share the great books, articles, manifestos, notes, leaflets, etc. on data modelling they've come across? Ideally I'd like to find a great college-level book on data models, but I haven't come across one that even slightly holds definitive-resource status. I've heard that Relational Database Design (ISBN: 0123264251) is good for college-level introductory material, though the book I generally recommend most is Practical Issues in Database Management (ISBN: 0201485559). Feel free to reply off list to keep the clutter down - I'd be happy to summarize responses for the list. We're all about clutter :-) Well then, in that case, can I add to the clutter by asking a question about IT training? I was just asked today, by a vice president in the company I'm working with, to train one of his staff to become a database programmer and administrator. I have taught software engineering using UML, and programming in Java and C++. I have not taught database programming and administration, although I have done some of each for some of my own applications. My question: can the folk in this group help me develop a reading list and a list of competencies for this fellow to master? While I can easily develop a list of books dealing with databases in general and SQL in particular, it is not so easy to separate the wheat from the chaff, and I do not want to waste a pile of money on evaluating the range of books that are available. I'd therefore like accounts of books to avoid, and why, as well as books that are essential in any respectable collection, and why. I'm interested both in textbooks, with exercises, and reference books (both theoretical and practical). Thanks Ted
Re: [GENERAL] PostgreSQL's XML support comparison against other RDBMSes
May I ask a question about this? I will be working on an older database in which the original developer stored XML documents as a single variable-length text field. To process it, it has to be retrieved in full and parsed. But its structure is simple, in that it has an element for each field in the DB that replaced the earlier version. But people are still using the earlier one because they still need access to the old data, and no tool has yet been written by my predecessors to move the old data over to the new DB. Does the XML support you're all talking about make it less tedious to develop tools to parse these old XML files and put their data into the right field of the right table? I can develop the tool our users need using the resources I have at hand, but the process of parsing these XML files is certain to be very tedious; something I am not looking forward to. There is a reason I don't do much with XML even though I know how. Ted - Original Message - From: Steve Atkins [EMAIL PROTECTED] To: pgsql general pgsql-general@postgresql.org Sent: Wednesday, March 29, 2006 12:21 PM Subject: Re: [GENERAL] PostgreSQL's XML support comparison against other RDBMSes On Mar 29, 2006, at 2:57 AM, Ivan Zolotukhin wrote: On 3/29/06, Peter Eisentraut [EMAIL PROTECTED] wrote: Ivan Zolotukhin wrote: BTW, are there any plans for its improvement? What are the first steps for the people who want to contribute to PostgreSQL in this field? The first step would be making a detailed analysis of what XML support is supposed to mean. So let's make such an analysis here within the PostgreSQL community! There has been a lot of talk about XML support in Postgres, but it did not lead to any steady work in this direction. IMHO, it's time to start it now, no? The people who want it should explain what it is, ideally by pointing at the relevant standards documents. Cheers, Steve
Re: [Bulk] [GENERAL] General advice on database/web applications
I am developing a small web application. Currently, our web server is sitting outside our firewall (running its own firewall), and the application being developed would let users do things like place orders. My question is... what and where is the database for this? What do you mean when you say your web server is running its own firewall? I could well be wrong, but I am not aware of a web server that can run a firewall; web servers and firewalls are, as I understand them, quite different kinds of software, though I am aware of some hardware that has built-in firewalls. Your question, though, doesn't make sense. If, as you say explicitly in your first sentence, you're developing a small web application, then either you don't have a database and need to create it, or you have already created your database and know both where and what it is. If you haven't created it already, then you can create it, and you have absolute control over where to put it and what RDBMS to use. The only circumstance in which I could imagine you having a database back end for your application but not knowing about it is if you bought hosting services from a company that provides such services. But if that's the case, then you ought to be asking that company about it. And they probably already have a ready-made virtual store application for you to use, which makes developing your own unnecessary unless you're planning to do your own hosting - and that takes us back to you having complete control over what you use and where you put it. If I were to create such a web application as you describe, I'd create a database using PostgreSQL or something similar and have it live inside the firewall, configured to respond only to applications running behind the firewall. Under no circumstances would I want it to accept connections across the firewall. Similarly, I'd have my application server and my httpd server behind the firewall, configured to accept connections across the firewall but only from proxy servers set up in a DMZ. Since you are dealing with sensitive information such as financial data, you are going to have to design security into your application from start to finish, and then harden your entire network inside and out, including especially your firewall and each machine individually. You have some legal responsibilities to protect your clients' data. I'm told, by folk who ought to know, that you could face major problems if you fail to exercise due diligence in protecting your clients' data. Cheers, Ted
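For the "respond only to applications behind the firewall" setup described above, the usual knobs are listen_addresses in postgresql.conf and the host rules in pg_hba.conf. A minimal sketch follows; the database name, role, addresses, and subnet are invented placeholders, so adjust them to your own network:

    # postgresql.conf: bind only to the interface facing the internal network
    listen_addresses = '10.0.1.5'

    # pg_hba.conf: allow the app servers' subnet, reject everything else
    # TYPE  DATABASE   USER      CIDR-ADDRESS    METHOD
    host    orders_db  app_user  10.0.1.0/24     md5
    host    all        all       0.0.0.0/0       reject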
[GENERAL] Double checking my logic?
I have to deal with the following situation. I get a daily feed of data for several tables. This data is retrieved from the supplier by ftp in the form of CSV text files. I know I can load the files using "load data". The relational structure between the files and tables is simple: one file contains data for a lookup table. But the data in the lookup table is dynamic, potentially changing from one day to the next (though in practice, the rate of change is likely to be very slow). I will also need to maintain a history in such a way that any and all changes in the data can be fully audited. This means that, for auditing purposes, I will need to be able to extract a valid time for each item in the lookup table. The only likely change in the data will be that occasionally items will be added to or removed from the lookup table. None of the data already present in the database will be editable (data validation happens before I get the feed). The only analysis of which I have been told is done daily, as the feed is received. At present, no one looks at earlier data (although I expect that will change as I create a better structure for the database to support audits). I am trying to create a flexible design so that refactoring will be simple if and when the assumptions or practices need to be changed. I know how to handle all this using a brute-force approach, but I expect that approach will be painfully slow. So here is what I am planning.

1) create a suite of tables corresponding to the files in the feed, with the addition of a date/time stamp containing the date and time on which the data being processed was received
2) create a suite of temporary tables corresponding to the tables created in step 1 (but without the date/time stamp)
3) load the data into the temporary tables
4) analyse the data while it is in the temporary tables, storing the result of the analysis in new tables
5) copy the data into the permanent tables, and add the date and time stamp for the data (this date/time stamp is not present in the files retrieved)
6) free the temporary tables

Now, this second-last step is brute force, adequate for all but one of the tables: the lookup table. If I stick with the brute-force approach, the lookup table will waste a significant amount of space. This won't be much initially, but it is guaranteed to get worse as time passes, and I'd expect the lookup performance to degrade as the amount of data in the lookup table increases. Each record in the lookup table represents a product, and both the name and the product will have valid time intervals that may not be related. The name may change because the product has been renamed for whatever reason (and we don't care why), or the name may disappear altogether because the product has been discontinued. We can distinguish the two cases because each product has an ID that remains valid while the product exists, and the ID won't be in the data at all if the product is discontinued. I am considering creating an additional table just to look up product names, but with two date and time stamps. The first would represent the first time the product name appears in the data, and the last would represent the last time the product name is present in the data. The first of these would be edited only once, and that is on the first day for which we have data. I am torn between updating the last of these every day, until the name disappears, or leaving it null until the name disappears.
Leaving it null would save on space, but updating it with the current date and time should save time, since it would not be necessary to execute a complex conditional on every product ever included in the database. If we update it only for those items in today's data, those that were terminated before today will not have their date/time stamp updated, so the two fields will always represent the time interval for which the name is valid. A similar logic applies to the product ID. Have I overlooked anything that is obvious to you? Any gotchas I should be aware of? What opportunities for improving performance do you see? The ultimate objective is to have this fully automated from a shell script that is configured by our sysop to execute at a specific time every day, Monday through Friday. Can all of the steps I describe above be implemented using ANSI-standard SQL, and thus put into a stored procedure, or should I look at doing some of it in a Perl script or Java application? I suppose I will have to have at least a basic Java application, or Perl script, if only to connect to the database and invoke any functions I have created to do this. Any suggestions would be appreciated. Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Solutions http://www.randddecisionsupportsolutions.com/
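One way to make the two-timestamp idea concrete is sketched below. Table and column names are invented for illustration; the UPDATE is the daily "advance last_seen for names still in the feed" step, which touches only rows matched by today's staged data rather than running a conditional over every product ever seen:

    -- permanent name-history table
    CREATE TABLE product_names (
        product_id  integer   NOT NULL,
        name        text      NOT NULL,
        first_seen  timestamp NOT NULL,  -- written once, on first appearance
        last_seen   timestamp NOT NULL,  -- advanced each day the name is still present
        PRIMARY KEY (product_id, name, first_seen)
    );

    -- daily step: advance last_seen only for names present in today's load
    UPDATE product_names pn
       SET last_seen = now()
     WHERE EXISTS (SELECT 1
                     FROM staging_products sp
                    WHERE sp.product_id = pn.product_id
                      AND sp.name       = pn.name);

Names that have disappeared are simply never touched again, so (first_seen, last_seen) ends up bracketing the interval during which the name was valid, exactly as described above.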
Re: [Bulk] Re: [GENERAL] Tool
Bob, If you're interested in a free Java tool, you might want to look at NetBeans (now version 5). It is one of the best I have seen, and I have used a wide variety of such tools in a number of different programming languages. Understand, though, that you have a wide variety of options, ranging from a web interface (which may be preferred if your users will be accessing the database over a network) at one extreme to a thick client, perhaps started using Sun's Web Start, at the other. An advantage NetBeans offers is that it is easy to learn, and the user interface is quite intuitive, especially for GUI design and implementation. Eclipse is another one with a good reputation, but I don't like it so much. Its support for GUI design seems to me to be quite weak. What you haven't said in any of your posts is what programming languages you know. Nor do you indicate whether or not you have the skills required to work with an editor like emacs to create a very thin client (using a web server like Apache) with server-side scripting using Perl or PHP. It is hard to make a recommendation without knowing something about your skills. If you don't already have the skills of an intermediate Java programmer, then none of the tools mentioned in this thread will help you much. If you knew only C++ (again at least at an intermediate level), for example, the only tools I know of that would be suitable are commercial (Borland's C++ Builder and Microsoft's Visual C++). And I have no idea what the corresponding tools would be on Unix/Linux (if anyone knows of such tools for Linux, I'd appreciate hearing about them, as one of my computers will soon have Windows XP removed and replaced by SuSE Linux). Cheers, Ted - Original Message - From: Bob Pawley To: Wes Williams ; 'Postgresql' Sent: Friday, February 10, 2006 1:03 PM Subject: [Bulk] Re: [GENERAL] Tool This looks quite interesting. Thanks Wes Bob - Original Message - From: Wes Williams To: 'Bob Pawley' ; 'Postgresql' Sent: Friday, February 10, 2006 9:14 AM Subject: RE: [GENERAL] Tool You may wish to consider creating a web interface GUI. For example: Sun's free Java Studio Creator 2 http://developers.sun.com/prodtech/javatools/jscreator/index.jsp A little to learn up front, but then the GUI goodness comes through. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Bob Pawley Sent: Friday, February 10, 2006 12:02 PM To: Postgresql Subject: [GENERAL] Tool I am developing a database which needs testing prior to developing the host program. I am looking for an easy-to-learn tool. I want to develop an interface that I can use to enter and display information in a similar fashion to how a user would operate the database. A GUI tool would be ideal. I am running PostgreSQL v 8.0 on Windows XP. Bob Pawley
Re: [GENERAL] Primary keys for companies and people
or misidentified, and the cost to the owner or user of the application or database. Who will suffer if a mistake is made? Will, or can, bad things happen if a given person ends up with more than one ID? What is the cost, and who bears this cost, if more than one person can use the same ID? 3) Can we construct a suite of best practices from which we can select, given specific functional or non-functional constraints as developed for our application? Included with this question is consideration of protection of sensitive data in general, and protection of data that might conceivably be used by cyber-criminals in activity related to identity theft, or to use sensitive data to the harm of the person so identified. 4) How is biometric data best stored and searched for use in authentication processes within an arbitrary application? I guess this question assumes that biometric data needs to be used in an authentication request, and it occurs to me that for some applications it may be sufficient to use biometric data in the creation of a unique user ID, and subsequently it may be needed only for certain sensitive processes or resources. My own feeling is that some options are very easy, and some of these are adequate for some situations, but that there are others that may be needed depending on the sensitivity of the data in question or on the potential cost to one or more parties to a given business process. I expect to be considering these issues extensively over the next few years, since they are relevant to some of the web applications I am designing. Any insights you, or others, may have on these questions would be greatly appreciated. Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Solutions http://www.randddecisionsupportsolutions.com/
Re: [GENERAL] Logging statements and parameter values
On Mon, Jan 30, 2006 at 04:31:29PM -0500, Bruce Momjian wrote: I assume it is this TODO: * Allow protocol-level BIND parameter values to be logged --- Ted Powell wrote: Our development group needs to have the option of logging all SQL statements including substituted parameter values. [...] That's it! (I should have thought to look in the TODO.) Has any design work been done on this? -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
Re: [GENERAL] Logging statements and parameter values
On Mon, Jan 30, 2006 at 05:19:23PM -0500, Bruce Momjian wrote: [...] * Allow protocol-level BIND parameter values to be logged [...] That's it! (I should have thought to look in the TODO.) Has any design work been done on this? No. I am with Simon Riggs today at my house and I asked him, hoping he can get it done for 8.2. I don't think it is very hard. Various things have been pushed on my stack since I posted about this. When it gets near the top again, I'll check back. Thanks for the response. -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
[GENERAL] Logging statements and parameter values
Our development group needs to have the option of logging all SQL statements including substituted parameter values. Getting output in the form: ... WHERE contact.login_con = $1 AND company.login_co = $2 was no problem, but nothing that I tried turning on in the config file yielded values for $1 and $2. Digging into the source for 8.1.1 brought me to this code in .../backend/tcop/postgres.c (lines 1449+):

    /* We need to output the parameter values someday */
    if (log_statement == LOGSTMT_ALL)
        ereport(LOG,
                (errmsg("statement: BIND %s", portal_name)));

    /*
     * Fetch parameters, if any, and store in the portal's memory context.
     */
    if (numParams > 0)

It seems to me that a point near the bottom of the loop over parameters (lines 1564+):

        params[i].kind = PARAM_NUM;
        params[i].id = i + 1;
        params[i].ptype = ptype;
        params[i].isnull = isNull;
        i++;
    }

(params[i].value is set in a couple of places higher up in the loop) would be a good place to log each parameter, but... Has this not been done simply because nobody has gotten around to it, or are there pitfalls? Although I've been using PostgreSQL for several years, this is my first venture into its source code beyond watching it build. Also, the Datum params[i].value: does it necessarily hold displayable text, even when its content is the output of a binary input converter? Is there a utility log routine somewhere that I can simply feed a Datum to? -- Ted Powell [EMAIL PROTECTED] http://psg.com/~ted/ GPL code ... It's the difference between owning your own home and just renting. --PJ
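For what it's worth, until that TODO item lands there is a workaround that gets parameter values into the log without patching the server: use SQL-level PREPARE/EXECUTE instead of protocol-level Bind, since with log_statement = all the EXECUTE statement is logged with its literal arguments. A sketch, borrowing the column names from the WHERE clause quoted above (the table names and values are guesses, not taken from the original post):

    PREPARE find_login(text, text) AS
        SELECT *
          FROM contact, company
         WHERE contact.login_con = $1
           AND company.login_co  = $2;

    EXECUTE find_login('jsmith', 'acme');  -- this line is logged verbatim, values included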
Re: [Bulk] Re: [GENERAL] Best programming language / connectivity for best performance
Nico, I do not believe there is anyone who can tell you the best way to go, as that is determined by far more than DB access. That said, I am inclined to agree with John. I would expand on that to say that if Delphi is the only language you know, then it is the best option for getting productive quickly. Otherwise, you have to resort to multiparadigm development and determine which of the languages you know best meets the requirements for your project. I use FORTRAN, C++, and Java, and I know VB, Perl, and PHP. If I require an outstanding object model, my choice reduces to C++ and Java, because the object model in FORTRAN is almost as seriously broken as that in VB. I generally don't use VB, except for trivial toys, and even there, it has been years since I used it. Perl and PHP are interesting, but I have yet to see an advantage they can offer over Java (including servlets and JSP) or C++. I use FORTRAN for specialty programs requiring extensive number crunching, but it is being displaced in my own work by C++, particularly because of the advantages provided by metaprogramming. The only languages I have used for commercial application development are C++ and Java. If there is significant networking and/or security, I lean towards Java because of the related libraries built into the SDK. OTOH, if I need real generic programming, I lean toward C++ because, while Java has made some progress in generics, it is still quite broken relative to the powerful support for generics in C++. And, if there is a need for metaprogramming based on generic programming, then C++ is the only choice I am aware of. You see, every language has its strengths and weaknesses, and you generally need to choose based on what you know about each language you know how to use, and the availability of libraries to support your problem domain. In my experience, you become a more productive and better programmer as you make a point of learning more programming languages and give thought to their relative strengths and weaknesses. Almost invariably, when my use of a DB is slow, it is because I have a problem with my queries. This happens more often than it should, but then I have been programming using languages like FORTRAN, Pascal, Basic and C++ (all closely related Algol-family languages) for well over two dozen years, and I started studying RDBMSs and SQL only a few years ago. Like John, when I'm using a RAD tool (and I should say I like Delphi almost as much as I like NetBeans and Borland's C++ Builder), I handle DB access in my own code and let the GUI handle the interface only. My experience with bound controls is similar to John's. They are a convenience for novice programmers, but once you start adding more professional features, such as data validation and formatting, they leave just about everything to be desired. In my work, they served as a convenience during proof-of-concept work to show a client where their project was heading, but in virtually every case, I replaced them for one reason or another. They just didn't provide the feature set I required, and it was just less work to use the basic controls than it was to fight with broken bound controls.
Cheers, Ted - Original Message - From: John McCawley [EMAIL PROTECTED] To: Nico Callewaert [EMAIL PROTECTED] Cc: pgsql-general@postgresql.org Sent: Tuesday, January 10, 2006 9:06 AM Subject: [Bulk] Re: [GENERAL] Best programming language / connectivity for best performance I really wouldn't take relative DB performance into much consideration when choosing a programming language for your project. I have found the actual language/API overhead to be pretty inconsequential in most of my projects. When my DB access is slow, I have found it is almost always a problem with my query, or the DB is just overloaded. There is one exception I can think of, and that's doing bound controls in VB through ODBC. I haven't done bound controls in VB for years and years, so I don't even know if it's still in the language, but they're crappy and you shouldn't use them. When I'm using a RAD tool like VB etc. I *never* use design-level tools to handle my DB IO. When I write a client-side app, I manually handle my own DB IO and let the GUI simply handle the user interface. Nico Callewaert wrote: Hi, I was wondering what is the best database connectivity and programming language to get the best performance with PostgreSQL. I'm currently working with Delphi, but I don't know if that is the best way to go?
[GENERAL] another problem with stored procedures
I have just encountered another problem. I am not sure if it is with my code, or with how I am working with Postgres/pgAdmin III. Here is another function, as created using the wizard/dialog box in pgAdmin III for creating functions:

    CREATE FUNCTION "People".get_pw(ea "varchar") RETURNS "varchar" AS
    $BODY$
        SELECT pword FROM "People".uids WHERE email_address = ea;
    $BODY$
    LANGUAGE 'sql' VOLATILE;

When I click OK to indicate that I am finished, I get an error message saying there is no column called "ea". Of course I know that; that is because it is a function parameter instead. What I don't understand is why pgAdmin would not put the "IN" qualifier on the function's only parameter, or why Postgres would think ea is a column when the code clearly identifies it as a function parameter. (BTW: replacing 'sql' with 'plpgsql' has no effect, except that the error message is even less informative.) Any ideas? What I am after is a simple select procedure returning the contents of pword in the record where the contents of email_address are the same as the contents of the parameter ea. I figure that if the result set returned to the calling Java/JDBC code is empty, the email address offered does not exist in the database, and that, if there is one record, I'll compare the string value returned with the password offered by the user in order to authenticate the user. Then, if authentication succeeds, I'll query a different database to see what resources the user is authorized to use. I have, through JDBC, used function calls that end up submitting something like the following to the RDBMS back end: SELECT pword FROM "People".uids WHERE email_address = 'ea_value'; and these all worked fine. It was just a little tedious to concatenate the various strings so that the contents of the SQL statement string looked like the above statement. I can't see a reason why I'd have trouble transforming the above select statement into a stored function. BTW: I know I can do this my old way of using prepared statements with JDBC and Java, but I read that I can make my distributed application more secure by putting all my SQL into stored, parameterized procedures. What are the SQL-related attacks that a web application is vulnerable to, and how effective is the approach of placing all my SQL into stored procedures at countering them? Are prepared statements any more, or less, useful in making a distributed application more secure? Of course, I'd have validation code on both the client side and within my servlet that processes user data. After all, I have bitten the bullet to learn about stored procedures and functions precisely because of my studies of ways to make distributed applications secure. Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
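For the record, the likely cause of the "no column called ea" error is that in SQL-language functions of this vintage, parameters cannot be referenced by name; the name in the signature is documentation only, and the body must use the positional form $1, $2, ... (plpgsql, by contrast, does resolve parameter names). A sketch of the same function with that one change (STABLE instead of VOLATILE is an optional improvement for a read-only lookup):

    CREATE FUNCTION "People".get_pw(ea varchar) RETURNS varchar AS
    $BODY$
        SELECT pword FROM "People".uids WHERE email_address = $1;
    $BODY$
    LANGUAGE sql STABLE;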
[GENERAL] Final stored procedure question, for now anyway
OK, I finally have Postgres accepting my stored procedure. Here is the relevant data in the history window of pgAdmin III's Query tool:

    -- Executing query:
    CREATE OR REPLACE FUNCTION addEntity (fn IN VARCHAR, ln IN VARCHAR, ivar IN VARCHAR,
                                          hi IN VARCHAR, pw IN VARCHAR, ea IN VARCHAR,
                                          ad IN VARCHAR, c IN VARCHAR, p IN VARCHAR,
                                          co IN VARCHAR, pc IN VARCHAR)
    RETURNS void AS $$
    DECLARE
        varID INTEGER;
    BEGIN
        SELECT INTO varID uid FROM uids WHERE email_address = ea;
        IF varID IS NOT NULL THEN
            INSERT INTO addy (uid, address, city, province, country, postal_code)
            VALUES (varID, ad, c, p, co, pc);
        ELSE
            INSERT INTO uids (family_name, first_name, initials, hid, pword, email_address)
            VALUES (ln, fn, ivar, hi, pw, ea);
            INSERT INTO addy (uid, address, city, province, country, postal_code)
            VALUES (currval('uids_uid_seq'), ad, c, p, co, pc);
        END IF;
    END;
    $$ LANGUAGE 'plpgsql' VOLATILE;

    Query returned successfully with no result in 0 ms.

So I now go into pgAdmin III and take a look, and I can't find it. When I execute the above SQL, where will Postgres store the function? I see, in pgAdmin's main window, several places where functions are stored, but none of them contain my function. It must be stored somewhere, since I get an error saying the function already exists if I attempt to modify it slightly and re-run it. For example, I noticed I used the wrong sequence in the last INSERT statement. To correct that, I had to add "OR REPLACE" after "CREATE" to make the correction. I am puzzled, though, that I can't find it when I go back to the pgAdmin main window and search through the whole database (after pressing F5 to refresh the contents of the windows). Thanks to all who helped me on this matter. Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
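When a function seems to have vanished, a direct catalog query will show which schema actually received it; note that addEntity, written without double quotes, is folded to the lower-case name addentity. A minimal lookup:

    SELECT n.nspname AS schema, p.proname AS function
      FROM pg_proc p
      JOIN pg_namespace n ON n.oid = p.pronamespace
     WHERE p.proname LIKE '%entity%';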
Re: [GENERAL] sending mail from Postgres
- Original Message - From: Vivek Khera [EMAIL PROTECTED] To: PG-General General pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 11:48 AM Subject: Re: [GENERAL] sending mail from Postgres On Dec 28, 2005, at 8:28 AM, Christopher Browne wrote: Some separate process (perhaps polling, perhaps using NOTIFY/LISTEN) would then grab messages from the table and submit them to [whatever is the communications layer]. That is a clean sort of design for that sort of thing. This is precisely how we build things. Throw in some good locking mechanisms and you can scale this to incredible levels. Here is a general question relating to this. The problem involves due diligence related to environmental protection. Consider a distributed application to support this. You have a facility with an environmental monitoring program. Samples are taken from soil, water and air and sent off-site for analysis. Each sample, when analyzed, results in a report sent to the facility's management for their records. However, if the concentration of some contaminant in a sample is above some threshold, a second report, containing the complete analysis results for the sample, is generated and sent to one or more people, both inside and outside the organisation (e.g. engineers within the organization responsible for fixing problems with the facility, and engineers at relevant regulatory agencies). One objective is to automate as much of the data management as possible and to ensure that, if a problem arises, everyone who needs to know about it is notified. The process has to be auditable, so that information about when each step in the process starts is stored in the database, as well as information about when messages are acknowledged (again automated - so when an engineer opens a message about a problem, an acknowledgement is sent to the database without his intervention). I suppose email might work as a means of sending messages, but I was thinking of Sun's JMS instead, working with triggers. I could then create my own thin client to display the reports, perhaps sorting them according to user-specified criteria. I can see how to do it within the web tier, or within the client tier (within the labs doing the analyses). The thing is, of the designs I have considered, the one involving triggers with JMS on the main supporting website (with a database back end) is the simplest in terms of deployment, since all interested parties could interact with the application through the internet using a very thin client (perhaps even with an applet within a web page), and I would not need to worry about deploying software to all relevant people/sites. If you faced this problem, what would you do, and why? Cheers, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
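A minimal sketch of the queue-table-plus-NOTIFY pattern Christopher and Vivek describe, with all names invented for illustration: reports that cross the threshold are inserted into a queue table, a trigger wakes the dispatcher (a JMS bridge, a mailer, etc.) with NOTIFY, and the acknowledgement column provides the audit trail:

    CREATE TABLE outbound_alerts (
        alert_id  serial PRIMARY KEY,
        sample_id integer   NOT NULL,
        body      text      NOT NULL,
        queued_at timestamp NOT NULL DEFAULT now(),
        acked_at  timestamp             -- filled in when the recipient opens the message
    );

    CREATE FUNCTION notify_alert() RETURNS trigger AS $$
    BEGIN
        NOTIFY outbound_alerts;  -- wake any process that has run LISTEN outbound_alerts
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER alerts_notify AFTER INSERT ON outbound_alerts
        FOR EACH ROW EXECUTE PROCEDURE notify_alert();

The dispatcher should still poll the table on connect (and periodically), since NOTIFY only signals that something changed; the rows themselves are the queue.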
Re: [GENERAL] Final stored procedure question, for now anyway
Did you find a fix for it? - Original Message - From: Jonel Rienton To: 'Ted Byers' ; pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 1:24 PM Subject: RE: [GENERAL] Final stored procedure question, for now anyway It's because of the mixed case you're using in the function name; I've had this issue last week myself and it seems to be the culprit. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Ted Byers Sent: Wednesday, December 28, 2005 11:13 AM To: pgsql-general@postgresql.org Subject: [GENERAL] Final stored procedure question, for now anyway [snip - the original post, including the full function definition, is quoted above]
Re: [GENERAL] Final stored procedure question, for now anyway
Well, I just ran another test, and the result indicates that the cause of the problem lies elsewhere. I replaced the original function name with "add_entity" and the end result is the same. The function gets stored somewhere, but it is not displayed anywhere by pgAdmin. Any suggestions? - Original Message - From: Jonel Rienton To: 'Ted Byers' ; pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 4:16 PM Subject: Re: [GENERAL] Final stored procedure question, for now anyway Yup, just use lowercase all the time. From: Ted Byers [mailto:[EMAIL PROTECTED]] Sent: Wednesday, December 28, 2005 1:45 PM To: Jonel Rienton; pgsql-general@postgresql.org Subject: Re: [GENERAL] Final stored procedure question, for now anyway [snip - the rest of the thread, including the full function definition, is quoted above]
Re: [Bulk] Re: [GENERAL] Final stored procedure question, for now anyway
- Original Message - From: Frank L. Parks [EMAIL PROTECTED] To: pgsql-general@postgresql.org Sent: Wednesday, December 28, 2005 7:27 PM Subject: [Bulk] Re: [GENERAL] Final stored procedure question, for now anyway Do you refresh your database after you add the function? It will not show up until after you refresh. I didn't realise that refreshing the display in pgAdmin would not be enough. I did that several times with no luck. I then closed the connection to the database and then connected again, and voila! The functions appear, entirely in lower case, in the public schema in my EntityDatabase. This database has two schemas: public, which was there the moment I created the database, and People, which I created. Now, this begs more questions. My function references tables in the People schema, but it does not qualify the references to those tables. In my view, the function properly belongs to the People schema. 1) Can the function be simply moved from the public schema to the People schema, or do I need to delete the function and recreate it within the People schema? If the latter is true, how would I specify, within the SQL, that the function is to be added to the People schema instead of public? 2) Do I need to qualify my references to my tables in the function to identify the schema in which the tables exist, or will Postgres find the right tables with the function in its present form? If I have to further qualify the table references, what is the correct qualification of the table references within the SQL in the function definition? Thanks, Ted
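On question 1: if memory serves, from 8.1 onward there is ALTER FUNCTION ... SET SCHEMA (on 8.0 the fallback is DROP and re-CREATE with a schema-qualified name). On question 2: either schema-qualify the table references inside the body, or make sure "People" is on the search_path when the function runs. A sketch, assuming the eleven-varchar signature shown earlier in the thread:

    -- move the function (8.1 or later; the argument list must match the signature)
    ALTER FUNCTION public.add_entity(varchar, varchar, varchar, varchar,
                                     varchar, varchar, varchar, varchar,
                                     varchar, varchar, varchar)
        SET SCHEMA "People";

    -- or create it in the right schema to begin with, qualifying the tables:
    --   CREATE OR REPLACE FUNCTION "People".add_entity(...) ...
    --   INSERT INTO "People".uids (...) VALUES (...);

    -- alternatively, put the schema first on the search path:
    SET search_path TO "People", public;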
[GENERAL] Stored procedure
I am learning how to create stored procedures using pgAdmin and Postgres. I have a couple of questions. 1) For all of my tables (admittedly not many, yet), I created columns with the type of integer to serve as indices. I have now learned (well, late last night) from your documentation that Postgres has additional types (serial, serial4, etc.) that are integers that are autoincremented. I tried, therefore, to change these columns to type serial, but received an error stating that type serial does not exist. Was the documentation I read mistaken, or is it a problem with pgAdmin, or did I make a mistake? Should I drop the columns in question and recreate them as having type serial (is that possible when the column is used as a foreign key in other tables)? 2) Suppose I have a lookup table with an autoincremented integer index column, used as a foreign key in a second table, and I want a stored procedure to insert data into the second table using the index from the first as a foreign key. Now, the stored procedure must: a) check the name passed for the second column of the first table to see if it exists there, and if not, insert it; b) whether the name provided for the second column had to be inserted or not, retrieve the index that corresponds to it; c) execute the insert into the second table using the index value retrieved from the first as the value for the foreign key column in the second table. Doing all this in Java or C++ is trivial, and I have done so when using a database that didn't have stored procedures, but it isn't clear to me how to do this using only SQL inside a stored procedure. I have just learned this morning that MySQL would allow the following inside a stored procedure:

    INSERT INTO foo (auto,text) VALUES (NULL,'text');             # generate ID by inserting NULL
    INSERT INTO foo2 (id,text) VALUES (LAST_INSERT_ID(),'text');  # use ID in second table

I have yet to figure out how to modify this to verify that 'text' isn't already in foo, and return its index for use in foo2 if it is, but that's another question (I'd want the values in the second column in foo to be unique). But I am curious to know if Postgres has something equivalent to "LAST_INSERT_ID()". Can one embed the first insert above in an if/else block inside a stored procedure, placing the index in a variable that has scope local to the procedure, and use that variable in the second insert? Thanks, Ted R.E. (Ted) Byers, Ph.D., Ed.D. R & D Decision Support Software http://www.randddecisionsupportsolutions.com/
Re: [GENERAL] Stored procedure
Hi Jaime, Thanks. I'd suggest the manual be edited by an educator, since it is a little dense for someone coming to it for the first time. Once I read your reply to me, and reread the manual, I understood. But on first reading, it is a little too dense and short on examples. Regarding serial: I now learned (well, late last night) in your documentation that Postgres has additional types (serial, serial4, etc.) that are integers that are autoincremented. serial is not a type; it is shorthand for integer with a default expression that retrieves the next value in a sequence... I tried, therefore, to change these columns to type serial but received an error stating that type serial does not exist. How did you try? What was the exact error you received? I tried:

    ALTER TABLE People.addy ALTER COLUMN aid TYPE serial

and the error I received is:

    ERROR: type serial does not exist

I understand this now, but it seems pgAdmin creates the illusion that serial can be treated like a genuine type by including serial along with all the other types in the drop-down list used to set the type when creating a new column. Regarding autoincrement: INSERT INTO foo (auto,text) VALUES (NULL,'text'); # generate ID by inserting NULL and this of course is bad... if I insert NULL I want the NULL to be inserted. In programming in C++, I often pass a value of null or 0 as an argument to a function; this is done to use the null value as a flag to control the behaviour of the function at a very fine degree of granularity. This is a commonly used and powerful idiom in C++ programming. It is curious, though, that on thinking about this, I have not used this idiom nearly as much when I am programming in Java. I can't explain why. The SQL-standard way of doing things is omitting the auto-incremental field altogether: INSERT INTO foo (text) VALUES ('text'); INSERT INTO foo2 (id,text) VALUES (LAST_INSERT_ID(),'text'); # use ID in second table INSERT INTO foo2 (id, text) VALUES (currval('seq'), 'text') On reading more about this, my inclination was to do as you did. However, I would point out that the sample code I showed here was taken directly from the MySQL reference manual. If it matters, I can provide the precise location in the manual. I guess some folk disagree with you about how good or bad it is. I'll reserve judgement until I have more experience working with databases. Assuming I have set up a sequence called 'seq', and set the default value of id in foo to be nextval('seq'), then the following is getting close to what I need (there seems to be only one thing left - what do I replace the question mark with in order to get the id value from the initial select and pass it to the insert in the first block):

    if exists (select id from foo where x = text) then
        INSERT INTO foo2 (foo_id, foo2_text) VALUES (?, more_text)
    else
        INSERT INTO foo (text) VALUES ('text')
        INSERT INTO foo2 (foo_id, foo2_text) VALUES (currval('seq'), more_text)
    end if;

The second block of the conditional statement looks like it ought to properly handle inserting new data into foo, autoincrementing id in foo, and providing the value of id to the insert into foo2. However, for the first block, there is no way to know where 'text' is located in the table, so it is necessary to get the value of id from the SQL statement used as the argument for exists() and pass it to the insert into foo2 (where the question mark is located). Thanks for your time. Ted
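In plpgsql, the question mark resolves naturally: SELECT INTO captures the id, and the FOUND flag drives the branch. A sketch following the names in the pseudocode above (the lookup column is called x here, and the sequence 'seq', as in the example; the function name and parameters are invented):

    CREATE OR REPLACE FUNCTION add_foo2(p_text text, p_more text) RETURNS void AS $$
    DECLARE
        v_id integer;
    BEGIN
        SELECT id INTO v_id FROM foo WHERE x = p_text;
        IF FOUND THEN
            -- the value already exists; reuse its id
            INSERT INTO foo2 (foo_id, foo2_text) VALUES (v_id, p_more);
        ELSE
            -- new value; insert it and pick up the generated id
            INSERT INTO foo (x) VALUES (p_text);
            INSERT INTO foo2 (foo_id, foo2_text) VALUES (currval('seq'), p_more);
        END IF;
    END;
    $$ LANGUAGE plpgsql;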
Re: [GENERAL] To Postgres or not
To everyone, Thanks very much for your replies, and if these responses are any indication of Postgres quality then I'd be in good hands. Ted
[GENERAL] To Postgres or not
Hello Everyone, I am considering moving a product to Solaris/Linux and trying to find a good DB. Oracle is out due to cost, so as far as I know the only reasonable alternatives are Postgres, Codebase, or MySQL. Does anyone here have any experience using Codebase or MySQL? If I stick with a true RDBMS then Codebase is out. So that leaves Postgres and MySQL. I'm very used to all the comforts of Oracle, so I think Postgres stacks up better, but maybe some veterans could shed some light. In other words, is Postgres really that much better than MySQL, and/or the other way around? Also, in my searches I ran across a company called EnterpriseDB and another like it. Basically they offer Postgres support. So I'm a little concerned that I'm just robbing Peter to pay Paul. Thanks
Re: [GENERAL] LISTEN/NOTIFY for lightweight replication
Richard, Thanks for the response. I'll look into both the dblink and iirc. Do you know of any extended examples of either? --Ted --- Richard Huxton [EMAIL PROTECTED] wrote: Ted Shab wrote: Hi, I'm trying to come up with a relatively simple multi-master replication solution. This is for multiple databases that need to be discrete, and change relatively infrequently (10-30 updates an hour), and almost never update each other's data (less than once a day). The TCL-based replication project for multi-master is troublesome to configure and seems to really impact performance. It can be assumed that the master-slave setup will not work for me, nor do we want to purchase a commercial solution, nor can we run this all from one central database. e. If there is a field-level conflict, raise an exception (TBD). Exception handling and failure recovery are what makes for all the work in replication. I don't think a pure listen/notify setup will be enough because iirc the system doesn't guarantee delivery of multiple notifications if >1 are queued. Have you looked into the possibility of using dblink to handle updates of each others' data? That would mean your problem reverting to one of single-master replication. -- Richard Huxton Archonet Ltd
Re: [GENERAL] LISTEN/NOTIFY for lightweight replication
Thanks. I was thinking iirc was the transport protocol :-) Looks like dblink is the best bet here. --Ted --- Richard Huxton [EMAIL PROTECTED] wrote: Ted Shab wrote: Richard, Thanks for the response. I'll look into both the dblink and iirc. Do you know of any extended examples of either? dblink is in the contrib/ folder of the source distribution, and possibly in your packaged version if you use such a thing. Never needed it myself, but the documentation looks clear enough. As for listen/notify possibly dropping duplicate notifications... Ah! It's in the SQL COMMANDS reference part of the manuals: NOTIFY behaves like Unix signals in one important respect: if the same condition name is signaled multiple times in quick succession, recipients may get only one notify event for several executions of NOTIFY. So it is a bad idea to depend on the number of notifies received. Instead, use NOTIFY to wake up applications that need to pay attention to something, and use a database object (such as a sequence) to keep track of what happened or how many times it happened. -- Richard Huxton Archonet Ltd
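For an extended example along the lines Ted asked about, a rough dblink session looks like this (contrib/dblink; the connection string, table, and values are placeholders):

    -- open a named connection to the peer database
    SELECT dblink_connect('peer', 'host=dbhost2 dbname=appdb user=repl');

    -- push a local change to the peer
    SELECT dblink_exec('peer', 'UPDATE widgets SET qty = 42 WHERE widget_id = 7');

    -- read remote rows back; a column definition list is required
    SELECT * FROM dblink('peer', 'SELECT widget_id, qty FROM widgets')
             AS t(widget_id integer, qty integer);

    SELECT dblink_disconnect('peer');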