Re: [GENERAL] Incorrect FTS result with GIN index
I tested the same backup on our CentOS 5.4 virtual machine (running on a Xen server) and the results are really weird (118 rows, compared to 116 on Win XP and 123 expected):

Aggregate  (cost=104.00..104.01 rows=1 width=0) (actual time=120.373..120.374 rows=1 loops=1)
  ->  Bitmap Heap Scan on search_tab  (cost=5.35..103.93 rows=25 width=0) (actual time=59.418..120.137 rows=118 loops=1)
        Recheck Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ '''dd'':*'::tsquery))
        ->  Bitmap Index Scan on idx_keywords_ger  (cost=0.00..5.34 rows=25 width=0) (actual time=59.229..59.229 rows=495 loops=1)
              Index Cond: ((to_tsvector('german'::regconfig, keywords) @@ '''ee'':*'::tsquery) AND (to_tsvector('german'::regconfig, keywords) @@ '''dd'':*'::tsquery))
Total runtime: 120.670 ms

And here are the configuration details:

PostgreSQL: postgresql84-server-8.4.4-1.el5_5.1

# uname -r
2.6.18-164.15.1.el5xen

# cat /etc/redhat-release
CentOS release 5.4 (Final)

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU 5140 @ 2.33GHz
stepping        : 6
cpu MHz         : 2333.416
cache size      : 4096 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall lm constant_tsc pni cx16 lahf_lm
bogomips        : 5835.83
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

Oleg Bartunov wrote:
> Artur, I don't know, but could you try a Linux machine?
> Oleg

--
View this message in context: http://old.nabble.com/Incorrect-FTS-results-with-GIN-index-tp29172750p29212116.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Incorrect FTS result with GIN index
The CentOS used for testing is a 64-bit version.

Artur Dabrowski wrote:
> I tested the same backup on our CentOS 5.4 virtual machine (running on a Xen server) and the results are really weird (118 rows, compared to 116 on Win XP and 123 expected).
Re: [GENERAL] index scan and functions
In article <20100719162547.ga17...@localhost>, arno <a...@renevier.net> writes:
> Thanks, that's exactly what I was looking for.

No, I'd say you're looking for the ip4r package, which provides an indexable IP address range type.
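For anyone finding this in the archives, here is a rough sketch of what ip4r usage can look like. This is my own illustration, not from the thread: the table and column names are made up, and the exact operator set and installation procedure may vary by ip4r version, so check the package's documentation.

```
-- Sketch: an indexable IP range column using the ip4r type.
-- The >>= operator ("contains or equals") is GiST-indexable.
CREATE TABLE acls (
    name  text,
    range ip4r
);

CREATE INDEX acls_range_idx ON acls USING gist (range);

INSERT INTO acls VALUES ('office', '192.168.1.0/24');

-- Finds ranges containing a given address, able to use the index:
SELECT name FROM acls WHERE range >>= '192.168.1.42';
```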
Re: [GENERAL] Incorrect FTS result with GIN index
Artur, I recommend posting your problem to the -hackers mailing list. I have no idea what the problem could be. My machine is:

uname -a
Linux mira 2.6.33-020633-generic #020633 SMP Thu Feb 25 10:10:03 UTC 2010 x86_64 GNU/Linux

PostgreSQL 8.4.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1, 64-bit

As a last resort I recommend you compile pg yourself and see if the problem persists.

Oleg

On Tue, 20 Jul 2010, Artur Dabrowski wrote:
> I tested the same backup on our CentOS 5.4 virtual machine (running on a Xen server) and the results are really weird (118 rows, compared to 116 on Win XP and 123 expected). [...]

Regards,
Oleg
_____
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [GENERAL] Create table if not exists ... how ??
Thanks guys.

Joe, I tried (and learned!) from your syntax. I didn't have the plpgsql language installed, but I googled it and figured that part out. There was an issue with using your way, though: the constraints relation also needs to be considered. If a constraint key already exists for any other table (not necessarily the table we are creating), we get an error, which won't be covered by the count.

So I finally tried Scott's way, because it catches an exception, and I believe the constraint-key exception is included there. Although I am not sure, because he is catching a duplicate_table exception? What is the most generic exception in Postgres? Like Throwable in Java?
http://www.postgresql.org/docs/8.1/interactive/errcodes-appendix.html

create or replace function create_table_if_not_exists (create_sql text) returns bool as $$
BEGIN
  BEGIN
    EXECUTE create_sql;
  EXCEPTION WHEN duplicate_table THEN
    RETURN false;
  END;
  RETURN true;
END;
$$ LANGUAGE plpgsql;

SELECT create_table_if_not_exists('CREATE TABLE post_codes
(
  area character varying(10) NOT NULL,
  district character varying(10) NOT NULL,
  sector character varying(10) NOT NULL,
  CONSTRAINT post_codes_pkey PRIMARY KEY (area, district, sector)
) WITH (OIDS=FALSE);
ALTER TABLE post_codes OWNER TO postgres;');

Thank you all,
Jen
Re: [GENERAL] New DB-design - help and documentation pointers appreciated
On Sat, Jul 3, 2010 at 03:51, Craig Ringer <cr...@postnewspapers.com.au> wrote:
> [...] You've hit one of those areas where SQL databases kind of suck. You'll have to use one of the well understood workarounds like EAV and live with their limitations, or find a database better suited to the data.

Thanks for the feedback, Craig. After careful consideration I have decided to ditch the idea of heavily using parameters and go back to my initial idea of having a simple components index instead. What I've come up with is this structure:

-- Logos, etc.; users should be able to upload photos of the components
CREATE TABLE images (
  image_filename varchar(32) NOT NULL,
  image_filesize integer DEFAULT 0,
  image_uploaded timestamp with time zone DEFAULT (localtimestamp),
  image_id serial PRIMARY KEY
);

CREATE TABLE manufacturers (
  manufacturer_name varchar(32) NOT NULL,
  manufacturer_url varchar(32),
  manufacturer_logo integer REFERENCES images(image_id),
  manufacturer_id serial PRIMARY KEY
);

-- E.g. transistors, diodes, amplifiers
CREATE TABLE categories (
  category_name varchar(32) NOT NULL,
  category_id serial PRIMARY KEY
);

-- E.g. PNP, NPN, JFET
CREATE TABLE subcategories (
  subcategory_name varchar(32) NOT NULL,
  subcategory_category integer REFERENCES categories(category_id),
  subcategory_id serial PRIMARY KEY
);

-- PDIP, SO, QFN, etc.
CREATE TABLE packages (
  package_name varchar(32) NOT NULL,
  package_image integer REFERENCES images(image_id),
  package_id serial PRIMARY KEY
);

CREATE TABLE users (
  user_name varchar(32) NOT NULL,
  user_password varchar(32),
  user_id serial PRIMARY KEY
);

CREATE TABLE datasheets (
  datasheet_filename varchar(32) NOT NULL,
  datasheet_filesize integer DEFAULT 0,
  datasheet_uploaded timestamp with time zone DEFAULT (localtimestamp),
  datasheet_id serial PRIMARY KEY
);

CREATE TABLE components (
  component_name varchar(32) NOT NULL,
  component_manufacturer integer REFERENCES manufacturers(manufacturer_id),
  component_category integer REFERENCES categories(category_id),
  component_subcategory integer REFERENCES subcategories(subcategory_id),
  component_package integer REFERENCES packages(package_id),
  component_pincount smallint,
  component_owner integer REFERENCES users(user_id),
  component_image integer REFERENCES images(image_id),
  component_datasheet integer REFERENCES datasheets(datasheet_id),
  component_comment text,
  component_scrap boolean DEFAULT FALSE,
  component_id serial PRIMARY KEY
);

The same kind of component can have different manufacturers; that's why I'm not putting hard constraints anywhere (like UNIQUE on component_name). Some examples that I want to be able to store:

Different manufacturers and packaging:
  Fairchild 74LS14 DIP14 (hole-mounted IC with 14 pins)
  National 74LS14 SO14 (surface-mounted IC with 14 pins)

Different subcategories / packages:
  BC547, category transistors, subcategory NPN, package TO-92
  BC547, category transistors, subcategory NPN, package TO-220
  BC557, category transistors, subcategory PNP, package TO-92

and so forth. The point is that one component name can exist in many different flavors. I have not yet come up with a definite solution for how the subcategories will be implemented, so this is still just a draft.

However, I feel that this design is the same design I seem to use for all my databases, and in the end I always find that I designed them wrong from the beginning. The components table feels like it is going to be locked into a corner; it seems too fixed and not as flexible as I want this database to be. In the future I will probably want to add more relations without having to make zillions of ugly patches to the design. I would therefore appreciate any feedback on this table structure and how I can improve it even further. Thanks.

--
- Rikard
[GENERAL] what do i need to know about array index?
hi

I have a table like this:

CREATE TABLE test (
  id bigint,
  tag boolean[]
) WITH (OIDS=FALSE);

This is a parent table of some partition tables. One table contains 1 million entries (well, in the future it should; currently it is only filled with test data). The array is used to 'tag' a row, which means every column of the array has some meaning that can be turned on and off. The tag's topic is created by a user, and the rows are tagged (turned on and off) using some PHP scripts.

Now I want to query e.g. table test_1, which has 1 million entries. One problem I have to mention is NULL values: an entry can have 3 values (t, f, NULL), but my application expects only 2 values, so I modified my queries so that tag[n]=true equals NOT (tag[n]=false OR tag[n] IS NULL). The problem was that complex queries just ignored NULL values instead of counting them as false, which makes sense to me.

My index problem: e.g. the query

  select id from test_1 where NOT (tag[4]=false OR tag[4] IS NULL);

doesn't use the index

  create index idx_test_1 on test((NOT (tag[4]=false OR tag[4] IS NULL)));

Since my index can contain combinations of many tag[] columns, it can look like this:

  create index idx_test_2 on test_1((
    ((NOT (tag[1]=false OR tag[1] IS NULL)) AND (NOT (tag[2]=false OR tag[2] IS NULL)))
    AND NOT
    ((NOT (tag[1]=false OR tag[1] IS NULL)) AND (NOT (tag[3]=false OR tag[3] IS NULL)))
  ));

which also doesn't get used by the select:

  SELECT id FROM test WHERE
    ((NOT (tag[1]=false OR tag[1] IS NULL)) AND (NOT (tag[2]=false OR tag[2] IS NULL)))
    AND NOT
    ((NOT (tag[1]=false OR tag[1] IS NULL)) AND (NOT (tag[3]=false OR tag[3] IS NULL)));

VACUUM ANALYZE after CREATE INDEX has been done, and I'm using 8.4.

o) Is there some obvious bug in my usage so the index can't be used? What am I missing here?
o) How does the planner know that a seq scan is faster?

Since I'm not sure if this is just a feature which is not available (like the IS NULL index in 8.2) or I'm doing something wrong, I'm a bit stuck here.

o) And for those who have a great idea or suggestion for alternative solutions, please tell me. Thx
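One alternative worth considering here (my own suggestion, not an answer from the thread) is a partial index: rather than indexing the boolean expression as a stored value, index only the rows for which the expression is true. The planner can then match the query's WHERE clause against the index predicate directly:

```
-- Sketch: a partial index covering exactly the rows where tag[4] is "on".
-- For the planner to use it, the query's WHERE clause must imply the
-- index predicate, so keep the two expressions textually identical.
CREATE INDEX idx_test_1_tag4 ON test_1 (id)
  WHERE NOT (tag[4] = false OR tag[4] IS NULL);

-- This predicate matches the index predicate, so the partial index is
-- a candidate (whether it is chosen still depends on selectivity):
SELECT id FROM test_1 WHERE NOT (tag[4] = false OR tag[4] IS NULL);
```

A partial index like this is also much smaller than a full expression index, since it only contains the tagged rows.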
Re: [GENERAL] Full Text Search ideas
It's doable, but requires a lot of work. We need support for this.

Oleg

On Sun, 18 Jul 2010, Howard Rogers wrote:

I asked recently about a performance problem I'd been having with some full text queries, and got really useful help that pointed me to the root issues. Currently, I'm trying to see if our document search (running on Oracle Text) can be migrated to PostgreSQL, and the reason I asked that earlier question points to a fundamental design issue we'll have with PostgreSQL that doesn't affect us in Oracle (not, I hasten to add, that that means Oracle is better/right-er/whatever. It's just different, but the difference will cause us a problem).

Consider the following example (which is just one of 40-odd I could have picked). Some of our documents are in panoramic format, for example. But not many (say, 30,000 out of 10,000,000). We have a flag for 'panoramic', called 'sb12'. It's either 'y' or 'n' for any document. So a search for 'sb12n' (find me all documents which are not panoramic) is logically the same as a search for 'not sb12y'.

However, 95% or more of documents will be an sb12n, because hardly any documents are panoramic in the first place. So, although the numeric outcome of 'sb12n' and 'not sb12y' will always be the same, you would have to check the entire table to find which ones are 'sb12n' (because most documents are marked that way), whereas you'd only have to check 5% of the records to find 'sb12y', because so few are marked that way.

But in Oracle Text, this doesn't seem to happen:

SQL> select count(*) from search_digital_rm where contains(textsearch,'bat and sb12n')>0;

  COUNT(*)
----------
      3040

Elapsed: 00:00:00.10

SQL> select count(*) from search_digital_rm where contains(textsearch,'bat not sb12y')>0;

  COUNT(*)
----------
      3040

Elapsed: 00:00:00.06

In both cases, the same number of records are returned. But, within a margin of error, the time taken to do each test is about the same. Even though the first test must be matching 'sb12n' for many millions of records, it's taking not much longer than the search for 'sb12y', which can only match about 90,000. It would seem (I can't tell from the explain plan itself) as though what's happened is that the set of 'bat' records has been fetched first (about 8000 in all). For so few records, whether you're looking for sb12y or sb12n then becomes mostly irrelevant for timing purposes, and hence the duration equivalence of both queries.

This is not what happens in PostgreSQL, however (as I now know, thanks to the help on my question from a couple of days ago):

ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english','bat & sb12n');
 count
-------
  3849
(1 row)

Time: 408.962 ms

ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english','bat & !sb12y');
 count
-------
  3849
(1 row)

Time: 11.533 ms

Now, one test takes about 40 times longer than the other, though the one taking just 11 ms is as fast as Oracle can manage (impressive, considering I've done absolutely nothing to tune this PostgreSQL testbed as yet!). Logically equivalent the two tests may be, but hunting through lots of sb12n records and working out which are related to bats is apparently a lot slower than finding things the other way around, it would seem.

I'm wondering firstly if there's any way I can configure PostgreSQL FTS so that it produces the sort of results we've gotten used to from Oracle, i.e., where search speeds do not go up wildly when a 'search term' is applied that happens to be used by the vast majority of document records. (For example, we currently allow searches for file types, where 80% of documents would be Word documents, another 19% would be PDFs, and the remaining 1% of documents could be pretty much anything else! We can't have people searching for 'definitely want only Word documents' if that means matching 8 million records and search speeds shoot to the stratosphere as a result.)

Secondly, I'm open to any suggestions as to how you would organise things or re-write the SQL so that the attribute filter is only applied to the small subset of records which match the relevant real word search term, if that's what's needed here. In other words, is my best bet in the earlier examples to fetch *all* bat records, and then nest that query within an outer query that adds a test for a separate attribute column? Or is something else called for here?

Would appreciate any thoughts on the subject!

Regards
HJR
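One direction worth exploring for the second question (my own suggestion, not something proposed in the thread): keep the nearly-always-true attribute flags out of the tsvector entirely and model them as ordinary columns. The GIN index then only has to match the rare, selective word, and the flag becomes a cheap row filter applied to those few matches, which is essentially the "fetch all bat records first" plan the poster describes. A sketch, assuming a hypothetical panoramic flag:

```
-- Sketch: replace the sb12y/sb12n token with a boolean column
-- (column name is illustrative; the thread's table is search_rm).
ALTER TABLE search_rm ADD COLUMN panoramic boolean NOT NULL DEFAULT false;

-- The GIN index narrows the candidates to rows matching 'bat'; the
-- boolean test is then evaluated only on those rows, so it costs the
-- same whether the flag is rare or overwhelmingly common:
SELECT count(*)
FROM search_rm
WHERE to_tsvector('english', textsearch) @@ to_tsquery('english', 'bat')
  AND NOT panoramic;
```

This avoids ever building the huge posting list for a token like 'sb12n' that appears in 95% of documents.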
Re: [GENERAL] How to distribute quantity if same product is in multiple rows
(anonymous) wrote:
> An order contains the same product in multiple rows. I tried to calculate the undelivered quantity using the script below, but it produces a wrong result: the delivered quantity is subtracted from both rows, not distributed. How to distribute the undelivered quantity according to the row quantity in every row? Can it be done using SQL, or should a SCAN loop in plpgsql be used? [...]

It can be done in SQL:

  SUM(kogus) OVER (PARTITION BY toode ORDER BY id) - kogus  (*1)

will give you the running sum of the product up to that row. You can then subtract that value from the delivered quantity to calculate the delivered quantity for the current row.

But doing so automatically is probably bad. For example, if a user has a purchase order with one position of two pieces and one position of four, it is very likely that when a shipment of four pieces arrives, the latter position shall be marked as delivered. So I would leave the decision to the user.

Tim

(*1) In PostgreSQL 9.0, you might be able to use ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING instead (untested).
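The running-sum idea above can be sketched as a complete query. This is my own illustration: the table name order_rows and the delivered total of 5 are invented for the example; the column names follow the thread (kogus = quantity, toode = product).

```
-- Sketch: distribute a delivered total of 5 units across the order rows
-- of each product. prior_qty is the running sum of the earlier rows, so
-- each row's delivered share is clamped between 0 and its own quantity.
SELECT id, toode, kogus,
       GREATEST(0, LEAST(kogus, 5 - prior_qty)) AS delivered
FROM (
  SELECT id, toode, kogus,
         SUM(kogus) OVER (PARTITION BY toode ORDER BY id) - kogus AS prior_qty
  FROM order_rows
) s;
```

With rows of quantity 2 and 4 and a delivery of 5, this assigns 2 to the first row and 3 to the second, which is exactly the automatic behavior Tim warns may not match the user's intent.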
Re: [GENERAL] Create table if not exists ... how ??
On Tue, Jul 20, 2010 at 10:18:59AM +0100, Jennifer Trey wrote:
> What is the most generic exception in Postgres? Throwable in Java?

AFAIR, from programming Java many moons ago, you really don't want to go about catching the most general exception. The ThreadDeath exception, for instance, is derived from Error rather than Exception for this reason.

That said, maybe you want the magic exception type OTHERS, i.e.:

  EXCEPTION WHEN OTHERS THEN

PG doesn't have as flexible a hierarchy as Java, but a match is considered to have occurred up to the first zero in the error code. So you could also use syntax_error_or_access_rule_violation or transaction_rollback.

--
Sam  http://samason.me.uk/
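Putting Sam's two options side by side in the function from earlier in the thread, a sketch might look like this (my own illustration; whether you want the broad class condition or the catch-all depends on how much you trust the incoming DDL):

```
-- Sketch: catch a specific error class first, then fall back to OTHERS.
-- SQLSTATE and SQLERRM are available inside plpgsql exception handlers.
CREATE OR REPLACE FUNCTION run_ddl_if_possible(ddl text) RETURNS bool AS $$
BEGIN
  BEGIN
    EXECUTE ddl;
  EXCEPTION
    WHEN duplicate_table THEN          -- 42P07: relation already exists
      RETURN false;
    WHEN OTHERS THEN                   -- everything else plpgsql can trap
      RAISE NOTICE 'DDL failed: % (SQLSTATE %)', SQLERRM, SQLSTATE;
      RETURN false;
  END;
  RETURN true;
END;
$$ LANGUAGE plpgsql;
```

Note that WHEN OTHERS deliberately does not trap QUERY_CANCELED, which is roughly the ThreadDeath analogy Sam makes.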
[GENERAL] Need a better way to do my constraints
Hi Everyone:

v8.3.4 on Linux

I need to revamp the way I've done check constraints on a table. This is an example of the sort of thing I've done...

create table foo (
  col1 text,
  col2 text,
  constraint c1_constr check (col1 in ('yes','no')),
  constraint c2_constr check (validate_c2(col2) = 'OK')
);

...with validate_c2 having been declared...

create or replace function validate_c2 (text) returns text as $$
declare
  c2_csv alias for $1;
  c2_lst text[];
  x int;
begin
  c2_lst = string_to_array(c2_csv,',');
  for x in array_lower(c2_lst,1) .. array_upper(c2_lst,1) loop
    if c2_lst[x] not in ('red','orange','yellow','green','blue','violet') then
      return 'NO';
    end if;
  end loop;
  return 'OK';
end;
$$ language plpgsql;

As you can see, the constraint on col1 is a simple check that the value is in a list. But the constraint on col2 needs to check that each element in a CSV is in a list.

I'd like to have one table that contains all the valid values for both column constraints, and perhaps use a more sophisticated approach than the check constraints and plpgsql you see above. One of the reasons for this is better access to the valid values in error messages I return to the users upon their hitting a constraint violation. These error messages need to tell them what the valid values are. The desire for a single table renders moot the need to identify the correct constraint value table to read in this regard. Another reason for the table(s) containing the valid values is ease of modification of the lists (no need to alter the hard-coded constraint or change the stored proc).

All ideas will be rewarded with a Thank You, but the winner also gets a Very Much :-)
[GENERAL] INSERT RETURNING and partitioning
Hi all,

I've noticed that an insert command with a RETURNING clause returns an empty result set if done on a master table, while the same insert with RETURNING on the partitioned tables works correctly. Do you know if this is normal behavior? Am I doing something wrong? The partitioning works correctly with all kinds of SQL commands: select, delete, update and normal insert.

I've read the manual and it reports this:

  The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted. This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT.

but nothing about partitioning...

Regards,
Paolo

PS: I'm using PostgreSQL 8.3.4 on an x86 Linux box
Re: [GENERAL] INSERT RETURNING and partitioning
On 20 July 2010 14:42, pdov...@tiscali.it <pdov...@tiscali.it> wrote:
> I've noticed that an insert command with a RETURNING clause returns an empty result set if done on a master table, while the same insert with RETURNING on the partitioned tables works correctly. Do you know if this is normal behavior? Am I doing something wrong? [...] PS: I'm using PostgreSQL 8.3.4 on an x86 Linux box

Yes, that should work even on partitioned tables. Could you provide some sample SQL you're using?

Thom
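For context, a sketch of my own (table and function names invented) showing why the classic 8.3-style trigger-based partitioning setup produces exactly the symptom Paolo describes: the BEFORE trigger redirects the row into a child table and returns NULL, which suppresses the insert into the master itself, and RETURNING only reports rows actually inserted into the table named in the INSERT.

```
-- Sketch: minimal trigger-based partitioning reproduction.
CREATE TABLE measurements (id serial, val int);
CREATE TABLE measurements_p1 () INHERITS (measurements);

CREATE OR REPLACE FUNCTION route_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO measurements_p1 VALUES (NEW.*);
  RETURN NULL;  -- suppresses the row on the master table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_route BEFORE INSERT ON measurements
  FOR EACH ROW EXECUTE PROCEDURE route_insert();

-- The row lands in measurements_p1, but because the trigger returned
-- NULL, nothing was inserted into "measurements" itself and the
-- RETURNING clause yields zero rows:
INSERT INTO measurements (val) VALUES (1) RETURNING id;
```

Inserting directly into the child table, by contrast, returns the row as expected, which matches the behavior reported in the thread.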
Re: [GENERAL] Need a better way to do my constraints
On 20 Jul 2010, at 16:19, Gauthier, Dave wrote:
> I need to revamp the way I've done check constraints on a table. [...] The constraint on col1 is a simple check that the value is in a list. But the constraint on col2 needs to check that each element in a CSV is in a list. I'd like to have one table that contains all the valid values for both column constraints, and perhaps use a more sophisticated approach than the check constraints and plpgsql you see above.

I think your best bet would be to define an ENUM type with those values and store the CSV data as an array of that type. I think it would automatically reject any invalid values that way.

A foreign key constraint would be nicer to have, but I don't see any straightforward way to unnest your CSV data in such a way that you could apply one to it.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.
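The ENUM-plus-array suggestion could look something like the following sketch (my own illustration; the type name and example values mirror the original post):

```
-- Sketch: an enum type plus an array column. Invalid elements are
-- rejected automatically when the input is cast to the enum array.
CREATE TYPE colour AS ENUM ('red','orange','yellow','green','blue','violet');

CREATE TABLE foo (
  col1 text CHECK (col1 IN ('yes','no')),
  col2 colour[]
);

INSERT INTO foo VALUES ('yes', '{red,green}');  -- accepted
INSERT INTO foo VALUES ('yes', '{red,mauve}');  -- rejected: invalid input
                                                -- value for enum colour

-- Incoming CSV text can be converted with a cast:
--   string_to_array('red,green', ',')::colour[]
```

One trade-off versus the lookup-table idea in the original post: changing the allowed values means ALTER TYPE (or recreating the enum on 8.3), rather than inserting a row into a table.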
Re: [GENERAL] Incorrect FTS result with GIN index
Oleg,

thanks for your help. I sent a post to the pgsql-hackers list:
http://old.nabble.com/Query-results-differ-depending-on-operating-system-%28using-GIN%29-ts29213082.html

As to compiling pg... I will not do this, since I do not really feel comfortable doing it and cannot dedicate too much time to this problem.

Artur

Oleg Bartunov wrote:
> Artur, I recommend posting your problem to the -hackers mailing list. I have no idea what the problem could be. [...] As a last resort I recommend you compile pg yourself and see if the problem persists.
> Oleg
[GENERAL] Finding last checkpoint time
Is there a way to find the last checkpoint time via an SQL command? I know I can grep xlogs by turning on log_checkpoints, but I'd prefer an SQL solution.

--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz
Re: [GENERAL] Finding last checkpoint time
2010/7/20 Devrim GÜNDÜZ <dev...@gunduz.org>:
> Is there a way to find the last checkpoint time via an SQL command? I know I can grep xlogs by turning on log_checkpoints, but I'd prefer an SQL solution.

You can run pg_controldata /path/to/pgdata and look at "Time of latest checkpoint". I don't know of any other way, not to say there isn't one.

Thom
Re: [GENERAL] Finding last checkpoint time
Devrim GÜNDÜZ wrote:
> Is there a way to find the last checkpoint time via an SQL command? I know I can grep xlogs by turning on log_checkpoints, but I'd prefer an SQL solution.

Not directly. The best you can do without linking in new server code is either import the logs via CSV to get them into a table, or shell out and look at what comes out of pg_controldata.

There have been two works in progress to improve this situation that didn't make it through to commit yet. http://archives.postgresql.org/pgsql-patches/2008-04/msg00079.php tried to expose just that bit of data and was shot down for a number of reasons, from accuracy concerns to the UI used. The still-pending one at http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com just adds a SQL interface to what comes out of pg_controldata. You might prefer to use something like that rather than calling the shell to parse the utility's output, if you don't mind adding something new to the server code.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
[GENERAL] check_postgres reports most of my tables are unanalyzed for 8 weeks; why isn't autovacuum analyzing them?
Hi. I've just discovered the check_postgres utility and am running all the tests against my database. The last_analyze test comes out critical - many tables unanalyzed for 8 weeks. I am running PostgreSQL 8.4.4 with default autovacuum settings. I thought autovacuum was supposed to take care of running vacuum and analyze. Per last_analyze, 9 out of my 100 tables have been analyzed within the last 24 hours. Why aren't all my tables being analyzed? How do I troubleshoot this, please? Best, -at P.S. I checked the PostgreSQL logs, and in the last week, I just have three messages with vacuum in them: (I am running at default log levels) 2010-07-20 02:05:05 PDT ERROR: canceling autovacuum task 2010-07-20 02:05:05 PDT CONTEXT: automatic vacuum of table mydb.pg_catalog.pg_listener 2010-07-20 07:27:14 PDT ERROR: canceling autovacuum task 2010-07-20 07:27:14 PDT CONTEXT: automatic vacuum of table mydb._slony_cluster.sl_event 2010-07-20 07:27:53 PDT ERROR: canceling autovacuum task 2010-07-20 07:27:53 PDT CONTEXT: automatic vacuum of table mydb._slony_cluster.sl_log_2
[GENERAL] Re: check_postgres reports most of my tables are unanalyzed for 8 weeks; why isn't autovacuum analyzing them?
I just ran the last_autovacuum test of check_postgres, and it reported 7 (of my 100) tables have been autovacuumed more than 1 day ago; the oldest autovacuum time was 7 weeks ago. 8 more tables in pg_catalog were autovacuumed more than 1 day ago. Thanks, -at
[GENERAL] Re: check_postgres reports most of my tables are unanalyzed for 8 weeks; why isn't autovacuum analyzing them?
I am sorry for the multiple posts; I just noticed there are two tests for analyze: last_analyze and last_autoanalyze. last_autoanalyze matches last_autovacuum - 7 weeks ago. Aleksey
Re: [GENERAL] check_postgres reports most of my tables are unanalyzed for 8 weeks; why isn't autovacuum analyzing them?
On Tue, Jul 20, 2010 at 4:31 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Hi. I've just discovered the check_postgres utility and am running all the tests against my database. The last_analyze test comes out critical - many tables unanalyzed for 8 weeks. Have those tables been modified at all? This is only an issue if the database's understanding of the table is different from reality, AFAIK. P.S. I checked the PostgreSQL logs, and in the last week, I just have three messages with vacuum in them: (I am running at default log levels) 2010-07-20 02:05:05 PDT ERROR: canceling autovacuum task 2010-07-20 02:05:05 PDT CONTEXT: automatic vacuum of table mydb.pg_catalog.pg_listener 2010-07-20 07:27:14 PDT ERROR: canceling autovacuum task 2010-07-20 07:27:14 PDT CONTEXT: automatic vacuum of table mydb._slony_cluster.sl_event 2010-07-20 07:27:53 PDT ERROR: canceling autovacuum task 2010-07-20 07:27:53 PDT CONTEXT: automatic vacuum of table mydb._slony_cluster.sl_log_2 This is normal. Note that Slony abuses the listen/notify system, and autovacuum punts to avoid messing with user sessions. 9.0 has a much cleaner async notify mechanism so won't have this issue. merlin
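To see what autovacuum has actually been doing per table, the statistics views are usually enough; a sketch:

```sql
-- Per-table vacuum/analyze history and row-churn counters.
-- A NULL last_autoanalyze means autovacuum has never analyzed the table.
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;
```

Tables with few or no row modifications never cross the autoanalyze threshold, so an old last_autoanalyze on a static table is expected rather than a problem.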
[GENERAL] printing vector column seems to print the rest of the row too
I'm sorry if this isn't the appropriate place for this post, but I couldn't figure out where to put this and I didn't see anything online so far... I've been prototyping a small database, with the following table definition: drop table if exists tradeprices cascade; create table tradeprices ( universedate date not null, tradetimestamp timestamp not null, tradeprice real[] not null, primary key (universedate,tradetimestamp) ); When I insert the following row, insert into tradeprices (universedate,tradetimestamp,tradeprice) values ('2010-07-20','2010-07-20 14:31:02','{1076.09,84.1199,100.21,95.74,107.2822,105.2657,11.67,53.75,40.04,61.41,68.176,71.13,39.85,59.41,61.66,63.32,38.92,83.46,49.03,48.6999,108.21,34.18,54.1901,73.52,54.94,40.17,17.35,51.7,42,43.0195,52.419,41.7,41.8,55.57,35.14,28.31,19.17,18.0608,34.2}'); the results are printed as: select universedate,tradetimestamp,tradeprices from tradeprices where date(tradetimestamp) = CURRENT_DATE order by tradetimestamp desc; -[ RECORD 1 ]--+-- universedate | 2010-07-20 tradetimestamp | 2010-07-20 11:10:44 tradeprices| (2010-07-20,2010-07-20 11:10:44,{1076.09,84.1199,100.21,95.74,107.282,105.266,11.67,53.75,40.04,61.41,68.176,71.13,39.85,59.41,61.66,63.32,38.92,83.46,49.03,48.6999,108.21,34.18,54.1901,73.52,54.94,40.17,17.35,51.7,42,43.0195,52.419,41.7,41.8,55.57,35.14,28.31,19.17,18.0608,34.2}) where the tradeprices vector contains, in the print output only, the values of the previous columns in the row (see the first two elements of the tradeprices vector). I've tried connecting to the database and retrieving the data using psql, ODBC drivers, RPostgreSQL, etc., and the same results come back regardless of the client. I was originally working with 8.3, and have only recently updated to 8.4. Is this a bug, or something that I'm not doing correctly, or both? Respectfully, Jeff.
Jeff Hamann jeff.d.ham...@gmail.com
Re: [GENERAL] printing vector column seems to print the rest of the row too
Excerpts from Jeff Hamann's message of mar jul 20 17:59:01 -0400 2010: select universedate,tradetimestamp,tradeprices from tradeprices where date(tradetimestamp) = CURRENT_DATE order by tradetimestamp desc; You're selecting the whole row as a column, which is why you get a row back in the third position -- try tradeprice instead of tradeprices.
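In other words, the fix is one character of column name: tradeprices (same name as the table) references the whole row as a composite value, while tradeprice is the array column. Compare:

```sql
-- References the whole row as a composite value
-- (this is what the original query did by accident):
SELECT tradeprices FROM tradeprices LIMIT 1;

-- References only the array column:
SELECT tradeprice FROM tradeprices LIMIT 1;
```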
[GENERAL] Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock
Hi, I currently have a simple queue written on top of Postgres. Jobs are inserted and workers periodically check for jobs they can do, do them, and then delete the rows. pg_try_advisory_lock is used to (attempt to) stop two workers from doing the same job. (I'm working on moving to a real messaging queue right now; this is more a point of curiosity and education now.) Here is my queue table, CREATE TABLE queue ( id serial NOT NULL PRIMARY KEY, rcvd timestamp with time zone, sent timestamp with time zone, host character varying(32), job character varying(32), arg text ); Here is an example query, SELECT q.* FROM (SELECT id, job, arg FROM queue WHERE job = 'foo' OR job = 'bar' OFFSET 0) AS q WHERE pg_try_advisory_lock(1, q.id) LIMIT 10 (For information on OFFSET 0 see: http://blog.endpoint.com/2009/04/offset-0-ftw.html) Now if I have two workers running I will periodically see that each worker gets a row with the same q.id (and thus does the work). How is that possible? The outer query seemingly does a WHERE on an advisory_lock. Does anyone have any ideas? Am I grossly misusing advisory locks? Thanks, Brett
Re: [GENERAL] New DB-design - help and documentation pointers appreciated
On 20/07/10 18:14, Rikard Bosnjakovic wrote: However, I feel that this design is the same design I seem to use for all my databases, and in the end I always find that I designed them wrong from the beginning. The table components feels like that one is going to be locked into a corner; it seems too fixed and not as flexible as I want this database to be. Flexibility in storing and structuring data isn't a strong point of SQL databases. That said, there *are* some workarounds. I've mentioned EAV and why it's ... ugly. A potentially superior option I didn't think to mention before is hstore. You can use an hstore field to store key/value extension data that isn't easy to model in a generic relational way without ending up with hundreds of tiny tables. See: http://www.postgresql.org/docs/8.4/static/hstore.html That way you can store the common stuff in a typical relational form for easy querying and manipulation, but can fall back to key/value for hard-to-model attributes that might be quite specific to particular classes of component. I guess you could even have your subcategories carry a column that listed required hstore keys as an array, so that you could require that all components of a particular subtype have a certain list of hstore keys. (I often wish SQL would see some extensions to support a more ... flexible ... representation of data. SQL is great for accessing purely relational data, but it'd be so nice to be able to break the relational mould where required without having to replace the entire database system just because some of your data doesn't quite fit. There are workarounds of a sort (EAV - ugh!, hstore, etc.) but the lack of convenient language support limits them somewhat.) -- Craig Ringer
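A sketch of the hstore approach (in 8.4, hstore is installed from contrib; the table, column, and key names here are hypothetical):

```sql
-- Common attributes stay as ordinary relational columns;
-- oddball, class-specific ones go into an hstore column.
ALTER TABLE components ADD COLUMN extra_attrs hstore;

UPDATE components
SET extra_attrs = 'voltage=>5V, package=>DIP-8'
WHERE id = 42;

-- Query by key/value using the -> operator:
SELECT id FROM components WHERE extra_attrs -> 'package' = 'DIP-8';
```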
Re: [GENERAL] Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock
On 21/07/10 07:27, Brett Hoerner wrote: Here is an example query, SELECT q.* FROM (SELECT id, job, arg FROM queue WHERE job = 'foo' OR job = 'bar' OFFSET 0) AS q WHERE pg_try_advisory_lock(1, q.id) LIMIT 10 (For information on OFFSET 0 see: http://blog.endpoint.com/2009/04/offset-0-ftw.html) Now if I have two workers running I will periodically see that each worker gets a row with the same q.id (and thus does the work). How is that possible? The outer query seemingly does a WHERE on an advisory_lock. Does anyone have any ideas? Am I grossly misusing advisory_locks? You kick off two queries at once. Both have subqueries that grab a set of id, job, arg rows. There's no exclusion at this stage, so they can easily both end up with some or all of the same results. THEN you filter the result. The filter will drop the result list to empty if it can't acquire the lock. Under what circumstances can it not acquire the lock? If another transaction holds it. The first transaction might have grabbed the data, acquired the lock, done its processing, and committed/rolled back to *release* the lock before the second transaction gets around to checking the lock. In this case, the second transaction will happily acquire the lock. Classic race condition. You should probably use one of the existing queuing mechanisms rather than rolling your own, because building a high-performance, reliable queueing mechanism is surprisingly hard. A search of the archives here will turn up several options. I've noticed that PGQ from Skytools gets mentioned a lot. -- Craig Ringer
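If you do stay with advisory locks, one way to narrow (though not fully close) that window is to re-check, after acquiring the lock, that the job row still exists. This is a sketch, and it only helps if each worker holds the lock until after its DELETE of the job row has committed:

```sql
SELECT q.id, q.job, q.arg
FROM (SELECT id, job, arg
      FROM queue
      WHERE job IN ('foo', 'bar')
      OFFSET 0) AS q
WHERE pg_try_advisory_lock(1, q.id)      -- try to claim the job
  AND EXISTS (SELECT 1 FROM queue        -- re-check it wasn't already
              WHERE id = q.id)           -- completed and deleted
LIMIT 10;
```

Even so, a purpose-built queue (PGQ, as mentioned above) handles these visibility subtleties for you.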
Re: [GENERAL] what do i need to know about array index?
On 20/07/10 18:27, Prometheus Prometheus wrote: What's with the pseudonym? to my index problem: e.g. a query select id from test_1 where NOT (tag[4]=false OR tag[4] IS NULL); doesn't use the index create index idx_test_1 on test(( NOT (tag[4]=false OR tag[4] IS NULL) )); You should be using: tag[4] IS NOT DISTINCT FROM 't' if you want 'true' as opposed to 'false or null'. Your current expression handles null tag values in a non-obvious way, as can be seen by evaluating it step-by-step for a null tag: tag[4]=false OR tag[4] IS NULL NULL = false OR NULL IS NULL NULL OR TRUE TRUE Remember, NULL = false resolves to NULL, but NULL OR TRUE is TRUE, so the NOT (...) comes out FALSE for null tags. IS NOT DISTINCT FROM expresses the same test directly. since my index can contain combinations of many tag[] columns it can look like this create index idx_test_2 on test_1(( ( ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[2]=false OR tag[2] IS NULL) ) ) AND NOT ( ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[3]=false OR tag[3] IS NULL) ) ) )); It'll help make it easier to maintain and more comprehensible if you wrap that test up in an IMMUTABLE function. 'LANGUAGE SQL' is convenient for this. which also doesnt get used by the select SELECT id FROM test WHERE ( ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[2]=false OR tag[2] IS NULL) ) ) AND NOT ( ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[3]=false OR tag[3] IS NULL) ) ); and use the IMMUTABLE function in your tests. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/
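Wrapping the test in an IMMUTABLE function might look like this sketch (the function name tag_is_set is hypothetical); note that the planner only matches an expression index when the query uses the identical expression, so the function must appear in both the index and the query:

```sql
-- "Tag i is set to true" (false and NULL both fail the test).
CREATE FUNCTION tag_is_set(tags boolean[], i int) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS
$$ SELECT $1[$2] IS NOT DISTINCT FROM true $$;

-- Index and query share the exact same expression:
CREATE INDEX idx_test_1_tags
    ON test_1 ((tag_is_set(tag, 1) AND tag_is_set(tag, 2)));

SELECT id FROM test_1
WHERE tag_is_set(tag, 1) AND tag_is_set(tag, 2);
```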