[GENERAL] How to debug an external C function (IMMUTABLE STRICT)
Hi,

I have written an external C function to be called by Postgres, using the LANGUAGE 'C' IMMUTABLE STRICT interface. Most of the time when I call it, I get the expected results; sometimes I get random rubbish in the result set. Postgres always gets the type of the arguments correct, i.e. it knows that column x is an integer and column y is a float8. When I call elog(NOTICE, ...) from within my code, the results always look correct, so I am assuming that I am sometimes returning a random pointer, or have got the arguments to BlessTupleDesc or MemoryContextSwitchTo wrong.

Is there any debug support in Postgres to catch this type of thing? Are there any useful functions that can be compiled in when building Postgres? I do not have access to tools like Purify.

Dave

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
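For readers hitting the same class of bug: below is a minimal skeleton of the documented pattern for returning a composite value from a C function (it cannot compile outside a PostgreSQL source/extension build; `my_func` is a placeholder name, and the two columns match Dave's description of an integer x and a float8 y). The key point is that `BlessTupleDesc` must be called on the descriptor obtained from `get_call_result_type` before any tuple built from it is returned.

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_func);

Datum
my_func(PG_FUNCTION_ARGS)
{
    TupleDesc   tupdesc;
    Datum       values[2];
    bool        nulls[2] = {false, false};
    HeapTuple   tuple;

    /* Resolve the declared result row type; error out if it isn't a row */
    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
        elog(ERROR, "return type must be a row type");
    tupdesc = BlessTupleDesc(tupdesc);

    values[0] = Int32GetDatum(42);      /* column x: integer (example value) */
    values[1] = Float8GetDatum(1.5);    /* column y: float8 (example value) */

    /* heap_form_tuple allocates in the current per-call memory context,
     * which lives long enough for the caller to read the result */
    tuple = heap_form_tuple(tupdesc, values, nulls);
    PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
}
```

In the set-returning variant, anything that must survive across calls has to be allocated after `MemoryContextSwitchTo(funcctx->multi_call_memory_ctx)` inside the `SRF_IS_FIRSTCALL()` block; pointers palloc'd in a shorter-lived context look exactly like the intermittent "random rubbish" described above.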
Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8
Ok, I did a manual patch and Postgres 9.1.1 compiled for me without using the --disable-spinlocks option. Thanks a lot for the patch. :) By the way, could you please point me to an explanation of the significance of spinlocks for Postgres?

Thanks and Regards,
Jayashankar

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: 25 February 2012 PM 12:54
To: Jayashankar K B
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8

Jayashankar K B jayashankar...@lnties.com writes:
> I tried to apply the patch. I succeeded in patching the configure, configure.in and src/include/pg_config.h.in files. But while applying the patch for src/include/storage/s_lock.h, I am getting an error.

That patch should apply exactly to 9.1.0 or later. I think either you messed up copying the patch from the web page (note that patch is not forgiving about white space...) or else perhaps fooling with the file names messed it up. You shouldn't have to modify the file taken from the patch link at all. The right way to do it is to cd into the top source directory and use

    patch -p1 < patchfile

which will tell patch how much of the filename to pay attention to (viz, not the a/ or b/ parts). If you get too frustrated, just wait till Monday and grab 9.1.3.

regards, tom lane
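Tom's `patch -p1 < patchfile` invocation can be tried on any small tree. The following is a self-contained illustration with made-up file names, building a diff with the same `a/` and `b/` prefixes as patches posted on postgresql.org:

```shell
# Build a tiny "source tree" and a second tree holding the fixed file
mkdir -p demo/src demo.new/src
echo "old line" > demo/src/s_lock.h
echo "new line" > demo.new/src/s_lock.h

# Make a unified diff whose paths carry a/ and b/ prefixes
diff -u demo/src/s_lock.h demo.new/src/s_lock.h \
  | sed -e 's|demo/|a/|' -e 's|demo\.new/|b/|' > fix.patch

# Apply from the top of the tree; -p1 strips the a/ or b/ component
cd demo
patch -p1 < ../fix.patch
grep "new line" src/s_lock.h    # prints: new line
cd ..
```

If the patch was copied from a web page with mangled whitespace, the `patch` step is where it fails with a reject, which matches the error described above.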
Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought
On 02/25/2012 06:54 PM, Stefan Keller wrote:
> Hi, Recently Mike Stonebraker identified four areas where old elephants lack performance [1]: 1. Buffering/paging 2. Locking/Multithreading 3. WAL logging 4. Latches (aka memory locks for concurrent access of btree structures in the buffer pool?). He claims to have solved these issues while retaining SQL and ACID. But the only issue I understood is #1, solved by loading all tuples in-memory.
> => Are there any ideas on how to tell Postgres to aggressively load all data into memory (issue #1)? All remaining issues make me wonder. I actually doubt that there are alternatives even theoretically.
> => Can anyone help explain issues 2, 3 and 4, their solutions, and why Postgres would be unable to resolve them?

1. Buffering/paging

PG, and your operating system, already do this for reads. It also keeps things that are hit harder and lets go of things that are not hit as much. On the writing side, you can configure PG anywhere from "write it! write it NOW!" to "running with scissors", depending on how safe you want to feel.

2. Locking/Multithreading

PG does have some internal structures that it needs to lock (and any time you read "lock", think single-user access, or one-at-a-time, or slow). Any time you hear about lock contention, it's multiple processes waiting in line for a lock. If you only had one client, then you really would not need locks. That is where multithreading comes in; in PG we use multi-process instead of multi-thread, but it's the same thing: two (or more) people need to lock something so they can really screw with it. PG does not need as many locks as other DBs, however. It uses an MVCC architecture, so under normal operations (insert, update, select, delete) people don't block each other (i.e. readers don't block writers and vice versa). I don't see locking going away, but there are not many loads that are lock bound. Most database loads are IO bound, and then you'd probably be CPU bound before you are lock bound
(although it can be hard to tell if it's a spinlock that's making you CPU bound). I'm sure there are loads that hit lock contention, but there are probably ways to mitigate it. Say you have a process that alters the table and adds a new column every two seconds, then updates a single row to add data to the new column just added. I can see that being lock bound. And a really stupid implementation.

3. WAL logging

PG writes a transaction twice: once to WAL and once to the DB. WAL is a simple and quick write, and is only ever used if your computer crashes and PG has to replay transactions to get the db into a good/known state. It's a safety measure that doesn't really take much time, and I don't think I've heard of anyone being WAL bound. Although it does increase IO ops, it's not the biggest usage of IO. This one falls under "let's be safe", which is something NoSQL did away with. It's not something I want to give up, personally. I like using a net.

4. Latches

I can only guess at this one. It's similar to locks, I think. Data structures come in different types. In the old days we only had single-user access to data structures; then, when we wanted two users to access one, we just locked it to serialize access (one-at-a-time mode). But that does not scale well at all, so we invented two new types: lock free and wait free. An index is stored as a btree. To insert a new record into the index you have to reorganize it (rotate it, sort it, add/delete nodes, etc.), and while one client is doing that it can make it hard for another to search it. Lock-free (and wait-free) structures let multiple people work on a btree at the same time with much less contention. Wikipedia does a better job of explaining them than I could: http://en.wikipedia.org/wiki/Non-blocking_algorithm I have no idea if PG uses single-user locks or some kind of lock-free structure for its internals. I can see different parts of the internals needing different levels.
Maybe I'm old fashioned, but I don't see how you'll get rid of these. You have to insert a record. You have 12 people hitting the db at the same time. You have to organize that read/write access somehow so they don't blow each other up.

-Andy
Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought
On Sun, Feb 26, 2012 at 08:37:54AM -0600, Andy Colson wrote:
> 3. WAL logging
> PG writes a transaction twice. Once to WAL and once to the DB. WAL is a simple and quick write, and is only ever used if your computer crashes and PG has to re-play transactions to get the db into a good/known state. Its a safety measure that doesn't really take much time, and I don't think I've heard of anyone being WAL bound. Although it does increase IO ops, it's not the biggest usage of IO. This one falls under lets be safe which is something NoSQL did away with. Its not something I want to give up, personally. I like using a net.

And, one could still effectively disable WAL by using unlogged tables.

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
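Karsten's suggestion in SQL form (available since 9.1; the table name here is made up):

```sql
-- An unlogged table skips WAL entirely. The trade-off: its contents are
-- truncated after a crash, and it is not streamed to replicas.
CREATE UNLOGGED TABLE scratch_data (
    id   integer,
    note text
);
```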
Re: [GENERAL] How to debug an external C function (IMMUTABLE STRICT)
Dave Potts dave.po...@pinan.co.uk writes:
> I have written an external C function to be called by postgres, using the LANGUAGE 'C' IMMUTABLE STRICT interface. Most of the time when I call it, I get the expected results. Sometimes I get random rubbish in the result set. Is there any debug support in Postgres to catch this type of thing?

You should pretty much always do development of any C code in a backend built with --enable-cassert --enable-debug. In particular that will turn on clobbering of freed memory, which is really helpful in turning some types of sometimes-failure into consistent failures that can be debugged. That might not be your problem here, but it's worth a try.

I also get the impression that the only debug technique you know about is inserting printfs. Learn to use gdb or another debugger to step through your code --- the learning curve isn't that steep, and the benefits are numerous. There's useful Postgres-specific info about using gdb here: http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

regards, tom lane
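Tom's two suggestions, sketched side by side (the configure flags are the real ones; the PID and function name are placeholders, and the attach procedure follows the wiki page he links):

```
# 1. Build a debug-friendly backend (run at the top of the source tree)
./configure --enable-cassert --enable-debug CFLAGS="-O0 -g"
make && make install

# 2. Attach gdb to the backend serving your session.
#    In psql:   SELECT pg_backend_pid();   -- suppose it returns 12345
gdb -p 12345
(gdb) break my_func        # your C function's name
(gdb) continue             # now run the failing query in psql
(gdb) backtrace            # inspect where it went wrong
```

With --enable-cassert, freed memory is overwritten with a recognizable byte pattern, so a dangling pointer tends to produce the same garbage every time instead of occasionally working by luck.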
Re: [GENERAL] Constant value for a partitioned table query inside a plpgsql function
On 26 February 2012 12:45, Clodoaldo Neto clodoaldo.pinto.n...@gmail.com wrote:
> When I explain a query using a partitioned table the result is as expected. That is, only the correct partition is scanned. But when the query is inside a plpgsql function it takes forever to complete, suggesting it is scanning all partitions.
>
> create table p (c integer);
> create table p1 (like p);
> alter table p1 add constraint p1c check (c = 1);
> create table p2 (like p);
> alter table p2 add constraint p2c check (c = 2);
> insert into p1 values (1);
> insert into p2 values (2);
> alter table p1 inherit p;
> alter table p2 inherit p;
>
> The explain shows the expected plan and the select is also very fast (obviously the real query and table are more complex):
>
> explain select c from p where c = 1;
>
> A function like this takes very long to complete:
>
> create or replace function pf() returns integer as $body$
> declare
>     v constant integer := 1;
> begin
>     return (select c from p where c = v);
> end
> $body$ language plpgsql stable cost 100;
>
> Isn't the constant option in a variable declaration enough for the planner? Or else, what is the limitation here? Is there some way to see the plan for a plpgsql function?

It seems that the only solution is to make the query dynamic:

create or replace function pf() returns integer as $body$
declare
    v constant integer := 1;
    r integer;
begin
    execute 'select c from p where c = $1' into r using v;
    return r;
end
$body$ language plpgsql stable cost 100;

Using the dynamic solution the actual function executes very fast.

Clodoaldo
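The behaviour above can be reproduced outside plpgsql with a prepared statement: on the 9.1-era planner, plpgsql runs its static queries as prepared plans in which the variable is a parameter (`constant` only prevents reassignment; it does not inline the value), so constraint exclusion cannot prune partitions at plan time. A sketch using Clodoaldo's own tables:

```sql
-- What plpgsql effectively does with "where c = v": the planner sees $1
PREPARE p_test(integer) AS SELECT c FROM p WHERE c = $1;
EXPLAIN EXECUTE p_test(1);            -- Append scans p, p1 AND p2

-- What EXECUTE ... USING achieves: the value is known when planning
EXPLAIN SELECT c FROM p WHERE c = 1;  -- scans only p and p1

DEALLOCATE p_test;
```

As for seeing plans chosen inside a plpgsql function: the auto_explain contrib module with `auto_explain.log_nested_statements = on` logs them to the server log.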
Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought
Thanks to all who responded so far. I got some more insights from Mike Stonebraker himself in the USENIX talk Scott pointed to before. I'd like to revise the four points I enumerated in my initial question a little bit, and sort out what PG already does or could do:

1. Buffering Pool

To get rid of I/O bounds, Mike proposes in-memory database structures. He argues that this is impossible for the old elephants to implement because it would be a huge code rewrite, since there is also a need to store memory-oriented structures (instead of disk-oriented ones). Now I'm still wondering why PG couldn't realize that, probably in combination with unlogged tables? I don't have an overview of the respective code, but I think it's worthwhile to discuss even if implementing memory-oriented structures would be too difficult.

2. Locking

This critique obviously doesn't hold for PG since we have MVCC here already.

3. WAL logging

Here Mike proposes replication over several nodes as an alternative to WAL, which fits nicely with High Availability. PG 9 has built-in replication, but just not for unlogged tables :-(

4. Latches

This is an issue I had never heard of before. I found some notion of latches in the code, but it doesn't seem to be related to concurrently accessing btree structures as Mike suggests. So if anyone could confirm that this problem exists and produces overhead, I'd be interested to hear about it. Mike proposes single threads running on many cores, where each core processes a non-overlapping shard. But he also calls for ideas on inventing btrees which can be processed concurrently with as few memory locks as possible (instead of looking to make btrees faster).

So to me the bottom line is that PG has already reduced overhead at least for issue #2 and perhaps for #4. The remaining issues of in-memory optimization (#1) and replication (#3) together with High Availability are to be investigated in PG.
Yours, Stefan

2012/2/26 Karsten Hilbert karsten.hilb...@gmx.net:
> On Sun, Feb 26, 2012 at 08:37:54AM -0600, Andy Colson wrote:
>> 3. WAL logging
>> PG writes a transaction twice. Once to WAL and once to the DB. [...]
>
> And, one could still effectively disable WAL by using unlogged tables.
>
> Karsten
Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought
On Sun, Feb 26, 2012 at 1:11 PM, Stefan Keller sfkel...@gmail.com wrote:
> So to me the bottom line is that PG has already reduced overhead at least for issue #2 and perhaps for #4. The remaining issues of in-memory optimization (#1) and replication (#3) together with High Availability are to be investigated in PG.

Yeah, the real problem pg has to deal with is that it writes to disk and expects that to provide durability, while voltdb (Mike's db project) writes to multiple machines in memory and expects that to be durable. No way a disk subsystem is gonna compete with an in-memory cluster for performance.
[GENERAL] A better COPY?
I have a situation where I am pulling CSV data from various sources and putting it into a database after it is cleaned up and such. Currently I am doing the bulk of the work outside the database in code, but I think the work would go much faster if I were to import the data into temp tables using the COPY command and then clean it up there.

The main reason I am not using COPY right now is that Postgres will not allow unprivileged users to issue COPY ... FROM 'filename'. The only way I could get around this would be to shell out to psql or something, but I don't really want to do that. The other reason I am not using COPY is that it is cumbersome to create and maintain tables just for the import. So I am looking for a solution like this:

1. COPY from a text field in a table, like this: COPY from (select text_field from table where id = 2) as text_data ...
2. The COPY command creates a table after a cursory examination of the data. If the data has headers it uses those field names; if the data does not have headers it uses col1, col2, etc. Optionally, fields and types could be specified.

Any suggestions? Cheers.
Re: [GENERAL] A better COPY?
On 26 Feb 2012, at 23:54, Tim Uckun wrote:
> The main reason I am not using COPY right now is because postgres will not allow unprivileged users to issue the COPY from FILENAME. The only way I could get around this would be to shell out psql or something but I dont really want to do that.

Use COPY ... FROM STDIN and supply the contents of the file after that.

Alban Hertroys
--
The scale of a problem often equals the size of an ego.
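Both workarounds hinge on the same point: only the server-side file form of COPY is privileged. A sketch, with `import_raw` and the file name as assumed placeholders:

```sql
-- Needs superuser (the server process reads the file):
--   COPY import_raw FROM '/data/feed.csv' WITH CSV HEADER;

-- Works for any user: psql's \copy reads the file on the client
-- and wraps it in COPY ... FROM STDIN behind the scenes:
--   \copy import_raw FROM 'feed.csv' WITH CSV HEADER

-- Equivalent from application code, no psql shell-out needed: issue
COPY import_raw FROM STDIN WITH CSV HEADER;
-- then stream the CSV bytes over the connection, ending with \.
```

Most client drivers expose the STDIN variant directly (e.g. as a copy/put-data API), which addresses the "shell out to psql" objection above.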
Re: [GENERAL] Optimise PostgreSQL for fast testing
Hi Guys,

Sorry for the late reply. Thanks to all of you for the help; I appreciate all your suggestions. So far (with my pretty limited knowledge of PG) I could speed it up a little bit (~20% or so compared to the original installation) only by tweaking the settings. I think that is relatively good, keeping in mind that not a single line of code has been changed.

Just my quick summary (not interested in query tuning for now, just the DB tweaking):

- Best perf optimisation: `fsync=off`.
- Parallelisation should be considered as the 2nd option after `fsync=off`.
- All further optimisations might not be worth the effort unless you know PG well.
- A RAM disk didn't improve perf much at all.
- As Craig Ringer replied to my question at SO, the book PostgreSQL 9.0 High Performance is worth the read.
- PG has awesome documentation, including perf-related: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

So far this is my approach: since SQLite has basic FTS support (which I totally missed; thanks for pointing that out!) I can go a long way with it and probably won't need PG soon. But when I do:

- Run most of the specs against SQLite.
- Only run specs that rely on PG features against PG (which should be a minority).
- Run full acceptance tests (Cucumber) against a production DB (be it SQLite or PG).
- Parallelise both unit and acceptance tests in the future.

Thanks a lot to all of you guys. Your suggestions, criticism and discussion were really healthy, helpful and to the point.

Cheers, Dmytrii http://www.ApproachE.com

On 24/02/2012, at 9:25 PM, Simon Riggs wrote:
> On Fri, Feb 24, 2012 at 12:16 AM, Dmytrii Nagirniak dna...@gmail.com wrote:
>> That's totally fine if PG can't beat SQLite on speed in **this particular case**.
>
> The point is that PG can beat SQLite in this test *easily* if you choose to use the main architectural difference as an advantage: running tests concurrently.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
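For reference, the `fsync=off` tweak discussed in this thread as a postgresql.conf fragment, alongside two related durability knobs (`synchronous_commit` and `full_page_writes` are additions not mentioned in the thread). This is for disposable test clusters only: with these settings, a crash can corrupt the data.

```
# postgresql.conf -- throwaway test databases ONLY
fsync = off                  # biggest win reported in this thread
synchronous_commit = off     # related durability trade-off (not from the thread)
full_page_writes = off       # ditto; protects against torn pages, moot without fsync
```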