[GENERAL] How to debug an external C function (IMMUTABLE STRICT)

2012-02-26 Thread Dave Potts
Hi

I have written an external C function to be called by Postgres, declared
using the LANGUAGE 'C' IMMUTABLE STRICT interface.
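For reference, the declaration looks roughly like this (a sketch; the
function, module and argument names are made up):

CREATE OR REPLACE FUNCTION my_func(x integer, y float8)
    RETURNS SETOF record
    AS 'my_module', 'my_func'
    LANGUAGE C IMMUTABLE STRICT;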

Most of the time when I call it, I get the expected results.  Sometimes I
get random rubbish in the result set.
Postgres always gets the types of the arguments correct, i.e. it knows that
column x is an integer and column y is a float8.

I call elog(NOTICE, ...) from within my code and the results always look
correct, so I am assuming that I am sometimes returning a random
pointer, or have got the arguments to BlessTupleDesc or
MemoryContextSwitchTo wrong!

Is there any debug support in Postgres to catch this type of thing?
Are there any useful functions that can be compiled in when building
Postgres?

I do not have access to things like Purify

Dave



Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 9.1 for ARM Cortex A8

2012-02-26 Thread Jayashankar K B
OK. I applied the patch manually and Postgres 9.1.1 compiled for me without
using the --disable-spinlocks option.
Thanks a lot for the patch. :)
By the way, could you please point me to an explanation of the significance
of spinlocks for Postgres?

Thanks and Regards
Jayashankar

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
Sent: 25 February 2012 12:54 PM
To: Jayashankar K B
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Re: [PERFORM] Disable-spinlocks while compiling postgres 
9.1 for ARM Cortex A8

Jayashankar K B jayashankar...@lnties.com writes:
 I tried to apply the patch. I succeeded in patching configure, configure.in 
 and src/include/pg_config.h.in files.
 But while applying the patch for src/include/storage/s_lock.h, I am getting
 an error.

That patch should apply exactly to 9.1.0 or later.  I think either you messed 
up copying the patch from the web page (note that patch is not forgiving about 
white space...) or else perhaps fooling with the file names messed it up.  You 
shouldn't have to modify the file taken from the patch link at all.  The 
right way to do it is to cd into the top source directory and use
patch -p1 <patchfile
which will tell patch how much of the filename to pay attention to (viz, not 
the a/ or b/ parts).

If you get too frustrated, just wait till Monday and grab 9.1.3.

regards, tom lane





Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought

2012-02-26 Thread Andy Colson

On 02/25/2012 06:54 PM, Stefan Keller wrote:

Hi,

Recently Mike Stonebraker identified four areas where old elephants
lack performance [1]:

1. Buffering/paging
2. Locking/Multithreading
3. WAL logging
4. Latches (a.k.a. memory locks for concurrent access to btree structures
in the buffer pool?).

He claims to have solved these issues while retaining SQL and ACID.
But the only issue I understood is #1, solved by loading all tuples in memory.
=> Are there any ideas on how to tell Postgres to aggressively load
all data into memory (issue #1)?
All the remaining issues make me wonder.

I actually doubt that there are alternatives, even theoretically.
=> Can anyone help explain issues 2, 3 and 4, their solutions,
and why Postgres would be unable to resolve them?




1. Buffering/paging


PG, and your operating system, already do this for reads.  The cache keeps
things that are hit harder and lets go of things that are not hit as much.  On
the writing side, you can configure PG anywhere from "write it! write it
NOW!" to "running with scissors", depending on how safe you want to feel.
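For example, a sketch of that dial (these are real settings; the
characterizations are mine):

-- per session: risk losing the last few commits on a crash, but no corruption
SET synchronous_commit = off;

-- in postgresql.conf: full "running with scissors", unsafe on any crash
-- fsync = off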



2. Locking/Multithreading


PG does have some internal structures that it needs to lock (and any time you
read "lock", think single-user access, or one-at-a-time, or slow).  Any time
you hear about lock contention, it's multiple processes waiting in line for a
lock.  If you only had one client, then you really would not need locks.  That
is where multithreading comes in; in PG we use multi-process instead of
multi-thread, but it's the same thing: two (or more) people need to lock
something so they can really screw with it.  PG does not need as many locks as
other db's, however.  It uses an MVCC architecture, so under normal operations
(insert, update, select, delete) people don't block each other (i.e. readers
don't block writers and vice versa).

I don't see locking going away, but there are not many loads that are lock
bound.  Most database loads are IO bound, and then you'd probably be CPU bound
before you are lock bound (although it can be hard to tell if it's a spinlock
that's making you CPU bound).  I'm sure there are loads that hit lock
contention, but there are probably ways to mitigate it.  Say you have a process
that alters the table and adds a new column every two seconds, then updates a
single row to add data to the new column just added.  I can see that being lock
bound.  And a really stupid implementation.
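If you want to see whether anything is actually waiting on a lock, here is a
quick sketch against the system views:

-- lock requests currently waiting for someone else to let go
SELECT locktype, relation::regclass AS relation, mode, granted
FROM pg_locks
WHERE NOT granted;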


3. WAL logging


PG writes a transaction twice: once to the WAL and once to the DB.  The WAL
write is simple and quick, and is only ever used if your computer crashes and
PG has to replay transactions to get the db into a good/known state.  It's a
safety measure that doesn't really take much time, and I don't think I've
heard of anyone being WAL bound.  Although it does increase IO ops, it's not
the biggest usage of IO.  This one falls under "let's be safe", which is
something NoSQL did away with.  It's not something I want to give up,
personally.  I like using a net.


4. Latches


I can only guess at this one.  It's similar to locks, I think.  Data structures
come in different types.  In the old days we only had single-user access to
data structures; then, when we wanted two users to access one, we just locked
it to serialize access (one-at-a-time mode).  But that does not scale well at
all, so we invented two new types: lock free and wait free.

An index is stored as a btree.  To insert a new record into the index you have
to reorganize it (rotate it, sort it, add/delete nodes, etc.), and while one
client is doing that it can make it hard for another to search it.  Lock-free
(and wait-free) structures let multiple people work on a btree at the same time
with much less contention.  Wikipedia does a better job of explaining them than
I could:

http://en.wikipedia.org/wiki/Non-blocking_algorithm

I have no idea if PG uses single-user locks or some kind of lock-free structure
for its internals.  I can see different parts of the internals needing
different levels.

Maybe I'm old fashioned, but I don't see how you'll get rid of these.  You have
to insert a record.  You have 12 people hitting the db at the same time.  You
have to organize that read/write access somehow so they don't blow each other
up.

-Andy



Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought

2012-02-26 Thread Karsten Hilbert
On Sun, Feb 26, 2012 at 08:37:54AM -0600, Andy Colson wrote:

 3. WAL logging
 
 PG writes a transaction twice: once to the WAL and once to
 the DB.  The WAL write is simple and quick, and is only ever
 used if your computer crashes and PG has to replay
 transactions to get the db into a good/known state.  It's a
 safety measure that doesn't really take much time, and I
 don't think I've heard of anyone being WAL bound.  Although
 it does increase IO ops, it's not the biggest usage of IO.
 This one falls under "let's be safe", which is something
 NoSQL did away with.  It's not something I want to give up,
 personally.  I like using a net.

And, one could still effectively disable WAL by using
unlogged tables.
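For example (a sketch; unlogged tables are new in 9.1, skip WAL entirely, and
are truncated on crash recovery):

CREATE UNLOGGED TABLE scratch (
    id integer,
    payload text
);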

Karsten
-- 
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



Re: [GENERAL] How to debug an external C function (IMMUTABLE STRICT)

2012-02-26 Thread Tom Lane
Dave Potts dave.po...@pinan.co.uk writes:
 I have written an external C function to be called by Postgres, declared
 using the LANGUAGE 'C' IMMUTABLE STRICT interface.
 Most of the time when I call it, I get the expected results.  Sometimes I
 get random rubbish in the result set.
 Is there any debug support in Postgres to catch this type of thing?

You should pretty much always do development of any C code in a backend
built with --enable-cassert --enable-debug.  In particular that will
turn on clobbering of freed memory, which is really helpful in turning
some types of sometimes-failure into consistent failures that can be
debugged.  That might not be your problem here, but it's worth a try.
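For reference, such a build is configured along these lines (a sketch; add
whatever other options you normally use):

./configure --enable-cassert --enable-debug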

I also get the impression that the only debug technique you know about
is inserting printfs.  Learn to use gdb or another debugger to step
through your code --- the learning curve isn't that steep, and the
benefits numerous.  There's useful Postgres-specific info about using
gdb here:
http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

regards, tom lane



Re: [GENERAL] Constant value for a partitioned table query inside a plpgsql function

2012-02-26 Thread Clodoaldo Neto
On 26 February 2012 12:45, Clodoaldo Neto
clodoaldo.pinto.n...@gmail.com wrote:

 When I EXPLAIN a query using a partitioned table the result is as
 expected; that is, only the correct partition is scanned. But when the
 query is inside a plpgsql function it takes forever to complete, suggesting
 it is scanning all partitions.

 create table p (c integer);
 create table p1 (like p);
 alter table p1 add constraint p1c check (c = 1);
 create table p2 (like p);
 alter table p2 add constraint p2c check (c = 2);
 insert into p1 values (1);
 insert into p2 values (2);
 alter table p1 inherit p;
 alter table p2 inherit p;

 The EXPLAIN shows the expected plan and the select is also very fast
 (obviously the real query and table are more complex):

 explain select c from p where c = 1;

 A function like this takes very long to complete:

 create or replace function pf() returns integer as
 $body$
 declare
 v constant integer := 1;
 begin
 return (select c from p where c = v);
 end
 $body$
 language plpgsql stable
 cost 100;

 Isn't the constant option in a variable declaration enough of a hint for the
 planner? Or else, what is the limitation here? Is there some way to see the
 plan for a plpgsql function?


It seems that the only solution is to make the query dynamic:

create or replace function pf() returns integer as
$body$
declare
v constant integer := 1;
r integer;
begin
execute 'select c from p where c = $1' into r using v;
return r;
end
$body$
language plpgsql stable
cost 100;

Using the dynamic query, the actual function executes very fast (presumably
because plpgsql caches a generic, parameterized plan for the embedded query,
which prevents constraint exclusion, while EXECUTE plans the statement each
time it runs).
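As for seeing the plan used inside a plpgsql function, the contrib
auto_explain module seems to do it (a sketch, assuming the module is
installed):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements inside functions
SELECT pf();  -- the embedded query's plan now appears in the server log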

Clodoaldo


Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought

2012-02-26 Thread Stefan Keller
Thanks to all who responded so far. I got some more insight from Mike
Stonebraker himself in the USENIX talk Scott pointed to before.
I'd like to revise a little the four points I enumerated in my
initial question, and to sort out what PG already does or could do:

1. Buffering Pool

To get rid of the I/O bound Mike proposes in-memory database structures.
He argues that this is impossible for the old elephants to implement
because it would be a huge code rewrite, since they would also need to
store memory-oriented structures (instead of disk-oriented structures).
Now I'm still wondering why PG couldn't realize that, probably in
combination with unlogged tables? I don't have an overview of the
respective code, but I think it's worthwhile to discuss even if
implementing memory-oriented structures would be too difficult.

2. Locking

This critique obviously doesn't hold for PG since we have MVCC here already.

3. WAL logging

Here Mike proposes replication over several nodes as an alternative to
WAL, which fits nicely with High Availability. PG 9 has built-in
replication, just not for unlogged tables :-(

4. Latches

This is an issue I never heard of before. I found some notion of latches
in the code, but it doesn't seem to be related to concurrent access to
btree structures as Mike suggests.
So if anyone can confirm that this problem exists and produces overhead,
I'd be interested to hear about it.
Mike proposes single threads running on many cores, where each core
processes a non-overlapping shard.
But he also calls for ideas for btrees which can be processed
concurrently with as few memory locks as possible (instead of looking
to make btrees faster).

So to me the bottom line is that PG has already reduced the overhead at
least for issue #2 and perhaps for #4.
What remains to be investigated in PG is in-memory optimization (#1) and
replication (#3) together with High Availability.

Yours, Stefan


2012/2/26 Karsten Hilbert karsten.hilb...@gmx.net:
 On Sun, Feb 26, 2012 at 08:37:54AM -0600, Andy Colson wrote:

 3. WAL logging

 PG writes a transaction twice: once to the WAL and once to
 the DB.  The WAL write is simple and quick, and is only ever
 used if your computer crashes and PG has to replay
 transactions to get the db into a good/known state.  It's a
 safety measure that doesn't really take much time, and I
 don't think I've heard of anyone being WAL bound.  Although
 it does increase IO ops, it's not the biggest usage of IO.
 This one falls under "let's be safe", which is something
 NoSQL did away with.  It's not something I want to give up,
 personally.  I like using a net.

 And, one could still effectively disable WAL by using
 unlogged tables.

 Karsten
 --
 GPG key ID E4071346 @ gpg-keyserver.de
 E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346




Re: [GENERAL] Four issues why old elephants lack performance: Explanation sought

2012-02-26 Thread Scott Marlowe
On Sun, Feb 26, 2012 at 1:11 PM, Stefan Keller sfkel...@gmail.com wrote:

 So to me the bottom line is that PG has already reduced the overhead at
 least for issue #2 and perhaps for #4.
 What remains to be investigated in PG is in-memory optimization (#1) and
 replication (#3) together with High Availability.

Yeah, the real problem PG has to deal with is that it writes to
disk and expects that to provide durability, while VoltDB (Mike's db
project) writes to multiple machines in memory and expects that to be
durable.  No way a disk subsystem is gonna compete with an in-memory
cluster for performance.



[GENERAL] A better COPY?

2012-02-26 Thread Tim Uckun
I have a situation where I am pulling CSV data from various sources
and putting it into a database after it is cleaned up and such.
Currently I am doing the bulk of the work outside the database using
code, but I think the work would go much faster if I were to import the
data into temp tables using the COPY command and then clean it up
there.

The main reason I am not using COPY right now is because Postgres will
not allow unprivileged users to issue COPY FROM 'filename'.  The
only way I could get around this would be to shell out to psql or
something, but I don't really want to do that.

The other reason I am not using COPY is because it is cumbersome to
create and maintain tables just for the import.

So I am looking for a solution like this.

1. COPY from a text field in a table, something like: COPY FROM (select
text_field from table where id = 2) as text_data ...
2. The COPY command creates a table after a cursory examination of the
data.  If the data has headers it uses those field names; if the data
does not have headers it uses col1, col2, etc.  Optionally fields and
types could be specified.


Any suggestions?

Cheers.



Re: [GENERAL] A better COPY?

2012-02-26 Thread Alban Hertroys
On 26 Feb 2012, at 23:54, Tim Uckun wrote:

 The main reason I am not using COPY right now is because Postgres will
 not allow unprivileged users to issue COPY FROM 'filename'.  The
 only way I could get around this would be to shell out to psql or
 something, but I don't really want to do that.

Use COPY ... FROM STDIN and supply the contents of the file after that.
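A minimal sketch (the table and file names are made up): the COPY statement
runs server-side, but the data is streamed over the client connection, so no
file access privileges are needed on the server.

-- staging table for the raw CSV
CREATE TEMP TABLE import_raw (col1 text, col2 text, col3 text);

-- the client sends the file contents after issuing this statement
COPY import_raw FROM STDIN WITH (FORMAT csv, HEADER true);

In psql, \copy import_raw from 'data.csv' csv header wraps exactly this up,
and most drivers expose an equivalent (e.g. PQputCopyData in libpq).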

Alban Hertroys

--
The scale of a problem often equals the size of an ego.





Re: [GENERAL] Optimise PostgreSQL for fast testing

2012-02-26 Thread Dmytrii Nagirniak
Hi Guys,

Sorry for the late reply.

Thanks to all of you for the help. Appreciate all your suggestions.

So far (with my pretty limited knowledge of PG) I could speed it up a little
bit (~20% or so compared to the original installation) only by tweaking the
settings.

I think that is relatively good, keeping in mind that not a single line of
code has been changed.

Just my quick summary. Not interested in query tuning for now, just the DB
tweaking (see the sketch after this list):
- Best perf optimisation: `fsync=off`.
- Parallelisation should be considered as the 2nd option after `fsync=off`.
- All further optimisations might not be worth the effort unless you know PG well.
- A RAM disk didn't improve perf much at all.
- As Craig Ringer replied to my question at SO, the book PostgreSQL 9.0 High
  Performance is worth the read.
- PG has awesome documentation, including perf related:
  http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
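A minimal sketch of what that looks like in postgresql.conf (only for a
throwaway test database, never for data you care about; synchronous_commit
and full_page_writes are companion knobs I would group with fsync):

fsync = off                # never force writes out to disk
synchronous_commit = off   # don't wait for WAL flush at commit
full_page_writes = off     # torn-page protection is pointless without fsync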


So far this is my approach:
Since SQLite has basic FTS support (which I totally missed; thanks for pointing
that out!) I can go a long way with it and probably won't need PG soon. But
when I do:
- Run most of the specs against SQLite. Only run specs that rely on PG features
  against PG (which should be a minority).
- Run full acceptance tests (Cucumber) against the production DB (be it SQLite
  or PG).
- Parallelise both unit and acceptance tests in the future.


Thanks a lot to all of you guys.
Your suggestions, criticism and discussion were really healthy, helpful and
to the point.


Cheers,
Dmytrii
http://www.ApproachE.com



On 24/02/2012, at 9:25 PM, Simon Riggs wrote:

 On Fri, Feb 24, 2012 at 12:16 AM, Dmytrii Nagirniak dna...@gmail.com wrote:
 
 That's totally fine if PG can't beat SQLite on speed in **this particular
 case**.
 
 The point is that PG can beat SQLite in this test *easily* if you
 choose to use the main architectural difference as an advantage:
 running tests concurrently.
 
 -- 
  Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services