[GENERAL] Dynamic/polymorphic record/composite return types for C user-defined-functions

2013-04-02 Thread Stephen Scheck
Hi,

I'm trying to write some user-defined functions in C which operate on a
large object (so their common first argument will be an OID referencing an
object in the pg_largeobject catalog table created with lo_create()) and
return either a single row or a set depending on the function. Depending on
the contents of the BLOB, some of the functions have a need to return
polymorphic number column(s) as part of their result row (i.e. it could be
an integer, real or double depending on the input BLOB).

I've tried various approaches for this but none of them quite work the way
I need and I'm wondering if I'm missing a fundamental bit of understanding
of Postgres' type system or it simply doesn't support what I want to do.
Here's what I've tried:

1) Declare the function as RETURNS my_type (or RETURNS SETOF my_type),
my_type having been defined with CREATE TYPE my_type AS ... with the column
defined as a specific number type (integer, real, double precision, etc.).
This works as I want, but only allows supporting the specific number type
declared (since function signature polymorphism can only differentiate by
input types, and any* types are not allowed in CREATE TYPE definitions).
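For reference, a minimal sketch of approach (1) with hypothetical names
(MODULE_PATHNAME as in an extension script; the point is that the column
type is frozen at CREATE TYPE time):

CREATE TYPE my_type AS (val integer, label text);
CREATE FUNCTION info(oid) RETURNS SETOF my_type
    AS 'MODULE_PATHNAME', 'info' LANGUAGE C STRICT;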

2) Declare the function as RETURNS an anonymous row type, via OUT
parameters or RETURNS TABLE. Declare the polymorphic number column as
anynonarray. The problem here is without a polymorphic IN parameter, the
OUT type cannot be resolved. I worked around this by adding DEFAULT
NULL::integer IN parameter which satisfies CREATE FUNCTION and calls but
doesn't propagate the correct type through the FunctionCallInfo (I can't
recall the exact error message but it didn't work).
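A sketch of the declaration described in (2), again with hypothetical
names (the dummy DEFAULT argument exists only so the anynonarray OUT
column has an IN parameter to resolve against):

CREATE FUNCTION info(lo oid, hint anynonarray DEFAULT NULL::integer,
                     OUT val anynonarray, OUT label text)
    RETURNS SETOF record
    AS 'MODULE_PATHNAME', 'info' LANGUAGE C;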

3) Declare the function as RETURNS an anonymous row type, via OUT
parameters or RETURNS TABLE. Declare the polymorphic number column as
"any", which doesn't enforce correspondence between IN and OUT parameters.
Doesn't work - when I call the function I get this: ERROR: cannot display
a value of type "any". I don't think this would work even if the column
isn't in the select-list (i.e. just used as a join or filter condition)
since if I do an explicit cast, I get this error message: 'cannot cast type
"any" to integer'.

As an aside, does this imply "any" as an OUT parameter has no use?

4) Declare the function as RETURNS RECORD or RETURNS SETOF RECORD. Use
CreateTemplateTupleDesc()/BlessTupleDesc() to dynamically create a tuple
description on the fly and return it. Depending on call context, I get
different error messages:

SELECT * FROM info(lo_oid);
ERROR: a column definition list is required for functions returning record

-- or

SELECT (info(lo_oid_column)).* FROM test_table;
ERROR: record type has not been registered

I'm out of ideas. Isn't this the kind of dynamic behavior for which
CreateTemplateTupleDesc()/BlessTupleDesc() is intended?

Any suggestions appreciated.

Cheers,
-Steve


Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread Gavin Flower

On 29/03/13 12:39, Jasen Betts wrote:

On 2013-03-28, Gavin Flower gavinflo...@archidevsys.co.nz wrote:


Hmm... This should optionally apply to time. e.g.
time_i_got_up_in_the_morning should reflect the time zone where I got up
- if I got up at 8am NZ time then this should be displayed, not 12pm (12
noon) to someone in Los Angeles or 3am in Tokyo! (have a 'localtime'
data type?- possibly add the timezone code if displayed in a different
time zone.)

it was 12 noon in LA when you got up.
if you want the local time of the event you can specify where you want it

  at time zone 'Pacific/Auckland'
  at time zone 'NZDT'-- note: some names are ambiguous eg: 'EST'
  or
  at time zone '-13:00'  -- note: offsets are ISO, not POSIX

Getting the local time of the event requires that you store the locale,
zone name, or offset when you store the time.

or you could just cast it to text when you store it...


how confusing is 'EST' ?
worse than this:

set datestyle to 'sql,dmy';
set time zone 'Australia/Brisbane';
select '20130101T00Z'::timestamptz;
set time zone 'Australia/Sydney';
select '20130101T00Z'::timestamptz;
set time zone 'America/New_York';
select '20130101T00Z'::timestamptz;

(all three results display the abbreviation 'EST' - Australian Eastern
time and US Eastern Standard Time share it - yet they denote three
different UTC offsets)

Sorry, I was at my Mum's for a few days with 'limited' Internet access - 
they have a much lower quota than I normally have...


Yes I could store the timezone separately, but semantically it makes 
sense to store the local time and its time zone as a unit - less likely to 
have bugs when someone else (or myself in a year's time) goes to make 
amendments.


Storing in text is fine for display, but if I then have to also relate 
different local times to a common timeline, then text would not be so 
convenient.


To be honest this is currently moot, as I don't have a need for this at 
the moment. Having said that, I realize I am tempting the gods into 
making it so that I do need it!



Cheers,
Gavin



Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread Gavin Flower

On 30/03/13 04:08, Gavan Schneider wrote:

Some thoughts.

The current MONEY type might be considered akin to ASCII. Perfect for 
a basic US-centric accounting system where there are cents and dollars 
and no need to carry smaller fractions. As discussed, there are some 
details that could be refined.


When it comes to this type being used in full blown money systems it 
lacks the ability to carry fractions of cents and keep track of 
currencies. It also needs to play nicer with other exact types such as 
numeric, i.e., no intermediate calculations as real.


Therefore the discussion is really about the desired role for the 
MONEY type. Should it be refined in its current dollar-and-cents mode? 
Or be promoted to a more universal role (akin to a shift from ASCII 
to UTF)?


If there is merit in making MONEY work for most situations involving 
financial transactions I think the following might apply:


- keep integer as the underlying base type (for performance)

- generalise the decimal multiplier of a MONEY column so a specific 
MONEY column can be what its creator wants (from partial cents to 
millions of dollars/Yen/Other, along with rounding/truncating rules as 
required by the user or his/her external agencies)


- define the currency for a given column and only allow this to change 
in defined ways, and specifically forbid implicit changes such as 
would arise from altering LOCALE information


- ensure the MONEY type plays nice with other exact precision types, 
i.e., convert to REAL/FLOAT as a very last resort



Personally I don't think it is appropriate for the MONEY type to have 
variable characteristics (such as different currencies) within a given 
column; rather the column definition should define the currency along 
with the desired decimal-multiplier and whatever else is required. The 
actual values within the column remain as simple integers. This is 
mostly based on performance issues. If the MONEY type is to be used it 
has to offer real performance benefits over bespoke NUMERIC applications.


Regards
Gavan Schneider




I agree 100%.

In the bad old days when I was a COBOL programmer we always stored money 
in the COBOL equivalent of an integer (numeric without a fractional 
part) to avoid round off, but we displayed with a decimal point two 
digits to the left.  So storing as an integer (actually bigint would be 
required) is a good idea, with parameters to say how many effective 
digits in the fractional part, and how many fractional digits to display 
etc. - as you said.
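A minimal sketch of that scaled-integer idea in SQL (hypothetical table 
and column names; the bigint holds cents, and display just shifts the 
decimal point two places):

CREATE TABLE ledger (amount_cents bigint NOT NULL);
INSERT INTO ledger VALUES (1234567);  -- $12,345.67 stored as cents
SELECT to_char(amount_cents / 100.0, 'FM9999999990.00') AS amount
FROM ledger;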



Cheers,
Gavin


Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread Gavin Flower

On 30/03/13 11:30, Gavan Schneider wrote:

On 29/3/13 at 3:32 AM, D'Arcy J.M. Cain wrote:


On Fri, 29 Mar 2013 11:46:40 -0400 Tom Lane wrote:

Well, this has been discussed before, and the majority view every
time has been that MONEY is a legacy thing that most people would
rather rip out than sink a large amount of additional effort into.


The only reason I have tried to explore these ideas is that the type 
is currently too quirky for most use cases. So I must agree that 
remove/ignore is the least work option. An argument for making the 
type more useful can be made by analogy to the geolocation add-in 
type. Most never go there but those who need to do so seem to prefer 
the builtin functionality over hand coding the same behaviour with 
columns of arrays that just happen to contain location data.



It has some use-cases but they are narrow, and it's not clear how
much wider the use-cases would be if we tried to generalize it.


A well designed and specific tool can be worth the effort.

The use cases include:

- Financial data, accounts in a single currency, i.e., the money 
  column in a transaction

- Multi currency data, i.e., keeping track of transactions across 
  several currencies.
  Specifically we are NOT doing conversions; what arrives/leaves 
  as $ or ¥ stays that way.
  This implies the DB has tables for each area of operation or 
  columns for each currency


One thing: the type should not attempt or allow any implicit 
transforming of values. Mostly a currency change is a transaction and 
whenever it happens it has to be recorded as such, e.g., so many ¥ 
leave their column, appropriate $ are added to their column, and 
commission $/¥ is added to its column; also included will be: exchange 
rate reference, time-stamp, journal reference, etc. A constraint could 
be constructed to ensure the double entry book keeping zero sum 
convention has been maintained across the whole transaction.


One time this might not be so detailed is for a VIEW where something 
akin to total worth is being reported. In cases like this the exchange 
rates would usually be in their own table and the business rules would 
dictate which one is to be used to build the VIEW, e.g., end of month 
report, and it might be shown with all values in a single currency 
depending on the company's HQ.




I wonder if our vision isn't a little tunneled here.  Using this type
for money is, perhaps, a specialized use and the type should really be
called something else and modified to remove all connotations of money
from it.  So...




- Drop the currency symbol
- Allow number of decimals to be defined once for the column
- Don't use locale except to specify decimal separator (',' vs. '.')

Mostly this is cosmetic and only relevant for parsing text on data 
entry or default formatting with SELECT on the command line. The power 
of the class is that none of this is in the data other than as DB 
column flags. The values themselves are integer. The class is meant to 
keep the books moving right along.



- Allow operations against numeric


Whatever else is done this should happen.


Not sure what to rename it to.  Decimal would be good if it wasn't
already in use.  Maybe DecimalInt.

I don't think there is much use for another fixed precision integral 
type. NUMERIC does a good job when INTEGER isn't suitable. If this 
exercise is worth anything then MONEY should just do its job better so 
people who track money (and there is an awful lot of them) will find 
it useful.



My own experience with this sort of thing leads me to think that
real applications dealing with a variety of currencies will be
needing to store additional details, such as the exact exchange
rate that applied to a particular transaction.  So while merely


Seems like something that can be stored in a different column.


Exactly. We need to think this through as would a real user.

If the business is receiving money from multiple regions then there 
will be rows which show the currency, number of units (numeric type 
since the column is not devoted to a specific currency), transaction 
tracing data, exchange reference (another table), 
amt_received::MONEY('USD','D2'), 
amt_transaction_fee::MONEY('USD','D3'), etc.


Within the accounts of the organisation the MONEY columns are likely 
to be in a single currency with movements between ledgers in the time 
honoured fashion of adding to this while removing the same from 
other(s) so all money entries add to zero across the row. Movements 
between currencies are just another transaction as detailed above.


I have sketched something of a notation for MONEY columns along these 
lines:


amt_received MONEY (CURRENCY  -- e.g., 'USD' 'AUD' 'YEN' ...
    [, SCALE  -- default as per currency, e.g. USD 2 decimals
              -- but could be used to see money in bigger units
              -- such as '000s (e.g., that end-of-month 

Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread Gavin Flower

On 30/03/13 08:36, Michael Nolan wrote:

On 3/27/13, Steve Crawford scrawf...@pinpointresearch.com wrote:



Somewhat more worrisome is the fact that it automatically rounds input
(away from zero) to fit.

select '123.456789'::money;
  money
---------
 $123.46

So does casting to an integer:

select 1.25::integer;
 int4
------
    1

And then there's this:

create table wkdata
(numval numeric(5,2));

CREATE TABLE
Time: 6.761 ms
nolan=> insert into wkdata
nolan-> values (123.456789);
INSERT 569625265 1
Time: 4.063 ms
nolan=> select * from wkdata;
 numval
--------
 123.46

So rounding a money field doesn't seem inconsistent with other data types.
--
Mike Nolan


In New Zealand at one point we rounded to the nearest 5 cents, now to 10 
cents; probably in a few years we will round to the nearest 20c or 
50c...  Not sure how people coped, if they ever did, with printing 
values before and after the change in the value to be rounded (say to the 
nearest 5c then the next day to the nearest 10c)!


There are many rounding modes, from Java (Enum RoundingMode):

    CEILING: Rounding mode to round towards positive infinity.

    DOWN: Rounding mode to round towards zero.

    FLOOR: Rounding mode to round towards negative infinity.

    HALF_DOWN: Rounding mode to round towards nearest neighbor unless 
both neighbors are equidistant, in which case round down.

    HALF_EVEN: Rounding mode to round towards the nearest neighbor 
unless both neighbors are equidistant, in which case, round towards the 
even neighbor.

    HALF_UP: Rounding mode to round towards nearest neighbor unless 
both neighbors are equidistant, in which case round up.

    UNNECESSARY: Rounding mode to assert that the requested operation 
has an exact result, hence no rounding is necessary.

    UP: Rounding mode to round away from zero.
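For comparison, PostgreSQL's own round() on numeric behaves like HALF_UP 
above, i.e. halves round away from zero - a quick check:

select round(2.5::numeric), round(-2.5::numeric), round(2.4::numeric);
-- returns 3, -3, 2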


Re: [GENERAL] Problem with pg_basebackup and streaming replication. (9.2.3 / win64)

2013-04-02 Thread Tore Halvorsen
Ah, this was related to file permissions. The service is running as NETWORK
SERVICE, but this user didn't have access to the tablespace-directory.


On Mon, Apr 1, 2013 at 3:14 PM, Adrian Klaver adrian.kla...@gmail.com wrote:

 On 03/31/2013 04:47 PM, Tore Halvorsen wrote:

 Good idea, but both master and the two slaves are separate machines.


 CCing the list.
 On the second machine does the tablespace path already exist and does it
 already have files in it?


  On Mar 31, 2013 10:21 PM, Adrian Klaver adrian.kla...@gmail.com wrote:

 On 03/31/2013 01:15 PM, Tore Halvorsen wrote:

 Hi,

 I'm having problem setting up a slave on windows.
 I'm doing a pg_basebackup and this seems to work, but when I'm
 starting
 the slave with a recovery.conf-file, I'm getting a redo-error
 with FATAL
 creating file (path to tablespace) File Exists. This is
 reproducible.

 The first slave worked perfectly, but number two just fails with
 this -
 any ideas?


 Are you running both slaves on the same machine and pointing both at
 the same tablespace?




 --
 Eld på åren og sol på eng gjer mannen fegen og fjåg. [Jøtul]
 demo 2013 Tore Halvorsen || +052 0553034554



 --
 Adrian Klaver
 adrian.kla...@gmail.com 



 --
 Adrian Klaver
 adrian.kla...@gmail.com




-- 
Eld på åren og sol på eng gjer mannen fegen og fjåg. [Jøtul]
demo 2013 Tore Halvorsen || +052 0553034554


[GENERAL] sql text

2013-04-02 Thread Ray Stell
Forwarding a post seen on an oracle list this morning as Stephane has been quite
an inspiration over the years and he is covering postgresql in this text:

Some list members may be adjunct or full-time faculty members and 
  interested by the fact that I'm currently working on a 450-page textbook 
  on Database Programming (title SQL Success, subtitle Database 
  Programming Proficiency) that I will publish under the name of my 
  company in a few months - the target is the Database 
  Fundamentals/Database Programming undergraduate and continuous 
  education courses (besides Oracle I cover SQL Server, MySQL, PostgreSQL, 
  DB2 - and SQLite). I'm short on theory and long on correctness and 
  efficiency; my idea of a successful database instruction isn't being 
  able to parrot the definition of 23 normal forms and requiring a 
  procedure and five cursors to perform 20 times slower what can be done 
  in a single query.
  The book is quite solid on SQL, I can say; some stuff can probably be 
  considered advanced (those of you who don't teach may wish to refer 
  their favorite Java developers to it ...)
   The book will probably be published around July (a few things to 
  finish) but I am beginning to raise awareness about it, as well as the 
  1,000+ slides that come with it for instructors (the kind of slides of 
  my Youtube videos, not the dreadful Word-outline-to-bullet-points slides 
  with a static diagram here and there too often provided by textbook 
  publishers), and the eco-system I am building around it (which includes 
  an SQL sandbox suitable for online classes).
  
   All details, including chapter samples (posted as they return from 
  proof-reading), on http://edu.konagora.com; there is a special section 
  for instructors with full chapters and  a lot of additional stuff, 
  exercises, etc. (requires a college email address and I check manually 
  that you are a faculty member and not a student). The SQL sandbox is 
  publicly accessible, if you need to quickly test SQL developers during 
  job interviews ...
  
  Hope that some of you will find my efforts useful ...
  
  -- 
  Stephane Faroult
  RoughSea Ltd http://www.roughsea.com
  Konagora http://www.konagora.com
  RoughSea Channel on Youtube http://www.youtube.com/user/roughsealtd




[GENERAL] create temporary table problem

2013-04-02 Thread JPrasanna Venkatesan
Dear All,

I am using PGSQL 9.0

When I tried to create a temporary table it is throwing the following error

ERROR:  could not create directory
pg_tblspc/16385/PG_9.0_201008051/20304: No such file or directory


Whole snippet is like this

ctfdb= create temporary table assoc ( origin varchar(32), target
varchar(255), type varchar(128), descr text, generation int, relid
varchar(32) );
ERROR:  could not create directory
pg_tblspc/16385/PG_9.0_201008051/20304: No such file or directory


The same query works in PGSQL 8.0

Please guide me.

Regards,
J Prasanna Venkatesan


Re: [GENERAL] Trigger of Transaction

2013-04-02 Thread Joe Van Dyk
On Mon, Apr 1, 2013 at 8:41 PM, Juan Pablo Cook juamp...@gmail.com wrote:

 Hi everyone! I need your help with this problem.

 I'm using PostgreSQL 9.2 Server and the latest JDBC
 driver: postgresql-9.2-1002.jdbc4.jar

 I have a many to one relation. I have this piece of code:

 con.setAutoCommit(false); //transaction block start

 // Make an insert to one table (Vector)
 // After that I insert the children of the first table with their parent_id
 as the FK.
 con.commit(); //transaction block end

 I have this Trigger:

 CREATE TRIGGER trigger_update_index AFTER INSERT
 ON Vector FOR EACH ROW
 EXECUTE PROCEDURE update_index();

 CREATE OR REPLACE FUNCTION update_index() RETURNS TRIGGER AS
 $update_index$
 DECLARE
 BEGIN
  -- Make something
  RETURN NEW;  -- a trigger function must end with a RETURN
 END;
 $update_index$ LANGUAGE plpgsql;

 What's the problem? When the trigger fires, only the parent (Vector)
 has been inserted and not the children :S so I need those records to be
 inserted for my function to work.

 I'm trying to make a trigger that executes only after ALL of the transaction.
 So, after all the INSERTs INTO (like 5 or 10) I want to launch my function.
 I found some information in Google about this: Constraint Trigger -
 perhaps I can set some rules before it triggers, but I don't know if it
 is what I need, and also don't know how to code that.


create constraint trigger my_trigger_name
after insert on products
deferrable
for each row
execute procedure blah();

Constraint triggers let you change when the trigger executes.

http://www.postgresql.org/docs/9.2/static/sql-createtrigger.html says: "When
the CONSTRAINT option is specified, this command creates a constraint
trigger. This is the same as a regular trigger except that the timing of
the trigger firing can be adjusted using SET CONSTRAINTS
(http://www.postgresql.org/docs/9.2/static/sql-set-constraints.html).
Constraint triggers must be AFTER ROW triggers. They can be fired either at
the end of the statement causing the triggering event, or at the end of the
containing transaction; in the latter case they are said to be deferred.
A pending deferred-trigger firing can also be forced to happen immediately
by using SET CONSTRAINTS. Constraint triggers are expected to raise an
exception when the constraints they implement are violated."
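So, to fire at commit time rather than at the end of each statement, a
sketch (same hypothetical names as above) would add INITIALLY DEFERRED:

create constraint trigger my_trigger_name
after insert on products
deferrable initially deferred
for each row
execute procedure blah();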




 I appreciate your help a lot.

 Thanks ;)

 JP Cook




Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Joe Van Dyk
On Mon, Apr 1, 2013 at 1:11 PM, Tim Uckun timuc...@gmail.com wrote:




 On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog jesusthef...@gmail.com wrote:

 On the topic of 'natural' versus 'synthetic' primary keys, I am generally
 in the camp that an extra ID field won't cost you too much, and while one
 may not need it for a simple table (i.e. id, name) one might add any number
 of columns later, and you'll be glad to have it.


 Nothing prevents you from adding more columns if you use varchar primary
 keys.



 My preferred method is to give every table an ID column of UUID type and
 generate a UUID using the uuid-ossp contrib module. This also prevents
 someone not familiar with the database design from using an ID somewhere
 they should not (as is possible with natural PKs) or treating the ID as an
 integer, not an identifier (as is all too common with serial integers).



 This would be a concern if you had multi-master writes. As far as I know
 Postgres does not have a true multi-master replication system, so all the
 writes have to happen on one server, right?

 As for UUIDs I use them sometimes but I tend to also use one serial column
 because when I am communicating with people it makes it so much easier to
 say dealer number X than dealer number SOME_HUGE_UUID.  I often have to
 talk to people about the data and UUIDs make it very difficult to
 communicate with humans.


I've been wishing for a smaller uuid type for a while. Say you need to
assign an Order #. Customers might be referencing the number, so you don't
want it to be too long. But you don't want Order #'s to be guessable or
have the characters/digits be transposed accidentally.

I've been using a unique text column with a default of random_characters(12)

CREATE OR REPLACE FUNCTION public.random_characters(length integer)
 RETURNS text
 LANGUAGE sql
 STABLE
AS $function$
SELECT array_to_string(array((
  SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
    FROM mod((random()*31)::int, 31)+1 FOR 1)
  FROM generate_series(1, $1))),'');
$function$;

This seems to work ok. I don't allow 1's or l's or 0's or O's, as they can
easily be mistyped or misread.


Re: [GENERAL] Dynamic/polymorphic record/composite return types for C user-defined-functions

2013-04-02 Thread Tom Lane
Stephen Scheck singularsyn...@gmail.com writes:
 I'm trying to write some user-defined functions in C which operate on a
 large object (so their common first argument will be an OID referencing an
 object in the pg_largeobject catalog table created with lo_create()) and
 return either a single row or a set depending on the function. Depending on
 the contents of the BLOB, some of the functions have a need to return
 polymorphic number column(s) as part of their result row (i.e. it could be
 an integer, real or double depending on the input BLOB).

 I've tried various approaches for this but none of them quite work the way
 I need and I'm wondering if I'm missing a fundamental bit of understanding
 of Postgres' type system or it simply doesn't support what I want to do.

It doesn't.  Type analysis happens at parse time, not at run time, so
you cannot expect a query variable's data type to be determined by the
contents of some data value not seen until runtime.

The only way I can see to get this to work is a hack similar to common
usage of dblink: you declare the function as returning RECORD or SETOF
RECORD, and then the calling query has to specify an AS clause that
shows what column type(s) it's expecting to get back on this particular
call.  That works, sorta, for dblink usages where you're writing
SELECT ... FROM dblink('some particular SQL command') AS ...
and so you know what you're expecting to get from the remote SQL
command.  But it's certainly ugly, and you haven't said enough about
your use-case to tell if this is workable for you or not.
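For example (a sketch reusing the info() function from the original post;
the column list is whatever the caller expects back on this call):

SELECT * FROM info(lo_oid) AS t(val integer, label text);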

If you're only worried about numbers, is it really so critical to
preserve the datatype?  You could coerce 'em all to numeric to dodge the
problem, albeit at some loss of efficiency.

Another thought here is that if you don't try to expand a record value,
the need for the parser to know its column types goes away; that is,
if you just write

SELECT function_returning_record(...) FROM ...

and not

SELECT (function_returning_record(...)).* FROM ...

I think that the run-time-blessed-record-type hack will work okay.
Of course that greatly limits what you can do with the result in SQL,
but if you just need to ship it to a client it might be all right.

regards, tom lane




Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Merlin Moncure
On Tue, Apr 2, 2013 at 10:34 AM, Joe Van Dyk j...@tanga.com wrote:
 On Mon, Apr 1, 2013 at 1:11 PM, Tim Uckun timuc...@gmail.com wrote:




 On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog jesusthef...@gmail.com
 wrote:

 On the topic of 'natural' versus 'synthetic' primary keys, I am generally
 in the camp that an extra ID field won't cost you too much, and while one
 may not need it for a simple table (i.e. id, name) one might add any number
 of columns later, and you'll be glad to have it.


 Nothing prevents you from adding more columns if you use varchar primary
 keys.



 My preferred method is to give every table an ID column of UUID type and
 generate a UUID using the uuid-ossp contrib module. This also prevents
 someone not familiar with the database design from using an ID somewhere
 they should not (as is possible with natural PKs) or treating the ID as an
 integer, not an identifier (as is all too common with serial integers).



 This would be a concern if you had multi master writes . As far as I know
 Postgres does not have a true multi master replication system so all the
 writes have to happen on one server right?

 As for UUIDs I use them sometimes but I tend to also use one serial column
 because when I am communicating with people it makes it so much easier to
 say dealer number X than dealer number SOME_HUGE_UUID.  I often have to
 talk to people about the data and UUIDs make it very difficult to
 communicate with humans.


 I've been wishing for a smaller uuid type for a while. Say you need to
 assign a Order #. Customers might be referencing the number, so you don't
 want it to be too long. But you don't want Order #'s to be guessable or have
 the characters/digits be transposed accidently.

 I've been using a unique text column with a default of random_characters(12)

 CREATE OR REPLACE FUNCTION public.random_characters(length integer)
  RETURNS text
  LANGUAGE sql
  STABLE
 AS $function$
 SELECT array_to_string(array((
   SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
 FROM mod((random()*31)::int, 31)+1 FOR 1)
   FROM generate_series(1, $1))),'');
 $function$;

 This seems to work ok. I don't allow 1's or l's or 0's or O's, as they can
 easily be mistyped or misread.

This is pseudo-random and can be guessed, which is maybe dangerous
depending on circumstance.  For a stronger random stream go to
pgcrypto's gen_random_bytes().  Also, now you have to worry about
collisions -- the whole point of uuid is to try and keep you from
having to deal with that.
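A sketch of that stronger variant (assumes the pgcrypto extension is
installed; hex encoding is just one convenient way to turn the bytes into
text):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- 6 random bytes -> 12 hex characters; cryptographically strong,
-- but collisions still need a unique index to catch them
SELECT encode(gen_random_bytes(6), 'hex');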

My historical comments in this debate are noted.  To summarize, I
strongly believe that natural keys are often (but not always) better.

merlin




Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread John R Pierce

On 4/2/2013 12:50 AM, Gavin Flower wrote:
In the bad old days when I was a COBOL programmer we always stored 
money in the COBOL equivalent of an integer (numeric without a 
fractional part) to avoid round off, but we displayed with a decimal 
point two digits to the left. So storing as an integer (actually bigint 
would be required) is a good idea, with parameters to say how many 
effective digits in the fractional part, and how many fractional 
digits to display etc. - as you said.


COBOL Numeric was BCD, same as NUMERIC in SQL (yes, I know PostgreSQL 
internally uses a base-10000 notation for this, storing it as an array 
of short ints, but effectively it's equivalent to BCD).






--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [GENERAL] Dynamic/polymorphic record/composite return types for C user-defined-functions

2013-04-02 Thread Misa Simic
Hi,

We have solved that problem by having the function always return text,
where the text is actually formatted JSON... We used plv8 before 9.2 to
actually execute dynamic SQL and return the result...
However, I think some kind of dynamic record type would be very useful...
(Maybe just record, but without the need to say AS...)

Though I have no idea how hard it would be to implement something like
that... and how widely useful it would really be...

We avoid client code as much as possible - and try to keep it as tiny as
possible.

All business logic is inside the DB... Had we not found the JSON solution,
we would need to write client code tied to the business logic, to actually
be able to say, in this concrete case, SELECT * FROM function(parameters) AS
(expected result).

Or, as Stephen described, he would need to read the BLOB outside the DB and
apply the logic there - instead of in the DB...

Kind Regards,

Misa

On Tuesday, April 2, 2013, Tom Lane wrote:

 Stephen Scheck singularsyn...@gmail.com writes:
  I'm trying to write some user-defined functions in C which operate on a
  large object (so their common first argument will be an OID referencing
 an
  object in the pg_largeobject catalog table created with lo_create()) and
  return either a single row or a set depending on the function. Depending
 on
  the contents of the BLOB, some of the functions have a need to return
  polymorphic number column(s) as part of their result row (i.e. it could
 be
  an integer, real or double depending on the input BLOB).

  I've tried various approaches for this but none of them quite work the
 way
  I need and I'm wondering if I'm missing a fundamental bit of
 understanding
  of Postgres' type system or it simply doesn't support what I want to do.

 It doesn't.  Type analysis happens at parse time, not at run time, so
 you cannot expect a query variable's data type to be determined by the
 contents of some data value not seen until runtime.

 The only way I can see to get this to work is a hack similar to common
 usage of dblink: you declare the function as returning RECORD or SETOF
 RECORD, and then the calling query has to specify an AS clause that
 shows what column type(s) it's expecting to get back on this particular
 call.  That works, sorta, for dblink usages where you're writing
 SELECT ... FROM dblink('some particular SQL command') AS ...
 and so you know what you're expecting to get from the remote SQL
 command.  But it's certainly ugly, and you haven't said enough about
 your use-case to tell if this is workable for you or not.

 If you're only worried about numbers, is it really so critical to
 preserve the datatype?  You could coerce 'em all to numeric to dodge the
 problem, albeit at some loss of efficiency.

 Another thought here is that if you don't try to expand a record value,
 the need for the parser to know its column types goes away; that is,
 if you just write

 SELECT function_returning_record(...) FROM ...

 and not

 SELECT (function_returning_record(...)).* FROM ...

 I think that the run-time-blessed-record-type hack will work okay.
 Of course that greatly limits what you can do with the result in SQL,
 but if you just need to ship it to a client it might be all right.

 regards, tom lane





Re: [GENERAL] Money casting too liberal?

2013-04-02 Thread Gavin Flower

On 03/04/13 07:16, John R Pierce wrote:

On 4/2/2013 12:50 AM, Gavin Flower wrote:
In the bad old days when I was a COBOL programmer we always stored 
money in the COBOL equivalent of an integer (numeric without a 
fractional part) to avoid round off, but we displayed with a decimal 
point two digits to the left.  So storing as an integer (actually 
bigint would be required) is a good idea, with parameters to say how 
many effective digits in the fractional part, and how many fractional 
digits to display etc. - as you said.


COBOL Numeric was BCD, same as NUMERIC in SQL (yes, I know 
PostgreSQL internally uses a base-10000 notation for this, storing it 
as an array of short ints, but effectively it's equivalent to BCD).






--
john r pierce  37N 122W
somewhere on the middle of the left coast

It was many years ago! :-)



Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Merlin Moncure
On Tue, Apr 2, 2013 at 2:01 PM, Samantha Atkins sjatk...@me.com wrote:
 Natural keys are in user data space.  Thus they are not guaranteed invariant 
 and therefore cannot serve as persistent identity.

That is true, but irrelevant in most real world cases.  Also, nothing
is keeping you from using an extra marker if/when you need to provide
an invariant lookup.

 Also natural keys have the considerable defect of being of different type and 
 arity per logical entity type.   This means that very nice things like 
 dynamic relationships and persistent collections of references to multiple 
 types of things cannot be built.  It also increases ORM  and data cache 
 complexity.

OO evangelism.

 These are considerable weaknesses.

 You cannot guess a uuid from say uuid.uuid4().  If you think you can I look 
 forward to seeing your proof.

I never claimed that.  I said that postgresql random() can be guessed,
which it can, since it's based on lrand48.

merlin




Re: [GENERAL] create temporary table problem

2013-04-02 Thread Jerry Sievers
JPrasanna Venkatesan prasanna1...@gmail.com writes:

 Dear All,

 I am usnig PGSQL 9.0

 When I tried to create a temporary table it is throwing the following error

 ERROR: could not create directory pg_tblspc/16385/PG_9.0_201008051/20304: 
 No such file or directory

You forgot to migrate your tablespaces when you did the upgrade and
temp_tablespaces is set to an invalid one. 
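A quick way to check, sketched (run in the session that gets the error):

SHOW temp_tablespaces;                   -- where temp tables will go
SELECT oid, spcname FROM pg_tablespace;  -- tablespaces that actually exist
SET temp_tablespaces TO DEFAULT;         -- fall back to the database default

If the directory under pg_tblspc really is gone, the stale tablespace
needs to be dropped or its files restored.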

 Whole snippet is like this

 ctfdb= create temporary table assoc ( origin varchar(32), target 
 varchar(255), type varchar(128), descr text,
 generation int, relid varchar(32) );
 ERROR: could not create directory pg_tblspc/16385/PG_9.0_201008051/20304: 
 No such file or directory

 The same query work in PGSQL 8.0

 Please guide me.

 Regards,
 J Prasanna Venkatesan


-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800




[GENERAL] corrupted item pointer in streaming based replication

2013-04-02 Thread Jigar Shah
Hi,

A few days ago we started getting the below message, and postgres on our 
server (streaming replication secondary) would not start up. I am wondering what 
our options are at this point. Can we do something to fix this?

2013-03-27 11:00:47.281 PDT LOG:  recovery restart point at 161A/17108AA8
2013-03-27 11:00:47.281 PDT DETAIL:  last completed transaction was at log time 
2013-03-27 11:00:47.241236-07
2013-03-27 11:00:47.520 PDT LOG:  restartpoint starting: xlog

2013-03-27 11:07:51.348 PDT FATAL:  corrupted item pointer: offset = 0, size = 0
2013-03-27 11:07:51.348 PDT CONTEXT:  xlog redo split_l: rel 1663/16384/115085 
left 4256959, right 5861610, next 5044459, level 0, firstright 192
2013-03-27 11:07:51.716 PDT LOG:  startup process (PID 5959) exited with exit 
code 1
2013-03-27 11:07:51.716 PDT LOG:  terminating any other active server processes

Before the secondaries crashed we also noticed the below error message in 
postgres logs on the primary few days ago.

[d: u:postgres p:2498 7] ERROR: could not access status of transaction 837550133
DETAIL: Could not open file pg_clog/031E: No such file or directory. 
[u:postgres p:2498 9]

[d: u:radio p:31917 242] ERROR: could not open file base/16384/114846.39 
(target block 360448000): No such file or directory [d: u:radio p:31917 243]

I'd like to know what we can do to fix this situation.

Thanks
Jigar



Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Joe Van Dyk
On Tue, Apr 2, 2013 at 11:16 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Apr 2, 2013 at 10:34 AM, Joe Van Dyk j...@tanga.com wrote:
  On Mon, Apr 1, 2013 at 1:11 PM, Tim Uckun timuc...@gmail.com wrote:
 
 
 
 
  On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog jesusthef...@gmail.com
  wrote:
 
  On the topic of 'natural' versus 'synthetic' primary keys, I am
 generally
  in the camp that an extra ID field won't cost you too much, and while
 one
  may not need it for a simple table (i.e. id, name) one might add any
 number
  of columns later, and you'll be glad to have it.
 
 
  Nothing prevents you from adding more columns if you use varchar primary
  keys.
 
 
 
  My preferred method is to give every table an ID column of UUID type
 and
  generate a UUID using the uuid-ossp contrib module. This also prevents
  someone not familiar with the database design from using an ID
 somewhere
  they should not (as is possible with natural PKs) or treating the ID
 as an
  integer, not an identifier (as is all too common with serial integers).
 
 
 
  This would be a concern if you had multi master writes . As far as I
 know
  Postgres does not have a true multi master replication system so all the
  writes have to happen on one server right?
 
  As for UUIDs I use them sometimes but I tend to also use one serial
 column
  because when I am communicating with people it makes it so much easier
 to
  say dealer number X than dealer number SOME_HUGE_UUID.  I often
 have to
  talk to people about the data and UUIDs make it very difficult to
  communicate with humans.
 
 
  I've been wishing for a smaller uuid type for a while. Say you need to
  assign a Order #. Customers might be referencing the number, so you don't
  want it to be too long. But you don't want Order #'s to be guessable or
 have
  the characters/digits be transposed accidently.
 
  I've been using a unique text column with a default of
 random_characters(12)
 
  CREATE OR REPLACE FUNCTION public.random_characters(length integer)
   RETURNS text
   LANGUAGE sql
   STABLE
  AS $function$
  SELECT array_to_string(array((
SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
  FROM mod((random()*31)::int, 31)+1 FOR 1)
FROM generate_series(1, $1))),'');
  $function$;
 
  This seems to work ok. I don't allow 1's or l's or 0's or O's, as they
 can
  easily be mistyped or misread.

 This is pseudo random and can be guessed, which is maybe dangerous
 depending on circumstance.  For stronger random stream go to
 pgcrypto.gen_random_bytes().  Also, now you have to worry about
 collisions -- the whole point of uuid is to try and keep you from
 having to deal with that.


Right, but it's better than using serials as far as being guessable.

The probability of collisions is fairly low if you are using 12 or more
characters (with 30 possible characters). Not sure what the math is on the
probability of collisions (birthday problem) though... and you could have a
trigger that checked for the existence of a matching key before
inserts/updates.
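For what it's worth, the birthday approximation: with 30 possible
characters and length 12 there are N = 30^12, about 5.3 * 10^17, possible
keys, and after k inserts the collision probability is roughly k^2 / 2N.
A million rows gives about 10^12 / (1.06 * 10^18), i.e. roughly one in a
million; a billion rows pushes it toward near-certainty. So the unique
index (or the trigger above) is still needed.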

And using UUIDs would be too long for lots of purposes where people are
working with the numbers, and where there might be external constraints on
how long the IDs can be.

An example use case:
https://www.tanga.com/deals/cd8f90c81a/oral-b-sensitive-clean-6-extra-soft-replacement-brush-heads

where cd8f90c81a is the 'uuid' for that product. It's non-guessable
enough, and I don't want to put a full UUID in the URL.





 My historical comments in this debate are noted.  To summarize, I
 strongly believe that natural keys are often (but not always) better.




Re: [GENERAL] in C trigger function find out if column is part of primary key

2013-04-02 Thread Kevin Grittner
Enke, Dr., Michael michael.e...@wincor-nixdorf.com wrote:
 I have to find out if a column (i = 0, …, tupdesc->natts-1) is
 part of a primary key, but for performance reasons I do not want
 to execute another query via SPI.  Is this possible?

You might get some inspiration from looking at this bit of code:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=contrib/tcn/tcn.c;h=eb7e1a61a6a1d5c5ed2e840af41410ac4a52418f;hb=master#l127

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[GENERAL] Dynamic/polymorphic record/composite return types for C user-defined-functions

2013-04-02 Thread Stephen Scheck
On Tue, Apr 2, 2013 at 10:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 if you just write

 SELECT function_returning_record(...) FROM ...

 and not

 SELECT (function_returning_record(...)).* FROM ...

 I think that the run-time-blessed-record-type hack will work okay.
 Of course that greatly limits what you can do with the result in SQL,
 but if you just need to ship it to a client it might be all right.

 regards, tom lane


Indeed, one of the primary concerns is the ability to query against the
returned record(s):

SELECT * FROM (SELECT (info(lo_oid_col)).* FROM table_with_lo_oids) s1
WHERE s1.some_property = some_value

I'm thinking a solution might be to create a generic anynumber type which
records its instance type, along with a slate of CREATE CASTs to go back
and forth between the base numeric types. For example:

CREATE TYPE number; -- avoid calling it "anynumber" since it's not
                    -- really polymorphic in the postgres sense
CREATE FUNCTION some_func_returning_number(IN some_type some_param)
RETURNS number ...

-- foo knows what type it is and when its CAST function is called from
-- context, it calls a built-in CAST function to go from its instance
-- type to the contextual type (integer here):
SELECT some_func_returning_number(foo) + 1::integer FROM bar;
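The casts themselves might look like this (a sketch; number_to_int4 and
number_to_float8 are hypothetical C support functions that would consult
the recorded instance type):

CREATE CAST (number AS integer)
    WITH FUNCTION number_to_int4(number) AS IMPLICIT;
CREATE CAST (number AS double precision)
    WITH FUNCTION number_to_float8(number) AS IMPLICIT;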

Do you see any architectural/implementation pitfalls to such an approach?
It doesn't seem like it would entail
a great deal of additional code.

Thanks,
-Steve


[GENERAL] How can I perform client-only installation from source code on Windows?

2013-04-02 Thread MauMau

Hello,

Is there any way to perform client-only installation from source code on 
Windows?  On UNIX/Linux, client-only installation is described here:



http://www.postgresql.org/docs/current/static/install-procedure.html

[Excerpt]
Client-only installation:
If you want to install only the client applications and interface libraries, 
then you can use these commands:

gmake -C src/bin install
gmake -C src/include install
gmake -C src/interfaces install
gmake -C doc install



Any information is appreciated.

Regards
MauMau





[GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread David Noel
I'm running into a strange issue whereby my postgres processes are
slowly creeping to 100% CPU utilization. I'm running
postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
postgresql-9.2-1002.jdbc4 driver.

I'm not sure what information here is relevant, so I'll give
everything I can as concisely as I can.

The hardware is a Dell PowerEdge 1420, dual Xeon Nocona's, 3.2ghz,
16gb ram. The disks are 4 Kingston HyperX SATA3's attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.

There are 2 databases, one a queue, and one containing tables storing
crawled data.

The application is a webcrawler. The application pulls URLs from the
queue and marks them active in a single transaction. It then feeds the
URLs to the crawler threads who crawl the URL, populate the tables
with data, and signal the main thread to update the queue database,
marking the item as inactive and rescheduling it with a new
NextCrawlDate.

The processes that hang are the postgres processes that interact with
the queue database.

'select * from pg_stat_activity' shows that the queries are not
waiting, and are in the idle state.

Essentially, the query in question is responsible for returning 1 URL
from the union of the list of URL's whose crawl session has timed out
and the list of URL's next in line to be crawled according to the
schedule (NextCrawlDate). The query is: select * from ((select * from
crawlq where Active = 'true' AND TimeoutDate = now()) UNION
(select * from crawlq where Active = 'false')) as RS order by
NextCrawlDate asc limit 1

Beyond this I don't know what useful debugging information to include.
I'll take a guess and start with some vmstat output.

Under normal conditions (with the crawler running) vmstat shows the following:

 procs     memory       page                      disks     faults       cpu
 r b w    avm    fre    flt  re  pi  po    fr  sr da0 da1   in     sy    cs us sy id
 2 0 0   119G  8450M   1143   0   1   0   900   0   0   0  128   6700  8632 32  4 65
 2 0 0   119G  8444M   1937   0   0   0   100   0   4   4  280 112206  7683 36  5 59
 1 0 0   119G  8443M    427   0   0   0  1377   0  90  90  222 115889  9020 35  7 58
 1 0 0   119G  8442M   1798   0   0   0    18   0   1   1  153 104954  7071 30  4 66
 3 0 0   119G  8443M    528   0   0   0   681   0  10  10  293 125170 14523 40 15 46
 1 0 0   119G  8432M  15227   0   0   4  2850   0   3   3  205  83830  6663 55 12 33
 8 0 0   119G  8433M   3306   0   0   0   445   0   2   2  249 113792  7068 34  5 61
10 0 0   118G  8461M   1190   0   0   0  9909   0  72  73  451 120263 10036 56  9 35
14 0 0   118G  8456M   5887   0   0   0  1202   0   2   2  272 130795  9506 44 12 44
 9 0 0   118G  8444M   7750   0   0   0  1070   0   9   9  298  87643  9584 80 13  7
 3 0 0   118G  8442M   1335   0   0   0   648   0   5   5  189 143691  9234 36  6 58
 1 0 0   118G  8442M    689   0   1   1   472   0   2   2  206 153868  8635 32  7 61
 1 0 0   118G  8441M    203   0   0   0  1124   0  75  75  191 142598  8909 31 10 60
 2 0 0   118G  8440M   9508   0   0   0   684   0   8   8  231 132785 10247 47 13 41
 4 0 0   118G  8456M   4046   0   0   0  5469   0  11  11  299 143119 12475 54 22 24
 4 0 0   117G  8490M   1076   0   0   0  9858   0  16  16  291 140701 14849 58 25 17
 1 0 0   116G  8524M    344   0   0   0  8936   0   4   4  234 149103 12137 45 15 40
 2 0 0   114G  8586M    715   0   0   5 17719   0  73  75  322 151002 11430 34 10 56
 5 0 0   112G  8648M   2773   0   0   0 16997   0   6   6  225 118339  8700 30 10 61
 1 0 0   110G  8705M   4429   0   0   0 15763   0   7   7  423 139590 10354 40 11 49
 1 0 0   108G  8760M   1443   0   0   0 14519   0   7   7  405 139806 10214 37  5 58
 1 0 0   104G  8863M    333   0   0   0 26537   0   5   5  284 107770  9947 34  6 60
 1 0 0   104G  8859M   1331   0   0   0  1700   0 114 114  464 103248 12113 40  9 51
 1 0 0   104G  8854M   1708   0   0   0   272   0   6   6  279  99817  9470 40  5 55
 9 0 0   104G  8850M   3653   0   0   0  4809   0  28  28  346 160041 54071 42 32 26
12 3 0   105G  8845M  20576   0   0   0 18344   0   7   7  383  95019 32533 46 53  1
20 0 0   114G  8721M  46913   0   0   0  2941   0  11  11  461  77480  9794 72 28  0
12 1 0   110G  8759M  25109   0   0   0 35881   0  70  70  413  72631 10161 76 24  0
 2 0 0   110G  8716M  12993   0   1   1   265   0   8   8  292  83085 10073 61 30  9
 3 0 0   110G  8716M   2144   0   0   0    45   0   3   3  183 100994  7410 39 20 41


...and when postgres goes bonkers:

 procs     memory       page                      disks     faults       cpu
 r b w    avm    fre    flt  re  pi  po    fr  sr da0 da1   in     sy    cs us sy id
 2 0 0    98G    10G   1091   0   1   0   849   0   0   0  114   2641  8582 30  4 66
 2 0 0    98G    10G     20   0   0   0     0   0   0   0  197  20500 10454 46  2 53
 2 0 0    98G    10G     59   0   0   0     0   0   0   0  284  23715 11180 46  3 51
 3 0 0    98G    10G     17   0   0   0   652   0  69  70  288  21968 11571 46  4 50
 2 0 0    98G    10G

Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread Ian Lawrence Barwick
2013/4/3 David Noel david.i.n...@gmail.com:
 I'm running into a strange issue whereby my postgres processes are
 slowly creeping to 100% CPU utilization. I'm running
 postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
 postgresql-9.2-1002.jdbc4 driver.

(...)
 postgresql.conf, all standard/default except for:
 max_connections = 256

It's very likely the default settings are woefully inadequate for your
server; some basic
tuning (especially the value of shared_buffers and other
memory-related parameters)
should help.

 Any thoughts? What other information can I provide?

Sample EXPLAIN/EXPLAIN ANALYZE output from the query in question, and
if possible relevant table definitions etc. would certainly be useful.

Regards

Ian Barwick


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread David Noel
What's strange is that the crawler will run just fine for up to
several hours. At some point though the CPU utilization slowly begins
to creep higher. Eventually everything locks and the program hangs.
'top' shows the processes connected to the queue database at or near
100%, and the program ceases output (I have debugging messages built
in to show current activity: [timestamp] : crawling [URL]). At some
point--anywhere from 30 minutes to several hours later--CPU
utilization drops to normal and the program resumes operation as if
everything were fine. This goes on for up to several hours, then the
utilization issue repeats. So it's a very odd issue I've run into.

On 4/2/13, David Noel david.i.n...@gmail.com wrote:
 I'm running into a strange issue whereby my postgres processes are
 slowly creeping to 100% CPU utilization. I'm running
 postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
 postgresql-9.2-1002.jdbc4 driver.

 I'm not sure what information here is relevant, so I'll give
 everything I can as concisely as I can.

 The hardware is a Dell PowerEdge 1420, dual Xeon Nocona's, 3.2ghz,
 16gb ram. The disks are 4 Kingston HyperX SATA3's attached to a
 HighPoint RocketRAID 2721 controller, ZFS, RAID10.

 There are 2 databases, one a queue, and one containing tables storing
 crawled data.

 The application is a webcrawler. The application pulls URLs from the
 queue and marks them active in a single transaction. It then feeds the
 URLs to the crawler threads who crawl the URL, populate the tables
 with data, and signal the main thread to update the queue database,
 marking the item as inactive and rescheduling it with a new
 NextCrawlDate.

 The processes that hang are the postgres processes that interact with
 the queue database.

 'select * from pg_stat_activity' shows that the queries are not
 waiting, and are in the idle state.

  Essentially, the query in question is responsible for returning 1 URL
  from the union of the list of URL's whose crawl session has timed out
  and the list of URL's next in line to be crawled according to the
  schedule (NextCrawlDate). The query is: select * from ((select * from
  crawlq where "Active" = 'true' AND "TimeoutDate" <= now()) UNION
  (select * from crawlq where "Active" = 'false')) as RS order by
  "NextCrawlDate" asc limit 1

 Beyond this I don't know what useful debugging information to include.
 I'll take a guess and start with some vmstat output.

 Under normal conditions (with the crawler running) vmstat shows the
 following:

  procs     memory       page                      disks     faults       cpu
  r b w    avm    fre    flt  re  pi  po    fr  sr da0 da1   in     sy    cs us sy id
  2 0 0   119G  8450M   1143   0   1   0   900   0   0   0  128   6700  8632 32  4 65
  2 0 0   119G  8444M   1937   0   0   0   100   0   4   4  280 112206  7683 36  5 59
  1 0 0   119G  8443M    427   0   0   0  1377   0  90  90  222 115889  9020 35  7 58
  1 0 0   119G  8442M   1798   0   0   0    18   0   1   1  153 104954  7071 30  4 66
  3 0 0   119G  8443M    528   0   0   0   681   0  10  10  293 125170 14523 40 15 46
  1 0 0   119G  8432M  15227   0   0   4  2850   0   3   3  205  83830  6663 55 12 33
  8 0 0   119G  8433M   3306   0   0   0   445   0   2   2  249 113792  7068 34  5 61
 10 0 0   118G  8461M   1190   0   0   0  9909   0  72  73  451 120263 10036 56  9 35
 14 0 0   118G  8456M   5887   0   0   0  1202   0   2   2  272 130795  9506 44 12 44
  9 0 0   118G  8444M   7750   0   0   0  1070   0   9   9  298  87643  9584 80 13  7
  3 0 0   118G  8442M   1335   0   0   0   648   0   5   5  189 143691  9234 36  6 58
  1 0 0   118G  8442M    689   0   1   1   472   0   2   2  206 153868  8635 32  7 61
  1 0 0   118G  8441M    203   0   0   0  1124   0  75  75  191 142598  8909 31 10 60
  2 0 0   118G  8440M   9508   0   0   0   684   0   8   8  231 132785 10247 47 13 41
  4 0 0   118G  8456M   4046   0   0   0  5469   0  11  11  299 143119 12475 54 22 24
  4 0 0   117G  8490M   1076   0   0   0  9858   0  16  16  291 140701 14849 58 25 17
  1 0 0   116G  8524M    344   0   0   0  8936   0   4   4  234 149103 12137 45 15 40
  2 0 0   114G  8586M    715   0   0   5 17719   0  73  75  322 151002 11430 34 10 56
  5 0 0   112G  8648M   2773   0   0   0 16997   0   6   6  225 118339  8700 30 10 61
  1 0 0   110G  8705M   4429   0   0   0 15763   0   7   7  423 139590 10354 40 11 49
  1 0 0   108G  8760M   1443   0   0   0 14519   0   7   7  405 139806 10214 37  5 58
  1 0 0   104G  8863M    333   0   0   0 26537   0   5   5  284 107770  9947 34  6 60
  1 0 0   104G  8859M   1331   0   0   0  1700   0 114 114  464 103248 12113 40  9 51
  1 0 0   104G  8854M   1708   0   0   0   272   0   6   6  279  99817  9470 40  5 55
  9 0 0   104G  8850M   3653   0   0   0  4809   0  28  28  346 160041 54071 42 32 26
 12 3 0   105G  8845M  20576   0   0   0 18344   0   7   7  383  95019 32533 46 53  1
 20 0 0   114G  8721M  46913   0   0   0  2941   0  11  11  461  77480  9794 72 28  0
 

Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread Kevin Grittner
David Noel david.i.n...@gmail.com wrote:

 'select * from pg_stat_activity' shows that the queries are not
 waiting, and are in the idle state.

Is the process idle, or is it running the query?  If the
latter, what do you mean when you say "the queries ... are in the
idle state"?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread John R Pierce

On 4/2/2013 3:35 PM, David Noel wrote:

The hardware is a Dell PowerEdge 1420, dual Xeon Nocona's, 3.2ghz,
16gb ram. The disks are 4 Kingston HyperX SATA3's attached to a
HighPoint RocketRAID 2721 controller, ZFS, RAID10.

.

postgresql.conf, all standard/default except for:
max_connections = 256


A) use a connection pool so you don't NEED 256 active database connections.

B) shared_buffers, work_mem, and maintenance_work_mem all need to be
tuned.   I'd suggest 4GB, 16MB, 1GB respectively as a starting point on
a 16GB ram system.   if you can, shrink your max_connections by using a
connection pooler (my target is generally no more than 2-4 active
queries per CPU core or hardware thread).

Ouch, Xeon Nocona was a single core, dual thread CPU, with rather poor
performance, essentially just a Pentium-4...  3GHz on a P4 is like 2GHz
on other CPUs.
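
For reference, those starting points as a postgresql.conf sketch; the
values are the ones suggested above, and the lower max_connections is
illustrative, assuming a pooler such as pgbouncer sits in front:

# postgresql.conf sketch for a 16GB host, per the suggestions above
shared_buffers = 4GB              # roughly 1/4 of RAM
work_mem = 16MB                   # per sort/hash, per backend; keep modest
maintenance_work_mem = 1GB        # VACUUM, CREATE INDEX, etc.
max_connections = 32              # illustrative cap; assumes a pooler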


when you said raid10, do you mean zfs mirrored, or are you doing 
hardware raid10 in the Highpoint?   I would have configured the raid 
card for JBOD, and done ZFS mirroring in the OS, so you can take 
advantage of ZFS's data integrity features.  Those are consumer 
grade SSD's, are they even qualified for use with that Highpoint 
controller?
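
For reference, a sketch of that layout with illustrative device and
pool names (not taken from the poster's system): with the HBA in JBOD
mode, ZFS "RAID10" is just a pool of striped mirrors:

# four disks as two striped mirror vdevs ("RAID10"), then verify health
zpool create tank mirror da0 da1 mirror da2 da3
zpool status tank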



--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread David Noel
On 4/2/13, Ian Lawrence Barwick barw...@gmail.com wrote:
 2013/4/3 David Noel david.i.n...@gmail.com:
 I'm running into a strange issue whereby my postgres processes are
 slowly creeping to 100% CPU utilization. I'm running
 postgresql-server-9.2.3, FreeBSD 8.3-RELEASE-p6, and using the
 postgresql-9.2-1002.jdbc4 driver.

 (...)
 postgresql.conf, all standard/default except for:
 max_connections = 256

 It's very likely the default settings are woefully inadequate for your
 server; some basic tuning (especially the value of shared_buffers and
 other memory-related parameters) should help.

 Any thoughts? What other information can I provide?

 Sample EXPLAIN/EXPLAIN ANALYZE output from the query in question, and
 if possible relevant table definitions etc. would certainly be useful.

 Regards

 Ian Barwick

Thanks for the feedback. I'll look into pg tunings. Hopefully the
problem's there somewhere.

explain analyze select * from ((select * from crawlq where "Active"
= 'true' AND "TimeoutDate" <= now()) UNION (select * from crawlq
where "Active" = 'false')) as RS order by "NextCrawlDate" asc limit 1

Limit  (cost=4092.39..4092.39 rows=1 width=203) (actual
time=23.447..23.450 rows=1 loops=1)
  ->  Sort  (cost=4092.39..4096.34 rows=1583 width=203) (actual
time=23.442..23.442 rows=1 loops=1)
        Sort Key: public.crawlq."NextCrawlDate"
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=4052.81..4068.64 rows=1583
width=236) (actual time=18.195..20.486 rows=877 loops=1)
              ->  Append  (cost=0.00..3997.41 rows=1583 width=236)
(actual time=0.015..13.423 rows=877 loops=1)
                    ->  Seq Scan on crawlq  (cost=0.00..1995.14
rows=18 width=236) (actual time=0.011..3.397 rows=49 loops=1)
                          Filter: ("Active" AND ("TimeoutDate" <= now()))
                          Rows Removed by Filter: 828
                    ->  Seq Scan on crawlq  (cost=0.00..1986.43
rows=1565 width=236) (actual time=0.013..7.152 rows=828 loops=1)
                          Filter: (NOT "Active")
                          Rows Removed by Filter: 49
Total runtime: 23.633 ms

Relevant columns from table crawlq:

CREATE TABLE crawlq
(
  "URL" text NOT NULL,
  "LastCrawlDate" timestamp with time zone DEFAULT now(),
  "NextCrawlDate" timestamp with time zone,
  "Active" boolean DEFAULT false,
  "TimeoutDate" timestamp with time zone,
  CONSTRAINT crawlq_pkey PRIMARY KEY ("URL")
)




Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread David Noel
On 4/2/13, Kevin Grittner kgri...@ymail.com wrote:
 David Noel david.i.n...@gmail.com wrote:

 'select * from pg_stat_activity' shows that the queries are not
 waiting, and are in the idle state.

 The process is idle or the process is running the query?  If the
 latter, what do you mean when you say "the queries ... are in the
 idle state"?

select * from pg_stat_activity returns a table containing a column
labeled "state". When the postgres process is at 100% utilization and
the application has hung, this query returns the value "idle" in that
field. When things are running properly, as they are for the moment
now, the value is "active".
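
For reference, a query along these lines (standard 9.2 pg_stat_activity
columns; the ordering is just illustrative) separates truly idle
backends from ones still executing:

-- state = 'idle' means the backend is waiting for a new client command;
-- in that case the query column shows the *previous* statement, not a
-- running one (PostgreSQL 9.2+).
SELECT pid, state, waiting, now() - query_start AS running_for, query
FROM pg_stat_activity
ORDER BY state, query_start;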




Re: [GENERAL] PostgreSQL: CPU utilization creeping to 100%

2013-04-02 Thread David Noel
On 4/2/13, John R Pierce pie...@hogranch.com wrote:
 On 4/2/2013 3:35 PM, David Noel wrote:
 The hardware is a Dell PowerEdge 1420, dual Xeon Nocona's, 3.2GHz,
 16GB RAM. The disks are 4 Kingston HyperX SATA3's attached to a
 HighPoint RocketRAID 2721 controller, ZFS, RAID10.
 .
 postgresql.conf, all standard/default except for:
 max_connections = 256

 A) use a connection pool so you don't NEED 256 active database connections.

 B) shared_buffers, work_mem, and maintenance_work_mem all need to be
 tuned.   I'd suggest 4gb, 16mb, 1gb respectively as a starting point on
 a 16GB ram system.   if you can, shrink your max_connections by using a
 connection pooler (my target is generally no more than 2-4 active
 queries per CPU core or hardware thread).

Great, thanks. I'll get those tunables modified and see if that
smooths things out.

 Ouch, Xeon Nocona was a
 single core, dual thread CPU, with rather poor performance, essentially
 just a Pentium-4...  3Ghz on a P4 is like 2Ghz on other CPUs.

I won't tell them you said that. Feelings might get hurt.

 when you said raid10, do you mean zfs mirrored, or are you doing
 hardware raid10 in the Highpoint?   I would have configured the raid
 card for JBOD, and done ZFS mirroring in the OS, so you can take
 advantage of ZFS's data integrity features.

RAID10 under ZFS. Yes, JBOD. ZFS is neat!

 Those are consumer grade SSD's, are they even qualified for use
 with that Highpoint controller?

Consumer grade SSD's, indeed. They've held together so far though.
Fingers crossed.

Thanks again,

-David




Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Julian
On 03/04/13 06:37, Merlin Moncure wrote:
 On Tue, Apr 2, 2013 at 2:01 PM, Samantha Atkins sjatk...@me.com wrote:
 Natural keys are in user data space.  Thus they are not guaranteed invariant 
 and therefore cannot serve as persistent identity.

Can't find Samantha's original post. I agree, but done right they can be
(persistent and unique).

 That is true, but irrelevant in most real world cases.  Also, nothing
 is keeping you from using an extra marker if/when you need to provide
 an invariant lookup.
 
 Also natural keys have the considerable defect of being of different type 
 and arity per logical entity type.   This means that very nice things like 
 dynamic relationships and persistent collections of references to multiple 
 types of things cannot be built.  It also increases ORM  and data cache 
 complexity.
 
 OO evangelism.
 
 These are considerable weaknesses.

The Google cache is filled with "foo vs bar",
i.e. natural vs surrogate, and it's amazing what you get; even
"surrogate key" on Wikipedia comes with an "its sources remain unclear
because it lacks inline citations" disclaimer.
I consider it pretty much a non-debate.

 You cannot guess a uuid from say uuid.uuid4().  If you think you can I look 
 forward to seeing your proof.
 
 I never claimed that.  I said that postgresql random() can be guessed,
 which it can, since it's based on lrand48.
 
 merlin
 
 
Trying to get back on topic with the original post.

I have the ISO (?) country code table; I have no problem with the PK
being the char(2) country code.
These natural codes/keys are thought out, debated and eventually decided
by other people.
But I also don't have a problem with adding an integer (serial) column
and making that my PK, although that PK might be pretty useless to the
rest of the world.
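
A minimal sketch of that shape (table and column names are
illustrative): the externally decided code serves as the natural PK,
with an optional serial surrogate alongside:

CREATE TABLE country (
  id   serial UNIQUE,       -- optional surrogate, per the paragraph above
  code char(2) PRIMARY KEY, -- ISO 3166-1 alpha-2, decided by other people
  name text NOT NULL
);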

So that's that; having to really think it out is probably a good sign
that you should stick to a surrogate unless you are really sure.
(Again, I don't advocate ON UPDATE CASCADE as a solution should you
change your mind.)

As to the whole natural vs surrogate/synthetic key debate, as I
mentioned in an earlier post, I use them both. The question is when
something is worthy of being a PK.

1) cust_id = 123 (surrogate: PK)
vs
2) cust_id = 1231 (natural: checkbit, such as barcode data etc.)
vs
3) cust_id = uuencoded binary that spits out 123 after being decoded
by the app (natural)

For me, 2) is more likely to be a PK than 3), but it is entirely
possible that neither would be a PK.

Global/universal unique identifiers are useful with replication and
database partitioning (in my instance):

1) cust_id = uuid

vs

2) cust_id = shard_id-something_extra-cust_id

1) will work, but 128 bits is a lot of random data that could otherwise
be useful to the app.
2) cust_id is not as universally unique, but if that was ever a problem
I could wrap it in an encoded binary with a small change to the app and
no change to the database; now it resembles something truly random.

The difference is that 2) is more useful and contains routing data.
These are all natural and exposed to the world. But the question still
remains: are they worthy of being a PK within the database of origin?
So far my answer is "doesn't have to be", but everyone else might think
it is; they may even make it a PK.
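
A minimal sketch of shape 2), with every name and the shard tag
hypothetical; the routing parts are simply concatenated around a local
sequence value:

CREATE SEQUENCE cust_local_seq;
CREATE TABLE customer (
  cust_id text PRIMARY KEY
    DEFAULT 'NZ1' || '-' || to_char(now(), 'YYMM') || '-'
            || nextval('cust_local_seq')::text,
  name text NOT NULL
);
-- INSERT INTO customer (name) VALUES ('Jules');
-- might yield cust_id = 'NZ1-1304-1'  (shard - extra - local id)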

Regards,
Jules.




Re: [GENERAL] Using varchar primary keys.

2013-04-02 Thread Samantha Atkins
Natural keys are in user data space.  Thus they are not guaranteed invariant 
and therefore cannot serve as persistent identity.

Also natural keys have the considerable defect of being of different type and 
arity per logical entity type.   This means that very nice things like dynamic 
relationships and persistent collections of references to multiple types of 
things cannot be built.  It also increases ORM  and data cache complexity. 

These are considerable weaknesses. 

You cannot guess a uuid from say uuid.uuid4().  If you think you can I look 
forward to seeing your proof.

- samantha


On Apr 2, 2013, at 11:16 AM, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Apr 2, 2013 at 10:34 AM, Joe Van Dyk j...@tanga.com wrote:
 On Mon, Apr 1, 2013 at 1:11 PM, Tim Uckun timuc...@gmail.com wrote:
 
 
 
 
 On Tue, Apr 2, 2013 at 8:35 AM, jesusthefrog jesusthef...@gmail.com
 wrote:
 
 On the topic of 'natural' versus 'synthetic' primary keys, I am generally
 in the camp that an extra ID field won't cost you too much, and while one
 may not need it for a simple table (i.e. id, name) one might add any number
 of columns later, and you'll be glad to have it.
 
 
 Nothing prevents you from adding more columns if you use varchar primary
 keys.
 
 
 
 My preferred method is to give every table an ID column of UUID type and
 generate a UUID using the uuid-ossp contrib module. This also prevents
 someone not familiar with the database design from using an ID somewhere
 they should not (as is possible with natural PKs) or treating the ID as an
 integer, not an identifier (as is all too common with serial integers).
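
For reference, a sketch of that pattern, assuming the uuid-ossp
extension is available (the table itself is illustrative):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE widget (
  id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  name text NOT NULL
);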
 
 
 
 This would be a concern if you had multi-master writes. As far as I know
 Postgres does not have a true multi-master replication system, so all the
 writes have to happen on one server, right?
 
 As for UUIDs I use them sometimes but I tend to also use one serial column
 because when I am communicating with people it makes it so much easier to
 say dealer number X than dealer number SOME_HUGE_UUID.  I often have to
 talk to people about the data and UUIDs make it very difficult to
 communicate with humans.
 
 
 I've been wishing for a smaller uuid type for a while. Say you need to
 assign an Order #. Customers might be referencing the number, so you don't
 want it to be too long. But you don't want Order #'s to be guessable or have
 the characters/digits be transposed accidentally.
 
 I've been using a unique text column with a default of random_characters(12)
 
 CREATE OR REPLACE FUNCTION public.random_characters(length integer)
 RETURNS text
 LANGUAGE sql
 STABLE
 AS $function$
 SELECT array_to_string(array((
  SELECT SUBSTRING('abcdefghjkmnpqrstuvwxyz23456789'
FROM mod((random()*31)::int, 31)+1 FOR 1)
  FROM generate_series(1, $1))),'');
 $function$;
 
 This seems to work ok. I don't allow 1's or l's or 0's or O's, as they can
 easily be mistyped or misread.
 
 This is pseudo random and can be guessed, which is maybe dangerous
 depending on circumstance.  For a stronger random stream go to
 pgcrypto's gen_random_bytes().  Also, now you have to worry about
 collisions -- the whole point of uuid is to try and keep you from
 having to deal with that.
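
For reference, a sketch of that pgcrypto route; the extension must be
installed, and the 9-byte length is illustrative:

CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- 9 cryptographically strong random bytes -> 18 hex characters
SELECT encode(gen_random_bytes(9), 'hex');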
 
 My historical comments in this debate are noted.  To summarize, I
 strongly believe that natural keys are often (but not always) better.
 
 merlin
 
 


