Re: [GENERAL] Need help for import of text file

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 06:30:24PM -0500, Steve Clark wrote:
> why not use the squeeze option of tr.
> 
> tr -s " "

I wasn't aware of that one; it's even simpler and more elegant.
Thanks!

For this particular case, tr(1) won't do for the same reason
the simple sed(1) expression I gave won't do: it should only
be applied to the data, not the extra 4 lines of meta-data.
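
For completeness: GNU sed can restrict commands to an address range.
A minimal sketch, assuming the meta-data occupies the first four lines
of the file:

sed -i '5,$ s/[ \t]*$//; 5,$ s/  */ /g' file

Lines 1-4 pass through untouched; only line 5 onward gets the
whitespace cleanup.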

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 11:17:25AM -0800, Adrian Klaver wrote:
> On 12/16/2012 11:07 AM, Peter Bex wrote:
> > I don't know how Postgres stores its passwords internally or how
> > its authentication works exactly.  Maybe one of the developers
> > can shine a light on this.
> 
> http://www.postgresql.org/docs/9.2/static/encryption-options.html

Thanks for the link; I must've missed it somehow.  I now also see that
the topic of improving password hashing has been discussed before.
For others reading along, I've found these two threads from this year:
http://archives.postgresql.org/pgsql-general/2012-02/msg00334.php
http://archives.postgresql.org/pgsql-hackers/2012-10/msg00462.php

As the poster in the first URL says, password hashing is *not*
encryption.  It's advisable not to refer to it as such - doing so
only adds to the widespread confusion about the subject.

A good treatise on password hashing after the explosion of blog
posts about "rainbow tables" is this one:
http://chargen.matasano.com/chargen/2007/9/7/enough-with-the-rainbow-tables-what-you-need-to-know-about-s.html
Note how it deprecates (salted) MD5 as insecure because it can
be brute-forced too easily and too quickly.

I understand that it would take a lot for a database to be compromised,
and that the passwords used here generally won't be reused on other
sites.  On the other hand, there's no guarantee of either, and it's
not *that* complicated to improve the way passwords are stored.
IMO, the simplest way to implement a good hashing system that can be
easily upgraded in a backwards-compatible way when the need arises
is to use modular Unix crypt().  Modern libc implementations provide
a variety of decent ways of storing password hashes.
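
With modular crypt, the scheme and its parameters are recorded in the
hash string itself, which is what makes backwards-compatible upgrades
possible.  Illustrative placeholders (not real hashes):

$1$salt$...               MD5 crypt (legacy; avoid)
$2a$10$salt-and-hash...   bcrypt with cost factor 10
$5$rounds=50000$salt$...  SHA-256 crypt
$6$rounds=50000$salt$...  SHA-512 crypt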

I've collected some more information about UNIX crypt() in the
documentation for a library I wrote for Chicken Scheme:
http://wiki.call-cc.org/eggref/4/crypt
All the fallback crypt() implementations this library provides are
public domain, so they could be used in Postgres.  If public domain
is not acceptable for legal reasons, there are also plenty of
BSD-licensed implementations to be found elsewhere; bcrypt was first
implemented by OpenBSD.

I could try my hand at providing a patch to switch to, say, bcrypt,
but I'm pretty unfamiliar with the PostgreSQL source code.  If
nobody else is interested in working on it I can give it a try
during the holidays.

I'm not sure how to deal with the md5 authentication method.
A good point in the -hackers thread above is that eavesdroppers are
probably able to hijack existing connections anyway, but that's no
reason to take extra risks.

One solution would be to point out in the manual that it's not secure
and advise people to use SSL.  Another would be to implement something
like SCRAM, as pointed out in the -hackers thread I posted above or
some other challenge-response system.  However, this could be done
separately, either before or after the password storage itself has
been improved.

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 07:42:40PM +0100, Murray Cumming wrote:
> On Sun, 2012-12-16 at 17:51 +0100, Peter Bex wrote:
> > Hashes (if properly salted and
> > stretched) are only useful if they are only ever checked against the
> > password itself.  Storing a hash of any kind and comparing that directly
> > with user input is equivalent to storing the password and comparing that
> > with user input.
> 
> So PostgreSQL, for instance, stores the actual plaintext password (or an
> encrypted, but not hashed, password)? And compares that with the hash
> that it receives from libpq.

Hm, that's a good point, I hadn't considered that.  I don't know how
Postgres stores its passwords internally or how its authentication works
exactly.  Maybe one of the developers can shine a light on this.

> [snip]
> > The best solution I can come up with is not to provide a web UI at all
> > but let the user connect directly to the database using a secure
> > method (e.g. SSL client certs, GSSAPI etc).
> 
> That's not an option in this case. My system
> ( http://www.glom.org/wiki/index.php?title=Development/OnlineGlom )
> is meant to provide access to databases and I don't wish to implement all
> of it on the client side.
> 
> I do have the option of creating a different set of user/password logins 
> for the web UI and then either
> - Using one username/password for all web users' databases, with no 
>   PostgreSQL-level separation. But this would have to be in a config file
>   at least. I guess this is what most web systems do, though they
>   generally deal with only one database.

I've been wondering how to do this correctly; I think I've asked
before on this list.  One way to do it is to create a "master"
user which can do nothing but use "set role" to switch to each account.

This user would have access to one table of its own, which stores
the usernames and password hashes you manage yourself.  When a user
logs in, you create a one-time random value (a "ticket") that you
store in a mappings table.  You can then check whether the user really
is logged in before switching to their database role.
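
A minimal SQL sketch of that setup (role names invented for
illustration):

CREATE ROLE gateway LOGIN NOINHERIT;  -- the "master" user
CREATE ROLE alice NOLOGIN;            -- one role per account
GRANT alice TO gateway;               -- lets gateway do SET ROLE alice

-- after the ticket in the mappings table has been verified:
SET ROLE alice;
-- ... queries now run with alice's privileges ...
RESET ROLE;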

> - Generating PostgreSQL username/passwords for each web user's database,
>   but never exposing these to the web user. But I'd have to store them 
>   somewhere.

I think GSSAPI might really be a good way to do it, if rather
difficult and labor-intensive.  You could just get a ticket and store
that in the cookie.  No need for any double bookkeeping.

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 01:30:29PM -0500, Tom Lane wrote:
> Peter Bex  writes:
> > On Sun, Dec 16, 2012 at 12:51:08PM -0500, Tom Lane wrote:
> >> Right, they can break into *this account*.
> 
> > Not *just* this one, but any account on any service that uses this
> > same algorithm.
> 
> That's easily fixed.  I'd be inclined to make the "password" hash be a
> hash of the actual password plus the user's name plus some
> web-site-specific random salt string.

That could work, provided the hashing algorithm is one of the stronger
varieties (e.g. bcrypt, scrypt, or PBKDF2).  If it's a simple hash, you
still run the risk of having the password recovered through one of the
methods pointed out in my other post.
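
If you want the stretching done server-side, the pgcrypto contrib
module provides crypt() and gen_salt().  A sketch of the scheme above
with bcrypt (all literal values purely illustrative):

-- storing: bcrypt with a random salt and cost factor 10
SELECT crypt('password' || 'username' || 'site-salt', gen_salt('bf', 10));

-- checking: crypt() reads the salt and cost back out of the stored
-- hash (here stored_hash stands for the previously stored value), so
-- comparing the result with the stored hash verifies the input:
SELECT crypt('password' || 'username' || 'site-salt', stored_hash) = stored_hash;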

However, if the hash is going to be stored as-is in postgres and used
directly as the password, the user will also need to put the hash in
their web scripts (assuming it's an interface to manage shared hosting
accounts).  And *if* the postgres port can be accessed directly by
client programs, they'll need to use this hash there as well.  This may
or may not be desirable.

Cheers,
Peter


Re: [GENERAL] Need help for import of text file

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:
> With sed as a starting point I figured it out.
> Those 3 steps make the input files consumable for COPY
> 
> 1. dos2unix
> 2. sed -i 's/[ \t]*$//'
> 3. sed -i 's/  / /g'

You can reduce this to one invocation by separating the commands
with a semicolon (or by passing multiple -e flags):

sed -i 's/[ \t]*$//;s/  / /g'

> The input files get created by a simple Windows batch file where I can't
> change anything.
> It uses echo to attach a line of 4 parameters to those text files.
> 
> How would you manage it if one or more of those parameters contained
> blanks in some cases?
> This hasn't happened yet, but I consider that luck.   :}
> 
> The real column formats are ( TEXT, TEXT, DATE, TIME ).

Well, that's a bit trickier and my sed skills are rather rusty.
I'd probably use awk for these more complex tasks:

awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }'

The "gsub" command acts like sed's "s" command with the "g" modifier.
By prefixing the block with the gsub command with a regex, it only
acts on that regex.  The regex in this example only looks for an opening
and a closing paren anywhere on the line; you might need to tweak it
to more closely match your case.  Alternatively, you could implement
a counter that skips the four lines (which can be done with both sed
and awk).
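
A line-counter variant, assuming the four meta-data lines come first:

awk 'NR > 4 { gsub(/ +/, " ") } { print }' old-file > new-file

Only records after the fourth line get their blanks squeezed;
everything is printed either way.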

If it gets more complex than this, you can always write a proper
program in a "real" language to do it.  This can be easier to maintain.

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 12:51:08PM -0500, Tom Lane wrote:
> Peter Bex  writes:
> > If they do break in and are able to retrieve the password hash, they
> > can still break in with that hash.
> 
> Right, they can break into *this account*.

Not *just* this one, but any account on any service that uses this
same algorithm.  Even if this is a completely custom algorithm that
no other service will ever use, there's still the risk that when one
of his servers is cracked, a customer with multiple accounts has all
of them compromised, even those on servers that weren't cracked.
(This assumes, of course, that they use the same password everywhere,
but that's a reality we'll have to face; hence his original request.)

Finally, if it's a naive implementation that calculates a single hash
round, the password can easily be recovered, either by brute force
(with e.g. Hashcat or John the Ripper) or even via common search
engines.  See for example http://tools.benramsey.com/md5/

> But Murray is worrying about the all-too-common case where a user
> has used the same or similar password for multiple sites.  I think
> his goal of not having the original password stored anywhere is laudable.

Absolutely, I completely agree.  I wasn't trying to put down this
important goal.  There have been too many incidents of hacked password
databases.  This has to end.  That's why I'm so keen on trying to warn
against doing it in the way he originally proposed.

> Perhaps I'm missing the context here, but it seems unlikely that the
> users have any direct access to the database.  If they can only get to
> it through the website, then what about the idea of hashing the original
> text, and then using that hashed string as the database password?

See above; the password can be easily retrieved.

> IOW I don't see why this must be implemented inside libpq rather than
> in the website logic.

I'm not sure this is necessary either.

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 05:51:23PM +0100, Peter Bex wrote:
> The best solution I can come up with is not to provide a web UI at all
> but let the user connect directly to the database using a secure
> method (e.g. SSL client certs, GSSAPI etc).

Speaking of which, a custom implementation of GSSAPI might be
feasible.  I don't have experience with the API, but presumably you
can accept a password once, generate a GSSAPI ticket and store *that*
in the user's session file or cookie.  Any successful attack will only
be able to get that one ticket.  If the server is compromised you can
revoke all currently active tickets.

Assuming you're using HTTPS and store cookies with the "secure"
attribute, this could be a safe way to do things.  It's probably not
easy or available out-of-the-box though!

I think the Postgres docs could use some improvement on how to
use GSSAPI; they're pretty terse!  If I understand the Kerberos
section correctly, it's even possible to avoid passwords altogether
by using mod_auth_kerb and a browser extension that allows talking
to this module.  That would be even better, but might not be acceptable
if you want to allow users to connect with vanilla web browsers.

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 05:38:37PM +0100, Murray Cumming wrote:
> On Sun, 2012-12-16 at 17:24 +0100, Peter Bex wrote:
> > What's the use of that?
> [snip]
> 
> I would not be storing the plaintext password anywhere. That makes it
> harder for someone get the plaintext password if they break into the
> server, and therefore harder for someone to use that password to break
> into another account if the user has used the same password.

If they do break in and are able to retrieve the password hash, they
can still break in with that hash.  Hashes (if properly salted and
stretched) are only useful if they are only ever checked against the
password itself.  Storing a hash of any kind and comparing that directly
with user input is equivalent to storing the password and comparing that
with user input.

> There have been plenty of high profile cases recently of password
> databases being stolen, with those passwords being in plaintext, or
> hashed without a salt, making user accounts on other systems vulnerable.
> I'd like to avoid making the same embarrassing mistake.

Please also avoid the mistake outlined above.

Unless I'm overlooking something, if there's a way to directly
mediate between the browser client and the postgres server, you've
effectively created a man-in-the-middle.  That shouldn't be possible
with a truly secure authentication mechanism.

The best solution I can come up with is not to provide a web UI at all
but let the user connect directly to the database using a secure
method (e.g. SSL client certs, GSSAPI etc).

Cheers,
Peter


Re: [GENERAL] Authenticate with hash instead of plaintext password?

2012-12-16 Thread Peter Bex
On Sun, Dec 16, 2012 at 04:54:30PM +0100, Murray Cumming wrote:
> libpq lets me open a connection by specifying a password:
> http://www.postgresql.org/docs/9.2/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS
> 
> Is there any way to specify a hash of the password when connecting, instead 
> of 
> providing the password itself?

What's the use of that?  It won't buy you any extra protection if the
hash is accepted as-is instead of the password.  In fact, you would
now have *two* strings (instead of one) that are accepted as equally
valid passwords.

> My Web UI asks the user for a PostgreSQL 
> username and password, and I want to avoid asking the user for the 
> password again later, as long as they have the browser cookie that I set.

Perhaps you can encrypt it in the cookie, to prevent casual onlookers
from discovering the password.  However, anyone who can obtain the
cookie (e.g. via the Firesheep plugin) can still use it to log in even
without knowing the actual password - they can just reuse the encrypted
blob.

> I've looked at the source of phpPgAdmin, which should deal with the same 
> issue, 
> but that seems to store the plaintext password in the session, which might 
> even
> mean that the plaintext password ends up on disk, though I don't know enough 
> about 
> PHP to be sure.

The session is slightly safer than storing the password directly in the
cookie, because one would have to break into the server rather than
access the cookie store on the client.  However, anyone who can recover
the session id can connect with this user's credentials as well.

> I understand that libpq already sends only an MD5 hash to the 
> PostgreSQL server, when it's configured to use MD5 authentication. 

I would think that's challenge-response based, but I'm not sure.  The
documentation seems to hint that it's just the password, but hashed.

> But I don't want to have to provide a plaintext password to libpq.

Why not?  If the script lives on the server, there shouldn't be a way
for the user to recover the password even if it's put in the connection
string.

Unfortunately, I'm not aware of any truly secure method of doing this.
Sorry!  Maybe someone else knows of a good approach.

Cheers,
Peter


Re: [GENERAL] Need help for import of text file

2012-12-15 Thread Peter Bex
On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:
> A simple sed(1) expression should do the trick:
> 
> sed -E 's/ +/ /g' old-file > new-file

I just remembered where I could check, and the GNU sed equivalent is:

sed -r 's/ +/ /g' old-file > new-file

Sorry for the confusion.

Cheers,
Peter


Re: [GENERAL] Need help for import of text file

2012-12-15 Thread Peter Bex
On Sat, Dec 15, 2012 at 08:06:44PM +0100, Andreas wrote:
> Hi,
> 
> The problem is that in the morning the first digit of the hour is shown
> as a blank, so there are 2 blanks before the time and COPY mistakes this
> for an empty column and gets confused.
> 
> Can someone point me in the direction of a COPY option I'm not aware
> of, or alternatively to some console tool that I can put in the batch
> before the import step to replace the 2 blanks with 1 blank.
>
> I use an openSUSE server, so some Linux tool would do.

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

GNU sed also allows in-place editing using -i, so you can avoid
writing it to a second file.  Some seds accept a different flag
to enable extended regexps.

Cheers,
Peter


Re: [GENERAL] "Too far out of the mainstream"

2012-09-01 Thread Peter Bex
On Sat, Sep 01, 2012 at 12:43:15AM +0200, Geert Mak wrote:
> There is this case studies section as well -
> 
> http://www.postgresql.org/about/casestudies/
> 
> These appear to me a little old and a little too few; one could try to add
> more, perhaps.

I noticed that the "Share Your Story" link is broken.
I don't know how long it's been broken, but this might be a reason
there are no new ones.

What kind of "success story" would be accepted for this page?
We're also running Postgres for most of our projects at work, some of
them rather large databases.  Of course "large" is subjective... some
people might call it kids' stuff.  Also, how "well known" does a company
need to be in order to make the list?

Cheers,
Peter


Re: [GENERAL] BI tools and postgresql

2012-07-26 Thread Peter Bex
On Thu, Jul 26, 2012 at 08:17:19AM -0400, Chris Curvey wrote:
> 1) a tool for power users to be able to get their own data.  There are
> a few folks here who are smart enough to be trusted with direct access
> to the database, but I don't want to ask them to learn SQL.  Something
> like an open source version of the old Business Objects drag-n-drop
> interface would be great.
> 
> 2) I could also use a more traditional reporting tool to replace (or
> augment) Crystal Reports.  Heck, I'd take an open-source version of
> SQR if I could get it.

I can't vouch for it since I haven't used it yet, but Sofa looks rather promising:
http://www.sofastatistics.com/

Cheers,
Peter


Re: [GENERAL] Feature discussion: Should syntax errors abort a transaction?

2012-06-19 Thread Peter Bex
On Tue, Jun 19, 2012 at 02:20:57AM -0400, Tom Lane wrote:
> Craig Ringer  writes:
> > I've been working in psql a lot recently, and have started to wonder why 
> > statements with syntax errors or other problems that render them 
> > unexecutable terminate the transaction.
> 
> Well, the obvious reason is that it's hard to tell what the user meant,
> so bailing is the safest response.
> 
> > I understand why statements that raise errors during their execution 
> > terminate a transaction,
> 
> So you're suggesting that "SELECT 1/0;" should terminate a transaction,
> but "SELECT 1//0;" should not?  How about "ROLBACK;"?  It gets pretty
> squishy pretty fast when you try to decide which sorts of errors are
> more important than others.

+1.  I hate tools that try to read your mind.  They invariably fail
at that.  The current behaviour is 100% correct and unambiguous.

Cheers,
Peter


Re: [GENERAL] Memory Management in pqlib, Garbage Collection support

2012-05-03 Thread Peter Bex
On Thu, May 03, 2012 at 09:39:29AM +0200, Alexander Reichstadt wrote:
> Thanks, that's answering my question. In Objective-C, as in many other
> languages, there is the option to turn on Garbage Collection. It's a
> separate thread that scans memory for strong pointers and their source
> and origin, and "vacuums" memory so as not to have any leaks. Anything
> unreferenced and no longer needed is cleaned up automatically. There are
> some border cases where GC can fail, but for most it works.

OK, so you're specifically talking about Objective-C.  I don't know much
about that language, but unless there are specific ObjC bindings for
libpq (like libpq++ for C++), you'll need to perform manual memory
management and call PQclear yourself.

> As GC is an evolutionary stage across languages [...]

Thank you for your explanation of GC, but that really wasn't necessary.
The confusion was mostly due to your mail omitting the fact that you
were using ObjC rather than C, which is the default assumption when
talking about libpq.

Actually, my language of choice (Chicken Scheme) is also GCed, and the
PostgreSQL bindings I wrote for it integrate with its GC in such a way
as to automatically call PQclear when an object gets collected.  This
is easy to do in any language that supports "finalizers", and with a
little more effort it could be integrated even more tightly with the GC.
The user can also manually free the memory used by a result set when
it's known in advance that it will no longer be needed and the memory
use is prohibitive.

I think if you want something similar you'll either need to write your
own ObjC class to wrap the C functions or look for something pre-existing.
Perhaps you can use libpq++.  I've also found a "PostgreSQL Cocoa Framework"
project (but it appears to be abandoned): http://pgsqlcocoa.sourceforge.net/

> In general there are libs that provide garbage collection for C as well, like 
> here:
> 

The Boehm GC is a good prototyping solution but not very suitable for
general use.  It's an especially bad idea to force use of such a garbage
collector on the user whenever using libpq.  This would also prohibit
the integration of libpq with other languages and their GCs.

AFAIK libpq currently does not offer a way to specify callback functions
for allocating and freeing objects (it uses malloc/free directly).
If this were available, it would be even easier to integrate deeply with
a custom GC.

> For example, it'd help avoid leaks like those caused by a result not being 
> PQclear'ed.

C programmers are generally comfortable with manual memory management, or
they'd use another language.

Cheers,
Peter


Re: [GENERAL] Memory Management in pqlib, Garbage Collection support

2012-05-03 Thread Peter Bex
On Thu, May 03, 2012 at 09:08:53AM +0200, Alexander Reichstadt wrote:
> 
> Hi,
> 
> since I got no answer so far I searched through the documentation again.
> I searched for GC as well as Garbage, and all "garbage" refers to is
> vacuuming a database. But my question is about whether or not memory
> management is supported with garbage collection. When I try to link
> against libpq, it seems I need to have garbage collection enabled in
> order for it to link. But the documentation mentions it nowhere.

What kind of garbage collection do you need to have enabled in order to link?

C is based on manual memory management, and you can't generally have
garbage collection in it.  Hence, libpq is *not* garbage-collected, and
your statement that it needs "garbage collection enabled" doesn't make
much sense to me.

Are you talking about libpq bindings to some other language, perhaps?

> Please, can someone confirm this or is this the wrong list?

This is the right list.

Cheers,
Peter


Re: [GENERAL] Zero-length character breaking query?

2012-03-16 Thread Peter Bex
On Thu, Mar 15, 2012 at 05:09:32PM -0600, Doug Gorley wrote:
> G'day,
> 
> "select * from tdt_unsent where str_name_l = 'SMITH'" returns 0 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH'" returns 3 rows.
> "select * from tdt_unsent where str_name_l ~ '^SMITH$'" returns 0 rows.
> "select length(str_name_l) from tdt_unsent where str_name_l ~ '^SMITH'" 
> returns "5".

What does octet_length return?  Perhaps this is some nonprintable
control character; Unicode is full of those.  If it differs from
the string length, you can be pretty sure that's the case.
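
A quick way to check, reusing the query from above:

select str_name_l, length(str_name_l), octet_length(str_name_l)
from tdt_unsent where str_name_l ~ '^SMITH';

If octet_length reports more than 5, extra (multi-byte or invisible)
octets are hiding in the value.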

Cheers,
Peter


Re: [GENERAL] How to tame a gigantic (100+ lines) query in a web app?

2011-08-14 Thread Peter Bex
On Sun, Aug 14, 2011 at 10:39:48AM -0400, W. Matthew Wilson wrote:
> I'm sure I'm not the first person to end up with a gigantic query that
> does lots of left joins and subselects.
> 
> It seems to work, but I would love to break it up into smaller chunks.
> 
> I'm thinking about rewriting the query to make several temporary
> tables that are dropped on commit, and then joining them at the end.

That's possible, but you also want to consider using CTEs (common table
expressions).  I generally prefer those when my queries are getting too
hairy to read.  You'll need PostgreSQL 8.4 or later for those.

See section 7.8 in the manual:
http://www.postgresql.org/docs/current/interactive/queries-with.html
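
As a sketch (table and column names invented), a subselect-heavy query
can often be flattened into named blocks like this:

WITH recent_orders AS (
    SELECT customer_id, sum(total) AS order_total
    FROM orders
    WHERE ordered_at > now() - interval '30 days'
    GROUP BY customer_id
)
SELECT c.name, r.order_total
FROM customers c
JOIN recent_orders r ON r.customer_id = c.id;

Each WITH block reads like a small, separately testable query, which
is where most of the readability win comes from.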

> Is there anything dangerous about making temporary tables in this way?

AFAIK there isn't, but there may be some overhead that you don't get
with CTEs: a temporary table will probably get materialized on disk,
and the optimizer probably can't do smart things like leaving out
rows that cancel out across related WITH blocks.

> The temporary tables mean I'm only pulling data from the database one
> time.  ORMs often pull data from one query and then use that data to
> write the next query.  This seems slow to me.

Yeah, ORMs are stupid that way :)

Cheers,
Peter


Re: [GENERAL] Implementing "thick"/"fat" databases

2011-07-27 Thread Peter Bex
On Wed, Jul 27, 2011 at 04:35:45PM +0200, Karsten Hilbert wrote:
> > I wonder which other languages have first class support for these areas of
> > Pg?
> 
> While already supporting most if not all standard PG datatypes, the
> psycopg2 interface lets you write in/out wrappers of arbitrary
> complexity mapping PG <-> Python datatypes and insert them into
> the driver at runtime.

The same is true for the Chicken Scheme PostgreSQL egg.

Cheers,
Peter


Re: [GENERAL] Postgres errors in Drupal install.

2011-06-26 Thread Peter Bex
On Sun, Jun 26, 2011 at 11:23:30PM +0200, Dave Coventry wrote:
[snip]
> 2011-06-26 17:13:24 EDT DETAIL:  Failed system call was
> shmget(key=5432001, size=29278208, 03600).
> 2011-06-26 17:13:24 EDT HINT:  This error usually means that
> PostgreSQL's request for a shared memory segment exceeded available
> memory or swap space. To reduce the request size (currently 29278208
> bytes), reduce PostgreSQL's shared_buffers parameter (currently 3072)
> and/or its max_connections parameter (currently 103).
>   The PostgreSQL documentation contains more information about shared
> memory configuration.

This sounds like you should tweak those parameters. See the relevant section:
http://www.postgresql.org/docs/current/interactive/kernel-resources.html#SYSVIPC
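
For example, in postgresql.conf (the values are illustrative only;
pick ones that fit your workload):

shared_buffers = 2048    # in 8 kB buffers; down from the current 3072
max_connections = 40     # down from the current 103

Alternatively, raise the kernel's SHMMAX limit as described on that page.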

> While the server is a little challenged in the way of RAM, it is by no
> means under any sort of load.

Drupal is a huge memory hog.  It's quite possible that it consumes so
much memory that it doesn't leave enough for postgres.  If at all
possible, try increasing the amount of memory available.

Cheers,
Peter


Re: [GENERAL] Postgres errors in Drupal install.

2011-06-26 Thread Peter Bex
On Sun, Jun 26, 2011 at 09:36:59PM +0200, Dave Coventry wrote:
> I'm having huge problems with a Drupal install using Postgres 8.4.8.
> 
> I'm getting the following error:

Do you see the error in the logs, or does it appear in place of a
rendered page when you visit the site manually?

If it's in the logs, and you're running cron or poor man's cron once
an hour, that could be where these messages are coming from.

You can run cron manually from somewhere in the interface, so you can
try to reproduce the error more reliably and work from there.

Cheers,
Peter


Re: [GENERAL] Executing prepared statements via bind params

2011-06-16 Thread Peter Bex
On Thu, Jun 16, 2011 at 05:19:41PM -0400, Tom Lane wrote:
> Peter Bex  writes:
> > But when I try to do the same but pass the 2 as a parameter,
> > (I do "EXECUTE bar($1)" with $1 bound to "2"), I get an error:
> 
> Why would you do that, rather than executing the prepared statement
> directly with PQexecPrepared?

I'm writing a Scheme language binding, and currently I simply don't have
a binding for this function; a user of my library was experimenting with
some optimizations for his code and ran into this.  I was hoping to
avoid having too many special-purpose functions, and since there's also
an SQL EXECUTE statement, PQexecPrepared seemed a bit redundant.

> Interposing an EXECUTE doesn't do anything but add parsing overhead.

Is this parsing overhead of an EXECUTE statement (which should be very
short and simple) *that* significant compared to the savings you get
when preparing a complex SQL statement which is executed many times?

> The reason this particular case doesn't work is that utility statements
> (including EXECUTE) don't have any support for $n parameters.

Why not?  Is it simply something nobody ever needed?  It seems rather
logical to be able to replace any literal with an equivalent parameter
in any SQL statement.

I should probably add support for PQexecPrepared at some point but
even then, as a user, I'd probably expect this to be possible for
reasons of symmetry and regularity.  It might also make it easier for
certain types of generated SQL.

Cheers,
Peter


[GENERAL] Executing prepared statements via bind params

2011-06-16 Thread Peter Bex
Hi all,

I'm seeing a bit strange (some might say inconsistent) behaviour,
possibly a bug.

First, I prepare a statement by running the query:

   "PREPARE bar (int) AS (SELECT 1, $1, 3)"

Then I try to use the generic libpq query function PQsendQueryParams
with a query of "EXECUTE bar(2)" and I get back a result set with one
record (1, 2, 3) in it.  This is fine.

But when I try to do the same but pass the 2 as a parameter
(I do "EXECUTE bar($1)" with $1 bound to "2"), I get an error:

  ERROR:  bind message supplies 1 parameters, but prepared statement "" requires 0

This doesn't make much sense to me.  It's the same error you get when
trying to run a nonparameterized query like "SELECT 1" with $1 bound to
anything.

Does the query parser somehow miss the fact that there's a placeholder
in the EXECUTE statement?

I'm attempting to keep my Scheme library's API as small and simple as
possible, so I'd like to avoid having a separate procedure for querying
and one for executing prepared statements, considering there's also an
SQL command for executing prepared statements.  Is there a particular
reason there are separate functions in libpq (aside from historical
accident)?

Cheers,
Peter


Re: [GENERAL] alter table add column - specify where the column will go?

2010-11-24 Thread Peter Bex
On Wed, Nov 24, 2010 at 09:37:02AM +, Grzegorz Jaśkiewicz wrote:
> just never use SELECT *, but always call columns by name. You'll
> avoid having to depend on the order of columns, which is never
> guaranteed: even if the table on disk has one order, the returned
> columns could come back in another.

People have been saying that on this list forever, and I agree you
shouldn't *depend* on column order, but why does INSERT syntax allow
you to omit the column names?

INSERT INTO sometable VALUES (1, 2, 3);

If columns inherently don't have an ordering, this shouldn't be
possible because it would make no sense.

Things like this INSERT syntax, and the fact that columns are always
returned in the same order when you "SELECT *" or check the table
definition with \d, condition people to expect to be able to influence
the order of columns.
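
For contrast, the order-independent form spells out the target columns
(names invented for illustration):

INSERT INTO sometable (a, b, c) VALUES (1, 2, 3);

This form keeps working even if columns are later added to the table.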

Cheers,
Peter


[GENERAL] Postgres tips in the latest Chicken Gazette

2010-11-22 Thread Peter Bex
Hello all,

I'm not sure if this would be considered off topic, but I'd like to
point out this week's issue of the Chicken Gazette in which I describe
integration of array and "ROW" composite type handling with the
PostgreSQL library for the Chicken Scheme compiler:
http://gazette.call-cc.org/issues/13.html
Scroll down to chapter 4: "Omelette Recipes" for the Postgres bits.

I think cool features of Postgres like these are not common knowledge
among programmers in general and deserve to be pointed out, so others
will also have a chance to find out what makes postgres great!

Cheers,
Peter


Re: [GENERAL] More then 1600 columns?

2010-11-12 Thread Peter Bex
On Fri, Nov 12, 2010 at 10:18:31AM +0100, Thomas Kellerer wrote:
> The hstore module would also be a viable alternative - and it's indexable 
> as well.

That's what the post I replied to mentioned :)

Also, when I looked at hstore, it had a limitation on its length which
made it a little dangerous to use for me:
"In the current implementation, neither the key nor the value string
 can exceed 65535 bytes in length"
[http://www.postgresql.org/docs/8.4/interactive/hstore.html]

It looks like this limitation has been lifted in 9.0.  Good news!

Cheers,
Peter


Re: [GENERAL] More then 1600 columns?

2010-11-11 Thread Peter Bex
On Fri, Nov 12, 2010 at 10:43:14AM +0300, Dmitriy Igrishin wrote:
> Hey Peter,
> 
> Unfortunately, there is no indexes on arrays (only on expressions).
> With hstore we can easily create GiST index for effective access.

True.  In my project I only ever needed to search on one particular key,
and I made sure that that key always had a fixed position in the array.
You can then create an index on the expression that extracts that
element from the array.
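
A sketch of that, with the searchable key always stored in slot 1
(table and column names invented):

CREATE INDEX samples_key_idx ON samples ((data[1]));

Queries with a "WHERE data[1] = 42" condition can then use the index.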

Cheers,
Peter


Re: [GENERAL] More then 1600 columns?

2010-11-11 Thread Peter Bex
On Fri, Nov 12, 2010 at 10:17:50AM +0300, Dmitriy Igrishin wrote:
> Hey Mark,
> 
> Yeah, I can't imagine an entity in a real project with more than 100
> columns. It's a rare case.
> But if entities (rows/tuples) of some class (table) can contain a
> variable set of columns (properties), you can look at the hstore
> contrib module.

What can also work extremely well is storing the data in an array.
If you need to access the array based on more meaningful keys you could
store key/index pairs in another table.

This approach only works well if you have multiple arrays with the same
layout.  You probably also need to build up your query dynamically if
you need to access variable numbers of datapoints.
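
A sketch of the lookup-table variant (schema invented for
illustration):

CREATE TABLE readings (
    id   serial PRIMARY KEY,
    data real[]             -- same layout in every row
);
CREATE TABLE data_layout (
    name text PRIMARY KEY,  -- meaningful key, e.g. 'temperature'
    slot int NOT NULL       -- its position within readings.data
);

-- fetch one named datapoint from every reading:
SELECT r.id, r.data[l.slot]
FROM readings r, data_layout l
WHERE l.name = 'temperature';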

Cheers,
Peter


Re: [GENERAL] Advice needed on application/database authentication/authorization/auditing model

2010-10-22 Thread Peter Bex
On Fri, Oct 22, 2010 at 08:20:11PM +0400, Dmitriy Igrishin wrote:
> Hey Peter,

Hello Dmitriy,

> > As far as I can see, this would imply either creating views on the
> > <whatever>s for every user (or company?), or manually crafting queries
> > to do the same. The latter is of course what most webapps do, and it is
> > a frequent cause of errors and, hence, vulnerabilities.
> >
> Yes, liberal use of views and rules is the best solution in this case IMO.

Do you know of an open source application that does that so I can see
it in practice?

I'd like to learn how it's done in practice because right now it seems
to me that this would be rather complicated to manage.
 
Cheers,
Peter


Re: [GENERAL] Advice needed on application/database authentication/authorization/auditing model

2010-10-22 Thread Peter Bex
On Fri, Oct 22, 2010 at 12:21:17AM +0400, Dmitriy Igrishin wrote:
> Hey Tony,
> 
> 2010/10/21 Tony Cebzanov 
> 
> > I have a web application with a Postgres backend.  In my initial
> > prototype, I decided not to have a Postgres database user created for
> > each application user, opting instead to use my own users table.
> >
> IMO, you are trying to reinvent the wheel. Although, you may do it just for
> fun. :-)

This is an interesting statement, and I've wondered about it a couple
of times before.  It seems to be very common practice (at least with
web applications) to access the database as a single user and do all
authentication and authorization checking in the application.

Is this a bad idea?  At first glance, it would indeed seem very
logical to make an application user match a database user, but how do
you do row-level permission checking?

For instance, I've often had the situation where a web app stores
users, companies and <whatever>s.  A whatever can be accessed only by
the user who created it, or - when it is a company-wide asset or the
user has company-wide privileges - only by users from the same company
as the one who made it.

As far as I can see, this would imply either creating views on the
<whatever>s for every user (or company?), or manually crafting queries
to do the same.  The latter is of course what most webapps do, and it
is a frequent cause of errors and, hence, vulnerabilities.
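
A sketch of the view-based approach, assuming one database role per
application user and an invented schema:

CREATE VIEW my_whatevers AS
SELECT w.*
FROM whatevers w
JOIN users u ON u.username = current_user
WHERE w.owner_id = u.id
   OR (w.company_wide AND w.company_id = u.company_id);

Since current_user is evaluated per session, a single view definition
serves every user, but you still have to grant access on the view and
revoke it on the base table.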

I'd love to hear some good advice on what has worked well for others.

Cheers,
Peter


Re: [GENERAL] Prepared statements and unknown types

2010-09-29 Thread Peter Bex
On Wed, Sep 29, 2010 at 07:33:53PM +0100, Thom Brown wrote:
> Okay, I understand what's happening.  But does the planner need to
> understand the type of literals in the select list if they're not used
> anywhere else?

Fields sent back to the client also carry their type with them.
There's no "unknown" type (and it wouldn't be very useful in any
case, because how would you go about displaying an unknown type?)

Cheers,
Peter


Re: [GENERAL] Prepared statements and unknown types

2010-09-29 Thread Peter Bex
On Wed, Sep 29, 2010 at 07:08:22PM +0100, Thom Brown wrote:
> Could someone explain why the following doesn't work?
> 
> test=# PREPARE meow(unknown) AS
> test-# SELECT $1 as meow;
> ERROR:  could not determine data type of parameter $1
> 
> The problem is that using PDO in PHP, prepared statements aren't
> possible if values are used instead of columns in the select list.

The type is always string for data that's sent; it's converted to
an appropriate type when the destination of the parameter is determined.
If you know the type, you can do

PREPARE meow(text) AS
SELECT $1 as meow;

or

PREPARE meow(unknown) AS
SELECT $1::text as meow;

You can also send a parameter as a specific type using the C interface,
but that requires support from the language/library you're using.
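
With libpq this means passing an explicit type OID to PQexecParams (or
PQprepare).  A minimal C sketch, hard-coding 25, the built-in OID of
the text type:

#include <libpq-fe.h>

PGresult *select_as_text(PGconn *conn, const char *value)
{
    Oid paramTypes[1] = { 25 };             /* 25 = OID of text */
    const char *paramValues[1] = { value };

    /* The server now knows $1 is text, so it won't complain that it
     * cannot determine the parameter's data type. */
    return PQexecParams(conn, "SELECT $1 AS meow",
                        1, paramTypes, paramValues,
                        NULL, NULL,         /* text-format parameters */
                        0);                 /* text-format result */
}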

> This appears to be allowed for MySQL and SQL Server.

I don't know how they handle that.  Perhaps they try to read your mind.
Perhaps PHP adds some kind of type conversion for types it knows for
those two interfaces.

Cheers,
Peter


Re: [GENERAL] MySQL versus Postgres

2010-08-06 Thread Peter Bex
On Fri, Aug 06, 2010 at 02:25:27PM -0600, Scott Marlowe wrote:
> For an extra added bonus, show a click-path from the front page
> mysql.com to the documentation.  I gave up and just entered
> mysql.com/documentation which kicked off the search engine and let me
> click somewhere into the 5.1 docs.  Yes I know they're on
> dev.mysql.com but still, it feels like a total brochure site from the
> front end.

I know :)  They actually have TWO sites: mysql.com, which is indeed a
commercial outlet where they try to sell you their commercial offering,
and mysql.org, which is geared toward developers.  The manual is "only"
three clicks away on that site.

Cheers,
Peter


Re: [GENERAL] MySQL versus Postgres

2010-08-06 Thread Peter Bex
On Fri, Aug 06, 2010 at 02:09:43PM -0600, Scott Marlowe wrote:
> I'm all in favor of a nice tutorial section in the docs.  But I
> certainly don't want pgsql docs to mimic the mysql docs method of
> using a tutorial method for most of the beginner information.  It's so
> dang hard to find anything I want in the mysql docs because of it.
> Once you're somewhat familiar with the db, having to slog through
> pages of tutorial to find the bits you want is counterproductive.  The
> easy-peasy entry level mentality of the mysql docs works against
> experienced users.

+1

I spent half an hour today trying to find the various ways MySQL has
for controlling charsets.  The TOC is unusable and even using the index
it can sometimes be very frustrating to find something.  Postgres has
outstanding docs right now!  It would be a shame to change that.

Tutorials would be good, but in a separate section of the site.

Cheers,
Peter


Re: [GENERAL] How Big is Too Big for Tables?

2010-07-28 Thread Peter Bex
On Wed, Jul 28, 2010 at 02:05:47PM -0500, P Kishor wrote:
> each row is half a dozen single-byte values, so it is actually 6
> bytes per row (six columns). Even if I combine them somehow, the
> per-row overhead (which, I believe, is about 23 bytes) is still more
> than the data. But that is not the issue.

I had a design like that for an application too.  I thought it was
not an issue, but the row overhead caused memory and disk usage to
skyrocket and queries to slow to a crawl.  The solution for me was to
group my values together logically and store them in the same row.
In my case, this worked by storing all the values for one measuring
point (timestamp) in an array field, with the array indices recorded
in a bookkeeping table (each measuring moment produced the same number
of values for me, so I was able to do this).

Extracting one value from a long array (some datasets include thousands
of values per measuring moment) is extremely fast. You can also easily
make indices on those array dereferences you need to search on, if those
are always the same.
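
For example, with the sketch above (assuming vals[3] holds a value you
regularly search on):

-- An expression index on a single array element:
CREATE INDEX measurements_val3_idx ON measurements ((vals[3]));

-- ...which the planner can use for lookups like:
SELECT ts FROM measurements WHERE vals[3] = 42.0;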

> First, I can't really merge
> several days into one row. While it might make for fewer rows, it will
> complicate my data extraction and analysis life very complicated.

Perhaps you could put all days of a month in an array, indexed by day
of the month?  That wouldn't be too hard for your logic to deal with,
I think.
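
Something like this, say (untested sketch, assuming one value per day):

CREATE TABLE daily_values (
    month    date NOT NULL,          -- first day of the month
    variable text NOT NULL,
    days     double precision[],     -- indexed 1..31 by day of month
    PRIMARY KEY (month, variable)
);

-- The value for July 15th:
SELECT days[15] FROM daily_values
 WHERE month = '2010-07-01' AND variable = 'rainfall';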

> The real issue is that once I put a 100 million rows in the table,
> basically the queries became way too slow.

I had the same issue.  Partitioning falls flat on its face once you're
dealing with such insane amounts of data.  In my experience, if your
set of partitions isn't constant but keeps growing, you will face problems
sooner or later.  If you do partitioning the traditional way by
inheriting the table, you'll also run into additional trouble: for
some operations Postgres needs to take a lock on every partition,
and that will easily cause you to run out of shared memory.  You can
increase max_locks_per_transaction, but that's unworkable if the number
of partitions keeps growing; you'd need to keep raising that value all
the time...
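
(For reference: that's a postgresql.conf setting which requires a server
restart, and raising it costs shared memory.  Something like

    max_locks_per_transaction = 256        # default is 64

works up to a point, but with an ever-growing partition count no value
is ever enough.)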

> Of course, I could (and should) upgrade my hardware -- I am using a
> dual Xeon 3 GHz server with 12 GB RAM, but there are limits to that route.

Always try to solve it by changing your data design first, unless what
you're trying to do is fundamentally limited by hardware.  You're not
likely to request all those records at once, nor will you need to
search through all of them; try to come up with a sane way of slicing
your data down to a smaller set that can be retrieved quickly.

> Keep in mind, the circa 100 million rows was for only part of the db.
> If I were to build the entire db, I would have about 4 billion rows
> for a year, if I were to partition the db by years. And, partitioning
> by days resulted in too many tables.

Yeah, sounds similar to the troubles I ran into in my project.

> I wish there were a way around all this so I could use Pg, with my
> available resources, but it looks bleak right now.

Try using the array approach.

Possibly you could create columns for each week or month in a year
and store the individual days in an array in that column.  Extracting
those shouldn't be too hard.

You could store each type of data you have in its own row, one row
per unit of information you want to record for a day.

Alternatively, store your data points all in one row, and store a row
for each day.  You could easily start partitioning historical data per
year or per decade.
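
With table inheritance that could look roughly like this (sketch only,
names invented):

CREATE TABLE readings (
    day  date NOT NULL,
    vals double precision[]
);

CREATE TABLE readings_2009 (
    CHECK (day >= DATE '2009-01-01' AND day < DATE '2010-01-01')
) INHERITS (readings);

-- With constraint_exclusion enabled, the planner skips children
-- whose CHECK constraint rules them out:
SET constraint_exclusion = on;
SELECT vals FROM readings WHERE day = DATE '2009-06-01';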

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tuple storage overhead

2010-04-20 Thread Peter Bex
On Fri, Apr 16, 2010 at 11:28:36AM -0400, Merlin Moncure wrote:
> If you are storing big data and want to keep the overhead low, the
> first thing you need to examine is organizing your data into arrays.
> This involves tradeoffs of course and may not work but it's worth a
> shot!

That does sound interesting.  However, I'm storing several hundred or
several thousand data points (depending on the data set).  How is PostgreSQL's
overhead when it comes to extracting one or two items from an array in
a query?

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tuple storage overhead

2010-04-20 Thread Peter Bex
On Fri, Apr 16, 2010 at 12:40:21PM +0200, Szymon Guz wrote:
> I thought that the default fillfactor was much smaller (and haven't checked
> that now)...  sorry for messing that up.
> But let's think of it from the other side: what do you want to do with that
> data? Maybe PostgreSQL with it's MVCC's overhead isn't the best solution for
> your needs.

I'm using this as part of a larger application. The data sets are one aspect
of it.  The idea is the following:

An engineering application generates time-dependent data. One simulation
yields a very big text file in tabular format, with hundreds or thousands
of columns with output values (often more than Postgres' column limit),
one row per timestamp.

One such file is generated for each permutation of input values which
influence the run of a simulation.

This text file is imported into a database so that we can perform very
fast lookups on the numbers, which are then plotted in a graph.
The user can select any number of input permutations and graph the
values of any selected output values to view the effect of the variation
in input.

One can plot any variable against any other, so one join is made for
each variable that we want to plot; it joins the timestep values of the
variable on the X axis to those on the Y axis.
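
Roughly like this, heavily simplified (run_data, run_id, ts and vals
are invented names for the example; the real schema differs):

-- Plot output variable 7 (Y axis) against output variable 3 (X axis)
-- for simulation run 42; one self-join per plotted variable:
SELECT x.vals[3] AS x_value,
       y.vals[7] AS y_value
  FROM run_data x
  JOIN run_data y ON y.run_id = x.run_id AND y.ts = x.ts
 WHERE x.run_id = 42
 ORDER BY x.ts;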

Regards,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tuple storage overhead

2010-04-16 Thread Peter Bex
On Fri, Apr 16, 2010 at 11:59:38AM +0200, Szymon Guz wrote:
> File pages are not fully filled from the start as that could result in bad
> performance of queries later.

The manual page you linked to says something else:
"The fillfactor for a table is a percentage between 10 and 100.
 100 (complete packing) is the default."

However, the index has a default fillfactor of 90, so I guess
I can tweak that to 100 to shave off another few percent
(there will be no updates or extra inserts on these tables).
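
I.e., if I understand the docs correctly, something like this
(table and index names invented for the example):

CREATE INDEX measurements_ts_idx ON measurements (ts)
    WITH (fillfactor = 100);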

Thanks for the tip!  I hope there are more ways to tweak it, though,
because this doesn't save that much.

Regards,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Tuple storage overhead

2010-04-16 Thread Peter Bex
Hi all,

I have a table with three columns: one integer and two doubles.
There are two indexes defined (one on the integer and one on one
of the doubles).  This table stores 700,000 records, which take up
30 MB according to pg_relation_size(), and the total relation size
is 66 MB.

I expected the disk space usage to be halved by changing the doubles
to floats, but it only dropped by 5 MB!  (I tried various approaches,
including dumping and restoring to make sure there was no uncollected
garbage lying around)

Someone on IRC told me the per-tuple storage overhead is pretty big,
and asked me to create a similar table containing only integers:

db=# CREATE TABLE testing (x integer);
db=# INSERT INTO testing (x) VALUES (generate_series(1, 700000));
db=# SELECT pg_size_pretty(pg_total_relation_size('testing'));
 pg_size_pretty
----------------
 24 MB
(1 row)
db=# SELECT pg_size_pretty(pg_relation_size('testing'));
 pg_size_pretty
----------------
 24 MB
(1 row)
db=# CREATE INDEX testing_1 ON testing (x);
db=# CREATE INDEX testing_2 ON testing (x);
db=# SELECT pg_size_pretty(pg_relation_size('testing'));
 pg_size_pretty
----------------
 24 MB
(1 row)
db=# SELECT pg_size_pretty(pg_total_relation_size('testing'));
 pg_size_pretty
----------------
 54 MB
(1 row)

Is there a way to reduce the per-tuple storage overhead?

The reason I'm asking is that I have tons of tables like this,
and some data sets are much bigger than this.  In a relatively
simple test case I'm importing data from text files which are
5.7 GB in total, and this causes the db size to grow to 34 GB.

This size is just one small sample of many such datasets that I
need to import, so disk size is really an important factor.

Regards,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Array value syntax and escaping

2010-04-01 Thread Peter Bex
On Thu, Apr 01, 2010 at 04:37:23PM -0400, Tom Lane wrote:
> There is not, but you could always look at the source for array_out and
> array_in if you want definitive answers ;-).  Personally what I'd do is
> always double-quote each non-null array element; then the rules reduce
> to "backslash any backslashes or double quotes".

That's what I'm doing right now.
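
For what it's worth, in practice that looks like this (the ARRAY
constructor sidesteps the string-literal escaping layer; the comment
shows how array_out prints the same value, quoting only where needed,
though always-quoted input is accepted too):

SELECT ARRAY['plain', 'with "quotes"', E'back\\slash'];
-- array_out renders this value as:
--   {plain,"with \"quotes\"","back\\slash"}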

> If you're working in a sane client encoding (not SJIS for instance)
> this is pretty trivial.

That's a relief :)
OTOH, for a generic library, it's not a good idea to make such assumptions.

> The problem with that is that the necessarily-arbitrary API would
> probably add as much or more complexity as would be saved.  If C had
> a simple and universally-followed convention for variable-size arrays,
> it'd be easier to provide useful helpers ...

What I was proposing is a simple escaper for string values, nothing
more.  Putting array decoration around those is trivial; escaping the
strings is the tricky part.

However, this got me thinking: It could use the same system as
environment and argv values in C, or the new connection procedures of
libpq; a null-terminated list of string pointers.

But then the question is how to encode nested arrays.  I guess it's
feasible to pass the array depth as an extra argument to the escaping
procedure, since we know arrays cannot have variable depths between
elements.  This could really work and wouldn't need to be overly complex.

It wouldn't save a lot of complexity, but it would save some wheel
reinvention in a case where there's room for error, just like PQescape*.
You could argue that PQescape* don't save a lot of complexity either,
yet those are considered a good idea.  I don't see how this is any
different.

> > I briefly considered "abusing" the PQescapeIdentifier procedure for
> > escaping since the syntax for literals inside arrays seems to be exactly
> > like that of SQL identifiers, but I'm not 100% sure about that and I
> > also think the PQescapeIdentifier procedure shouldn't be overloaded for
> > this purpose.
> 
> Well, that wouldn't work anyway, since backslashes aren't special in
> identifiers.

Good thing I didn't abuse it, then :)

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Array value syntax and escaping

2010-04-01 Thread Peter Bex
Hello all,

I am currently adding array value handling to the PostgreSQL interface
for the Chicken Scheme compiler[*], and I was wondering if there's more
detailed documentation of the exact syntax of arrays than the short
natural-language explanation in the manual.

There doesn't appear to be a helper function in libpq to escape (and
unescape) string values for use inside array values, and I'm concerned
that my homebrew procedures might not be foolproof.

If I understand correctly, the connection's character encoding is also
used to determine how strings sent by "PQsendQueryParams parameters"
(I don't know if there's an unambiguous name for those) are parsed, and
it is conceivable that either through bogus or malicious input a
multibyte array string could be wrongly escaped, which means one array
value could "break out" of its proper position in the array, resulting
in an array of different length or contents than intended.

IMHO, it would be a Good Thing to have helper procedures in libpq.
That would fix the problem once and for all in one place.

I briefly considered "abusing" the PQescapeIdentifier procedure for
escaping since the syntax for literals inside arrays seems to be exactly
like that of SQL identifiers, but I'm not 100% sure about that and I
also think the PQescapeIdentifier procedure shouldn't be overloaded for
this purpose.

Cheers,
Peter

[*] http://chicken.wiki.br/eggref/4/postgresql
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
-- Donald Knuth

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general