Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-22 Thread MauMau

From: "Tatsuo Ishii" 

I don't think the bind placeholder case is a problem. That is processed by
exec_bind_message() in postgres.c. It has enough info about the type
of the placeholder, and I think we can easily deal with NCHAR. The same
can be said for the COPY case.


Yes, I've learned it.  Agreed.  If we allow an encoding for NCHAR different 
from the database encoding, we can convert text from the client encoding to 
the NCHAR encoding in nchar_in() for example.  We can retrieve the NCHAR 
encoding from pg_database and store it in a global variable at session 
start.




Problem is an ordinary query (simple protocol "Q" message) as you
pointed out. Encoding conversion happens at a very early stage (note
that the fast-path case has the same issue). If a query message contains,
say, both SHIFT-JIS and EUC-JP, then we are going to get into trouble because
the encoding conversion routine (pg_client_to_server) assumes that the
message from the client contains only one encoding. However, my question
is, does it really happen? I doubt there's any text editor which can
create SHIFT-JIS and EUC-JP mixed text. So my guess is, when a user wants
to use NCHAR as SHIFT-JIS text, the rest of the query consists of either
SHIFT-JIS or plain ASCII. If so, what the user needs to do is set the
client encoding to SHIFT-JIS and everything should be fine.

MauMau, is my guess correct?


Yes, I believe you are right.  Regardless of whether we support multiple 
encodings in one database or not, a single client encoding will be 
sufficient for one session.  When receiving the "Q" message, the whole SQL 
text is converted from the client encoding to the database encoding.  This 
part needs no modification.  During execution of the "Q" message, NCHAR 
values are converted from the database encoding to the NCHAR encoding.
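
For illustration, the kind of transcoding that nchar_in() would perform
internally can already be seen at the SQL level with the built-in
conversion functions. A rough sketch, assuming a UTF8 database and SJIS
as a hypothetical NCHAR encoding:

SELECT convert_to('some text', 'SJIS');                        -- database encoding -> SJIS (bytea)
SELECT convert_from(convert_to('some text', 'SJIS'), 'SJIS');  -- and back again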


Thank you very much, Tatsuo san.  Everybody, are there any other challenges we 
should consider in supporting NCHAR/NVARCHAR types as distinct types?


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API

2013-09-22 Thread Abhijit Menon-Sen
At 2013-08-19 11:47:36 +, laurenz.a...@wien.gv.at wrote:
>
> To repeat: this fixes a bug in LDAP connection parameter lookup

Hi.

I read through the patch, and it looks sensible.

I would have preferred the ldap_simple_bind_s() call in the HAVE_LIBLDAP
branch to not be inside an else {} (the if block above returns if there
is an error anyway), but that's a minor point.

I tested the code as follows:

1. Built the patched source --with-ldap
2. Set up ~/.pg_service.conf:

[foo]

ldap://localhost:3343/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*)

ldap://localhost:3443/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*)

3. iptables -A INPUT -p tcp -d 127.0.0.1 --dport 3343 -j DROP
4. netcat -l 127.0.0.1 3343 ; netcat -l 127.0.0.1 3443
5. PGSERVICE=foo bin/psql

psql did connect to localhost:3443 after a few seconds of trying to
connect to :3343 and failing. (I tried without the iptables rule, so
I know that it does try to connect to both.)

This doesn't seem to handle timeouts in the sense of a server that
doesn't respond after you connect (or perhaps the timeout was long
enough that it outlasted my patience), but that's not the fault of
this patch, anyway.

I can't say anything about the patch on Windows, but since Magnus seemed
to think it was OK, I'm marking this ready for committer.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension kNN support

2013-09-22 Thread Oleg Bartunov
Do you have any benchmarks?


On Mon, Sep 23, 2013 at 3:38 AM, Stas Kelvich wrote:

> Hello, hackers.
>
> Here is the patch that introduces kNN search for cubes with euclidean,
> taxicab and chebyshev distances.
>
> The following distance operators are introduced:
>
> <#> taxicab distance
> <->  euclidean distance
> <=> chebyshev distance
>
> For example:
> SELECT * FROM objects ORDER BY objects.coord <-> '(137,42,314)'::cube
> LIMIT 10;
>
> Also there is an operator "->" for selecting ordered rows directly from the index.
> This query selects rows ordered ascending by the 3rd coordinate:
>
> SELECT * FROM objects ORDER BY objects.coord->3 LIMIT 10;
>
> For descending ordering, the suggested syntax uses a minus before the coordinate.
> This query selects rows ordered descending by the 4th coordinate:
>
> SELECT * FROM objects ORDER BY objects.coord->-4 LIMIT 10;
>
> Stas Kelvich.
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] pgbench progress report improvements - split 3

2013-09-22 Thread Fabien COELHO


Hello Josh,


As long as you're hacking pgbench output, what about offering a JSON
option instead of the text output?  I'm working on automating pgbench
performance testing, and having the output in a proper delimited format
would be really helpful.


I'm more a "grep | cut | ..." person, but I do understand your point about 
automation. My favorite hack is to load benchmark data into sqlite and run 
queries to build data extracts for gnuplot figures.


So I can put that in the stack of things to do when I have some time.

Due to the possibly repetitive table structure of the data, maybe CSV 
would make sense as well. It is less extensible, but it is directly usable 
by sqlite or pg.
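
For instance, the per-transaction log produced by --log can already be
loaded into pg and aggregated with plain SQL. A sketch, assuming the
default per-transaction log line format and a hypothetical file name:

-- columns follow the "client_id transaction_no latency file_no epoch usec" log format
CREATE TABLE bench_log (client_id int, transaction_no int, latency_us bigint,
                        file_no int, time_epoch bigint, time_us int);
\copy bench_log FROM 'pgbench_log.12345' WITH (FORMAT csv, DELIMITER ' ')
-- per-second throughput and average latency, gnuplot-ready
SELECT time_epoch, count(*) AS tps, round(avg(latency_us), 1) AS avg_latency_us
FROM bench_log GROUP BY time_epoch ORDER BY time_epoch;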


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-22 Thread Amit Kapila
On Fri, Sep 20, 2013 at 9:48 AM, Amit Kapila  wrote:
> On Thu, Sep 19, 2013 at 2:48 PM, Andres Freund  wrote:
>> On 2013-09-18 11:02:50 +0200, Andres Freund wrote:
>>> On 2013-09-18 11:55:24 +0530, Amit Kapila wrote:
>>>
>>> > > I think that ship has long since sailed. postgresql.conf has allowed
>>> > > foo.bar style GUCs via custom_variable_classes for a long time, and
>>> > > these days we don't even require that but allow them generally. Also,
>>> > > SET, postgres -c, and SELECT set_config() already don't have the
>>> > > restriction to one dot in the variable name.
>>> >
>>> > It's even explained in document that a two-part name is allowed for
>>> > Customized Options at link:
>>> > http://www.postgresql.org/docs/devel/static/runtime-config-custom.html
>>>
>>> Oh I somehow missed that. I'll need to patch that as well.
>>
>> Updated patch attached.
>
> old part of line
> - PostgreSQL will accept a setting for any two-part parameter name.
>
> new part of line
> + PostgreSQL will accept a setting for any parameter name containing
> at least one dot.
>
> If I read new part of line just in context of custom options, it makes
> sense, but when I compare with old line I think below lines or variant
> of them might make more sense as this line is not restricted to just
> custom options:
>
> a. PostgreSQL will accept a setting for any parameter name consisting
> of two or more parts.
> b. PostgreSQL will accept a setting for any parameter name containing
> one or more dots.
>
> It's just a matter of how users interpret the line, so maybe your wording
> is more meaningful to some users. If you don't think there is any need to
> change it, then keep it as is and let the committer decide about it. I don't
> have any big problem with the current wording.
>
> I think Robert is still not fully convinced about this patch, but from
> my side review of patch with the current scope is complete, so I will
> mark it as Ready For Committer if nobody has objections for the same.

I had marked this patch as Ready For Committer, as I think in its
current scope definition it's ready for the next level of review.
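
For reference, the kind of parameter names at issue (the extension and
parameter names below are made up): SET and set_config() already accept
more than one dot, and the patch extends the same treatment to
postgresql.conf.

SET myext.debug_level = '2';                                   -- one dot: accepted everywhere today
SET myext.submodule.debug_level = '2';                         -- two dots: already fine in SET
SELECT set_config('myext.submodule.debug_level', '3', false);  -- and in set_config()
SHOW myext.submodule.debug_level;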

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-22 Thread Amit Kapila
On Mon, Sep 23, 2013 at 12:34 AM, Noah Misch  wrote:
> On Sun, Sep 22, 2013 at 01:17:52PM +0530, Amit Kapila wrote:
>> On Fri, Sep 20, 2013 at 5:14 PM, Andres Freund  
>> wrote:
>> > On 2013-09-19 11:44:34 -0400, Robert Haas wrote:
>> >> On Wed, Sep 18, 2013 at 1:42 PM, Andres Freund  
>> >> wrote:
>> >> >> + /* Create or truncate the file. */
>> >> >> + statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, 
>> >> >> O_RDWR|O_CREAT|O_TRUNC, 0600);
>> >> >
>> >> > Doesn't this need a | PG_BINARY?
>> >>
>> >> It's a text file.  Do we need PG_BINARY anyway?
>> >
>> > I'd say yes. Non binary mode stuff on windows does stuff like
>> > transforming LF <=> CRLF on reading/writing, which makes sizes not match
>> > up and similar ugliness.
>> > Imo there's little reason to use non-binary mode for anything written
>> > for postgres' own consumption.
>>
>> On checking this in the code, I found the below comment, which
>> suggests LF <=> CRLF is not an issue (on Windows it uses pgwin32_open
>> to open a file):
>>
>> /*
>>  * NOTE:  this is also used for opening text files.
>>  * WIN32 treats Control-Z as EOF in files opened in text mode.
>>  * Therefore, we open files in binary mode on Win32 so we can read
>>  * literal control-Z. The other affect is that we see CRLF, but
>>  * that is OK because we can already handle those cleanly.
>>  */
>
> That comment appears at the definition of PG_BINARY.  You only get what it
> describes when you use PG_BINARY.
>
>> As a second instance, I noticed the code below, which again suggests CRLF
>> should not be an issue unless the file mode is specifically set to TEXT
>> mode, which is not the case with the current usage of open() in the
>> dynamic shared memory code.
>>
>> #ifdef WIN32
>> /* use CRLF line endings on Windows */
>> _setmode(_fileno(fh), _O_TEXT);
>> #endif
>
> I suspect that call (in logfile_open()) has no effect.  The file is already in
> text mode.

Won't this be required when we have to open a new file due to log
rotation based on time?

The basic point I wanted to make is that unless you use _O_TEXT mode
explicitly, the LF<=>CRLF problem will not happen. The CreateFile() API,
which is used for the Windows implementation of open(), doesn't take any
parameter specifying text or binary; only by using
_setmode can we set the file mode to Text or Binary.
I checked fcntl.h, where the below comment above the definitions of
_O_TEXT and _O_BINARY again points to what I said above.
/* O_TEXT files have <cr><lf> sequences translated to <lf> on read()'s,
** and <lf> sequences translated to <cr><lf> on write()'s
*/

One more point: this issue only has a chance of occurring when somebody
takes the file from a unix system to windows and then maybe back. Do you
think the dsm state file should be allowed in a cross-platform backup? I
think pg_basebackup should disallow the backup of this file.
However, a user can use some other custom utility to take a filesystem
level backup where this can happen, but still, as per my understanding,
it should not create a problem.

I think to be on the safe side we can use PG_BINARY, but it would be
better to be sure that this problem can occur before using it.
If you think cross-platform backups can create such issues, then I
can test this as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertions in PL/PgSQL

2013-09-22 Thread Pavel Stehule
2013/9/22 Jaime Casanova 

>
> On 21/09/2013 17:16, "Jaime Casanova"  wrote:
>
> >
> > On Fri, Sep 20, 2013 at 5:17 AM, Marko Tiikkaja  wrote:
> > > On 9/20/13 12:09 PM, Amit Khandekar wrote:
> > >>
> > >> On 16 September 2013 03:43, Marko Tiikkaja  wrote:
> > >>>
> > >>> I think it would be extremely surprising if a command like that got
> > >>> optimized away based on a GUC, so I don't think that would be a good
> > >>> idea.
> > >>
> > >>
> > >>
> > >> In pl_gram.y, in the rule stmt_raise, determine that this RAISE is for
> > >> ASSERT, and then return NULL if
> plpgsql_curr_compile->enable_assertions is
> > >> false. Isn't this possible ?
> > >
> > >
> > > Of course it's possible.  But I, as a PostgreSQL user writing PL/PgSQL
> code,
> > > would be extremely surprised if this new cool option to RAISE didn't
> work
> > > for some reason.  If we use ASSERT the situation is different; most
> people
> > > will realize it's a new command and works differently from RAISE.
> > >
> > >
> >
> > What about just adding a WHEN clause to the RAISE statement and using
> > the level machinery (client_min_messages) to make it appear or not?
> > Of course, this has the disadvantage that an EXCEPTION level will
> > always happen... or you can make it a new log level that means EXCEPTION
> > if asserts_enabled
> >
>
>  meaning RAISE ASSERT of course
>

After a few days of thinking, I believe this can be a good solution.

syntax - enhanced current RAISE

RAISE ASSERT WHEN boolean expression

RAISE ASSERT 'some message' WHEN expression

and we can have a GUC that controls asserts per database - possibly
overridden by a plpgsql option - similar to current plpgsql options

assert_level = [*ignore*, notice, warning, error]

comments?

Regards

Pavel

P.S. The WHEN clause can be used for other exception levels - so it can be an
interesting shortcut for other use cases.
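
For concreteness, a hypothetical sketch of the proposed syntax inside a
function (this syntax is only proposed above, not implemented; the table
and names are made up):

CREATE FUNCTION withdraw(acct int, amount numeric) RETURNS void AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = acct;
    -- fires only when the expression is true, subject to assert_level
    RAISE ASSERT 'balance must not go negative'
        WHEN (SELECT balance FROM accounts WHERE id = acct) < 0;
END;
$$ LANGUAGE plpgsql;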

--
> Jaime Casanova
> 2ndQuadrant: Your PostgreSQL partner
>


[HACKERS] Cube extension kNN support

2013-09-22 Thread Stas Kelvich
Hello, hackers.

Here is the patch that introduces kNN search for cubes with euclidean, taxicab 
and chebyshev distances.

The following distance operators are introduced:

<#> taxicab distance
<->  euclidean distance
<=> chebyshev distance

For example:
SELECT * FROM objects ORDER BY objects.coord <-> '(137,42,314)'::cube LIMIT 10;

Also there is an operator "->" for selecting ordered rows directly from the index.
This query selects rows ordered ascending by the 3rd coordinate:

SELECT * FROM objects ORDER BY objects.coord->3 LIMIT 10; 

For descending ordering, the suggested syntax uses a minus before the coordinate.
This query selects rows ordered descending by the 4th coordinate:

SELECT * FROM objects ORDER BY objects.coord->-4 LIMIT 10; 
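
Presumably the new ordering operators are meant to be used with cube's
GiST opclass, so that the ORDER BY ... LIMIT can be satisfied directly
from the index. A minimal setup sketch (assuming the patched cube
extension; table and column names as in the examples above):

CREATE EXTENSION cube;
CREATE TABLE objects (id serial, coord cube);
CREATE INDEX objects_coord_idx ON objects USING gist (coord);
-- with the index, the 10 nearest neighbours come back without a full sort:
SELECT * FROM objects ORDER BY coord <-> '(137,42,314)'::cube LIMIT 10;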

Stas Kelvich.




distances.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-22 Thread Peter Geoghegan
On Sun, Sep 22, 2013 at 1:39 PM, Andres Freund  wrote:
> I still fail to see how that's relevant. For every index there's two
> things that can happen:
> a) there's a conflicting tuple. In that case we can fail at that
> point/convert to an update. No Bloat.

Well, yes - if the conflict is in the first unique index you look at.

> b) there's no conflicting tuple. In that case we will insert a promise
> tuple.

Yeah, if there is no conflict relating to any of the tuples, the cost
is limited to updating the promise tuples in-place. Not exactly a
trivial additional cost even then though, because you have to
exclusive lock and WAL-log twice per index tuple.

> If there's no conflict in further indexes (i.e. we INSERT), the
> promise will converted to a plain tuple.

Sure.

> If there *is* a further
> conflict, you *still* need the new index tuple because by definition
> (the index changed) it cannot be an HOT update.

By definition? What do you mean? This isn't MySQL's REPLACE. This
feature is almost certainly going to tacitly require the user to write
the upsert SQL with a particular unique index in mind (to figure that
out for ourselves, we'd need to somehow ask/infer, which is ugly/very
hard to impossible). The UPDATE, as typically written, probably
*won't* actually update any of the other, incidentally
unique-constrained/value locked columns that we have to check in case
that's what the user really meant, and very probably not the
"interesting" column appearing in the UPDATE qual itself, so it
probably *will* be a HOT update.

> So you convert it as
> well. No Bloat.

Even if this is a practical possibility, which I doubt, the bookkeeping
sounds very messy and invasive indeed.

> Yes, I plan to reply to those, I just didn't have time to do so this
> weekend.

Great, thanks. I cannot strongly emphasize enough how I think that's
the way to frame all of this. So much so that I almost managed to
resist answering the above points. :-)

>  There's other stuff than PG every now and then ;)

Hope you enjoyed the hike.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI freezing bug

2013-09-22 Thread Hannu Krosing
On 09/20/2013 12:55 PM, Heikki Linnakangas wrote:
> Hi,
>
> Prompted by Andres Freund's comments on my Freezing without Write I/O
> patch, I realized that there's there's an existing bug in the way
> predicate locking handles freezing (or rather, it doesn't handle it).
>
> When a tuple is predicate-locked, the key of the lock is ctid+xmin.
> However, when a tuple is frozen, its xmin is changed to FrozenXid.
> That effectively invalidates any predicate lock on the tuple, as
> checking for a lock on the same tuple later won't find it as the xmin
> is different.
>
> Attached is an isolationtester spec to demonstrate this.
The case is even fishier than that.

That is, you can get bad behaviour on at least v9.2.4 even without
VACUUM FREEZE.

You just need to run 

permutation "r1" "r2" "w1" "w2" "c1" "c2"

twice in a row.

The first time, it does get a serialization error at "c2",
but the 2nd time both "c1" and "c2" complete successfully.
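
For reference, a hypothetical isolationtester spec of the shape those step
names imply (a classic write-skew scenario; the actual spec attached
upthread may differ):

setup
{
    CREATE TABLE t (id int PRIMARY KEY, val int);
    INSERT INTO t VALUES (1, 0), (2, 0);
}

teardown { DROP TABLE t; }

session "s1"
setup     { BEGIN ISOLATION LEVEL SERIALIZABLE; }
step "r1" { SELECT val FROM t WHERE id = 2; }
step "w1" { UPDATE t SET val = 1 WHERE id = 1; }
step "c1" { COMMIT; }

session "s2"
setup     { BEGIN ISOLATION LEVEL SERIALIZABLE; }
step "r2" { SELECT val FROM t WHERE id = 1; }
step "w2" { UPDATE t SET val = 1 WHERE id = 2; }
step "c2" { COMMIT; }

permutation "r1" "r2" "w1" "w2" "c1" "c2"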

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-22 Thread Valentine Gogichashvili
>
>
>> PostgreSQL has very powerful possibilities for storing any kind of
>> encoding. So maybe it makes sense to add the ENCODING as another column
>> property, the same way a COLLATION was added?
>>
>
> Some other people in this community suggested that.  And the SQL standard
> suggests the same -- specifying a character set for each column:
> CHAR(n) CHARACTER SET cs.
>
>
>> Text operations should work automatically, as in memory all strings will
>> be converted to the database encoding.
>>
>> This approach will also open the possibility to implement custom ENCODINGs
>> for the column data storage, like snappy compression or even BSON, gobs or
>> protobufs for much more compact type storage.
>>
>
> Thanks for your idea, it sounds interesting, although I don't understand
> it well.
>
>
The idea is very simple:

CREATE DATABASE utf8_database ENCODING 'utf8';

\c utf8_database

CREATE TABLE a(
  id serial,
  ascii_data text ENCODING 'ascii', -- will use ascii_to_utf8 to read and
utf8_to_ascii to write
  koi8_data text ENCODING 'koi8_r', -- will use koi8_r_to_utf8 to read and
utf8_to_koi8_r to write
  json_data json ENCODING 'bson' -- will use bson_to_json to read and
json_to_bson to write
);

The problem with bson_to_json here is that it will probably not be possible
to write JSON in koi8_r, for example. But that is not even being considered
in these discussions at the moment.

If the ENCODING machinery received not only the encoding name, but also
the type OID, it should be possible to write encoders for TYPEs and arrays
of TYPEs (I had to do this using casts to bytea and protobuf to minimize
the storage size for an array of types when writing a lot of data, which
could afterwards be unpacked directly in the DB as normal database types).

I hope I made my point a little bit clearer.

Regards,

Valentine Gogichashvili


Re: [HACKERS] Looking for information on our elephant

2013-09-22 Thread Oleg Bartunov
Tatsuo,

you could ask Marc about archives. Probably he has original mbox files.

Oleg


On Sun, Sep 22, 2013 at 11:18 AM, Tatsuo Ishii  wrote:

> Oleg,
>
> Unfortunately the archives seem to be missing the attached files. I would
> love to see the attached files because they are logo images.  Any idea?
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp
>
> > Tatsuo,
> >
> > I have emails even from 1995!  You can see that historical message here:
> > http://www.pgsql.ru/db/mw/msg.html?mid=1238939
> >
> > Re: [HACKERS] PostgreSQL logo.
> > Author: yang(at)sjuphil(dot)sju(dot)edu
> > Date: 1997-04-03 20:36:33
> > Subject: Re: [HACKERS] PostgreSQL logo.
> >
> > Some other ideas:
> >
> > derivative: a sword (derivative of the Dragon book cover -- Postgres as
> a tool)
> > illustrative: a bowl of Alphabet Soup, with letters spelling out
> POSTGRESQL
> > obscure: a revolver/hit man (Grosse Pt is an anagram of Postgres, and an
> > abbreviation of the title of the new John Cusack movie)
> >
> > but if you want an animal-based logo, how about some sort of elephant?
> > After all, as the Agatha Christie title read, elephants can remember ...
> >
> > David yangy...@sju.edu
> >
> >
> >
> > Oleg
> >
> >
> > On Fri, Sep 20, 2013 at 3:16 AM, Tatsuo Ishii 
> wrote:
> >
> >> Hi,
> >>
> >> I'm Looking for information on our elephant: especially how/why we
> >> chose elephant as our mascot.
> >>
> >> According to:
> >> http://wiki.postgresql.org/wiki/Logo
> >>
> >> The discussion was back to 1997 on the hackers list. Unfortunately the
> >> official archive for 1997 was lost. Mine seems to be gone too. Any
> >> information will be appreciated.
> >> --
> >> Tatsuo Ishii
> >> SRA OSS, Inc. Japan
> >> English: http://www.sraoss.co.jp/index_en.php
> >> Japanese: http://www.sraoss.co.jp
> >>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >>
>


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-22 Thread Andres Freund
On 2013-09-22 12:54:57 -0700, Peter Geoghegan wrote:
> On Sun, Sep 22, 2013 at 2:10 AM, Andres Freund  wrote:
> > I can't follow here. Why does e.g. the promise tuple approach bloat more
> > than the subxact example?
> > The protocol is roughly:
> > 1) Insert index pointer containing an xid to be waiting upon instead of
> >the target tid into all indexes
> > 2) Insert heap tuple, we can be sure there's no conflict now
> > 3) Go through the indexes and repoint the item to point to the tid of the
> >heaptuple instead of the xid.
> >
> > There's zero heap or index bloat in the uncontended case. In the
> > contended case it's just the promise tuples from 1) that are inserted
> > before the conflict is detected. Those can be marked as dead when the
> > conflict happened.
> 
> It depends on your definition of the contended case. You're assuming
> that insertion is the most probable outcome, when in fact much of the
> time updating is just as likely or even more likely. Many promise
> tuples may be inserted before actually seeing a conflict and deciding
> to update/lock for update. 

I still fail to see how that's relevant. For every index there's two
things that can happen:
a) there's a conflicting tuple. In that case we can fail at that
point/convert to an update. No Bloat.
b) there's no conflicting tuple. In that case we will insert a promise
tuple. If there's no conflict in further indexes (i.e. we INSERT), the
promise will converted to a plain tuple. If there *is* a further
conflict, you *still* need the new index tuple because by definition
(the index changed) it cannot be an HOT update. So you convert it as
well. No Bloat.

> I think that bloat that is generally cleaned up synchronously is still
> bloat

I don't think it's particularly relevant because the above will just
cause bloat in case of rollbacks and such, which is nothing new, but:
I fail to see the point of such a position.

> I think reviewer time would for now be much better spent discussing
> the patch at a higher level (along the lines of my recent mail to
> Stephen and Robert).

Yes, I plan to reply to those, I just didn't have time to do so this
weekend. There's other stuff than PG every now and then ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-22 Thread Peter Geoghegan
On Sun, Sep 22, 2013 at 2:10 AM, Andres Freund  wrote:
> I can't follow here. Why does e.g. the promise tuple approach bloat more
> than the subxact example?
> The protocol is roughly:
> 1) Insert index pointer containing an xid to be waiting upon instead of
>the target tid into all indexes
> 2) Insert heap tuple, we can be sure there's no conflict now
> 3) Go through the indexes and repoint the item to point to the tid of the
>heaptuple instead of the xid.
>
> There's zero heap or index bloat in the uncontended case. In the
> contended case it's just the promise tuples from 1) that are inserted
> before the conflict is detected. Those can be marked as dead when the
> conflict happened.

It depends on your definition of the contended case. You're assuming
that insertion is the most probable outcome, when in fact much of the
time updating is just as likely or even more likely. Many promise
tuples may be inserted before actually seeing a conflict and deciding
to update/lock for update. In order for the example in the docs to
bloat at all, both the UPDATE and the INSERT need to fail within a
tiny temporal window - that's what I mean by uncontended (it is
usually tiny because if the UPDATE blocks, that often means it will
succeed anyway, but if not the INSERT will very probably succeed).
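
For reference, the "example in the docs" is essentially the looping
subxact upsert from the PL/pgSQL documentation, roughly:

CREATE FUNCTION merge_db(key int, data text) RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key; if someone else inserts
        -- the same key concurrently, we could get a unique-key failure
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;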

This is because the UPDATE won't bloat when no existing row is seen,
because its subplan will return no rows. The INSERT will only bloat if
it fails, which is generally very unlikely because of the fact that
the UPDATE just did nothing. Contrast that with bloating almost every
time an UPDATE is necessary (I think that bloat that is generally
cleaned up synchronously is still bloat). That's before we even talk
about the additional overhead. Making the locks expensive to
release/clean-up could really hurt, since it appears they'll *have* to
be unlocked before row locking, and during that time concurrent
activity affecting the row to be locked can necessitate a full restart
- that's a window we want to keep as small as possible.

I think reviewer time would for now be much better spent discussing
the patch at a higher level (along the lines of my recent mail to
Stephen and Robert). I've been at least as guilty as anyone else in
getting mired in these details. We'll be much better equipped to have
this discussion afterwards, because it isn't clear to us if we really
need or would find it at all useful to have long-lasting value locks,
how frequently we'll need to retry and for what reasons, and so on.

My immediate concern as the patch author is to come up with a better
answer to the problem that Robert described [1], because "hey, I
locked the row -- you take it from here user that might not have any
version of it visible to you" is not good enough. I hope that there
isn't a tension between solving that problem and offering the
flexibility and composability of the proposed syntax.

[1] 
http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=grkt+epmhk2j9x0yqpi...@mail.gmail.com

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving avg performance for numeric

2013-09-22 Thread Tomas Vondra

Hi,

I've reviewed the v6 of the "numeric optimize" patch
(http://www.postgresql.org/message-id/CAFj8pRDQhG7Pqmf8XqXY0PnHfakkPQLPHnoRLJ_=ekfsboa...@mail.gmail.com),
as Pavel did some hacking on the patch and asked me to do the review.

The patch seems fine to me, the following comments are mostly nitpicking:

1) Applies cleanly to the HEAD (although only by "patch" and not "git
apply").

2) I think we should use "estimate" instead of "approximation" in the
docs, it seems more correct / natural to me (but maybe I'm wrong on this
one).

3) state_data_size does not make much sense to me - it should be
state_size. This probably comes from the state_data_type, but that's
('state' + 'data type') and by replacing the second part with 'size'
you'll get state_size.

4) currently the state size may be specified for all data types, no
matter if their size is fixed or variable. Wouldn't it be safer to allow
this option only for variable-length data types? Right now you can
specify stype=int and sspace=200 which does not make much sense to me.
We can always relax the restrictions if needed, the opposite is much
more difficult.
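
To make point 4 concrete, a hypothetical declaration using the patch's
proposed option (option name per the patch under review; the aggregate is
made up, and this is the combination that looks dubious to me):

CREATE AGGREGATE my_sum(int) (
    sfunc  = int4pl,
    stype  = int,
    sspace = 200,   -- meaningless for a fixed-length state type
    initcond = '0'
);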

5) in numeric.c, there are no comments for
  - fields of the NumericAggState (some are obvious, some are not)
  - multiple functions are missing the initial comments (e.g.
numeric_accum, do_numeric_accum)

6) I think the "first" variable in do_numeric_accum is rather useless,
it's trivial to remove it - just use state->first instead and move
the assignment to the proper branch (ok, this is nitpicking).

7) I think the error message in makeNumericAggState should be something 
like "This must not be called in non-aggregate context!" or something 
like that.


8) The records in pg_aggregate.c are using either 0 (for fixed-length) 
or 128. This seems slightly excessive to me. What is the reasoning 
behind this? Is that because of the two NumericVar fields?


regards
Tomas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] trivial one-off memory leak in guc-file.l ParseConfigFile

2013-09-22 Thread didier
Hi

This fixes a small memory leak in guc-file.l's ParseConfigFile().

AbsoluteConfigLocation() returns a strdup'd string, but it is never freed
or referenced outside ParseConfigFile().

Found courtesy of Valgrind and Noah Misch's MEMPOOL work.

Regards
Didier


memory_leak_in_parse_config_file.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench progress report improvements - split 3

2013-09-22 Thread Josh Berkus
Fabien,

As long as you're hacking pgbench output, what about offering a JSON
option instead of the text output?  I'm working on automating pgbench
performance testing, and having the output in a proper delimited format
would be really helpful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-22 Thread Noah Misch
On Sun, Sep 22, 2013 at 01:17:52PM +0530, Amit Kapila wrote:
> On Fri, Sep 20, 2013 at 5:14 PM, Andres Freund  wrote:
> > On 2013-09-19 11:44:34 -0400, Robert Haas wrote:
> >> On Wed, Sep 18, 2013 at 1:42 PM, Andres Freund  
> >> wrote:
> >> >> + /* Create or truncate the file. */
> >> >> + statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, 
> >> >> O_RDWR|O_CREAT|O_TRUNC, 0600);
> >> >
> >> > Doesn't this need a | PG_BINARY?
> >>
> >> It's a text file.  Do we need PG_BINARY anyway?
> >
> > I'd say yes. Non binary mode stuff on windows does stuff like
> > transforming LF <=> CRLF on reading/writing, which makes sizes not match
> > up and similar ugliness.
> > Imo there's little reason to use non-binary mode for anything written
> > for postgres' own consumption.
> 
> On checking this in the code, I found the below comment, which
> suggests LF <=> CRLF is not an issue (on Windows it uses pgwin32_open
> to open a file):
> 
> /*
>  * NOTE:  this is also used for opening text files.
>  * WIN32 treats Control-Z as EOF in files opened in text mode.
>  * Therefore, we open files in binary mode on Win32 so we can read
>  * literal control-Z. The other affect is that we see CRLF, but
>  * that is OK because we can already handle those cleanly.
>  */

That comment appears at the definition of PG_BINARY.  You only get what it
describes when you use PG_BINARY.

> As a second instance, I noticed the code below, which again suggests CRLF
> should not be an issue unless the file mode is specifically set to TEXT
> mode, which is not the case with the current usage of open() in the
> dynamic shared memory code.
> 
> #ifdef WIN32
> /* use CRLF line endings on Windows */
> _setmode(_fileno(fh), _O_TEXT);
> #endif

I suspect that call (in logfile_open()) has no effect.  The file is already in
text mode.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench progress report improvements - split 3

2013-09-22 Thread Fabien


Split 3 of the initial submission, which actually deals with the data measured 
and reported on stderr under various options.


This version takes into account many comments by Noah Misch. In 
particular, the default "no report" behavior under benchmarking is not 
changed, although I really think that it should be. Also, the standard 
deviation is only shown when available, which is not the case under all 
settings, because of concerns expressed about the cost of additional calls 
to gettimeofday. ISTM that these concerns are misplaced in this 
particular case.



Improve pgbench measurements & progress report

These changes are coupled because measures are changed, and their 
reporting as well. Submitting separate patches for these interlinked 
features would result in conflicting or dependent patches, so I wish to 
avoid that. Also it would require more reviewer time.


 - Use same progress and quiet options both under init & bench.

   The reporting is every 5 seconds for initialization,
   and currently no report for benchmarking.

   I strongly suggest changing this default to 5 seconds for both,
   if people do not veto any change of the default behavior...
   The rationale is that benchmarking is error-prone as it must run
   a long time to be significant because of warmup effects (esp on HDD,
   less true on SSD). Seeing how the current performance evolves would
   help pgbench users realise that. See down below for a sample output.

   The previous progress report under initialization, which printed a line
   every 100,000 rows, is removed, as it is much too verbose for most
   hardware, and pretty long for any significant scale.

 - Measure transaction latency instead of computing it,
   for --rate and --progress.

   The previous computed figure does not make sense under throttling:
   as sleep throttling time was included in the figures, the displayed
   results were plain false.

   The latency and its standard deviation are printed out under progress
   and in the final report when available.

   It could be made "always" available, but that would require accepting
   additional gettimeofday calls. I do not think that there is a
   performance issue here, but there is no consensus yet.

 - Take thread start time at the beginning of the thread (!)

   Otherwise it includes pretty slow thread/fork system start times in
   the measurements. May help with bug #8326. This also helps with throttling,
   as the start time is used to adjust the rate: if it is not the actual
   start time, there is a bit of a rush at the beginning in order to
   catch up. Taking the start time when the thread actually starts is
   the sane thing to do to avoid surprising and possibly strange measures
   on short runs.

Sample output under initialization with --progress=1

  creating tables...
  1126000 of 3000000 tuples (37%) done (elapsed 1.00 s, remaining 1.67 s).
  2106000 of 3000000 tuples (70%) done (elapsed 2.00 s, remaining 0.85 s).
  2824000 of 3000000 tuples (94%) done (elapsed 3.00 s, remaining 0.19 s).
  3000000 of 3000000 tuples (100%) done (elapsed 3.19 s, remaining 0.00 s).
  vacuum...
  set primary keys...
  done.

Sample output under benchmarking with --progress=1

  progress: 1.0 s, 2626.1 tps, 0.374 stddev 0.597 ms lat
  progress: 2.0 s, 2766.6 tps, 0.358 stddev 0.588 ms lat
  progress: 3.0 s, 2567.4 tps, 0.385 stddev 0.665 ms lat
  progress: 4.0 s, 3014.2 tps, 0.328 stddev 0.593 ms lat
  progress: 5.0 s, 2959.3 tps, 0.334 stddev 0.553 ms lat
  ...
  progress: 16.0 s, 5009.2 tps, 0.197 stddev 0.381 ms lat
  ...
  progress: 24.0 s, 7051.2 tps, 0.139 stddev 0.284 ms lat
  ...
  progress: 50.0 s, 6860.5 tps, 0.143 stddev 0.052 ms lat
  ...

The "warmup" is quite fast because the DB is on a SSD. In the beginning
the standard deviation is well over the average transaction time, but
when the steady state is reached (later) it is much below.

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..0bdf8dd 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -167,11 +167,12 @@ char	   *index_tablespace = NULL;
 #define SCALE_32BIT_THRESHOLD 2
 
 bool		use_log;			/* log transaction latencies to a file */
-bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
-int			progress = 0;   /* thread progress report every this seconds */
+int			progress = -1;  /* report every this seconds.
+   0 is quiet, -1 is not set yet */
 int progress_nclients = 0; /* number of clients for progress report */
+int progress_nthreads = 0; /* number of threads for progress report */
 bool		is_connect;			/* establish connection for each transaction */
 bool		is_latencies;		/* report per-command latencies */
 int			main_pid;			/* main process id used in log filename */
@@ -214,6 +215,8 @@ typedef struct
 	int			nvariables;
 	instr_time	txn_begin;		/* used for measuring 

Re: [HACKERS] pgbench progress report improvements - split 2

2013-09-22 Thread Fabien


Split 2 of the initial submission

pgbench: reduce and compensate for the throttling underestimation bias.

This is a consequence of relying on an integer random generator,
which allows ensuring that the inserted delays stay reasonably in
range of the target average delay.

The bias was about 0.5% with 1000 distinct values. A multiplier is added
to compensate for the 0.055% bias with 10000 distinct integer values,
so there is no bias now.
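
For the record, the size of the bias follows from Stirling's formula (a
sketch, with $n$ the number of distinct values returned by getrand): the
mean multiplier actually produced is

$$ \frac{1}{n}\sum_{i=1}^{n} -\ln\frac{i}{n}
   = \frac{n\ln n - \ln n!}{n}
   \approx 1 - \frac{\ln(2\pi n)}{2n}, $$

using $\ln n! \approx n\ln n - n + \tfrac{1}{2}\ln(2\pi n)$. For
$n = 10000$ this gives about $1 - 0.00055$, i.e. the 0.055%
underestimation compensated by the 1.00055271703 multiplier in the
patch below.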

--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..572573a 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -929,13 +929,17 @@ top:
 		 * that the series of delays will approximate a Poisson distribution
 		 * centered on the throttle_delay time.
  *
- * 1000 implies a 6.9 (-log(1/1000)) to 0.0 (log 1.0) delay multiplier.
+		 * 10000 implies a 9.2 (-log(1/10000)) to 0.0 (log 1) delay multiplier,
+		 * and results in a 0.055 % target underestimation bias:
+		 *
+		 * SELECT 1.0/AVG(-LN(i/10000.0)) FROM generate_series(1,10000) AS i;
+		 * = 1.00055271703266335474
 		 *
 		 * If transactions are too slow or a given wait is shorter than
 		 * a transaction, the next transaction will start right away.
 		 */
-		int64 wait = (int64)
-			throttle_delay * -log(getrand(thread, 1, 1000)/1000.0);
+		int64 wait = (int64) (throttle_delay *
+			1.00055271703 * -log(getrand(thread, 1, 10000)/10000.0));
 
 		thread->throttle_trigger += wait;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench progress report improvements - split 1

2013-09-22 Thread Fabien COELHO


Split 1 of the initial submission.

pgbench: minor update of documentation & help message.

Use NUM in help message for homogeneity with other options. The target 
*start* time of the transaction is set by the stochastic process which is 
doing the throttling (--rate), not the end time.


--
Fabien.

diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index ad8e272..06dd709 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -369,7 +369,7 @@ usage(void)
 		   "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
 		   "  -P, --progress=NUM   show thread progress report every NUM seconds\n"
 		   "  -r, --report-latencies   report average latency per command\n"
-		   "  -R, --rate=SPEC  target rate in transactions per second\n"
+		   "  -R, --rate=NUM   target rate in transactions per second\n"
 		   "  -s, --scale=NUM  report this scale factor in output\n"
 		   "  -S, --select-onlyperform SELECT-only transactions\n"
 		   "  -t, --transactions   number of transactions each client runs "
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 49a79b1..5871b45 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -429,7 +429,7 @@ pgbench  options  dbname


 The rate is targeted by starting transactions along a
-Poisson-distributed schedule time line.  The expected finish time
+Poisson-distributed schedule time line.  The expected start time
 schedule moves forward based on when the client first started, not
 when the previous transaction ended.  That approach means that when
 transactions go past their original scheduled end time, it is

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-22 Thread Sawada Masahiko
On Fri, Sep 20, 2013 at 10:33 PM, Samrat Revagade
 wrote:
>
>
>
> On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur 
> wrote:
>>
>>
>>
>>

>>>
>>> >Attached patch combines documentation patch and source-code patch.
>>
>>
>> I have had a stab at reviewing the documentation. Have a look.
>>
>
> Thanks.
> Attached patch implements suggestions in documentation.
> But comments from Fujii-san still needs to be implemented .
> We will implement them soon.
>

I have attached a patch modified based on Fujii-san's suggestions.

If synchronous_transfer is set to 'data_flush', the behaviour of
synchronous_transfer with synchronous_commit is:

(1) synchronous_commit = on
A data flush should wait for the corresponding WAL to be
flushed in the standby

(2) synchronous_commit = remote_write
A data flush should wait for the corresponding WAL to be
written to OS in the standby.

(3) synchronous_commit = local
(4) synchronous_commit = off
A data flush should wait for the corresponding WAL to be
written locally in the master.

Even if the user changes the synchronous_commit value in a transaction,
other processes (e.g. the checkpointer process) can't see it.
In the current patch, each process uses its local synchronous_commit value.

Regards,

---
Sawada Masahiko


synchronous_transfer_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_sleep(interval)

2013-09-22 Thread Fabien COELHO


Hello,


There is no pg_sleep(text) function and the cast is unknown->double
precision.


My mistake.

As I understand it, pg_sleep('12') currently works and would not anymore 
once your patch is applied. That is the concern raised by Robert Haas.
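
Concretely, the compatibility concern looks like this (a sketch; the
behavior with the patch applied is as discussed above, not something I
have tested):

SELECT pg_sleep('12');                   -- works today: the unknown literal
                                         -- is resolved to double precision
-- with pg_sleep(interval) added, the call above becomes ambiguous between
-- the double precision and interval variants, and fails
SELECT pg_sleep(12);                     -- unaffected
SELECT pg_sleep(interval '2 seconds');   -- the new variant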



   ISTM that providing "pg_sleep(TEXT)" cleanly resolves the
   upward-compatibility issue raised.


I don't like this idea at all.  If we don't want to break compatibility
for callers that quote the value, I would rather the patch be rejected
outright.


That was just a suggestion, and I was trying to help. ISTM that if 
Robert's concern is not addressed one way or another, you will just get 
"rejected" on this basis.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_sleep(interval)

2013-09-22 Thread Vik Fearing
On 09/20/2013 01:59 PM, Fabien COELHO wrote:
>
> Here is a review of the pg_sleep(INTERVAL) patch version 1:

Thank you for looking at it.

>
>  - the new function is *not* tested anywhere!
>
>I would suggest simply to replace some pg_sleep(int) instances
>by corresponding pg_sleep(interval) instances in the non
>regression tests.

Attached is a rebased patch that adds a test as you suggest.

>   - some concerns have been raised that it breaks pg_sleep(TEXT)
>which currently works thanks to the implicit TEXT -> INT cast.

There is no pg_sleep(text) function and the cast is unknown->double
precision.

>
>I would suggest to add pg_sleep(TEXT) explicitely, like:
>
>  CREATE FUNCTION pg_sleep(TEXT) RETURNS VOID VOLATILE STRICT AS
>  $$ select pg_sleep($1::INTEGER) $$ LANGUAGE SQL;
>
>That would be another one liner, to update the documentation and
>to add some tests as well!
>
>ISTM that providing "pg_sleep(TEXT)" cleanly resolves the
>upward-compatibility issue raised.
>

I don't like this idea at all.  If we don't want to break compatibility
for callers that quote the value, I would rather the patch be rejected
outright.

-- 
Vik

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 7586,7599  SELECT TIMESTAMP 'now';  -- incorrect for use with DEFAULT
 
  
 
! The following function is available to delay execution of the server
  process:
  
  pg_sleep(seconds)
  
  
  pg_sleep makes the current session's process
! sleep until seconds seconds have
  elapsed.  seconds is a value of type
  double precision, so fractional-second delays can be specified.
  For example:
--- 7586,7601 
 
  
 
! The following functions are available to delay execution of the server
  process:
  
  pg_sleep(seconds)
+ pg_sleep(interval)
  
  
  pg_sleep makes the current session's process
! sleep until seconds seconds (or the specified
! interval) have
  elapsed.  seconds is a value of type
  double precision, so fractional-second delays can be specified.
  For example:
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 3017,3022  DATA(insert OID = 2625 ( pg_ls_dir			PGNSP PGUID 12 1 1000 0 0 f f f f t t v 1 0
--- 3017,3024 
  DESCR("list all files in a directory");
  DATA(insert OID = 2626 ( pg_sleep			PGNSP PGUID 12 1 0 0 0 f f f f t f v 1 0 2278 "701" _null_ _null_ _null_ _null_ pg_sleep _null_ _null_ _null_ ));
  DESCR("sleep for the specified time in seconds");
+ DATA(insert OID = 3179 ( pg_sleep			PGNSP PGUID 14 1 0 0 0 f f f f t f v 1 0 2278 "1186" _null_ _null_ _null_ _null_ "select pg_sleep(extract(epoch from now() + $1) - extract(epoch from now()))" _null_ _null_ _null_ ));
+ DESCR("sleep for the specified interval");
  
  DATA(insert OID = 2971 (  textPGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "16" _null_ _null_ _null_ _null_ booltext _null_ _null_ _null_ ));
  DESCR("convert boolean to text");
*** a/src/test/regress/expected/stats.out
--- b/src/test/regress/expected/stats.out
***
*** 18,24  SET enable_indexscan TO on;
  SET enable_indexonlyscan TO off;
  -- wait to let any prior tests finish dumping out stats;
  -- else our messages might get lost due to contention
! SELECT pg_sleep(2.0);
   pg_sleep 
  --
   
--- 18,24 
  SET enable_indexonlyscan TO off;
  -- wait to let any prior tests finish dumping out stats;
  -- else our messages might get lost due to contention
! SELECT pg_sleep(interval '2 seconds');
   pg_sleep 
  --
   
*** a/src/test/regress/sql/stats.sql
--- b/src/test/regress/sql/stats.sql
***
*** 16,22  SET enable_indexonlyscan TO off;
  
  -- wait to let any prior tests finish dumping out stats;
  -- else our messages might get lost due to contention
! SELECT pg_sleep(2.0);
  
  -- save counters
  CREATE TEMP TABLE prevstats AS
--- 16,22 
  
  -- wait to let any prior tests finish dumping out stats;
  -- else our messages might get lost due to contention
! SELECT pg_sleep(interval '2 seconds');
  
  -- save counters
  CREATE TEMP TABLE prevstats AS

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench progress report improvements

2013-09-22 Thread Fabien COELHO



It is also printed without --rate. There is a "if" above because there is
one report with "lag" (under --rate), and one without.


The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate.  The progress reporting in
threadRun() does have two variations as you describe.


Indeed, I took it for the progress report. I'll check. It must be consistent 
whether under --rate or not.


I checked.

The current status is that stddev is currently only available under 
--progress or --rate, and displayed in the final report if --rate.


I can add stddev in the final report *if* progress was on, but not 
otherwise.


This means that under standard benchmarking (no --rate and without 
progress if it is not enabled by default) stddev cannot be shown.


This is consistent with your requirement that gettimeofday calls must be 
avoided, but results in this necessary inconsistency in reporting results.


The alternative is to always measure, but that would imply always calling 
gettimeofday, and I understood that this is a no-go for you. I think that 
your performance concerns are needless:


I confirm 50,000,000 gettimeofday calls/s on my laptop. On a desktop 
workstation I get 43 million calls/s. The slowest I have found on an old 
server is 3.3 million calls/s, that is 0.3 µs per call.


Even so, this is to be compared to the fastest possible transaction 
latency I got, which is about 140 µs (one local prepared in-memory table 
select on my laptop), so the overhead is 1/450 of the minimum possible 
transaction time, pretty negligible in my opinion. For transaction times 
which involve disk accesses, the latency is in ms and the overhead well 
under 1/1000.


There are some measures about gettimeofday overhead reported in:

  http://www.atl.external.lmco.com/projects/QoS/POSIX_html/index.html

The worst figure is for a Pentium 166 MHz (!), where the overhead is 86.6 µs. 
However, typical values are around or under 1 µs, even for old Pentium II 
350 MHz PCs, and this is less than 1% of our minimum possible transaction 
time.


Note also that under option --report-latencies, the number of gettimeofday 
calls is pretty large, and nobody complained.


So I'm of the opinion that we should not worry at all about the number of 
gettimeofday calls in pgbench.


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-09-22 Thread Andres Freund
Hi,

I don't have time to answer the other emails today (elections,
climbing), but maybe you could clarify the below?

On 2013-09-21 17:07:11 -0700, Peter Geoghegan wrote:
> On Sun, Sep 15, 2013 at 8:23 AM, Andres Freund  wrote:
> >> I'll find it very difficult to accept any implementation that is going
> >> to bloat things even worse than our upsert looping example.
> >
> > How would any even halfway sensible example cause *more* bloat than the
> > upsert looping thing?
> 
> I was away in Chicago over the week, and didn't get to answer this.
> Sorry about that.
> 
> In the average/uncontended case, the subxact example bloats less than
> all alternatives to my design proposed to date (including the "unborn
> heap tuple" idea Robert mentioned in passing to me in person the other
> day, which I think is somewhat similar to a suggestion of Heikki's
> [1]). The average case is very important, because in general
> contention usually doesn't happen.

I can't follow here. Why does e.g. the promise tuple approach bloat more
than the subxact example?
The protocol is roughly:
1) Insert index pointer containing an xid to be waiting upon instead of
   the target tid into all indexes
2) Insert heap tuple, we can be sure there's no conflict now
3) Go through the indexes and repoint the item to point to the tid of the
   heaptuple instead of the xid.

There's zero heap or index bloat in the uncontended case. In the
contended case it's just the promise tuples from 1) that are inserted
before the conflict is detected. Those can be marked as dead when the
conflict happened.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench progress report improvements

2013-09-22 Thread Fabien COELHO


Dear Noah,

Thanks for your answers and remarks,

[...]

I'll split out the parts of the patch where there is no coupling, but I do 
not want to submit conflicting patches.



Those benefits aren't compelling enough to counterbalance the risk of
gettimeofday() overhead affecting results.  (Other opinions welcome.)


Yep. If I really leave gettimeofday out, I cannot measure the stddev, so 
I'll end up with:


 --rate => gettimeofday, mean (& stddev) measured, because they
cannot be derived otherwise.

 no --rate => no (or fewer) gettimeofday calls, mean computed from total time
and no stddev report because it cannot be computed.

That is annoying, because I do want the standard deviation, and this means 
added "if" complexity here and there.


ISTM that when running under a target time, the (hopefully very small) 
overhead is only one gettimeofday call, because the other one is taken 
anyway to check whether the thread should stop.


Or I can add yet another option, say --stddev, to ask for the standard 
deviation, which will imply the additional gettimeofday call...


For a tool like pgbench that requires considerable skill to use 
effectively, changing a default only helps slightly.  It doesn't take 
much of a risk to make us better off leaving the default unchanged.


I can put a 0 default... but even very experienced people will be bitten 
over and over. Why should I care? ISTM that the default should be the
safe option, and experienced users can request "-quiet" if they want it.


[...]
I tried to preserve the row-counting behavior because I thought that
someone would object otherwise, but I would be really in favor of
dropping the row-counting report behavior altogether and only keep the
time triggered report.


I would be fine with dropping the row-counting behavior.  But why subject this
distant side issue to its third round of bikeshedding in 18 months?


I was not involved:-)

The every-100,000-rows behavior is the initial & old version, and only applies
to initialization. Someone found it too verbose when scaling, and I agree, so I 
made a quick patch which preserves the old behavior (someone must have 
said: whatever, do not change the default!) but allows switching to a 
less noisy version, hence the -quiet which is not really quiet. This would 
be the defective result of a compromise :-)


If I follow your request not to change any default, I cannot cleanly merge 
the -i & bench behaviors, as currently -i does have a default progress 
report and its own -quiet, and benchmarking does not.


The current behavior is inconsistent. I would prefer something consistent: 
preferably always show a progress report, with -quiet hiding it (fully); or, 
if you really insist, no progress report by default, with --progress showing 
it and the -quiet option removed.



 - Take thread start time at the beginning of the thread.


That theory is sound, but I would like at least one report reproducing that
behavior and finding that this change improved it.


[...] so I'm inclined to leave it alone.


I really spent *hours* debugging spurious measurements on the previous
round of pgbench changes, when adding the throttling stuff. Having the
start time taken when the thread really starts is just sanity, and I
needed that just to rule out that it was the source of the "strange"
measurements.


-j 800 vs -j 100: ISTM that if you create more threads, the time delay
incurred is cumulative, so the strangeness of the result should worsen.
800 threads ~ possibly 10 seconds of creation time, to be compared to a
few seconds of run time.
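
To make the point concrete, here is a minimal sketch (assumed names, not
pgbench's actual code) of the change being argued for: record the
per-thread start time inside the thread function itself rather than in
the main thread before pthread_create(), so that accumulated thread
creation time is not counted as run time.

#include <pthread.h>
#include <sys/time.h>

typedef struct
{
	struct timeval start;	/* when the thread really started */
} ThreadState;

static void *
thread_main(void *arg)
{
	ThreadState *ts = (ThreadState *) arg;

	/* taken here, in the new thread, not before pthread_create() */
	gettimeofday(&ts->start, NULL);

	/* ... run transactions, measuring against ts->start ... */
	return NULL;
}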



 Shouldn't it just be:

int64 wait = (int64) (throttle_delay *
                      -log(1 - pg_erand48(thread->random_state)));

[...]

Ah; that makes sense.  Then I understand why you want to remove the bias, but
why do you also increase the upper bound?


Because the bias was significantly larger for 1000 (about 0.5%), so this
also contributed to reducing said bias, and 9.2 times the average target
seems as reasonable a bound as 6.9.
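
For context, a minimal sketch of the sampling under discussion (assumed
names; "uniform" stands in for pg_erand48(thread->random_state)):
inverse-transform sampling of an exponential distribution spaces the
transactions as a Poisson process, and capping the multiplier at 9.2
(i.e. truncating draws above 1 - exp(-9.2), about 1e-4 of them) bounds
the maximum wait at 9.2 times the average, at the cost of a tiny bias.

#include <math.h>
#include <stdint.h>

static int64_t
throttle_wait(double throttle_delay, double uniform)
{
	/* uniform is in [0, 1), so 1 - uniform is in (0, 1] and log() is safe */
	double	mult = -log(1.0 - uniform);

	if (mult > 9.2)				/* truncate the distribution's long tail */
		mult = 9.2;

	return (int64_t) (throttle_delay * mult);
}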




It is also printed without --rate. There is an "if" above because there is
one report with "lag" (under --rate), and one without.


The code I quoted is for the final report in printResults(), and that only
shows latency mean/stddev when using --rate.  The progress reporting in
threadRun() does have two variations as you describe.


Indeed, I mistook it for the progress report. I'll check. It must be
consistent whether under --rate or not.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory

2013-09-22 Thread Amit Kapila
On Fri, Sep 20, 2013 at 5:14 PM, Andres Freund  wrote:
> Hi,
>
>
> On 2013-09-19 11:44:34 -0400, Robert Haas wrote:
>> On Wed, Sep 18, 2013 at 1:42 PM, Andres Freund  
>> wrote:
>
>> >> --- /dev/null
>> >> +++ b/src/backend/storage/ipc/dsm.c
>> >> +#define PG_DYNSHMEM_STATE_FILE   PG_DYNSHMEM_DIR "/state"
>> >> +#define PG_DYNSHMEM_NEW_STATE_FILE   PG_DYNSHMEM_DIR "/state.new"
>> >
>> > Hm, I guess you don't want to add it to global/ or so because of the
>> > mmap implementation, where you presumably scan the directory?
>>
>> Yes, and also because I thought this way would make it easier to teach
>> things like pg_basebackup (or anybody's home-brew scripts) to just
>> skip that directory completely.  Actually, I was wondering if we ought
>> to have a directory under pgdata whose explicit charter it was to
>> contain files that shouldn't be copied as part of a base backup.
>> pg_do_not_back_this_up.
>
> Wondered exactly about that as soon as you've mentioned
> pg_basebackup. pg_local/?
>
>> >> + /* Determine size for new control segment. */
>> >> + maxitems = PG_DYNSHMEM_FIXED_SLOTS
>> >> + + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;
>> >
>> > It seems likely that MaxConnections would be sufficient?
>>
>> I think we could argue about the best way to set this until the cows
>> come home, but I don't think it probably matters much at this point.
>> We can always change the formula later as we gain experience.
>> However, I don't have a principled reason for assuming that only
>> user-connected backends will create dynamic shared memory segments.
>
> Hm, yes. I had MaxBackends down as MaxConnections + autovacuum stuff;
> but max_worker_processes is in there now, so you're right that doesn't
> make sense.
>
>> >> +/*
>> >> + * Read and parse the state file.
>> >> + *
>> >
>> > Perhaps CRC32 the content?
>>
>> I don't see the point.  If the file contents are garbage that happens
>> to look like a number, we'll go "oh, there isn't any such segment" or
>> "oh, there is such a segment but it doesn't look like a control
>> segment, so forget it".  There are a lot of things we really ought to
>> be CRCing to avoid corruption risk, but I can't see how this is
>> remotely one of them.
>
> I was worried about a partially written file, or one containing contents
> from two different postmaster cycles, but it's actually far too small for
> that...
> I initially had thought you'd write the contents of the entire shared
> control segment there, not just its id.
>
>> >> + /* Create or truncate the file. */
>> >> + statefd = open(PG_DYNSHMEM_NEW_STATE_FILE, O_RDWR|O_CREAT|O_TRUNC, 0600);
>> >
>> > Doesn't this need a | PG_BINARY?
>>
>> It's a text file.  Do we need PG_BINARY anyway?
>
> I'd say yes. Non-binary-mode I/O on Windows does things like
> transforming LF <=> CRLF on reading/writing, which makes sizes not match
> up, and similar ugliness.
> Imo there's little reason to use non-binary mode for anything written
> for postgres' own consumption.
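
(For illustration only, not the patch itself: a minimal sketch of the
binary-mode open Andres is suggesting, reusing the
PG_DYNSHMEM_NEW_STATE_FILE name from the patch excerpt above. PG_BINARY
is PostgreSQL's portability macro, defined in c.h as O_BINARY on Windows
and 0 elsewhere, so adding it is a no-op on Unix.)

/* Create or truncate the file, in binary mode on all platforms. */
statefd = open(PG_DYNSHMEM_NEW_STATE_FILE,
               O_RDWR | O_CREAT | O_TRUNC | PG_BINARY, 0600);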

On checking this in the code, I found the comment below, which suggests
LF <=> CRLF is not an issue (on Windows, pgwin32_open is used to open
files):

/*
 * NOTE:  this is also used for opening text files.
 * WIN32 treats Control-Z as EOF in files opened in text mode.
 * Therefore, we open files in binary mode on Win32 so we can read
 * literal control-Z. The other affect is that we see CRLF, but
 * that is OK because we can already handle those cleanly.
 */

As a second instance, I noticed the code below, which again suggests CRLF
should not be an issue unless the file mode is specifically set to TEXT
mode, which is not the case with the current usage of open() in the
dynamic shared memory code.

#ifdef WIN32
/* use CRLF line endings on Windows */
_setmode(_fileno(fh), _O_TEXT);
#endif

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Looking for information on our elephant

2013-09-22 Thread Tatsuo Ishii
Oleg,

Unfortunately the archives seem to be missing the attached files. I would
love to see the attachments because they are logo images.  Any ideas?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Tatsuo,
> 
> I have emails even from 1995!  You can see that historical message here:
> http://www.pgsql.ru/db/mw/msg.html?mid=1238939
> 
> Re: [HACKERS] PostgreSQL logo.
> *Author:* yang( at )sjuphil( dot )sju( dot )edu
> *Date:* 1997-04-03 20:36:33
> *Subject:* Re: [HACKERS] PostgreSQL logo.
> 
> Some other ideas:
> 
> derivative: a sword (derivative of the Dragon book cover -- Postgres as a 
> tool)
> illustrative: a bowl of Alphabet Soup, with letters spelling out POSTGRESQL
> obscure: a revolver/hit man (Grosse Pt is an anagram of Postgres, and an
> abbreviation of the title of the new John Cusack movie)
> 
> but if you want an animal-based logo, how about some sort of elephant?
> After all, as the Agatha Christie title read, elephants can remember ...
> 
> David yangy...@sju.edu
> 
> 
> 
> Oleg
> 
> 
> On Fri, Sep 20, 2013 at 3:16 AM, Tatsuo Ishii  wrote:
> 
>> Hi,
>>
>> I'm Looking for information on our elephant: especially how/why we
>> chose elephant as our mascot.
>>
>> According to:
>> http://wiki.postgresql.org/wiki/Logo
>>
>> The discussion was back to 1997 on the hackers list. Unfortunately the
>> official archive for 1997 was lost. Mine seems to be gone too. Any
>> information will be appreciated.
>> --
>> Tatsuo Ishii
>> SRA OSS, Inc. Japan
>> English: http://www.sraoss.co.jp/index_en.php
>> Japanese: http://www.sraoss.co.jp
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers