Re: [HACKERS] Autonomous transaction

2010-04-06 Thread pg
 It would be useful to have a kind of relation for which all dirtied buffers got 
written out even for failed transactions (barring a crash), and for which 
reading any undeleted row were easy, despite the non-ACIDity. The overhead of a 
side transaction seems like overkill for such things as logs or advisory 
relations, and non-DB files would be harder to tie efficiently to DB activity. 
A side transaction would still have to be committed in order to be useful; 
either you're committing frequently (ouch!), or you risk failing to commit just 
as you would the main transaction.
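
To be concrete about what a side transaction buys you, here is a minimal sketch 
at the libpq level (connection string and table names are placeholders); the 
dblink-based tools do essentially the same thing from inside the server. The 
log insert goes through its own connection, so it commits independently of the 
main transaction:

#include <libpq-fe.h>

int
main(void)
{
    PGconn *main_conn = PQconnectdb("dbname=app");
    PGconn *log_conn  = PQconnectdb("dbname=app");

    if (PQstatus(main_conn) != CONNECTION_OK || PQstatus(log_conn) != CONNECTION_OK)
    {
        PQfinish(main_conn);
        PQfinish(log_conn);
        return 1;
    }

    PQclear(PQexec(main_conn, "BEGIN"));
    PQclear(PQexec(main_conn, "UPDATE accounts SET balance = balance - 10 WHERE id = 1"));

    /* "autonomous" log entry: committed immediately on its own connection */
    PQclear(PQexec(log_conn, "INSERT INTO debug_log(msg) VALUES ('about to roll back')"));

    PQclear(PQexec(main_conn, "ROLLBACK"));    /* the UPDATE is undone; the log row stays */

    PQfinish(main_conn);
    PQfinish(log_conn);
    return 0;
}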

David Hudson

-----Original Message-----
From: Loïc Vaumerel [mailto:she...@gmail.com]
Sent: Sunday, April 4, 2010 10:26 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Autonomous transaction

Hi,



I have an application project based on a database.
I am really interested in using PostgreSQL.


I have only one issue: I want to use autonomous transactions to implement a 
debug / logging facility.
To do so, I insert messages into a debug table.
The problem is that if the main transaction / process rolls back, my debug 
message insert will be rolled back too.
This is not the behavior I want.


I need functionality with the same behavior as Oracle's PRAGMA 
AUTONOMOUS_TRANSACTION.
I have searched the documentation and the net for it, but unfortunately found 
nothing. (Maybe I missed something.)


I just found some posts regarding this:
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00893.php
https://labs.omniti.com/trac/pgtreats/browser/trunk/autonomous_logging_tool
... and some others ...


All the solutions I found work the same way: they use dblink.
I consider these more of a workaround than a clean solution.
I am a little concerned about side effects, as dblink was not initially 
designed for this.


So my questions:
Is there a way to use real, clean autonomous transactions in PostgreSQL yet?
If not, is it planned? When?


Thanks in advance


Best regards


Shefla



Re: [HACKERS] Bloom filters bloom filters bloom filters

2010-01-20 Thread pg
  Then your union operation is to just bitwise-OR the two bloom filters.

Keep in mind that when performing this sort of union between two 
comparably-sized sets, your false-positive rate will increase by about an order 
of magnitude. You need to size your bloom filters accordingly, or perform the 
union differently. Intersections, however, behave well.
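
A back-of-the-envelope check, with illustrative numbers of my own (10 bits per 
element, k = 7), using the usual approximation p = (1 - e^(-kn/m))^k:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double  m = 10000.0;        /* bits in each filter */
    double  n = 1000.0;         /* elements inserted into each filter */
    double  k = 7.0;            /* hash functions */

    double  p_single = pow(1.0 - exp(-k * n / m), k);
    double  p_union  = pow(1.0 - exp(-k * (2.0 * n) / m), k);  /* OR of the two */

    /* prints roughly 0.0082 and 0.1378 */
    printf("single filter: %.4f   ORed union: %.4f\n", p_single, p_union);
    return 0;
}

ORing two such filters behaves like one filter holding 2n elements: roughly 
0.8% becomes roughly 14%, about a 17x jump.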

There is a similar problem, among others, with expanding smaller filters to 
match larger ones.

David Hudson





Re: [HACKERS] Syntax for partitioning

2009-10-30 Thread pg
  PARTITION BY RANGE ( a_expr )
 ...
 PARTITION BY HASH ( a_expr )
 PARTITIONS num_partitions;

 Unless someone comes up with a maintenance plan for stable hash functions, we 
 should probably not dare look into this yet.

What would cover the common use case of per-day quals and drops over an 
extended history period, say six or nine months? You generally don't get quite 
the same locality of reference with an unpartitioned table, due to slop in the 
arrival of rows. Ideally, you don't want to depend on an administrator, or even 
an administrative script, continually intervening in the structure of a table, 
as would be the case with partitioning by range, and you don't want to coalesce 
multiple dates, as an arbitrary hash might do. What the administrator would 
want is to decide which rows are too old to keep, then process them (e.g. 
archive, summarize, filter) and delete them.

Suppose that the number of partitions were taken as a hint rather than as a 
naming modulus, and that any quasi-hash function had to be specified explicitly 
(although storage assignment could be based on a hash of the quasi-hash 
output). If a_expr were allowed to include a to-date conversion of a timestamp, 
day-by-day partitioning would fall out naturally. If, in addition, 
single-parameter (?) functions were characterized as range-preserving and 
order-preserving, plan generation could be improved for time ranges on 
quasi-hash-partitioned tables, without a formal indexing requirement.
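
To make the quasi-hash idea concrete, a sketch in C (names and constants are 
mine and purely illustrative, not proposed syntax or catalog structure):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NOMINAL_PARTITIONS 32          /* a hint, not a naming modulus */

/* user-specified quasi-hash: day number since the epoch; range- and
 * order-preserving, so time-range quals map to contiguous day values */
static int64_t
quasi_hash_day(time_t ts)
{
    return (int64_t) (ts / 86400);
}

/* storage assignment: an ordinary hash of the quasi-hash output */
static uint32_t
storage_bucket(int64_t day)
{
    uint64_t    h = (uint64_t) day * UINT64_C(0x9E3779B97F4A7C15);

    return (uint32_t) (h >> 32) % NOMINAL_PARTITIONS;
}

int
main(void)
{
    time_t  now = time(NULL);

    for (int d = 0; d < 3; d++)
    {
        int64_t day = quasi_hash_day(now + (time_t) d * 86400);

        printf("day %lld -> bucket %u\n", (long long) day, storage_bucket(day));
    }
    return 0;
}

Because the quasi-hash is monotone in the timestamp, a time-range qual 
corresponds to a contiguous range of day values, even though the storage 
buckets those days land in are scattered.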

There are cases where additional partition dimensions would be useful, for 
eventual parallelized operation on large databases, and randomizing quasi-hash 
functions would help. IMHO stability is not needed, except to the extent that 
hash functions have properties that lend themselves to plan generation and/or 
table maintenance.

It is not clear to me what purpose there would be in dropping a partition. This 
would be tantamount to deleting all of the rows in a partition, if it were 
analogous to dropping a table, and would require some sort of compensatory 
aggregation of existing partitions (in effect, a second partitioning 
dimension), if it were merely structural.

Perhaps I'm missing something here.

David Hudson





Re: [HACKERS] Unicode Normalization

2009-09-24 Thread pg
In a context using normalization, wouldn't you typically want to store a 
normalized-text type that could perhaps (depending on locale) take advantage of 
simpler, more-efficient comparison functions? Whether you're doing 
INSERT/UPDATE, or importing a flat text file, if you canonicalize characters 
and substrings of identical meaning when trivial distinctions of encoding are 
irrelevant, you're better off later. User-invocable normalization functions by 
themselves don't make much sense. (If Postgres now supports binary- or 
mixed-binary-and-text flat files, perhaps for restore purposes, the same thing 
applies.)
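
As a small illustration of the payoff (ICU is used here purely for 
demonstration, not as a suggestion that Postgres depend on it): precomposed and 
decomposed spellings of the same character differ bytewise, but compare equal 
once canonicalized, so a normalized-text type could fall back on plain ordinal 
comparison.

#include <stdio.h>
#include <unicode/unorm2.h>
#include <unicode/ustring.h>

int
main(void)
{
    UErrorCode  err = U_ZERO_ERROR;
    const UNormalizer2 *nfc = unorm2_getNFCInstance(&err);

    /* "e acute" two ways: precomposed U+00E9 vs. "e" + combining acute U+0301 */
    UChar       a[] = {0x00E9, 0};
    UChar       b[] = {0x0065, 0x0301, 0};
    UChar       na[8], nb[8];

    unorm2_normalize(nfc, a, -1, na, 8, &err);
    unorm2_normalize(nfc, b, -1, nb, 8, &err);
    if (U_FAILURE(err))
        return 1;

    /* raw: unequal; after canonicalization: equal */
    printf("raw equal: %d, normalized equal: %d\n",
           u_strcmp(a, b) == 0, u_strcmp(na, nb) == 0);
    return 0;
}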

David Hudson




Re: [HACKERS] [PATCH v4] Avoid manual shift-and-test logic in AllocSetFreeIndex

2009-07-21 Thread pg
 Normally I'd try a small lookup table (1-byte index to 1-byte value) in this 
case. But if the bitscan instruction were even close in performance, it'd be 
preferable, due to its more-reliable caching behavior; it should be possible to 
capture this at code-configuration time (aligned so as to produce an optimal 
result for each test case; see below).
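
For concreteness, a minimal sketch of the two alternatives (made-up helper 
names, not the actual AllocSetFreeIndex code): a 256-entry byte table versus a 
bitscan builtin, both computing ceil(log2(size)) for small sizes.

#include <stdio.h>

static unsigned char bit_length_tab[256];   /* bit_length_tab[i] = bits needed for i */

static void
init_table(void)
{
    for (int i = 1; i < 256; i++)
        bit_length_tab[i] = bit_length_tab[i >> 1] + 1;
}

/* 1-byte-index-to-1-byte-value lookup table; valid for 1 <= size <= 256 */
static int
ceil_log2_table(unsigned size)
{
    return bit_length_tab[size - 1];
}

/* same result via a bit-scan instruction (gcc/clang builtin) */
static int
ceil_log2_bitscan(unsigned size)
{
    return size <= 1 ? 0 : 32 - __builtin_clz(size - 1);
}

int
main(void)
{
    init_table();
    for (unsigned size = 1; size <= 256; size++)
        if (ceil_log2_table(size) != ceil_log2_bitscan(size))
            printf("disagree at %u\n", size);
    return 0;                   /* expected: no output */
}

The table risks a data-cache miss per call; the bitscan has no data-cache 
footprint, which is the caching argument above.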

The specific code for large-versus-small testing would be useful; did I 
overlook it?

Note that instruction alignment with respect to words is not the only potential 
instruction-alignment issue. In the past, when optimizing code to an extreme, 
I've run into cache-line issues where a small change that should've produced a 
small improvement resulted in a largish performance loss, without further work. 
Lookup tables can have an analogous issue; this could, in a simplistic test, 
explain an anomalous large-better-than-small result, if part of the large 
lookup table remains cached. (Do any modern CPUs attempt to address this??) 
This is difficult to tune in a multiplatform code base, so the numbers in a 
particular benchmark do not tell the whole tale; you'd need to make a judgment 
call, and perhaps to allow a code-configuration override.

David Hudson





Re: [HACKERS] Improving the ngettext() patch

2009-06-05 Thread pg
(Grrr, declension, not declination.)

 Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && 
 n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n

Thanks. The above (ignoring backslash-EOL) is the form recommended for Russian 
(inter alia(s)) in the Texinfo manual for gettext (info gettext). FWIW this 
might be an alternative:

Plural-Forms: nplurals=3; plural=((n - 1) % 10) >= (5-1) || (((n - 1) % 100) 
<= (14-1) && ((n - 1) % 100) >= (11 - 1)) ? 2 : ((n - 1) % 10) == (1 - 1) ? 0 : 
1;\n
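
A quick brute-force check (mine, not from the thread) that the two expressions 
pick the same form; n is taken as unsigned, as gettext does when evaluating 
plural expressions, which also keeps n = 0 in the third form:

#include <stdio.h>

/* standard Russian rule from the gettext manual */
static int
std_form(unsigned long n)
{
    return n % 10 == 1 && n % 100 != 11 ? 0
        : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
        : 2;
}

/* the (n - 1)-based variant suggested above */
static int
alt_form(unsigned long n)
{
    return (n - 1) % 10 >= (5 - 1)
        || ((n - 1) % 100 <= (14 - 1) && (n - 1) % 100 >= (11 - 1)) ? 2
        : (n - 1) % 10 == (1 - 1) ? 0
        : 1;
}

int
main(void)
{
    for (unsigned long n = 0; n <= 1000; n++)
        if (std_form(n) != alt_form(n))
            printf("mismatch at %lu: %d vs %d\n", n, std_form(n), alt_form(n));
    return 0;                   /* expected: no output */
}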

David Hudson




Re: [HACKERS] Improving the ngettext() patch

2009-06-04 Thread pg
 Russian plural forms for 100, 101, 102, etc. are different, as they are for 0, 1, 2.

True. The rule IIRC is that except for 11-14 and for collective numerals, 
declination follows the last digit.

It would be possible to generalize declination via a language-specific 
message-selector function, especially if the number of numerical complements 
were limited to 1.

How awkward would it be to re-word the style of messages to avoid declination? 
For example, the Russian equivalent of "X rows" could be something like 
"#rows -- X".

David Hudson




Re: realloc overhead (was [HACKERS] Multiple sorts in a query)

2009-05-20 Thread pg
 So at least transiently we use 3x the size of the actual array.
 I was conjecturing, prior to investigation. Are you saying you know 
 this/have seen this already?
 Well, I'm just saying that if you realloc an x-kilobyte block into a 2x block and 
 the allocator can't expand it and has to copy, then it seems inevitable.

FYI, the malloc()/realloc()/free() implementation on FC4 causes memory 
fragmentation, and thus long-term growth in process memory, under some 
circumstances. This, together with the power-of-two allocations in aset.c not 
accounting for malloc() overhead (not that they could), implies that memory 
contexts can cause fragmentation too, albeit more slowly.

Reallocations of smallish blocks from memory contexts tend to use memory 
already withheld from the OS; a transient increase in memory usage is possible, 
but unlikely to matter. Perhaps something should be done about larger blocks.
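
For reference, the transient peak mentioned upthread in toy form (not 
PostgreSQL code): while realloc() copies an x-byte block into a fresh 2x 
block, both blocks are live at once, so the process briefly holds roughly 3x 
for that one array.

#include <stdlib.h>

int
main(void)
{
    size_t  cap = 1024;
    char   *buf = malloc(cap);

    if (buf == NULL)
        return 1;

    for (int i = 0; i < 10; i++)
    {
        char   *newbuf = realloc(buf, cap * 2); /* may have to copy: old and new
                                                 * blocks coexist until realloc
                                                 * returns */

        if (newbuf == NULL)
        {
            free(buf);
            return 1;
        }
        buf = newbuf;
        cap *= 2;
    }
    free(buf);
    return 0;
}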

David Hudson