Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-14 Thread Radosław Smogura

On Thu, 14 Jul 2011 15:15:33 +0300, Peter Eisentraut wrote:

On sön, 2011-07-10 at 11:40 -0700, Josh Berkus wrote:

Hackers,

 B. 6. Current behaviour _is intended_ (there is an if to check
node type) and _natural_. In this particular case the user asks for the text
content of some node, and this content is actually <.


 I don't buy that. The check for the node type is there because
 two different libxml functions are used to convert nodes to
 strings. The if has absolutely *zero* to do with escaping, except
 for that missing escape_xml() call in the else case.

 Secondly, there is little point in having a type XML if we
 don't actually ensure that values of that type can only contain
 well-formed XML.

Can anyone else weigh in on this? Peter?


Looks like a good change to me.
I'll bump it in a few hours, as I can't recall the password for my keyring.
Now my hands are clean and it's not my business to care about this.


Best regards.
Radek.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-13 Thread Radosław Smogura

On Tue, 12 Jul 2011 11:11:40 -0700, Josh Berkus wrote:

Radoslaw,

For me this discussion is over. I have put forward my objections and
suggestions. The full review is available in the archives, and the reasoning
for not escaping is given in the review of your 2nd patch, about scalar
values.


Did you install and test the functionality of the patch?  I can't tell
from your review whether you got that far.


Yes, the patch was tested and applied. I think in both reviews I noted
that the patch does what Florian said. The examples of double escaping were
taken from one of the patches (I think from the one about scalar values).


If I should, I will bump these patches up.

Regards,
Radosław Smogura

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-12 Thread Radosław Smogura

On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:

On Jul 10, 2011, at 1:40 PM, Josh Berkus j...@agliodbs.com wrote:


Hackers,

B. 6. Current behaviour _is intended_ (there is an if to check
node type) and _natural_. In this particular case the user asks for
the text content of some node, and this content is actually <.


I don't buy that. The check for the node type is there because
two different libxml functions are used to convert nodes to
strings. The if has absolutely *zero* to do with escaping, except
for that missing escape_xml() call in the else case.

Secondly, there is little point in having a type XML if we
don't actually ensure that values of that type can only contain
well-formed XML.


Can anyone else weigh in on this? Peter?


Unless I am missing something, Florian is clearly correct here.

...Robert
For me he is not, because this should be fixed internally by making the xml
type safe; currently the xml type may be used to keep both proper XML and any
other kind of data.


If I ask, by any means, select xpath('//text()', ...), I want to get
text.
1) How should I de-escape the node in the client application (if it's part of
an xml document I don't have the header)? Bear in mind XML must support
streaming processing too.
2) Why should I treat text() differently than a select from a varchar? In
both I ask for xml; the driver can't handle this, because it doesn't know if
it gets a scalar, text, comment, element, or maybe a document.
3) What about current applications? Folks probably use this and are happy
they get text, and will not see that the next release of PostgreSQL will
break their applications.


There is of course a disadvantage of the current behaviour, as it may lead
to inserting badly formed XML (in one case), but I created an example where
auto-escaping will create double-escaped XML, and may lead to inserting
improper data (this is about the 2nd patch, where Florian added escaping,
too).


SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM
(SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
(XMLELEMENT(name root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t')))
v(x)) as foo;

        xmlelement
--------------------------
 <root sth="&amp;lt;n"/>

It can't be resolved without storing the type in xml, or adding xmltext, or
adding a pseudo xmlany element, which would be returned by xpath.


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-12 Thread Radosław Smogura

On Tue, 12 Jul 2011 11:45:59 +0200, Florian Pflug wrote:

On Jul12, 2011, at 11:00 , Radosław Smogura wrote:

On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:

Unless I am missing something, Florian is clearly correct here.
For me he is not, because this should be fixed internally by making the xml
type safe


Huh?? Making the xml type safe is *exactly* what I'm trying to do
here...


currently the xml type may be used to keep both proper XML and any other
kind of data.


As I pointed out before, that simply isn't true. Try storing
non-well-formed data into an XML column (there *are* ways to do
that, i.e. there are bugs, one of which I'm trying to fix here!)
and then dump and (try to) reload your database. Ka-boom!

If I ask, by any means, select xpath('//text()', ...), I want to get
text.


And I want '3' || '4' to return the integer 34. Tough luck! The fact that
XPATH() is declared to return XML, *not* TEXT, means you don't get what you
want. Period. Feel free to provide a patch that adds a function XPATH_TEXT
if you feel this is an issue.

XML *is* *not* simply an alias for TEXT! It's a distinct type, with its
own distinct rules about what constitutes a valid value and what doesn't.


1) How should I de-escape the node in the client application (if it's part
of an xml document I don't have the header)? Bear in mind XML must support
streaming processing too.


Huh?

2) Why should I treat text() differently than a select from a varchar? In
both I ask for xml; the driver can't handle this, because it doesn't know if
it gets a scalar, text, comment, element, or maybe a document.


3) What about current applications? Folks probably use this and are happy
they get text, and will not see that the next release of PostgreSQL will
break their applications.


That, and *only* that, I recognize as a valid concern. However, and *again*
as I have pointed out multiple times before, backwards compatibility
is no excuse not to fix bugs. Plus, there might just as well be applications
which feed the contents of XML columns directly into an XML parser (as they
have every right to!) and don't expect that parser to throw an error. Which,
as it stands, we cannot guarantee. Having to deal with an error there is akin
to having to deal with integer columns containing 'foobar'!


Bugs must be resolved in a smart way, especially if they change behaviour,
with consideration of the impact the change will produce; removing support
for xml would resolve this bug as well. I've said the problem should be
resolved in a different way.


There is of course a disadvantage of the current behaviour, as it may lead
to inserting badly formed XML (in one case), but I created an example where
auto-escaping will create double-escaped XML, and may lead to inserting
improper data (this is about the 2nd patch, where Florian added escaping,
too).


SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM
(SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
(XMLELEMENT(name root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t')))
v(x)) as foo;

        xmlelement
--------------------------
 <root sth="&amp;lt;n"/>


Radosław, you've raised that point before, and I refuted it. The crucial
difference is that double-escaped values are well-formed, whereas un-escaped
ones are not.

Again, as I said before, the double-escaping done by XMLATTRIBUTES there is
not pretty. But it's *not* XPATH()'s fault! To see that, simply replace your
XPATH() expression with '&lt;n'::xml.

And in fact

It can't be resolved without storing the type in xml, or adding xmltext,
or adding a pseudo xmlany element, which would be returned by xpath.


Huh?

Frankly, Radosław, I get the feeling that you're not trying to understand
my answers to your objections, but instead keep repeating the same assertions
over and over again. Even though at least some of them, like XML being able
to store arbitrary values, are simply wrong! And I'm getting pretty tired of
this...
So far, you also don't seem to have taken a single look at the actual
implementation of the patch, even though code review is supposed to be an
integral part of the patch review process. I therefore don't believe that
we're getting anywhere here.
So far, you don't know whether I have taken a single look; your suspicions
are wrong, and you try to blame me. All of your sentences about not
understanding I may send back to you, and blame you with your own words.



So please either start reviewing the actual implementation, and leave
the considerations about whether we want this or not to the eventual
committer.
Or, if you don't want to do that for one reason or another, please consider
letting somebody else take over this review, i.e. consider removing your
name from the Reviewer field.


If I do a review I may put in my comments, but "I get the feeling that
you're not trying to understand my answers to your objections, but instead
keep repeating the same assertions over and over again." - and in the patch
review there is a review of the code.
So please either

Re: [HACKERS] Crash dumps

2011-07-07 Thread Radosław Smogura
Craig Ringer cr...@postnewspapers.com.au Thursday 07 of July 2011 01:05:48
 On 6/07/2011 11:00 PM, Radosław Smogura wrote:
I think IPC to fast shut down all backends and wait for report
processing is quite enough.
 
 How do you propose to make that reliable, though?
 
 --
 Craig Ringer
 
 POST Newspapers
 276 Onslow Rd, Shenton Park
 Ph: 08 9381 3088 Fax: 08 9388 2258
 ABN: 50 008 917 717
 http://www.postnewspapers.com.au/

I want to add an IPC layer to PostgreSQL; a few approaches may be considered:
1. System V IPC
2. Urgent data on a socket
3. Sockets (at least descriptors) + threads
4. Not portable: fork in the segfault handler (I think the forked process
should start in the segfault handler too).

I actually think option 3, sockets (on Linux, pipes) + threads, will be the
best and most portable. For each backend PostgreSQL will open two channels,
urgent and normal; for each channel a thread will be spawned which will just
wait for data; a backend will not start if it hasn't connected to the
postmaster. Some security must be accounted for when opening a plain socket.

In the context of a crash, the segfault handler sends information on the
urgent channel, and the postmaster kills all backends except the sender,
giving it time to work inside the segfault handler.

Normal channels may be used for scheduling some async operations, like
reading the next n blocks when a sequence scan has started.
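
For illustration, here is a minimal, hypothetical C sketch of such an urgent
channel (not PostgreSQL code; the one-byte protocol and all names are
invented): the parent plays the postmaster, the child a crashing backend,
and the SIGSEGV handler uses only write(), which is async-signal-safe.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int urgent_fd = -1;          /* write end, prepared before any crash */

static void segv_handler(int signo)
{
    char msg = 'C';                 /* "I am crashing" */

    (void) write(urgent_fd, &msg, 1);
    /* restore the default action and re-raise, so a core is still made */
    signal(signo, SIG_DFL);
    raise(signo);
}

int main(void)
{
    int   fds[2];
    pid_t pid;
    char  msg;

    if (pipe(fds) != 0)
    {
        perror("pipe");
        return 1;
    }

    pid = fork();
    if (pid == 0)
    {
        /* "backend": install the handler, then crash */
        struct sigaction sa;

        close(fds[0]);
        urgent_fd = fds[1];
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = segv_handler;
        sigaction(SIGSEGV, &sa, NULL);
        raise(SIGSEGV);
        return 0;                   /* not reached */
    }

    /* "postmaster": wait for the urgent byte */
    close(fds[1]);
    if (read(fds[0], &msg, 1) == 1 && msg == 'C')
        fprintf(stderr, "backend %d reported a crash; other backends would "
                "be stopped now\n", (int) pid);
    return 0;
}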

By the way, getting reports on segfault isn't something unusual; your
favorite software, the Java(tm) Virtual Machine, does it.

Regards,
Radosław Smogura




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash dumps

2011-07-06 Thread Radosław Smogura

On Wed, 06 Jul 2011 07:59:12 +0800, Craig Ringer wrote:

On 5/07/2011 9:05 PM, Magnus Hagander wrote:
On Tue, Jul 5, 2011 at 15:02, Robert Haas robertmh...@gmail.com wrote:

On Mon, Jul 4, 2011 at 12:47 PM, Radosław Smogura
rsmog...@softperience.eu wrote:
I asked about crash reports because at that time there was a thread
about crashing in a live system.


Yeah, I thought this was the result of that effort:


[snip]


That crash dump is basically the windows equivalent of a coredump,
though. Just a different name...


Yup, it's a cut-down core dump. In this case generated in-process by
the crashing backend.

It'd be nice to be able to generate the crash dump from
out-of-process. Unfortunately, the automatic crash dump generation
system on Windows doesn't appear to be available to system services
running non-interactively. Not that I could see, anyway. As a result
we had to trap the crashes within the crashing process and generate
the dump from there. As previously stated, doing anything within a
segfaulting process is unreliable, so it's not the best approach in
the world.

All I was saying in this thread is that it'd be nice to have a way
for a crashing backend to request that another process capture
diagnostic information from it before it exits with a fault, so it
doesn't have to try to dump itself.

As Tom said, though, anything like that is more likely to decrease
the reliability of the overall system. You don't want a dead backend
hanging around forever waiting for the postmaster to act on it, and
you *really* don't want other backends still alive and potentially
writing from shm that's in who-knows-what state while the
postmaster is busy fiddling with a crashed backend.

So, overall, I think "dump a simple core and die as quickly as
possible" is the best option. That's how it already works on UNIX, and
all the win32 crash dump patches do is make it work on Windows too.

--
Craig Ringer

POST Newspapers
276 Onslow Rd, Shenton Park
Ph: 08 9381 3088 Fax: 08 9388 2258
ABN: 50 008 917 717
http://www.postnewspapers.com.au/


Personally I will not send a core dump to anyone; a core dump may contain
sensitive information not only from the postmaster, but from other
applications too.
Btw, I just took a core dump from the postmaster, and I found there some DNS
addresses I had connected to before from bash. The postmaster should not see
those.


I think IPC to fast shut down all backends and wait for report
processing is quite enough.


Regards,
Radosław Smogura

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash dumps

2011-07-04 Thread Radosław Smogura

On Mon, 04 Jul 2011 12:58:46 +0800, Craig Ringer wrote:

On 15/06/2011 2:37 AM, Radosław Smogura wrote:

Hello,

Because I work a little bit on the streaming protocol, from time to time
I have crashes. I want to ask if you want crash reporting (this is one
of the minor by-products of the mmap playing); what I have there is mmaped
areas, call stacks, and some other stuff.


Core files already contain all that, don't they? They omit shared
memory segments by default on most platforms, but should otherwise be
quite complete.

The usual approach on UNIXes and Linux is to use the built-in OS
features to generate a core dump of a crashing process, then analyze it
after the fact. That way the crash is over as fast as possible and you
can get services back up and running before spending the time, CPU and
I/O required to analyze the core dump.


Actually, what I was thinking about was to add dumping of GUCs, etc. The
list of mappings came from when I tried to mmap PostgreSQL: due to the many
errors, which sometimes occurred in unexpected places, I needed to add
something that would be useful for me and easy to analyse (I could simply
find a pointer, and then check which region failed). The idea to try to
evolve this came later.


I think the report should look like:
This is a crash report of a PostgreSQL database, generated on
Here is the list of GUC variables:
Here is the list of files:
Here is the backtrace:
Here is the detailed backtrace:
Here is the list of m-mappings (you may see which libraries are linked in)
Here is your free memory
Here is your disk usage
Here is your custom addition


These basic reports work
for Linux with gdb, but there is a pluggable architecture which
hooks into the segfault


Which process does the debugging? Does the crashing process fork() a
copy of gdb to debug itself?

One thing I've been interested in is giving the postmaster (or more
likely a helper for the postmaster) the ability to handle "backend is
crashing" messages, attach a debugger to the crashing backend and
generate a dump and/or backtrace. This might be workable in cases
where in-process debugging can't be done due to a smashed stack, a full
heap causing malloc() failure, etc.


Currently I do everything in the segfault handler (no fork), but I like the
idea of the fork (in the segfault handler); it may resolve some problems.
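
For illustration, a minimal, hypothetical sketch of that fork idea (not the
actual patch): the handler fork()s, and the child attaches gdb to the
crashing parent to grab a backtrace. fork() and the exec family are
async-signal-safe; snprintf() strictly is not, and on Linux attaching may
additionally require suitable ptrace permissions.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void segv_handler(int signo)
{
    char  pidbuf[32];
    pid_t child;

    /* snprintf is not guaranteed async-signal-safe; fine for a sketch */
    snprintf(pidbuf, sizeof(pidbuf), "%d", (int) getpid());

    child = fork();
    if (child == 0)
    {
        /* child: attach gdb to the crashing parent and print a backtrace */
        execlp("gdb", "gdb", "--batch", "-ex", "bt full", "-p", pidbuf,
               (char *) NULL);
        _exit(1);                   /* exec failed */
    }
    else if (child > 0)
        waitpid(child, NULL, 0);    /* let gdb finish before we die */

    signal(signo, SIG_DFL);         /* default action: terminate + core */
    raise(signo);
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigaction(SIGSEGV, &sa, NULL);

    raise(SIGSEGV);                 /* simulate the crash */
    return 0;
}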


Regards,
Radosław Smogura


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Crash dumps

2011-07-04 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Monday 04 of July 2011 16:32:32
 Craig Ringer cr...@postnewspapers.com.au writes:
  Why not produce a tool that watches the datadir for core files and
  processes them? ...
 
 By and large, our attitude has been that Postgres shouldn't be crashing
 often enough to make this sort of infrastructure worthwhile.  Developer
 time spent on it would be far better spent on fixing the bugs instead.
 
  For that reason, it'd be handy if a backend could trap SIGSEGV and
  reliably tell the postmaster "I'm crashing!" so the postmaster could
  fork a helper to capture any additional info the backend needs to be
  alive for. ...
  Unfortunately, "reliably" and "segfault" don't go together.
 
 Yeah.  I think there's no chance at all that we'd accept patches pointed
 in this direction.  They'd be more likely to decrease the system's
 reliability than anything else.  Aside from the difficulty of doing
 anything at all reliably in an already-failing process, once we realize
 that something is badly wrong it's important to kill off all other
 backends ASAP.  That reduces the window for any possible corruption of
 shared memory to make it into on-disk state.  So interposing a helper
 to fool around with the failed process doesn't sound good at all.
 
 In practice I think you can generally get everything of interest
 out of the core file, so it's not clear to me that there's any win
 available from this line of thought anyhow.
 
   regards, tom lane
I asked about crash reports because at that time there was a thread about
crashing in a live system.

There is one win, I think: users will send a crash report faster than a core
dump (for many reasons: size, the amount of confidential information, etc.).

Such a report may be quite useful for reasonable bug finding.

It may be shipped as contrib too, but I noted, and we agree, that report
generation should not affect the speed of shutdown.

PostgreSQL will look better - a server application that generates crash
reports looks better than one generating none. Just a bit of marketing ;)

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Review of patch Bugfix for XPATH() if expression returns a scalar value

2011-06-30 Thread Radosław Smogura

On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote:

On Jun29, 2011, at 19:57 , Radosław Smogura wrote:

This is a review of the patch
https://commitfest.postgresql.org/action/patch_view?id=565
"Bugfix for XPATH() if expression returns a scalar value"

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM
(SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
(XMLELEMENT(name root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t')))
v(x)) as foo;

        xmlelement
--------------------------
 <root sth="&amp;lt;n"/>

It's clearly visible that the value of the attribute is &lt;n, not <n.
Every parser will read this as &lt;n, which is not natural and will require
the consumer/developer to de-escape this on his side - roughly speaking,
this will be reported as a serious bug.


There's a problem there, no doubt, but your proposed solution just moves
the problem around.

Here's your query, reformatted to be more legible

SELECT
  XMLELEMENT(name root,
             XMLATTRIBUTES(foo.namespace AS sth)
  )
FROM (
  SELECT
    (XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
                           XMLATTRIBUTES('<n' AS xmlns,
                                         'v' AS value), 't')
        )
  ) v(x)) as foo;

What happens is that the XPATH expression selects the xmlns
attribute with the value '<n'. To be well-formed xml, that value
must be escaped, so what is actually returned by the XPATH
call is '&lt;n'. The XMLATTRIBUTES() function, however, doesn't
distinguish between input of type XML and input of type TEXT,
so it figures it has to represent the *textual* value '&lt;n'
in xml, and produces '&amp;lt;n'.

Not escaping textual values returned by XPATH isn't a solution,
though. For example, assume someone inserts the value returned
by the XPATH() call in your query into a column of type XML.
If XPATH() returned '<n' literally instead of escaping it,
the column would contain non-well-formed XML in a column of type
XML. Not pretty, and pretty fatal during your next dump/reload
cycle.

Or, if you want another example, simply remove the XMLATTRIBUTES
call and use the value returned by XPATH as a child node directly.

SELECT
  XMLELEMENT(name root,
             foo.namespace
  )
FROM (
  SELECT
    (XPATH('namespace-uri(/*)', x))[1] AS namespace
  FROM (VALUES (XMLELEMENT(name root,
                           XMLATTRIBUTES('<n' AS xmlns,
                                         'v' AS value), 't')
        )
  ) v(x)) as foo;

     xmlelement
--------------------
 <root>&lt;n</root>

Note that this is correct! The root node contains a text node
representing the textual value '<n'. If XPATH() hadn't returned
the value escaped, the query above would have produced

  <root><n</root>

which is obviously wrong. It works in this case because XMLELEMENT
is smart enough to distinguish between child nodes given as TEXT
values (those are escaped) and child nodes provided as XML values
(those are inserted unchanged).

XMLATTRIBUTES, however, doesn't make that distinction, and simply
escapes all attribute values independent of their type. Now, the
reason it does that is probably that *not* all XML values are
well-formed attribute values without additional escaping. For example,
you cannot have literal '<' and '"' in attribute values. So if
XMLATTRIBUTES, like XMLELEMENT, never escaped values which are
already of type XML, the following piece of code

  XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))

would produce

  <a v="<node/>"/>

which, alas, is not well-formed :-(

The correct solution, I believe, is to teach XMLATTRIBUTES to
only escape those characters which are invalid in attribute
values but valid in XML (probably only '<' and '"', but I'll
check). If there are no objections, I'll prepare a separate patch
for that. I don't want to include it in this patch because it's
really a distinct issue (it even modifies a different function).
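
For illustration, a minimal C sketch of that rule (hypothetical;
escape_xml_attribute is an invented name, not the actual patch function):
only '<' and '"' are escaped, so content that is already well-formed XML -
where '&' is escaped already - is never double-escaped.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Make already-well-formed XML content safe inside a double-quoted
 * attribute value; '&' is assumed to be escaped already, '>' is legal. */
static char *escape_xml_attribute(const char *s)
{
    char *out = malloc(strlen(s) * 6 + 1);  /* worst case: all "&quot;" */
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *s; s++)
    {
        switch (*s)
        {
            case '<':  p += sprintf(p, "&lt;");   break;
            case '"':  p += sprintf(p, "&quot;"); break;
            default:   *p++ = *s;                 break;
        }
    }
    *p = '\0';
    return out;
}

int main(void)
{
    printf("%s\n", escape_xml_attribute("&lt;n"));    /* -> &lt;n, unchanged */
    printf("%s\n", escape_xml_attribute("<node/>"));  /* -> &lt;node/>       */
    return 0;
}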

best regards,
Florian Pflug


You may manually enter invalid xml too; you don't need xpath for this. Much
more:

In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit,
I executed
SELECT XMLELEMENT(name a, XMLATTRIBUTES('<node/>'::xml as v))
and it looks like I got the correct output <a v="&lt;node/&gt;"/>;
when I looked into the table files with a text editor I saw the same value.

I will check on the latest master whether I can reproduce this.

But indeed, right now PostgreSQL is not type safe with respect to the XML
type. See
SELECT XMLELEMENT(name root, '<', (xpath('//text()',
'<root>&lt;</root>'))[1]);


You found some problem with escaping, but the solution is bad. I think the
problem lies with the XML type, which may be used to hold both strings and
proper xml. For example, the above query can't distinguish whether
(xpath('//text()', '<root>&lt;</root>'))[1] is xml text, or an xml element.


For me, adding support for scalars is really good and needed, but the
escaping is not.


Regards,
Radosław Smogura


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-06-29 Thread Radosław Smogura
Review of patch 
https://commitfest.postgresql.org/action/patch_view?id=580


=== Patch description ===
SUMMARY: When a text()-based XPATH expression is invoked, the output is not
XML-escaped.

DESCRIPTION: The submitter invokes the following statement:
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1]
He expects the (escaped) result &lt;, but gets <.

AFFECTS: Possibly this affects situations where the user wants to insert the
output of the above expression into an XML column.


PATCH CONTENT:
A. 1. The patch fixes the above problem (I don't treat this as a problem,
but as an enhancement).
A. 2. In addition the patch contains test cases for the above.
A. 3. The patch changes the behaviour of the function xml_xmlnodetoxmltype,
invoked by xpath_internal, by adding an escape_xml() call.
A. 4. I don't see any stability issues with this.
A. 5. Performance may be reduced and memory consumption may increase due to
the internals of escape_xml().


=== Review ===
B. 1. Patch applies cleanly.

B. 2. Source compiles, and patch works as Submitter wants.

B. 3. Personally, I missed notes in the documentation that such expressions
will be escaped (this should be clearly documented, as live functionality is
changed).


B. 4. The submitter possibly revealed some missing, minor functionality of
PostgreSQL XML, as he expects XML-escaped output.

B. 5. Currently XPATH produces escaped output for element nodes, and
unescaped output for all other types of nodes (including text, comments,
etc.).

B. 6. The current behaviour _is intended_ (there is an if to check node
type) and _natural_. In this particular case the user asks for the text
content of some node, and this content is actually <.


B. 7. Similar behaviour may be observed e.g. in SQL Server(tm):
SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
produced <.

B. 8. Even if the current behaviour may be treated as wrong, it was exposed,
and others may depend on the unescaped content.


B. 9. I think the correct approach would be to create a new function (based
on the existing one) that is able to escape the above. In this situation the
call would look like (for example only!):
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
or
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
One function to operate on an array, one to operate on a single XML datum.
Or think about something like xmltext(). (Compare the current status of
xml.c.)

B. 10. Whether such a function (B.9.) is needed, and whether it will be
included, is out of scope for this review.


Based mainly on A.1, B.6, and B.8, and bearing in mind B.10, in my opinion
this is subject to rejection as needing more work, or as invalid.

I will send the detailed explanation of why such behaviour should not be
implemented in the review of
https://commitfest.postgresql.org/action/patch_view?id=565.


Regards,

Radosław Smogura

P. S.
I would like to say sorry, for such late answaer, but I sent this from other 
mail address, which was not attached to mailing list. Blame KDE KMail not me 
:)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Review of patch Bugfix for XPATH() if expression returns a scalar value

2011-06-29 Thread Radosław Smogura
This is a review of the patch
https://commitfest.postgresql.org/action/patch_view?id=565
Bugfix for XPATH() if expression returns a scalar value

The patch applies cleanly, and compiles cleanly too; I didn't check the tests.

From the discussion about the patch, and the thread referenced in it,
http://archives.postgresql.org/pgsql-general/2010-07/msg00355.php, if I
understand correctly such functionality is desired.

This patch, I think, takes a good approach to dealing with scalar values,
and converting a value to its string representation is good too (in line
with the current support for xml), with one exception detailed below.

In this patch the submitter, similarly to
https://commitfest.postgresql.org/action/patch_view?id=580, added
functionality for XML-escaping of some kinds of values (I think only string
scalars are escaped), which is not natural and may lead to double escaping
in some situations; an example query may be:

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM
(SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
(XMLELEMENT(name root, XMLATTRIBUTES('<n' AS xmlns, 'v' AS value),'t')))
v(x)) as foo;

        xmlelement
--------------------------
 <root sth="&amp;lt;n"/>
 
It's clearly visible that the value of the attribute is &lt;n, not <n.
Every parser will read this as &lt;n, which is not natural and will require
the consumer/developer to de-escape this on his side - roughly speaking,
this will be reported as a serious bug.
I didn't find good reasons why XML-escaping should be included; the
submitter wrote about inserting this into an xml column and the possibility
of data damage, but didn't give an example. Such an example is desired.

Conclusion:
I vote +1 for this patch if the escaping is removed.

Regards,
Radoslaw Smogura

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hugetables question

2011-06-23 Thread Radosław Smogura
Martijn van Oosterhout klep...@svana.org Thursday 23 of June 2011 09:10:20
 On Wed, Jun 22, 2011 at 02:31:01PM +0200, Radosław Smogura wrote:
  I strongly disagree with the opinion that if it's 1% it's worthless. 1%
  here, 1% there, and finally you get 10%; but of course hugepages
  will work quite well if used in code that requires many
  random jumps. I think this can be reproduced, and some not-common
  case may be found that gets a performance gain of about 10% (maybe load
  a whole table into shared buffers and randomly peek at records one by
  one).
 
 I think the point is not that 1% is worthless, but that it hasn't been
 shown that it is a 1% improvement, because the noise is too large.
 
 For benefits this small, what you need to do is run each test 100 times
 and check the mean and standard deviation, and see whether the
 improvement is real or not.
 
 When the benefit is 10% you only need a handful of runs to prove it,
 which is why such results are accepted more easily.
 
 Have a nice day,
 
  Patriotism is when love of your own people comes first; nationalism,
  when hate for people other than your own comes first.
  
- Charles de Gaulle
I think the conclusion from this test was "there are much more important
things to do than a 1% benefit" - not "1% is worthless".

I will try hugepages today, with random peeks.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hugetables question

2011-06-22 Thread Radosław Smogura

On Wed, 22 Jun 2011 14:24:17 +0300, Marti Raudsepp wrote:

On Sun, Jun 19, 2011 at 12:56, Radosław Smogura
rsmog...@softperience.eu wrote:

I want to implement hugepages for shared memory


Hi,

Have you read this post by Tom Lane about the performance estimation
and a proof-of-concept patch with hugepages?
http://archives.postgresql.org/pgsql-hackers/2010-11/msg01842.php

It's possible that there was a flaw in his analysis, but his
conclusion is that it's not worth it:


And the bottom line is: if there's any performance benefit at all,
it's on the order of 1%.  The best result I got was about 3200 TPS
with hugepages, and about 3160 without.  The noise in these numbers
is more than 1% though.


Regards,
Marti


Actually, when I tried to implement hugepages for palloc (I was unable
to write a fast and effective mallocator), my result was that when I was
using normal pages I got a small performance degradation, but when I was
using huge pages it was faster than the normal build (even with the
ineffective mallocator).


I know there are some problems with accessing higher memory (when the
server has more than 8GB), and hugepages may resolve this.


I strongly disagree with the opinion that if it's 1% it's worthless. 1%
here, 1% there, and finally you get 10%; but of course hugepages will
work quite well if used in code that requires many random jumps. I think
this can be reproduced, and some not-common case may be found that gets a
performance gain of about 10% (maybe load a whole table into shared
buffers and randomly peek at records one by one).


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-20 Thread Radosław Smogura
Greg Stark st...@mit.edu Monday 20 of June 2011 03:39:12
 On Thu, Jun 16, 2011 at 9:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  BTW, if you're hacking Postgres code and don't already have a
  reinstall script, you need one.  Mine is basically
  
 pg_ctl stop
 cd $PGBLDROOT/src/backend
 make install-bin
 pg_ctl start
 
 I've always wondered what other people do to iterate quickly. It's a
 bit of a pain that you can't just run the binary out of the build
 tree. This looks a lot safer than some of the things I was considering
 doing with symlinks.
I actually go to the installation directory
and call it in one line (simple, because the up arrow helps):

pg_ctl -D db stop; gmake -C ../postgresql -j5 install; pg_ctl -D db start

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] POSIX question

2011-06-20 Thread Radosław Smogura

Hello,

I had an idea about hugepages, and I read why PostgreSQL doesn't support
POSIX shared memory (the need for nattch). While reading about POSIX/SysV I
found this (a thread about dynamically chunking shared memory):

http://archives.postgresql.org/pgsql-hackers/2010-08/msg00586.php

When playing with mmap I worked out an approach for dealing with growing
files, so...


Maybe this approach could resolve both of the above problems (POSIX and
dynamic shared memory). Here is the idea:

1. mmap some large amount of anonymous virtual memory (this will be the
maximum size of the shared memory).
2. Init a small SysV chunk for the shmem header (to keep the nattch-based
crash-safety requirements).
3. SysV remap is Linux-specific, so unmap the first few VM pages of step 1
and attach (2) there.
3a. Lock the header when adding chunks (the 1st chunk is the header); we
don't want concurrent chunk allocation.
4. Allocate some other chunks of shared memory (POSIX is the best way) and
record them in the shmem header; put there information like chunk id/name,
whether it is POSIX or SysV, and some useful flags (hugepage?) needed for
reattaching; attach those in (1).
4b. Unlock 3a.

Point 1 will not eat memory, as memory allocation is delayed, and on 64-bit
platforms you may reserve a quite huge chunk of address space; in the future
it may be possible to use mmap/munmap to concatenate chunks, defragment,
etc.

mmap guarantees that mmapping with MAP_FIXED over an already-mapped area
will unmap the old one.
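
A small Linux-specific sketch of that reservation trick (hypothetical
sizes): reserve address space with PROT_NONE and MAP_NORESERVE - which is
normally not charged against strict overcommit - and later overlay real
chunks with MAP_FIXED.

#include <stdio.h>
#include <sys/mman.h>

#define RESERVE (64L * 1024 * 1024 * 1024)  /* 64 GB of address space */
#define CHUNK   (16L * 1024 * 1024)         /* one real 16 MB chunk   */

int main(void)
{
    /* step 1: reserve; no physical memory is allocated yet */
    char *base = mmap(NULL, RESERVE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
    {
        perror("reserve");
        return 1;
    }

    /* step 4: overlay a usable chunk at the start of the reservation;
     * MAP_FIXED over an existing mapping atomically replaces it */
    char *chunk = mmap(base, CHUNK, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (chunk == MAP_FAILED)
    {
        perror("chunk");
        return 1;
    }

    chunk[0] = 1;                   /* this page is real memory now */
    printf("reserved %ld bytes at %p; first %ld bytes usable\n",
           RESERVE, (void *) base, CHUNK);
    return 0;
}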


A working preview changeset applied to sysv_shmem.c may be quite small.


If someone wants to extend the memory, he may add a new chunk (of course,
to keep the header memory contiguous, the number of chunks is limited).


What do you think about this?

Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POSIX question

2011-06-20 Thread Radosław Smogura
Florian Pflug f...@phlo.org Monday 20 of June 2011 16:16:58
 On Jun20, 2011, at 15:27 , Radosław Smogura wrote:
  1. mmap some large amount of anonymous virtual memory (this will be
  maximum size of shared memory). ...
  Point 1. will no eat memory, as memory allocation is delayed and in 64bit
  platforms you may reserve quite huge chunk of this, and in future it may
  be possible using mmap / munmap to concat chunks / defrag it etc.
 
 I think this breaks with strict overcommit settings (i.e.
 vm.overcommit_memory = 2 on linux). To fix that, you'd need a way to tell
 the kernel (or glibc) to simply reserve a chunk of virtual address space
 for further use. Not sure if there's an API for that...
 
 best regards,
 Florian Pflug

This may be achieved by many other things, like mmapping /dev/null.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POSIX question

2011-06-20 Thread Radosław Smogura
Florian Pflug f...@phlo.org Monday 20 of June 2011 17:01:40
 On Jun20, 2011, at 16:39 , Radosław Smogura wrote:
  Florian Pflug f...@phlo.org Monday 20 of June 2011 16:16:58
  
  On Jun20, 2011, at 15:27 , Radosław Smogura wrote:
  1. mmap some large amount of anonymous virtual memory (this will be
  maximum size of shared memory). ...
  Point 1. will no eat memory, as memory allocation is delayed and in
  64bit platforms you may reserve quite huge chunk of this, and in
  future it may be possible using mmap / munmap to concat chunks /
  defrag it etc.
  
  I think this breaks with strict overcommit settings (i.e.
  vm.overcommit_memory = 2 on linux). To fix that, you'd need a way to
  tell the kernel (or glibc) to simply reserve a chunk of virtual address
  space for further user. Not sure if there's a API for that...
  
  best regards,
  Florian Pflug
  
  This may be achived by many other things, like mmap /dev/null.
 
 Are you sure? Isn't mmap()ing /dev/null a way to *allocate* memory?
 
 Or at least this is what I always thought glibc does when you malloc()
 a large block at once. (This allows it to actually return the memory
 to the kernel once you free() it, which isn't possible if the memory
 was allocated simply by extending the heap.)
 
 You can work around this by mmap()ing an actual file, because then
 the kernel knows it can use the file as backing store and thus doesn't
 need to reserve actual physical memory. (In a way, this just adds
 additional swap space). Doesn't seem very clean though...
 
 Even if there's a way to work around a strict overcommit setting, unless
 the workaround is a syscall *explicitly* designed for that purpose, I'd
 be very careful with using it. You might just as well be exploiting a
 bug in the overcommit accounting logic and future kernel versions may
 simply choose to fix the bug...
 
 best regards,
 Florian Pflug

I'm 99% sure. When I was playing with mmap I preallocated probably
about 100GB of memory.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POSIX question

2011-06-20 Thread Radosław Smogura
Florian Pflug f...@phlo.org Monday 20 of June 2011 17:07:55
 On Jun20, 2011, at 17:05 , Radosław Smogura wrote:
  I'm sure at 99%. When I ware playing with mmap I preallocated,
  probably, about 100GB of memory.
 
 You need to set vm.overcommit_memory to 2 to see the difference. Did
 you do that?
 
 You can do that either with echo 2 > /proc/sys/vm/overcommit_memory,
 or by editing /etc/sysctl.conf and issuing sysctl -p.
 
 best regards,
 Florian Pflug
I've just created a 127TB mapping on Linux - the maximum allowed by the VM -
and am trying overcommit with 0, 1, and 2.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Hugetables question

2011-06-19 Thread Radosław Smogura
I want to implement hugepages for shared memory; to make it transparent I
want to do it in this fashion:
1. Reserve memory M of size s.
2. Try to allocate hugepage memory of as big a size as possible (hs), attach
it at M.
3. Allocate normal shared memory of size s - hs, and attach it at M+hs.
This solution should work for Linux and Windows, and makes no difference to
the usage of such shared memory in the application.

(...and this actually works)
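
A rough Linux-only sketch of that layering (hypothetical sizes; it assumes a
configured huge page pool and 2 MB huge pages, and omits IPC_RMID cleanup):

#include <stdint.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000           /* Linux-specific shmget flag */
#endif

#define HUGE_SZ (2UL * 1024 * 1024) /* assume 2 MB huge pages */

int main(void)
{
    size_t  s = 64UL * 1024 * 1024;  /* requested total size     */
    size_t  hs = 32UL * 1024 * 1024; /* huge part that succeeded */
    char   *raw, *base;
    int     hid, nid;

    /* 1. reserve s plus slack, so M can be aligned to the huge page size */
    raw = mmap(NULL, s + HUGE_SZ, PROT_NONE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (raw == MAP_FAILED) { perror("reserve"); return 1; }
    base = (char *) (((uintptr_t) raw + HUGE_SZ - 1)
                     & ~(uintptr_t) (HUGE_SZ - 1));
    munmap(raw, s + HUGE_SZ);   /* give the range back; reuse the address */

    /* 2. huge page part attached at M (fails if the huge pool is empty) */
    hid = shmget(IPC_PRIVATE, hs, IPC_CREAT | SHM_HUGETLB | 0600);
    if (hid < 0 || shmat(hid, base, 0) == (void *) -1)
    { perror("huge part"); return 1; }

    /* 3. ordinary System V shmem for the remainder, attached at M + hs */
    nid = shmget(IPC_PRIVATE, s - hs, IPC_CREAT | 0600);
    if (nid < 0 || shmat(nid, base + hs, 0) == (void *) -1)
    { perror("normal part"); return 1; }

    printf("contiguous segment at %p: %zu huge + %zu normal bytes\n",
           (void *) base, hs, s - hs);
    return 0;
}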

But in sysv_shmem.c I saw some checking for memory belonging to other
(probably failed) processes. Because I can't put a new header in step 3, I
would like to ask whether it would be sufficient to:
1. Check the creator pid via shmctl.
2. Remove the checking of the shmem magic.
3. Or maybe instead it would be better to split the shared memory: the
header would be stored under one key, and it would contain the keys to the
other individually allocated blocks?

Of course moving to POSIX may be much better, but according to the
commitfest thread it may be impossible.

Maybe some other ideas.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hugetables question

2011-06-19 Thread Radosław Smogura
Martijn van Oosterhout klep...@svana.org Sunday 19 of June 2011 12:35:18
 On Sun, Jun 19, 2011 at 11:56:15AM +0200, Radosław Smogura wrote:
  I want to implement hugepages for shared memory, to make it transparent I
  want to do in this fashion:
  1. Reserve memory M of size s
  2. Try to allocate hugepage memory of as big size as possible (hs),
  attach at M.
  3. Allocate normal shared memory of size hs - s, and attach it at M+hs.
  This soulution should work for Linux and Windows, and make no difference
  for usage of such shared memory in application.
 
 At least in Linux they're trying to make hugepages transparent, so I'm
 wondering if this is going to make a difference for Linux in the long
 term.
 
 As for your other problem, perhaps you can put the shmem block first,
 before the hugemem block? It would require some pointer fiddling, but
 seems doable.
 
 Have a nice day,
 
  Patriotism is when love of your own people comes first; nationalism,
  when hate for people other than your own comes first.
  
- Charles de Gaulle
Yes, transparent hugepages are coming in Linux, but in any case, currently
that is only for anonymous memory, and it has some disadvantages compared to
explicit hugepages.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XPATH evaluation

2011-06-18 Thread Radosław Smogura
Nicolas Barbier nicolas.barb...@gmail.com Friday 17 of June 2011 17:29:57
 2011/6/17, Andrew Dunstan and...@dunslane.net:
  On 06/17/2011 10:55 AM, Radosław Smogura wrote:
  XML canonicalization preserves whitespace, if I remember
  well; I think there is an example.
  
  In any case, if I store an image in XML (I've seen this), preservation
  of whitespace and newlines is important.
  
  If you store images you should encode them anyway, in base64 or hex.
 
 Whitespace that is not at certain obviously irrelevant places (such as
 right after , between attributes, outside of the whole document,
 etc), and that is not defined to be irrelevant by some schema (if the
 parser is schema-aware), is relevant. You cannot just muck around with
 it and consider that correct.
 
  More generally, data that needs that sort of preservation should
  possibly be in CDATA nodes.
 
 CDATA sections are just syntactic sugar (a form of escaping):
 
 URL: http://www.w3.org/TR/xml-infoset/#omitted
 
 Appendix D: What is not in the Information Set
 [..]
 19. The boundaries of CDATA marked sections.
 
 Therefore, there is no such thing as a CDATA node that would be
 different from just text (Infoset-wise).
 
 Note that that does not mean that binary data is never supposed to be
 altered or that all binary data is to be accepted: e.g., whether
 newlines are represented using \n, \r, or \r\n is irrelevant;
 also, binary data that is not valid according to the used encoding
 must of course not be accepted.
 
 Nicolas

I would like to send a patch to remove the formatting. How to deal with the
collapsing of blank nodes I don't know.

Regards,
Radek
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 702b9e3..07464fb 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3262,7 +3262,7 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
 		buf = xmlBufferCreate();
 		PG_TRY();
 		{
-			xmlNodeDump(buf, NULL, cur, 0, 1);
+			xmlNodeDump(buf, NULL, cur, 0, 0);
 			result = xmlBuffer_to_xmltype(buf);
 		}
 		PG_CATCH();

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] XPATH evaluation

2011-06-17 Thread Radosław Smogura

Hello,

During review of
https://commitfest.postgresql.org/action/patch_view?id=580 I found the
following problems with XPath.


1.
SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
xmlns:p="http://postgresql.org/db"><o:db><a><b></b></a></o:db><p:db></p:db></root>'));

Produces:
{<o:db>
  <a>
    <b/>
  </a>
</o:db>,<p:db/>}

In the above, <b></b> was reduced to <b/>; this is a different infoset than
the input, and those notations are interpreted differently, e.g. by XML
Binding / WebServices. The 1st one may be mapped to an empty string,
and the 2nd one to null.


As well, the result was formatted, which again produces a different infoset.

Both of the above may cause problems with XML digesting.

2.
SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
xmlns:p="http://postgresql.org/db"><o:db></o:db><p:db></p:db></root>'));

{<o:db/>,<p:db/>}
In the above I am missing the namespaces.

I may take the 1st on assignment (the fix is simple), but about the 2nd I
have mixed feelings. I think the 2nd should transfer the namespaces in some
way to the client.


What do you think?

Regards,
Radosław Smogura

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XPATH evaluation

2011-06-17 Thread Radosław Smogura
Andrew Dunstan and...@dunslane.net Friday 17 of June 2011 15:47:04
 On 06/17/2011 05:41 AM, Florian Pflug wrote:
  On Jun17, 2011, at 11:09 , Radosław Smogura wrote:
  1.
  SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
  xmlns:p="http://postgresql.org/db"><o:db><a><b></b></a></o:db><p:db></p:db></root>'));
  Produces:
  {<o:db>
    <a>
      <b/>
    </a>
  </o:db>,<p:db/>}
  In the above <b></b> was reduced to <b/>; this is a different infoset than
  the input, and those notations are interpreted differently, e.g. by XML
  Binding / WebServices. The 1st one may be mapped to an empty string,
  and the 2nd one to null.
  
  Oh, joy :-(
 
 I thought these were basically supposed to be the same.
 
 The XML Information Set for example specifically excludes:
 
 The difference between the two forms of an empty element: |<foo/>|
 and |<foo></foo>|.
 
 
 See http://www.w3.org/TR/2004/REC-xml-infoset-20040204/ Appendix D.
 Note that this implies that <foo></foo> does not have content of an
 empty string, but that it has no content.
 
 
 cheers
 
 andrew

Indeed, the Infoset spec and the XML Canonicalization spec treat <foo></foo>
the same as <foo/> - my bad. But XML canonicalization preserves whitespace,
if I remember well; I think there is an example.

In any case, if I store an image in XML (I've seen this), preservation of
whitespace and newlines is important.

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XPATH evaluation

2011-06-17 Thread Radosław Smogura
Florian Pflug f...@phlo.org Friday 17 of June 2011 11:41:08
 On Jun17, 2011, at 11:09 , Radosław Smogura wrote:
  1.
  SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
  xmlns:p="http://postgresql.org/db"><o:db><a><b></b></a></o:db><p:db></p:db></root>'));
  Produces:
  {<o:db>
    <a>
      <b/>
    </a>
  </o:db>,<p:db/>}
  In the above <b></b> was reduced to <b/>; this is a different infoset than
  the input, and those notations are interpreted differently, e.g. by XML
  Binding / WebServices. The 1st one may be mapped to an empty string,
  and the 2nd one to null.
 
 Oh, joy :-(
 
 Does this happen only with my patch applied or also with unpatched HEAD?
 
  2.
  SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
  xmlns:p="http://postgresql.org/db"><o:db></o:db><p:db></p:db></root>'));
  {<o:db/>,<p:db/>}
  In the above I am missing the namespaces.
 
 Hm, that's a hard problem, I think. Your problem (1) basically tells us that
 ideally we'd return the matching parts of an XML document unmodified. Now,
 (2) tells us that that isn't the most sensible thing to do either.
 
  I may take the 1st on assignment (the fix is simple)
 
 What's your proposed fix for (1)?
 
  , but about the 2nd I have mixed feelings. I think the 2nd should
  transfer the namespaces in some way to the client.
 
 
 I don't see how XPATH() can do that without breaking its API. The only
 thing we could do AFAICS is to define a second XPATH evaluation function
 which returns a list of namespace declarations (prefix and uri) for every
 node.
 
 best regards,
 Florian Pflug

No, this is not about your patch; it was inspired by it.
Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XPATH evaluation

2011-06-17 Thread Radosław Smogura
Andrew Dunstan and...@dunslane.net Friday 17 of June 2011 17:09:25
 On 06/17/2011 10:55 AM, Radosław Smogura wrote:
  Andrew Dunstanand...@dunslane.net  Friday 17 of June 2011 15:47:04
  
  On 06/17/2011 05:41 AM, Florian Pflug wrote:
  On Jun17, 2011, at 11:09 , Radosław Smogura wrote:
   1.
   SELECT (XPATH('/root/*', '<root xmlns:o="http://olacle.com/db"
   xmlns:p="http://postgresql.org/db"><o:db><a><b></b></a></o:db><p:db></p:db></root>'));
   Produces:
   {<o:db>
     <a>
       <b/>
     </a>
   </o:db>,<p:db/>}
   In the above <b></b> was reduced to <b/>; this is a different infoset
   than the input, and those notations are interpreted differently, e.g. by
   XML Binding / WebServices. The 1st one may be mapped to an empty
   string, and the 2nd one to null.
  
  Oh, joy :-(
  
  I thought these were basically supposed to be the same.
  
   The XML Information Set for example specifically excludes:
    The difference between the two forms of an empty element:
    |<foo/>| and |<foo></foo>|.
   
   See http://www.w3.org/TR/2004/REC-xml-infoset-20040204/ Appendix D.
   Note that this implies that <foo></foo> does not have content of an
   empty string, but that it has no content.
  
  
  cheers
  
  andrew
  
   Indeed, the Infoset spec and the XML Canonicalization spec treat
   <foo></foo> the same as <foo/> - my bad, but XML canonicalization
   preserves whitespace, if I remember well; I think there is an example.
   
   In any case, if I store an image in XML (I've seen this), preservation
   of whitespace and newlines is important.
 
 If you store images you should encode them anyway, in base64 or hex.
 
 More generally, data that needs that sort of preservation should
 possibly be in CDATA nodes.
 
 cheers
 
 andrew
I know this answer, because this solution is better. But during one job I
created XSL-FO with the whitespace-preserve attribute; if I wanted to get
part of such an XSL-FO, I could destroy the output document.

But those use-cases don't change the fact that the XPATH output doesn't
preserve whitespace and newlines, and produces a different node than was in
the original. It's as if a regexp on varchar trimmed the result beyond your
control.

I emphasize this because it may cause problems with XML digest algorithms,
which are quite popular, and may cause some legal(!) problems when you try
to use an Advanced Signature in the European Union, as well as with other
applications.

With XML Binding it's quite popular to interpret <foo/> as null and
<foo></foo> as an empty string. In particular, the mentioned Infoset spec
doesn't matter here.

I think no-formatting is a reasonable requirement for the XPATH function.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [Streaming] Datum compression

2011-06-16 Thread Radosław Smogura
Hello,

I'm thinking about the following functionality to add to streaming. One way
or another, sooner or later someone will add the possibility of gzip
communication. As I try to make streaming work, I wonder if it is possible
to change the TOAST compression to GZIP/zlib; currently it is pglz.

I want to make this possible: if the user wants to use compression during
communication, the server will be able to send the compressed datum
directly, instead of doing recompression. GZIP is actually quite popular
and, plain Windows C aside, it's supported on all driver platforms,
including Java and .NET.
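
For illustration, a minimal zlib sketch of compressing a datum once so it
could be shipped in compressed form (hypothetical payload; compress2() and
compressBound() are standard zlib calls):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    const char *datum = "some long, repetitive datum payload payload payload";
    uLong       srclen = (uLong) strlen(datum) + 1;
    uLongf      dstlen = compressBound(srclen);   /* worst-case output size */
    Bytef      *dst = malloc(dstlen);

    if (dst == NULL ||
        compress2(dst, &dstlen, (const Bytef *) datum, srclen,
                  Z_BEST_SPEED) != Z_OK)
    {
        fprintf(stderr, "compression failed\n");
        return 1;
    }
    printf("%lu bytes -> %lu bytes\n",
           (unsigned long) srclen, (unsigned long) dstlen);
    free(dst);
    return 0;
}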

What do you think about this approach?

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch - Debug builds without optimization

2011-06-16 Thread Radosław Smogura

Hello,

I'm sending the following patch, which disables optimization when
--enable-debug is passed. It was nasty (for me, at least) that a debug
build required passing CFLAGS with -O0 to get nicely traceable code.


Regards,
Radek

diff --git a/configure.in b/configure.in
old mode 100644
new mode 100755
index e6232af..3759645
--- a/configure.in
+++ b/configure.in
@@ -194,10 +194,10 @@ PGAC_ARG_BOOL(enable, spinlocks, yes,
   [do not use spinlocks])
 
 #
-# --enable-debug adds -g to compiler flags
+# --enable-debug adds -g and (-O0) to compiler flags
 #
 PGAC_ARG_BOOL(enable, debug, no,
-  [build with debugging symbols (-g)])
+  [build with debugging symbols (-g) and no optimization (-O0)])
 AC_SUBST(enable_debug)
 
 #
@@ -402,7 +402,7 @@ unset CFLAGS
 # If the user specifies something in the environment, that is used.
 # else:  If the template file set something, that is used.
 # else:  If coverage was enabled, don't set anything.
-# else:  If the compiler is GCC, then we use -O2.
+# else:  If the compiler is GCC, then we use -O2, unless debug is passed.
 # else:  If the compiler is something else, then we use -O, unless debugging.
 
 if test "$ac_env_CFLAGS_set" = set; then
@@ -412,11 +412,18 @@ elif test "${CFLAGS+set}" = set; then
 elif test "$enable_coverage" = yes; then
   : # no optimization by default
 elif test "$GCC" = yes; then
-  CFLAGS="-O2"
+  # if the user selected debug mode, use -O0, instead of nothing equivalent to
+  # (-O1)
+  if test "$enable_debug" != yes; then
+    CFLAGS="-O0"
+  else
+    CFLAGS="-O2"
+  fi
 else
-  # if the user selected debug mode, don't use -O
+  # if the user selected debug mode, use -O0, instead of nothing equivalent to
+  # (-O1)
   if test "$enable_debug" != yes; then
-    CFLAGS="-O"
+    CFLAGS="-O0"
   fi
 fi
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-16 Thread Radosław Smogura

On Thu, 16 Jun 2011 14:30:27 +0200, Radosław Smogura wrote:

Hello,

I'm sending the following patch, which disables optimization when
--enable-debug is passed. It was nasty (for me, at least) that a debug
build required passing CFLAGS with -O0 to get nicely traceable code.

Regards,
Radek


Sorry for the mess; this is the version that should have been submitted.

diff --git a/configure.in b/configure.in
old mode 100644
new mode 100755
index e6232af..47edf82
--- a/configure.in
+++ b/configure.in
@@ -194,10 +194,10 @@ PGAC_ARG_BOOL(enable, spinlocks, yes,
   [do not use spinlocks])
 
 #
-# --enable-debug adds -g to compiler flags
+# --enable-debug adds -g and (-O0) to compiler flags
 #
 PGAC_ARG_BOOL(enable, debug, no,
-  [build with debugging symbols (-g)])
+  [build with debugging symbols (-g) and no optimization (-O0)])
 AC_SUBST(enable_debug)
 
 #
@@ -402,7 +402,7 @@ unset CFLAGS
 # If the user specifies something in the environment, that is used.
 # else:  If the template file set something, that is used.
 # else:  If coverage was enabled, don't set anything.
-# else:  If the compiler is GCC, then we use -O2.
+# else:  If the compiler is GCC, then we use -O2, unless debug is passed.
 # else:  If the compiler is something else, then we use -O, unless debugging.
 
 if test "$ac_env_CFLAGS_set" = set; then
@@ -412,7 +412,13 @@ elif test "${CFLAGS+set}" = set; then
 elif test "$enable_coverage" = yes; then
   : # no optimization by default
 elif test "$GCC" = yes; then
-  CFLAGS="-O2"
+  # if the user selected debug mode, use -O0, instead of nothing equivalent to
+  # (-O1)
+  if test "$enable_debug" == yes; then
+    CFLAGS="-O0"
+  else
+    CFLAGS="-O2"
+  fi
 else
   # if the user selected debug mode, don't use -O
   if test "$enable_debug" != yes; then

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [WIP] [Stream] Preview of pg_type changes

2011-06-16 Thread Radosław Smogura

Hello,

Here I would like to present changes to pg_type and the type infrastructure 
for streaming. The changes are as follows:

- added new columns typstreamin and typstreamout
- the general contract: typstreamin has the same signature as the receive 
function (which takes internal); for typstreamout it is (internal, type)

- changes to psql help
- and all the functionality for manipulating such types.

No streamin/streamout methods are implemented yet; a rough sketch of what 
one could look like is below.

As this is a WIP patch, suggestions are welcome if someone wants to and has 
the time.
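
(Illustration only: PgStream and its read callback are invented names - the 
patch defines no such struct yet - while the fmgr macros are the standard 
PostgreSQL ones. A sketch under those assumptions:)

#include "postgres.h"
#include "fmgr.h"

/* hypothetical stand-in for the "internal" stream handle */
typedef struct PgStream
{
    ssize_t   (*read) (struct PgStream *s, void *buf, size_t len);
    size_t      remaining;      /* declared length of the incoming value */
} PgStream;

PG_FUNCTION_INFO_V1(text_stream_in);

Datum
text_stream_in(PG_FUNCTION_ARGS)
{
    PgStream   *s = (PgStream *) PG_GETARG_POINTER(0);
    text       *result = (text *) palloc(VARHDRSZ + s->remaining);

    SET_VARSIZE(result, VARHDRSZ + s->remaining);
    /* consume the payload straight from the wire instead of from a
     * fully buffered StringInfo, as a plain recv function would */
    if (s->read(s, VARDATA(result), s->remaining) != (ssize_t) s->remaining)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_EXCEPTION),
                 errmsg("unexpected end of stream")));
    PG_RETURN_TEXT_P(result);
}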


Regards,
Radek

stream_pg_type.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch - Debug builds without optimization

2011-06-16 Thread Radosław Smogura

On Thu, 16 Jun 2011 16:00:21 -0400, Tom Lane wrote:

Alvaro Herrera alvhe...@commandprompt.com writes:
I disagree with this change.  Debug builds are very useful to have in
production, and you don't want to be running -O0 there.  I have found
that you can use a src/Makefile.custom like this for those times when
you want to debug stuff in a particular set of files:

CFLAGS := $(patsubst -O2,-O0,$(CFLAGS))

Then you remove the .o files that you want to debug, and rerun make.


FWIW, I only use Makefile.custom for more-or-less-permanent changes to
the build behavior of a particular machine.  For one-shot things like
recompiling some particular file(s) at -O0, it's easier to do this:

rm foo.o
make PROFILE=-O0
reinstall postgres executable

The makefiles automatically add PROFILE at the end of CFLAGS, so you can
inject any compile flag this way --- I think the original intent was to
use it to add -pg for gprof-enabled builds.  But it's handy for this.

BTW, if you're hacking Postgres code and don't already have a
reinstall script, you need one.  Mine is basically

pg_ctl stop
cd $PGBLDROOT/src/backend
make install-bin
pg_ctl start

regards, tom lane

Thanks,

Actually I do something like the above, but it's good to know about the 
install-bin target; until now I fired off gmake -j5 install.


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Crash dumps

2011-06-14 Thread Radosław Smogura

Hello,

Because I work a little on the streaming protocol, I get crashes from time 
to time. I want to ask whether you would want crash reporting (this is one 
of the minor by-products of my mmap experiments). What I have reports the 
mmaped areas, call stacks, and some other stuff. The reporting works on 
Linux with gdb, but there is a pluggable architecture which hooks into the 
segfault handler - one thing that should be considered is killing the other 
processes immediately when reporting starts (as taking the report takes 
some time), so some IPC will be required.


I may polish this a little and send a patch for it (currently without the 
IPC-based killing of the others). A rough sketch of the gdb-attach idea is 
below.
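
(Sketch only, assuming it is acceptable to fork gdb from the handler; 
fork()/execlp() in a SIGSEGV handler is not strictly async-signal-safe, so 
real code would have to be much more careful:)

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void crash_handler(int signo)
{
    char    pidbuf[32];
    pid_t   child;

    snprintf(pidbuf, sizeof(pidbuf), "%d", (int) getpid());
    child = fork();
    if (child == 0)
    {
        /* child: attach gdb to the crashed parent and dump a backtrace */
        execlp("gdb", "gdb", "--batch", "-ex", "bt full", "-p", pidbuf,
               (char *) NULL);
        _exit(127);                 /* exec failed */
    }
    else if (child > 0)
        waitpid(child, NULL, 0);    /* let gdb finish before we die */

    /* re-raise with the default action so the OS still writes a core */
    signal(signo, SIG_DFL);
    raise(signo);
}

int main(void)
{
    signal(SIGSEGV, crash_handler);
    *(volatile int *) 0 = 42;       /* force a crash for the demo */
    return 0;
}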


Regards,
Radosław Smogura

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] .gitignore for some of cygwin files

2011-06-09 Thread Radosław Smogura

On Thu, 9 Jun 2011 14:12:59 +0200, Magnus Hagander wrote:

On Wed, Jun 8, 2011 at 12:34, Radosław Smogura
rsmog...@softperience.eu wrote:
After compilation I got big list of differences about binaries and 
other

files. You may see it in patch.


Hmm. Does this really affect just cygwin, and not also mingw? I know
msvc builds out-of-tree, but mingw doesn't, iirc? Where do the EXEs go
there?


Actually I ran the compilation with make in the Cygwin environment, as I 
can't run the build on Windows 7 / Visual Studio 2010. I don't know if this 
is the preferred method.


From configure script

checking for ld used by GCC... /usr/i686-pc-cygwin/bin/ld.exe
checking if the linker (/usr/i686-pc-cygwin/bin/ld.exe) is GNU ld... 
yes


CC runs as
GNU C (GCC) version 4.3.4 20090804 (release) 1 (i686-pc-cygwin)
compiled by GNU C version 4.3.4 20090804 (release) 1, GMP 
version 4.3.1, MPFR version 2.4.1-p5.
GGC heuristics: --param ggc-min-expand=100 --param 
ggc-min-heapsize=131072


After compilation I got many *dll.def files, among others.



What's nbproject?
Just configuration files from some editor. It seems any action in the 
project recreates them, hence this entry in .gitignore.



As for the patch itself, we don't normally keep comments in the
.gitignore files, I'm not sure if we want to start doing that...

That part is not meant for inclusion - I just wanted to comment on what is 
ignored and why.

Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] .gitignore for some of cygwin files

2011-06-09 Thread Radosław Smogura

On Thu, 9 Jun 2011 14:12:59 +0200, Magnus Hagander wrote:

On Wed, Jun 8, 2011 at 12:34, Radosław Smogura
rsmog...@softperience.eu wrote:
After compilation I got big list of differences about binaries and 
other

files. You may see it in patch.


Hmm. Does this really affect just cygwin, and not also mingw? I know
msvc builds out-of-tree, but mingw doesn't, iirc? Where do the EXEs go
there?

What's nbproject?

As for the patch itself, we don't normally keep comments in the
.gitignore files, I'm not sure if we want to start doing that...


Sorry for the noise; this is `git status` after compilation on my branch:

#   .gitignore-for-cygwin.diff
#   blob_20110601.patch
#   nbproject/
#   src/backend/postgres.def
#   src/backend/postgres.exe
#   src/backend/replication/libpqwalreceiver/liblibpqwalreceiverdll.def
#   src/backend/snowball/libdict_snowballdll.def
#   src/backend/utils/adt/blob.c
#   src/backend/utils/mb/conversion_procs/ascii_and_mic/libascii_and_micdll.def
#   src/backend/utils/mb/conversion_procs/cyrillic_and_mic/libcyrillic_and_micdll.def
#   src/backend/utils/mb/conversion_procs/euc2004_sjis2004/libeuc2004_sjis2004dll.def
#   src/backend/utils/mb/conversion_procs/euc_cn_and_mic/libeuc_cn_and_micdll.def
#   src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/libeuc_jp_and_sjisdll.def
#   src/backend/utils/mb/conversion_procs/euc_kr_and_mic/libeuc_kr_and_micdll.def
#   src/backend/utils/mb/conversion_procs/euc_tw_and_big5/libeuc_tw_and_big5dll.def
#   src/backend/utils/mb/conversion_procs/latin2_and_win1250/liblatin2_and_win1250dll.def
#   src/backend/utils/mb/conversion_procs/latin_and_mic/liblatin_and_micdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_ascii/libutf8_and_asciidll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_big5/libutf8_and_big5dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/libutf8_and_cyrillicdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_euc2004/libutf8_and_euc2004dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/libutf8_and_euc_cndll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/libutf8_and_euc_jpdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/libutf8_and_euc_krdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/libutf8_and_euc_twdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_gb18030/libutf8_and_gb18030dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_gbk/libutf8_and_gbkdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_iso8859/libutf8_and_iso8859dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/libutf8_and_iso8859_1dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_johab/libutf8_and_johabdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_sjis/libutf8_and_sjisdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_sjis2004/libutf8_and_sjis2004dll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_uhc/libutf8_and_uhcdll.def
#   src/backend/utils/mb/conversion_procs/utf8_and_win/libutf8_and_windll.def
#   src/bin/initdb/initdb.exe
#   src/bin/pg_basebackup/pg_basebackup.exe
#   src/bin/pg_config/pg_config.exe
#   src/bin/pg_controldata/pg_controldata.exe
#   src/bin/pg_ctl/pg_ctl.exe
#   src/bin/pg_dump/pg_dump.exe
#   src/bin/pg_dump/pg_dumpall.exe
#   src/bin/pg_dump/pg_restore.exe
#   src/bin/pg_resetxlog/pg_resetxlog.exe
#   src/bin/psql/psql.exe
#   src/bin/scripts/clusterdb.exe
#   src/bin/scripts/createdb.exe
#   src/bin/scripts/createlang.exe
#   src/bin/scripts/createuser.exe
#   src/bin/scripts/dropdb.exe
#   src/bin/scripts/droplang.exe
#   src/bin/scripts/dropuser.exe
#   src/bin/scripts/reindexdb.exe
#   src/bin/scripts/vacuumdb.exe
#   src/include/utils/blob.h
#   src/interfaces/ecpg/preproc/ecpg.exe
#   src/pl/plpgsql/src/libplpgsqldll.def
#   src/test/regress/libregressdll.def
#   src/test/regress/pg_regress.exe
#   src/timezone/zic.exe


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] .gitignore for some of cygwin files

2011-06-09 Thread Radosław Smogura
Magnus Hagander mag...@hagander.net Thursday 09 of June 2011 16:25:07
 On Thu, Jun 9, 2011 at 15:40, Radosław Smogura rsmog...@softperience.eu 
wrote:
  On Thu, 9 Jun 2011 14:12:59 +0200, Magnus Hagander wrote:
  On Wed, Jun 8, 2011 at 12:34, Radosław Smogura
  
  rsmog...@softperience.eu wrote:
  After compilation I got big list of differences about binaries and
  other files. You may see it in patch.
  
  Hmm. Does this really affect just cygwin, and not also mingw? I know
  msvc builds out-of-tree, but mingw doesn't, iirc? Where do the EXEs go
  there?
  
  What's nbproject?
  
  As for the patch itself, we don't normally keep comments in the
  .gitignore files, I'm not sure if we want to start doing that...
  
  Sorry for trash this is `git status` after compilation on branch
  
  [... same list as in the previous message: nbproject/, blob.c/blob.h, and
  the generated *.exe and lib*dll.def files ...]
 
 Based on this list, a global exclude for *.exe and lib*dll.def
 seems reasonable. Andrew, does that seem right to you as well? I
 definitely want to keep the nbproject thing out of there, that belongs
 in a personal exclude.
 
 utils/adt/blob.c and include/utils/blob.h - what is that? That's
 something that's actually from your development work and should not be
 ignored, right?

That's mine, sorry; looks like I didn't take the diff directly against master.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] .gitignore for some of cygwin files

2011-06-08 Thread Radosław Smogura
After compilation I got a big list of differences for binaries and other 
files. You can see it in the patch.


Regards,
Radek

diff --git a/.gitignore b/.gitignore
old mode 100644
new mode 100755
index 81c4d5e..75e1d9a
--- a/.gitignore
+++ b/.gitignore
@@ -20,6 +20,14 @@ lcov.info
 *.vcproj
 win32ver.rc
 
+# Defs from gcc in Windows
+*dll.def
+/src/backend/postgres.def
+
+# Windows executables; unless you keep files with this extension in the
+# repository, a global ignore is quite enough
+*.exe
+
 # Local excludes in root directory
 /GNUmakefile
 /config.log
@@ -28,3 +36,4 @@ win32ver.rc
 /pgsql.sln.cache
 /Debug/
 /Release/
+/nbproject/private/
\ No newline at end of file

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-06 Thread Radosław Smogura

On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:

Tom Lane t...@sss.pgh.pa.us writes:
Yes.  I think the appropriate problem statement is provide 
streaming
access to large field values, as an alternative to just 
fetching/storing
the entire value at once.  I see no good reason to import the 
entire
messy notion of LOBS/CLOBS.  (The fact that other databases have 
done it

is not a good reason.)


Spent some time in the archive to confirm a certain “déjà vu”
impression.  Couldn't find it.  Had to manually search in closed 
commit

fests… but here we are, I think:

  https://commitfest.postgresql.org/action/patch_view?id=70
  
http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us
  
http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com


Regards,


I have thought more about this in light of the references sent, but I still 
have in mind the construct
Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); // a bit 
outdated, we have Blu-ray now

ps = conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)");
where the 1st parameter is myWeddingDvd,
or if someone doesn't like Java he/she/it may wish to pass a C++ istream, 
or a C FILE*.


I think (with respect to the considerations below) this implicitly requires 
that LOBs be stored in one centralized place - no matter whether that is the 
file system, a special table, or something else - because when the statement 
is processed there is no idea which table the LOB will be associated with. 
If we want to TOAST it, where do we TOAST it? And what happens if the 
insertion is done by an SQL function which chooses the table depending on 
the BLOB's content?


Quite an interesting idea from the cited patch was the string identifying a 
LOB, but with the above it closes the road for JDBC to create a LOB. I also 
think that constructs which first insert a small LOB into the table, just to 
obtain some driver-dependent API, are a little old-fashioned.


Possible solutions, if we don't want centralized storage, may be:
1. Keep the BLOB in memory, but this may, depending on the implementation, 
limit the size of the initial BLOB.
2. Temporarily back the BLOB up in a file, then copy the file to the TOAST 
table when the values are stored; but some changes are still required to 
support LOBs in complex types and arrays.


So please give some ideas on how to resolve this - or maybe it has low 
priority?


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-06 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Monday 06 of June 2011 16:13:26
 =?UTF-8?Q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
   I think more about this with contrast to sent references, but I still
   have in my mind construct
   Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit
   outdated we have BlueRay
   conn.prepareStatemnt(INSERT INTO someonetubevideos values (?))
   where 1st parameter is myWeddingDvd,
 
 Yes, if you insist upon designing the API like that, then you come to
 the conclusion that you need global LOB identifiers.
 
 However, there are many ways to design this that don't work that way.
 One idea to think about is
 
   insert into someonetubevideos values('')
   returning open_for_write(videocolumn)
 
 which gives you back some kind of writable stream ID (this is a
 transient, within-session ID, not global) for the target field in the
 row you just inserted.
I know, but this is a little bit old-fashioned behaviour.

 BTW, as was noted upthread by Dimitri, this whole subject has been
 discussed before on pgsql-hackers.  You really ought to go re-read the
 previous threads.
 
   regards, tom lane

I read it, but it may be worth rethinking. Actually, regarding changes to 
TOAST (I mean streaming just for LOBs): I wrote down all the disadvantages I 
found of using TOAST for LOBs, and it looks like only the performance of the 
above is any kind of disadvantage; it also rules out some less useful 
concepts such as copy-on-write for LOBs.

Introducing streaming for TOAST alone is of little use; sorry, let me just 
cite from my aforementioned document:

"(This is generally about on-demand streaming of TOASTed values; in the 
context of LOBs it is acceptable, as long as non-transaction-aware LOBs are 
acceptable.) If we add streaming of TOASTed values, so that the caller gets 
some reference to the value, we need to ensure that the pointed-to data will 
be neither changed nor deleted - I think this will require the caller to add 
FOR UPDATE (or the server to add it silently) to each statement returning 
pointers to TOASTed values, since a client may run a transactional query 
while another client, just after the first, removes the record, commits, and 
calls VACUUM. In this situation, when the first client tries to read the 
data from the given row, it will get an error. This may be acceptable for 
LOBs (commonly LOBs are not transaction-aware), but I would be angry if it 
happened with VARCHAR."

If this is acceptable I will make the following changes.

Add:
- server_max_in_memory_lob_size - a server start-only GUC giving the maximum 
value of the client session parameter max_in_memory_lob.

- max_in_memory_lob - a session GUC describing how large a LOB may be kept 
in memory before being backed up to a file (a rough extension-style 
declaration is sketched below).

- recursive toasting/detoasting during insert/update/delete to search for 
LOBs (we need this for arrays and composite types) - this is for the last 
stage (an error disallowing LOBs in composites/arrays may be quite enough 
for a start) - I want LOBs to be the starting point for LOBing other types 
(e.g. some big arrays may be LOBbed).

- during toasting, the LOB will be toasted and, in place of the LOB, a 
reference to it will be put, encoded in the LOB datum.

- an API for LOB manipulation (a few changes to the current implementation) 
in which BLOB is a field type and a union of in_memory, in_file, in_toast. 
These few extra bytes will not noticeably affect the datum size, considering 
the size of a LOB.

- an API for maintaining temporary LOB files; we need this as a per-session 
list of id - file descriptor mappings, to prevent LOBs being stolen by 
different connections (security).

- a streaming API for TOASTed values (based on the COPY protocol, or a 
changed COPY protocol), or at least function calls - I haven't looked at 
this in the context of TOASTed LOBs.

Is this good?
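
(Purely illustrative: if prototyped as an extension first, the session GUC 
could be declared roughly like this; the GUC name, default, and null hooks 
are my assumptions, not the patch's:)

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* hypothetical default: keep LOBs up to 1MB in memory */
static int max_in_memory_lob = 1024 * 1024;

void _PG_init(void);

void
_PG_init(void)
{
    DefineCustomIntVariable("lob.max_in_memory_lob",
                            "Largest LOB kept in memory before spilling to a file.",
                            NULL,
                            &max_in_memory_lob,
                            1024 * 1024,        /* boot value */
                            0,                  /* min */
                            INT_MAX,            /* max */
                            PGC_USERSET,
                            0,                  /* flags */
                            NULL, NULL, NULL);  /* check/assign/show hooks */
}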

Regards,
Radek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Auto adjust send buffer size to congestion window

2011-06-05 Thread Radosław Smogura
Hi,

I've got an idea to auto-adjust the send buffer size to the size of the TCP 
congestion window. Would this increase performance, and in which way? I 
suppose throughput may be increased, but latency decreased. What do you 
think? A Linux-only sketch of reading the congestion window follows.
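
(For what it's worth, on Linux the kernel's view of the congestion window 
can be read back cheaply; struct tcp_info is Linux-specific, and the target 
size below is only my guess at what "adjust to the window" could mean:)

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* bytes the congestion window currently covers, or -1 on error;
 * tcpi_snd_cwnd is counted in MSS-sized segments */
static int cwnd_bytes(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;
    return (int) (ti.tcpi_snd_cwnd * ti.tcpi_snd_mss);
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int target = cwnd_bytes(fd);    /* on a fresh socket: the initial cwnd */

    if (target > 0)
        /* let the kernel send buffer roughly match the window */
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &target, sizeof(target));
    printf("congestion window covers %d bytes\n", target);
    return 0;
}

Whether resizing a user-space send buffer from this actually helps is 
exactly the open question - the kernel already autotunes its own buffers on 
modern Linux.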

Regards,
Radek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto adjust send buffer size to congestion window

2011-06-05 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Monday 06 of June 2011 04:07:41
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  I've got idea to auto adjust send buffer size to size of TCP congestion
  window. Will this increase performance and in which way. I suppose
  throughput may be increased, but latency decreased. What do You think?
 
 I think if that makes any difference at all, it would be proof that the
 TCP stack on your machine was implemented by incompetents.  The kernel
 should already be handling such considerations.
 
   regards, tom lane
I was thinking about the send buffer size in libpq.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-04 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Friday 03 of June 2011 16:44:13
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 
2011:
  So do I understand good should We think about create bettered TOAST to
  support larger values then 30-bit length? I like this much more,
  
  Good :-)
  
  (BTW while it'd be good to have longer-than-30 bit length words for
  varlena, I'm not sure we have room for that.)
 
 You wouldn't want to push such values around as whole values anyway.
 Possibly what would work here is a variant form of TOAST pointer for
 which we'd simply throw error if you tried to fetch the entire value
 at once.
 
   regards, tom lane

I mentioned the JDBC calls.
1.
b = conn.createBlob();
ps = conn.prepareStatement("INSERT INTO t (blob) VALUES (?)");
ps.setBlob(1, b);
ps.executeUpdate();

2.
Statements like this could occur as well:
SELECT blob_index_of(?, ?);
where the 1st ? is a blob and the 2nd one is some text/sequence

These calls must be supported to make BLOB as similar as possible to other 
types. This actually disallows putting the LOB in TOAST, as there is no 
relation (case 2) or the relation is not known (case 1) while reading the 
LOB - and in any case you can't skip bytes in the protocol stream. So the 
possible solutions are:

1. Create a temporary LOB in a file or in memory, depending on its size.
2. Use the existing Large Object interface.
3. The client does not send the LOB, just a fake ID for it; during the call 
the server asks the client to serialize the LOB by that fake ID.
4. Any other propositions?

I vote for 2.

For the pg_type/pg_class changes, I am thinking about removing haslobs and 
putting this into attstorage or somewhere else (for example 'l' may stand 
for LOBs, or a -3 length), but currently TOASTing of composites doesn't 
recurse, and those types are toasted as a whole. I may add recursion for 
those types and special handling for LOBs. In any case, handling this will 
be better placed in the toasting code, not in nodeModifyTable.

Any ideas about this?

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-03 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Friday 03 of June 2011 16:44:13
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 
2011:
  So do I understand good should We think about create bettered TOAST to
  support larger values then 30-bit length? I like this much more,
  
  Good :-)
  
  (BTW while it'd be good to have longer-than-30 bit length words for
  varlena, I'm not sure we have room for that.)
 
 You wouldn't want to push such values around as whole values anyway.
 Possibly what would work here is a variant form of TOAST pointer for
 which we'd simply throw error if you tried to fetch the entire value
 at once.
 
   regards, tom lane

OK, now what you have been talking about is clearer to me, but I still need 
to pass a constant ID to the client.

Actually, this variant must be passed to the client.

From the other side, a BLOB may be created before the statement is invoked 
or when it is called. This will require creating temporary BLOBs and 
introducing a v3.1 protocol, which will allow streaming values greater than 
4GB by passing -2 in the size/length fields, and introducing stream_in/out 
in pg_type (this is from my concept of a streaming protocol).

So I think it will be better to introduce the streaming protocol first, as 
LOBs sit on top of it. I will start a thread for this in a moment.

 Why?  The tuples are not going away due to MVCC anyway.
Vacuum / autovacuum + no lock may be enough, I think. A constant ID is 
required.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Streaming solution and v3.1 protocol

2011-06-03 Thread Radosław Smogura
Hello,

Sorry for the short introduction, and please, as far as possible, consider 
this separately from LOBs, as LOBs sit on top of it.

The idea of streaming is to reduce memory copying, mainly while receiving 
and sending tuples. Currently receive works as follows:
1. Read the bytes of the tuple (allocating x memory).
2. Eventually convert them to the database encoding.
3. Use this data to create a datum (which is a slightly changed copy of 1 
or 2).
Streaming will be allowed only in binary mode, and stream in/out will 
actually return binary data.

Look for example at the execution chain from textrecv.

The idea is to add stream_in and stream_out columns to pg_type.

When a value is to be (de)serialized, the sin/sout function is called. The 
caller must pass the length of the data and a stream struct (similar to a C 
FILE*).

The caller should validate that all bytes have been consumed (simple 
methods will be exposed for this).

To implement (code API requirements):
The first stream is a buffered socket reader.

Fast substreams - create a fast stream limited to x bytes on top of another 
stream.

Skipping bytes + skipAll().

Stream filtering - fast encoding conversion (it will be faster if the 
conversion occurs in buffered chunks).

Support for a direct PG printf() version. Linux has the ability to create 
cookie streams and use them with fprintf(), so it is a great advantage for 
formatting huge strings (a sketch is below); other systems should buffer 
the output. One problem: if a Linux cookie write fails, will it have 
written something to the output already? A Windows proxy will push the 
value to a temporary buffer.
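
(A hedged sketch of that cookie-stream trick; fopencookie() is glibc-
specific. The write callback just forwards bytes and stands in for whatever 
would really push data onto the wire:)

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>

/* stand-in sink: forward formatted bytes to where the protocol layer
 * would send them (stderr here, just for the demo) */
static ssize_t sink_write(void *cookie, const char *buf, size_t len)
{
    return (ssize_t) fwrite(buf, 1, len, (FILE *) cookie);
}

int main(void)
{
    cookie_io_functions_t io = { .write = sink_write };
    FILE *out = fopencookie(stderr, "w", io);

    if (out == NULL)
        return 1;
    setvbuf(out, NULL, _IOFBF, 8192);   /* format in buffered chunks */
    fprintf(out, "tuple %d: %s\n", 42, "a huge formatted value");
    fclose(out);
    return 0;
}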

A good idea may be to introduce a new version of the protocol reserving the 
length field values
(-2) for fixed-size streams above 4GB
(-3) for chunked streaming - this is actually innovative functionality and 
is not required by any driver.

In streaming it may be imagined that the socket's fd will be passed to the 
sin functions.

Problem: during output, something fails while writing. Resolution: add some 
control flags for every n bytes sent to the client. This prevents sending 
e.g. 4GB of data when the first byte already failed - you send only n bytes 
and then an abort is received - or send the data in frames.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-03 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Friday 03 of June 2011 18:08:56
 Robert Haas robertmh...@gmail.com writes:
  On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
  
  rsmog...@softperience.eu wrote:
  1. No tracking of unused LO (you store just id of such object). You may
  leak LO after row remove/update. User may write triggers for this, but
  it is not argument - BLOB type is popular, and it's simplicity of use
  is quite important. When I create app this is worst thing.
  
  2. No support for casting in UPDATE/INSERT. So there is no way to simple
  migrate data (e.g. from too long varchars). Or to copy BLOBs.
  
  3. Limitation of field size to 1GB.
  
  As a general point, it would probably be a good idea to address each
  of these issues separately, and to have a separate discussion about
  each one.
  
  As to #1 specifically, if you use a text or bytea field rather than a
  large object per se, then this issue goes away.  But then you lose the
  streaming functionality.  So at least some people here are saying that
  we should try to fix that by adding the streaming functionality to
  text/bytea rather than by doing anything to the large object facility.
 
 #2 is also a problem that only becomes a problem if you insist that LOBs
 have to be a distinct kind of value.
 
   regards, tom lane

And one more topic to discuss: should a blob be referenceable? E.g. I 
create a new Blob in JDBC and set a stream for it - what should happen if I 
then call
UPDATE t SET b = ? WHERE 1=1
?

This is not about copy-on-write.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-02 Thread Radosław Smogura

On Thu, 02 Jun 2011 15:39:39 +0300, Peter Eisentraut wrote:

On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
I partialy implemented following missing LOBs types. Requirement for 
this was
to give ability to create (B/C)LOB columns and add casting 
functionality e.g.

SET my_clob = 'My long text'.

Idea is as follow:
0. Blob is two state object: 1st in memory contains just bytea, 
serialized

contains Oid of large object.
1. Each type has additional boolean haslobs, which is set 
recursivly.
2. Relation has same bool haslobs (used to speed up tables without 
LOBs)
3. When data are inserted/updated then special function is called 
and tuple
is modified in this way all LOBs are serialized to (old) LOB table 
and just

Oid is stored.
4. When removed LOB is removed from (old) LOB table.


Superficially, this looks like a reimplementation of TOAST.
It may look similar, but datums don't support more than a 32-bit length, 
and the size of any field is limited to 1GB, am I right? Serialization is 
only for casting simple values < 1GB, and for simple operations, to avoid 
the overhead of creating hundreds of LOBs.


What functionality exactly do you envision that the BLOB and CLOB 
types would
need to have that would warrant treating them different from, say, 
bytea

and text?


Actually I thought about less sophisticated support for LOBs: supporting 
casting and copying data, plus the idea, well known from other databases, 
that LOBs are not downloaded during normal query execution (just their ids 
are taken). Currently LOBs are not connected with tables, so deleting rows 
doesn't delete the LOB (the table actually holds the Oid of the large 
object); there is no support for casting to/from LOB and no support for 
CLOBs. Some drivers try to emulate BLOBs/CLOBs, but it is not perfect, 
mainly for the above reasons.


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-02 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Thursday 02 of June 2011 16:42:42
 Robert Haas robertmh...@gmail.com writes:
  But these problems can be fixed without inventing a completely new
  system, I think.  Or at least we should try.  I can see the point of a
  data type that is really a pointer to a LOB, and the LOB gets deleted
  when the pointer is removed, but I don't think that should require
  far-reaching changes all over the system (like relhaslobs) to make it
  work efficiently.  I think you need to start with a problem statement,
  get agreement that it is a problem and on what the solution should be,
  and then go write the code to implement that solution.
 
 Yes.  I think the appropriate problem statement is provide streaming
 access to large field values, as an alternative to just fetching/storing
 the entire value at once.  I see no good reason to import the entire
 messy notion of LOBS/CLOBS.  (The fact that other databases have done it
 is not a good reason.)
 
 For primitive types like text or bytea it seems pretty obvious what
 streaming access should entail, but it might be interesting to
 consider what it should mean for structured types.  For instance, if I
 have an array field with umpteen zillion elements, it might be nice to
 fetch them one at a time using the streaming access mechanism.  I don't
 say that that has to be in the first version, but it'd be a good idea to
 keep that in the back of your head so you don't design a dead-end
 solution that can't be extended in that direction.
 
   regards, tom lane

In the context of LOBs, streaming is resolved... I use the current LO 
functionality (so a driver may be able to read LOBs the way psql \lo_export 
does, or using the COPY subprotocol), and the client should get just the 
LO's id. BLOBs in this implementation, as Robert wanted, are just a wrapper 
for core LO, with some extensions for special situations. Adding relhaslob 
in this implementation is quite important so that the tupledesc is not 
examined for each table operation; the value could instead be deduced 
during relation open (with a performance penalty). I saw something similar 
done a few lines above, where triggers are fired, and a few lines below, 
where indices are updated.

Currently BLOBs may be emulated using core LO (the JDBC driver does it), 
but among everything else, the problems are, looking from the point of view 
of application development:

1. No tracking of unused LOs (you store just the id of such an object). You 
may leak an LO after a row remove/update. The user may write triggers for 
this, but that is not an argument - the BLOB type is popular, and its 
simplicity of use is quite important. When I create an app this is the 
worst thing.

2. No support for casting in UPDATE/INSERT. So there is no simple way to 
migrate data (e.g. from too-long varchars), or to copy BLOBs.

3. The limitation of field size to 1GB.

Another solution I was thinking about is to introduce system triggers (such 
triggers can't be disabled or removed), so there would be a new flag in the 
triggers table.

Now I think we should try to mix both approaches, as system triggers may 
give an interesting API for other developers.

Other databases (may) store LOBs, arrays, and composites in external 
tables, so the user gets just the id of such an object.

I will think about streaming for about two weeks; I have some concepts for 
it, from the point of view of memory consumption and performance. I will 
send the concept later - I want to think about it once more and check what 
can actually be done.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BLOB support

2011-06-02 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Thursday 02 of June 2011 19:43:16
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Tom Lane t...@sss.pgh.pa.us Thursday 02 of June 2011 16:42:42
  
  Yes.  I think the appropriate problem statement is provide streaming
  access to large field values, as an alternative to just fetching/storing
  the entire value at once.  I see no good reason to import the entire
  messy notion of LOBS/CLOBS.  (The fact that other databases have done it
  is not a good reason.)
  
  In context of LOBs streaming is resolved... I use current LO
  functionallity (so driver may be able to read LOBs as psql \lo_export
  does it or using COPY subprotocol) and client should get just LO's id.
 
 Just to be clear: I do not want to expose a concept of object IDs for
 field values in the first place.  All of the problems you enumerate stem
 from the idea that LOBs ought to be a distinct kind of field, and I
 don't buy that.
 
   regards, tom lane

So do I understand correctly that we should think about creating a bettered 
TOAST to support values larger than the 30-bit length allows? I like this 
much more, but without object IDs, querying a relation with LOBs will 
require locking the relation for some time, as the client will need to 
reference the LOB in some way - I think using the TID or some derivative of 
the TID, am I right?

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] BLOB support

2011-06-01 Thread Radosław Smogura
Hello,

I have partially implemented the following missing LOB types. The 
requirement for this was to give the ability to create (B/C)LOB columns and 
to add casting functionality, e.g. SET my_clob = 'My long text'.

The idea is as follows:
0. A blob is a two-state object: in its 1st, in-memory state it contains 
just a bytea; serialized, it contains the Oid of a large object (a rough 
layout is sketched after this list).
1. Each type has an additional boolean haslobs, which is set recursively.
2. Each relation has the same bool haslobs (used to speed up tables without 
LOBs).
3. When data is inserted/updated, a special function is called and the 
tuple is modified in such a way that all LOBs are serialized to the (old) 
LOB table and just the Oid is stored.
4. On removal, the LOB is removed from the (old) LOB table.
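
(A rough illustration of item 0 only - the names and layout below are mine, 
not the patch's:)

#include <stdint.h>

typedef enum BlobState
{
    BLOB_IN_MEMORY,             /* small: payload carried inline, bytea-style */
    BLOB_SERIALIZED             /* big: only the large-object Oid is stored */
} BlobState;

typedef struct Blob
{
    int32_t     vl_len_;        /* varlena-style length word */
    uint8_t     state;          /* a BlobState value */
    union
    {
        struct
        {
            int32_t len;        /* inline payload and its length */
            char    data[1];    /* FLEXIBLE_ARRAY_MEMBER in real PG code */
        }           mem;
        uint32_t    lob_oid;    /* reference into the (old) LOB table */
    }           u;
} Blob;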

I have implemented:
- 0.
- 1. and 2. without altering relations and types.
- 3. Inserts only; updates need polishing. No recursion (no support for 
BLOBs in composites or arrays).

The current patch is attached (many changes to pg_type.h - new column 
haslobs added); it's at an _early_ stage, for those who want to look inside.

Any ideas or suggestions?

Regards,
Radek

P.S.
I'm in the middle of moving, and I have limited access to the Internet.


blob_20110601.patch.bz2
Description: application/bzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving the memory allocator

2011-04-26 Thread Radosław Smogura
I haven't followed this topic carefully, so sorry if I repeat something 
already written, but I have thought about the following approach, at least 
for message sending, etc. (a minimal sketch follows below):


1. When initializing a MemoryContext (pool), give one parameter that will 
be the stack size. The stack is an addition to the normal operations.
2. Operations on the stack are only allocations, so alloc will be just a 
pointer bump.
3. When allocating memory for messages, internally set the size to be 1/3 
greater (or 2/3, I don't remember well) for realloc; this may help when an 
encoding change occurs.
4. free is a no-op (as was pointed out); the stack will be released, or the 
pointer reset, when the context is released.


From one point of view this may waste some memory (there is no free), but 
for messages that should not happen, and even giving 100kB of memory to the 
stack may be quite enough, depending of course on how much data you send.
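
(A minimal sketch of the idea, assuming the stack is just a preallocated 
chunk owned by the context; the names are mine:)

#include <stdlib.h>

typedef struct BumpStack
{
    char   *base;               /* start of the preallocated chunk */
    size_t  size;               /* total capacity */
    size_t  used;               /* current offset: "the pointer" */
} BumpStack;

static BumpStack *bump_create(size_t size)
{
    BumpStack *s = malloc(sizeof(BumpStack));

    s->base = malloc(size);
    s->size = size;
    s->used = 0;
    return s;
}

/* alloc is just a pointer bump; NULL means "fall back to regular palloc" */
static void *bump_alloc(BumpStack *s, size_t n)
{
    void   *p;

    n = (n + 7) & ~(size_t) 7;  /* keep 8-byte alignment */
    if (s->used + n > s->size)
        return NULL;
    p = s->base + s->used;
    s->used += n;
    return p;
}

/* individual frees are no-ops; releasing the context resets the offset */
static void bump_reset(BumpStack *s)
{
    s->used = 0;
}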


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving the memory allocator

2011-04-26 Thread Radosław Smogura

On Tue, 26 Apr 2011 14:25:10 +0200, Andres Freund wrote:

On Tuesday, April 26, 2011 01:59:45 PM Radosław Smogura wrote:
 I didn't followed this topic carefully, so sorry If I wrote 
something
 that was written, but I thought about following approach at least 
for

 message sending, etc.:

 1. When initializing MemoryContext (pool) give one parameter that 
will

 be stack size. Stack is addition to normal operations.

You can't allocate memory on the stack and return that.

Andres


I didn't write "allocate on the stack". I thought about preallocating an 
n-byte chunk for stack-like memory, but without popping.


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-18 Thread Radosław Smogura

On Sun, 17 Apr 2011 21:06:17 -0400, Robert Haas wrote:

On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
rsmog...@softperience.eu wrote:

Each process has simple mirror of shared descriptors.

I believe that modifications to buffer content may be only done 
when holding
exclusive lock (with some simple exceptions) (+ MVCC), actually I 
saw only two
things that can change already loaded data and cause damage, you 
have
described (setting hint bits during scan, and vacuum - 1st may only 
cause, I
think, that two processes will ask for same transaction statuses 
except
vacuum, 2nd one is impossible as vacumm requires exclusive pin). 
When buffer
tag is changed the version of buffer is bumped up, and checked 
against local

version - this about reading buffer.


Yes, an exclusive lock is required for substantive content changes.
But if vacuum cleaning up the buffer is an issue for your patch, then
it's probably also a problem if someone grabs an exclusive content
lock and deletes the tuple (by setting XMAX) and some other backend
later sees the old buffer contents after having in the meanwhile 
taken
a new snapshot; or if likewise someone grabs an exclusive-lock, adds 
a

tuple, and then your backend takes a new snapshot and then sees the
old buffer contents.  Basically, any time someone grabs an
exclusive-lock and releases it, it's necessary for all observers to
see the updated contents by the time the exclusive lock is released.

In other cases after obtaining lock check is done if buffer has 
associated
updatable buffer and if local mirror has it too, then swap should 
take

place.


I think this check would have to be done every time someone
share-locks the buffer, which seems rather expensive.

Logic about updatable buffers is similar to shared buffers, each 
updatable
buffer has pin count, and updatable buffer can't be free if someone 
uses it,
but in contrast to normal buffers, updatable buffers doesn't have 
any
support for locking etc. Updatable buffers exists only on free list, 
or when

associated with buffer.


I don't see how you're going to get away with removing buffer locks.
They exist for a reason, and adding mmap() to the mix is going to
require MORE locking, not less.

In future, I will change version to shared segment id, something 
like
relation's oid + block, but ids will have continuous numbering 
1,2,3..., so I
will be able to bypass smgr/md during read, and tag version check - 
this looks

like faster solution.


I don't understand this part at all.


To clarify my previous post: updatable buffers are implemented in shared 
memory, so there is no way a process has its own copy of the data.


Regards,
Radek.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  No, no, no :) I wanted to do this, but from above reason I skipped it. I
  swap VM pages, I do remap, in place where the shared buffer was I put
  mmaped page, and in place where mmaped page was I put shared page (in
  certain cases, which should be optimized by e. g. read for update, for
  initial read of page in process I directly points to shared buffer), it
  can be imagined as I affects TLB. This what I call VM swap is
  remapping, so I don't change pointers, I change only where this pointers
  points in physical memory, preserving same pointer in Virtual Memory.
 
 ... Huh?  Are you saying that you ask the kernel to map each individual
 shared buffer separately?  I can't believe that's going to scale to
 realistic applications.
 
   regards, tom lane

No, I do (pseudo-code; the real mremap() also takes the old and new sizes):
mremap(mmap_buff_A, MREMAP_FIXED, temp);
mremap(shared_buff_Y, MREMAP_FIXED, mmap_buff_A);
mremap(temp, MREMAP_FIXED, shared_buff_Y);

This is the additional overhead - and it may have some disadvantages. All 
regions (SysV / POSIX mmap) are mapped beforehand. A runnable sketch of the 
swap is below.
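
(Since this is easy to get wrong, a minimal runnable demo of the three-step 
swap; Linux-only, error checking omitted for brevity:)

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t pg = (size_t) sysconf(_SC_PAGESIZE);

    char *a = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *y = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    char *tmp = mmap(NULL, pg, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* scratch slot */

    strcpy(a, "buffer A");
    strcpy(y, "buffer Y");

    /* A -> scratch, Y -> A's slot, scratch -> Y's slot: the virtual
     * addresses a and y stay valid but now name each other's pages */
    mremap(a,   pg, pg, MREMAP_MAYMOVE | MREMAP_FIXED, tmp);
    mremap(y,   pg, pg, MREMAP_MAYMOVE | MREMAP_FIXED, a);
    mremap(tmp, pg, pg, MREMAP_MAYMOVE | MREMAP_FIXED, y);

    printf("at a: \"%s\", at y: \"%s\"\n", a, y);   /* swapped contents */
    return 0;
}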

I couldn't believe it either, but as I did some work on the read path, I 
was in a dead corner:
1. Create a Read Before Buffer (connected with XLOG) that will store each 
page before modification (the page should be flushed and synced to the log).
2. Rewrite the whole db to repoint pointers, or similar stuff (I did a few 
steps towards this).
3. Or find something different.

I couldn't believe it either; that's why I still work on it. I saw it gain 
speed for a few simple updates. I'm not quite sure where the gain comes 
from; I can only think it was from the pre-update reads. But full checks 
will come once the update path reaches some good state.

Regards,
Radek
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 17:48:56
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
  
  ... Huh?  Are you saying that you ask the kernel to map each individual
  shared buffer separately?  I can't believe that's going to scale to
  realistic applications.
  
  No, I do
  mrempa(mmap_buff_A, MAP_FIXED, temp);
  mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
  mrempa(tmp, MAP_FIXED, mmap_buff_A).
 
 There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
 nor on my quite-up-to-date OS X box.  The Linux man page for it says
 This call is Linux-specific, and should not be used in programs
 intended to be portable.  So if the patch is dependent on that call,
 it's dead on arrival from a portability standpoint.
Good point. This is from the initial concept; I actually did it this way so 
as not to leave gaps in the VM into which a library or something else could 
get mmaped. Lately I have been thinking about using mmap to replace just 
one VM page.

 But in any case, you didn't explain how use of mremap() avoids the
 problem of the kernel having to maintain a separate page-mapping-table
 entry for each individual buffer.  (Per process, yet.)  If that's what's
 happening, it's going to be a significant performance penalty as well as
 (I suspect) a serious constraint on how many buffers can be managed.
 
   regards, tom lane
The kernel merges vm_structs, so mappings are compacted. I'm not a kernel 
specialist, but memory consumption aside, for non-compacted mappings the 
kernel uses btrees for dealing with the TLB, so it should not matter 
whether there are 100 vm_structs or 10 vm_structs.

The swap isn't made everywhere. When a buffer is initially read 
(privaterefcount == 1), any access to this buffer will directly point to 
the latest valid area; if it has an assigned shmem area then that will be 
used. I plan to add a "read buffer for update" to prevent swaps when it's 
almost certain that the buffer will be used for an update.

I measured the performance of page modifications (with unpinning, the full 
process, in a stand-alone unit test): it takes 2x-3x the time of normal 
page reads. But this result may not be reliable, as I saw that memcpy to 
memory above 2GB is slower than memcpy to the first 2GB (this suggests 
trying to put some shared structs < 2GB).

I know that this patch is a big question mark. Sometimes I'm optimistic, 
and sometimes I'm pessimistic, about the final result.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Andres Freund and...@anarazel.de Sunday 17 April 2011 20:02:11
 On Sunday 17 April 2011 19:26:31 Radosław Smogura wrote:
  Kernel merges vm_structs. So mappings are compacted. I'm not kernel
  specialist, but skipping memory consumption, for not compacted mappings,
  kernel uses btrees for dealing with  TLB, so it should not matter if
  there is  100 vm_structs or 10 vm_structs.
 
 But the CPUs TLB cache has maybe 16/256 (1lvl, 2nd) to 64/512 entries. That
 will mean that there will be cachemisses all over.
 Additionally your scheme requires flushing it regularly...
 
 Andres

I only know that the Phenom has 4096 entries, I think, which covers 16MB of 
memory. But I was talking about the memory usage of struct vm_struct in the 
kernel. I tried huge pages as well, but I couldn't write a really fast 
allocator for them; mine was slower than malloc, maybe for different 
reasons.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Robert Haas robertmh...@gmail.com Sunday 17 April 2011 22:01:55
 On Sun, Apr 17, 2011 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Tom Lane t...@sss.pgh.pa.us Sunday 17 April 2011 01:35:45
  
  ... Huh?  Are you saying that you ask the kernel to map each individual
  shared buffer separately?  I can't believe that's going to scale to
  realistic applications.
  
  No, I do
  mrempa(mmap_buff_A, MAP_FIXED, temp);
  mremap(shared_buff_Y, MAP_FIXED, mmap_buff_A),
  mrempa(tmp, MAP_FIXED, mmap_buff_A).
  
  There's no mremap() in the Single Unix Spec, nor on my ancient HPUX box,
  nor on my quite-up-to-date OS X box.  The Linux man page for it says
  This call is Linux-specific, and should not be used in programs
  intended to be portable.  So if the patch is dependent on that call,
  it's dead on arrival from a portability standpoint.
  
  But in any case, you didn't explain how use of mremap() avoids the
  problem of the kernel having to maintain a separate page-mapping-table
  entry for each individual buffer.  (Per process, yet.)  If that's what's
  happening, it's going to be a significant performance penalty as well as
  (I suspect) a serious constraint on how many buffers can be managed.
 
 I share your suspicions, although no harm in measuring it.
 
 But I don't understand is how this approach avoids the problem of
 different processes seeing different buffer contents.  If backend A
 has the buffer mmap'd and backend B wants to modify it (and changes
 the mapping), backend A is still looking at the old buffer contents,
 isn't it?  And then things go boom.

Each process has a simple mirror of the shared descriptors.

I believe that modifications to buffer content may only be done while 
holding an exclusive lock (with some simple exceptions) (+ MVCC). Actually 
I saw only two things that can change already-loaded data and cause the 
damage you have described (setting hint bits during a scan, and vacuum - 
the 1st may only cause, I think, two processes to ask for the same 
transaction statuses; the 2nd is impossible, as vacuum requires an 
exclusive pin). When the buffer tag is changed, the version of the buffer 
is bumped and checked against the local version - this is about reading a 
buffer.

In other cases, after obtaining the lock, a check is done whether the 
buffer has an associated updatable buffer; if the local mirror has it too, 
then a swap should take place.

The logic for updatable buffers is similar to shared buffers: each 
updatable buffer has a pin count, and an updatable buffer can't be freed 
while someone uses it; but in contrast to normal buffers, updatable buffers 
don't have any support for locking etc. Updatable buffers exist only on the 
free list, or when associated with a buffer.

In the future, I will change the version to a shared segment id, something 
like the relation's oid + block, but the ids will have continuous numbering 
1,2,3..., so I will be able to bypass smgr/md during reads, and the tag 
version check - this looks like the faster solution.

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-17 Thread Radosław Smogura
Robert Haas robertmh...@gmail.com Monday 18 April 2011 03:06:17
 On Sun, Apr 17, 2011 at 5:32 PM, Radosław Smogura
 
 rsmog...@softperience.eu wrote:
  Each process has simple mirror of shared descriptors.
  
  I believe that modifications to buffer content may be only done when
  holding exclusive lock (with some simple exceptions) (+ MVCC), actually
  I saw only two things that can change already loaded data and cause
  damage, you have described (setting hint bits during scan, and vacuum -
  1st may only cause, I think, that two processes will ask for same
  transaction statuses except vacuum, 2nd one is impossible as vacumm
  requires exclusive pin). When buffer tag is changed the version of
  buffer is bumped up, and checked against local version - this about
  reading buffer.
 
 Yes, an exclusive lock is required for substantive content changes.
 But if vacuum cleaning up the buffer is an issue for your patch, then
 it's probably also a problem if someone grabs an exclusive content
 lock and deletes the tuple (by setting XMAX) and some other backend
 later sees the old buffer contents after having in the meanwhile taken
 a new snapshot; or if likewise someone grabs an exclusive-lock, adds a
 tuple, and then your backend takes a new snapshot and then sees the
 old buffer contents.  Basically, any time someone grabs an
 exclusive-lock and releases it, it's necessary for all observers to
 see the updated contents by the time the exclusive lock is released.
  In other cases after obtaining lock check is done if buffer has
  associated updatable buffer and if local mirror has it too, then swap
  should take place.
 
 I think this check would have to be done every time someone
 share-locks the buffer, which seems rather expensive.
I don't treat these as issues, but it is a disadvantage.

  Logic about updatable buffers is similar to shared buffers, each
  updatable buffer has pin count, and updatable buffer can't be free if
  someone uses it, but in contrast to normal buffers, updatable buffers
  doesn't have any support for locking etc. Updatable buffers exists only
  on free list, or when associated with buffer.
 
 I don't see how you're going to get away with removing buffer locks.
 They exist for a reason, and adding mmap() to the mix is going to
 require MORE locking, not less.
 
  In future, I will change version to shared segment id, something like
  relation's oid + block, but ids will have continuous numbering 1,2,3...,
  so I will be able to bypass smgr/md during read, and tag version check -
  this looks like faster solution.
 
 I don't understand this part at all.
Versioning is the approach from the time when I expected really frequent 
changes of mmaped areas and allocated only parts of segments; but now each 
segment is mmaped, with a reservation, to its full possible size, so the 
addresses of segments can't change (the only problem is segment deletion).

Regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Radosław Smogura
Greg Stark gsst...@mit.edu Saturday 16 April 2011 13:00:19
 On Sat, Apr 16, 2011 at 7:24 AM, Robert Haas robertmh...@gmail.com wrote:
  The OP says that this patch maintains the WAL-before-data rule without
  any explanation of how it accomplishes that seemingly quite amazing
  feat.  I assume I'm going to have to read this patch at some point to
  refute this assertion, and I think that sucks. I am pretty nearly 100%
  confident that this approach is utterly doomed, and I don't want to
  spend a lot of time on it unless someone can provide me with a
  compelling explanation of why my confidence is misplaced.
 
 Fwiw he did explain how he did that. Or at least I think he did --
 it's possible I read what I expected because what he came up with is
 something I've recently been thinking about.
 
 What he did, I gather, is treat the mmapped buffers as a read-only
 copy of the data. To actually make any modifications he copies it into
 shared buffers and treats them like normal. When the buffers get
 flushed from memory they get written and then the pointers get
 repointed back at the mmapped copy. Effectively this means the shared
 buffers get extended to include all of the filesystem cache instead of
 having to evict buffers from shared buffers just because you want to
 read another one that's already in filesystem cache.
 
 It doesn't save the copying between filesystem cache and shared
 buffers for buffers that are actually being written to. But it does
 save some amount of other copies on read-only traffic and it can even
 save some i/o. It does require a function call before each buffer
 modification where the pattern is currently lock buffer, mutate
 buffer, mark buffer dirty. From what he describes he needs to add a
 prepare buffer for mutation between the lock and mutate.
 
 I think it's an interesting experiment and it's good to know how to
 solve some of the subproblems. Notably, how do you extend files or
 drop them atomically across processes? And how do you deal with
 getting the mappings to be the same across all the processes or deal
 with them being different? But I don't think it's a great long-term
 direction. It just seems clunky to have to copy things from mmapped
 buffers to local buffers and back. Perhaps the performance testing
 will show that clunkiness is well worth it but we'll need to see that
 for a wide variety of workloads to judge that.

In short: I swap, or rather exchange (clash of terms), VM pages to preserve pointers 
(only when needed). I tried to point directly at the new memory area, but I saw that 
some parts of the code really depend on the memory behind the original pointers, e.g. 
vacuum uses hint bits set by the previous scan (it depends on whether the bit is set 
or not, so for vacuum it's not only a hint!). From this one case alone I can't assume 
there are no more such places, so the VM page swap handles it for me.

Stand-alone tests show me that this process (with the copy from mmap) takes 2x-3x 
longer than before. But as long as nobody updates the whole table, the benefit comes 
from the pre-update scan, index scans, and larger memory availability (you don't eat 
cache memory to keep a copy of the cache in ShMem). Everything may be slower when the 
database fits in ShMem, and similarly 2nd-level buffers may increase performance 
slightly.

I reserve memory for the whole segment even if the file is smaller. Extending is done 
by writing one byte at the end of the block (here may come a deal with Unified Buffer 
Caches, if I remember the name well). For current processors and the current 
implementation, database size is limited to about 260TB (no dynamic segment 
reservation is performed).

Truncation not implemented.

Each buffer descriptor has a tagVersion to simply check whether the buffer tag has 
changed. Descriptors are (partially) mirrored in local memory, and the versions are 
checked. Currently each re-read is directed through smgr/md, but introducing a shared 
segment id, and assuming each segment has a constant maximum number of blocks, will 
make it faster (this will be something like the current buffer tag); even the version 
field will then become unneeded.

I saw problems with vacuum, as it reopens the relation and I got mappings of the same 
file twice (a minor problem). The important part will be deletion, when pointers must 
be invalidated in a good way.

Regards,
Radek.



Re: [HACKERS] MMAP Buffers

2011-04-16 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Saturday 16 April 2011 17:02:32
 Greg Stark gsst...@mit.edu writes:
  What he did, I gather, is treat the mmapped buffers as a read-only
  copy of the data. To actually make any modifications he copies it into
  shared buffers and treats them like normal. When the buffers get
  flushed from memory they get written and then the pointers get
  repointed back at the mmapped copy.
 
 That seems much too late --- won't other processes still be looking at
 the stale mmap'ed version of the page until a write-out happens?
No, no, no :) I wanted to do this, but for the above reason I skipped it. I swap VM 
pages; I remap: in the place where the shared buffer was I put the mmapped page, and 
in the place where the mmapped page was I put the shared page (in certain cases; this 
should be optimized by e.g. read-for-update, and for the initial read of a page in a 
process I point directly to the shared buffer). It can be imagined as if I affect the 
TLB. This thing I call a VM swap is remapping, so I don't change pointers; I change 
only where those pointers point in physical memory, preserving the same pointer in 
virtual memory.

If 0x1 is the start of buffer 1 (at relation 1, block 1), I have:

Before swap:
0x1     .. 0x1     + BLCKSZ -> mmapped area
0xf1000 .. 0xf1000 + BLCKSZ -> ShMem

After swap:
0x1     .. 0x1     + BLCKSZ -> ShMem
0xf1000 .. 0xf1000 + BLCKSZ -> mmapped area

This is the reason I put /proc/{pid}/maps in the crash reports. For example, the maps 
after a swap look like this (from a crash report):

[...]
#Data mappings
7fe69b7e3000-7fe69b7ef000 r--s  08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fe69b7ef000-7fe69b7f1000 rw-s 00148000 00:04 8880132
/SYSV0052ea91 (deleted)
7fe69b7f1000-7fe6db7e3000 r--s e000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
[...]
#SysV shmem mappings

7fec60788000-7fec6078c000 rw-s 00144000 00:04 8880132
/SYSV0052ea91 (deleted)
7fec6078c000-7fec6078e000 r--s c000 08:03 3196408
/home/radek/src/postgresql-2nd-level-cache/db/base/12822/12516
7fec6078e000-7fec6079c000 rw-s 0014a000 00:04 8880132
/SYSV0052ea91 (deleted)
[...]

Without the swap, 12516 would be mapped as one VM region of size equal to 
BLCKSZ * BLOCKS_PER_SEGMENT (which is about 1GB).

When a process reads a buffer (or after taking a lock), the shared buffer descriptor 
is checked to see whether the page was modified (currently: is it dirty). If yes, do 
the swap if the page is currently in use, or use the SysV shared area directly if the 
page is only just being pinned by the process.

Regards,
Radek

 I'm pretty concerned about the memory efficiency of this too, since it
 seems like it's making it *guaranteed*, not just somewhat probable,
 that there are two copies in RAM of every database page that's been
 modified since the last checkpoint (or so).
 
   regards, tom lane



[HACKERS] MMAP Buffers

2011-04-15 Thread Radosław Smogura

Hello,

If I may, I want to share a concept of using mmap in PG. It's far, far away 
from perfect, but it keeps WAL before data. I have created a table with an 
index, inserted a few values, and done a VACUUM FULL on this table. Database 
inits are welcome from the original sources.

Performance of reads (if the backend is loaded) is really good; query time 
goes down from 450ms to about 410ms. Updates may be slower, but work is in 
progress (I will start with writes, as I got to the point where simple 
updates may be performed). Even though I didn't cover all aspects of 
updating, it's simple to do: just call PrepareBufferToUpdate before 
modifying a buffer. Of course some ideas for improving this are still in 
my head.


Any comments, suggestions welcome.

I didn't include this as a diff attachment because of its ~150kb size (mainly 
configure scripts, which are included in the VCS). Because of this, you may 
download it from 
http://softperience.eu/downloads/pg_mmap_20110415.diff.bz2 (Legal: work 
under the PostgreSQL BSD License). The patch is just a git diff; later I will 
try to grab some git hosting.


Regards and have a nice day,
Radek.

P.S. That problem with asserts and signals I wrote about... I merged with the 
latest master and rebuilt the code. I think I had forgotten to rebuild it after 
the previous merge.




Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Radosław Smogura

On Fri, 15 Apr 2011 14:33:37 +0300, Heikki Linnakangas wrote:

On 15.04.2011 13:32, Radosław Smogura wrote:
If I may, I want to share a concept of using mmap in PG. It's far, far away 
from perfect, but it keeps WAL before data. I have created a table with an 
index, inserted a few values, and done a VACUUM FULL on this table. Database 
inits are welcome from the original sources.

Performance of reads (if the backend is loaded) is really good; query time 
goes down from 450ms to about 410ms. Updates may be slower, but work is in 
progress (I will start with writes, as I got to the point where simple 
updates may be performed). Even though I didn't cover all aspects of 
updating, it's simple to do: just call PrepareBufferToUpdate before 
modifying a buffer. Of course some ideas for improving this are still in 
my head.

Any comments, suggestions welcome.



The patch is quite hard to read because of random whitespace changes
and other stylistic issues, but I have a couple of high-level
questions on the design:
Yes; in NetBeans I had really long gaps (probably 8 spaces, from tabs), so 
deeper ifs and comments at the ends of lines went off my screen. I really 
wanted not to reformat this, but sometimes I had to.



* Does each process have its own mmappings, or are the mmapping done
to shared buffers?
Those are MAP_SHARED mappings, but each process has its own pointer to them.



* How do you handle locking?
I do not do locking... I do a different (worse) thing... When a buffer is to 
be updated, it gets a shared buffer and the content is copied (so the 
situation is almost the same as with fread); depending on the situation the 
content is used directly, or the pages between the mmapped and shared 
(updatable) regions are swapped, which keeps tuple pointers etc. valid. I 
really would be happy if a method like locking out flushes to the file could 
exist.



Do you still need to allocate a shared buffer for each mmapped page?
Currently each mmapped page has an additional shared buffer, but the code is 
almost ready to use an independent pool of shared buffers. This will be 
good, as mmapped buffers could cover the whole system cache, keeping maybe 
10%-20% of this size for writes in ShMem.


I'm thinking about MAP_PRIVATE, but both options have pluses and minuses; 
e.g. MAP_SHARED could, for less critical systems, simply be equipped with a 
GUC mmap_direct_write=true.
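
A sketch of how such a GUC could be wired up in a loadable module, using the real DefineCustomBoolVariable API; mmap_direct_write is only the hypothetical setting named above, not an actual PostgreSQL GUC:

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static bool mmap_direct_write = false;

void		_PG_init(void);

void
_PG_init(void)
{
	DefineCustomBoolVariable("mmap_direct_write",
							 "Write through MAP_SHARED mappings directly.",
							 NULL,
							 &mmap_direct_write,
							 false,			/* boot value */
							 PGC_POSTMASTER, /* settable only at server start */
							 0,
							 NULL, NULL, NULL);
}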

Regards,
Radek




Re: [HACKERS] MMAP Buffers

2011-04-15 Thread Radosław Smogura
Joshua Berkus j...@agliodbs.com Friday 15 April 2011 18:55:04
 Radoslaw,
 
 10% improvement isn't very impressive from a switch to mmap.  What workload
 did you test with?  What I'd really like to see is testing with databases
 which are 50%, 90% and 200% the size of RAM ... that's where I'd expect
 the greatest gain from limiting copying.
I think 10% is quite good, as my stand-alone test of mmap vs. read showed that the 
time to copy 100MB of data to memory may range from ~20ms to ~100ms (depending on 
the destination address). Of course a deeper system test simulating real usage will 
say more. In any case, after getting writes into good shape, I will speed up reads. 
I plan to bypass smgr/md much more and to expose shared ids (1, 2, 3...) for each 
file segment.

Back to the topic...

In the attachment I send the test scripts I used to fill the data, nothing complex 
(left over from the 2nd-level caches).

The query I used to measure was SELECT count(substr(content, 1, 1)) FROM 
testcase1 WHERE multi_id  5;

Timings were taken from psql.

I haven't done load tests (I have about 2GB of free space at /home, and 4GB RAM) or 
stress tests yet (I'm not quite ready to try concurrent updates of the same page; 
they may fail, and a notice and the place to fix it are in the code).

  Netbeans is possibly not very well suited to working on postgres code.
  AFAIK emacs and/or vi(m) are used by almost all the major developers.
 
 Guys, can we *please* focus on the patch for now, rather than the
 formatting, which is fixable with sed?
NetBeans is quite good; of course it depends on who likes what. Just try 7.0 
RC 2.

Regards,
Radek


test-scritps_20110319_0026.tar.bz2
Description: application/bzip-compressed-tar



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-24 Thread Radosław Smogura
Jim Nasby j...@nasby.net Thursday 24 March 2011 20:36:48
 On Mar 23, 2011, at 5:12 PM, Tom Lane wrote:
  Robert Haas robertmh...@gmail.com writes:
  It looks like the only way anything can ever get put on the free list
  right now is if a relation or database is dropped.  That doesn't seem
  too good.
  
  Why not?  AIUI the free list is only for buffers that are totally dead,
  ie contain no info that's possibly of interest to anybody.  It is *not*
  meant to substitute for running the clock sweep when you have to discard
  a live buffer.
 
 Turns out we've had this discussion before:
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg01088.php and
 http://archives.postgresql.org/pgsql-hackers/2010-12/msg00689.php
 
 Tom made the point in the first one that it might be good to proactively
 move buffers to the freelist so that backends would normally just have to
 hit the freelist and not run the sweep.
 
 Unfortunately I haven't yet been able to do any performance testing of any
 of this... perhaps someone else can try and measure the amount of time
 spent by backends running the clock sweep with different shared buffer
 sizes. --
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net

Wouldn't it be enough to take a spin lock (or use an ASM lock-and-increment on 
Intel/AMD) around the increment of StrategyControl->nextVictimBuffer? Everything 
here could be controlled by a macro GetNextVictimBuffer(). Within the for (;;), the 
valid buffer may be obtained modulo NBuffers, to decrease lock time. We may try to 
calculate how many buffers we skipped and decrease e.g. the trycounter by this 
value, and put in some additional restriction, like no more than NBuffers*4 passes 
before notifying an error.

This would make the clock sweep concurrent.
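
One possible reading of that suggestion, sketched with a hypothetical atomic primitive (atomic_fetch_add_u32 is illustrative; PostgreSQL of this era has no portable fetch-and-add):

/* Advance the clock hand without holding BufFreelistLock; the modulo
 * keeps the hand inside the buffer array. */
static volatile BufferDesc *
GetNextVictimBuffer(void)
{
	uint32	victim = atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);

	return &BufferDescriptors[victim % NBuffers];
}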

Regards,
Radek



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-24 Thread Radosław Smogura
Robert Haas robertmh...@gmail.com Thursday 24 March 2011 22:41:19
 On Thu, Mar 24, 2011 at 5:34 PM, Greg Stark gsst...@mit.edu wrote:
  On Thu, Mar 24, 2011 at 8:59 PM, Robert Haas robertmh...@gmail.com 
wrote:
  It seems at least plausible that buffer allocation could be
  significantly faster if it need only pop the head of a list, rather
  than scanning until it finds a suitable candidate.  Moving as much
  buffer allocation work as possible into the background seems like it
  ought to be useful.
  
  Linked lists are notoriously non-concurrent, that's the whole reason
  for the clock sweep algorithm to exist at all instead of just using an
  LRU directly. That said, an LRU needs to be able to remove elements
  from the middle and not just enqueue elements on the tail, so the
  situation isn't exactly equivalent.
  
  Just popping off the head is simple enough but the bgwriter would need
  to be able to add elements to the tail of the list and the people
  popping elements off the head would need to compete with it for the
  lock on the list. And I think you need a single lock for the whole
  list because of the cases where the list becomes a single element or
  empty.
  
  The main impact this list would have is that it would presumably need
  some real number of buffers to satisfy the pressure for victim buffers
  for a real amount of time. That would represent a decrease in cache
  size, effectively evicting buffers from cache as if the cache were
  smaller by that amount.
  
  Theoretical results are that a small change in cache size affects
  cache hit rates substantially. I'm not sure that's born out by
  practical experience with Postgres though. People tend to either be
  doing mostly i/o or very little i/o. Cache hit rate only really
  matters and is likely to be affected by small changes in cache size in
  the space in between
 
 You wouldn't really have to reduce the effective cache size - there's
 logic in there to just skip to the next buffer if the first one you
 pull off the freelist has been reused.  But the cache hit rates on
 those buffers would (you'd hope) be fairly low, since they are the
 ones we're about to reuse.  Maybe it doesn't work out to a win,
 though.
If I may: under abnormal circumstances (e.g. when the current process is preempted 
by the kernel), obtaining a buffer from the list may be cheaper. This code:

while (StrategyControl->firstFreeBuffer >= 0)
{
	buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
	Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

	/* Unconditionally remove buffer from freelist */
	StrategyControl->firstFreeBuffer = buf->freeNext;
	buf->freeNext = FREENEXT_NOT_IN_LIST;

could look like this:

do
{
	SpinLock();
	if (StrategyControl->firstFreeBuffer < 0)
	{
		Unspin();
		break;
	}

	buf = &BufferDescriptors[StrategyControl->firstFreeBuffer];
	Unspin();

	Assert(buf->freeNext != FREENEXT_NOT_IN_LIST);

	/* Unconditionally remove buffer from freelist */
	StrategyControl->firstFreeBuffer = buf->freeNext;
	buf->freeNext = FREENEXT_NOT_IN_LIST;
} while (true);

Acquiring a spin lock around the linked list is enough, and cheaper than taking an 
LWLock, which is heavier than a spin here.

After this, similarly with a spin lock on:

trycounter = NBuffers * 4;
for (;;)
{
	spinlock();
	buf = &BufferDescriptors[StrategyControl->nextVictimBuffer];

	if (++StrategyControl->nextVictimBuffer >= NBuffers)
	{
		StrategyControl->nextVictimBuffer = 0;
		StrategyControl->completePasses++;
	}
	unspin();



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
  On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com 
wrote:
  Can't you make just one large mapping and lock it in 8k regions? I
  thought the problem with mmap was not being able to detect other
  processes
  (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.htm
  l) compatibility issues (possibly obsolete), etc.
  
  I was assuming that locking part of a mapping would force the kernel
  to split the mapping. It has to record the locked state somewhere so
  it needs a data structure that represents the size of the locked
  section and that would, I assume, be the mapping.
  
  It's possible the kernel would not in fact fall over too badly doing
  this. At some point I'll go ahead and do experiments on it. It's a bit
  fraught though as it the performance may depend on the memory
  management features of the chipset.
  
  That said, that's only part of the battle. On 32bit you can't map the
  whole database as your database could easily be larger than your
  address space. I have some ideas on how to tackle that but the
  simplest test would be to just mmap 8kB chunks everywhere.
 
 Even on 64 bit systems you only have 48 bit address space which is not
 a theoretical  limitation.  However, at least on linux you can map in
 and map out pretty quick (10 microseconds paired on my linux vm) so
 that's not so big of a deal.  Dealing with rapidly growing files is a
 problem.  That said, probably you are not going to want to reserve
 multiple gigabytes in 8k non contiguous chunks.
 
  But it's worse than that. Since you're not responsible for flushing
  blocks to disk any longer you need some way to *unlock* a block when
  it's possible to be flushed. That means when you flush the xlog you
  have to somehow find all the blocks that might no longer need to be
  locked and atomically unlock them. That would require new
  infrastructure we don't have though it might not be too hard.
  
  What would be nice is a mlock_until() where you eventually issue a
  call to tell the kernel what point in time you've reached and it
  unlocks everything older than that time.
 
 I wonder if there is any reason to mlock at all...if you are going to
 'do' mmap, can't you just hide under current lock architecture for
 actual locking and do direct memory access without mlock?
 
 merlin
I can't reproduce this. A simple test shows 2x faster reads with mmap than with 
read().

I'm sending what I've done with mmap (really ugly, but I'm in the forest). It is a 
read-only solution, so init the database first with some amount of data (I have 
about 300MB); the 2nd-level scripts may do this for you.

This is what I found:
1. If I don't require putting a new mmap (mmap with MAP_FIXED) in the previous 
region (i.e. I just do munmap / mmap) with each query, execution time grows by 
about 10%.
2. Sometimes it's enough just to comment or uncomment something that has no side 
effects on code flow (in bufmgr.c: (un)comment some unused if; put a NULL, it will 
be replaced), and e.g. query execution time may grow 2x.
3. My initial solution was 2% faster, about 9ms, when reading; now it's 10% 
slower, after making things more usable.

Regards,
Radek


pg_mmap_20110323.patch.bz2
Description: application/bzip



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Tuesday 22 March 2011 23:06:02
 On Tue, Mar 22, 2011 at 4:28 PM, Radosław Smogura
 
 rsmog...@softperience.eu wrote:
  Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
  
  On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
   On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com
  
  wrote:
   Can't you make just one large mapping and lock it in 8k regions? I
   thought the problem with mmap was not being able to detect other
   processes
   (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.h
   tm l) compatibility issues (possibly obsolete), etc.
   
   I was assuming that locking part of a mapping would force the kernel
   to split the mapping. It has to record the locked state somewhere so
   it needs a data structure that represents the size of the locked
   section and that would, I assume, be the mapping.
   
   It's possible the kernel would not in fact fall over too badly doing
   this. At some point I'll go ahead and do experiments on it. It's a bit
   fraught though as it the performance may depend on the memory
   management features of the chipset.
   
   That said, that's only part of the battle. On 32bit you can't map the
   whole database as your database could easily be larger than your
   address space. I have some ideas on how to tackle that but the
   simplest test would be to just mmap 8kB chunks everywhere.
  
  Even on 64 bit systems you only have 48 bit address space which is not
  a theoretical  limitation.  However, at least on linux you can map in
  and map out pretty quick (10 microseconds paired on my linux vm) so
  that's not so big of a deal.  Dealing with rapidly growing files is a
  problem.  That said, probably you are not going to want to reserve
  multiple gigabytes in 8k non contiguous chunks.
  
   But it's worse than that. Since you're not responsible for flushing
   blocks to disk any longer you need some way to *unlock* a block when
   it's possible to be flushed. That means when you flush the xlog you
   have to somehow find all the blocks that might no longer need to be
   locked and atomically unlock them. That would require new
   infrastructure we don't have though it might not be too hard.
   
   What would be nice is a mlock_until() where you eventually issue a
   call to tell the kernel what point in time you've reached and it
   unlocks everything older than that time.
  
  I wonder if there is any reason to mlock at all...if you are going to
  'do' mmap, can't you just hide under current lock architecture for
  actual locking and do direct memory access without mlock?
  
  merlin
  
  Actually after dealing with mmap and adding munmap I found crucial thing
  why to not use mmap:
  You need to munmap, and for me this takes much time, even if I read with
  SHARED | PROT_READ, it's looks like Linux do flush or something else,
  same as with MAP_FIXED, MAP_PRIVATE, etc.
 
 can you produce small program demonstrating the problem?  This is not
 how things should work AIUI.
 
 I was thinking about playing with mmap implementation of clog system
 -- it's perhaps better fit.  clog is rigidly defined size, and has
 very high performance requirements.  Also it's much less changes than
 reimplementing heap buffering, and maybe not so much affected by
 munmap.
 
 merlin

Ah... just one more thing, maybe useful for explaining why performance is lost with 
huge memory: I saw the mmapped buffers are allocated at addresses like 0x007..., so 
definitely above 4GB.



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-23 Thread Radosław Smogura
Greg Stark gsst...@mit.edu Wednesday 23 March 2011 21:30:04
 On Wed, Mar 23, 2011 at 8:00 PM, Robert Haas robertmh...@gmail.com wrote:
  It looks like the only way anything can ever get put on the free list
  right now is if a relation or database is dropped.  That doesn't seem
  too good.  I wonder if the background writer shouldn't be trying to
  maintain the free list.  That is, perhaps BgBufferSync() should notice
  when the number of free buffers drops below some threshold, and run
  the clock sweep enough to get it back up to that threshold.
 
 I think this is just a terminology discrepancy. In postgres the free
 list is only used for buffers that contain no useful data at all. The
 only time there are buffers on the free list is at startup or if a
 relation or database is dropped.
 
 Most of the time blocks are read into buffers that already contain
 other data. Candidate buffers to evict are buffers that have been used
 least recently. That's what the clock sweep implements.
 
 What the bgwriter's responsible for is looking at the buffers *ahead*
 of the clock sweep and flushing them to disk. They stay in ram and
 don't go on the free list, all that changes is that they're clean and
 therefore can be reused without having to do any i/o.
 
 I'm a bit skeptical that this works because as soon as bgwriter
 saturates the i/o the os will throttle the rate at which it can write.
 When that happens even a few dozens of milliseconds will be plenty to
 allow the purely user-space processes consuming the buffers to catch
 up instantly.
 
 But Greg Smith has done a lot of work tuning the bgwriter so that it
 is at least useful in some circumstances. I could well see it being
 useful for systems where latency matters and the i/o is not saturated.

The freelist is almost useless under normal operation, but it costs only one check 
of whether it's empty or not, which could be optimized by checking (... > -1), or 
!(... < 0).

Regards,
Radek



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-22 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
  On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com 
wrote:
  Can't you make just one large mapping and lock it in 8k regions? I
  thought the problem with mmap was not being able to detect other
  processes
  (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.htm
  l) compatibility issues (possibly obsolete), etc.
  
  I was assuming that locking part of a mapping would force the kernel
  to split the mapping. It has to record the locked state somewhere so
  it needs a data structure that represents the size of the locked
  section and that would, I assume, be the mapping.
  
  It's possible the kernel would not in fact fall over too badly doing
  this. At some point I'll go ahead and do experiments on it. It's a bit
  fraught though as it the performance may depend on the memory
  management features of the chipset.
  
  That said, that's only part of the battle. On 32bit you can't map the
  whole database as your database could easily be larger than your
  address space. I have some ideas on how to tackle that but the
  simplest test would be to just mmap 8kB chunks everywhere.
 
 Even on 64 bit systems you only have 48 bit address space which is not
 a theoretical  limitation.  However, at least on linux you can map in
 and map out pretty quick (10 microseconds paired on my linux vm) so
 that's not so big of a deal.  Dealing with rapidly growing files is a
 problem.  That said, probably you are not going to want to reserve
 multiple gigabytes in 8k non contiguous chunks.
 
  But it's worse than that. Since you're not responsible for flushing
  blocks to disk any longer you need some way to *unlock* a block when
  it's possible to be flushed. That means when you flush the xlog you
  have to somehow find all the blocks that might no longer need to be
  locked and atomically unlock them. That would require new
  infrastructure we don't have though it might not be too hard.
  
  What would be nice is a mlock_until() where you eventually issue a
  call to tell the kernel what point in time you've reached and it
  unlocks everything older than that time.
 
 I wonder if there is any reason to mlock at all...if you are going to
 'do' mmap, can't you just hide under current lock architecture for
 actual locking and do direct memory access without mlock?
 
 merlin

Actually, after dealing with mmap and adding munmap I found the crucial reason not 
to use mmap: you need to munmap, and for me this takes much time. Even if I read 
with MAP_SHARED | PROT_READ, it looks like Linux does a flush or something else; 
the same happens with MAP_FIXED, MAP_PRIVATE, etc.
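
A minimal harness for timing the map/unmap cycle described here, in the spirit of the small demonstration program Merlin asks for elsewhere in the thread; illustrative only (on older glibc, link with -lrt for clock_gettime):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

int
main(int argc, char **argv)
{
	struct timespec t0, t1;
	int		fd, i;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < 100000; i++)
	{
		/* Map and immediately unmap one PostgreSQL-sized (8kB) block. */
		void   *p = mmap(NULL, 8192, PROT_READ, MAP_SHARED, fd, 0);

		if (p == MAP_FAILED)
			return 1;
		munmap(p, 8192);
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("%.3f us per map/unmap\n",
		   ((t1.tv_sec - t0.tv_sec) * 1e9 +
			(t1.tv_nsec - t0.tv_nsec)) / 1e3 / 100000.0);
	return 0;
}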

Regards,
Radek



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-21 Thread Radosław Smogura
Merlin Moncure mmonc...@gmail.com Monday 21 March 2011 20:58:16
 On Mon, Mar 21, 2011 at 2:08 PM, Greg Stark gsst...@mit.edu wrote:
  On Mon, Mar 21, 2011 at 3:54 PM, Merlin Moncure mmonc...@gmail.com 
wrote:
  Can't you make just one large mapping and lock it in 8k regions? I
  thought the problem with mmap was not being able to detect other
  processes
  (http://www.mail-archive.com/pgsql-general@postgresql.org/msg122301.htm
  l) compatibility issues (possibly obsolete), etc.
  
  I was assuming that locking part of a mapping would force the kernel
  to split the mapping. It has to record the locked state somewhere so
  it needs a data structure that represents the size of the locked
  section and that would, I assume, be the mapping.
  
  It's possible the kernel would not in fact fall over too badly doing
  this. At some point I'll go ahead and do experiments on it. It's a bit
  fraught though as it the performance may depend on the memory
  management features of the chipset.
  
  That said, that's only part of the battle. On 32bit you can't map the
  whole database as your database could easily be larger than your
  address space. I have some ideas on how to tackle that but the
  simplest test would be to just mmap 8kB chunks everywhere.
 
 Even on 64 bit systems you only have 48 bit address space which is not
 a theoretical  limitation.  However, at least on linux you can map in
 and map out pretty quick (10 microseconds paired on my linux vm) so
 that's not so big of a deal.  Dealing with rapidly growing files is a
 problem.  That said, probably you are not going to want to reserve
 multiple gigabytes in 8k non contiguous chunks.
 
  But it's worse than that. Since you're not responsible for flushing
  blocks to disk any longer you need some way to *unlock* a block when
  it's possible to be flushed. That means when you flush the xlog you
  have to somehow find all the blocks that might no longer need to be
  locked and atomically unlock them. That would require new
  infrastructure we don't have though it might not be too hard.
  
  What would be nice is a mlock_until() where you eventually issue a
  call to tell the kernel what point in time you've reached and it
  unlocks everything older than that time.
Sorry to be curious, but I think mlock is for preventing swap, not for preventing 
flushes.

 I wonder if there is any reason to mlock at all...if you are going to
 'do' mmap, can't you just hide under current lock architecture for
 actual locking and do direct memory access without mlock?
 
 merlin

The mmap man page does not say anything about when flushes occur when the mapping 
is file-backed and shared, so flushes may happen whether intended or not. What's 
more, from what I read, SysV shared memory is emulated via mmap (and I think this 
mmap is on /dev/shm).

Radek



Re: [HACKERS] 2nd Level Buffer Cache

2011-03-18 Thread Radosław Smogura
Kevin Grittner kevin.gritt...@wicourts.gov Thursday 17 March 2011 22:02:18
  Radosław Smogura rsmog...@softperience.eu wrote:
  I have implemented an initial concept of a 2nd-level cache. The idea is
  to keep some segments of shared memory for special buffers (e.g.
  indices) to prevent them from being overwritten by other operations. I
  added this functionality to the nbtree index scan.
  
  I tested this by doing an index scan, a seq read, dropping system
  buffers, then doing the index scan again, and in a few places I saw
  performance improvements, but actually I'm not sure whether this was
  random or a real improvement.
 
 I've often wondered about this.  In a database I developed back in
 the '80s it was clearly a win to have a special cache for index
 entries and other special pages closer to the database than the
 general cache.  A couple things have changed since the '80s (I mean,
 besides my waistline and hair color), and PostgreSQL has many
 differences from that other database, so I haven't been sure it
 would help as much, but I have wondered.
 
 I can't really look at this for a couple weeks, but I'm definitely
 interested.  I suggest that you add this to the next CommitFest as a
 WIP patch, under the Performance category.
 
 https://commitfest.postgresql.org/action/commitfest_view/open
 
  There are a few places to optimize the code as well, and the patch needs
  much work, but could you look at it and give opinions?
 
 For something like this it makes perfect sense to show proof of
 concept before trying to cover everything.
 
 -Kevin

Here I attach the latest version of the patch, with a few performance improvements 
(the code is still dirty), some reports from tests, as well as my simple tests.

Actually there is a small improvement without dropping system caches, and a bigger 
one with dropping them. I see a small performance decrease (if we can talk about 
measuring based on these tests) against the original PG version when dealing with 
the same configuration, but an increase with 2nd-level buffers... or maybe I 
compared the reports badly.

In the tests I tried to choose typical, simple queries.

Regards,
Radek


2nd_lvl_cache_20110318.diff.bz2
Description: application/bzip


test-scritps_20110319_0026.tar.bz2
Description: application/bzip-compressed-tar


reports_20110318.tar.bz2
Description: application/bzip-compressed-tar



[HACKERS] 2nd Level Buffer Cache

2011-03-17 Thread Radosław Smogura
Hi,

I have implemented an initial concept of a 2nd-level cache. The idea is to keep 
some segments of shared memory for special buffers (e.g. indices) to prevent them 
from being overwritten by other operations. I added this functionality to the 
nbtree index scan.

I tested this by doing an index scan, a seq read, dropping system buffers, then 
doing the index scan again, and in a few places I saw performance improvements, but 
actually I'm not sure whether this was random or a real improvement.

There are a few places to optimize the code as well, and the patch needs much work, 
but could you look at it and give opinions?

Regards,
Radek
diff --git a/.gitignore b/.gitignore
index 3f11f2e..6542e35 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,3 +22,4 @@ lcov.info
 /GNUmakefile
 /config.log
 /config.status
+/nbproject/private/
\ No newline at end of file
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 2796445..0229f5a 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -508,7 +508,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 	if (blkno != P_NEW)
 	{
 		/* Read an existing block of the relation */
-		buf = ReadBuffer(rel, blkno);
+		buf = ReadBufferLevel(rel, blkno, BUFFER_LEVEL_2ND);
 		LockBuffer(buf, access);
 		_bt_checkpage(rel, buf);
 	}
@@ -548,7 +548,7 @@ _bt_getbuf(Relation rel, BlockNumber blkno, int access)
 			blkno = GetFreeIndexPage(rel);
 			if (blkno == InvalidBlockNumber)
 break;
-			buf = ReadBuffer(rel, blkno);
+			buf = ReadBufferLevel(rel, blkno, BUFFER_LEVEL_2ND);
 			if (ConditionalLockBuffer(buf))
 			{
 page = BufferGetPage(buf);
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index dadb49d..2922711 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -22,6 +22,7 @@ BufferDesc *BufferDescriptors;
 char	   *BufferBlocks;
 int32	   *PrivateRefCount;
 
+BufferLevelDesc *bufferLevels;
 
 /*
  * Data Structures:
@@ -72,8 +73,7 @@ int32	   *PrivateRefCount;
 void
 InitBufferPool(void)
 {
-	bool		foundBufs,
-foundDescs;
+	bool		foundBufs, foundDescs, foundBufferLevels = false;
 
 	BufferDescriptors = (BufferDesc *)
 		ShmemInitStruct("Buffer Descriptors",
@@ -83,19 +83,38 @@ InitBufferPool(void)
 		ShmemInitStruct("Buffer Blocks",
 						NBuffers * (Size) BLCKSZ, &foundBufs);
 
-	if (foundDescs || foundBufs)
+	bufferLevels = (BufferLevelDesc *)
+		ShmemInitStruct("Buffer Levels Descriptors Table",
+						sizeof(BufferLevelDesc) * BUFFER_LEVEL_SIZE,
+						&foundBufferLevels);
+	if (foundDescs || foundBufs || foundBufferLevels)
 	{
 		/* both should be present or neither */
-		Assert(foundDescs && foundBufs);
+		Assert(foundDescs && foundBufs && foundBufferLevels);
 		/* note: this path is only taken in EXEC_BACKEND case */
 	}
 	else
 	{
 		BufferDesc *buf;
+BufferLevelDesc *bufferLevelDesc;
+
 		int			i;
-
+
 		buf = BufferDescriptors;
 
+		/* Initialize buffer levels. */
+		//1st Level - Default
+		bufferLevelDesc = bufferLevels;
+		bufferLevelDesc->index = 0;
+		bufferLevelDesc->super = BUFFER_LEVEL_END_OF_LIST;
+		bufferLevelDesc->lower = BUFFER_LEVEL_END_OF_LIST;
+
+		//2nd Level - For indices
+		bufferLevelDesc++;
+		bufferLevelDesc->index = 1;
+		bufferLevelDesc->super = BUFFER_LEVEL_END_OF_LIST;
+		bufferLevelDesc->lower = 0;
+
 		/*
 		 * Initialize all the buffer headers.
 		 */
@@ -117,6 +136,10 @@ InitBufferPool(void)
 			 */
 			buf->freeNext = i + 1;
 
+			/* Assign buffer level. */
+			//TODO Currently hardcoded - 
+			buf->buf_level = ( 0.3 * NBuffers  i ) ? BUFFER_LEVEL_DEFAULT : BUFFER_LEVEL_2ND;
+
 			buf->io_in_progress_lock = LWLockAssign();
 			buf->content_lock = LWLockAssign();
 		}
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..867bae0 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -47,7 +47,8 @@
 #include "storage/standby.h"
 #include "utils/rel.h"
 #include "utils/resowner.h"
-
+#include "catalog/pg_type.h"
+#include "funcapi.h"
 
 /* Note: these two macros only work on shared buffers, not local ones! */
 #define BufHdrGetBlock(bufHdr)	((Block) (BufferBlocks + ((Size) (bufHdr)->buf_id) * BLCKSZ))
@@ -85,7 +86,7 @@ static volatile BufferDesc *PinCountWaitBuf = NULL;
 static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
   ForkNumber forkNum, BlockNumber blockNum,
   ReadBufferMode mode, BufferAccessStrategy strategy,
-  bool *hit);
+  bool *hit, BufferLevel bufferLevel);
 static bool PinBuffer(volatile BufferDesc *buf, 

Re: [HACKERS] Binary in/out for aclitem

2011-02-24 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Wednesday 23 February 2011 22:30:04
 =?utf-8?q?Rados=C5=82aw_Smogura?= rsmog...@softperience.eu writes:
  Here is extended version, has version field (N_ACL_RIGHTS*2) and reserved
  mask, as well definition is more general then def of PGSQL. In any way it
  require that rights mades bit array.
 
 You're going in quite the wrong direction here.  The consensus as I
 understood it was that we should just use the text representation in
 binary mode too, rather than inventing a separate representation that's
 going to put a whole new set of constraints on what can happen to the
 internal representation.  The proposal you have here has no redeeming
 social value whatever, because nobody cares about the I/O efficiency
 for aclitem (and even if anyone did, you've made no case that this would
 actually be more efficient to use on the client side).
 
   regards, tom lane

Look at it. Pass call to in/out.

Regards,
Radek
diff --git a/.gitignore b/.gitignore
index 1be11e8..0d594f9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,3 +17,5 @@ objfiles.txt
 /GNUmakefile
 /config.log
 /config.status
+/nbproject/private/
+/nbproject
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 691ba3b..fa151cd 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -33,6 +33,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
+#include "libpq/pqformat.h"
 
 
 typedef struct
@@ -77,7 +78,8 @@ static const char *getid(const char *s, char *n);
 static void putid(char *p, const char *s);
 static Acl *allocacl(int n);
 static void check_acl(const Acl *acl);
-static const char *aclparse(const char *s, AclItem *aip);
+static const char *aclparse(const char *s, AclItem *aip, bool binary);
+static Datum aclitem_common_in_recv(const char* s, bool binary);
 static bool aclitem_match(const AclItem *a1, const AclItem *a2);
 static int	aclitemComparator(const void *arg1, const void *arg2);
 static void check_circularity(const Acl *old_acl, const AclItem *mod_aip,
@@ -222,6 +224,8 @@ putid(char *p, const char *s)
  *
  *		This routine is called by the parser as well as aclitemin(), hence
  *		the added generality.
+ * 
+ * @param binary if we parse for binary mode or text mode
  *
  * RETURNS:
  *		the string position in 's' immediately following the ACL
@@ -230,7 +234,7 @@ putid(char *p, const char *s)
  *		  UID/GID, id type identifier and mode type values.
  */
 static const char *
-aclparse(const char *s, AclItem *aip)
+aclparse(const char *s, AclItem *aip, bool binary)
 {
 	AclMode		privs,
 goption,
@@ -249,20 +253,20 @@ aclparse(const char *s, AclItem *aip)
 		/* we just read a keyword, not a name */
 		if (strcmp(name, "group") != 0 && strcmp(name, "user") != 0)
 			ereport(ERROR,
-	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+	(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
 	 errmsg("unrecognized key word: \"%s\"", name),
 	 errhint("ACL key word must be \"group\" or \"user\".")));
 		s = getid(s, name);		/* move s to the name beyond the keyword */
 		if (name[0] == '\0')
 			ereport(ERROR,
-	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+	(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
 	 errmsg("missing name"),
 	 errhint("A name must follow the \"group\" or \"user\" key word.")));
 	}
 
 	if (*s != '=')
 		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
 errmsg("missing \"=\" sign")));
 
 	privs = goption = ACL_NO_RIGHTS;
@@ -315,7 +319,7 @@ aclparse(const char *s, AclItem *aip)
 break;
 			default:
 ereport(ERROR,
-		(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+		(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
 	  errmsg("invalid mode character: must be one of \"%s\"",
 			 ACL_ALL_RIGHTS_STR)));
 		}
@@ -337,7 +341,7 @@ aclparse(const char *s, AclItem *aip)
 		s = getid(s + 1, name2);
 		if (name2[0] == '\0')
 			ereport(ERROR,
-	(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+	(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
 	 errmsg("a name must follow the \"/\" sign")));
 		aip->ai_grantor = get_role_oid(name2, false);
 	}
@@ -548,6 +552,22 @@ check_acl(const Acl *acl)
 				 errmsg("ACL arrays must not contain null values")));
 }
 
+static
+Datum aclitem_common_in_recv(const char* s, bool binary)
+{
+	AclItem    *aip;
+
+	aip = (AclItem *) palloc(sizeof(AclItem));
+	s = aclparse(s, aip, binary);
+	while (isspace((unsigned char) *s))
+		++s;
+	if (*s)
+		ereport(ERROR,
+				(errcode((binary ? ERRCODE_INVALID_BINARY_REPRESENTATION : ERRCODE_INVALID_TEXT_REPRESENTATION)),
+   

Re: [HACKERS] Binary in/out for aclitem

2011-02-23 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Wednesday 23 February 2011 16:19:27
 rsmogura rsmog...@softperience.eu writes:
   On Tue, 22 Feb 2011 20:20:39 -0500, Tom Lane wrote:
  ...  But my question isn't about that; it's about
  why aclitem should be considered a first-class citizen.  It makes me
  uncomfortable that client apps are looking at it at all, because any
  that do are bound to get broken in the future, even assuming that
  they get the right answers today.  I wonder how many such clients are up
  to speed for per-column privileges and non-constant default privileges
  for instance.  And sepgsql is going to cut them off at the knees.
  
   Technically, at eye glance, I didn't seen in sepgsql modifications to
   acl.h. So, I think, aclitem will be unaffected. In any way sepgsql needs
   some way to present access rights to administrator it may use own model,
   or aclitem, too.
 
 You're missing the point, which is that the current internal
 representation of aclitem could change drastically to support future
 feature improvements in the area of privileges.  It has already changed
 significantly in the past (we didn't use to have WITH GRANT OPTION).
 If we had to add a field, for instance, a binary representation would
 simply be broken, as clients would have difficulty telling how to
 interpret it as soon as there was more than one possible format.
 Text representations are typically a bit more extensible.
 
   regards, tom lane
I removed this from the patch (I think it's currently not needed, but it is enough 
to put it in the docs):

Each privilege has an individual number P from 1 to n, and it is represented by the 
P-th bit being set. The first n bits (in network bit order) represent the normal 
privileges; the next n bits represent the grant options of those privileges. This 
chain is encoded as an n*2-bit number rounded up to full 8 bits, with a minimal 
length of 32 bits.

I was thinking about adding the number of all privileges to each ACL item; I 
removed it as this can be deduced from the PG version, where the first 7 bits 
represent the version and the last 8th bit represents whether the grant part has 
been added.

---
In any case, binary output should be available if we have a binary mode. I know 
that text is more extensible; we may, in contrast to the packed version above, 
describe the ACL privileges as byte-array elements representing each set privilege 
(same as text).

The fallback solution is to just recall aclin/aclout with a StringInfo.

Regards,
Radek.



Re: [HACKERS] Binary in/out for aclitem

2011-02-23 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Wednesday 23 February 2011 16:19:27
 rsmogura rsmog...@softperience.eu writes:
   On Tue, 22 Feb 2011 20:20:39 -0500, Tom Lane wrote:
  ...  But my question isn't about that; it's about
  why aclitem should be considered a first-class citizen.  It makes me
  uncomfortable that client apps are looking at it at all, because any
  that do are bound to get broken in the future, even assuming that
  they get the right answers today.  I wonder how many such clients are up
  to speed for per-column privileges and non-constant default privileges
  for instance.  And sepgsql is going to cut them off at the knees.
  
   Technically, at eye glance, I didn't seen in sepgsql modifications to
   acl.h. So, I think, aclitem will be unaffected. In any way sepgsql needs
   some way to present access rights to administrator it may use own model,
   or aclitem, too.
 
 You're missing the point, which is that the current internal
 representation of aclitem could change drastically to support future
 feature improvements in the area of privileges.  It has already changed
 significantly in the past (we didn't use to have WITH GRANT OPTION).
 If we had to add a field, for instance, a binary representation would
 simply be broken, as clients would have difficulty telling how to
 interpret it as soon as there was more than one possible format.
 Text representations are typically a bit more extensible.
 
   regards, tom lane

Actually, you got me a little confused. So, as in my previous post, we don't need 
to send information about whether the grant option has been set: currently in text 
mode no grant options means ACL_NO_RIGHTS, and in binary the same may be achieved 
by setting 0 there.

But a version field may be useful to validate this and future calls, and to provide 
backward compatibility (if a newer client sends fewer bits, the rest of the bits 
will be set to 0).

I'm thinking about splitting the privileges chain into two numbers; it may be 
easier to implement and parse this if the number of privileges ever exceeds 32...

In addition I may add support for a possible future representation where a given 
privilege may be yes, no, or undefined (like in Windows).

Regards,
Radek



Re: [HACKERS] Binary in/out for aclitem

2011-02-23 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Wednesday 23 February 2011 16:19:27
 rsmogura rsmog...@softperience.eu writes:
   On Tue, 22 Feb 2011 20:20:39 -0500, Tom Lane wrote:
  ...  But my question isn't about that; it's about
  why aclitem should be considered a first-class citizen.  It makes me
  uncomfortable that client apps are looking at it at all, because any
  that do are bound to get broken in the future, even assuming that
  they get the right answers today.  I wonder how many such clients are up
  to speed for per-column privileges and non-constant default privileges
  for instance.  And sepgsql is going to cut them off at the knees.
  
   Technically, at eye glance, I didn't seen in sepgsql modifications to
   acl.h. So, I think, aclitem will be unaffected. In any way sepgsql needs
   some way to present access rights to administrator it may use own model,
   or aclitem, too.
 
 You're missing the point, which is that the current internal
 representation of aclitem could change drastically to support future
 feature improvements in the area of privileges.  It has already changed
 significantly in the past (we didn't use to have WITH GRANT OPTION).
 If we had to add a field, for instance, a binary representation would
 simply be broken, as clients would have difficulty telling how to
 interpret it as soon as there was more than one possible format.
 Text representations are typically a bit more extensible.
 
   regards, tom lane

Here is the extended version; it has a version field (N_ACL_RIGHTS*2) and a 
reserved mask, and the definition is more general than PostgreSQL's own. In any 
case it requires that the rights form a bit array.

I still have tested only aclitemsend.

Btw, is it possible and needed to add a group byte, indicating whether the grantee 
is a group or a user?

Regards,
Radek
diff --git a/.gitignore b/.gitignore
index 1be11e8..0d594f9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,3 +17,5 @@ objfiles.txt
 /GNUmakefile
 /config.log
 /config.status
+/nbproject/private/
+/nbproject
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 691ba3b..c25c0fd 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -33,6 +33,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
+#include "libpq/pqformat.h"
 
 
 typedef struct
@@ -78,6 +79,10 @@ static void putid(char *p, const char *s);
 static Acl *allocacl(int n);
 static void check_acl(const Acl *acl);
 static const char *aclparse(const char *s, AclItem *aip);
+
+/** Assigns default grantor and send warning. */
+static void aclitem_assign_default_grantor(AclItem *aip);
+
 static bool aclitem_match(const AclItem *a1, const AclItem *a2);
 static int	aclitemComparator(const void *arg1, const void *arg2);
 static void check_circularity(const Acl *old_acl, const AclItem *mod_aip,
@@ -209,6 +214,14 @@ putid(char *p, const char *s)
 	*p = '\0';
 }
 
+/** Assigns default grantor and send warning. */
+void aclitem_assign_default_grantor(AclItem *aip) {
+	aip->ai_grantor = BOOTSTRAP_SUPERUSERID;
+	ereport(WARNING,
+			(errcode(ERRCODE_INVALID_GRANTOR),
+			 errmsg("defaulting grantor to user ID %u",
+					BOOTSTRAP_SUPERUSERID)));
+}
 /*
  * aclparse
  *		Consumes and parses an ACL specification of the form:
@@ -343,11 +356,7 @@ aclparse(const char *s, AclItem *aip)
 	}
 	else
 	{
-		aip->ai_grantor = BOOTSTRAP_SUPERUSERID;
-		ereport(WARNING,
-				(errcode(ERRCODE_INVALID_GRANTOR),
-				 errmsg("defaulting grantor to user ID %u",
-						BOOTSTRAP_SUPERUSERID)));
+		aclitem_assign_default_grantor(aip);
 	}
 
 	ACLITEM_SET_PRIVS_GOPTIONS(*aip, privs, goption);
@@ -643,6 +652,163 @@ aclitemout(PG_FUNCTION_ARGS)
 
 	PG_RETURN_CSTRING(out);
 }
+/** Do binary read of aclitem. The input format is the same as for {@link aclitem_recv}, but
+ * a special algorithm is used to determine the grantee's and grantor's OID. The reason
+ * is to keep backward information compatibility with text mode - a typical
+ * client (which gets instructions from the user)
+ * may be much more interested in sending the grantee's and grantor's names than
+ * OIDs. The detailed rule is as follows:<br/>
+ * If the message has no names and name lengths, then
+ * use the passed OIDs (the message may be truncated; we accept this,
+ * but both of the two last fields must be absent).<br/>
+ * If the grantee's or grantor's name length is {@code -1}, then use the respective
+ * OID.<br/>
+ * If a name length is not {@code -1}, then find the OID for the given part, and
+ * ensure that the respective OID is {@code 0} or equal to the found OID.<br/>
+ * If the grantor's OID is {@code 0} and the grantor's name length is {@code -1} or
+ * truncated, then assign the superuser as grantor.
+ */
+Datum
+aclitemrecv(PG_FUNCTION_ARGS) {
+StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
+AclItem*aip;
+intgRawLen;
+char   *gVal = NULL;
+int4gValLen;
+OidgOid;
+int2   numberOfAcls;
+int4   mask;

[HACKERS] Binary in/out for aclitem

2011-02-22 Thread Radosław Smogura
Hi,

Actually one more POD type is left: aclitem :). In Java, for example, it is used to 
obtain column privileges; I assume some folks may want to use it too.

I tested only recv :-(

Actually I don't know if the idea of such a format is OK, but my intention was to 
send role names, so the driver doesn't need to ask for the name (like for regproc), 
and to keep the human-based approach (text) and the binary approach together.

Kind regards,
Radek
diff --git a/.gitignore b/.gitignore
index 1be11e8..0d594f9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,3 +17,5 @@ objfiles.txt
 /GNUmakefile
 /config.log
 /config.status
+/nbproject/private/
+/nbproject
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 691ba3b..33877cb 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -33,6 +33,7 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
+#include "libpq/pqformat.h"
 
 
 typedef struct
@@ -78,6 +79,10 @@ static void putid(char *p, const char *s);
 static Acl *allocacl(int n);
 static void check_acl(const Acl *acl);
 static const char *aclparse(const char *s, AclItem *aip);
+
+/** Assigns the default grantor and emits a warning. */
+static void aclitem_assign_default_grantor(AclItem *aip);
+
 static bool aclitem_match(const AclItem *a1, const AclItem *a2);
 static int	aclitemComparator(const void *arg1, const void *arg2);
 static void check_circularity(const Acl *old_acl, const AclItem *mod_aip,
@@ -209,6 +214,14 @@ putid(char *p, const char *s)
 	*p = '\0';
 }
 
+/** Assigns the default grantor and emits a warning. */
+void aclitem_assign_default_grantor(AclItem *aip) {
+	aip->ai_grantor = BOOTSTRAP_SUPERUSERID;
+	ereport(WARNING,
+			(errcode(ERRCODE_INVALID_GRANTOR),
+			 errmsg("defaulting grantor to user ID %u",
+					BOOTSTRAP_SUPERUSERID)));
+}
 /*
  * aclparse
  *		Consumes and parses an ACL specification of the form:
@@ -343,11 +356,7 @@ aclparse(const char *s, AclItem *aip)
 	}
 	else
 	{
-		aip->ai_grantor = BOOTSTRAP_SUPERUSERID;
-		ereport(WARNING,
-				(errcode(ERRCODE_INVALID_GRANTOR),
-				 errmsg("defaulting grantor to user ID %u",
-						BOOTSTRAP_SUPERUSERID)));
+		aclitem_assign_default_grantor(aip);
 	}
 
 	ACLITEM_SET_PRIVS_GOPTIONS(*aip, privs, goption);
@@ -643,6 +652,143 @@ aclitemout(PG_FUNCTION_ARGS)
 
 	PG_RETURN_CSTRING(out);
 }
+/** Does a binary read of an aclitem. The input format is the same as for {@link aclitem_recv},
+ * but a special algorithm is used to determine the grantee's and grantor's OIDs. The reason
+ * is to keep backward information compatibility with text mode - a typical
+ * client (which gets instructions from the user)
+ * may be much more interested in sending the grantee's and grantor's names than
+ * OIDs. The detailed rule is as follows:<br/>
+ * If the message contains <b>only</b> OIDs and privs (no names and name lengths), then
+ * use the passed OIDs (the message may be truncated, we accept this,
+ * but both of the two last fields must be absent).<br/>
+ * If the grantee's or grantor's name length is {@code -1}, then use the respective
+ * OID.<br/>
+ * If a name length is not {@code -1}, then find the OID for the given part, and
+ * ensure that the respective OID is {@code 0} or equal to the found OID.<br/>
+ * If the grantor's OID is {@code 0} and the grantor's name length is {@code -1} or
+ * truncated, then assign the superuser as grantor.
+ */
+Datum
+aclitemrecv(PG_FUNCTION_ARGS) {
+	StringInfo	buf = (StringInfo) PG_GETARG_POINTER(0);
+	AclItem    *aip;
+	int			gRawLen;
+	char	   *gVal = NULL;
+	int			gValLen;
+	Oid			gOid;
+
+	aip = (AclItem *) palloc(sizeof(AclItem));
+	aip->ai_grantee = pq_getmsgint(buf, 4);
+	aip->ai_grantor = pq_getmsgint(buf, 4);
+	aip->ai_privs = pq_getmsgint(buf, 4);
+
+	/* Client has passed names. */
+	if (buf->cursor < buf->len) {
+		/* Read grantee. */
+		gRawLen = pq_getmsgint(buf, 4);
+		if (gRawLen != -1) {
+			gVal = pq_getmsgtext(buf, gRawLen, &gValLen);
+			gOid = get_role_oid(gVal, false);
+			if (aip->ai_grantee != 0 && aip->ai_grantee != gOid) {
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+						 errmsg("grantee's OID is not 0 and the passed grantee's name and grantee's OID don't match")));
+
+			}
+		}
+
+		/* Read grantor. */
+		gRawLen = pq_getmsgint(buf, 4);
+		if (gRawLen != -1) {
+			gVal = pq_getmsgtext(buf, gRawLen, &gValLen);
+			gOid = get_role_oid(gVal, false);
+			if (aip->ai_grantor != 0 && aip->ai_grantor != gOid) {
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+						 errmsg("grantor's OID is not 0 and the passed grantor's name and grantor's OID don't match")));
+			}
+		} else {
+			gVal = NULL;
+		}
+	}
+
+if 

[HACKERS] Void binary patch

2011-02-20 Thread Radosław Smogura
Just a patch adding the missing send/recv procedures for void.

Regards,
Radek
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index d9329f8..614eb98 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -212,7 +212,20 @@ void_out(PG_FUNCTION_ARGS)
 	PG_RETURN_CSTRING(pstrdup(""));
 }
 
+Datum
+void_recv(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_VOID();
+}
 
+Datum
+void_send(PG_FUNCTION_ARGS)
+{
+	StringInfoData buf;
+
+	pq_begintypsend(&buf);	// nice machinery to send nothing
+	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
+}
 /*
  * trigger_in		- input routine for pseudo-type TRIGGER.
  */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0894985..0711474 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -4226,6 +4226,11 @@ DATA(insert OID = 2502 (  anyarray_recv		   PGNSP PGUID 12 1 0 0 f f f t f s 1 0
 DESCR("I/O");
 DATA(insert OID = 2503 (  anyarray_send		   PGNSP PGUID 12 1 0 0 f f f t f s 1 0 17 2277 _null_ _null_ _null_ _null_	anyarray_send _null_ _null_ _null_ ));
 DESCR("I/O");
+DATA(insert OID = 3120 (  void_recv		   PGNSP PGUID 12 1 0 0 f f f t f s 1 0 2278 2281 _null_ _null_ _null_ _null_ void_recv _null_ _null_ _null_ ));
+DESCR("I/O");
+DATA(insert OID = 3121 (  void_send		   PGNSP PGUID 12 1 0 0 f f f t f s 1 0 17 2278 _null_ _null_ _null_ _null_	void_send _null_ _null_ _null_ ));
+DESCR("I/O");
+
 
 /* System-view support functions with pretty-print option */
 DATA(insert OID = 2504 (  pg_get_ruledef	   PGNSP PGUID 12 1 0 0 f f f t f s 2 0 25 26 16 _null_ _null_ _null_ _null_	pg_get_ruledef_ext _null_ _null_ _null_ ));
diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h
index 0f7312e..9baed6c 100644
--- a/src/include/catalog/pg_type.h
+++ b/src/include/catalog/pg_type.h
@@ -615,7 +615,7 @@ DATA(insert OID = 2276 ( any			PGNSP PGUID  4 t p P f t \054 0 0 0 any_in any_ou
 #define ANYOID			2276
 DATA(insert OID = 2277 ( anyarray		PGNSP PGUID -1 f p P f t \054 0 0 0 anyarray_in anyarray_out anyarray_recv anyarray_send - - - d x f 0 -1 0 0 _null_ _null_ ));
 #define ANYARRAYOID		2277
-DATA(insert OID = 2278 ( void			PGNSP PGUID  4 t p P f t \054 0 0 0 void_in void_out - - - - - i p f 0 -1 0 0 _null_ _null_ ));
+DATA(insert OID = 2278 ( void			PGNSP PGUID  4 t p P f t \054 0 0 0 void_in void_out void_recv void_send - - - i p f 0 -1 0 0 _null_ _null_ ));
 #define VOIDOID			2278
 DATA(insert OID = 2279 ( trigger		PGNSP PGUID  4 t p P f t \054 0 0 0 trigger_in trigger_out - - - - - i p f 0 -1 0 0 _null_ _null_ ));
 #define TRIGGEROID		2279
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 8392be6..8652ba0 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -504,6 +504,8 @@ extern Datum anyenum_in(PG_FUNCTION_ARGS);
 extern Datum anyenum_out(PG_FUNCTION_ARGS);
 extern Datum void_in(PG_FUNCTION_ARGS);
 extern Datum void_out(PG_FUNCTION_ARGS);
+extern Datum void_recv(PG_FUNCTION_ARGS);
+extern Datum void_send(PG_FUNCTION_ARGS);
 extern Datum trigger_in(PG_FUNCTION_ARGS);
 extern Datum trigger_out(PG_FUNCTION_ARGS);
 extern Datum language_handler_in(PG_FUNCTION_ARGS);
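
For illustration, a minimal libpq sketch (not from the thread) of why this
matters: without void_send there is no binary output function for void, so
requesting binary results from a void-returning function cannot work.
PQexecParams() with resultFormat = 1 asks for binary output; connection
parameters are taken from the environment here:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("");
    PGresult *res;

    /* pg_sleep() returns void; the final 1 requests binary result format */
    res = PQexecParams(conn, "SELECT pg_sleep(0)",
                       0, NULL, NULL, NULL, NULL, 1);
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}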

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Varchar and binary protocol

2011-02-09 Thread Radosław Smogura
Actually, the difference is here:
http://archives.postgresql.org/pgsql-hackers/2011-02/msg00415.php

Merlin Moncure mmonc...@gmail.com Thursday 10 February 2011 08:48:26
 On Sat, Feb 5, 2011 at 4:59 PM, Radosław Smogura
 
 rsmog...@softperience.eu wrote:
  Hi,
  
  I do performance tests against orignal JDBC driver and my version in
  binary and in text mode. I saw strange results when I was reading
  varchar values. Here is some output from simple benchmark
  
  Plain strings speed   Execution: 8316582, local: 2116608,
  all: 10433190
  Binary strings speed  Execution: 9354613, local: 2755949,
  all: 12110562
  Text NG strings speed Execution: 8346902, local: 2704242,
  all: 11051144
  
  Plain is standard JDBC driver, Binary is my version with binary transfer,
  Text is my version with normal transfer. 1st column, Execution is time
  spend on query execution this includes send, recivie proto message,
  store it, etc, no conversion to output format. Values are in
  nanoseconds.
  
  In new version I added some functionality, but routines to read parts in
  Execution block are almost same for binary and text.
  
  But as you see the binary version is 10-20% slower then orginal, and my
  text version, if I increase number of read records this proportion will
  not change. I done many checks, against even skip proto message
  content driver, end results was same 10-20% slower.
 
 Since there is basically zero difference in how *varchar* is handled
 in the database for the text or binary protocols (AFAIK, they use the
 same code), this is almost certainly an issue with the JDBC driver, or
 your benchmark application.
 
 merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Varlena and binary

2011-02-07 Thread Radosław Smogura
Just out of curiosity, may I ask in which direction this will go, and how it will
affect the performance of the text and binary formats?

Actually I started making smaller improvements, and I'm thinking about one big one:
encoding text (when the client and server encodings differ) directly into the
StringInfo, without an intermediate buffer.

Thanks in advance
Radek

Tom Lane t...@sss.pgh.pa.us Monday 07 February 2011 17:12:07
 Radosław Smogura m...@smogura.eu writes:
  I'm sending small patch for textsend. It reduces unnecessary copies, and
  memory usage for duplication of varlena data. May you look?
 
 This code will break the day that text and bytea don't have the same
 internal representation, which seems likely to be soon.  Barring some
 compelling evidence of a major performance improvement obtainable this
 way, I don't think we want this patch.
 
   regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Varlena and binary

2011-02-06 Thread Radosław Smogura
Hi,

I'm sending a small patch for textsend. It reduces unnecessary copies and the
memory usage from duplicating varlena data. Could you take a look?

Kind regards,
Radosław Smogura
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index e111d26..f24bbcd 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -442,12 +442,20 @@ textrecv(PG_FUNCTION_ARGS)
 Datum
 textsend(PG_FUNCTION_ARGS)
 {
-	text	   *t = PG_GETARG_TEXT_PP(0);
-	StringInfoData buf;
-
-	pq_begintypsend(&buf);
-	pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t));
-	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
+	text	   *t = PG_GETARG_TEXT_PP(0);
+	const char *textData = VARDATA_ANY(t);
+	const int	textSize = VARSIZE_ANY_EXHDR(t);
+	char	   *textConverted = pg_server_to_client(textData, textSize);
+	// logic based on pq_sendtext
+	if (textConverted == textData) {
+		PG_RETURN_BYTEA_P(t);
+	} else {
+		StringInfoData buf;
+		pq_begintypsend(&buf);
+		appendBinaryStringInfo(&buf, textConverted, strlen(textConverted));
+		pfree(textConverted);
+		PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
+	}
 }
 
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Varchar and binary protocol

2011-02-05 Thread Radosław Smogura
Hi,

I'm doing performance tests against the original JDBC driver and my version, in
binary and in text mode. I saw strange results when reading varchar values.
Here is some output from a simple benchmark:

Plain strings speed   Execution: 8316582, local: 2116608, all: 
10433190
Binary strings speed  Execution: 9354613, local: 2755949, all: 
12110562
Text NG strings speed Execution: 8346902, local: 2704242, all: 
11051144

Plain is the standard JDBC driver, Binary is my version with binary transfer, and
Text is my version with normal transfer. The 1st column, Execution, is the time
spent on query execution; this includes sending and receiving the proto messages,
storing them, etc., with no conversion to the output format. Values are in nanoseconds.

In the new version I added some functionality, but the routines that read the
parts in the Execution block are almost the same for binary and text.

But as you can see, the binary version is 10-20% slower than the original and my
text version; if I increase the number of records read, this proportion does not
change. I did many checks, even against a driver that skips the proto message
content; the end result was the same, 10-20% slower.

Regards,
Radek.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Binary timestamp with without timezone

2010-12-20 Thread Radosław Smogura


On Thu, 16 Dec 2010 14:24:27 -0500, Tom Lane t...@sss.pgh.pa.us wrote:

Radosław Smogura rsmog...@softperience.eu writes:

Tom Lane t...@sss.pgh.pa.us Thursday 16 December 2010 18:59:56
Radosław Smogura rsmog...@softperience.eu writes:

... This timestamp must be properly encoded
depending if target is WITH TZ or not, but JDBC (and other 
clients,

probably too) doesn't have any knowledge about target type when
statement is executed



Seems like you need to fix *that*.


I don't say it's bad way to send timestamps in text mode. It's good 
solution,
because timestamp without tz will silently ignore tz offset, 
timestamp with tz
will use offset in calculations if it is there, if no it will use 
server TZ.


No, what I'm saying is that it's complete folly to be sending binary
data for a value you don't know the exact type of.
I know something about the value I want to send, but only that it should be
a timestamp. I don't know whether it should be with or without tz.



There are too many
ways for that to fail, and too few ways for the backend to validate 
what

you sent.  Adding more possible ways to interpret a binary blob makes
that problem worse, not better.


The official JDBC driver release uses this technique to send timezone
timestamps, but in text mode; every timestamp is sent as UNSPECIFIED. So the
text-mode driver can fail in this way too.


What you need to fix is the inadequate type bookkeeping in JDBC.  If 
you
don't know the exact type of the value you're going to send, send it 
in

text mode, where you have some reasonable hope of a mismatch being
detected.


I know that this procedure isn't good in either text mode or binary mode,
but it gives some chance to do it better. In both cases we can find examples
where this behaviour will fail. But in the proposed solution I added (I hope
in a safe way) support for timezone information, which the binary protocol
is missing and which can be useful.


Maybe a better idea is to create a new timestamptz type that fully supports
time offsets and, most importantly, gives much more client-friendly casting
to timestamp and timestamptz. I mean it should be castable to timestamptz as
well as to timestamp, but in the latter case on a per-field basis:
('2010-01-01 +1:00')::timestamp -> '2010-01-01'. It could be better, because
the missing tz offset in the current implementation can cause problems with
historical DST offsets (many posts found).


The binary protocol will not have this disadvantage when reading, because
Java supports historical DST and timestamptz is UTC-based.


Regards,
Radek

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Binary timestamp with without timezone

2010-12-16 Thread Radosław Smogura
Hi,
 
I'm working on binary support for JDBC. I ran into a disadvantage of TIMESTAMP
WITH / WITHOUT TZ. Currently (in text mode) the driver always sends a date-time
string with the time offset appended, as UNSPECIFIED, so the backend can choose
whether or not to use the offset. In binary mode I can only send an 8-byte
timestamp without an appended timezone. This timestamp must be encoded properly
depending on whether the target is WITH TZ or not, but JDBC (and probably other
clients too) has no knowledge of the target type when the statement is executed
(so currently I send timestamps as text).
I'm thinking about the following patch (keeping backward compatibility) for
timestamp(tz). The idea is as follows: if we have two additional bytes, they are
the TZ offset, and we use them to convert the received time to UTC. I wrote it
in an e-mail editor (sorry :), no C IDE this time),

Datum
timestamptz_recv(PG_FUNCTION_ARGS)
{
StringInfo  buf = (StringInfo) PG_GETARG_POINTER(0);

#ifdef NOT_USED
Oid typelem = PG_GETARG_OID(1);
#endif
int32   typmod = PG_GETARG_INT32(2);
TimestampTz timestamp;
int tz;
struct pg_tm tt,
		   *tm = &tt;
fsec_t  fsec;
char   *tzn;
int16   tzOffset;	// zone offset with precision to minutes, 12*60=720

#ifdef HAVE_INT64_TIMESTAMP
timestamp = (TimestampTz) pq_getmsgint64(buf);
#else
timestamp = (TimestampTz) pq_getmsgfloat8(buf);
#endif

+	if (buf->len == 10) { // we assume the last two bytes are the timezone offset
+		pq_copymsgbytes(buf, (char *) &tzOffset, 2 /* sizeof(int16) */);
+#ifdef HAVE_INT64_TIMESTAMP
+		timestamp -= ((int16) tzOffset) * 60 /* sec */ * USECS_PER_SEC;
+#else
+		timestamp -= (float8) (tzOffset * 60 /* sec */); // good casting...?
+#endif
+	}

/* rangecheck: see if timestamptz_out would like it */
if (TIMESTAMP_NOT_FINITE(timestamp))
 /* ok */ ;
else if (timestamp2tm(timestamp, &tz, tm, &fsec, &tzn, NULL) != 0)
	ereport(ERROR,
			(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
			 errmsg("timestamp out of range")));

AdjustTimestampForTypmod(&timestamp, typmod);

PG_RETURN_TIMESTAMPTZ(timestamp);
}

Will this be enough to allow sending TIMESTAMP WITH(OUT) TZ as UNSPECIFIED?
Something similar should go for the (ugly) time with timezone.

// Of course, excellent behaviour would be if I could always send values with TZ
// and leave the _proper_ casting to the backend.
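
For illustration, a minimal client-side sketch (not from the thread) of encoding
the proposed 10-byte value, assuming integer datetimes: 8 bytes of microseconds
since 2000-01-01, big-endian, followed by the 2-byte zone offset in minutes that
the patched timestamptz_recv() would subtract to get UTC:

#include <stdint.h>

static void
encode_timestamptz(uint8_t out[10], int64_t usec_since_2000, int16_t tz_minutes)
{
    int i;

    /* big-endian, as on the wire */
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t) (usec_since_2000 >> (8 * (7 - i)));
    out[8] = (uint8_t) (((uint16_t) tz_minutes) >> 8);
    out[9] = (uint8_t) tz_minutes;
}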

Kind regards,
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Binary timestamp with without timezone

2010-12-16 Thread Radosław Smogura
Tom Lane t...@sss.pgh.pa.us Thursday 16 December 2010 18:59:56
 Radosław Smogura rsmog...@softperience.eu writes:
  I work on binary support for JDBC. I saw disadventage of TIMESTAMPS WITH
  / WITHOUT TZ. Currently (in text mode) driver always sends date time
  string with appended time offset, as UNSPECIFIED so backend can choose
  to use offset or not. In binary mode I can only send 8 bytes timestamp
  without appended timezone. This timestamp must be properly encoded
  depending if target is WITH TZ or not, but JDBC (and other clients,
  probably too) doesn't have any knowledge about target type when
  statement is executed
 
 Seems like you need to fix *that*.
I'm not saying it's a bad way to send timestamps in text mode. It's a good
solution, because timestamp without tz will silently ignore the tz offset, and
timestamp with tz will use the offset in calculations if it is there; if not,
it will use the server TZ.

  I think about following patch (giving backward compatibility) on
  timestamp (tz). Idea is as follows if we have additional two bytes it's
  TZ offset and use this to convert received time to UTC. I wrote it in
  e-mail editor (sorry
  
  :) no C IDE last time),
 
 This is not a terribly good idea, and even if it was, how will you use
 it from a client that doesn't know which data type is really in use?
The binary protocol doesn't allow sending a timezone offset, as text mode does
(lack of information). I would like to send this with the same behavior as text
mode: send the local time, with two bytes of client tz, as OID unspecified.

One more thing: instead of pq_copymsgbytes, it would be better to use
tzOffset = pq_getmsgint(buf, sizeof(int16));

Kind regards,
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Improved JDBC driver part 2

2010-12-01 Thread Radosław Smogura
I've just done a small clean-up - it's there now.

On Wed, 1 Dec 2010 12:06:19 +0100, Valentine Gogichashvili
val...@gmail.com wrote:
 Hi,
 
 I cannot get the file:
 
 wget http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
 --2010-12-01 12:05:28--
 http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz
 Resolving www.rsmogura.net... 64.120.14.83
 Connecting to www.rsmogura.net|64.120.14.83|:80... connected.
 HTTP request sent, awaiting response... 404 Not Found
 2010-12-01 12:05:29 ERROR 404: Not Found.
 
 
 On Tue, Nov 30, 2010 at 7:49 PM, Radosław Smogura m...@smogura.eu
wrote:
 
 Hello,

 Maybe you are interested about this what I done with JDBC

 === Original driver (Text mode) ===
 * Memory *
 1. Memory usage improvments when using result set input streams (no
 uneeded
 memory copy) - needs few touches for bigger performance.
 2. Memory usage improvments for large data, should be no problem to
load
 1GB
 bytea[] when have only 300MB of memory (threshold size still
 hardcoded).

 * JDBC 4 *
 1. XML are now correctly transformed before send to server - previous
 version
 used normal text-file transformations that is not enaugh.
 2. In all modes (text/binary) XMLs are sended in binary mode, so driver
 don't
 need to do special transformation (does it require libxml?), until
 character
 streams are used.
 3. JDBC4 exception throwing.
 4. XML objects are readable only once, you can't reuse it, update form
 result
 set (silently set to null on RS.updateRow() - shouldn't be silent)
 returns
 null till refreshRow(), but you can write to them after load.
 5.Target XML behavior is streaming behavior to don't repeat problems
with
 bytea.

 * JDBC 4.1 *
 1. Just started.

 * Others *
 1. Few additional test cases. Few utils for XML checking (string equals
 is
 too
 less) no good, but better.
 2. Fixed bug, causing inproper time(stamps) encoding for WITH TIME ZONE
 fields,
 after changing default time zone.

 === Binary mode ===
 1. Read for almost all data types with arrays.
 2. Write for few.
 3. Much more restrictive checking when casting form one type to other.
 4. Exceptions when casting from one type to other inproper type.
 5. Still ResultSet.getString() for XML will return XML - this spec.
 prohibited
 (X - base type conversion, x - possible conversion, no x - no base and
 possible = no conversion).
 6. No getPriviliges for metadata - no binary output for ACL!!!
 7. Many, many tests passed.
 8. Data reading is faster for all reads (checked with profiler, against
 original driver).

 Driver is here
 http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gzis
 currently JDK 6 compatible (will be not), compressed patch takes about
 136kb
 gziped.

 Kind regards  have a nice day
 --
 Radosław Smogura
 http://www.softperience.eu

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [JDBC] [HACKERS] Improved JDBC driver part 2

2010-12-01 Thread Radosław Smogura
On Wed, 1 Dec 2010 12:47:13 +0100, Magnus Hagander mag...@hagander.net
wrote:
 On Tue, Nov 30, 2010 at 19:49, Radosław Smogura m...@smogura.eu wrote:
 Hello,

 Maybe you are interested about this what I done with JDBC
 
 snip
 
 
 Driver is here
 http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz is
 currently JDK 6 compatible (will be not), compressed patch takes about
 136kb
 gziped.
 
 Is there any particular reason why this work can't be maintained as a
 branch to the main driver? My understanding is your work is based off
 that one? Being able to work like that would make things a lot easier
 to review.
Yes, it's based on that, with CVS subfolders in the sources. I don't see any
problem maintaining this as a branch. Ah, one just needs to read up a bit on
CVS branching.

 That said, such a process would also be a lot easier if the JDBC
 driver wasn't in cvs ;)
Yes, SVN is much nicer.
 
 -- 
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improved JDBC driver part 2

2010-11-30 Thread Radosław Smogura
Hello,

Maybe you are interested in what I have done with JDBC.

=== Original driver (Text mode) ===
* Memory *
1. Memory usage improvements when using result set input streams (no unneeded
memory copies) - needs a few touches for bigger performance.
2. Memory usage improvements for large data; it should be no problem to load a
1GB bytea[] with only 300MB of memory (threshold size still hardcoded).

* JDBC 4 *
1. XML is now correctly transformed before being sent to the server - the
previous version used normal text-file transformations, which is not enough.
2. In all modes (text/binary) XMLs are sent in binary mode, so the driver
doesn't need to do a special transformation (does it require libxml?), unless
character streams are used.
3. JDBC 4 exception throwing.
4. XML objects are readable only once; you can't reuse them. Updating from a
result set (silently set to null on RS.updateRow() - it shouldn't be silent)
returns null until refreshRow(), but you can write to them after load.
5. The target XML behavior is streaming behavior, to avoid repeating the
problems with bytea.

* JDBC 4.1 *
1. Just started.

* Others *
1. A few additional test cases. A few utils for XML checking (string equals is
too weak) - not good, but better.
2. Fixed a bug causing improper time(stamp) encoding for WITH TIME ZONE fields
after changing the default time zone.

=== Binary mode ===
1. Reads for almost all data types, with arrays.
2. Writes for a few.
3. Much more restrictive checking when casting from one type to another.
4. Exceptions when casting from one type to an improper other type.
5. ResultSet.getString() for XML will still return XML - the spec prohibits this
(X - base type conversion, x - possible conversion, no x - no base and no
possible = no conversion).
6. No getPrivileges for metadata - no binary output for ACL!!!
7. Many, many tests passed.
8. Data reading is faster for all reads (checked with a profiler, against the
original driver).

The driver is here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101130_C.tar.gz.
It is currently JDK 6 compatible (it will not stay so); the compressed patch is
about 136kB gzipped.

Kind regards  have a nice day
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-26 Thread Radosław Smogura
This is what I did:
1. Send Bind.
2. Push the requested format types onto a stack.
3. On BindComplete, pop the requestedFormats from the stack.
4. When Execute is complete (or the portal suspends), use the requestedFormats
to override the field formats received from Describe or previously cached (see
the sketch below).

I assume the server can't change the formats after Bind, even if a portal
Describe was fired. Is that all right? I don't know much about the server
internals.
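
A rough C-style sketch of that scheme (illustrative only - the names are
invented and the real driver code is Java): formats requested at Bind time are
pushed, taken back on BindComplete, and used to override whatever formats a
statement-level Describe reported:

typedef struct FormatRequest
{
    short *formats;             /* per column: 0 = text, 1 = binary */
    int    ncols;
} FormatRequest;

static FormatRequest pending[64];
static int           npending = 0;

static void
on_bind_sent(FormatRequest req)
{
    pending[npending++] = req;
}

static FormatRequest
on_bind_complete(void)
{
    return pending[--npending];
}

static void
fix_row_description(short *described_formats, int ncols, FormatRequest req)
{
    int i;

    /* trust what we asked for, not what the statement Describe said */
    for (i = 0; i < ncols && i < req.ncols; i++)
        described_formats[i] = req.formats[i];
}

With pipelined messages a FIFO queue would match the response order better than
a stack; for the single-statement case above, both behave the same.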

Kind regards,
Radek.


On Fri, 26 Nov 2010 01:02:25 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 Maciek Sakrejda msakre...@truviso.com writes:
 21:43:02.264 (26)  FE= Describe(statement=S_1)
 You're still doing the statement-flavor Describe. As Tom pointed out,
 this won't tell you the result types because it doesn't know them.
 Actually, technically if you issue a statement-flavor Describe *after*
 a Bind, the server does have this information, but I'm not surprised
 that it doesn't send it correctly, since it seems pointless to send
 the statement variation after already doing a Bind.
 
 In principle you could open more than one Portal off a Statement
 at the same time, so it wouldn't necessarily be well-defined anyway.
 
   regards, tom lane

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-25 Thread Radosław Smogura
I checked against other parameter bindings, and it looks like the problem is
connected with oid=0.
In those cases:
1. Executing the statement with the parameter sent as varchar, int, or long, in
text and binary format, is OK.
2. Executing the statement with oid=0 always fails; I've sent the parameter in
text mode (encoded '2') and in binary mode as an encoded int4 2 - both failed.

On Thu, 25 Nov 2010 01:56:02 -0800, Maciek Sakrejda
msakre...@truviso.com
wrote:
 Haven't really gotten much further, but an interesting note: the named
 / unnamed prepared statement and portal stuff seems to be a red
 herring. I can add a name to the portal, or move to an unnamed
 prepared statement, and I still see the same thing. Which is
 interesting, since that's not what Radosław saw (his original test
 only failed once named prepared statements kicked in)...
 ---
 Maciek Sakrejda | System Architect | Truviso
 
 1065 E. Hillsdale Blvd., Suite 215
 Foster City, CA 94404
 (650) 242-3500 Main
 www.truviso.com

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-25 Thread Radosław Smogura
Hi,

Thank you for your response.

I would only ask to be sure...
So, to summarise, in this situation I shouldn't trust the server's RowDescription
(with regard to format), but should only look at what I asked for, right? If I
asked for columns in binary format, I need to do binary reading regardless of
what the server responded? If I asked for odd columns in text and even ones in
binary, do I need to choose the proper format based only on my request?

But as to the last part of the cited protocol specification: when I sent the
message with statement parameter types int4, int8, varchar, the format field
wasn't set to 0, but to 1.

Kind regards,
Radosław Smogura

On Thu, 25 Nov 2010 12:23:03 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 AFAICS this isn't a bug.  What you're issuing Describe against is the
 prepared statement, not the portal.  The result column formats are not
 specified by a prepared statement, so Describe just returns zeroes for
 them.  Result column formats are specified by the Bind command, which
 creates a Portal.  If you'd issued the Describe against the Portal, you
 should get back the correct format codes.  Per the protocol
 specification:
 
 The Describe message (statement variant) specifies the name of an
 existing prepared statement (or an empty string for the unnamed
 prepared statement). The response is a ParameterDescription message
 describing the parameters needed by the statement, followed by a
 RowDescription message describing the rows that will be returned
 when the statement is eventually executed (or a NoData message if
 the statement will not return rows). ErrorResponse is issued if
 there is no such prepared statement. Note that since Bind has not
 yet been issued, the formats to be used for returned columns are not
 yet known to the backend; the format code fields in the
 RowDescription message will be zeroes in this case.
 
 Now, if there's something in the JDBC driver that expects
 DescribeStatement to return useful result format codes, that'd be
 a bug in the driver.
 
   regards, tom lane
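
A minimal illustrative libpq sketch of the same distinction (not from the
thread; connection setup omitted): a statement-level Describe reports format 0
for every column, while the result of an actual Bind/Execute carries the format
that was requested:

#include <stdio.h>
#include <libpq-fe.h>

static void
show_formats(PGconn *conn)
{
    const char *values[1] = {"2"};
    PGresult   *res;

    res = PQprepare(conn, "s1", "SELECT $1::int", 0, NULL);
    PQclear(res);

    /* statement-variant Describe: formats not yet known, prints 0 */
    res = PQdescribePrepared(conn, "s1");
    printf("format before Bind: %d\n", PQfformat(res, 0));
    PQclear(res);

    /* after Bind with resultFormat = 1, the column really is binary */
    res = PQexecPrepared(conn, "s1", 1, values, NULL, NULL, 1);
    printf("format after Bind: %d\n", PQfformat(res, 0));
    PQclear(res);
}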

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-25 Thread Radosław Smogura
On Thu, 25 Nov 2010 11:28:02 -0800, Maciek Sakrejda
msakre...@truviso.com
wrote:
 So, to summarise, I shouldn't believe server DescribeRow (in context of
 format), in this situation, but only I should look at this what I asked
 for, isn't it? If I asked for columns in binary format, I need to do
 binary reading regarding what server has responded?
 
 Yes, because in this case 0 doesn't mean the result will be in
 text, it means, you issued the statement-variant of Describe, so I'm
 not sure what the result format will be yet.
 
 If I asked for odd columns in text, even in binary do I need to choose
 proper format basing only on my request?
 
 I don't quite understand this question, but I think so. I don't think
 there's ever a situation where the server will ignore your result
 format requests.
 
 But to the last part of cited protocol specification, when I've sent
 message with statement parameter's type int4, int8, varchar the format
 field wasn't set to 0, but 1.
 
 I wasn't able to reproduce that with my standalone test case. When I
 changed the parameter oid to 23, I still got the same behavior. Can
 you alter my test case to reproduce the error?
 
I will do it tomorrow.

In this situation I need to test portals as well.

 ---
 Maciek Sakrejda | System Architect | Truviso
 
 1065 E. Hillsdale Blvd., Suite 215
 Foster City, CA 94404
 (650) 242-3500 Main
 www.truviso.com

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-25 Thread Radosław Smogura
Hm... I moved Bind before Describe; I now have:
// Construct a new portal if needed.
Portal portal = null;
if (usePortal)
{
String portalName = "C_" + (nextUniqueID++);
portal = new Portal(query, portalName);
}

sendBind(query, params, portal, noBinaryTransfer);

if (describeStatement) {
sendDescribeStatement(query, params, describeOnly);
if (describeOnly)
return;
}
still nothing
21:43:02.263 (26) simple execute,
handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@337ec9f7,
maxRows=0, fetchSize=0, flags=16
21:43:02.264 (26)  FE= Parse(stmt=S_1,query=SELECT $1::int,oids={0})
21:43:02.264 (26)  FE= Bind(stmt=S_1,portal=null,$1='2')
21:43:02.264 (26)  FE= Describe(statement=S_1)
21:43:02.264 (26)  FE= Execute(portal=null,limit=0)
21:43:02.265 (26)  FE= Sync
21:43:02.265 (26)  =BE ParseComplete [S_1]
21:43:02.265 (26)  =BE BindComplete [null]
21:43:02.266 (26)  =BE ParameterDescription
21:43:02.266 (26)  =BE RowDescription(1)
21:43:02.266 (26) Field(,INT4,4,T)
21:43:02.266 (26)  =BE DataRow(len=4)
21:43:02.267 (26)  =BE CommandStatus(SELECT 1)
21:43:02.267 (26)  =BE ReadyForQuery(I)
If I've understood correctly, I should get the proper result after the 1st Bind...?

On Thu, 25 Nov 2010 12:21:39 -0800, Maciek Sakrejda
msakre...@truviso.com
wrote:
 OTOH, it seems possible that the JDBC driver might behave differently
 depending on whether parameter types were prespecified or not --- it
 might issue Describe earlier in order to get the parameter types,
 perhaps.
 
 Ah. Bingo:
 
 boolean describeStatement = describeOnly || (!oneShot 
 paramsHasUnknown  queryHasUnknown  !query.isStatementDescribed());
 
 
 ---
 Maciek Sakrejda | System Architect | Truviso
 
 1065 E. Hillsdale Blvd., Suite 215
 Foster City, CA 94404
 (650) 242-3500 Main
 www.truviso.com

-- 
--
Radosław Smogura
http://www.softperience.eu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-25 Thread Radosław Smogura
Thank you, but I thought about this last night. Opening unnecessary portals
isn't a good idea, and neither is sending a 2nd Describe when the statement was
already prepared. Currently the JDBC driver doesn't do this. I think it will be
better to store the formats we requested on a stack, and then coerce to those
formats when the results are handled.

Kind regards,
Radosław Smogura

On Fri, 26 Nov 2010 01:02:25 -0500, Tom Lane t...@sss.pgh.pa.us wrote:
 Maciek Sakrejda msakre...@truviso.com writes:
 21:43:02.264 (26)  FE= Describe(statement=S_1)
 You're still doing the statement-flavor Describe. As Tom pointed out,
 this won't tell you the result types because it doesn't know them.
 Actually, technically if you issue a statement-flavor Describe *after*
 a Bind, the server does have this information, but I'm not surprised
 that it doesn't send it correctly, since it seems pointless to send
 the statement variation after already doing a Bind.
 
 In principle you could open more than one Portal off a Statement
 at the same time, so it wouldn't necessarily be well-defined anyway.
 
   regards, tom lane

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC and Binary protocol error, for some statements

2010-11-24 Thread Radosław Smogura
I didn't describe the log correctly: the 1st attached response is normal
execution, flags QUERY_SUPPRESS_BEGIN | QUERY_ONESHOT; the 2nd is the compiled
statement, QUERY_SUPPRESS_BEGIN only.

Text format is marked as 0, binary format as 1.

The 1st execution shown (flags=17) is good; it says the result is sent in binary
format, as int4. But the result of the 2nd one (flags=16) (compiled statement)
is bad, because

the server described the row as text:
07:52:06.061 (54)  =BE RowDescription(1)
07:52:06.061 (54) Field(,INT4,4,T)
but the received tuple was clearly in binary format: 0x00, 0x00, 0x00, 0x02.
(Look at the length: it should be 1 or 2 if it is text format and the value is 2.)
Put simply, the server said you would receive text data, but sent it as a
binary-encoded int.

I've checked this against 8.4 and 9.0.1.

Maciek Sakrejda msakre...@truviso.com Wednesday 24 November 2010 18:02:27
 Result is oid=23, format=(0) T, value = 0x00,0x00,0x00,0x02
 
 What do you mean regarding the format? Are you just inferring that
 from the data? If memory serves, the format of a particular column is
 not specified anywhere other than the RowDescription, and according to
 your JDBC log output above, the server is telling you the format is
 text (1) (which is your point--it doesn't match the resulting
 data--but I want to make sure we're clear on what's actually going
 on).

It's jdbc2.PreparedStatementTest, from the JDBC driver unit tests. I've exposed
the sources here: http://www.rsmogura.net/pgsql/pgjdbc_exp_20101124.tar.gz;
the compiled driver and unit tests are in the parent directory.
In the above, all tests not related to this bug have been commented out; just
run ant test.
 Also, can you narrow this down to a simple, self-contained test case
 (with code)? Even if it's against a custom driver build, that would be
 easier to investigate.
 ---
 Maciek Sakrejda | System Architect | Truviso
 
 1065 E. Hillsdale Blvd., Suite 215
 Foster City, CA 94404
 (650) 242-3500 Main
 www.truviso.com


Kind regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] JDBC and Binary protocol error, for some statements

2010-11-23 Thread Radosław Smogura
Hi,

I'm working on implementing the binary protocol. During a test, select ?::int
was executed a few times so that it was compiled to a server prepared statement,
but after this the backend returned field format = text, yet sent the field data
in binary mode.

The parameter is a long and is sent in text mode, with oid 0, as value '2'.
The result is oid=23, format=(0) T, value = 0x00,0x00,0x00,0x02.

When I send the parameter explicitly in binary mode with the int4 oid, the
problem doesn't occur.

Here is debug of protocol

---
07:52:02.995 (54) simple execute, 
handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@508aeb74,

maxRows=0, fetchSize=0, flags=17
07:52:02.996 (54)  FE= Parse(stmt=null,query=SELECT $1::int,oids={0})
07:52:02.996 (54)  FE= Bind(stmt=null,portal=null,$1='2')
07:52:02.996 (54)  FE= Describe(portal=null)
07:52:02.996 (54)  FE= Execute(portal=null,limit=0)
07:52:02.996 (54)  FE= Sync
07:52:02.997 (54)  =BE ParseComplete [null]
07:52:02.997 (54)  =BE BindComplete [null]
07:52:02.998 (54)  =BE RowDescription(1)
07:52:02.998 (54) Field(,INT4,4,B)
07:52:02.998 (54)  =BE DataRow(len=4)
07:52:02.998 (54)  =BE CommandStatus(SELECT)
07:52:02.999 (54)  =BE ReadyForQuery(I)


---
Compiled stat exec
07:52:06.059 (54) simple execute, 
handler=org.postgresql.jdbc2.abstractjdbc2statement$statementresulthand...@128ae45a,

maxRows=0, fetchSize=0, flags=16
07:52:06.059 (54)  FE= Parse(stmt=S_1,query=SELECT $1::int,oids={0})
07:52:06.059 (54)  FE= Describe(statement=S_1)
07:52:06.060 (54)  FE= Bind(stmt=S_1,portal=null,$1='2')
07:52:06.060 (54)  FE= Execute(portal=null,limit=0)
07:52:06.060 (54)  FE= Sync
07:52:06.060 (54)  =BE ParseComplete [S_1]
07:52:06.061 (54)  =BE ParameterDescription
07:52:06.061 (54)  =BE RowDescription(1)
07:52:06.061 (54) Field(,INT4,4,T)
07:52:06.061 (54)  =BE BindComplete [null]
07:52:06.061 (54)  =BE DataRow(len=4)
07:52:06.062 (54)  =BE CommandStatus(SELECT)
07:52:06.062 (54)  =BE ReadyForQuery(I)
org.postgresql.jdbc4.exceptions.PSQLDataException: Bad value for type int:

Kind regards,
Radek


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Experimental JDBC driver

2010-11-21 Thread Radosław Smogura
Hello,

I recently added some features to the JDBC driver, making it in some ways an
experimental driver. It could be quite useful, but if possible I would like to
ask for some reviews and tests. Maybe some of the ideas in it will be handy.
Because the changeset is huge, I don't give a patch set, only a URL where you
can find the sources and a compiled binary:
http://www.rsmogura.net/pgsql/

(the drivers are called pgjdbc_exp).

Below is some info about the changes; under TODO and TO_THINK is a set of loose
ideas about this.

=== JDBC 4 Exceptions ===
1. The driver can throw JDBC 4 exceptions.
2. PSQLException is now an interface, to keep backward compatibility as far as
possible; read TO_THINK.
3. TODO: PgSqlExceptionFactory.getDefault() still exists in many static methods;
this should be removed.
4. TODO: Add functionality to allow the user to choose between throwing
PSQL*Exceptions or plain SQL*Exceptions.
5. TODO: Check whether the decision block for the exception type is fully
correct and there are no missing states; eventually try to decide between
throwing Transient / NonTransient exceptions.
6. Currently we throw only NonTransient exceptions, as any error in the server
only allows committing or rolling back the transaction; no further data read or
update is possible.
7. TO_THINK: Changing PSQLException to an interface should make no difference
when the client application is fully rebuilt, but if we have some 3rd-party
binary library, its execution can break (different VM opcodes for calling
methods). Because we have an exception factory, we can revert PSQLException to
a class, create PSQLExceptionInterface, and add an additional connection
parameter useOldException, default true - if true, then a PSQLException will be
thrown on error; if false, an SQLException implementing PSQLExceptionInterface.

=== Binary Read and Write ===
Thanks to Mikko for some concepts from his patch (especially about integer/float
dates).
1. A few basic data types have been written (with server types):
 - read: bool, double, int, short, decimal, varchar, text; date, timestamp
(with / without time zone)
 - write: int, decimal, date, timestamp
 - casting from a server type to a Java type is currently supported for a few
classes and types; this needs to be checked step by step against the JDBC spec
 - dates / times are currently limited to integer-datetimes servers
2. The reader / writer is configured on a per-connection basis with factories,
so if PostgreSQL changes the binary representation in the future, only the
appropriate reader / writer implementation will change.
3. TODO: Use a mechanism similar to the binary factory to read / write the
character protocol.
4. TODO: Add binary modes for the PGType classes.
5. TODO: Copy all static converters from PGType to the factory, and mark the
methods as deprecated (statics disallow using custom exception factories).
5a. Subtask: create a casting facility by extracting PGType's cast methods into
it; connect it with the current if..else trees in ResultSet.get().
6. TODO: A pluggability model for internally reading and writing other server
types in binary / text mode.
7. Added support for the UUID Java class and server type.
8. TODO: The binary reader always returns byte[]; think about writing directly
to a stream (less GC overhead).
9. Hacked the JDK MutableInteger; less GC overhead and more performance when
parsing DECIMAL.
10. TODO: Make additional tests of timestamps (dates, etc.) with / without time
zone when the server is in a different timezone than the driver, and of what
will happen on a TZ change in the client. A subject to ask about on pgsql-hackers.
11. TO_THINK: It's a good point to think about mapping OID to JDBC RowId - just
sugar for sugar.
12. No Describe, or a lazy decision, to read a given value as binary or character.
13. If you would like to test binary for all JUnit tests, call
ant -DbinaryTransfer=true

=== Other changes ===
1. setStatementTimeout - uses a Java Timer. It's not fully safe, as the timer
will abort the statement while it is executing; long Java processing or some
sub-statements can break this solution.

=== JDBC 4.1 ===
1. build.xml changes, and a basic implementation (with
UnsupportedOperationException) of some JDBC interfaces, to use JDK 7 / JDBC 4.1.

Kind regards,
Radek

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-15 Thread Radosław Smogura
On Tue, 12 Oct 2010 20:03:29 +0200, Magnus Hagander mag...@hagander.net
wrote:
 On Tue, Oct 12, 2010 at 17:55, David Fetter da...@fetter.org wrote:
 On Tue, Oct 12, 2010 at 10:37:00AM -0500, Kevin Grittner wrote:
 David Fetter da...@fetter.org wrote:
  Is there something incomplete about the ones I sent, and if so,
  what?

 Well, I'm still curious why it was necessary to modify the server
 side to implement an interface feature for which everything needed
 seems to be present on the client side.

 Not everything is.

 Let's imagine you have a connection pooler with two clients, A and B.
 A calls setQueryTimeout, then starts a query, which terminates in
 time, but dies before handling it.  B connects to the pool, gets A's
 connection, and finds a statement_timeout that's not the default, even
 though only A's single query was supposed to have that
 statement_timeout.  This is not a situation that can be resolved
 without being able to set a timer *on the server side*.
 
 Sure it can. The connection pooler just needs to issue a RESET ALL
 statement when it hands over a connection from one client to another.
 Isn't that what for example pgbouncer does - at least when configured
 per instructions?
 
 Also, doesn't this affect *all* settings, not just timeout, if it
 doesn't? Imagine client A executing a SET datestyle for example.
 
 AFAICS, any connection pooler that *doesn't* issue a reset between
 handing this around is broken, isn't it?
 

If I understand you correctly, you are saying that when taking a connection from
the pool, RESET ALL should be invoked.

But... the connection pooler will not send RESET ALL in some situations,
because the JDBC driver may get no notification about the client switch. On EE
platforms (e.g. GlassFish), the server sometimes creates its own pool, and in
certain configurations, when you close a Connection, the server will never pass
the close on to the PooledConnection, nor to the physical connection. This is
due to the fact that GF and others add functionality for statement pooling (if
the server called close on the pooled connection, then reusing a cached
statement would cause an error - in fact this problem occurs with Hibernate,
C3P0, XA, and GFv3).

To summarize, you should never rely on RESET ALL being called, nor on any other
particular behavior, when switching clients.

Am I right?
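
For reference, a minimal sketch of the pooler-side reset being discussed (the
function name is invented; cf. DISCARD ALL on newer servers). The point above is
exactly that a driver cannot rely on the application server ever reaching such a
code path:

#include <libpq-fe.h>

static void
handoff_to_next_client(PGconn *conn)
{
    /* wipe session state (SET datestyle, statement_timeout, ...) */
    PGresult *res = PQexec(conn, "RESET ALL");

    PQclear(res);
}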
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-15 Thread Radosław Smogura
On Fri, 15 Oct 2010 10:37:05 +0200, Magnus Hagander mag...@hagander.net
wrote:
 But... connection pooler will not send RESET ALL in some situations,
 because JDBC driver can have no notify about switching client. In EE
 platforms (e. g. Glassfish), server sometimes creates it's own pool and
 in
 certain configuration, when you close Connection, server will never
pass
 close to PooledConection, nor physical connection. This due to fact,
that
 GF and others adds functionality for statements pooling (if server will
 call close on pooled conn, then reusing cached statement will cause
 error -
 in fact this problem occurs with Hibernate, C3po, XA, and GFv3).
 
 To me, that sounds like a bug in the connection pooler. It is only
 safe under quite limited circumstances.

It's hard to say this is a bug. The GF connection pooler is a general pooler for
all drivers, so it can't know anything about resetting, and if I'm right, JDBC 4
doesn't support such notifications. It can't close the logical connections,
because if it did, the cached statements would be invalidated too.

But the benefits of statement pooling are much greater than those of RESET ALL,
because you can take advantage of precompiled prepared statements, increasing
performance; it is comparable to using a connection pool instead of starting
physical connections.

Around 2008, Sun published the results of a (SPEC test?) GlassFish v2 and
PostgreSQL combination with special patches allowing statement caching (on the
JDBC driver side) - it was the fastest combination, beating all the professional
and highly paid software.

Probably JBoss, WAS, and others can do the same wrapping.

 in fact this problem occurs with Hibernate, C3po, XA, and GFv3).
Hibernate, with C3P0, hacks around and uses some private code, unwraps objects,
etc... :(, so when the private implementation changes, things go boom.

 -- 
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/

-- 
--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-14 Thread Radosław Smogura
On Wed, 13 Oct 2010 21:01:06 -0400, Robert Haas robertmh...@gmail.com
wrote:
 Is this a JDBC patch or a PG patch?  Are we tracking JDBC patches
 using the CF app?

It is a JDBC patch. I will clean it up and submit it on that site. I didn't know
about such an application and process.

--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-14 Thread Radosław Smogura
 Regarding JDBC in the CF process -- other interfaces are handled
 there.  I haven't seen one patch this size for JDBC since I've been
 involved, let alone two competing patches to implement the same
 feature.  Small patches which can be quickly handled don't make sense
 to put into the process, but it seemed reasonable for these.

Anyway, I'm sending this patch, and I will put it under Miscellaneous in the CF.
This cleaned-up patch is only 47k (the uncleaned one included some binary-read
classes), and about 50% of it is a big test case.

Have a nice day,
Radek


statemnt_to_20101014.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-13 Thread Radosław Smogura
What I see in your patch is that it sends an additional statement to the server.
This adds some unnecessary (especially TCP/IP) latency.

  I sent such patch fully in Java
  (http://archives.postgresql.org/pgsql-jdbc/2009-11/msg00010.php),
  implementing cancellation with Timer and cancel query facility of
  PSQL server,
 
 Would you like to update it?

I updated the patch to the latest CVS version; I didn't have time to remove some
leftover cruft from it.

If something is wrong with the patch, give me feedback.

Kind regards,
Radosław Smogura
http://softperience.pl


statemnt_to_20101013.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] Support for JDBC setQueryTimeout, et al.

2010-10-12 Thread Radosław Smogura
On Mon, 11 Oct 2010 08:29:16 -0500, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 David Fetter da...@fetter.org wrote:
  
 Please find enclosed a WIP patch from one of my co-workers
 intended to support JDBC's setQueryTimeout, along with the patch
 for JDBC that uses it.
  
 I agree that it would be very nice to support this JDBC feature, but
 I'm not clear on why this can't be done with just JDBC changes using
 the java.util.Timer class and the existing Statement.cancel()
 method.  Can you explain why the backend needed to be touched?
  
 -Kevin

I sent such a patch, fully in Java
(http://archives.postgresql.org/pgsql-jdbc/2009-11/msg00010.php),
implementing cancellation with a Timer and the query-cancel facility of the
PostgreSQL server. Unfortunately no one has reviewed it, even though the
setQueryTimeout TODO has been on the dashboard for a long time, and it's
important for enterprise-class software.
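
For illustration, a minimal C analogue of that client-side approach (invented
names, pthreads, no error handling): a watchdog thread fires after the timeout
and cancels the running query through the protocol's cancel facility, with no
server-side changes:

#include <unistd.h>
#include <pthread.h>
#include <libpq-fe.h>

struct watchdog_arg
{
    PGcancel *cancel;           /* from PQgetCancel(conn) */
    unsigned  seconds;          /* the query timeout */
};

static void *
watchdog(void *p)
{
    struct watchdog_arg *arg = p;
    char errbuf[256];

    sleep(arg->seconds);
    /* best effort: if the query already finished, this is a no-op */
    PQcancel(arg->cancel, errbuf, sizeof(errbuf));
    return NULL;
}

/*
 * Usage: start the watchdog before executing the query, then cancel/join the
 * thread (pthread_cancel + pthread_join) when the query returns in time, and
 * free the handle with PQfreeCancel().
 */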

--
Radosław Smogura
http://www.softperience.eu

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

