Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-09-22 Thread Peter Eisentraut
On 9/19/15 10:46 AM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 7/23/15 6:39 PM, Tom Lane wrote:
>>> + 2202H    E    ERRCODE_INVALID_TABLESAMPLE_ARGUMENT    invalid_tablesample_argument
>>> + 2202G    E    ERRCODE_INVALID_TABLESAMPLE_REPEAT      invalid_tablesample_repeat
> 
>> Where did you get these error codes from?  The constants in the SQL
>> standard would map to
> 
>> ERRCODE_INVALID_SAMPLE_SIZE
>> ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE
> 
>> Were you looking at a different standard, or did you intentionally
>> choose to rephrase?
> 
> I was looking at SQL:2011.  My concern in naming them that way was that
> I wanted to have errcodes that would be general enough for any tablesample
> extension to use, but still be tablesample-specific, ie I don't want them
> to have to fall back on say ERRCODE_INVALID_PARAMETER_VALUE.

Makes sense.





Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-09-19 Thread Tom Lane
Peter Eisentraut  writes:
> On 7/23/15 6:39 PM, Tom Lane wrote:
>> + 2202H    E    ERRCODE_INVALID_TABLESAMPLE_ARGUMENT    invalid_tablesample_argument
>> + 2202G    E    ERRCODE_INVALID_TABLESAMPLE_REPEAT      invalid_tablesample_repeat

> Where did you get these error codes from?  The constants in the SQL
> standard would map to

> ERRCODE_INVALID_SAMPLE_SIZE
> ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE

> Were you looking at a different standard, or did you intentionally
> choose to rephrase?

I was looking at SQL:2011.  My concern in naming them that way was that
I wanted to have errcodes that would be general enough for any tablesample
extension to use, but still be tablesample-specific, ie I don't want them
to have to fall back on say ERRCODE_INVALID_PARAMETER_VALUE.
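
(For illustration, a tablesample extension validating its argument might use the new errcode like this; a minimal sketch, with the function name and message text made up for the example.)

#include "postgres.h"

/* Hypothetical argument check in a tablesample extension */
static void
check_sample_percent(float4 percent)
{
	if (percent < 0.0 || percent > 100.0)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TABLESAMPLE_ARGUMENT),
				 errmsg("sample percentage must be between 0 and 100")));
}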

Is your concern that we shouldn't be extending the meaning of these
standard SQLSTATE numbers in that way, or that I didn't slavishly follow
the standard's wording while naming the macros, or what exactly?

It's certainly not too late to change this, but we need to agree on
what would be better.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-09-18 Thread Peter Eisentraut
On 7/23/15 6:39 PM, Tom Lane wrote:
> + 2202H    E    ERRCODE_INVALID_TABLESAMPLE_ARGUMENT    invalid_tablesample_argument
> + 2202G    E    ERRCODE_INVALID_TABLESAMPLE_REPEAT      invalid_tablesample_repeat

Where did you get these error codes from?  The constants in the SQL
standard would map to

ERRCODE_INVALID_SAMPLE_SIZE
ERRCODE_INVALID_REPEAT_ARGUMENT_IN_A_SAMPLE_CLAUSE

Were you looking at a different standard, or did you intentionally
choose to rephrase?





Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-24 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> The only major difference that I see so far, and I'd like you to
> incorporate that into your patch, is that I renamed the SampleScanCost to
> SampleScanGetRelSize because that reflects much better the use of it; it
> isn't really used for costing, but for getting the pages and tuples of
> the baserel.

Good suggestion.  I was feeling vaguely uncomfortable with that name as
well, given what the functionality ended up being.  (In previous drafts
of my patch I'd given the costing function more responsibility, but
that ends up making TSMs responsible for knowing what nodeSamplescan.c does,
so it's wrong.)  Will do.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-24 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> Ok, attached are a couple of cosmetic changes: what I wrote above, plus a
> comment about why we do float8 + hashing for REPEATABLE (it's not
> obvious IMHO), and one additional test query.

OK, thanks.

> Do you want to do the contrib changes yourself as well, or should I take it?

I'm working on them already.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-24 Thread Petr Jelinek

On 2015-07-25 00:36, Tom Lane wrote:

> I wrote:

>> Petr Jelinek p...@2ndquadrant.com writes:

>>> The only major difference that I see so far, and I'd like you to
>>> incorporate that into your patch, is that I renamed the SampleScanCost to
>>> SampleScanGetRelSize because that reflects much better the use of it; it
>>> isn't really used for costing, but for getting the pages and tuples of
>>> the baserel.



>> Good suggestion.  I was feeling vaguely uncomfortable with that name as
>> well, given what the functionality ended up being.


> After further thought it seemed like the right name to use is
> SampleScanGetSampleSize, so as to avoid confusion between the size of the
> relation and the size of the sample.  (The planner is internally not
> making such a distinction right now, but that doesn't mean we should
> propagate that fuzzy thinking into the API spec.)



Right, sounds good to me.


> Attached is a more-or-less-final version of the proposed patch.
> Major changes since yesterday:
>
> * I worked over the contrib modules and docs.
>
> * I thought of a reasonably easy way to do something with nonrepeatable
> sampling methods inside join queries: we can simply wrap the SampleScan
> plan node in a Materialize node, which will guard it against being
> executed more than once.  There might be a few corner cases where this
> doesn't work fully desirably, but in testing it seemed to do the right
> thing (and I added some regression tests about that).  This seems
> certainly a better answer than either throwing an error or ignoring
> the problem.



Seems reasonable.

I was wondering if we should perhaps cache the output of GetTsmRoutine,
as we call it up to 4 times in the planner now, but it's a relatively
cheap call (basically just makeNode) so it's probably not worth it. In
any case it's not something that should stop you from committing the
patch. Thanks for all your work on this.
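
(For illustration of the Materialize wrapping discussed above: a sketch of how a planner might interpose the guard, using 9.5-era planner names; this is an illustration, not the committed code.)

#include "postgres.h"
#include "access/tsmapi.h"
#include "optimizer/pathnode.h"

/*
 * Sketch: guard a non-repeatable sample scan against being rescanned by
 * interposing a Material path when the query involves more than one rel.
 */
static Path *
maybe_materialize_samplescan(PlannerInfo *root, RelOptInfo *rel,
							 Path *path, TsmRoutine *tsm)
{
	if (!tsm->repeatable_across_scans &&
		bms_membership(root->all_baserels) != BMS_SINGLETON)
		path = (Path *) create_material_path(rel, path);
	return path;
}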


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-24 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> I was wondering if we should perhaps cache the output of GetTsmRoutine,
> as we call it up to 4 times in the planner now, but it's a relatively
> cheap call (basically just makeNode) so it's probably not worth it.

Yeah, I was wondering about that too.  The way to do it would probably
be to add a TsmRoutine pointer to RelOptInfo.  I'm not concerned at all
about the makeNode/fill-the-struct cost, but the syscache lookup involved
in getting from the function OID to the function might be worth worrying
about.  As things stand, it didn't quite seem worth the trouble, but if
we add any more planner lookups of the TsmRoutine then I'd want to do it.

Another place for future improvement is to store the sample-size outputs
separately in RelOptInfo instead of overwriting pages/tuples.  I'm not
sure it's worth the complication right now, but if we ever support doing
sampling with more than one scan plan type (eg bernoulli filtering in
an indexscan), we'd pretty much have to do that in order to be able to
compute costs sanely.
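
(As a sketch, the separate storage might be two extra RelOptInfo fields; these are hypothetical and not in the committed patch.)

	/* Hypothetical additions to RelOptInfo (not in the committed patch) */
	BlockNumber sample_pages;	/* pages the sample scan expects to visit */
	double		sample_tuples;	/* tuples the sample scan expects to return */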

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-23 Thread Petr Jelinek

On 2015-07-24 00:39, Tom Lane wrote:

> I wrote:

>> OK, so InitSampleScan for a function called at ExecInitSampleScan time
>> (which we might as well make optional), and then we'll use BeginSampleScan
>> for the function that gets the parameters.  The restart/ReScan function
>> goes away since BeginSampleScan will take its place.


> Here's a WIP patch implementing things this way.  I've also taken the time
> to do a complete code review, and fixed quite a large number of things,
> some cosmetic and some not so much.  I have not yet touched the tsm
> contrib modules (so they won't even compile...), but I'm reasonably happy
> with the state of the core code now.



Eh, I was just writing an email with my rewrite :)

I'll do a review of this instead then.

The only major difference that I see so far, and I'd like you to
incorporate that into your patch, is that I renamed the SampleScanCost to
SampleScanGetRelSize because that reflects much better the use of it; it
isn't really used for costing, but for getting the pages and tuples of
the baserel.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-23 Thread Petr Jelinek

On 2015-07-24 01:26, Petr Jelinek wrote:

> On 2015-07-24 00:39, Tom Lane wrote:

>> I wrote:

>>> OK, so InitSampleScan for a function called at ExecInitSampleScan time
>>> (which we might as well make optional), and then we'll use BeginSampleScan
>>> for the function that gets the parameters.  The restart/ReScan function
>>> goes away since BeginSampleScan will take its place.


>> Here's a WIP patch implementing things this way.  I've also taken the time
>> to do a complete code review, and fixed quite a large number of things,
>> some cosmetic and some not so much.  I have not yet touched the tsm
>> contrib modules (so they won't even compile...), but I'm reasonably happy
>> with the state of the core code now.



> Eh, I was just writing an email with my rewrite :)
>
> I'll do a review of this instead then.

> The only major difference that I see so far, and I'd like you to
> incorporate that into your patch, is that I renamed the SampleScanCost to
> SampleScanGetRelSize because that reflects much better the use of it; it
> isn't really used for costing, but for getting the pages and tuples of
> the baserel.



Ok, attached are a couple of cosmetic changes: what I wrote above, plus a
comment about why we do float8 + hashing for REPEATABLE (it's not
obvious IMHO), and one additional test query.


Do you want to do the contrib changes yourself as well, or should I take it?

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/tablesample/bernoulli.c b/src/backend/access/tablesample/bernoulli.c
index 97d6407..e0a1568 100644
--- a/src/backend/access/tablesample/bernoulli.c
+++ b/src/backend/access/tablesample/bernoulli.c
@@ -46,7 +46,7 @@ typedef struct
 } BernoulliSamplerData;
 
 
-static void bernoulli_samplescancost(PlannerInfo *root,
+static void bernoulli_samplescangetrelsize(PlannerInfo *root,
 		 RelOptInfo *baserel,
 		 List *paramexprs,
 		 BlockNumber *pages,
@@ -73,7 +73,7 @@ tsm_bernoulli_handler(PG_FUNCTION_ARGS)
 	tsm->parameterTypes = list_make1_oid(FLOAT4OID);
 	tsm->repeatable_across_queries = true;
 	tsm->repeatable_across_scans = true;
-	tsm->SampleScanCost = bernoulli_samplescancost;
+	tsm->SampleScanGetRelSize = bernoulli_samplescangetrelsize;
 	tsm->InitSampleScan = bernoulli_initsamplescan;
 	tsm->BeginSampleScan = bernoulli_beginsamplescan;
 	tsm->NextSampleBlock = NULL;
@@ -87,11 +87,11 @@ tsm_bernoulli_handler(PG_FUNCTION_ARGS)
  * Cost estimation.
  */
 static void
-bernoulli_samplescancost(PlannerInfo *root,
-		 RelOptInfo *baserel,
-		 List *paramexprs,
-		 BlockNumber *pages,
-		 double *tuples)
+bernoulli_samplescangetrelsize(PlannerInfo *root,
+			   RelOptInfo *baserel,
+			   List *paramexprs,
+			   BlockNumber *pages,
+			   double *tuples)
 {
 	Node	   *pctnode;
 	float4		samplesize;
diff --git a/src/backend/access/tablesample/system.c b/src/backend/access/tablesample/system.c
index 7cece29..1cdb859 100644
--- a/src/backend/access/tablesample/system.c
+++ b/src/backend/access/tablesample/system.c
@@ -48,7 +48,7 @@ typedef struct
 } SystemSamplerData;
 
 
-static void system_samplescancost(PlannerInfo *root,
+static void system_samplescangetrelsize(PlannerInfo *root,
 	  RelOptInfo *baserel,
 	  List *paramexprs,
 	  BlockNumber *pages,
@@ -76,7 +76,7 @@ tsm_system_handler(PG_FUNCTION_ARGS)
 	tsm->parameterTypes = list_make1_oid(FLOAT4OID);
 	tsm->repeatable_across_queries = true;
 	tsm->repeatable_across_scans = true;
-	tsm->SampleScanCost = system_samplescancost;
+	tsm->SampleScanGetRelSize = system_samplescangetrelsize;
 	tsm->InitSampleScan = system_initsamplescan;
 	tsm->BeginSampleScan = system_beginsamplescan;
 	tsm->NextSampleBlock = system_nextsampleblock;
@@ -90,11 +90,11 @@ tsm_system_handler(PG_FUNCTION_ARGS)
  * Cost estimation.
  */
 static void
-system_samplescancost(PlannerInfo *root,
-	  RelOptInfo *baserel,
-	  List *paramexprs,
-	  BlockNumber *pages,
-	  double *tuples)
+system_samplescangetrelsize(PlannerInfo *root,
+			RelOptInfo *baserel,
+			List *paramexprs,
+			BlockNumber *pages,
+			double *tuples)
 {
 	Node	   *pctnode;
 	float4		samplesize;
diff --git a/src/backend/executor/nodeSamplescan.c b/src/backend/executor/nodeSamplescan.c
index 32db05e..144d2db 100644
--- a/src/backend/executor/nodeSamplescan.c
+++ b/src/backend/executor/nodeSamplescan.c
@@ -325,6 +325,12 @@ tablesample_init(SampleScanState *scanstate)
 		 * suitable seed.  For regression-testing purposes, that has the
 		 * convenient property that REPEATABLE(0) gives a machine-independent
 		 * result.
+		 *
+		 * The reason for using float8 for REPEATABLE at the SQL level is that
+		 * it will work as expected both for users used to databases which
+		 * accept only integers in the REPEATABLE clause and for those who
+		 * might expect that REPEATABLE works the same way as setseed()
+		 * (a float in the range from -1 to 1).
 		 */
 		seed = 

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-23 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> On 2015-07-23 02:01, Tom Lane wrote:
>> This needs to work more like LIMIT, which doesn't try to compute the
>> limit parameters until the first fetch.  So what we need is an Init
>> function that does very darn little indeed (maybe we don't even need
>> it at all), and then a ParamInspect function that is called at first fetch
>> or during a ReScan, and that one is the one that gets to look at the
>> evaluated parameter values.

> If we replace the Begin and ReScan interfaces by a single interface, the
> Init would definitely be optional (all it would do so far is palloc the
> tsmdata, which can be done easily in the new interface if tsmdata is
> NULL). I don't like the ParamInspect name, as that function needs to
> set up the state for the sampling method (and that's true no matter
> whether we have Init or not); I think something like BeginScan works
> better, but it must be clearly documented that it's called for ReScan
> as well.

OK, so InitSampleScan for a function called at ExecInitSampleScan time
(which we might as well make optional), and then we'll use BeginSampleScan
for the function that gets the parameters.  The restart/ReScan function
goes away since BeginSampleScan will take its place.
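
(A minimal sketch of a method that omits InitSampleScan and allocates its state lazily in BeginSampleScan, as Petr suggests; the "demo" names are hypothetical, and the signature and tsm_state field follow the draft tsmapi.h posted earlier in this thread.)

#include "postgres.h"
#include "access/tsmapi.h"

/* Hypothetical per-scan state for an example method */
typedef struct
{
	double		percent;		/* parsed TABLESAMPLE argument */
	uint32		seed;			/* seed for this (re)scan */
} DemoSamplerData;

static void
demo_beginsamplescan(SampleScanState *node, int eflags, uint32 seed,
					 Datum *params, bool *paramisnull, int nparams)
{
	DemoSamplerData *sampler;

	/* No InitSampleScan was provided, so tsm_state is NULL on first call */
	if (node->tsm_state == NULL)
		node->tsm_state = palloc0(sizeof(DemoSamplerData));
	sampler = (DemoSamplerData *) node->tsm_state;

	/* (Re)compute per-scan state; a ReScan simply comes through here again */
	sampler->percent = DatumGetFloat4(params[0]);
	sampler->seed = seed;
}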

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-22 Thread Tom Lane
I wrote:
> We could alternatively provide two scan-initialization callbacks,
> one that analyzes the parameters before we do heap_beginscan,
> and another that can do additional setup afterwards.  Actually,
> that second call would really not be meaningfully different from
> the ReScan call, so another solution would be to just automatically
> invoke ReScan after we've created the HeapScanDesc.  TSMs could work
> around not having this by complicating their NextBlock function a bit
> (so that it would do some initialization on first call) but perhaps
> it would be easier to have the additional init call.

Actually, there's another reason why this has to be redesigned: evaluating
the parameter expressions of TABLESAMPLE during ExecInitSampleScan is
not OK.  Consider this:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select
count(*) from tenk1 tablesample bernoulli (pct)) ss;
 pct | count 
-----+-------
   0 |     0
 100 |     0
(2 rows)

Surely the second execution of the subquery should've given me 10000.
It doesn't because the TABLESAMPLE parameters aren't recomputed during a
rescan.  Even if you're minded to claim that that's all right, this is
definitely not acceptable:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * 
from tenk1 tablesample bernoulli (pct)) ss;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That one's crashing because pct is a Var that refers to an outer scan
tuple that doesn't exist yet.  I'm not real sure how come the case with
an aggregate manages not to crash, actually.

This needs to work more like LIMIT, which doesn't try to compute the
limit parameters until the first fetch.  So what we need is an Init
function that does very darn little indeed (maybe we don't even need
it at all), and then a ParamInspect function that is called at first fetch
or during a ReScan, and that one is the one that gets to look at the
evaluated parameter values.
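
(A sketch of the LIMIT-like shape: a begun flag defers everything to the first fetch, and a rescan just clears it. The helper names are hypothetical, declared here only to keep the sketch self-contained.)

#include "postgres.h"
#include "access/tsmapi.h"

/* Hypothetical helpers */
extern void evaluate_sample_params(SampleScanState *node);
extern TupleTableSlot *fetch_next_sample_tuple(SampleScanState *node);

/* First fetch evaluates the TABLESAMPLE parameters, LIMIT-style */
static TupleTableSlot *
demo_sample_next(SampleScanState *node)
{
	if (!node->begun)
	{
		evaluate_sample_params(node);	/* runs the parameter expressions */
		node->begun = true;
	}
	return fetch_next_sample_tuple(node);
}

/* A rescan merely forces re-evaluation at the next fetch */
static void
demo_sample_rescan(SampleScanState *node)
{
	node->begun = false;
}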

If we wanted to let the method inspect the HeapScanDesc during setup
it would need still a third callback.  I'm not exactly sure if that's
worth the trouble.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-21 Thread Petr Jelinek

On 2015-07-20 17:23, Tom Lane wrote:


> Doesn't look like it to me: heap_beginscan_sampling always passes
> allow_sync = false to heap_beginscan_internal.



Oh, right.


> More to the point, the existing coding of all these methods is such
> that they would fail to be reproducible if syncscan were enabled,
> because the scan start point wouldn't be the same.  That's fixable,
> but it'll take more work than just dealing with the above oversight.
> In any case, given that Simon has stated he wants support for sample
> methods that don't try to be reproducible, we need the method
> to be able to say whether syncscan is allowed.



I am not completely clear on how we do this, tbh; do you mean we use the
info about repeatability for this?


Another thing that's not clear to me after playing with this is how we
want to detect whether to do a pagemode scan or not. I understand that
it's a neat optimization to say pagemode = true in bernoulli when the
percentage is high and false when it's low, but then this would have to
come from the BeginSampleScan() in the proposed API, and BeginSampleScan
would not have access to the HeapScanDesc, as it would not be inited yet
when it's called. The info that BeginSampleScan() needs can be obtained
directly from ss_currentRelation, I guess, but it's somewhat strange to
pass a semi-initialized SampleScanState to BeginSampleScan().


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-21 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> Another thing that's not clear to me after playing with this is how we
> want to detect whether to do a pagemode scan or not. I understand that
> it's a neat optimization to say pagemode = true in bernoulli when the
> percentage is high and false when it's low, but then this would have to
> come from the BeginSampleScan() in the proposed API, and BeginSampleScan
> would not have access to the HeapScanDesc, as it would not be inited yet
> when it's called.

Right.

> The info that BeginSampleScan() needs can be obtained directly from
> ss_currentRelation, I guess, but it's somewhat strange to pass a
> semi-initialized SampleScanState to BeginSampleScan().

Doesn't seem like a big problem from here.  The HeapScanDesc pointer
will be null at that point, so it's not like attempts to access it
would escape notice.

We could alternatively provide two scan-initialization callbacks,
one that analyzes the parameters before we do heap_beginscan,
and another that can do additional setup afterwards.  Actually,
that second call would really not be meaningfully different from
the ReScan call, so another solution would be to just automatically
invoke ReScan after we've created the HeapScanDesc.  TSMs could work
around not having this by complicating their NextBlock function a bit
(so that it would do some initialization on first call) but perhaps
it would be easier to have the additional init call.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Petr Jelinek

On 2015-07-19 22:56, Tom Lane wrote:

> Since I'm not observing any movement on the key question of redesigning
> the tablesample method API, and I think that's something that's absolutely
> necessary to fix for 9.5, attached is an attempt to respecify the API.



Sorry, I got something similar to what you posted written as well, but I
didn't want to submit it before I had more working code done.




> * I got rid of the TableSampleDesc struct altogether in favor of giving
> the execution functions access to the whole SampleScanState executor
> state node.  If we add support for filtering at the join level, filtering
> in indexscan nodes, etc, those would become separate sets of API functions
> in my view; there is no need to pretend that this set of API functions
> works for anything except the SampleScan case.  This way is more parallel
> to the FDW precedent, too.  In particular it lets tablesample code get at
> the executor's EState, which the old API did not, but which might be
> necessary for some scenarios.


Ok.



> * You might have expected me to move the tsmseqscan and tsmpagemode flags
> into the TsmRoutine struct, but instead this API puts equivalent flags
> into the SampleScanState struct.  The reason for that is that it lets
> the settings be determined at runtime after inspecting the TABLESAMPLE
> parameters, which I think would be useful.  For example, whether to use
> the bulkread strategy should probably depend on what the sampling
> percentage is.



I think this ignores one aspect of the old API: we allowed the sampling
method to specify which kinds of input parameters it accepts. This is
why, for example, tsm_system_rows accepts an integer while system and
bernoulli both accept float8. This part of the API is certainly not
needed for bernoulli and system sampling, but if we ever want to do
something more sophisticated, like the stratified sampling which Simon
mentioned several times, specifying just percentages is not going to be
enough.




> * As written, this still allows TABLESAMPLE parameters to have null
> values, but I'm pretty strongly tempted to get rid of that: remove
> the paramisnull[] argument and make the core code reject null values.
> I can't see any benefit in allowing null values that would justify the
> extra code and risks-of-omission involved in making every tablesample
> method check this for itself.



I am for not allowing NULLs.


> * I specified that omitting NextSampleBlock is allowed and causes the
> core code to do a standard seqscan, including syncscan support, which
> is a behavior that's impossible with the current API.  If we fix
> the bernoulli code to have history-independent sampling behavior,
> I see no reason that syncscan shouldn't be enabled for it.



Umm, we were actually doing syncscan as well before.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Petr Jelinek

On 2015-07-20 12:18, Petr Jelinek wrote:

> On 2015-07-19 22:56, Tom Lane wrote:


>> * You might have expected me to move the tsmseqscan and tsmpagemode flags
>> into the TsmRoutine struct, but instead this API puts equivalent flags
>> into the SampleScanState struct.  The reason for that is that it lets
>> the settings be determined at runtime after inspecting the TABLESAMPLE
>> parameters, which I think would be useful.  For example, whether to use
>> the bulkread strategy should probably depend on what the sampling
>> percentage is.



> I think this ignores one aspect of the old API: we allowed the sampling
> method to specify which kinds of input parameters it accepts. This is
> why, for example, tsm_system_rows accepts an integer while system and
> bernoulli both accept float8. This part of the API is certainly not
> needed for bernoulli and system sampling, but if we ever want to do
> something more sophisticated, like the stratified sampling which Simon
> mentioned several times, specifying just percentages is not going to be
> enough.



Actually I see the tsmapi has support for this, so please ignore the noise.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Emre Hasegeli
>> to handle DROP dependency behaviors properly.  (On reflection, maybe
>> better if it's "bernoulli(internal) returns tablesample_handler",
>> so as to guarantee that such a function doesn't create a conflict with
>> any user-defined function of the same name.)

> The probability of conflict seems high with system(), so yeah, we'd need
> some kind of differentiator.

Maybe it would be even better to have something like bernoulli(tablesample),
where tablesample is defined as a pseudo-type like trigger.




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-20 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
> On 2015-07-19 22:56, Tom Lane wrote:
>> * I specified that omitting NextSampleBlock is allowed and causes the
>> core code to do a standard seqscan, including syncscan support, which
>> is a behavior that's impossible with the current API.  If we fix
>> the bernoulli code to have history-independent sampling behavior,
>> I see no reason that syncscan shouldn't be enabled for it.

> Umm, we were actually doing syncscan as well before.

Doesn't look like it to me: heap_beginscan_sampling always passes
allow_sync = false to heap_beginscan_internal.

More to the point, the existing coding of all these methods is such
that they would fail to be reproducible if syncscan were enabled,
because the scan start point wouldn't be the same.  That's fixable,
but it'll take more work than just dealing with the above oversight.
In any case, given that Simon has stated he wants support for sample
methods that don't try to be reproducible, we need the method
to be able to say whether syncscan is allowed.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-19 Thread Simon Riggs
On 19 July 2015 at 21:56, Tom Lane t...@sss.pgh.pa.us wrote:


> Since I'm not observing any movement


Apologies if nothing visible was occurring. Petr and I had arranged to meet
and discuss Mon/Tues.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-19 Thread Tom Lane
Since I'm not observing any movement on the key question of redesigning
the tablesample method API, and I think that's something that's absolutely
necessary to fix for 9.5, attached is an attempt to respecify the API.

I haven't actually written any code, but I've written a tsmapi.h file
modeled on fdwapi.h, and rewritten tablesample-method.sgml to match;
those two files are attached.  Some notes:

* This is predicated on the assumption that we'll get rid of the
pg_tablesample_method catalog and instead have a single FDW-style handler
function for each sample method.  That fixes the problem with the contrib
modules being broken in DROP/pg_upgrade cases.

* I got rid of the TableSampleDesc struct altogether in favor of giving
the execution functions access to the whole SampleScanState executor
state node.  If we add support for filtering at the join level, filtering
in indexscan nodes, etc, those would become separate sets of API functions
in my view; there is no need to pretend that this set of API functions
works for anything except the SampleScan case.  This way is more parallel
to the FDW precedent, too.  In particular it lets tablesample code get at
the executor's EState, which the old API did not, but which might be
necessary for some scenarios.

* You might have expected me to move the tsmseqscan and tsmpagemode flags
into the TsmRoutine struct, but instead this API puts equivalent flags
into the SampleScanState struct.  The reason for that is that it lets
the settings be determined at runtime after inspecting the TABLESAMPLE
parameters, which I think would be useful.  For example, whether to use
the bulkread strategy should probably depend on what the sampling
percentage is.
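
(A sketch of what that flexibility buys: BeginSampleScan can look at the evaluated percentage before choosing the scan options. The 25 here is an arbitrary illustrative threshold, and the names follow the draft API in this message, not committed code.)

#include "postgres.h"
#include "access/tsmapi.h"

static void
demo_bernoulli_beginsamplescan(SampleScanState *node, int eflags, uint32 seed,
							   Datum *params, bool *paramisnull, int nparams)
{
	double		percent = DatumGetFloat4(params[0]);

	/* A Bernoulli-style method reads every page, so bulkread makes sense */
	node->use_bulkread = true;
	/* Page-at-a-time visibility checking pays off only for larger samples */
	node->use_pagemode = (percent >= 25.0);
}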

* I got rid of the ExamineTuple function altogether.  As I said before,
I think what that is mostly doing is encouraging people to do unsound
things.  But in any case, there is no need for it because NextSampleTuple
*can* look at the HeapScanDesc state if it really wants to: that lets it
identify visible tuples, or even inspect the tuple contents.  In the
attached, I documented the visible-tuples aspect but did not suggest
examining tuple contents.

* As written, this still allows TABLESAMPLE parameters to have null
values, but I'm pretty strongly tempted to get rid of that: remove
the paramisnull[] argument and make the core code reject null values.
I can't see any benefit in allowing null values that would justify the
extra code and risks-of-omission involved in making every tablesample
method check this for itself.

* I specified that omitting NextSampleBlock is allowed and causes the
core code to do a standard seqscan, including syncscan support, which
is a behavior that's impossible with the current API.  If we fix
the bernoulli code to have history-independent sampling behavior,
I see no reason that syncscan shouldn't be enabled for it.

Barring objections, I'll press forward with turning this into code
over the next few days.

regards, tom lane

/*-
 *
 * tsmapi.h
 *	  API for tablesample methods
 *
 * Copyright (c) 2015, PostgreSQL Global Development Group
 *
 * src/include/access/tsmapi.h
 *
 *-
 */
#ifndef TSMAPI_H
#define TSMAPI_H

#include "nodes/execnodes.h"
#include "nodes/relation.h"


/*
 * Callback function signatures --- see tablesample-method.sgml for more info.
 */

typedef void (*SampleScanCost_function) (PlannerInfo *root,
		 RelOptInfo *baserel,
		 List *paramexprs,
		 BlockNumber *pages,
		 double *tuples);

typedef void (*BeginSampleScan_function) (SampleScanState *node,
		  int eflags,
		  uint32 seed,
		  Datum *params,
		  bool *paramisnull,
		  int nparams);

typedef BlockNumber (*NextSampleBlock_function) (SampleScanState *node);

typedef OffsetNumber (*NextSampleTuple_function) (SampleScanState *node,
  BlockNumber blockno,
  OffsetNumber maxoffset);

typedef void (*ReScanSampleScan_function) (SampleScanState *node);

typedef void (*EndSampleScan_function) (SampleScanState *node);

/*
 * TsmRoutine is the struct returned by a tablesample method's handler
 * function.  It provides pointers to the callback functions needed by the
 * planner and executor, as well as additional information about the method.
 *
 * More function pointers are likely to be added in the future.
 * Therefore it's recommended that the handler initialize the struct with
 * makeNode(TsmRoutine) so that all fields are set to NULL.  This will
 * ensure that no fields are accidentally left undefined.
 */
typedef struct TsmRoutine
{
	NodeTag		type;

	/* List of datatype OIDs for the arguments of the TABLESAMPLE clause */
	List	   *parameterTypes;

	/* Can method produce repeatable samples across, or even within, queries? */
	bool		repeatable_across_queries;
	bool		repeatable_across_scans;

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-17 Thread Petr Jelinek

On 2015-07-16 17:08, Tom Lane wrote:

> Petr Jelinek p...@2ndquadrant.com writes:

>> On 2015-07-16 15:59, Tom Lane wrote:

>>> I'm not clear on whether sequence AMs would need explicit catalog
>>> representation, or could be folded down to just a single SQL function
>>> with special signature as I suggested for tablesample handlers.
>>> Is there any need for a sequence AM to have additional catalog
>>> infrastructure like index AMs need?



>> That depends on the route we will choose to take with the storage there.
>> If we allow custom columns for sequence AMs (which is what both Heikki
>> and me seem to be inclined to do) then I think it will still need a
>> catalog; plus it's also easier to just reuse the relam behavior than
>> coming up with something completely new IMHO.


> Hm.  I've not been following the sequence AM stuff carefully, but if you
> want to use relam to point at a sequence's AM then really sequence AMs
> have to be represented in pg_am.  (It would be quite broken from a
> relational theory standpoint if relam could point to either of two
> catalogs; not to mention that we have no way to guarantee OID uniqueness
> across multiple catalogs.)



Well, not necessarily; it would just mean that relam has a different
meaning depending on relkind, so OID uniqueness is not needed at all.



> As things stand today, putting both kinds of AM into one catalog would be
> pretty horrible, but it seems not hard to make it work if we did this sort
> of refactoring first.  We'd end up with three columns in pg_am: amname,
> amkind (index or sequence), and amhandler.  The type and contents of the
> struct returned by amhandler could be different depending on amkind, which
> means that the APIs could be as different as we need, despite sharing just
> one catalog.  The only real restriction is that index and sequence AMs
> could not have the same names, which doesn't seem like much of a problem
> from here.



Yes, this would be better.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 00:36, Tom Lane wrote:


> We have a far better model to follow already, namely the foreign data
> wrapper API.  In that, there's a single SQL-visible function that returns
> a dummy datatype indicating that it's an FDW handler, and when called,
> it hands back a struct containing pointers to all the other functions
> that the particular wrapper needs to supply (and, if necessary, the struct
> could have non-function-pointer fields containing other info the core
> system might need to know about the handler).  We could similarly invent a
> pseudotype "tablesample_handler" and represent each tablesample method by
> a single SQL function returning "tablesample_handler".  Any future changes
> in the API for tablesample handlers would then appear as changes in the C
> definition of the struct returned by the handler, which requires no
> SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
> is pretty easy to design it so that failure to update an external module
> that implements the API results in C compiler errors and/or sane fallback
> behavior.



(back from vacation, going over this thread)

Yes, this sounds very sane (we use something similar for logical decoding
plugins as well, not just FDWs). I wish this had occurred to me before;
I would not have had to spend time on the DDL support, which didn't even
get in.
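
(A minimal sketch of the FDW-style shape being agreed on here; the "demo" names are hypothetical, and the TsmRoutine fields follow the draft tsmapi.h posted elsewhere in this thread.)

#include "postgres.h"
#include "fmgr.h"
#include "access/tsmapi.h"
#include "catalog/pg_type.h"

PG_FUNCTION_INFO_V1(demo_tsm_handler);

Datum
demo_tsm_handler(PG_FUNCTION_ARGS)
{
	TsmRoutine *tsm = makeNode(TsmRoutine);		/* zeroes every callback */

	tsm->parameterTypes = list_make1_oid(FLOAT4OID);
	tsm->repeatable_across_queries = true;
	tsm->repeatable_across_scans = true;
	/* fill in SampleScanGetRelSize, BeginSampleScan, etc. here */

	PG_RETURN_POINTER(tsm);
}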



> Once we've done that, I think we don't even need a special catalog
> representing tablesample methods.  Given FROM foo TABLESAMPLE
> bernoulli(...), the parser could just look for a function bernoulli()
> returning tablesample_handler, and it's done.  The querytree would have an
> ordinary function dependency on that function, which would be sufficient

It seems possible that we might not need a catalog, indeed.

This would also simplify the parser part, which currently contains
specialized function-search code, as we could most likely just reuse
the generic code.



> to handle DROP dependency behaviors properly.  (On reflection, maybe
> better if it's "bernoulli(internal) returns tablesample_handler",
> so as to guarantee that such a function doesn't create a conflict with
> any user-defined function of the same name.)



The probability of conflict seems high with system(), so yeah, we'd need
some kind of differentiator.



> PS: now that I've written this rant, I wonder why we don't redesign the
> index AM API along the same lines.  It probably doesn't matter much at
> the moment, but if we ever get serious about supporting index AM
> extensions, I think we ought to consider doing that.


+1

I think this is very relevant to the proposed sequence AM patch as well.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 2015-07-13 00:36, Tom Lane wrote:

>> PS: now that I've written this rant, I wonder why we don't redesign the
>> index AM API along the same lines.  It probably doesn't matter much at
>> the moment, but if we ever get serious about supporting index AM
>> extensions, I think we ought to consider doing that.
 
> +1
>
> I think this is very relevant to the proposed sequence AM patch as well.

Hmm, how would this work?  Would we have index AM implementations run
some function that registers their support methods somehow at startup?
Hopefully we're not going to have the index AMs become shared libraries.

In any case, if index AMs and sequence AMs go this route, the column
store AM we're working on will probably have to go the same route too.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 15:39, Tom Lane wrote:


> Datum
> tsm_system_rows_init(PG_FUNCTION_ARGS)
> {
> 	TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
> 	uint32	seed = PG_GETARG_UINT32(1);
> 	int32	ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);
>
> This is rather curious coding.  Why should this function check only
> one of its arguments for nullness?  If it needs to defend against
> any of them being null, it really needs to check all.  But in point of
> fact, this function is declared STRICT, which means it's a violation of
> the function call protocol if the core code ever passes a null to it,
> and so this test ought to be dead code.
>
> A similar pattern of ARGISNULL checks in declared-strict functions exists
> in all the tablesample modules, not just this one, showing that this is an
> overall design error not just a thinko here.  My inclination would be to
> make the core code enforce non-nullness of all tablesample arguments so as
> to make it unnecessary to check strictness of the tsm*init functions, but
> in any case it is Not Okay to just pass nulls willy-nilly to strict C
> functions.



The reason for this ugliness came from having to balance modularity
against following the SQL standard error codes for BERNOULLI and
SYSTEM, which came up as an issue during reviews (the original code did
the checks before calling the sampling method's functions but produced
just a generic error code about a wrong parameter). I considered it
okayish mainly because those functions are not SQL-callable and the code
which calls them knows how to handle it correctly, but I understand why
you don't.


I guess if we did what you proposed in another email in this thread
(don't have the API exposed at the SQL level), we could send the
additional parameters as a List * and leave the validation completely
to the function. (And maybe not allow NULLs at all.)



> Also, I find this coding pretty sloppy even without the strictness angle,
> because the net effect of this way of dealing with nulls is that an
> argument-must-not-be-null complaint is reported as "out of range",
> which is both confusing and the wrong ERRCODE.
>
> 	if (ntuples < 1)
> 		ereport(ERROR,
> 				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> 				 errmsg("invalid sample size"),
> 				 errhint("Sample size must be positive integer value.")));
>
> I don't find this to be good error message style.  The secondary comment
> is not a hint, it's an ironclad statement of what you did wrong, so if
> we wanted to phrase it like this it should be an errdetail not errhint.
> But the whole thing is overly cute anyway, because there is no reason at
> all not to just say what we mean in the primary error message, eg
> 	ereport(ERROR,
> 			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> 			 errmsg("sample size must be greater than zero")));



Same as above, although now that I re-read the standard I am sure I
misunderstood it the first time; it says:

If S is the null value or if S < 0 (zero) or if S > 100, then an
exception condition is raised: data exception — invalid sample size.

I took it as a literal error message originally, but they just mean the
status code by it.



> While we're on the subject, what's the reason for disallowing a sample
> size of zero?  Seems like a reasonable edge case.


Yes, that's a bug.



> 	/* All blocks have been read, we're done */
> 	if (sampler->doneblocks > sampler->nblocks ||
> 		sampler->donetuples >= sampler->ntuples)
> 		PG_RETURN_UINT32(InvalidBlockNumber);
>
> Okay, I lied, I *am* going to complain about this comment.  Comments that
> do not accurately describe the code they're attached to are worse than
> useless.



That's a copy-pasto from tsm_system_time.


> In short, I'd do something more like
>
> 	uint32	r;
>
> 	/* Safety check to avoid infinite loop or zero result for small n. */
> 	if (n <= 1)
> 		return 1;
>
> 	/*
> 	 * This should only take 2 or 3 iterations as the probability of 2 numbers
> 	 * being relatively prime is ~61%; but just in case, we'll include a
> 	 * CHECK_FOR_INTERRUPTS in the loop.
> 	 */
> 	do {
> 		CHECK_FOR_INTERRUPTS();
> 		r = (uint32) (sampler_random_fract(randstate) * n);
> 	} while (r == 0 || gcd(r, n) > 1);
>
> Note however that this coding would result in an unpredictable number
> of iterations of the RNG, which might not be such a good thing if we're
> trying to achieve repeatability.  It doesn't matter in the context of
> this module since the RNG is not used after initialization, but it would
> matter if we then went on to do Bernoulli-style sampling.  (Possibly that
> could be dodged by reinitializing the RNG after the initialization steps.)



Bernoulli-style sampling does not need this kind of code, so it's not
really an issue. That is, unless you'd like to combine the linear probing
and bernoulli, of course, but I don't see any benefit in doing that.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-13 18:00, Tom Lane wrote:


> And here's that.  I do *not* claim that this is a complete list of design
> issues with the patch, it's just things I happened to notice in the amount
> of time I've spent so far (which is already way more than I wanted to
> spend on TABLESAMPLE right now).


> I'm not sure that we need an extensible sampling-method capability at all,
> let alone that an API for that needs to be the primary focus of a patch.
> Certainly the offered examples of extension modules aren't inspiring:
> tsm_system_time is broken by design (more about that below) and nobody
> would miss tsm_system_rows if it weren't there.  What is far more
> important is to get sampling capability into indexscans, and designing
> an API ahead of time seems like mostly a distraction from that.
>
> I'd think seriously about tossing the entire executor-stage part of the
> patch, creating a stateless (history-independent) sampling filter that
> just accepts or rejects TIDs, and sticking calls to that into all the
> table scan node types.  Once you had something like that working well
> it might be worth considering whether to try to expose an API to
> generalize it.  But even then it's not clear that we really need any
> more options than true-Bernoulli and block-level sampling.


I think this is not really an API issue as much as an opposite approach
on what to implement first. In the first iteration I prioritized true
block-sampling support, as that's what I've been mostly asked for.

But my plan was to add, at some later stage (9.6+), the ability to do
subquery scans etc.: basically a new SamplingFilter executor node which
would pass the tuples to examinetuple(), which would then decide what to
do with them. The selection between using nextblock/nexttuple and
examinetuple was supposed to be an extension of the API where the
sampling method would say whether it supports examinetuple,
nextblock/nexttuple, or both. And eventually I wanted to rewrite
bernoulli to just use examinetuple() on top of whatever scan, once this
additional support was in. I think I explained this during the review
stage.




> The IBM paper I linked to in the other thread mentions that their
> optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
> was requested.  Probably that happens when it decides to do a simple
> indexscan, because then there's no advantage to trying to cluster the

Yeah, it happens when there is an index which is used in the WHERE
clause and has bigger selectivity than the percentage specified in the
TABLESAMPLE clause. This of course breaks one of the common use cases
though:

SELECT count(*) * 100 FROM table TABLESAMPLE SYSTEM(1) WHERE foo = bar;


> sampled rows.  But in the case of a bitmap scan, you could very easily do
> either true Bernoulli or block-level sampling simply by adjusting the
> rules about which bits you keep or clear in the bitmap (given that you
> apply the filter between collection of the index bitmap and accessing the
> heap, which seems natural).  The only case where a special scan type
> really seems to be needed is if you want block-level sampling, the query
> would otherwise use a seqscan, *and* the sampling percentage is pretty low
> --- if you'd be reading every second or third block anyway, you're likely
> better off with a plain seqscan so that the kernel sees sequential access
> and does prefetching.  The current API doesn't seem to make it possible to
> switch between seqscan and read-only-selected-blocks based on the sampling
> percentage, but I think that could be an interesting optimization.


Well, you can do that if you write your own sampling method. We don't do
that in the optimizer, and that's a design choice, because you can't
really do that at a high level like that if you want to keep
extensibility. And given the number of people who asked whether they
could do their own sampling when I talked to them about this during the
design stage, I consider the extensibility more important. Especially
since extensibility gives you the option to do the switching anyway,
albeit at a lower level and not out of the box.



> (Another bet that's been missed is having the samplescan logic request
> prefetching when it is doing selective block reads.  The current API can't
> support that AFAICS, since there's no expectation that nextblock calls
> could be done asynchronously from nexttuple calls.)


Not sure I follow; would it not be possible to achieve this by setting
tsmseqscan to true (it's a misnomer then, I know)?




> Another issue with the API as designed is the examinetuple() callback.
> Allowing sample methods to see invisible tuples is quite possibly the
> worst idea in the whole patch.  They can *not* do anything with such
> tuples, or they'll totally break reproducibility: if the tuple is
> invisible to your scan, it might well be or soon become invisible to
> everybody, whereupon it would be subject to garbage collection at the
> drop of a hat.  So if an invisible tuple affects the sample method's
> behavior at all, repeated scans in the same query would

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 2015-07-13 15:39, Tom Lane wrote:

>> I don't find this to be good error message style.  The secondary comment
>> is not a hint, it's an ironclad statement of what you did wrong, so if
>> we wanted to phrase it like this it should be an errdetail not errhint.
>> But the whole thing is overly cute anyway, because there is no reason at
>> all not to just say what we mean in the primary error message, eg
>> 	ereport(ERROR,
>> 			(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>> 			 errmsg("sample size must be greater than zero")));
 
> Same as above, although now that I re-read the standard I am sure I
> misunderstood it the first time; it says:
> If S is the null value or if S < 0 (zero) or if S > 100, then an exception
> condition is raised: data exception — invalid sample size.
> I took it as a literal error message originally, but they just mean the
> status code by it.

Yes, we must use a new errcode ERRCODE_INVALID_SAMPLE_SIZE, defined as
2202H, here.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Petr Jelinek

On 2015-07-16 15:59, Tom Lane wrote:

> Alvaro Herrera alvhe...@2ndquadrant.com writes:

>> Petr Jelinek wrote:

>>> On 2015-07-13 00:36, Tom Lane wrote:

>>>> PS: now that I've written this rant, I wonder why we don't redesign the
>>>> index AM API along the same lines.  It probably doesn't matter much at
>>>> the moment, but if we ever get serious about supporting index AM
>>>> extensions, I think we ought to consider doing that.



>>> I think this is very relevant to the proposed sequence AM patch as well.



>> Hmm, how would this work?  Would we have index AM implementations run
>> some function that registers their support methods somehow at startup?


> Well, registration of new index AMs is an unsolved question ATM anyhow.
> But what I'm imagining is that pg_am would reduce to about two columns,
> amname and a handler function OID, and everything else that constitutes
> the API for AMs would get moved down to the C level.  We have to keep that
> catalog because we still need index AMs to have OIDs that will represent
> them in pg_opclass etc; but we don't need to nail the exact set of AM
> interface functions into the catalog.  (I'm not sure whether we'd want
> to remove all the bool columns from pg_am.  At the C level it would be
> about as convenient to have them in a struct returned by the handler
> function.  But it's occasionally useful to have those properties
> visible to SQL queries.)


This is along the lines of how I was thinking also (when I read your
previous email). I think the properties of the index will have to be
decided on an individual basis once somebody actually starts working on
this. But the functions can clearly go into a C struct if they are
called only from C anyway.



> I'm not clear on whether sequence AMs would need explicit catalog
> representation, or could be folded down to just a single SQL function
> with special signature as I suggested for tablesample handlers.
> Is there any need for a sequence AM to have additional catalog
> infrastructure like index AMs need?



That depends on the route we will choose to take with the storage there.
If we allow custom columns for sequence AMs (which is what both Heikki
and me seem to be inclined to do) then I think it will still need a
catalog; plus it's also easier to just reuse the relam behavior than
coming up with something completely new IMHO.



>> In any case, if index AMs and sequence AMs go this route, the column
>> store AM we're working on will probably have to go the same route too.


> It's worth considering anyway.  The FDW API has clearly been far more
> successful than the index AM API in terms of being practically usable
> by extensions.



Yep, I now consider it a clear mistake that I modeled both the sequence
AM and tablesample after indexes, given that I targeted both at
extensibility.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Amit Langote
On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
> Petr Jelinek wrote:
>> On 2015-07-13 00:36, Tom Lane wrote:

>>> PS: now that I've written this rant, I wonder why we don't redesign the
>>> index AM API along the same lines.  It probably doesn't matter much at
>>> the moment, but if we ever get serious about supporting index AM
>>> extensions, I think we ought to consider doing that.

>> +1

>> I think this is very relevant to the proposed sequence AM patch as well.

> Hmm, how would this work?  Would we have index AM implementations run
> some function that registers their support methods somehow at startup?
> Hopefully we're not going to have the index AMs become shared libraries.


I recall a proposal by Alexander Korotkov about extensible access
methods, although his proposal also included a CREATE AM command that
would add a pg_am row, so that perhaps differs from what Tom seems to
be alluding to here.

http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com

Thanks,
Amit




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Tom Lane
Amit Langote amitlangot...@gmail.com writes:
> On Thu, Jul 16, 2015 at 10:33 PM, Alvaro Herrera
> alvhe...@2ndquadrant.com wrote:
>> Hmm, how would this work?  Would we have index AM implementations run
>> some function that registers their support methods somehow at startup?

> I recall a proposal by Alexander Korotkov about extensible access
> methods, although his proposal also included a CREATE AM command that
> would add a pg_am row, so that perhaps differs from what Tom seems to
> be alluding to here.

I think we'd still need to invent CREATE AM if we wanted to allow index
AMs to be created as extensions; we'd still have to have the pg_am
catalog, and extensions still couldn't write rows directly into that,
for the same reasons I pointed out with respect to tablesample methods.

However, if the contents of pg_am could be boiled down to just a name and
a handler function, then that would represent a simple and completely
stable definition for CREATE AM's arguments, which would be a large
improvement over trying to reflect the current contents of pg_am directly
in a SQL statement.  We add new columns to pg_am all the time, and that
would create huge backward-compatibility headaches if we had to modify
the behavior of a CREATE AM statement every time.
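
A purely illustrative sketch of how small the statement could then stay
(this syntax is invented here; nothing like it exists yet):

    -- only a name and a handler; new pg_am columns never touch this syntax
    CREATE ACCESS METHOD myam TYPE INDEX HANDLER myam_handler;

The TYPE keyword is equally speculative; the point is just that the
SQL-visible surface stays fixed while the handler struct evolves.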

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-16 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 On 2015-07-16 15:59, Tom Lane wrote:
 I'm not clear on whether sequence AMs would need explicit catalog
 representation, or could be folded down to just a single SQL function
 with special signature as I suggested for tablesample handlers.
 Is there any need for a sequence AM to have additional catalog
 infrastructure like index AMs need?

 That depends on the route we choose to take with the storage there. 
 If we allow custom columns for sequence AMs (which is what both Heikki 
 and I seem to be inclined to do) then I think it will still need a 
 catalog, plus it's also easier to just reuse the relam behavior than 
 coming up with something completely new IMHO.

Hm.  I've not been following the sequence AM stuff carefully, but if you
want to use relam to point at a sequence's AM then really sequence AMs
have to be represented in pg_am.  (It would be quite broken from a
relational theory standpoint if relam could point to either of two
catalogs; not to mention that we have no way to guarantee OID uniqueness
across multiple catalogs.)

As things stand today, putting both kinds of AM into one catalog would be
pretty horrible, but it seems not hard to make it work if we did this sort
of refactoring first.  We'd end up with three columns in pg_am: amname,
amkind (index or sequence), and amhandler.  The type and contents of the
struct returned by amhandler could be different depending on amkind, which
means that the APIs could be as different as we need, despite sharing just
one catalog.  The only real restriction is that index and sequence AMs
could not have the same names, which doesn't seem like much of a problem
from here.
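
(As a purely illustrative sketch, with the amkind letters invented here,
the lookup would then be no more than

    SELECT amhandler FROM pg_am
    WHERE amname = 'btree' AND amkind = 'i';   -- 'i' = index, 's' = sequence

and the caller interprets the struct returned by that handler according
to amkind.)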

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:

 Michael Paquier michael.paqu...@gmail.com writes:
  Regarding the fact that those two contrib modules can be part of a
  -contrib package and could be installed, nuking those two extensions
  from the tree and preventing the creation of custom tablesample
  methods looks like the correct course of action for 9.5.

 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.


Based on the various comments here, I don't see that as the right course of
action at this point.

There are no issues relating to security or data loss, just various fixable
issues in a low-impact feature, which in my view is an important feature
also.

If it's
 to stay, it *must* get a line-by-line review from some committer-level
 person; and I think there are other more important things for us to be
 doing for 9.5.


Honestly, I am very surprised by this. My feeling was the code was neat,
clear and complete, much more so than many patches I review. If I had
thought the patch or its implementation was in any way contentious I would
not have committed it.

I take responsibility for the state of the code and will put time into
addressing the concerns mentioned and others.

If we cannot resolve them in reasonable time, a revert is possible: there
is nothing riding on this from me.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 14 July 2015 at 15:32, Tom Lane t...@sss.pgh.pa.us wrote:

 Simon Riggs si...@2ndquadrant.com writes:
  On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.

  There are no issues relating to security or data loss, just various
 fixable
  issues in a low-impact feature, which in my view is an important feature
  also.

 There is a *very large* amount of work needed here, and I do not hear you
 promising to do it.


I thought I had done so clearly enough, happy to do so again.

I promise that either work will be done, or the patch will be reverted.
Since I have more time now, I view that as a realistic prospect.


 What I'm hearing is stonewalling, and I am not happy.


I'm not sure what you mean by that but it sounds negative and is almost
certainly not justified, in this or other cases.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.

 There are no issues relating to security or data loss, just various fixable
 issues in a low-impact feature, which in my view is an important feature
 also.

There is a *very large* amount of work needed here, and I do not hear you
promising to do it.  What I'm hearing is stonewalling, and I am not happy.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Noah Misch
On Tue, Jul 14, 2015 at 11:14:55AM +0100, Simon Riggs wrote:
 On 13 July 2015 at 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.
 
 
 Based on the various comments here, I don't see that as the right course of
 action at this point.
 
 There are no issues relating to security or data loss, just various fixable
 issues in a low-impact feature, which in my view is an important feature
 also.
 
  If it's
  to stay, it *must* get a line-by-line review from some committer-level
  person; and I think there are other more important things for us to be
  doing for 9.5.
 
 
 Honestly, I am very surprised by this.

Tom's partial review found quite a crop of unvarnished bugs:

1. sample node can give different tuples across rescans within an executor run
2. missing dependency machinery to restrict dropping a sampling extension
3. missing pg_dump --binary-upgrade treatment
4. potential core dumps due to dereferencing values that could be null
5. factually incorrect comments
6. null argument checks in strict functions
7. failure to check for constisnull
8. failure to sanity-check ntuples
9. arithmetic errors in random_relative_prime()

(That's after sifting out design counterproposals, feature requests, and other
topics of regular disagreement.  I erred on the side of leaving things off
that list.)  Finding one or two like that during a complete post-commit review
would be business as usual.  Finding nine in a partial review diagnoses a
critical shortfall in pre-commit review vigilance.  Fixing the bugs found to
date will not cure that shortfall.  A qualified re-review could cure it.

nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-14 Thread Simon Riggs
On 15 July 2015 at 05:58, Noah Misch n...@leadboat.com wrote:


   If it's
   to stay, it *must* get a line-by-line review from some committer-level
   person; and I think there are other more important things for us to be
   doing for 9.5.
  
 
  Honestly, I am very surprised by this.

 Tom's partial review found quite a crop of unvarnished bugs:

 1. sample node can give different tuples across rescans within an executor
 run
 2. missing dependency machinery to restrict dropping a sampling extension
 3. missing pg_dump --binary-upgrade treatment
 4. potential core dumps due to dereferencing values that could be null
 5. factually incorrect comments
 6. null argument checks in strict functions
 7. failure to check for constisnull
 8. failure to sanity-check ntuples
 9. arithmetic errors in random_relative_prime()

 (That's after sifting out design counterproposals, feature requests, and
 other
 topics of regular disagreement.  I erred on the side of leaving things off
 that list.)  Finding one or two like that during a complete post-commit
 review
 would be business as usual.  Finding nine in a partial review diagnoses a
 critical shortfall in pre-commit review vigilance.  Fixing the bugs found
 to
 date will not cure that shortfall.  A qualified re-review could cure it.


Thank you for the summary of points. I agree with that list.

I will work on the re-review as you suggest.

1 and 4 relate to the sample API exposed, which needs some rework. We'll
see how big that is; at this time I presume not that hard, but I will wait
for Petr's opinion also when he returns on Friday.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 7:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 The two contrib modules this patch added are nowhere near fit for public
 consumption.  They cannot clean up after themselves when dropped:
 ...
 Raw inserts into system catalogs just
 aren't a sane thing to do in extensions.

 I had some thoughts about how we might fix that, without going to the
 rather tedious lengths of creating a complete set of DDL infrastructure
 for CREATE/DROP TABLESAMPLE METHOD.

 First off, the extension API designed for the tablesample patch is
 evidently modeled on the index AM API, which was not a particularly good
 precedent --- it's not a coincidence that index AMs can't be added or
 dropped on-the-fly.  Modeling a server internal API as a set of
 SQL-visible functions is problematic when the call signatures of those
 functions can't be accurately described by SQL datatypes, and it's rather
 pointless and inefficient when none of the functions in question is meant
 to be SQL-callable.  It's even less attractive if you don't think you've
 got a completely stable API spec, because adding, dropping, or changing
 signature of any one of the API functions then involves a pile of
 easy-to-mess-up catalog changes on top of the actually useful work.
 Not to mention then having to think about backwards compatibility of
 your CREATE command's arguments.

 We have a far better model to follow already, namely the foreign data
 wrapper API.  In that, there's a single SQL-visible function that returns
 a dummy datatype indicating that it's an FDW handler, and when called,
 it hands back a struct containing pointers to all the other functions
 that the particular wrapper needs to supply (and, if necessary, the struct
 could have non-function-pointer fields containing other info the core
 system might need to know about the handler).  We could similarly invent a
 pseudotype tablesample_handler and represent each tablesample method by
 a single SQL function returning tablesample_handler.  Any future changes
 in the API for tablesample handlers would then appear as changes in the C
 definition of the struct returned by the handler, which requires no
 SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
 is pretty easy to design it so that failure to update an external module
 that implements the API results in C compiler errors and/or sane fallback
 behavior.

That's elegant.

 Once we've done that, I think we don't even need a special catalog
 representing tablesample methods.  Given FROM foo TABLESAMPLE
 bernoulli(...), the parser could just look for a function bernoulli()
 returning tablesample_handler, and it's done.  The querytree would have an
 ordinary function dependency on that function, which would be sufficient
 to handle DROP dependency behaviors properly.  (On reflection, maybe
 better if it's bernoulli(internal) returns tablesample_handler,
 so as to guarantee that such a function doesn't create a conflict with
 any user-defined function of the same name.)

 Thoughts?

Regarding the fact that those two contrib modules can be part of a
-contrib package and could be installed, nuking those two extensions
from the tree and preventing the creation of custom tablesample
methods looks like the correct course of action for 9.5.

When looking at this patch I was also surprised that the BERNOULLI
method can only be applied to a physical relation and is not able to
fire on a materialized set of tuples, say the result of a WITH clause
for example.

 PS: now that I've written this rant, I wonder why we don't redesign the
 index AM API along the same lines.  It probably doesn't matter much at
 the moment, but if we ever get serious about supporting index AM
 extensions, I think we ought to consider doing that.

Any API redesign looks to be clearly 9.6 work IMO at this point.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 Regarding the fact that those two contrib modules can be part of a
 -contrib package and could be installed, nuking those two extensions
 from the tree and preventing the creation of custom tablesample
 methods looks like the correct course of action for 9.5.

TBH, I think the right thing to do at this point is to revert the entire
patch and send it back for ground-up rework.  I think the high-level
design is wrong in many ways and I have about zero confidence in most
of the code details as well.

I'll send a separate message about high-level issues, but as far as code
details go, I started to do some detailed code review last night and only
got through contrib/tsm_system_rows/tsm_system_rows.c before deciding it
was hopeless.  Let's have a look at my notes:

 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group

Obsolete copyright date.

 * IDENTIFICATION
 *contrib/tsm_system_rows_rowlimit/tsm_system_rows.c

Wrong filename.  (For the moment, I'll refrain from any value judgements
about the overall adequacy or quality of the comments in this patch, and
just point out obvious errors that should have been caught in review.)

typedef struct
{
SamplerRandomState randstate;
uint32  seed;   /* random seed */
BlockNumber nblocks;/* number of block in relation */
int32   ntuples;/* number of tuples to return */
int32   donetuples; /* tuples already returned */
OffsetNumber lt;/* last tuple returned from current block */
BlockNumber step;   /* step size */
BlockNumber lb; /* last block visited */
BlockNumber doneblocks; /* number of already returned blocks */
} SystemSamplerData;

This same typedef name is defined in three different places in the patch
(tablesample/system.c, tsm_system_rows.c, tsm_system_time.c).  While that
might not amount to a bug, it's sure a recipe for confusion, especially
since the struct definitions are all different.

Datum
tsm_system_rows_init(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
uint32  seed = PG_GETARG_UINT32(1);
int32   ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);

This is rather curious coding.  Why should this function check only
one of its arguments for nullness?  If it needs to defend against
any of them being null, it really needs to check all.  But in point of
fact, this function is declared STRICT, which means it's a violation of
the function call protocol if the core code ever passes a null to it,
and so this test ought to be dead code.

A similar pattern of ARGISNULL checks in declared-strict functions exists
in all the tablesample modules, not just this one, showing that this is an
overall design error not just a thinko here.  My inclination would be to
make the core code enforce non-nullness of all tablesample arguments so as
to make it unnecessary to check strictness of the tsm*init functions, but
in any case it is Not Okay to just pass nulls willy-nilly to strict C
functions.

Also, I find this coding pretty sloppy even without the strictness angle,
because the net effect of this way of dealing with nulls is that an
argument-must-not-be-null complaint is reported as out of range,
which is both confusing and the wrong ERRCODE.

if (ntuples < 1)
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg(invalid sample size),
 errhint(Sample size must be positive integer value.)));

I don't find this to be good error message style.  The secondary comment
is not a hint, it's an ironclad statement of what you did wrong, so if
we wanted to phrase it like this it should be an errdetail not errhint.
But the whole thing is overly cute anyway because there is no reason at
all not to just say what we mean in the primary error message, eg
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg(sample size must be greater than zero)));

While we're on the subject, what's the reason for disallowing a sample
size of zero?  Seems like a reasonable edge case.

/* All blocks have been read, we're done */
if (sampler->doneblocks > sampler->nblocks ||
    sampler->donetuples >= sampler->ntuples)
PG_RETURN_UINT32(InvalidBlockNumber);

Okay, I lied, I *am* going to complain about this comment.  Comments that
do not accurately describe the code they're attached to are worse than
useless.

/*
 * Cleanup method.
 */
Datum
tsm_system_rows_end(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);

pfree(tsdesc->tsmdata);

PG_RETURN_VOID();
}

This cleanup method is a waste of code space.  There is no need to
pfree individual allocations at query execution end.

limitnode = linitial(args);
limitnode = estimate_expression_value(root, limitnode);


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 11 July 2015 at 21:28, Tom Lane t...@sss.pgh.pa.us wrote:


 What are we going to do about this?


I will address the points you raise, one by one.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
I wrote:
 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.

 I'll send a separate message about high-level issues,

And here's that.  I do *not* claim that this is a complete list of design
issues with the patch, it's just things I happened to notice in the amount
of time I've spent so far (which is already way more than I wanted to
spend on TABLESAMPLE right now).


I'm not sure that we need an extensible sampling-method capability at all,
let alone that an API for that needs to be the primary focus of a patch.
Certainly the offered examples of extension modules aren't inspiring:
tsm_system_time is broken by design (more about that below) and nobody
would miss tsm_system_rows if it weren't there.  What is far more
important is to get sampling capability into indexscans, and designing
an API ahead of time seems like mostly a distraction from that.

I'd think seriously about tossing the entire executor-stage part of the
patch, creating a stateless (history-independent) sampling filter that
just accepts or rejects TIDs, and sticking calls to that into all the
table scan node types.  Once you had something like that working well
it might be worth considering whether to try to expose an API to
generalize it.  But even then it's not clear that we really need any
more options than true-Bernoulli and block-level sampling.
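
A minimal sketch of what "stateless" means here (the hash mixing and all
names are invented for illustration, not existing core API):

    /*
     * Keep a TID iff a hash of (seed, block, offset) falls below the
     * requested fraction.  A rescan with the same seed then accepts
     * exactly the same TIDs, with no history to carry around.
     */
    static bool
    tid_keep(uint32 seed, BlockNumber blkno, OffsetNumber offnum,
             double fraction)
    {
        uint64      h = seed;

        /* any decent 64-bit mixer will do; this one is splitmix64-style */
        h ^= ((uint64) blkno << 16) | offnum;
        h *= UINT64CONST(0x9E3779B97F4A7C15);
        h ^= h >> 32;

        /* low 32 bits scaled into [0,1) */
        return (double) (uint32) h / 4294967296.0 < fraction;
    }

Such a filter could be called from any scan node type, which is the point
of dropping the separate executor node.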

The IBM paper I linked to in the other thread mentions that their
optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
was requested.  Probably that happens when it decides to do a simple
indexscan, because then there's no advantage to trying to cluster the
sampled rows.  But in the case of a bitmap scan, you could very easily do
either true Bernoulli or block-level sampling simply by adjusting the
rules about which bits you keep or clear in the bitmap (given that you
apply the filter between collection of the index bitmap and accessing the
heap, which seems natural).  The only case where a special scan type
really seems to be needed is if you want block-level sampling, the query
would otherwise use a seqscan, *and* the sampling percentage is pretty low
--- if you'd be reading every second or third block anyway, you're likely
better off with a plain seqscan so that the kernel sees sequential access
and does prefetching.  The current API doesn't seem to make it possible to
switch between seqscan and read-only-selected-blocks based on the sampling
percentage, but I think that could be an interesting optimization.
(Another bet that's been missed is having the samplescan logic request
prefetching when it is doing selective block reads.  The current API can't
support that AFAICS, since there's no expectation that nextblock calls
could be done asynchronously from nexttuple calls.)

Another issue with the API as designed is the examinetuple() callback.
Allowing sample methods to see invisible tuples is quite possibly the
worst idea in the whole patch.  They can *not* do anything with such
tuples, or they'll totally break reproducibility: if the tuple is
invisible to your scan, it might well be or soon become invisible to
everybody, whereupon it would be subject to garbage collection at the
drop of a hat.  So if an invisible tuple affects the sample method's
behavior at all, repeated scans in the same query would not produce
identical results, which as mentioned before is required both by spec
and for minimally sane query behavior.  Moreover, examining the contents
of the tuple is unsafe (it might contain pointers to TOAST values that
no longer exist); and even if it were safe, what's the point?  Sampling
that pays attention to the data is the very definition of biased.  So
if we do re-introduce an API like the current one, I'd definitely lose
this bit and only allow sample methods to consider visible tuples.

On the point of reproducibility: the tsm_system_time method is utterly
incapable of producing identical results across repeated scans, because
there is no reason to believe it would get exactly as far in the same
amount of time each time.  This might be all right across queries if
the method could refuse to work with REPEATABLE clauses (but there's
no provision for such a restriction in the current API).  But it's not
acceptable within a query.  Another problem with tsm_system_time is that
its cost/rowcount estimation is based on nothing more than wishful
thinking, and can never be based on anything more than wishful thinking,
because planner cost units are not time-based.  Adding a comment that
says we'll nonetheless pretend they are milliseconds isn't a solution.
So that sampling method really has to go away and never come back,
whatever we might ultimately salvage from this patch otherwise.

(I'm not exactly convinced that the system or tsm_system_rows methods
are adequately 

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 13 July 2015 at 17:00, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.

  I'll send a separate message about high-level issues,

 And here's that.  I do *not* claim that this is a complete list of design
 issues with the patch, it's just things I happened to notice in the amount
 of time I've spent so far (which is already way more than I wanted to
 spend on TABLESAMPLE right now).


Interesting that a time-based sample does indeed yield useful results. Good
to know.

That is exactly what this patch provides: a time-based sample, with reduced
confidence in the accuracy of the result. And other things too.


 I'm not sure that we need an extensible sampling-method capability at all,
 let alone that an API for that needs to be the primary focus of a patch.
 Certainly the offered examples of extension modules aren't inspiring:
 tsm_system_time is broken by design (more about that below) and nobody
 would miss tsm_system_rows if it weren't there.


Time-based sampling isn't broken by design. It works exactly according to
the design.

Time-based sampling has been specifically requested by data visualization
developers who are trying to work out how to display anything useful from a
database beyond a certain size. The feature for PostgreSQL implements a
mechanism similar to that already deployed in BlinkDB, demonstrated at
VLDB 2012. I regard it as an advanced feature worthy of deployment within
PostgreSQL, based upon user request.

Row-based sampling offers the capability to retrieve a sample of a fixed
size however big the data set. I shouldn't need to point out that this is
already in use within the ANALYZE command. Since it implements a technique
already in use within PostgreSQL, I see no reason not to expose that to
users. As I'm sure you're aware, there is rigorous math backing up the use
of fixed-size sampling, with recent developments from my research
colleagues at Manchester University that further emphasise the usefulness
of this feature for us.


 What is far more
 important is to get sampling capability into indexscans, and designing
 an API ahead of time seems like mostly a distraction from that.


This has come up as a blocker on TABLESAMPLE before. I've got no evidence
to show anyone cares about that. I can't imagine a use for it myself; it
was not omitted from this patch because it's difficult, it was omitted
because it's just useless. If anyone ever cares, they can add it in a later
release.


 I'd think seriously about tossing the entire executor-stage part of the
 patch, creating a stateless (history-independent) sampling filter that
 just accepts or rejects TIDs, and sticking calls to that into all the
 table scan node types.  Once you had something like that working well
 it might be worth considering whether to try to expose an API to
 generalize it.  But even then it's not clear that we really need any
 more options than true-Bernoulli and block-level sampling.


See above.


 The IBM paper I linked to in the other thread mentions that their
 optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
 was requested.


That sounds broken to me. This patch gives the user what the user asks for.
BERNOULLI or SYSTEM.

If you particularly like the idea of mixing the two sampling methods, you
can do so *because* the sampling method API is extensible for the user. So
Mr. DB2 can get it his way if he likes and he can call it SYSNOULLI if he
likes; others can get it the way they like also. No argument required.

It was very, very obvious that whatever sampling code anybody dreamt up
would be objected to. Clearly, we need a way to allow people to implement
whatever technique they wish. Stratified sampling, modified sampling, new
techniques. Whatever.


 Probably that happens when it decides to do a simple
 indexscan, because then there's no advantage to trying to cluster the
 sampled rows.  But in the case of a bitmap scan, you could very easily do
 either true Bernoulli or block-level sampling simply by adjusting the
 rules about which bits you keep or clear in the bitmap (given that you
 apply the filter between collection of the index bitmap and accessing the
 heap, which seems natural).  The only case where a special scan type
 really seems to be needed is if you want block-level sampling, the query
 would otherwise use a seqscan, *and* the sampling percentage is pretty low
 --- if you'd be reading every second or third block anyway, you're likely
 better off with a plain seqscan so that the kernel sees sequential access
 and does prefetching.  The current API doesn't seem to make it possible to
 switch between seqscan and read-only-selected-blocks based on the sampling
 percentage, but I think that could be an interesting optimization.

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-12 Thread Tom Lane
I wrote:
 The two contrib modules this patch added are nowhere near fit for public
 consumption.  They cannot clean up after themselves when dropped:
 ...
 Raw inserts into system catalogs just
 aren't a sane thing to do in extensions.

I had some thoughts about how we might fix that, without going to the
rather tedious lengths of creating a complete set of DDL infrastructure
for CREATE/DROP TABLESAMPLE METHOD.

First off, the extension API designed for the tablesample patch is
evidently modeled on the index AM API, which was not a particularly good
precedent --- it's not a coincidence that index AMs can't be added or
dropped on-the-fly.  Modeling a server internal API as a set of
SQL-visible functions is problematic when the call signatures of those
functions can't be accurately described by SQL datatypes, and it's rather
pointless and inefficient when none of the functions in question is meant
to be SQL-callable.  It's even less attractive if you don't think you've
got a completely stable API spec, because adding, dropping, or changing
signature of any one of the API functions then involves a pile of
easy-to-mess-up catalog changes on top of the actually useful work.
Not to mention then having to think about backwards compatibility of
your CREATE command's arguments.

We have a far better model to follow already, namely the foreign data
wrapper API.  In that, there's a single SQL-visible function that returns
a dummy datatype indicating that it's an FDW handler, and when called,
it hands back a struct containing pointers to all the other functions
that the particular wrapper needs to supply (and, if necessary, the struct
could have non-function-pointer fields containing other info the core
system might need to know about the handler).  We could similarly invent a
pseudotype tablesample_handler and represent each tablesample method by
a single SQL function returning tablesample_handler.  Any future changes
in the API for tablesample handlers would then appear as changes in the C
definition of the struct returned by the handler, which requires no
SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
is pretty easy to design it so that failure to update an external module
that implements the API results in C compiler errors and/or sane fallback
behavior.
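
To make that concrete, a rough sketch (the struct shape and the names in
it are mine for illustration, not a worked-out proposal):

    /* returned by the handler; everything below is plain C */
    typedef struct TsmRoutine
    {
        void        (*init) (void *state, uint32 seed, List *args);
        BlockNumber (*nextblock) (void *state);
        OffsetNumber (*nexttuple) (void *state, BlockNumber blockno,
                                   OffsetNumber maxoffset);
        void        (*end) (void *state);
    } TsmRoutine;

    PG_FUNCTION_INFO_V1(bernoulli);

    Datum
    bernoulli(PG_FUNCTION_ARGS)
    {
        TsmRoutine *tsm = palloc0(sizeof(TsmRoutine));

        /* bernoulli_init() etc. are the method's private C functions */
        tsm->init = bernoulli_init;
        tsm->nextblock = bernoulli_nextblock;
        tsm->nexttuple = bernoulli_nexttuple;
        tsm->end = bernoulli_end;

        PG_RETURN_POINTER(tsm);
    }

The whole SQL-visible surface is then a single pg_proc entry for the
handler; everything else lives in the struct.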

Once we've done that, I think we don't even need a special catalog
representing tablesample methods.  Given FROM foo TABLESAMPLE
bernoulli(...), the parser could just look for a function bernoulli()
returning tablesample_handler, and it's done.  The querytree would have an
ordinary function dependency on that function, which would be sufficient
to handle DROP dependency behaviors properly.  (On reflection, maybe
better if it's bernoulli(internal) returns tablesample_handler,
so as to guarantee that such a function doesn't create a conflict with
any user-defined function of the same name.)

Thoughts?

regards, tom lane

PS: now that I've written this rant, I wonder why we don't redesign the
index AM API along the same lines.  It probably doesn't matter much at
the moment, but if we ever get serious about supporting index AM
extensions, I think we ought to consider doing that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-11 Thread Tom Lane
The two contrib modules this patch added are nowhere near fit for public
consumption.  They cannot clean up after themselves when dropped:

regression=# create extension tsm_system_rows;
CREATE EXTENSION
regression=# create table big as select i, random() as x from 
generate_series(1,100) i;
SELECT 100
regression=# create view v1 as select * from big tablesample system_rows(10);
CREATE VIEW
regression=# drop extension tsm_system_rows;
DROP EXTENSION

The view is still there, but is badly broken:

regression=# select * from v1;
ERROR:  cache lookup failed for function 46379

Potentially this is a security issue, since a malicious user could
probably manage to create a Trojan horse function having the now-vacated
OID, whereupon use of the view would invoke that function.

Worse still, the broken pg_tablesample_method row is still there:

regression=# create extension tsm_system_rows;
ERROR:  duplicate key value violates unique constraint 
pg_tablesample_method_name_index
DETAIL:  Key (tsmname)=(system_rows) already exists.

And even if we fixed that, these modules will not survive a pg_upgrade
cycle, because pg_upgrade has no idea that it would need to create a
pg_tablesample_method row (remember that we do *not* replay the extension
script during binary upgrade).  Raw inserts into system catalogs just
aren't a sane thing to do in extensions.

Some of the risks here come from what seems like a fundamentally
wrong decision to copy all of the info about a tablesample method out
of the pg_tablesample_method catalog *at parse time* and store it
permanently in the query parse tree.  This makes any sort of alter
tablesample method DDL operation impossible in principle, because
any views/rules referencing the method have already copied the data.

On top of that, I find the basic implementation design rather dubious,
because it supposes that the tablesample filtering step must always
come first; the moment you say TABLESAMPLE you can kiss goodbye the
idea that the query will use any indexes.  For example:

d2=# create table big as select i, random() as x from 
generate_series(1,1000) i;
SELECT 1000
d2=# create index on big(i);
CREATE INDEX
d2=# analyze big;
ANALYZE
d2=# explain analyze select * from big where i between 100 and 200;
                              QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using big_i_idx on big  (cost=0.43..10.18 rows=87 width=12) (actual 
time=0.022..0.088 rows=101 loops=1)
   Index Cond: ((i >= 100) AND (i <= 200))
 Planning time: 0.495 ms
 Execution time: 0.141 ms
(4 rows)

d2=# explain analyze select * from big tablesample bernoulli(10) where i 
between 100 and 200;
                              QUERY PLAN
------------------------------------------------------------------------------
 Sample Scan (bernoulli) on big  (cost=0.00..54055.13 rows=9 width=12) (actual 
time=0.028..970.051 rows=13 loops=1)
   Filter: ((i >= 100) AND (i <= 200))
   Rows Removed by Filter: 999066
 Planning time: 0.182 ms
 Execution time: 970.093 ms
(5 rows)

Now, maybe I don't understand the use-case for this feature, but I should
think it's meant for dealing with tables that are so big that you can't
afford to scan all the data.  So, OK, a samplescan is hopefully cheaper
than a pure seqscan, but that doesn't mean that fetching 999079 rows and
discarding 999066 of them is a good plan design.  There needs to be an
operational mode whereby we can use an index and do random sampling of
the TIDs the index returns.  I do not insist that that has to appear in
version one of the feature --- but I am troubled by the fact that, by
exposing an oversimplified API for use by external modules, this patch is
doubling down on the assumption that no such operational mode will ever
need to be implemented.

There are a whole lot of lesser sins, such as documentation that was
clearly never copy-edited by a native speaker of English, badly designed
planner APIs (Paths really ought not have rowcounts different from the
underlying RelOptInfo), potential core dumps due to dereferencing values
that could be null (the guards for null values are in the wrong places
entirely), etc etc.

While there's nothing here that couldn't be fixed by nuking the contrib
modules and putting a week or two of concentrated work into fixing the
core code, I for one certainly don't have time to put that kind of effort
into TABLESAMPLE right now.  Nor do I really have the interest; I find
this feature of pretty dubious value.

What are we going to do about this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] TABLESAMPLE patch

2015-04-22 Thread Petr Jelinek

On 19/04/15 01:24, Michael Paquier wrote:

On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:

On 10/04/15 06:46, Michael Paquier wrote:

13) Some regression tests with pg_tablesample_method would be welcome.


Not sure what you mean by that.


I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like and and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;


Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
am sure you guessed it that way already..



Yes I guessed that and it's a very reasonable request; I guess it should 
look like the attached (I don't want to send a new version of everything 
just for this).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From d059c792a1e864e38f321cea51169ea0b3c5caab Mon Sep 17 00:00:00 2001
From: Petr Jelinek pjmo...@pjmodos.net
Date: Wed, 22 Apr 2015 21:28:29 +0200
Subject: [PATCH 7/7] tablesample: add catalog regression test

---
 src/test/regress/expected/tablesample.out | 15 +++
 src/test/regress/sql/tablesample.sql  | 13 +
 2 files changed, 28 insertions(+)

diff --git a/src/test/regress/expected/tablesample.out b/src/test/regress/expected/tablesample.out
index 271638d..04e5eb8 100644
--- a/src/test/regress/expected/tablesample.out
+++ b/src/test/regress/expected/tablesample.out
@@ -209,6 +209,21 @@ SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
 ERROR:  syntax error at or near TABLESAMPLE
 LINE 1: ...CT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPL...
  ^
+-- catalog sanity
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+   OR tsmseqscan IS NULL
+   OR tsmpagemode IS NULL
+   OR tsmnextblock IS NULL
+   OR tsmnexttuple IS NULL
+   OR tsmend IS NULL
+   OR tsmreset IS NULL
+   OR tsmcost IS NULL;
+ tsmname | tsmseqscan | tsmpagemode | tsminit | tsmnextblock | tsmnexttuple | tsmexaminetuple | tsmend | tsmreset | tsmcost 
+-++-+-+--+--+-++--+-
+(0 rows)
+
 -- done
 DROP TABLE test_tablesample CASCADE;
 NOTICE:  drop cascades to 2 other objects
diff --git a/src/test/regress/sql/tablesample.sql b/src/test/regress/sql/tablesample.sql
index 2f4b7de..7b3eb9b 100644
--- a/src/test/regress/sql/tablesample.sql
+++ b/src/test/regress/sql/tablesample.sql
@@ -57,5 +57,18 @@ SELECT * FROM query_select TABLESAMPLE BERNOULLI (5.5) REPEATABLE (1);
 
 SELECT q.* FROM (SELECT * FROM test_tablesample) as q TABLESAMPLE BERNOULLI (5);
 
+-- catalog sanity
+
+SELECT *
+FROM pg_tablesample_method
+WHERE tsminit IS NULL
+   OR tsmseqscan IS NULL
+   OR tsmpagemode IS NULL
+   OR tsmnextblock IS NULL
+   OR tsmnexttuple IS NULL
+   OR tsmend IS NULL
+   OR tsmreset IS NULL
+   OR tsmcost IS NULL;
+
 -- done
 DROP TABLE test_tablesample CASCADE;
-- 
1.9.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-22 Thread Michael Paquier
On Thu, Apr 23, 2015 at 4:31 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 On 19/04/15 01:24, Michael Paquier wrote:

 On Sat, Apr 18, 2015 at 8:38 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:

 On 10/04/15 06:46, Michael Paquier wrote:

 13) Some regression tests with pg_tablesample_method would be welcome.


 Not sure what you mean by that.


 I meant a sanity check on pg_tablesample_method to be sure that
 tsminit, tsmnextblock and tsmnexttuple are always defined as they are
 mandatory functions. So the idea is to add a query like and and to be
 sure that it returns no rows:
 SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
 tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;


 Yesterday was a long day. I meant IS NULL and not IS NOT NULL, but I
 am sure you guessed it that way already..


 Yes I guessed that and it's a very reasonable request; I guess it should look
 like the attached (I don't want to send a new version of everything just for
 this).

Thanks. That's exactly the idea.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-18 Thread Michael Paquier
On Fri, Apr 17, 2015 at 10:54 PM, Petr Jelinek wrote:
 On 10/04/15 06:46, Michael Paquier wrote:
 13) Some regression tests with pg_tablesample_method would be welcome.

 Not sure what you mean by that.

I meant a sanity check on pg_tablesample_method to be sure that
tsminit, tsmnextblock and tsmnexttuple are always defined as they are
mandatory functions. So the idea is to add a query like and and to be
sure that it returns no rows:
SELECT tsmname FROM pg_tablesample_method WHERE tsminit IS NOT NULL OR
tsmnextblock IS NOT NULL OR tsmnexttuple IS NOT NULL;

 - Added two sample contrib modules demonstrating row limited and time
 limited sampling. I am using linear probing for both of those as the builtin
 block sampling is not well suited for row limited or time limited sampling.
 For row limited I originally thought of using the Vitter's reservoir
 sampling but that does not fit well with the executor as it needs to keep
 the reservoir of all the output tuples in memory which would have horrible
 memory requirements if the limit was high. The linear probing seems to work
 quite well for the use case of "give me 500 random rows from table".

Patch 4 is interesting, it shows a direct use of examinetuple to
filter the output.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-12 Thread Amit Kapila
On Sat, Apr 11, 2015 at 12:56 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 4/9/15 8:58 PM, Petr Jelinek wrote:
  Well, you can have two approaches to this, either allow some specific
  set of keywords that can be used to specify limit, or you let sampling
  methods interpret parameters, I believe the latter is more flexible.
  There is nothing stopping somebody writing sampling method which takes
  limit as number of rows, or anything else.
 
  Also for example for BERNOULLI to work correctly you'd need to convert
  the number of rows to fraction of table anyway (and that's exactly what
  the one database which has this feature does internally) and then it's
  no different than passing (SELECT 100/reltuples*number_of_rows FROM
  tablename) as a parameter.

 What is your intended use case for this feature?  I know that "give me
 100 random rows from this table quickly" is a common use case, but
 that's kind of cumbersome if you need to apply formulas like that.  I'm
 not sure what the use of a percentage is.  Presumably, the main use of
 this features is on large tables.  But then you might not even know how
 large it really is, and even saying 0.1% might be more than you wanted
 to handle.


The use case for specifying a number of rows for a sample scan is valid
and can be achieved by other means if required, as suggested by Petr
Jelinek; however, the currently proposed syntax (w.r.t. Sample
Percentage [1]) seems to comply with the SQL standard, so why not go
for it and then extend it based on more use cases?

[1]
SQL Standard (2003) w.r.t Sample Percentage
<sample clause> ::=
TABLESAMPLE <sample method> <left paren> <sample percentage> <right paren>
[ <repeatable clause> ]


<sample percentage> ::= <numeric value expression>


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-04-12 Thread Simon Riggs
On 10 April 2015 at 15:26, Peter Eisentraut pete...@gmx.net wrote:

 What is your intended use case for this feature?

Likely use cases are:
* Limits on numbers of rows in sample. Some research colleagues have
published a new mathematical analysis that will allow a lower limit
than previously considered.
* Time limits on sampling. This allows data visualisation approaches
to gain approximate answers in real time.
* Stratified sampling. Anything with some kind of filtering, lifting
or bias. Allows filtering out known incomplete data.
* Limits on sample error

Later use cases would allow custom aggregates to work together with
custom sampling methods, so we might work our way towards i) a SUM()
function that provides the right answer even when used with a sample
scan, ii) custom aggregates that report the sample error, allowing you
to get both AVG() and AVG_STDERR(). That would be technically possible
with what we have here, but I think a lot more thought is required yet.

These have all come out of detailed discussions with two different
groups of data mining researchers.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-10 Thread Petr Jelinek

On 10/04/15 21:26, Peter Eisentraut wrote:

On 4/9/15 8:58 PM, Petr Jelinek wrote:

Well, you can have two approaches to this, either allow some specific
set of keywords that can be used to specify limit, or you let sampling
methods interpret parameters, I believe the latter is more flexible.
There is nothing stopping somebody writing sampling method which takes
limit as number of rows, or anything else.

Also for example for BERNOULLI to work correctly you'd need to convert
the number of rows to fraction of table anyway (and that's exactly what
the one database which has this feature does internally) and then it's
no different than passing (SELECT 100/reltuples*number_of_rows FROM
tablename) as a parameter.


What is your intended use case for this feature?  I know that "give me
100 random rows from this table quickly" is a common use case, but
that's kind of cumbersome if you need to apply formulas like that.  I'm
not sure what the use of a percentage is.  Presumably, the main use of
this features is on large tables.  But then you might not even know how
large it really is, and even saying 0.1% might be more than you wanted
to handle.



My main intended use-case is analytics on very big tables. The 
percentages of population vs confidence levels are pretty well mapped 
there and you can get quite big speedups if you are fine with getting 
results with slightly smaller confidence (i.e., you care about ballpark 
figures).


But this was not really my point; BERNOULLI just does not work well 
with a row limit by definition. It applies the probability to each individual 
row, and while you can get the probability from a percentage very easily (just 
divide by 100), to get it for a specific target number of rows you have to 
know the total number of source rows, and that's not something we can estimate 
very accurately, so you won't get 500 rows but approximately 500 rows.
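
(To illustrate the conversion against a hypothetical table named big -
reltuples is only an estimate, which is exactly why the result is
approximate:

    -- turn a target of ~500 rows into a BERNOULLI percentage
    SELECT 100.0 * 500 / greatest(reltuples, 1) AS pct
    FROM pg_class WHERE relname = 'big';

and that pct is what you would hand to TABLESAMPLE BERNOULLI.)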


In any case, for "give me 500 somewhat random rows from table quickly" 
you probably want SYSTEM sampling anyway, as it will be orders of 
magnitude faster on big tables, and yes, even 0.1% might be more than you 
wanted in that case. I am not against having a row-limit input for methods 
which can work with it, like SYSTEM, but that's easily doable by 
adding a separate sampling method which accepts rows (even if the sampling 
algorithm itself is the same). In the current approach all you'd have to do is 
write a different init function for the sampling method and register it 
under a new name (yes, it won't be named SYSTEM, but for example 
SYSTEM_ROWLIMIT).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-10 Thread Tomas Vondra



On 04/10/15 21:57, Petr Jelinek wrote:

On 10/04/15 21:26, Peter Eisentraut wrote:

But this was not really my point; BERNOULLI just does not work
well with a row limit by definition. It applies the probability to each
individual row, and while you can get the probability from a percentage
very easily (just divide by 100), to get it for a specific target number
of rows you have to know the total number of source rows, and that's not
something we can estimate very accurately, so you won't get 500 rows
but approximately 500 rows.


It's actually even trickier. Even if you happen to know the exact number 
of rows in the table, you can't just convert that into a percentage like 
that and use it for BERNOULLI sampling. It may give you a different number 
of result rows, because each row is sampled independently.


That is why we have Vitter's algorithm for reservoir sampling, which 
works very differently from BERNOULLI.
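
(For reference, a bare-bones sketch of the reservoir idea; this is the
classic Algorithm R rather than the optimized Vitter variant analyze.c
actually uses:

    /*
     * Keep k items from a stream of n; each item ends up in the
     * reservoir with probability k/n, so the output size is exact.
     */
    static void
    reservoir_sample(const int *stream, int n, int *reservoir, int k)
    {
        int     i;

        for (i = 0; i < k && i < n; i++)
            reservoir[i] = stream[i];       /* fill the reservoir */

        for (; i < n; i++)
        {
            int     j = random() % (i + 1); /* uniform in [0, i] */

            if (j < k)
                reservoir[j] = stream[i];   /* replace with prob k/(i+1) */
        }
    }

Unlike BERNOULLI you get exactly k rows, but you must see the whole
stream and hold k tuples in memory, which is the concern Petr mentioned
for high limits.)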


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-10 Thread Petr Jelinek

On 10/04/15 22:16, Tomas Vondra wrote:



On 04/10/15 21:57, Petr Jelinek wrote:

On 10/04/15 21:26, Peter Eisentraut wrote:

But this was not really my point; BERNOULLI just does not work
well with a row limit by definition. It applies the probability to each
individual row, and while you can get the probability from a percentage
very easily (just divide by 100), to get it for a specific target number
of rows you have to know the total number of source rows, and that's not
something we can estimate very accurately, so you won't get 500 rows
but approximately 500 rows.


It's actually even trickier. Even if you happen to know the exact number
of rows in the table, you can't just convert that into a percentage like
that and use it for BERNOULLI sampling. It may give you a different number
of result rows, because each row is sampled independently.

That is why we have Vitter's algorithm for reservoir sampling, which
works very differently from BERNOULLI.



Hmm, this actually gives me an idea - perhaps we could expose Vitter's 
reservoir sampling as another sampling method for people who want the 
"give me 500 rows from table fast" case? We already have it implemented; 
it's just a matter of adding the glue.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Michael Paquier
On Thu, Apr 9, 2015 at 5:12 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 06/04/15 14:30, Petr Jelinek wrote:

 On 06/04/15 11:02, Simon Riggs wrote:

 Are we ready for a final detailed review and commit?


 I plan to send v12 in the evening with some additional changes that came
 up from Amit's comments + some improvements to error reporting. I think
 it will be ready then.


 Ok so here it is.

 Changes vs v11:
 - changed input parameter list to expr_list
 - improved error reporting, particularly when input parameters are wrong
 - fixed SELECT docs to show correct syntax and mention that there can be
 more sampling methods
 - added name of the sampling method to the explain output - I don't like
 the code much there as it has to look into RTE but on the other hand I don't
 want to create new scan node just so it can hold the name of the sampling
 method for explain
 - made views containing TABLESAMPLE clause not autoupdatable
 - added PageIsAllVisible() check before trying to check for tuple
 visibility
 - some typo/white space fixes



 Compiler warnings:
 explain.c: In function 'ExplainNode':
 explain.c:861: warning: 'sname' may be used uninitialized in this function


 Docs spellings:

 "PostgreSQL distrribution"  extra r.

 "The optional parameter REPEATABLE acceps"  accepts.  But I don't know that
 'accepts' is the right word.  It makes the seed value sound optional to
 REPEATABLE.

 "each block having same chance"  should have "the" before "same".

 "Both of those sampling methods currently"  I think it should be "these"
 not "those", as this sentence is immediately after their introduction, not
 at a distance.

 "...tuple contents and decides if to return in, or zero if none"  Something
 here is confusing.  "return it", not "return in"?

 Other comments:

 Do we want tab completions for psql?  (I will never remember how to spell
 BERNOULLI).

 Yes. I think so.

 Needs a Cat version bump.

 The committer who will pick up this patch will normally do it.

 Patch 1 is simple enough and looks fine to me.

 Regarding patch 2...
 I found for now some typos:
 +  <title><structname>pg_tabesample_method</structname></title>
 +<productname>PostgreSQL</productname> distrribution:

Also, I am wondering if the sampling logic based on block analysis is
actually correct; for example, this currently fails and I think that we
should support it:
 =# with query_select as (select generate_series(1, 10) as a) select
 query_select.a from query_select tablesample system (100.0/11)
 REPEATABLE ();
 ERROR:  42P01: relation query_select does not exist

How exactly does the SQL spec define TABLESAMPLE? Shouldn't we get a
sample from a result set?
 Thoughts?

Just to be clear, the example above is misleading... Doing table
sampling using SYSTEM at the physical level makes sense. In this case I
think that we should properly error out when trying to use this method
on something not present at the physical level. But I am not sure that
this restriction applies to BERNOULLI: you may want to apply it to
other things than physical relations, like views or results of WITH
clauses. Also, given that we support custom sampling
methods, I think that it should be up to the sampling method to define
what kinds of objects it supports sampling from, and how it fetches the
sample, be it page-level fetching or analysis of an
existing set of tuples. Looking at the patch, TABLESAMPLE is just
allowed on tables and matviews; this limitation is too restrictive
IMO.
-- 
Michael




Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Peter Eisentraut
On 4/9/15 5:02 AM, Michael Paquier wrote:
 Just to be clear, the example above being misleading... Doing table
 sampling using SYSTEM at physical level makes sense. In this case I
 think that we should properly error out when trying to use this method
 on something not present at physical level. But I am not sure that
 this restriction applies to BERNOUILLI: you may want to apply it on
 other things than physical relations, like views or results of WITH
 clauses. Also, based on the fact that we support custom sampling
 methods, I think that it should be up to the sampling method to define
 on what kind of objects it supports sampling, and where it supports
 sampling fetching, be it page-level fetching or analysis from an
 existing set of tuples. Looking at the patch, TABLESAMPLE is just
 allowed on tables and matviews, this limitation is too restrictive
 IMO.

In the SQL standard, the TABLESAMPLE clause is attached to a table
expression (table primary), which includes table functions,
subqueries, CTEs, etc.  In the proposed patch, it is attached to a table
name, allowing only an ONLY clause.  So this is a significant deviation.
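
For illustration (table and alias names here are made up), the difference is roughly:

-- what the patch accepts: TABLESAMPLE attached to a table name
SELECT * FROM some_table TABLESAMPLE BERNOULLI (10);

-- what the standard's grammar would additionally permit: TABLESAMPLE
-- attached to a table primary such as a subquery (rejected by the patch)
SELECT * FROM (SELECT * FROM some_table) AS sub TABLESAMPLE BERNOULLI (10);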

Obviously, doing block sampling on a physical table is a significant use
case, but we should be clear about which restrictions and tradeoffs we
are making now and in the future, especially if we are going to present
extension interfaces.  The fact that physical tables are interchangeable
with other relation types, at least in data-reading contexts, is a
feature worth preserving.

It may be worth thinking about some examples of other sampling methods,
in order to get a better feeling for whether the interfaces are appropriate.

Earlier in the thread, someone asked about supporting specifying a
number of rows instead of percents.  While not essential, that seems
pretty useful, but I wonder how that could be implemented later on if we
take the approach that the argument to the sampling method can be an
arbitrary quantity that is interpreted only by the method.
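
For example (SYSTEM_ROWS here is a hypothetical method name, not something in the patch), the argument's meaning would be known only to the method itself:

-- percentage-based, as in the patch
SELECT * FROM some_table TABLESAMPLE SYSTEM (10);

-- hypothetical count-based method: only the method itself would know
-- that 100 means "100 rows" rather than "100 percent"
SELECT * FROM some_table TABLESAMPLE SYSTEM_ROWS (100);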





Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Petr Jelinek

On 09/04/15 11:37, Simon Riggs wrote:

On 9 April 2015 at 04:52, Simon Riggs si...@2ndquadrant.com wrote:


TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.

This query should be part of the test suite and should generate a
useful message or work correctly.


The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.


Yes, the SQL Standard allows this, and the reason why they don't define what 
happens with SYSTEM is that they actually don't define how SYSTEM should 
behave except that it should return approximately the given percentage of 
rows, but the actual behavior is left to the DBMS. The reason why other 
dbs like MSSQL or DB2 have chosen this to be block sampling is that it 
makes most sense (and is fastest) on tables and those databases don't 
support TABLESAMPLE on anything else at all.




On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.



I think so too, considering how late in the last CF we are. Especially 
given my note about MSSQL and DB2 above.


In any case I don't see any fundamental issues with extending the 
current implementation with the subquery support. I think most of the 
work there is actually in parser/analyzer and planner. The sampling 
methods will just not receive the request for next blockid and tupleid 
from that block when the source of the data is a subquery, and if they want to 
support a subquery as the source of sampling they will have to provide the 
examinetuple interface (which is already there and optional, the 
test/example custom sampling method is using it).


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Michael Paquier
On Thu, Apr 9, 2015 at 5:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 9 April 2015 at 04:12, Michael Paquier michael.paqu...@gmail.com wrote:

 Also, I am wondering if the sampling logic based on block analysis is
 actually correct, for example for now this fails and I think that we
 should support it:
 =# with query_select as (select generate_series(1, 10) as a) select
 query_select.a from query_select tablesample system (100.0/11)
 REPEATABLE ();
 ERROR:  42P01: relation "query_select" does not exist

 How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
 sample from a result set?
 Thoughts?

 Good catch. The above query doesn't make any sense.

 TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
 block level sampling. So any block level sampling method should be
 barred from operating on a result set in this way... i.e. should
generate an ERROR "inappropriate sampling method specified"

 TABLESAMPLE BERNOULLI could work in this case, or any other non-block
 based sampling mechanism. Whether it does work yet is another matter.

 This query should be part of the test suite and should generate a
 useful message or work correctly.

Ahah, you just beat me on that ;) See a more precise reply below.
-- 
Michael




Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Simon Riggs
On 9 April 2015 at 04:52, Simon Riggs si...@2ndquadrant.com wrote:

 TABLESAMPLE BERNOULLI could work in this case, or any other non-block
 based sampling mechanism. Whether it does work yet is another matter.

 This query should be part of the test suite and should generate a
 useful message or work correctly.

The SQL Standard does allow the WITH query given. It makes no mention
of the obvious point that SYSTEM-defined mechanisms might not work,
but that is for the implementation to define, AFAICS.

The SQL Standard goes on to talk about possibly non-deterministic
issues. Which in Postgres relates to the point that the results of a
SampleScan will never be IMMUTABLE. That raises the possibility of
planner issues. We must, for example, never do inner join removal on a
sampled relation - we don't do that yet, but something to watch for.
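
A minimal sketch of that non-determinism (names made up): the same statement can return different rows on each execution, so the planner must not treat a SampleScan as stable:

-- may return a different set of rows on every execution
SELECT id FROM some_table TABLESAMPLE SYSTEM (10);
SELECT id FROM some_table TABLESAMPLE SYSTEM (10);

-- only REPEATABLE pins the sample, and even then only while the
-- underlying data does not change
SELECT id FROM some_table TABLESAMPLE SYSTEM (10) REPEATABLE (42);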

On balance, in this release, I would be happier to exclude sampled
results from queries, and only allow sampling against base tables.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Michael Paquier
On Thu, Apr 9, 2015 at 4:30 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 06/04/15 14:30, Petr Jelinek wrote:

 On 06/04/15 11:02, Simon Riggs wrote:

 Are we ready for a final detailed review and commit?


 I plan to send v12 in the evening with some additional changes that came
 up from Amit's comments + some improvements to error reporting. I think
 it will be ready then.


 Ok so here it is.

 Changes vs v11:
 - changed input parameter list to expr_list
 - improved error reporting, particularly when input parameters are wrong
 - fixed SELECT docs to show correct syntax and mention that there can be
 more sampling methods
 - added name of the sampling method to the explain output - I don't like
 the code much there as it has to look into RTE but on the other hand I don't
 want to create new scan node just so it can hold the name of the sampling
 method for explain
 - made views containing TABLESAMPLE clause not autoupdatable
 - added PageIsAllVisible() check before trying to check for tuple
 visibility
 - some typo/white space fixes



 Compiler warnings:
 explain.c: In function 'ExplainNode':
 explain.c:861: warning: 'sname' may be used uninitialized in this function


 Docs spellings:

 PostgreSQL distrribution  extra r.

 The optional parameter REPEATABLE acceps   accepts.  But I don't know that
 'accepts' is the right word.  It makes the seed value sound optional to
 REPEATABLE.

 each block having same chance  should have the before same.

 Both of those sampling methods currently  I think it should be these
 not those, as this sentence is immediately after their introduction, not
 at a distance.

 ...tuple contents and decides if to return in, or zero if none  Something
 here is confusing. return it, not return in?

 Other comments:

 Do we want tab completions for psql?  (I will never remember how to spell
 BERNOULLI).

Yes. I think so.

 Needs a Cat version bump.

The committer who will pick up this patch will normally do it.

Patch 1 is simple enough and looks fine to me.

Regarding patch 2...
I found for now some typos:
+  titlestructnamepg_tabesample_method/structname/title
+productnamePostgreSQL/productname distrribution:

Also, I am wondering if the sampling logic based on block analysis is
actually correct, for example for now this fails and I think that we
should support it:
=# with query_select as (select generate_series(1, 10) as a) select
query_select.a from query_select tablesample system (100.0/11)
REPEATABLE ();
ERROR:  42P01: relation "query_select" does not exist

How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
sample from a result set?
Thoughts?
-- 
Michael




Re: [HACKERS] TABLESAMPLE patch

2015-04-09 Thread Simon Riggs
On 9 April 2015 at 04:12, Michael Paquier michael.paqu...@gmail.com wrote:

 Also, I am wondering if the sampling logic based on block analysis is
 actually correct, for example for now this fails and I think that we
 should support it:
 =# with query_select as (select generate_series(1, 10) as a) select
 query_select.a from query_select tablesample system (100.0/11)
 REPEATABLE ();
 ERROR:  42P01: relation "query_select" does not exist

 How does the SQL spec define exactly TABLESAMPLE? Shouldn't we get a
 sample from a result set?
 Thoughts?

Good catch. The above query doesn't make any sense.

TABLESAMPLE SYSTEM implies system-defined sampling mechanism, which is
block level sampling. So any block level sampling method should be
barred from operating on a result set in this way... i.e. should
generate an ERROR "inappropriate sampling method specified"

TABLESAMPLE BERNOULLI could work in this case, or any other non-block
based sampling mechanism. Whether it does work yet is another matter.

This query should be part of the test suite and should generate a
useful message or work correctly.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-08 Thread Jeff Janes
On Mon, Apr 6, 2015 at 11:02 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 06/04/15 14:30, Petr Jelinek wrote:

 On 06/04/15 11:02, Simon Riggs wrote:

  Are we ready for a final detailed review and commit?


 I plan to send v12 in the evening with some additional changes that came
 up from Amit's comments + some improvements to error reporting. I think
 it will be ready then.


 Ok so here it is.

 Changes vs v11:
 - changed input parameter list to expr_list
 - improved error reporting, particularly when input parameters are wrong
 - fixed SELECT docs to show correct syntax and mention that there can be
 more sampling methods
 - added name of the sampling method to the explain output - I don't like
 the code much there as it has to look into RTE but on the other hand I
 don't want to create new scan node just so it can hold the name of the
 sampling method for explain
 - made views containing TABLESAMPLE clause not autoupdatable
 - added PageIsAllVisible() check before trying to check for tuple
 visibility
 - some typo/white space fixes



Compiler warnings:
explain.c: In function 'ExplainNode':
explain.c:861: warning: 'sname' may be used uninitialized in this function


Docs spellings:

PostgreSQL distrribution  extra r.

The optional parameter REPEATABLE acceps   accepts.  But I don't know
that 'accepts' is the right word.  It makes the seed value sound optional
to REPEATABLE.

each block having same chance  should have the before same.

Both of those sampling methods currently  I think it should be
these not those, as this sentence is immediately after their
introduction, not at a distance.

...tuple contents and decides if to return in, or zero if none  Something
here is confusing. return it, not return in?

Other comments:

Do we want tab completions for psql?  (I will never remember how to spell
BERNOULLI).

Needs a Cat version bump.

Cheers,

Jeff


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Amit Kapila
On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 04/04/15 14:57, Amit Kapila wrote:


 1.
 +tablesample_clause:
 +TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause

 It seems to me that you want to allow it to make it extendable
 to user defined Tablesample methods, but not sure if it is
 right to use func_arg_list for the same because sample method
 arguments can have different definition than function arguments.
 Now even if we want to keep sample method arguments same as
 function arguments that sounds like a separate discussion.


 Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end they
are arguments for the init function of tablesampling method.

 I would be ok with just expr_list, naming parameters here isn't overly
important, but ability to have different types and numbers of parameters
for custom TABLESAMPLE methods *is* important.


Yeah, named parameters is one reason which I think won't
be required for sample methods and neither the same is
mentioned in docs (if user has to use, what is the way he
can pass the same) and another is number of arguments
for sampling methods, which currently seems to be the same
as FUNC_MAX_ARGS, I think that is sufficient but do we
want to support that many arguments for sampling method.

I have shared my thoughts regarding this point with you, if
you don't agree with the same, then proceed as you think is
the best way.


 2.
 postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
 = 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

 Isn't TABLESAMPLE clause suppose to work with Update/Delete
 statements?


 It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.


But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
QUERY PLAN
---
 Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
   ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
(2 rows)


 Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research MSSQL
also only supports it in their equivalent of FROM/USING list, Oracle does
not seem to support their SAMPLING clause outside of SELECTs at all and if
I got the cryptic DB2 manual correctly I think they don't support it
outside of (sub)SELECTs either.


By the way, what is the use case for supporting sample scans in
Update or Delete statements?

Also, isn't it better to mention in the docs for Update and
Delete incase we are going to support tablesample clause
for them?


 7.
 ParseTableSample()
 {
 ..
 arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
 ..
 }

 What is the reason for coercing value of REPEATABLE clause to INT4OID
 when numeric value is expected for the clause.  If user gives the
 input value as -2.3, it will generate a seed which doesn't seem to
 be right.


 Because the REPEATABLE is numeric expression so it can produce whatever
number but we need int4 internally (well float4 would also work just the
code would be slightly uglier).


Okay, I understand that part. Here the real point is why not just expose
it as int4 to the user rather than saying in the docs that it is numeric while
actually we neither expect nor use it as numeric.

Even Oracle supports it as an int, with the below description:
The seed_value must be an integer between 0 and 4294967295

 And we do this type of coercion even for table data (you can insert -2.3
into integer column and it will work) so I don't see what's wrong with it
here.


I am not sure we can compare it with a column of a table.  I think we
can support it within a valid range (similar to the tablesample method) and
if the user inputs a value outside the range, then return an error.


 8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f
t f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_
));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f
f f t f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_
_null_ ));

 Datatype for second argument is kept as  700 (Float4), shouldn't
 it be 1700 (Numeric)?


 Why is that?


As we are exposing it as numeric.

Given the sampling error I think the float4 is enough for 

Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Simon Riggs
On 2 April 2015 at 17:36, Petr Jelinek p...@2ndquadrant.com wrote:

 so here is version 11.

Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of <list>".

Are we ready for a final detailed review and commit?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 12:33, Amit Kapila wrote:

On Sat, Apr 4, 2015 at 8:25 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  Yes I want extensibility here. And I think the tablesample method
arguments are same thing as function arguments given that in the end
they are arguments for the init function of tablesampling method.
 
  I would be ok with just expr_list, naming parameters here isn't
overly important, but ability to have different types and numbers of
parameters for custom TABLESAMPLE methods *is* important.
 

Yeah, named parameters is one reason which I think won't
be required for sample methods and neither the same is
mentioned in docs (if user has to use, what is the way he
can pass the same) and another is number of arguments
for sampling methods, which currently seems to be the same
as FUNC_MAX_ARGS, I think that is sufficient but do we
want to support that many arguments for sampling method.



I think I'll go with simple list of a_exprs. The reason for that is that 
I can foresee sampling methods that need multiple parameters, but I 
don't think naming them is very important. Also adding support for 
naming parameters can be done in the future if we decide so without 
breaking compatibility. Side benefit is that it makes hinting about what 
is wrong with input somewhat easier.
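
As a sketch of what the grammar would then accept (my_method and its arguments are hypothetical, just to show the shape):

-- an a_expr list puts no fixed count or type on the method's arguments
SELECT * FROM some_table TABLESAMPLE my_method (10, 0.5, 'fast');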


I don't think we need to come up with a different limit from 
FUNC_MAX_ARGS. I don't think any sampling method would need that many 
parameters, but I also don't see what an additional smaller limit would give us.




 
  2.
  postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
  = 2;
  ERROR:  syntax error at or near "TABLESAMPLE"
  LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

  postgres=# Delete from test_tablesample TABLESAMPLE system(30);
  ERROR:  syntax error at or near "TABLESAMPLE"
  LINE 1: Delete from test_tablesample TABLESAMPLE system(30);
 
  Isn't TABLESAMPLE clause suppose to work with Update/Delete
  statements?
 
 
  It's supported in the FROM part of UPDATE and USING part of DELETE. I
think that that's sufficient.
 

But I think the Update on target table with sample scan is
supported via views which doesn't seem to be the right thing
in case you just want to support it via FROM/USING, example

postgres=# create view vw_test As select * from test_tablesample
TABLESAMPLE system(30);
postgres=# explain update vw_test set id = 4;
 QUERY PLAN
---
  Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
   ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
(2 rows)



Right, I'll make those views not auto-updatable.
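
A sketch of the intended behavior after that change (the exact error wording is not settled here):

CREATE VIEW vw_test AS
  SELECT * FROM test_tablesample TABLESAMPLE SYSTEM (30);

UPDATE vw_test SET id = 4;
-- should now fail: a view containing TABLESAMPLE is not
-- automatically updatable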



  Standard is somewhat useless for UPDATE and DELETE as it only defines
quite limited syntax there. From what I've seen when doing research
MSSQL also only supports it in their equivalent of FROM/USING list,
Oracle does not seem to support their SAMPLING clause outside of SELECTs
at all and if I got the cryptic DB2 manual correctly I think they don't
support it outside of (sub)SELECTs either.
 

 By the way, what is the use case for supporting sample scans in
 Update or Delete statements?



Well for the USING/FROM part the use-case is same as for SELECT - 
providing sample of the data for the query (it can be useful also for 
getting pseudo random rows fast). And if we didn't support it, it could 
still be done using sub-select so why not have it directly.
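
As a sketch of that kind of usage (target_table and ref_id are made-up names; test_tablesample is the table used elsewhere in this thread):

-- delete the rows matched by a 30% block sample taken in USING
DELETE FROM target_table t
USING test_tablesample AS s TABLESAMPLE SYSTEM (30)
WHERE t.ref_id = s.id;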



Also, isn't it better to mention in the docs for Update and
Delete incase we are going to support tablesample clause
for them?



Most of the other clauses that we support in FROM are not mentioned in 
the UPDATE/DELETE docs; both of those commands just say something like 
"refer to the SELECT FROM docs for more info". Do you think TABLESAMPLE 
deserves special treatment in this regard?



 
  7.
  ParseTableSample()
  {
  ..
  arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
  ..
  }
 
  What is the reason for coercing value of REPEATABLE clause to INT4OID
  when numeric value is expected for the clause.  If user gives the
  input value as -2.3, it will generate a seed which doesn't seem to
  be right.
 
 
  Because the REPEATABLE is numeric expression so it can produce
whatever number but we need int4 internally (well float4 would also work
just the code would be slightly uglier).
 

Okay, I understand that part. Here the real point is why not just expose
it as int4 to the user rather than saying in the docs that it is numeric while
actually we neither expect nor use it as numeric.

Even Oracle supports it as an int, with the below description:
The seed_value must be an integer between 0 and 4294967295


Well the thing with SQL Standard's numeric value expression is that it 
actually does not mean numeric data type, it's just simple arithmetic 
expression with some given rules (by the standard), but the output data 
type can be either implementation 

Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 15:07, Amit Kapila wrote:

On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  On 06/04/15 12:33, Amit Kapila wrote:
 
 
  But I think the Update on target table with sample scan is
  supported via views which doesn't seem to be the right thing
  in case you just want to support it via FROM/USING, example
 
  postgres=# create view vw_test As select * from test_tablesample
  TABLESAMPLE system(30);
  postgres=# explain update vw_test set id = 4;
   QUERY PLAN
 
---
Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
  ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4
width=210)
  (2 rows)
 
 
  Right, I'll make those views not auto-updatable.
 
 
Standard is somewhat useless for UPDATE and DELETE as it only defines
  quite limited syntax there. From what I've seen when doing research
  MSSQL also only supports it in their equivalent of FROM/USING list,
  Oracle does not seem to support their SAMPLING clause outside of SELECTs
  at all and if I got the cryptic DB2 manual correctly I think they don't
  support it outside of (sub)SELECTs either.
   
 
  By the way, what is the usecase to support sample scan in
  Update or Delete statement?
 
 
  Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.
 

I can understand why someone wants to read sample data via
SELECT, but not clearly able to understand, why some one wants
to Update or Delete random data in table and if there is a valid
case, then why just based on sub-selects used in where clause
or table reference in FROM/USING list.  Can't we keep it simple
such that either we support to Update/Delete based on Tablesample
clause or prohibit it in all cases?



Well, I don't understand why somebody would do it either, but then again, 
while researching this feature I've found questions on Stack Overflow 
and similar sites about how to do it, so people must have use-cases.


And in any case, as you say, a sub-select would work there, so there is no 
reason to explicitly disable it. Plus there is already a difference 
between what can be the target table in DELETE/UPDATE versus what can be 
in the FROM/USING clause, and I think the TABLESAMPLE behavior follows 
that separation nicely - it's well demonstrated by the fact that we 
would have to add explicit exceptions to some places in the code to disallow it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Amit Kapila
On Mon, Apr 6, 2015 at 5:56 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 06/04/15 12:33, Amit Kapila wrote:


 But I think the Update on target table with sample scan is
 supported via views which doesn't seem to be the right thing
 in case you just want to support it via FROM/USING, example

 postgres=# create view vw_test As select * from test_tablesample
 TABLESAMPLE system(30);
 postgres=# explain update vw_test set id = 4;
  QUERY PLAN

---
   Update on test_tablesample  (cost=0.00..4.04 rows=4 width=210)
 ->  Sample Scan on test_tablesample  (cost=0.00..4.04 rows=4
width=210)
 (2 rows)


 Right, I'll make those views not auto-updatable.


   Standard is somewhat useless for UPDATE and DELETE as it only defines
 quite limited syntax there. From what I've seen when doing research
 MSSQL also only supports it in their equivalent of FROM/USING list,
 Oracle does not seem to support their SAMPLING clause outside of SELECTs
 at all and if I got the cryptic DB2 manual correctly I think they don't
 support it outside of (sub)SELECTs either.
  

  By the way, what is the use case for supporting sample scans in
 Update or Delete statements?


 Well for the USING/FROM part the use-case is same as for SELECT -
providing sample of the data for the query (it can be useful also for
getting pseudo random rows fast). And if we didn't support it, it could
still be done using sub-select so why not have it directly.


I can understand why someone wants to read sample data via
SELECT, but not clearly able to understand, why some one wants
to Update or Delete random data in table and if there is a valid
case, then why just based on sub-selects used in where clause
or table reference in FROM/USING list.  Can't we keep it simple
such that either we support to Update/Delete based on Tablesample
clause or prohibit it in all cases?

 Also, isn't it better to mention in the docs for Update and
 Delete incase we are going to support tablesample clause
 for them?


 Most of the other clauses that we support in FROM are not mentioned in
the UPDATE/DELETE docs; both of those commands just say something like "refer
to the SELECT FROM docs for more info". Do you think TABLESAMPLE deserves
special treatment in this regard?


Nothing too important; just that since I got confused while using it,
someone else could also get confused, but I think we can leave
it.


   And we do this type of coercion even for table data (you can insert
 -2.3 into integer column and it will work) so I don't see what's wrong
 with it here.
  

 I am not sure we can compare it with a column of a table.  I think we
 can support it within a valid range (similar to the tablesample method) and
 if the user inputs a value outside the range, then return an error.


 But that's not what standard says, it says any numeric value expression
is valid. The fact that Oracle limits it to some range should not make us
do the same. I think most important thing here is that using -2.3 will
produce same results if called repeatedly (if there are no changes to data,
vacuum etc). Yes passing -2 will produce same results, I don't know if that
is a problem. The main reason why I have the coercion there is so that
users don't have to explicitly typecast expression results.


Actually, not a big point, but I felt it would be clearer if there were a valid
range, since we are not actually doing anything with the negative (part)
of the seed input by the user.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-04-06 Thread Petr Jelinek

On 06/04/15 11:02, Simon Riggs wrote:

On 2 April 2015 at 17:36, Petr Jelinek p...@2ndquadrant.com wrote:


so here is version 11.


Looks great.

Comment on docs:

The SELECT docs refer only to SYSTEM and BERNOULLI. It doesn't mention
that if other methods are available they could be used also. The
phrasing was "sampling method can be one of <list>".



Will reword.


Are we ready for a final detailed review and commit?



I plan to send v12 in the evening with some additional changes that came 
up from Amit's comments + some improvements to error reporting. I think 
it will be ready then.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Amit Kapila
On Fri, Apr 3, 2015 at 3:06 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 Hi,

 so here is version 11.


Thanks, patch looks much better, but I think still few more
things needs to discussed/fixed.

1.
+tablesample_clause:
+ TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically, if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.

The SQL:2003 spec says that the argument to REPEATABLE and to the TABLESAMPLE
method is the same numeric value expr

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.

In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.

2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id =
2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?


3.
+static RangeTblEntry *
+transformTableSampleEntry(ParseState *pstate, RangeTableSample *r)
..
+ parser_errposition(pstate,
+ exprLocation((Node *) r)));

Better to align exprLocation() with pstate

4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;

bool visible =
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check if PageIsAllVisible() in above
code before visibility check as that can avoid visibility checks.

5.
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
 List
*sampleargs)
{
..
if (con->val.type == T_Null)
    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("REPEATABLE clause must be NOT NULL numeric value"),
             parser_errposition(pstate, con->location)));
..
}

InitSamplingMethod(SampleScanState *scanstate, TableSampleClause
*tablesample)
{
..
if (fcinfo.argnull[1])
    ereport(ERROR,
            (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
             errmsg("REPEATABLE clause cannot be NULL")));
..
}

I think it would be better if we can have same error message
and probably the same error code in both of the above cases.

6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
 List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?

7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause.  If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.


8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f t
f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f f
f t f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_ _null_
));

Datatype for second argument is kept as  700 (Float4), shouldn't
it be 1700 (Numeric)?

9.
postgres=# explain SELECT t1.id FROM test_tablesample as t1 TABLESAMPLE
SYSTEM (
10), test_tablesample as t2 TABLESAMPLE BERNOULLI (20);
 QUERY PLAN

 Nested Loop  (cost=0.00..4.05 rows=100 width=4)
   ->  Sample Scan on test_tablesample t1  (cost=0.00..0.01 rows=1 width=4)
   ->  Sample Scan on test_tablesample t2  (cost=0.00..4.02 rows=2 width=0)
(3 rows)

Isn't it better to display sample method name in explain.
I think it can help in case of join queries.
We can use below format to display:
Sample Scan (System) on test_tablesample ...

10. Bernoulli.c

+/* State */
+typedef struct
+{
+ uint32 seed; /* random seed */
+ BlockNumber startblock; /* starting block, we use ths for syncscan
support */

typo.
/ths/this


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-04-04 Thread Petr Jelinek

On 04/04/15 14:57, Amit Kapila wrote:


1.
+tablesample_clause:
+TABLESAMPLE ColId '(' func_arg_list ')' opt_repeatable_clause


Why do you want to allow func_arg_list?

Basically, if the user tries to pass multiple arguments in the
TABLESAMPLE method's clause, like (10,20), then I think
that should be caught at the grammar level as an error.


It will be reported as error during parse analysis if the TABLESAMPLE 
method does not accept two parameters, same as when the expression used 
wrong type for example.




The SQL:2003 spec says that the argument to REPEATABLE and to the TABLESAMPLE
method is the same numeric value expr

It seems to me that you want to allow it to make it extendable
to user defined Tablesample methods, but not sure if it is
right to use func_arg_list for the same because sample method
arguments can have different definition than function arguments.
Now even if we want to keep sample method arguments same as
function arguments that sounds like a separate discussion.



Yes I want extensibility here. And I think the tablesample method 
arguments are same thing as function arguments given that in the end 
they are arguments for the init function of tablesampling method.


I would be ok with just expr_list, naming parameters here isn't overly 
important, but ability to have different types and numbers of parameters 
for custom TABLESAMPLE methods *is* important.



In general, I feel you already have good basic infrastructure for
supporting other sample methods, but it is better to keep the new
DDL's for doing the same as a separate patch than this patch, as that
way we can reduce the scope of this patch, OTOH if you or others
feel that it is mandatory to have new DDL's support for other
tablesample methods then we have to deal with this now itself.


Well I did attach it as separate diff mainly for that reason. I agree 
that DDL does not have to be committed immediately with the rest of the 
patch (although it's the simplest part of the patch IMHO).




2.
postgres=# explain update test_tablesample TABLESAMPLE system(30) set id
= 2;
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: explain update test_tablesample TABLESAMPLE system(30) set i...

postgres=# Delete from test_tablesample TABLESAMPLE system(30);
ERROR:  syntax error at or near "TABLESAMPLE"
LINE 1: Delete from test_tablesample TABLESAMPLE system(30);

Isn't TABLESAMPLE clause suppose to work with Update/Delete
statements?



It's supported in the FROM part of UPDATE and USING part of DELETE. I 
think that that's sufficient.


Standard is somewhat useless for UPDATE and DELETE as it only defines 
quite limited syntax there. From what I've seen when doing research 
MSSQL also only supports it in their equivalent of FROM/USING list, 
Oracle does not seem to support their SAMPLING clause outside of SELECTs 
at all and if I got the cryptic DB2 manual correctly I think they don't 
support it outside of (sub)SELECTs either.



4.
SampleTupleVisible()
{
..
else
{
/* No pagemode, we have to check the tuple itself. */
Snapshot snapshot = scan->rs_snapshot;
Buffer buffer = scan->rs_cbuf;

bool visible =
HeapTupleSatisfiesVisibility(tuple, snapshot, buffer);
..
}

I think it is better to check if PageIsAllVisible() in above
code before visibility check as that can avoid visibility checks.


It's probably even better to do that one level up in the samplenexttup() 
and only call the SampleTupleVisible if page is not allvisible 
(PageIsAllVisible() is cheap).




6.
extern TableSampleClause *
ParseTableSample(ParseState *pstate, char *samplemethod, Node *repeatable,
List *sampleargs)

It is better to expose function (usage of extern) via header file.
Is there a need to mention extern here?



Eh, stupid copy-paste error when copying function name from header to 
actual source file.



7.
ParseTableSample()
{
..
arg = coerce_to_specific_type(pstate, arg, INT4OID, "REPEATABLE");
..
}

What is the reason for coercing value of REPEATABLE clause to INT4OID
when numeric value is expected for the clause.  If user gives the
input value as -2.3, it will generate a seed which doesn't seem to
be right.



Because the REPEATABLE is numeric expression so it can produce whatever 
number but we need int4 internally (well float4 would also work just the 
code would be slightly uglier). And we do this type of coercion even for 
table data (you can insert -2.3 into integer column and it will work) so 
I don't see what's wrong with it here.
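
A sketch of what that coercion means in practice (as I read the patch; some_table is a stand-in name):

-- the seed expression is assignment-coerced to int4, so both work
SELECT * FROM some_table TABLESAMPLE BERNOULLI (50) REPEATABLE (-2.3);
SELECT * FROM some_table TABLESAMPLE BERNOULLI (50) REPEATABLE (10 * 3);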




8.
+DATA(insert OID = 3295 (  tsm_system_init PGNSP PGUID 12 1 0 0 0 f f f f
t f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_system_init _null_ _null_ _null_ ));
+DATA(insert OID = 3301 (  tsm_bernoulli_init PGNSP PGUID 12 1 0 0 0 f f
f f t f v 3 0 2278 2281
23 700 _null_ _null_ _null_ _null_ tsm_bernoulli_init _null_ _null_
_null_ ));

Datatype for second argument is kept as  700 (Float4), shouldn't
it be 1700 (Numeric)?



Why is that? Given the sampling error I think the float4 is enough for 
specifying the percentage and it makes the 

Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Tomas Vondra

On 03/15/15 16:21, Petr Jelinek wrote:


I also did all the other adjustments we talked about up-thread and
rebased against current master (there was conflict with 31eae6028).



Hi,

I did a review of the version submitted on 03/15 today, and only found a 
few minor issues:


1) The documentation of the pg_tablesample_method catalog is missing
   documentation of the 'tsmpagemode' column, which was added later.

2) transformTableEntry() in parse_clause modifies the comment, in a way
   that made sense before part of the code was moved to a separate
   function. I suggest to revert the comment changes, and instead add
   the comment to transformTableSampleEntry()

3) The shared parts of the block sampler in sampling.c (e.g. in
   BlockSampler_Next) reference Vitter's algorithm (both the code and
   comments) which is a bit awkward as the only part that uses it is
   analyze.c. The other samplers using this code (system / bernoulli)
   don't use Vitter's algorithm.

   I don't think it's possible to separate this piece of code, though.
   It simply has to be in there somewhere, so I'd recommend adding here
   a simple comment explaining that it's needed because of analyze.c.

Otherwise I think the patch is OK. I'll wait for Petr to fix these 
issues, and then mark it as ready for committer.


What do you think, Amit? (BTW you should probably add yourself as 
reviewer in the CF app, as you've provided a lot of feedback here.)


--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Amit Kapila
On Wed, Apr 1, 2015 at 6:31 PM, Tomas Vondra tomas.von...@2ndquadrant.com
wrote:

 On 03/15/15 16:21, Petr Jelinek wrote:


 I also did all the other adjustments we talked about up-thread and
 rebased against current master (there was conflict with 31eae6028).


 Hi,

 I did a review of the version submitted on 03/15 today, and only found a
few minor issues:

 1) The documentation of the pg_tablesample_method catalog is missing
documentation of the 'tsmpagemode' column, which was added later.

 2) transformTableEntry() in parse_clause modifies the comment, in a way
that made sense before part of the code was moved to a separate
function. I suggest to revert the comment changes, and instead add
the comment to transformTableSampleEntry()

 3) The shared parts of the block sampler in sampling.c (e.g. in
BlockSampler_Next) reference Vitter's algorithm (both the code and
comments) which is a bit awkward as the only part that uses it is
analyze.c. The other samplers using this code (system / bernoulli)
don't use Vitter's algorithm.

I don't think it's possible to separate this piece of code, though.
It simply has to be in there somewhere, so I'd recommend adding here
a simple comment explaining that it's needed because of analyze.c.

 Otherwise I think the patch is OK. I'll wait for Petr to fix these
issues, and then mark it as ready for committer.

 What do you think, Amit? (BTW you should probably add yourself as
reviewer in the CF app, as you've provided a lot of feedback here.)


I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category.  In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same.  Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version.  I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Petr Jelinek

On 01/04/15 17:52, Robert Haas wrote:

On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila amit.kapil...@gmail.com wrote:

I am still not sure whether it is okay to move REPEATABLE from
unreserved to other category.  In-fact last weekend I have spent some
time to see the exact reason for shift/reduce errors and tried some ways
but didn't find a way to get away with the same.  Now I am planning to
spend some more time on the same probably in next few days and then
still if I cannot find a way, I will share my findings and then once
re-review
the changes made by Petr in last version.  I think overall the patch is in
good shape now although I haven't looked into DDL support part of the
patch which I thought could be done in a separate patch as well.


That seems like a legitimate concern.  We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:

http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html

This also makes method an unreserved keyword, which I'm not wild
about either.  Adding new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created?  If the latter, can we
find something to call it that doesn't require new keywords?



REPEATABLE is mandated by standard. I did try for quite some time to 
make it unreserved but was not successful (I can only make it unreserved 
if I make it mandatory but that's not a solution). I haven't been in 
fact even able to find out what it actually conflicts with...


METHOD is something I added. I guess we could find a way to name this 
differently if we really tried. The reason why I picked METHOD was that 
I already added the same unreserved keyword in the sequence AMs patch 
and in that one any other name does not really make sense.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Robert Haas
On Wed, Apr 1, 2015 at 9:49 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 I am still not sure whether it is okay to move REPEATABLE from
 unreserved to other category.  In-fact last weekend I have spent some
 time to see the exact reason for shift/reduce errors and tried some ways
 but didn't find a way to get away with the same.  Now I am planning to
 spend some more time on the same probably in next few days and then
 still if I cannot find a way, I will share my findings and then once
 re-review
 the changes made by Petr in last version.  I think overall the patch is in
 good shape now although I haven't looked into DDL support part of the
 patch which I thought could be done in a separate patch as well.

That seems like a legitimate concern.  We usually try not to make
keywords more reserved in PostgreSQL than they are in the SQL
standard, and REPEATABLE is apparently non-reserved there:

http://www.postgresql.org/docs/devel/static/sql-keywords-appendix.html

This also makes method an unreserved keyword, which I'm not wild
about either.  Adding new keyword doesn't cost *much*, but is this
SQL-mandated syntax or something we created?  If the latter, can we
find something to call it that doesn't require new keywords?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Petr Jelinek

On 01/04/15 18:38, Robert Haas wrote:

On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek p...@2ndquadrant.com wrote:

REPEATABLE is mandated by standard. I did try for quite some time to make it
unreserved but was not successful (I can only make it unreserved if I make
it mandatory but that's not a solution). I haven't been in fact even able to
find out what it actually conflicts with...


Yeah, that can be hard to figure out.  Did you run bison with -v and
poke around in gram.output?



Oh, no I didn't (I didn't know gram.output will be generated). This 
helped quite a bit. Thanks.


I now found the reason: it's conflicting with the alias, but that's my 
mistake - the alias should come before the TABLESAMPLE clause as per the 
standard, and I put it after in the parser. Now that I've put it in the 
correct place, REPEATABLE can be an unreserved keyword. This change requires 
making TABLESAMPLE a type_func_name_keyword, but that's probably not really 
an issue as TABLESAMPLE is a reserved keyword per the standard.
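
In other words, a sketch of the placement (using the test table from earlier in the thread):

-- alias precedes the TABLESAMPLE clause, as the standard specifies
SELECT t.id FROM test_tablesample AS t TABLESAMPLE SYSTEM (10);

-- the previous, incorrect placement was:
--   SELECT t.id FROM test_tablesample TABLESAMPLE SYSTEM (10) AS t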


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-04-01 Thread Robert Haas
On Wed, Apr 1, 2015 at 12:15 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 REPEATABLE is mandated by standard. I did try for quite some time to make it
 unreserved but was not successful (I can only make it unreserved if I make
 it mandatory but that's not a solution). I haven't been in fact even able to
 find out what it actually conflicts with...

Yeah, that can be hard to figure out.  Did you run bison with -v and
poke around in gram.output?

 METHOD is something I added. I guess we could find a way to name this
 differently if we really tried. The reason why I picked METHOD was that I
 already added the same unreserved keyword in the sequence AMs patch and in
 that one any other name does not really make sense.

I see.  Well, I guess if we've got more than one use for it it's not so bad.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TABLESAMPLE patch

2015-03-10 Thread Petr Jelinek

On 10/03/15 10:54, Amit Kapila wrote:

On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  Ok now I think I finally understand what you are suggesting - you are
saying let's go over whole page while tsmnexttuple returns something,
and do the visibility check and other stuff in that code block under the
buffer lock and cache the resulting valid tuples in some array and then
return those tuples one by one from that cache?
 

Yes, this is what I am suggesting.

And if the caller will try to do it in one step and cache the
  visibility info then we'll end up with pretty much same structure as
  rs_vistuples - there isn't saner way to cache this info other than
  ordered vector of tuple offsets, unless we assume that most pages have
  close to MaxOffsetNumber of tuples which they don't, so why not just use
  the heapgetpage directly and do the binary search over rs_vistuples.
   
 
  The downside of doing it via heapgetpage is that it will do
  visibility test for tuples which we might not even need (I think
we should do the visibility test for tuples returned by tsmnexttuple).
 
 
  Well, heapgetpage can either read visibility data for whole page or
not, depending on if we want pagemode reading or not. So we can use the
pagemode for sampling methods where it's feasible (like system) and not
use pagemode where it's not (like bernoulli) and then either use the
rs_vistuples or call HeapTupleSatisfiesVisibility individually again
depending if the method is using pagemode or not.
 

Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().



I think your suggestion is actually simpler code-wise; I am just 
somewhat worried by the fact that no other scan node uses that kind of 
caching, and there is probably a reason for that.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-03-10 Thread Petr Jelinek

On 10/03/15 04:43, Amit Kapila wrote:

On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  On 09/03/15 04:51, Amit Kapila wrote:
 
  On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com
Double checking for tuple visibility is the only downside I can think
  of.
 
  That will happen if we use heapgetpage and the way currently
  code is written in patch, however we can easily avoid double
  checking if we don't call heapgetpage and rather do the required
  work at caller's place.
 
 
  What's the point of pagemode then if the caller code does the
visibility checks still one by one on each call. I thought one of the
points of pagemode was to do this in one step (and one buffer lock).
 

You only need one buffer lock for doing at caller's location
as well something like we do in acquire_sample_rows().



Ok now I think I finally understand what you are suggesting - you are 
saying let's go over whole page while tsmnexttuple returns something, 
and do the visibility check and other stuff in that code block under the 
buffer lock and cache the resulting valid tuples in some array and then 
return those tuples one by one from that cache?



  And if the caller will try to do it in one step and cache the
visibility info then we'll end up with pretty much same structure as
rs_vistuples - there isn't saner way to cache this info other than
ordered vector of tuple offsets, unless we assume that most pages have
close to MaxOffsetNumber of tuples which they don't, so why not just use
the heapgetpage directly and do the binary search over rs_vistuples.
 

The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do the visibility test for tuples returned by tsmnexttuple).



Well, heapgetpage can either read visibility data for whole page or not, 
depending on if we want pagemode reading or not. So we can use the 
pagemode for sampling methods where it's feasible (like system) and not 
use pagemode where it's not (like bernoulli) and then either use the 
rs_vistuples or call HeapTupleSatisfiesVisibility individually again 
depending if the method is using pagemode or not. This is how sequential 
scan does it afaik.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TABLESAMPLE patch

2015-03-10 Thread Amit Kapila
On Tue, Mar 10, 2015 at 3:03 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 10/03/15 04:43, Amit Kapila wrote:

 On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com wrote:
  
   On 09/03/15 04:51, Amit Kapila wrote:
  
   On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com
 Double checking for tuple visibility is the only downside I can
think
   of.
  
   That will happen if we use heapgetpage the way the code is currently
   written in the patch, however we can easily avoid double
   checking if we don't call heapgetpage and rather do the required
   work at the caller's place.
  
  
   What's the point of pagemode then if the caller code does the
 visibility checks still one by one on each call. I thought one of the
 points of pagemode was to do this in one step (and one buffer lock).
  

 You only need one buffer lock when doing it at the caller's location
 as well, something like we do in acquire_sample_rows().


 Ok now I think I finally understand what you are suggesting - you are
saying let's go over the whole page while tsmnexttuple returns something,
do the visibility check and other stuff in that code block under the buffer
lock, cache the resulting valid tuples in some array, and then return
those tuples one by one from that cache?


Yes, this is what I am suggesting.

   And if the caller will try to do it in one step and cache the
 visibility info then we'll end up with pretty much same structure as
 rs_vistuples - there isn't saner way to cache this info other than
 ordered vector of tuple offsets, unless we assume that most pages have
 close to MaxOffsetNumber of tuples which they don't, so why not just use
 the heapgetpage directly and do the binary search over rs_vistuples.
  

 The downside of doing it via heapgetpage is that it will do
 visibility test for tuples which we might not even need (I think
 we should do visibility test for tuples returned by tsmnexttuple).


 Well, heapgetpage can either read visibility data for whole page or not,
depending on if we want pagemode reading or not. So we can use the pagemode
for sampling methods where it's feasible (like system) and not use pagemode
where it's not (like bernoulli) and then either use the rs_vistuples or
call HeapTupleSatisfiesVisibility individually again depending on whether
the method is using pagemode or not.


Yeah, but as mentioned above, this has some downside, but go
for it only if you feel that above suggestion is making code complex,
which I think should not be the case as we are doing something similar
in acquire_sample_rows().


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-03-09 Thread Amit Kapila
On Mon, Mar 9, 2015 at 3:08 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 09/03/15 04:51, Amit Kapila wrote:

 On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com
   Double checking for tuple visibility is the only downside I can think
 of.

 That will happen if we use heapgetpage the way the code is currently
 written in the patch, however we can easily avoid double
 checking if we don't call heapgetpage and rather do the required
 work at the caller's place.


 What's the point of pagemode then if the caller code does the visibility
checks still one by one on each call. I thought one of the points of
pagemode was to do this in one step (and one buffer lock).


You only need one buffer lock when doing it at the caller's location
as well, something like we do in acquire_sample_rows().

 And if the caller will try to do it in one step and cache the visibility
info then we'll end up with pretty much same structure as rs_vistuples -
there isn't saner way to cache this info other than ordered vector of tuple
offsets, unless we assume that most pages have close to MaxOffsetNumber of
tuples which they don't, so why not just use the heapgetpage directly and
do the binary search over rs_vistuples.


The downside of doing it via heapgetpage is that it will do
visibility test for tuples which we might not even need (I think
we should do visibility test for tuples returned by tsmnexttuple).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-03-09 Thread Petr Jelinek

On 09/03/15 04:51, Amit Kapila wrote:

On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com wrote:
 
  On 05/03/15 09:21, Amit Kapila wrote:
 
  On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
   
   
I didn't add the whole page visibility caching as the tuple ids we
  get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.
   
 
  I think we should set pagemode for system sampling as it can
  have dual benefit, one is it will allow us caching tuples and other
  is it can allow us pruning of page which is done in heapgetpage().
  Do you see any downside to it?
 
 
  Double checking for tuple visibility is the only downside I can think
of.

That will happen if we use heapgetpage the way the code is currently
written in the patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at the caller's place.



What's the point of pagemode then if the caller code does the visibility 
checks still one by one on each call. I thought one of the points of 
pagemode was to do this in one step (and one buffer lock).


And if the caller will try to do it in one step and cache the visibility 
info then we'll end up with pretty much same structure as rs_vistuples - 
there isn't saner way to cache this info other than ordered vector of 
tuple offsets, unless we assume that most pages have close to 
MaxOffsetNumber of tuples which they don't, so why not just use the 
heapgetpage directly and do the binary search over rs_vistuples.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-03-08 Thread Amit Kapila
On Sat, Mar 7, 2015 at 10:37 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 05/03/15 09:21, Amit Kapila wrote:

 On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
  
  
   I didn't add the whole page visibility caching as the tuple ids we
 get from sampling methods don't map well to the visibility info we get
 from heapgetpage (it maps to the values in the rs_vistuples array, not to
 its indexes). Commented about it in code also.
  

 I think we should set pagemode for system sampling as it can
 have dual benefit, one is it will allow us caching tuples and other
 is it can allow us pruning of page which is done in heapgetpage().
 Do you see any downside to it?


 Double checking for tuple visibility is the only downside I can think of.

That will happen if we use heapgetpage the way the code is currently
written in the patch, however we can easily avoid double
checking if we don't call heapgetpage and rather do the required
work at the caller's place.

 Plus some added code complexity of course. I guess we could use binary
search on rs_vistuples (it's already sorted) so that info won't be
completely useless. Not sure if worth it compared to normal visibility
check, but not hard to do.


It seems to me that it is better to avoid locking/unlocking buffer
wherever possible.

 I personally don't see the page pruning in TABLESAMPLE as all that
important though, considering that in practice it will only scan tiny
portion of a relation and usually one that does not get many updates (in
practice the main use-case is BI).


In that case, I think it should be acceptable either way, because
if there are less updates then anyway it won't incur any cost of
doing the pruning.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-03-07 Thread Petr Jelinek

On 05/03/15 09:21, Amit Kapila wrote:

On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 
 
  I didn't add the whole page visibility caching as the tuple ids we
get from sampling methods don't map well to the visibility info we get
from heapgetpage (it maps to the values in the rs_vistuples array, not to
its indexes). Commented about it in code also.
 

I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?


Double checking for tuple visibility is the only downside I can think 
of. Plus some added code complexity of course. I guess we could use 
binary search on rs_vistuples (it's already sorted) so that info won't 
be completely useless. Not sure if worth it compared to normal 
visibility check, but not hard to do.
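
For what it's worth, the binary search itself is tiny; a sketch over the
fields that pagemode heapgetpage() already fills in (function name is
illustrative, not from the patch):

    #include "postgres.h"
    #include "access/heapam.h"

    /*
     * Return true if tupoffset is among the page's visible tuples, using
     * binary search over the sorted rs_vistuples[] array populated by
     * heapgetpage() in page-at-a-time mode.
     */
    static bool
    sample_offset_is_visible(HeapScanDesc scan, OffsetNumber tupoffset)
    {
        int         lo = 0;
        int         hi = scan->rs_ntuples - 1;

        while (lo <= hi)
        {
            int         mid = (lo + hi) / 2;
            OffsetNumber curr = scan->rs_vistuples[mid];

            if (tupoffset == curr)
                return true;
            if (tupoffset < curr)
                hi = mid - 1;
            else
                lo = mid + 1;
        }
        return false;
    }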


I personally don't see the page pruning in TABLESAMPLE as all that 
important though, considering that in practice it will only scan tiny 
portion of a relation and usually one that does not get many updates (in 
practice the main use-case is BI).




Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej


Ah bitrot over time.



2.
Few warnings in code (compiled on windows(msvc))
1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'



I think this is just compiler stupidity but hopefully fixed (I don't 
have msvc to check for it and other compilers I tried don't complain).
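
For reference, C4305 is MSVC complaining about an implicit double-to-float4
narrowing; an explicit cast is the usual way to document the intent and
silence it (names here are illustrative, not the patch's):

    #include "postgres.h"

    /* explicit narrowing: double -> float4, keeping MSVC's C4305 quiet */
    static float4
    percent_as_float4(double raw)
    {
        return (float4) raw;
    }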



3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+TableSampleClause *tablesample)
{
..
+/*
+* Page at a time mode is useless for us as we need to check visibility
+* of tuples individually because tuple offsets returned by sampling
+* methods map to rs_vistuples values and not to its indexes.
+*/
+node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at anyother place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c



Yeah I agree, I taught the heap_beginscan_strat about it as that one is 
the advanced API.
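
A rough sketch of what that extension could look like (the extra
allow_pagemode flag and the simplified internal-helper call are this
sketch's assumptions, not necessarily the signature the patch settled on):

    /* in heapam.c -- sketch only */
    HeapScanDesc
    heap_beginscan_strat(Relation relation, Snapshot snapshot,
                         int nkeys, ScanKey key,
                         bool allow_strat, bool allow_sync,
                         bool allow_pagemode)    /* new flag (sketch) */
    {
        HeapScanDesc scan;

        /* existing setup path, simplified here */
        scan = heap_beginscan_internal(relation, snapshot, nkeys, key,
                                       allow_strat, allow_sync);

        /*
         * Page-at-a-time mode is only valid for MVCC snapshots anyway;
         * the new flag additionally lets callers such as the sample scan
         * turn it off explicitly.
         */
        if (!allow_pagemode)
            scan->rs_pageatatime = false;

        return scan;
    }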




4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+*tuples = path->rows * samplesize;
}

It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.



I am not sure how we would know what size the tuples would have in the 
random blocks that we are going to read later. That said, I am sure that 
the costing can be improved even if I don't know how myself.
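
To make the uniform-density assumption explicit in code, a sketch of the
estimate with the caveat spelled out (function and variable names are
illustrative, not the patch's):

    #include "postgres.h"
    #include <math.h>
    #include "nodes/relation.h"

    /* sketch of the SYSTEM page/tuple estimate; "percent" is 0..100 */
    static void
    system_getrelsize(RelOptInfo *baserel, double percent,
                      BlockNumber *pages, double *tuples)
    {
        double      samplefract = percent / 100.0;

        /* SYSTEM reads whole blocks, so the page count scales linearly */
        *pages = (BlockNumber) ceil(baserel->pages * samplefract);

        /*
         * XXX assumes uniform tuple density (and hence similar tuple
         * sizes) across blocks; a future improvement could use per-block
         * statistics instead.
         */
        *tuples = baserel->tuples * samplefract;
    }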




6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
| PLACING
| PRIMARY
| REFERENCES
+| REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.


Yes it can be moved to type_func_name_keyword which is not all that much 
better but at least something. I did try to play with this already 
during first submission but could not find a way to move it to something 
less restrictive.


I could not even pinpoint what exactly is the shift/reduce issue except 
that it has something to do with the fact that the REPEATABLE clause is 
optional (at least I didn't have the problem when it wasn't optional).




On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
 ;

  opt_repeatable_clause:
-   REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+   REPEATABLE '(' a_expr ')'   { $$ = (Node *) $3; }

Re: [HACKERS] TABLESAMPLE patch

2015-03-05 Thread Amit Kapila
On Tue, Feb 17, 2015 at 3:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:


 I didn't add the whole page visibility caching as the tuple ids we get
from sampling methods don't map well to the visibility info we get from
heapgetpage (it maps to the values in the rs_vistuples array, not to its
indexes). Commented about it in code also.


I think we should set pagemode for system sampling as it can
have dual benefit, one is it will allow us caching tuples and other
is it can allow us pruning of page which is done in heapgetpage().
Do you see any downside to it?

Few other comments:

1.
Current patch fails to apply, minor change is required:
patching file `src/backend/utils/misc/Makefile'
Hunk #1 FAILED at 15.
1 out of 1 hunk FAILED -- saving rejects to
src/backend/utils/misc/Makefile.rej

2.
Few warnings in code (compiled on windows(msvc))
1src\backend\utils\tablesample\bernoulli.c(217): warning C4305: '=' :
truncation from 'double' to 'float4'
1src\backend\utils\tablesample\system.c(177): warning C4305: '=' :
truncation from 'double' to 'float4'

3.
+static void
+InitScanRelation(SampleScanState *node, EState *estate, int eflags,
+ TableSampleClause *tablesample)
{
..
+ /*
+ * Page at a time mode is useless for us as we need to check visibility
+ * of tuples individually because tuple offsets returned by sampling
+ * methods map to rs_vistuples values and not to its indexes.
+ */
+ node->ss.ss_currentScanDesc->rs_pageatatime = false;
..
}

Modifying scandescriptor in nodeSamplescan.c looks slightly odd,
Do we modify this way at any other place?

I think it is better to either teach heap_beginscan_* api's about
it or expose it via new API in heapam.c


4.
+Datum
+tsm_system_cost(PG_FUNCTION_ARGS)
{
..
+ *tuples = path->rows * samplesize;
}

It seems above calculation considers all rows in table are of
equal size and hence multiplying directly with samplesize will
give estimated number of rows for sample method, however if
row size varies then this calculation might not give right
results.  I think if possible we should consider the possibility
of rows with varying sizes else we can at least write a comment
to indicate the possibility of improvement for future.

5.
gram.y

-
 /*
  * Given UPDATE foo set set ..., we have to decide without looking any


Unrelated line removed.

6.
@@ -13577,6 +13608,7 @@ reserved_keyword:
  | PLACING
  | PRIMARY
  | REFERENCES
+ | REPEATABLE

Have you tried to investigate the reason why it is giving shift/reduce
error for unreserved category and if we are not able to resolve that,
then at least we can try to make it in some less restrictive category.
I tried (on windows) by putting it in (type_func_name_keyword:) and
it seems to be working.

To investigate, you can try with information at below link:
http://www.gnu.org/software/bison/manual/html_node/Understanding.html

Basically I think we should try some more before concluding
to change the category of REPEATABLE and especially if we
want to make it a RESERVED keyword.

On a related note, it seems you have agreed upthread with
Kyotaro-san for below change, but I don't see the same in latest patch:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;

 opt_repeatable_clause:
-   REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+   REPEATABLE '(' a_expr ')'   { $$ = (Node *) $3; }
    | /*EMPTY*/                 { $$ = NULL; }



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-02-22 Thread Tomas Vondra
Hi,

On 22.2.2015 18:57, Petr Jelinek wrote:
 Tomas noticed that the patch is missing an error check when TABLESAMPLE
 is used on a view, so here is a new version that checks it's only used 
 against a table or matview.
 
 No other changes.

Curious question - could/should this use page prefetch, similar to what
bitmap heap scan does? I believe the answer is 'yes'.

With SYSTEM that should be rather straightforward to implement, because
it already works at page level, and it's likely to give significant
performance speedup, similar to bitmap index scan:

http://www.postgresql.org/message-id/CAHyXU0yiVvfQAnR9cyH=HWh1WbLRsioe=mzRJTHwtr=2azs...@mail.gmail.com
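
For SYSTEM, the core of such prefetching could be as small as this sketch
(obtaining the upcoming block numbers from the sampling method in advance
is the hypothetical part; PrefetchBuffer() is the same bufmgr call bitmap
heap scan uses):

    #include "postgres.h"
    #include "storage/bufmgr.h"

    /* fire prefetch requests for the next few sampled blocks */
    static void
    system_prefetch_blocks(Relation rel, BlockNumber *upcoming,
                           int nprefetch)
    {
        int         i;

        for (i = 0; i < nprefetch; i++)
        {
            if (upcoming[i] == InvalidBlockNumber)
                break;          /* no more blocks queued */
            PrefetchBuffer(rel, MAIN_FORKNUM, upcoming[i]);
        }
    }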

With BERNOULLI that might be more complex to implement because of the
page/tuple sampling, and the benefit is probably much lower than for
SYSTEM because it's likely that at least one tuple will be sampled from
each page (so every page gets read anyway).

I'm not saying it has to be done in this CF (or that it makes the patch
uncommittable).

For example, this seems like a very nice project for the GSoC (clear
scope, not too large, ...).

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-31 Thread Amit Kapila
On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 19/01/15 07:08, Amit Kapila wrote:

 On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote:


 I think that's actually good to have, because we still do costing and the
 partial index might help produce better estimate of number of returned rows
 for tablesample as well.


I don't understand how it will help, because for tablesample scan
it doesn't consider partial index at all as per patch.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;
INSERT INTO test_tablesample values(generate_series(10,1);

create index idx_tblsample on test_tablesample(id) where id;

Analyze test_tablesample;

postgres=# explain SELECT id FROM test_tablesample where id  ;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
 Index Only Scan using idx_tblsample on test_tablesample  (cost=0.13..8.14 rows=1 width=4)
   Index Cond: (id  )
(2 rows)


postgres=# explain SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80) where id  ;
                               QUERY PLAN
------------------------------------------------------------------------
 Sample Scan on test_tablesample  (cost=0.00..658.00 rows=8000 width=4)
   Filter: (id  )
(2 rows)

The above result also clearly indicates that when the TABLESAMPLE
clause is used, it won't use the partial index.


 Well similar, not same as we are not always fetching whole page or doing
 visibility checks on all tuples in the page, etc. But I don't see why it
 can't be inside nodeSamplescan. If you look at bitmap heap scan, that one
 is also essentially somewhat modified sequential scan and does everything
 within the node nodeBitmapHeapscan because the algorithm is not used
 anywhere else, just like sample scan.


I don't mind doing everything in nodeSamplescan, however if
you can split the function, it will be easier to read and understand;
if you look at nodeBitmapHeapscan, that also has a function like
bitgetpage().
This is just a suggestion and if you think that it can be split,
then it's okay, otherwise leave it as it is.


 Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
 Basically during sequential scan on same table by different
 backends, we attempt to keep them synchronized to reduce the I/O.


 Ah this, yes, it makes sense for bernoulli, not for system though. I guess
 it should be used for sampling methods that use SAS_SEQUENTIAL strategy.


Have you taken care of this in your latest patch?



 I don't know how else to explain, if we loop over every tuple in the
 relation and return it with equal probability then visibility checks don't
 matter as the percentage of visible tuples will be same in the result as in
 the relation.

 For example if you have 30% visible tuples and you are interested in 10%
 of tuples overall it will return 10% of all tuples and since every tuple
 has 10% chance of being returned, 30% of those should be visible (if we
 assume smooth distribution of random numbers generated). So in the end you
 are getting 10% of visible tuples because the scan will obviously skip the
 invisible ones and that's what you wanted.

 As I said problem of analyze is that it uses tuple limit instead of
 probability.


Okay, got the point.



 Yes the difference is smaller (in relative numbers) for bigger tables
 when I test this. On 1k row table the spread of 20 runs was between 770-818
 and on 100k row table it's between 79868-80154. I think that's acceptable
 variance and it's imho indeed the random generator that produces this.


Sure, I think this is acceptable.


 Oh and BTW when I delete 50k of tuples (make them invisible) the results
 of 20 runs are between 30880 and 40063 rows.


This is between 60% and 80%, lower than what is expected,
but I guess we can't do much for this except for trying with
reverse order for visibility test and sample tuple call,
you can decide if you want to try that once just to see if that
is better.


 I am sure it could be done with sequence scan. Not sure if it would be
 pretty and more importantly, the TABLESAMPLE is *not* sequential scan. The
 fact that bernoulli method looks similar should not make us assume that
 every other method does as well, especially when system method is
 completely different.


Okay, lets keep it as separate.



 Anyway, attached is new version with some updates that you mentioned
 (all_visible, etc).
 I also added optional interface for the sampling method to access the
 tuple contents as I can imagine sampling methods that will want to do that.


+ /*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+ if 

Re: [HACKERS] TABLESAMPLE patch

2015-01-31 Thread Petr Jelinek

On 31/01/15 14:27, Amit Kapila wrote:

On Fri, Jan 23, 2015 at 5:39 AM, Petr Jelinek p...@2ndquadrant.com wrote:

On 19/01/15 07:08, Amit Kapila wrote:

On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote:


I think that's actually good to have, because we still do costing
and the partial index might help produce better estimate of number
of returned rows for tablesample as well.


I don't understand how it will help, because for tablesample scan
it doesn't consider partial index at all as per patch.



Oh I think we were talking about different things then, I thought you 
were talking about the index checks that planner/optimizer sometimes 
does to get more accurate statistics. I'll take another look then.




Well similar, not same as we are not always fetching whole page or
doing visibility checks on all tuples in the page, etc. But I don't
see why it can't be inside nodeSamplescan. If you look at bitmap
heap scan, that one is also essentially somewhat modified sequential
scan and does everything within the node nodeBitmapHeapscan because
the algorithm is not used anywhere else, just like sample scan.


I don't mind doing everything in nodeSamplescan, however if
you can split the function, it will be easier to read and understand,
if you see in nodeBitmapHeapscan, that also has function like
bitgetpage().
This is just a suggestion and if you think that it can be split,
then it's okay, otherwise leave it as it is.


Yeah I can split it into a separate function within the nodeSamplescan, that 
sounds reasonable.





Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequential scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.


Ah this, yes, it makes sense for bernoulli, not for system though. I
guess it should be used for sampling methods that use SAS_SEQUENTIAL
strategy.


Have you taken care of this in your latest patch?


Not yet. I think I will need to make the strategy a property of the 
sampling method instead of returning it from the costing function so that 
the info can be used by the scan.
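
A sketch of what that could look like (the struct and field names here are
this sketch's assumptions; SAS_SEQUENTIAL is the strategy mentioned above):

    /* per-method static properties instead of a costing-function result */
    typedef enum SampleScanStrategy
    {
        SAS_SEQUENTIAL,         /* visits blocks in physical order */
        SAS_RANDOM              /* jumps around, e.g. SYSTEM */
    } SampleScanStrategy;

    typedef struct TsmRoutineSketch
    {
        SampleScanStrategy strategy;    /* drives allow_sync/pagemode */
        /* ... nextblock/nexttuple/etc. callbacks would live here ... */
    } TsmRoutineSketch;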




Oh and BTW when I delete 50k of tuples (make them invisible) the
results of 20 runs are between 30880 and 40063 rows.


This is between 60% and 80%, lower than what is expected,
but I guess we can't do much for this except for trying with
reverse order for visibility test and sample tuple call,
you can decide if you want to try that once just to see if that
is better.



No, that's because I can't write properly, the lower number was supposed 
to be 39880 which is well within the tolerance, sorry for the confusion 
(9 and 0 are just too close).



Anyway, attached is new version with some updates that you mentioned
(all_visible, etc).
I also added optional interface for the sampling method to access
the tuple contents as I can imagine sampling methods that will want
to do that.

+/*
+ * Let the sampling method examine the actual tuple and decide if we
+ * should return it.
+ *
+ * Note that we let it examine even invisible tuples.
+ */
+if (OidIsValid(node->tsmreturntuple.fn_oid))
+{
+    found = DatumGetBool(FunctionCall4(&node->tsmreturntuple,
+                                       PointerGetDatum(node),
+                                       UInt32GetDatum(blockno),
+                                       PointerGetDatum(tuple),
+                                       BoolGetDatum(visible)));
+    /* XXX: better error */
+    if (found && !visible)
+        elog(ERROR, "Sampling method wanted to return invisible tuple");
+}

You have mentioned in the comment that we let it examine invisible tuples,
but it is not clear why you want to allow examining an invisible tuple
and then later return an error; why can't it skip invisible tuples?



Well I think returning invisible tuples to the user is a bad idea and that's 
why the check, but I also think it might make sense for the sampling 
function to examine the invisible tuple in case it wants to create 
some kind of stats about the scan and wants to use those for making the 
decision about returning other tuples. The interface should probably be 
called tsmexaminetuple instead to make it more clear what the intention is.
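
As a plain C signature, the renamed hook could look like this sketch (the
patch actually wires it through fmgr as a FunctionCall4, so the typedef is
purely illustrative):

    #include "postgres.h"
    #include "access/htup.h"
    #include "storage/block.h"

    struct SampleScanState;     /* the patch's executor node */

    /*
     * Optional examine-tuple hook: may inspect even invisible tuples
     * (e.g. to keep scan statistics), but returning true for an
     * invisible tuple is reported as an error by the caller.
     */
    typedef bool (*tsmexaminetuple_hook) (struct SampleScanState *node,
                                          BlockNumber blockno,
                                          HeapTuple tuple,
                                          bool visible);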



1.
How about statistics (pgstat_count_heap_getnext()) during
SampleNext as we do in heap_getnext?


Right, will add.



2.
+static TupleTableSlot *
+SampleNext(SampleScanState *node)
+{
..
+/*
+* Lock the buffer so we can safely assess tuple
+* visibility later.
+*/
+LockBuffer(buffer, BUFFER_LOCK_SHARE);
..
}

When is this content lock released, shouldn't we release it after
checking visibility of tuple?


Here,
+   if (!OffsetNumberIsValid(tupoffset))
+   {
+   UnlockReleaseBuffer(buffer);

but yes, you are correct, it should just be released there and we can 
unlock right after the visibility check.
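
For illustration, a minimal sketch of the narrower locking discussed here
(assuming the scan keeps the buffer pinned while tuples from the page are
being returned; the helper name is illustrative):

    #include "postgres.h"
    #include "access/heapam.h"
    #include "storage/bufmgr.h"
    #include "utils/tqual.h"

    /* hold the content lock only for the duration of the visibility test */
    static bool
    sample_tuple_visible(HeapScanDesc scan, Buffer buffer, HeapTuple tuple)
    {
        bool        visible;

        LockBuffer(buffer, BUFFER_LOCK_SHARE);
        visible = HeapTupleSatisfiesVisibility(tuple, scan->rs_snapshot,
                                               buffer);
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

        /* the pin is retained by the caller until the page is exhausted */
        return visible;
    }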




3.
+static TupleTableSlot *
+SampleNext(SampleScanState 

Re: [HACKERS] TABLESAMPLE patch

2015-01-29 Thread Bruce Momjian
On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:
 On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote:
  Yes, that's my view too. I would generally be for that change also and it
  would be worth it if the code was used in more than one place, but as it is
  it seems like it will just add code/complexity for no real benefit. It would
  make sense in case we used sequential scan node instead of the new node as
  Amit also suggested, but I remain unconvinced that mixing sampling and
  sequential scan into single scan node would be a good idea.
 
 Based on previous experience, I expect that any proposal to merge
 those nodes would get shot down by Tom with his laser-guided atomic
 bazooka faster than you can say -1 from me regards tom lane.

Do we get illustrations with that?  ;-)  I want a poster for my wall!

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-29 Thread Robert Haas
On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Yes, that's my view too. I would generally be for that change also and it
 would be worth it if the code was used in more than one place, but as it is
 it seems like it will just add code/complexity for no real benefit. It would
 make sense in case we used sequential scan node instead of the new node as
 Amit also suggested, but I remain unconvinced that mixing sampling and
 sequential scan into single scan node would be a good idea.

Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say -1 from me regards tom lane.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-29 Thread Jim Nasby

On 1/29/15 10:44 AM, Bruce Momjian wrote:

On Thu, Jan 29, 2015 at 11:08:55AM -0500, Robert Haas wrote:

On Wed, Jan 28, 2015 at 5:19 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Yes, that's my view too. I would generally be for that change also and it
would be worth it if the code was used in more than one place, but as it is
it seems like it will just add code/complexity for no real benefit. It would
make sense in case we used sequential scan node instead of the new node as
Amit also suggested, but I remain unconvinced that mixing sampling and
sequential scan into single scan node would be a good idea.


Based on previous experience, I expect that any proposal to merge
those nodes would get shot down by Tom with his laser-guided atomic
bazooka faster than you can say -1 from me regards tom lane.


Do we get illustrations with that?  ;-)  I want a poster for my wall!


+1. It should also be the tshirt for the next pgCon. ;)
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 09:41, Kyotaro HORIGUCHI wrote:


As an issue related to size estimation, I got an explain result like the
following,

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a  5;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Sample Scan on t1  (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
   Filter: (a  5)
   Rows Removed by Filter: 5876
   Buffers: shared hit=45

The actual row count is about twice the estimate. tsm_system_cost
estimates the number of the result rows using baserel->tuples,
not baserel->rows, so it doesn't reflect the scan quals. Did
the code come from some other intention?



No, that's a bug.
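
For reference, the fix is essentially a one-line change along these lines
(a hedged sketch; samplefract stands for the sampled fraction, following
the snippets quoted above):

    /* scale the post-qual row estimate so the scan quals are reflected */
    *tuples = baserel->rows * samplefract;   /* instead of baserel->tuples */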


 4.
 SampleNext()
 a.
 Isn't it better to code SampleNext() similar to SeqNext() and
 IndexNext(), which encapsulate the logic to get the tuple in
 another function(tbs_next() or something like that)?


 Possibly, my thinking was that unlike the index_getnext() and
 heap_getnext(), this function would not be called from any other
 place so adding one more layer of abstraction didn't seem useful.
 And it would mean creating new ScanDesc struct, etc.


I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy, I think we need to almost 
create a duplicate version of code for heapgetpage() method.
Yeah, I agree that we need to build structure like ScanDesc, but
still it will be worth to keep code consistent.


Well similar, not same as we are not always fetching whole page or doing
visibility checks on all tuples in the page, etc. But I don't see why it
can't be inside nodeSamplescan. If you look at bitmap heap scan, that
one is also essentially somewhat modified sequential scan and does
everything within the node nodeBitmapHeapscan because the algorithm is
not used anywhere else, just like sample scan.


As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since the scan processes are distributed
in ExecProcNode by the nodeTag of scan nodes, re-unifying the
control in an abstraction layer after that could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would bring the necessity of additional changes around
the execution node distribution.



Yes, that's my view too. I would generally be for that change also and 
it would be worth it if the code was used in more than one place, but as 
it is it seems like it will just add code/complexity for no real 
benefit. It would make sense in case we used sequential scan node 
instead of the new node as Amit also suggested, but I remain unconvinced 
that mixing sampling and sequential scan into single scan node would be 
a good idea.




How will it get smoothed out for cases when, let us say,
more than 50% of tuples are not visible?  I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
otherwise, for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.



I don't know how else to explain, if we loop over every tuple in the
relation and return it with equal probability then visibility checks
don't matter as the percentage of visible tuples will be same in the
result as in the relation.


Surely it might yield effectively the same result. Even so,
this code needs an explanatory comment about the nature of the
code, or should be written so that MVCC people won't get confused, I suppose.



Yes it does, but as I am failing to explain even here, it's not clear to 
me what to write there. From my point of view it's just an effect of the 
essential property of bernoulli sampling, which is that every element has 
equal probability of being included in the sample. It comes from the 
fact that we do a bernoulli trial (in the code it's the while 
(sampler_random_fract(sampler->randstate) > probability) loop) on every 
individual tuple.


This means that the ratio of visible and invisible tuples should be the same 
in the sample as it is in the relation. We then just skip the invisible 
tuples and get the correct percentage of the visible ones (this has the 
performance benefit of not having to do a visibility check on all tuples).
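
Condensed into code, the per-tuple trial looks like this sketch
(illustrative names; sampler_random_fract() is from the patch's sampling
helpers, and the header path is this sketch's assumption):

    #include "postgres.h"
    #include "storage/off.h"
    #include "utils/sampling.h"     /* sampler_random_fract(), per patch */

    /*
     * One independent Bernoulli trial per offset: each tuple, visible or
     * not, is accepted with the same probability, so skipping invisible
     * tuples afterwards still leaves the sampled fraction of visible ones.
     */
    static OffsetNumber
    bernoulli_next_offset(SamplerRandomState randstate,
                          OffsetNumber tupoffset, OffsetNumber maxoffset,
                          double probability)
    {
        while (tupoffset <= maxoffset &&
               sampler_random_fract(randstate) > probability)
            tupoffset++;        /* this offset failed its trial; skip it */

        return (tupoffset <= maxoffset) ? tupoffset : InvalidOffsetNumber;
    }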


If that wasn't true then the bernoulli sampling would simply not 
work as intended, as the same property is the reason why it's used in 
statistics - the trends should look the same in the sample as they look in 
the source population.


Obviously there is some variation in practice as we don't have perfect 
random generator but that's independent of the algorithm.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 

Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Kyotaro HORIGUCHI
Hello,

 On 19/01/15 07:08, Amit Kapila wrote:
  On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote:
  No issues, but it seems we should check other paths where
  different handling could be required for tablesample scan.
  In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
  for rel size estimates where it checks the presence of partial indexes
  and that might affect the size estimates and that doesn't seem to
  be required when we have to perform scan based on TableSample
  method.
 
 I think that's actually good to have, because we still do costing and 
 the partial index might help produce better estimate of number of 
 returned rows for tablesample as well.

As an issue related to size estimation, I got an explain result like the
following,

=# explain (analyze on, buffers on) select a from t1 tablesample system(10) where a  5;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Sample Scan on t1  (cost=0.00..301.00 rows=1 width=4) (actual time=0.015..2.920 rows=4294 loops=1)
   Filter: (a  5)
   Rows Removed by Filter: 5876
   Buffers: shared hit=45

The actual row count is about twice the estimate. tsm_system_cost
estimates the number of the result rows using baserel->tuples,
not baserel->rows, so it doesn't reflect the scan quals. Did
the code come from some other intention?

  4.
  SampleNext()
  a.
  Isn't it better to code SampleNext() similar to SeqNext() and
  IndexNext(), which encapsulate the logic to get the tuple in
  another function(tbs_next() or something like that)?
 
 
  Possibly, my thinking was that unlike the index_getnext() and
  heap_getnext(), this function would not be called from any other
  place so adding one more layer of abstraction didn't seem useful.
  And it would mean creating new ScanDesc struct, etc.
 
 
  I think adding additional abstraction would simplify the function
  and reduce the chance of discrepancy, I think we need to almost
  create a duplicate version of code for heapgetpage() method.
  Yeah, I agree that we need to build structure like ScanDesc, but
  still it will be worth to keep code consistent.
 
 Well similar, not same as we are not always fetching whole page or doing 
 visibility checks on all tuples in the page, etc. But I don't see why it 
 can't be inside nodeSamplescan. If you look at bitmap heap scan, that 
 one is also essentially somewhat modified sequential scan and does 
 everything within the node nodeBitmapHeapscan because the algorithm is 
 not used anywhere else, just like sample scan.

As a general opinion, I'll vote for Amit's comment, since three
or four similar instances seem to be enough reason to abstract
it. On the other hand, since the scan processes are distributed
in ExecProcNode by the nodeTag of scan nodes, re-unifying the
control in an abstraction layer after that could be said to
introduce useless annoyance. In short, abstraction at the level
of *Next() would bring the necessity of additional changes around
the execution node distribution.

  Another one is PageIsAllVisible() check.
 
 
 Done.
 
  Another thing is don't we want to do anything for sync scans
  for these methods as they are doing heap scan?
 
 
  I don't follow this one tbh.
 
 
  Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
  Basically during sequential scan on same table by different
  backends, we attempt to keep them synchronized to reduce the I/O.
 
 
 Ah this, yes, it makes sense for bernoulli, not for system though. I 
 guess it should be used for sampling methods that use SAS_SEQUENTIAL 
 strategy.
 
 
 
  c.
  for bernoulli method, it will first get the tupoffset with
  the help of function and then apply visibility check, it seems
  that could reduce the sample set way lower than expected
  in case a lot of tuples are not visible, shouldn't it be done in the
  reverse way (first visibility check, then call the function to see if we want to
  include it in the sample)?
  I think something similar is done in acquire_sample_rows which
  seems right to me.
 
 
  I don't think so, the way bernoulli works is that it returns every
  tuple with equal probability, so the visible tuples have same chance
  of being returned as the invisible ones so the issue should be
  smoothed away automatically (IMHO).
 
 
  How will it get smoothed out for cases when, let us say,
  more than 50% of tuples are not visible?  I think it's
  due to Postgres' non-overwriting storage architecture
  that we maintain multiple versions of rows in the same heap;
  otherwise, for a different kind of architecture (like mysql/oracle)
  where multiple row versions are not maintained in the same
  heap, the same function (with the same percentage) can return
  

Re: [HACKERS] TABLESAMPLE patch

2015-01-28 Thread Petr Jelinek

On 28/01/15 08:23, Kyotaro HORIGUCHI wrote:

Hi, I took a look on this and found nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample but the grammar rejects it.



It wasn't my intention to support it, but you are correct, the code is 
generic enough that we can support it.



The following change seems enough.



Seems about right, thanks.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-27 Thread Kyotaro HORIGUCHI
Hi, I took a look on this and found nice.

By the way, the parameter for REPEATABLE seems to be allowed to be an
expression in ParseTableSample but the grammar rejects it.

The following change seems enough.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4578b5e..8cf09d5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -10590,7 +10590,7 @@ tablesample_clause:
;
 
 opt_repeatable_clause:
-   REPEATABLE '(' AexprConst ')'   { $$ = (Node *) $3; }
+   REPEATABLE '(' a_expr ')'   { $$ = (Node *) $3; }
| /*EMPTY*/ 
{ $$ = NULL; }
;


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-18 Thread Amit Kapila
On Sun, Jan 18, 2015 at 12:46 AM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 17/01/15 13:46, Amit Kapila wrote:


 3.
 @@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 /* Foreign table */
 set_foreign_pathlist(root, rel, rte);
 }
+else if (rte->tablesample != NULL)
 +{
 +/* Build sample scan on relation */
 +set_tablesample_rel_pathlist(root, rel, rte);
 +}

 Have you consider to have a different RTE for this?


 I assume you mean different RTEKind, yes I did, but the fact is that it's
 still a relation, just with different scan type and I didn't want to modify
 every piece of code which deals with RTE_RELATION to also deal with the new
 RTE for sample scan as it seems unnecessary. I am not strongly opinionated
 about this one though.


No issues, but it seems we should check other paths where
different handling could be required for tablesample scan.
In set_rel_size(), it uses normal path for heapscan (set_plain_rel_size())
for rel size estimates where it checks the presence of partial indexes
and that might affect the size estimates and that doesn't seem to
be required when we have to perform scan based on TableSample
method.
I think we can once check other places where some separate
handling for (rte->inh) is done to see if we need some different handling
for Tablesample scan.


  4.
 SampleNext()
 a.
 Isn't it better to code SampleNext() similar to SeqNext() and
 IndexNext(), which encapsulate the logic to get the tuple in
 another function(tbs_next() or something like that)?


 Possibly, my thinking was that unlike the index_getnext() and
 heap_getnext(), this function would not be called from any other place so
 adding one more layer of abstraction didn't seem useful. And it would mean
 creating new ScanDesc struct, etc.


I think adding additional abstraction would simplify the function
and reduce the chance of discrepancy, I think we need to almost
create a duplicate version of code for heapgetpage() method.
Yeah, I agree that we need to build structure like ScanDesc, but
still it will be worth to keep code consistent.




 b.
 Also don't we want to handle pruning of page while
 scanning (heap_page_prune_opt()) and I observed
 in heap scan API's after visibility check we do check
 for serializable conflict (CheckForSerializableConflictOut()).


 Both good points, will add.


Another one is PageIsAllVisible() check.


  Another thing is don't we want to do anything for sync scans
 for these methods as they are doing heap scan?


 I don't follow this one tbh.


Refer parameter (HeapScanDesc->rs_syncscan) and syncscan.c.
Basically during sequential scan on same table by different
backends, we attempt to keep them synchronized to reduce the I/O.



 c.
 for bernoulli method, it will first get the tupoffset with
 the help of function and then apply visibility check, it seems
 that could reduce the sample set way lower than expected
in case a lot of tuples are not visible, shouldn't it be done in the reverse
way (first visibility check, then call the function to see if we want to
 include it in the sample)?
 I think something similar is done in acquire_sample_rows which
 seems right to me.


 I don't think so, the way bernoulli works is that it returns every tuple
 with equal probability, so the visible tuples have same chance of being
 returned as the invisible ones so the issue should be smoothed away
 automatically (IMHO).


How will it get smoothed out for cases when, let us say,
more than 50% of tuples are not visible?  I think it's
due to Postgres' non-overwriting storage architecture
that we maintain multiple versions of rows in the same heap;
otherwise, for a different kind of architecture (like mysql/oracle)
where multiple row versions are not maintained in the same
heap, the same function (with the same percentage) can return
an entirely different number of rows.



 The acquire_sample_rows has limit on number of rows it returns so that's
 why it has to do the visibility check before as the problem you described
 applies there.


Even if in the tablesample method the argument value is a
percentage, that doesn't mean we should give no consideration to the
number of rows returned.  The only difference I could see is that with the
tablesample method the number of rows returned will not be an accurate
number.  I think it is better to consider only visible rows.


 The reason for using the probability instead of tuple limit is the fact
 that there is no way to accurately guess number of tuples in the table at
 the beginning of scan. This method should actually be better at returning
 the correct number of tuples without dependence on how many of them are
 visible or not and how much space in blocks is used.



 5.


 So the above test yielded 60% of rows the first time and 100% of rows the next time,
 when the test had requested 80%.
 I understand that the sample percentage will result in an approximate
 number of rows, however I just wanted that we should check whether the
 implementation has any problem or not.


 I think this is caused by random 

Re: [HACKERS] TABLESAMPLE patch

2015-01-17 Thread Petr Jelinek

On 17/01/15 13:46, Amit Kapila wrote:

On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 
 
  In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code
again I realized the old one would produce wrong results if there were
multiple TABLESAMPLE statements in same query or multiple cursors in
same transaction.
 

I have looked into this patch and would like to share my
findings with you.


Thanks a lot for this.



1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/*
+* There is only one plan to consider but we still need to set
+* parameters for RelOptInfo.
+*/
+set_cheapest(rel);
}

It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?


Ah, this changed after I started working on this patch and I didn't 
notice - previously all the set_something_rel_pathlist called 
set_cheapest() individually. I will change the code.




2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+/* We only do sample scan if it was requested */
+add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}

Do we need to add_path, if there is only one path?


Good point, we can probably just set the pathlist directly in this case.



3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Foreign table */
set_foreign_pathlist(root, rel, rte);
}
+else if (rte->tablesample != NULL)
+{
+/* Build sample scan on relation */
+set_tablesample_rel_pathlist(root, rel, rte);
+}

Have you consider to have a different RTE for this?



I assume you mean different RTEKind, yes I did, but the fact is that 
it's still a relation, just with different scan type and I didn't want 
to modify every piece of code which deals with RTE_RELATION to also deal 
with the new RTE for sample scan as it seems unnecessary. I am not 
strongly opinionated about this one though.



4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?


Possibly, my thinking was that unlike the index_getnext() and 
heap_getnext(), this function would not be called from any other place 
so adding one more layer of abstraction didn't seem useful. And it would 
mean creating new ScanDesc struct, etc.




b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan API's after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).


Both good points, will add.


Another thing is don't we want to do anything for sync scans
for these methods as they are doing heap scan?


I don't follow this one tbh.



c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case a lot of tuples are not visible, shouldn't it be done in the reverse
way (first visibility check, then call the function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.


I don't think so, the way bernoulli works is that it returns every tuple 
with equal probability, so the visible tuples have same chance of being 
returned as the invisible ones so the issue should be smoothed away 
automatically (IMHO).


The acquire_sample_rows has limit on number of rows it returns so that's 
why it has to do the visibility check before as the problem you 
described applies there.


The reason for using the probability instead of tuple limit is the fact 
that there is no way to accurately guess number of tuples in the table 
at the beginning of scan. This method should actually be better at 
returning the correct number of tuples without dependence on how many of 
them are visible or not and how much space in blocks is used.




5.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id

   1
   3
   4
   7
   8
   9
(6 rows)


postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
  id

   0
   1
   2
   3
   4
   5
   6
   7
   8
   9
(10 rows)

So the above test yielded 60% of rows the first time and 100% of rows the next time,
when the test had requested 80%.
I understand that the sample percentage will result in an approximate
number of rows, however I just wanted that we should check whether the
implementation has any problem or not.


I think this is caused by random generator not producing smooth enough 
random 

Re: [HACKERS] TABLESAMPLE patch

2015-01-17 Thread Amit Kapila
On Sun, Jan 11, 2015 at 1:29 AM, Petr Jelinek p...@2ndquadrant.com wrote:


 In second patch which implements the TABLESAMPLE itself I changed the
implementation of random generator because when I looked at the code again
I realized the old one would produce wrong results if there were multiple
TABLESAMPLE statements in same query or multiple cursors in same
transaction.


I have looked into this patch and would like to share my
findings with you.

1.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /*
+ * There is only one plan to consider but we still need to set
+ * parameters for RelOptInfo.
+ */
+ set_cheapest(rel);
}

It seems we already call set_cheapest(rel) in set_rel_pathlist()
which is a caller of set_tablesample_rel_pathlist(), so why do
we need it inside set_tablesample_rel_pathlist()?

2.
+static void
+set_tablesample_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte)
{
..
+ /* We only do sample scan if it was requested */
+ add_path(rel, (Path *) create_samplescan_path(root, rel, required_outer));
}

Do we need to add_path, if there is only one path?

3.
@@ -332,6 +334,11 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
  /* Foreign table */
  set_foreign_pathlist(root, rel, rte);
  }
+ else if (rte->tablesample != NULL)
+ {
+ /* Build sample scan on relation */
+ set_tablesample_rel_pathlist(root, rel, rte);
+ }

Have you consider to have a different RTE for this?

4.
SampleNext()
a.
Isn't it better to code SampleNext() similar to SeqNext() and
IndexNext(), which encapsulate the logic to get the tuple in
another function(tbs_next() or something like that)?

b.
Also don't we want to handle pruning of page while
scanning (heap_page_prune_opt()) and I observed
in heap scan API's after visibility check we do check
for serializable conflict (CheckForSerializableConflictOut()).
Another thing is don't we want to do anything for sync scans
for these methods as they are doing heap scan?

c.
for bernoulli method, it will first get the tupoffset with
the help of function and then apply visibility check, it seems
that could reduce the sample set way lower than expected
in case a lot of tuples are not visible, shouldn't it be done in the reverse
way (first visibility check, then call the function to see if we want to
include it in the sample)?
I think something similar is done in acquire_sample_rows which
seems right to me.

5.

CREATE TABLE test_tablesample (id INT, name text) WITH (fillfactor=10);
INSERT INTO test_tablesample SELECT i, repeat(i::text, 200) FROM
generate_series(0, 9) s(i) ORDER BY i;

postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
 id

  1
  3
  4
  7
  8
  9
(6 rows)


postgres=# SELECT id FROM test_tablesample TABLESAMPLE BERNOULLI (80);
 id

  0
  1
  2
  3
  4
  5
  6
  7
  8
  9
(10 rows)

So the above test yielded 60% of rows the first time and 100% of rows the next time,
when the test had requested 80%.
I understand that the sample percentage will result in an approximate
number of rows, however I just wanted that we should check whether the
implementation has any problem or not.

In this regard, I have one question related to below code:

+Datum
+tsm_bernoulli_nexttuple(PG_FUNCTION_ARGS)
+{
..
+ /* Every tuple has percent chance of being returned */
+ while (sampler_random_fract(sampler->randstate) > samplesize)
+ {
+ tupoffset++;
+
+ if (tupoffset > maxoffset)
+ break;
+ }

Why are we not considering the tuple in the above code
if tupoffset is less than maxoffset?

6.
One larger question about the approach used in patch, why do you
think it is better to have a new node (SampleScan/SamplePath) like
you have used in patch instead of doing it as part of Sequence Scan.
I agree that we need some changes in different parts of Sequence Scan
execution, but still sounds like a viable way.  One simple thing that
occurs to me is that in ExecSeqScan(), we can use something like
SampleSeqNext instead of SeqNext to scan heap in a slightly different
way, probably doing it as part of sequence scan will have advantage that
we can use most of the existing infrastructure (sequence node path)
and have less discrepancies as mentioned in point-4.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TABLESAMPLE patch

2015-01-09 Thread Michael Paquier
On Fri, Jan 9, 2015 at 1:10 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 On 06/01/15 14:22, Petr Jelinek wrote:

 On 06/01/15 08:51, Michael Paquier wrote:

 On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com
 wrote:

 Attached is v3 which besides the fixes mentioned above also includes
 changes
 discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD),
 fixes for
 crash with FETCH FIRST and is rebased against current master.

 This patch needs a rebase; there is a small conflict in
 parallel_schedule.


 Sigh, I really wish we had automation that checks this for
 patches in the CF.


 Here is rebase against current master.
Thanks!


 Structurally speaking, I think that the tsm methods should be added in
 src/backend/utils and not src/backend/access which is more high-level
 as tsm_bernoulli.c and tsm_system.c contain only a set of new


 I am not sure if I parsed this correctly; do you mean to say that only
 low-level access functions belong in src/backend/access? Makes sense.


 Made this change.


 procedure functions. Having a single header file tsm.h would be also a
 good thing.



 Moved into single tablesample.h file.


 Regarding the naming, is tsm (table sample method) really appealing?
 Wouldn't it be better to use simply tablesample_* for the file names
 and the method names?


 I created src/backend/tablesample, and the files are named just system.c and
 bernoulli.c, but I kept the tsm_ prefix for the methods, as they get too long
 for my taste when prefixed with tablesample_.

 This is a large patch... Wouldn't sampling.[c|h] extracted from
 ANALYZE live better as a refactoring patch? This would somewhat limit
 bug occurrences in the main patch.


 That's a good idea, I'll split it into a patch series.


 I've split the sampling.c/h into a separate patch.

 I also wrote basic CREATE/DROP TABLESAMPLE METHOD support, again as a separate
 patch in the attached patch set. This also includes a modules test with a
 simple custom tablesample method.

 There are some very minor cleanups in the main tablesample code itself but
 no functional changes.

Some comments about the 1st patch:
1) Nitpicking:
+ *   Block sampling routines shared by ANALYZE and TABLESAMPLE.
TABLESAMPLE is not added yet. You may as well simplify this
description with something like "Sampling routines for relation
blocks."

2) file_fdw and postgres_fdw do not compile correctly as they still
use anl_random_fract. This function is still mentioned in vacuum.h as
well.

3) Not really an issue of this patch, but I'd think that this comment
should be reworked:
+ * BlockSampler is used for stage one of our new two-stage tuple
+ * sampling mechanism as discussed on pgsql-hackers 2004-04-02 (subject
+ * "Large DB").  It selects a random sample of samplesize blocks out of
+ * the nblocks blocks in the table.  If the table has less than
+ * samplesize blocks, all blocks are selected.

4) As a refactoring patch, why is the function providing a random
value changed? Shouldn't sample_random_fract be consistent with
anl_random_fract?

5) The seed numbers can be changed to RAND48_SEED_0, RAND48_SEED_1 and
RAND48_SEED_2 instead of being hardcoded?
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2015-01-05 Thread Michael Paquier
On Tue, Dec 23, 2014 at 5:21 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Attached is v3 which besides the fixes mentioned above also includes changes
 discussed with Tomas (except the CREATE/DROP TABLESAMPLE METHOD), fixes for
 crash with FETCH FIRST and is rebased against current master.
This patch needs a rebase; there is a small conflict in parallel_schedule.

Structurally speaking, I think that the tsm methods should be added in
src/backend/utils and not src/backend/access which is more high-level
as tsm_bernoulli.c and tsm_system.c contain only a set of new
procedure functions. Having a single header file tsm.h would be also a
good thing.

Regarding the naming, is tsm (table sample method) really appealing?
Wouldn't it be better to use simply tablesample_* for the file names
and the method names?

This is a large patch... Wouldn't sampling.[c|h] extracted from
ANALYZE live better as a refactoring patch? This would somewhat limit
bug occurrences in the main patch.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 21/12/14 18:38, Tomas Vondra wrote:

Hi,

On 18.12.2014 13:14, Petr Jelinek wrote:

Hi,

v2 version of this patch is attached.


I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:


Thanks for looking at it!



(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:

 https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation

 We should either update it or mark it as obsolete I guess. Also,
 I'd like to know what's the status regarding the TODO items
 mentioned there. Are those still valid with this patch?


I will have to look in more detail over the holidays and update it, but 
the general info about table sampling there applies and will apply to 
any patch trying to implement it.


A quick look at the mail thread suggests that at least the concerns 
mentioned in the mailing list should not apply to this implementation.
And looking at the patch, the problem with BERNOULLI sampling should not 
exist either, as I use a completely different implementation for that.


I do also have some issues with joins, which I plan to look at, but that's 
a different matter (my optimizer code overestimates the number of rows).



(1) The patch adds a new catalog, but does not bump CATVERSION.



I thought this was always done by committer?


(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
 as it squishes everything into a single chunk. That's inconsistent
 with naming of the other catalogs. I think pg_table_sample_method
 would be better.


Fair point, but perhaps pg_tablesample_method then, as tablesample is 
used as a single word everywhere, including the standard.




(3) There are a few more strange naming decisions, but that's mostly
 because the SQL standard requires that naming. I mean SYSTEM and
 BERNOULLI method names, and also the fact that the probability is
 specified as 0-100 value, which is inconsistent with other places
 (e.g. percentile_cont uses the usual 0-1 probability notion). But
 I don't think this can be fixed, that's what the standard says.


Yeah, I don't exactly love that either but what standard says, standard 
says.




(4) I noticed there's an interesting extension in SQL Server, which
 allows specifying PERCENT or ROWS, so you can say

   SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

 or

   SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

 That seems handy, and it'd make migration from SQL Server easier.
 What do you think?


Well, doing it exactly this way somewhat kills the extensibility which 
was one of the main goals for me - I allow any kind of parameters for 
sampling, and the handling of those depends solely on the sampling 
method. This means that in my approach, if you wanted to change what you 
are limiting, you'd have to write a new sampling method and the query would 
then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500); 
or some such (depending on how you name the sampling method). Or SELECT 
* FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend 
the SYSTEM sampling method; that would also be doable.


The reason for this is that I don't really want to limit too much what 
parameters can be sent to sampling (I envision even a SYSTEM_TIMED 
sampling method that will get its limit as a time interval, for example).
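For illustration, such a method might then be invoked like this (purely 
hypothetical - nothing in the patch defines SYSTEM_TIMED or its argument 
type, and the table name is made up):

SELECT * FROM some_big_table TABLESAMPLE SYSTEM_TIMED (INTERVAL '5 seconds');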




(5) I envision a lot of confusion because of the REPEATABLE clause.
 With READ COMMITTED, it's not really repeatable because of changes
 done by the other users (and maybe things like autovacuum). Shall
 we mention this in the documentation?


Yes, the docs need improvement and this should be mentioned, also the 
code docs - I found a few places which I forgot to update when changing 
the code and which now have misleading comments.
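A minimal illustration of the caveat for the docs (a sketch, assuming 
the test_tablesample table from the regression tests):

SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (42);
SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (42);
-- Same seed, so the same sample - but only as long as nothing has
-- modified the table in between (concurrent writers, autovacuum, ...);
-- under READ COMMITTED each statement sees the table as it is then.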




(6) This seems slightly wrong, because of long/uint32 mismatch:

   long seed = PG_GETARG_UINT32(1);

 I think uint32 would be more appropriate, no?


Probably, although I need long later in the algorithm anyway.



(7) NITPICKING: I think a 'sample_rate' would be a better name here:

   double percent = sampler->percent;


Ok.



(8) NITPICKING: InitSamplingMethod contains a command with ';;'

   fcinfo.arg[i] = (Datum) 0;;


Yeah :)



(9) The current regression tests only use the REPEATABLE cases. I
 understand queries without this clause are RANDOM, but maybe we
 could do something like this:

SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
  SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
) foo;

  Granted, there's still a small probability of false positive, but
  maybe that's sufficiently small? Or is the amount of code this
  tests negligible?


Well, depending on fillfactor and the limit it could be made quite reliable, 
I think. I also want to add a test with a VIEW (I think v2 has a bug there) 
and perhaps some JOIN.
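Something along these lines, perhaps (just a sketch - the view name and 
the bounds are made up, and the bounds would have to be tuned to the 
test table's size):

CREATE VIEW test_tablesample_v AS
    SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50) REPEATABLE (0);

-- sampling through a view
SELECT count(*) BETWEEN 5000 AND 7000 FROM test_tablesample_v;

-- and under a join
SELECT count(*) BETWEEN 5000 AND 7000
  FROM test_tablesample t
  JOIN test_tablesample_v v USING (id);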




(10) In the initial patch you mentioned it's possible to write custom
 sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
 allowing custom methods implemented as extensions would be useful?


Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since 
that's just simple mechanical work with no hard problems to solve there 
I can add it later once we have agreement on the general approach of the 
patch (especially in the terms of extensibility).

Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Tomas Vondra
On 22.12.2014 10:07, Petr Jelinek wrote:
 On 21/12/14 18:38, Tomas Vondra wrote:

 (1) The patch adds a new catalog, but does not bump CATVERSION.

 
 I thought this was always done by committer?

Right. Sorry for the noise.

 
 (2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
  as it squishes everything into a single chunk. That's inconsistent
  with naming of the other catalogs. I think pg_table_sample_method
  would be better.
 
 Fair point, but perhaps pg_tablesample_method then, as tablesample is 
 used as a single word everywhere, including the standard.

Sounds good.

 (4) I noticed there's an interesting extension in SQL Server, which
  allows specifying PERCENT or ROWS, so you can say

SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

  or

SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

  That seems handy, and it'd make migration from SQL Server easier.
  What do you think?
 
 Well, doing it exactly this way somewhat kills the extensibility which
 was one of the main goals for me - I allow any kind of parameters for
 sampling, and the handling of those depends solely on the sampling
 method. This means that in my approach, if you wanted to change what you
 are limiting, you'd have to write a new sampling method and the query would
 then look like SELECT * FROM table TABLESAMPLE SYSTEM_ROWLIMIT (2500);
 or some such (depending on how you name the sampling method). Or SELECT
 * FROM table TABLESAMPLE SYSTEM (2500, 'ROWS'); if we chose to extend
 the SYSTEM sampling method; that would also be doable.
 
 The reason for this is that I don't really want to limit too much what
 parameters can be sent to sampling (I envision even a SYSTEM_TIMED
 sampling method that will get its limit as a time interval, for example).

Good point.


 (6) This seems slightly wrong, because of long/uint32 mismatch:

long seed = PG_GETARG_UINT32(1);

  I think uint32 would be more appropriate, no?
 
 Probably, although I need long later in the algorithm anyway.

Really? ISTM the sampler_setseed() does not really require long, uint32
would work exactly the same.


 (9) The current regression tests only use the REPEATABLE cases. I
  understand queries without this clause are RANDOM, but maybe we
  could do something like this:

 SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
   SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
 ) foo;

   Granted, there's still a small probability of false positive, but
   maybe that's sufficiently small? Or is the amount of code this
   tests negligible?
 
 Well, depending on fillfactor and the limit it could be made quite reliable,
 I think. I also want to add a test with a VIEW (I think v2 has a bug there)
 and perhaps some JOIN.

OK.


 (10) In the initial patch you mentioned it's possible to write custom
   sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
   allowing custom methods implemented as extensions would be useful?

 
 Yes, I think so, CREATE/DROP TABLESAMPLE METHOD is on my TODO, but since
 that's just simple mechanical work with no hard problems to solve there
 I can add it later once we have agreement on the general approach of the
 patch (especially in the terms of extensibility).

OK, good to know.

Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Jaime Casanova
On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote:
 Hi,

 v2 version of this patch is attached.


A few more tests revealed that passing NULL as the sample size
argument works, and it shouldn't.
For REPEATABLE it does give an error if I use NULL as the argument, but it
is a syntax error, and it should be a data exception (data exception --
invalid repeat argument in a sample clause) according to the standard.

Also, you need to add CHECK_FOR_INTERRUPTS somewhere; I tried with a
big table and had to wait a long time for it to finish.


regression=# select count(1) from tenk1 tablesample system (null);
 count
-------
28
(1 row)

regression=# select count(1) from tenk1 tablesample bernoulli (null);
 count
-------
 0
(1 row)
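
For comparison, a conforming response would presumably look something
like this (the exact message wording is a guess; the standard only fixes
the condition names, invalid sample size and invalid repeat argument in
a sample clause):

regression=# select count(1) from tenk1 tablesample bernoulli (null);
ERROR:  invalid sample size
regression=# select count(1) from tenk1 tablesample system (50) repeatable (null);
ERROR:  invalid repeat argument in a sample clause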


-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-22 Thread Petr Jelinek

On 22/12/14 20:14, Jaime Casanova wrote:

On Thu, Dec 18, 2014 at 7:14 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

v2 version of this patch is attached.



A few more tests revealed that passing NULL as the sample size
argument works, and it shouldn't.


Fixed.


For REPEATABLE it does give an error if I use NULL as the argument, but it
is a syntax error, and it should be a data exception (data exception --
invalid repeat argument in a sample clause) according to the standard.


Also fixed.



Also, you need to add CHECK_FOR_INTERRUPTS somewhere; I tried with a
big table and had to wait a long time for it to finish.


Ah yeah, I can't rely on CHECK_FOR_INTERRUPTS in ExecScan because it 
might take a while to fetch a row if the percentage is very small and 
the table is big... Fixed.
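For the record, the kind of case that made the missing check visible can 
be reproduced with something like this (table name and sizes are 
arbitrary):

CREATE TABLE bigtab AS
    SELECT g AS id FROM generate_series(1, 10000000) g;

-- With a very small percentage it can take a long time to produce even
-- one row, so the sampling loop itself must call CHECK_FOR_INTERRUPTS
-- for the query to stay cancellable.
SELECT count(*) FROM bigtab TABLESAMPLE BERNOULLI (0.0001);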



Attached is v3 which besides the fixes mentioned above also includes 
changes discussed with Tomas (except the CREATE/DROP TABLESAMPLE 
METHOD), fixes for crash with FETCH FIRST and is rebased against current 
master.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 01d24a5..250ae29 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -49,7 +49,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
 
 <phrase>where <replaceable class="parameter">from_item</replaceable> can be one of:</phrase>
 
-[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
+[ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ] [ TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ] ] [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] ( <replaceable class="parameter">select</replaceable> ) [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ]
 <replaceable class="parameter">with_query_name</replaceable> [ [ AS ] <replaceable class="parameter">alias</replaceable> [ ( <replaceable class="parameter">column_alias</replaceable> [, ...] ) ] ]
 [ LATERAL ] <replaceable class="parameter">function_name</replaceable> ( [ <replaceable class="parameter">argument</replaceable> [, ...] ] )
@@ -317,6 +317,38 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
  </varlistentry>
 
  <varlistentry>
+  <term>TABLESAMPLE <replaceable class="parameter">sampling_method</replaceable> ( <replaceable class="parameter">argument</replaceable> [, ...] ) [ REPEATABLE ( <replaceable class="parameter">seed</replaceable> ) ]</term>
+  <listitem>
+   <para>
+Table sample clause after
+<replaceable class="parameter">table_name</replaceable> indicates that
+a <replaceable class="parameter">sampling_method</replaceable> should
+be used to retrieve subset of rows in the table.
+The <replaceable class="parameter">sampling_method</replaceable> can be
+one of:
+<itemizedlist>
+ <listitem>
+  <para><literal>SYSTEM</literal></para>
+ </listitem>
+ <listitem>
+  <para><literal>BERNOULLI</literal></para>
+ </listitem>
+</itemizedlist>
+Both of those sampling methods currently accept only single argument
+which is the percent (floating point from 0 to 100) of the rows to
+be returned.
+The <literal>SYSTEM</literal> sampling method does block level
+sampling with each block having same chance of being selected and
+returns all rows from each selected block.
+The <literal>BERNOULLI</literal> scans whole table and returns
+individual rows with equal probability.
+The optional numeric parameter <literal>REPEATABLE</literal> is used
+as random seed for sampling.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><replaceable class="parameter">alias</replaceable></term>
   <listitem>
    <para>
diff --git a/src/backend/access/Makefile b/src/backend/access/Makefile
index 21721b4..595737c 100644
--- a/src/backend/access/Makefile
+++ b/src/backend/access/Makefile
@@ -8,6 +8,7 @@ subdir = src/backend/access
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist transam
+SUBDIRS	= brin common gin gist hash heap index nbtree rmgrdesc spgist \
+			  transam tsm
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/tsm/Makefile b/src/backend/access/tsm/Makefile
new file mode 100644
index 000..73bbbd7
--- /dev/null
+++ b/src/backend/access/tsm/Makefile
@@ -0,0 +1,17 @@

Re: [HACKERS] TABLESAMPLE patch

2014-12-21 Thread Tomas Vondra
Hi,

On 18.12.2014 13:14, Petr Jelinek wrote:
 Hi,
 
 v2 version of this patch is attached.

I did a review of this v2 patch today. I plan to do a bit more testing,
but these are my comments/questions so far:

(0) There's a TABLESAMPLE page at the wiki, not updated since 2012:

https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation

We should either update it or mark it as obsolete I guess. Also,
I'd like to know what's the status regarding the TODO items
mentioned there. Are those still valid with this patch?

(1) The patch adds a new catalog, but does not bump CATVERSION.

(2) The catalog naming (pg_tablesamplemethod) seems a bit awkward,
as it squishes everything into a single chunk. That's inconsistent
with naming of the other catalogs. I think pg_table_sample_method
would be better.

(3) There are a few more strange naming decisions, but that's mostly
because the SQL standard requires that naming. I mean SYSTEM and
BERNOULLI method names, and also the fact that the probability is
specified as 0-100 value, which is inconsistent with other places
(e.g. percentile_cont uses the usual 0-1 probability notion). But
I don't think this can be fixed, that's what the standard says.

(4) I noticed there's an interesting extension in SQL Server, which
allows specifying PERCENT or ROWS, so you can say

  SELECT * FROM table TABLESAMPLE SYSTEM (25 PERCENT);

or

  SELECT * FROM table TABLESAMPLE SYSTEM (2500 ROWS);

That seems handy, and it'd make migration from SQL Server easier.
What do you think?

(5) I envision a lot of confusion because of the REPEATABLE clause.
With READ COMMITTED, it's not really repeatable because of changes
done by the other users (and maybe things like autovacuum). Shall
we mention this in the documentation?

(6) This seems slightly wrong, because of long/uint32 mismatch:

  long seed = PG_GETARG_UINT32(1);

I think uint32 would be more appropriate, no?

(7) NITPICKING: I think a 'sample_rate' would be a better name here:

  double percent = sampler->percent;

(8) NITPICKING: InitSamplingMethod contains a command with ';;'

  fcinfo.arg[i] = (Datum) 0;;

(9) The current regression tests only use the REPEATABLE cases. I
understand queries without this clause are RANDOM, but maybe we
could do something like this:

   SELECT COUNT(*) BETWEEN 5000 AND 7000 FROM (
 SELECT id FROM test_tablesample TABLESAMPLE SYSTEM (50)
   ) foo;

 Granted, there's still a small probability of false positive, but
 maybe that's sufficiently small? Or is the amount of code this
 tests negligible?

(10) In the initial patch you mentioned it's possible to write custom
 sampling methods. Do you think a CREATE TABLESAMPLE METHOD,
 allowing custom methods implemented as extensions would be useful?

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TABLESAMPLE patch

2014-12-21 Thread Michael Paquier
On Mon, Dec 22, 2014 at 2:38 AM, Tomas Vondra t...@fuzzy.cz wrote:
 (1) The patch adds a new catalog, but does not bump CATVERSION.
FWIW, this part is managed by the committer when this patch is picked up.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >