Re: [HACKERS] KNN-GiST with recheck
On Tue, Dec 16, 2014 at 4:37 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> Patch attached. It should be applied on top of my pairing heap patch at http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some caveats:
>
> * The signature of the distance function is unchanged, it doesn't get a recheck argument. It is just assumed that if the consistent function sets the recheck flag, then the distance needs to be rechecked as well. We might want to add the recheck argument, like you Alexander did in your patch, but it's not important right now.

I didn't get how that is expected to work when we have only an ORDER BY qual without a filter qual. In that case the consistent function isn't called at all.

> * I used the distance term in the executor, although the ORDER BY expr machinery is more general than that. The value returned by the ORDER BY expression doesn't have to be a distance, although that's the only thing supported by GiST and the built-in opclasses.
>
> * I short-circuited the planner to assume that the ORDER BY expression always returns a float. That's true today for knn-GiST, but is obviously a bogus assumption in general.
>
> This needs some work to get into a committable state, but from a modularity point of view, this is much better than having the indexam peek into the heap.

Nice idea to put the reordering into the index scan node. It doesn't look like over-engineering to me. I'm going to bring it to a more committable state.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Commit fest 2014-12, let's begin!
On Mon, Dec 15, 2014 at 4:12 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 12/15/2014 08:37 AM, Michael Paquier wrote:
>> On Mon, Dec 15, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Michael Paquier michael.paqu...@gmail.com writes:
>>>> - Point to polygon distance operator
>>> I looked at that briefly during the last fest, but was unsure whether it was too entangled with the GiST patches that Heikki was looking at.
>> Recalling my memories of this morning, things are rather independent.
>
> Right. I also looked at it briefly, but I wasn't sure if we really want it. AFAICT, no-one has actually asked for that operator; it was written only to be an example of an operator that would benefit from the knn-gist with recheck patch. If there is some other, real, use for the knn-gist with recheck patch, then I'm OK with that, but otherwise it's dubious to add an operator just so that it can then be made faster by another patch. That said, it seems quite harmless, so we might as well commit it.

Lack of recheck is a major limitation of KNN-GiST now. People are not asking for it because they don't know that it is what's needed to implement exact KNN for PostGIS. For now they have to invent kludges like this:

WITH closest_candidates AS (
  SELECT
    streets.gid,
    streets.name,
    streets.geom
  FROM
    nyc_streets streets
  ORDER BY
    streets.geom <->
    'SRID=26918;POINT(583571.905921312 4506714.34119218)'::geometry
  LIMIT 100
)
SELECT gid, name
FROM closest_candidates
ORDER BY
  ST_Distance(
    geom,
    'SRID=26918;POINT(583571.905921312 4506714.34119218)'::geometry
  )
LIMIT 1;

See blog posts:
http://blog.light42.com/wordpress/?p=102
http://workshops.boundlessgeo.com/postgis-intro/knn.html

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Commit fest 2014-12, let's begin!
On Mon, Dec 15, 2014 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Alexander Korotkov aekorot...@gmail.com writes:
>> On Mon, Dec 15, 2014 at 4:12 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>> Right. I also looked at it briefly, but I wasn't sure if we really want it. AFAICT, no-one has actually asked for that operator, it was written only to be an example of an operator that would benefit from the knn-gist with recheck patch.
>> Lack of recheck is a major limitation of KNN-GiST now. People are not asking for it because they don't know that it is what's needed to implement exact KNN for PostGIS. For now they have to invent kludges like this: [ query using ORDER BY ST_Distance ]
>
> It's not apparent to me that the proposed operator is a replacement for ST_Distance. The underlying data in an example like this won't be either points or polygons, it'll be PostGIS datatypes. In short, I believe that PostGIS could use what you're talking about, but I agree with Heikki's objection that nobody has asked for this particular operator.

polygon <-> point is for sure not an ST_Distance replacement. I was making that argument about KNN-GiST with recheck itself. polygon <-> point is needed just as an in-core example of KNN-GiST with recheck.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Commit fest 2014-12, let's begin!
On Mon, Dec 15, 2014 at 7:47 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 12/15/2014 05:22 PM, Alexander Korotkov wrote:
>> On Mon, Dec 15, 2014 at 6:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> Alexander Korotkov aekorot...@gmail.com writes:
>>>> On Mon, Dec 15, 2014 at 4:12 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
>>>>> Right. I also looked at it briefly, but I wasn't sure if we really want it. AFAICT, no-one has actually asked for that operator, it was written only to be an example of an operator that would benefit from the knn-gist with recheck patch.
>>>> Lack of recheck is a major limitation of KNN-GiST now. People are not asking for it because they don't know that it is what's needed to implement exact KNN for PostGIS. For now they have to invent kludges like this: [ query using ORDER BY ST_Distance ]
>>> It's not apparent to me that the proposed operator is a replacement for ST_Distance. The underlying data in an example like this won't be either points or polygons, it'll be PostGIS datatypes. In short, I believe that PostGIS could use what you're talking about, but I agree with Heikki's objection that nobody has asked for this particular operator.
>> polygon <-> point is for sure not an ST_Distance replacement. I was making that argument about KNN-GiST with recheck itself. polygon <-> point is needed just as an in-core example of KNN-GiST with recheck.
>
> Right. I don't think point <-> polygon is too useful by itself, but we need an example in core that can make use of the KNN-GiST recheck patch. We can't write a regression test for it otherwise, for starters. Actually, we probably could've used circle <-> polygon for that just as well...

Did you mean searching for circles or polygons in the last sentence?

--
With best regards,
Alexander Korotkov.
[HACKERS] Report search_path value back to the client.
Hi,

As of now postgres reports only really important variables, but among them there are also not so important ones, like 'application_name'. The only reason to report it was to help pgbouncer, and perhaps other connection poolers. The change was introduced by commit 59ed94ad0c9f74a3f057f359316c845cedc4461e.

This fact makes me wonder why the 'search_path' value is not reported back to the client. The use case is absolutely the same as with 'application_name', but a little bit more important. Our databases provide different versions of stored procedures, which are located in different schemas: 'api_version1', 'api_version2', 'api_version5', etc... When an application establishes a connection to the database, it sets the search_path value, for example, to api_version1. At the same time, a new version of the same application will set the search_path value to api_version2. Sometimes we have hundreds of instances of applications which may use different versions of stored procedures located in different schemas. It's really crazy to keep so many (hundreds of) connections to the database, and it would be much better to have something like pgbouncer in front of postgres. Right now that's not possible, because pgbouncer is not aware of search_path, and it's not really possible to fix pgbouncer, because there is no easy way to get the current value of search_path.

I would like to mark 'search_path' as GUC_REPORT:

--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2904,7 +2904,7 @@ static struct config_string ConfigureNamesString[] =
 		{"search_path", PGC_USERSET, CLIENT_CONN_STATEMENT,
 			gettext_noop("Sets the schema search order for names that are not schema-qualified."),
 			NULL,
-			GUC_LIST_INPUT | GUC_LIST_QUOTE
+			GUC_LIST_INPUT | GUC_LIST_QUOTE | GUC_REPORT
 		},
 		&namespace_search_path,
 		"\"$user\",public",

What do you think?

Regards,
--
Alexander Kukushkin
Re: [HACKERS] Fillfactor for GIN indexes
Hi!

On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier michael.paqu...@gmail.com wrote:

> Please find attached a simple patch adding fillfactor as storage parameter for GIN indexes. The default value is the same as the one currently aka 100 to have the pages completely packed when a GIN index is created.

That's not true. Let us discuss it a little bit.

In the btree access method the index is created by sorting index tuples. Once they are sorted, it can write a tree with any fullness of the pages. With completely filled pages you would get a flood of index page splits on table updates or inserts. In order to avoid that, the default fillfactor is 90. Besides index creation, the btree access method tries to follow the given fillfactor when inserting ordered data: when the rightmost page splits, fillfactor percent of the data goes to the left page. The btree page life cycle runs between being freshly branched by a split (50% filled) and being fully packed (100% filled) and then split again. Thus we can roughly estimate the average fullness of a btree page to be 75%. This equilibrium state is reached after a large number of inserts into the btree index.

Fillfactor was then spread to other access methods, for instance GiST. In GiST the tree is always constructed by page splits, so a freshly created GiST index is already in its equilibrium state. Moreover, GiST doesn't split pages in halves: the split ratio depends on the opclass, so the equilibrium fullness of a particular GiST index is even less than 75%. What does fillfactor do with GiST? It makes GiST pages split at fillfactor fullness during index build, so GiST pages are even less full. But why? You don't get a flood of splits on a fresh GiST index anyway, because it's already in equilibrium. That's why I would rather remove fillfactor from GiST until we have some algorithm to construct GiST without page splits.

What is going on with GIN? GIN is, basically, a two-level nested btree, so fillfactor should be applicable here. But GIN is not constructed like btree...
GIN accumulates maintenance_work_mem of data and then inserts it into the tree. So a fresh GIN index is the result of inserting maintenance_work_mem-sized buckets, each of which is sorted. On a small index (or with a large maintenance_work_mem) there would be only one bucket.

GIN has two levels of btree: the entry tree and the posting tree. Currently the entry tree has no special logic to control fullness during index build. So with only one bucket the entry tree will be 50% filled; with many buckets it will be at the equilibrium state; and in some specific cases (about two buckets) it can be almost fully packed. Insertion into the posting tree is always ordered during index build, because the posting tree is a tree of TIDs. When a posting tree page splits, it leaves the left page fully packed during index build, and 75% packed during normal insertion (because it expects some sequential inserts).

The idea of the GIN build algorithm is that the entry tree is expected to be relatively small and to fit in cache, while insertions into the posting trees are ordered. Thus the algorithm should be IO-efficient and have low overhead. However, with a large entry tree this algorithm can perform much worse.

My summary is the following:
1) In order to have fully correct support of fillfactor in GIN, we would need to rewrite the GIN build algorithm.
2) Without rewriting the GIN build algorithm, not much can be done with the entry tree. However, you could implement some heuristics.
3) You definitely need to touch the code that selects the split ratio in dataPlaceToPageLeaf (starting with "if (!btree->isBuild)").
4) GIN data pages are always compressed, except in pg_upgraded indexes from pre-9.4. Take care of that in the following code:

 	if (GinPageIsCompressed(page))
 		freespace = GinDataLeafPageGetFreeSpace(page);
+	else if (btree->isBuild)
+		freespace = BLCKSZ * (100 - fillfactor) / 100;

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Sat, Nov 22, 2014 at 2:20 AM, Peter Geoghegan p...@heroku.com wrote:

> On Mon, Jan 13, 2014 at 9:17 AM, Alexander Korotkov aekorot...@gmail.com wrote:
>> This patch was split from the thread: http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com
>> I've split it into a separate thread because it's related to partial sort only conceptually, not technically. I also renamed it from partial-knn to knn-gist-recheck as a more appropriate name. In the attached version the docs are updated. A possible weak point of this patch's design is that it fetches the heap tuple from the GiST scan. However, I didn't receive any notes about its design, so I'm going to put it into the commitfest.
>
> The partial sort thing is not in the current 2014-10 commitfest (although this patch is). Is that intentional?

It's not. I just didn't revise partial sort yet :(

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] WIP: Access method extendability
Hi, Heikki!

Thank you for summarizing. In general, I agree with your notes, with some exceptions.

On Mon, Nov 24, 2014 at 1:52 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

> On 11/10/2014 10:30 PM, Alexander Korotkov wrote:
>> Not allowing a CREATE ACCESS METHOD command seems problematic to me. How could it work with pg_upgrade? pg_dump wouldn't dump the extra pg_am records. So, pg_upgrade would break when creating operator classes on the new cluster. So, I agree with dropping the CREATE ACCESS METHOD command only if we let pg_dump dump the extra pg_am records...
>
> pg_dump would dump the CREATE EXTENSION command, and the extension's installation script inserts the row into pg_am. pg_dump doesn't dump objects that are part of an extension, so that's how this would work with the CREATE ACCESS METHOD command, too.

In binary upgrade mode pg_dump has to guarantee that all database objects keep the same OIDs. That's why in binary upgrade mode pg_dump dumps the extension contents instead of just a CREATE EXTENSION command.

> Backtracking a bit, to summarize the discussion so far:
>
> * It would be nice to have indexes that are not WAL-logged, but are automatically re-created after a crash. But it's a completely different and orthogonal feature, so there's no need to discuss that further in this thread.
>
> * If an extension is buggy, it can crash and corrupt the whole database. There isn't really anything we can do about that, and this patch doesn't make that any better or worse.
>
> * CREATE ACCESS METHOD command isn't worth it.

Taking into account my previous note: how can custom extensions survive pg_upgrade without a CREATE ACCESS METHOD command?

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] WIP: Access method extendability
Hi!

Thanks everybody for the discussion, and sorry for the delay. I'll try to address most of the questions raised in this discussion. In general, I agree with the concept of marking an index as invalid in certain cases.

I see the following possible consequences of a buggy WAL-logged custom AM:
1) Server crash during insert/update.
2) Server crash during select.
3) Invalid query answers.
4) Server crash during vacuum.
5) Server crash in recovery.

#5 is totally unacceptable. I've tried to design the generic WAL record so that it should always be possible to purely mechanically apply the record: it's always possible to move a piece of memory inside the page, and it's always possible to copy a piece of memory from the WAL record to the page. Buggy WAL could for sure cause index corruption, as could any other bug in an AM; WAL support doesn't bring anything special to this except the fact that WAL is quite hard to debug. However, in the current implementation I missed some hidden asserts about the page structure. A page could be so corrupted that we can't, for instance, safely call XLogReadBufferForRedo(). All these cases must be worked out: if we can't update a page during recovery, the index must be marked as invalid. It's some amount of work, but it seems very feasible.

#4 seems problematic too. If autovacuum crashes on some index, then postgres can enter a crash loop. So, if autovacuum crashes on an extension-provided AM, that index should be marked as invalid.

Consequences #1, #2 and #3 could just as easily be caused by a buggy opclass. The reason why we're not knee-deep in extension-provided bugs in GiST/GIN opclasses is not that writing correct GiST/GIN opclasses is easy; the actual reason is that we have only a few GiST/GIN opclasses which weren't written by the GiST/GIN authors. So, I don't expect PostgreSQL to be flooded with buggy AMs once we add AM extendability.

Our motivation behind this proposal is that we want to be able to develop custom extensions with AMs.
We want to be able to provide our AMs to our customers without having to push them into PostgreSQL core or fork PostgreSQL. Bugs in those AMs are our responsibility to our customers. Some commercial companies could implement patented AMs (for instance, fractal trees), and that is their responsibility to their customers.

Also, I think it's OK to leave adapting custom AMs to changes in the AM interface to the authors of those custom AMs. AM extensions have to be adapted to each new major release anyway. AFAIR, the interface of the RelationOpen() function was changed not too long ago. A custom AM would use many functions which we use to access relations, and their interfaces can change in the next release.

The PostGIS GiST opclass has bugs which are reproducible, known and not fixed. That is PostGIS's responsibility to their customers. We can feel sorry for PostGIS that they are so slow at fixing them, but we shouldn't feel sorry for GiST extendability; that would be ridiculous.

Some researchers could write their own extensions. We can hope that they are smart enough not to recommend them for production use. We can back that hope with a warning while installing an extension-provided AM, saying that any corruption caused by an extension-provided AM is up to the AM developer. This warning could make users beware of using extension-provided AMs in production unless they fully trust the extension developer (and have a support subscription if it's commercial).

PostgreSQL doesn't have any kind of safe extensions: every extension must be trusted. Every extension can break (almost) everything. When one types CREATE EXTENSION, he must completely trust the extension author. This applies to every extension.

I would be very careful with discouraging commercial AM extensions. We should always keep in mind how many PostgreSQL developers work for companies which own their own commercial PostgreSQL forks, and how big their contribution is. Patented extensions look scary, for sure.
But that's up to software patents, not up to PostgreSQL extendability...

The particular steps I'm going to take on these patches:
1) Make generic_redo never crash on broken pages.
2) Make the autovacuum launcher mark an index as invalid if the vacuum process crashed on a custom AM index. Since we can't write anything into the postgres cluster when a process has crashed, it seems to me autovacuum should have some separate place to put this information. Then, after a restart, postgres can read it and mark the index as invalid.

Not allowing a CREATE ACCESS METHOD command seems problematic to me. How could it work with pg_upgrade? pg_dump wouldn't dump the extra pg_am records. So, pg_upgrade would break when creating operator classes on the new cluster. So, I agree with dropping the CREATE ACCESS METHOD command only if we let pg_dump dump the extra pg_am records...

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] WIP: Access method extendability
On Tue, Oct 28, 2014 at 8:04 PM, Simon Riggs si...@2ndquadrant.com wrote:

> On 28 October 2014 14:22, Simon Riggs si...@2ndquadrant.com wrote:
>> Or put it another way, it will be easier to write new index AMs because we'll be able to skip the WAL part until we know we want it.
>
> To be clear: I am suggesting you do *less* work, not more. By allowing AMs to avoid writing WAL we get
> * higher performance unlogged indexes
> * fewer bugs in the early days of new AMs
> * writers of new AMs are OK to avoid the majority of hard work and hard testing
> So overall, we get new AMs working faster because we can skip writing the WAL code until we are certain the new AM code is useful and bug free. For example, if GIN had avoided implementing WAL it would have been easier to change the on-disk representation.

The major problem with changing the on-disk representation is that we have to support both binary formats because of pg_upgrade. WAL makes this problem even heavier, because the WAL record redo functions also have to support both formats. However, that is quite independent of WAL. Having access methods as extensions can significantly improve the situation here. Imagine GIN were an extension, and one day we decide to change its binary format. Then we can issue a new extension, GIN2 for instance, and a user can upgrade from GIN to GIN2 in the following steps:

1. CREATE EXTENSION gin2;
2. CREATE INDEX CONCURRENTLY [new_index] USING gin2 ([index_def]);
3. DROP INDEX CONCURRENTLY [old_index];
4. DROP EXTENSION gin;

There is no need to write and debug code which reads both the old and the new binary format. For sure, we would need to support the old GIN extension for some time, but we have to support old in-core versions too. Unfortunately, I didn't mention this in the first post, although I consider it a serious argument for extensible AMs.

Also, I'm not sure that many users would have enough courage to use unlogged AMs. The absence of durability is only half the trouble; the other half is the lack of streaming replication.
I think that if we had only an unlogged GIN, then external indexing engines would be used by the majority of users instead of GIN.

--
With best regards,
Alexander Korotkov.
[HACKERS] WIP: Access method extendability
Hackers,

Postgres was initially designed to support access method extendability, and this extendability lives on to the present day. However, it is mostly internal in-core extendability: one can quite easily add a new access method to the PostgreSQL core. But if one tries to implement an access method as an external module, he will be faced with the following difficulties:

1. One needs to insert directly into pg_am, because there is no CREATE ACCESS METHOD command. And there is no support for dependencies between AMs, opclasses etc.
2. A module can't define xlog records, so a new AM would not be WAL-logged.

The first problem is purely mechanical: nothing prevents us from implementing CREATE ACCESS METHOD and DROP ACCESS METHOD commands and supporting all the required dependencies. The problem of WAL is a bit more complex. According to previous discussions, we don't want to let extensions declare their own xlog records: if we let them, then the recovery process would depend on extensions, which badly violates reliability. The solution is to implement a generic xlog record which is able to represent the difference between blocks in some general manner.

3 patches are attached:
1. CREATE/DROP ACCESS METHOD commands, with support for dependencies.
2. A new generic xlog record type.
3. A Bloom contrib module as an example of the usage of the previous two features. This module was posted a few years ago by Teodor Sigaev; now it's wrapped as an extension and WAL-logged.

The patches are in a WIP state: no documentation and very few comments. However, I believe that it's enough to do a general concept review.

Some notes about the generic xlog format. A generic xlog record represents the difference between pages as a set of operations of two kinds:
1. Move memory inside the page. An optional flag requests zeroing the gap at the previous memory location.
2. Copy a fragment of memory from the xlog record to the page. As an option, bitwise logical operations are supported as well as plain copy.

Generic xlog can open a page in two modes:
1. Create mode: the page is zeroed independent of its LSN.
2.
Update mode: the page is updated only if its LSN is lower than the record's LSN.

Usually, the xlog record is filled in a critical section where memory allocation is prohibited. Thus, the user has to initialize it beforehand, with knowledge of the page count, the total operation count and the total length of data.

--
With best regards,
Alexander Korotkov.

Attachments:
create-am.1.patch.gz (GNU Zip compressed data)
generic-xlog.1.patch.gz (GNU Zip compressed data)
bloom-contrib.1.patch.gz (GNU Zip compressed data)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] KNN-GiST with recheck
On Mon, Sep 29, 2014 at 6:16 AM, Bruce Momjian br...@momjian.us wrote:

> On Fri, Sep 26, 2014 at 10:49:42AM +0400, Alexander Korotkov wrote:
>>> Does this also fix the identical PostGIS problem or is there something PostGIS needs to do?
>> This patch provides general infrastructure for recheck in KNN-GiST. PostGIS needs a corresponding change in its GiST opclass. Since PostGIS already defines the <-> and <#> operators as distance to the bounding box border and to the bounding box center, it can't change their behaviour; it has to support a new exact-distance operator in its opclass.
>
> Ah, OK, so they just need something that can be used for the recheck. I think they currently use ST_Distance() for that. Does it have to be an operator? If they defined an operator for ST_Distance(), would ST_Distance() work too for KNN-GiST?

Currently, ST_Distance is a function, but it's no problem to make it an operator with KNN-GiST support.

> In summary, you still create a normal GiST index on the column:
>
> http://shisaa.jp/postset/postgis-postgresqls-spatial-partner-part-3.html
>
> CREATE INDEX planet_osm_line_ref_index ON planet_osm_line(ref);
>
> which indexes by the bounding box. The new code will allow ordered index hits to be filtered by something like ST_Distance(), rather than having to do a LIMIT 50 in a CTE, then call ST_Distance(), like this:
>
> EXPLAIN ANALYZE WITH distance AS (
>     SELECT way AS road, ref AS route
>     FROM planet_osm_line
>     WHERE highway = 'secondary'
>     ORDER BY ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913) <#> way
>     LIMIT 50
> )
> SELECT ST_Distance(ST_GeomFromText('POLYGON((14239931.42 3054117.72,14239990.49 3054224.25,14240230.15 3054091.38,14240171.08 3053984.84,14239931.42 3054117.72))', 900913), road) AS true_distance, route
> FROM distance
> ORDER BY true_distance
> LIMIT 1;

Yeah. In this query 50 is a purely empirical value; it could be either too low or too high. A too-low value can cause wrong query answers.
A too-high value can cause lower performance. With the patch, a simple KNN query will work like this query with an always-correct value in the LIMIT clause.

> Notice the CTE uses <#> (bounding box center), and then the outer query uses ST_Distance and LIMIT 1 to find the closest item. Excellent!

Thanks. The main question now is the design of this patch. Currently, it does all the work inside the access method. We already have some discussion of the pros and cons of this method, so I would like to clarify the alternatives now. I can see the following way:

1. Implement a new executor node which performs sorting using a priority queue; let's call it "Priority queue". I think it should be a separate node from the Sort node: despite Priority queue and Sort being essentially similar from the user's point of view, they would be completely different in implementation.
2. Implement some interface to transfer distance values from the access method to the Priority queue node.
3. Somehow tell the planner that it could use a Priority queue in the corresponding cases. I see two ways of doing this:
   - Add a flag to an operator in the opclass indicating that the index can only order by a lower bound of "col op value", not by "col op value" itself.
   - Define a new relation between operators: the value of one operator could be a lower bound for the value of another operator. Then the planner can insert a Priority queue node when lower-bound ordering is available from the index. An ALTER OPERATOR command would also be reasonable, so extensions could upgrade.

Besides the overhead, this way makes significant infrastructural changes, so it may be over-engineering. However, it's probably the cleaner and more beautiful solution. I would like to get some feedback from people familiar with KNN-GiST, like Heikki or Tom. What do you think about this? Any other ideas?

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Fri, Sep 26, 2014 at 5:18 AM, Bruce Momjian br...@momjian.us wrote:

> On Sun, Sep 14, 2014 at 11:34:26PM +0400, Alexander Korotkov wrote:
>> Cost estimation of GiST is a big problem anyway. It doesn't care (and can't care) about the amount of recheck for regular operators. In this patch, the same would hold for knn recheck. The problem is that touching the heap from the access
>
> This is very important work. While our existing KNN-GiST index code works fine for scalar values and point-to-point distance ordering, it doesn't work well for 2-dimensional objects because they are only indexed by their bounding boxes (a rectangle around the object). The indexed bounding box can't produce accurate distances to other objects. As an example, see this PostGIS blog post showing how to use LIMIT in a CTE to filter results and then compute the closest object (search for LIMIT 50):
>
> http://shisaa.jp/postset/postgis-postgresqls-spatial-partner-part-3.html
>
> This patch fixes our code for distances from a point to indexed 2-D objects. Does this also fix the identical PostGIS problem or is there something PostGIS needs to do?

This patch provides general infrastructure for recheck in KNN-GiST. PostGIS needs a corresponding change in its GiST opclass. Since PostGIS already defines the <-> and <#> operators as distance to the bounding box border and to the bounding box center, it can't change their behaviour; it has to support a new exact-distance operator in its opclass.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Thu, Sep 25, 2014 at 9:00 PM, Emre Hasegeli e...@hasegeli.com wrote:

>> Fixed, thanks.
>
> Here are my questions and comments about the code.
>
> doc/src/sgml/gist.sgml:812:
>
>     be rechecked from heap tuple before tuple is returned.  If
>     <literal>recheck</literal> flag isn't set then it's true by default for
>     compatibility reasons.  The <literal>recheck</literal> flag can be used only
>
> The recheck flag is set to false in gistget.c, so I think it should say false by default. On the other hand, it is true by default for the consistent function. It is written as "the safest assumption" in the code comments. I don't know why the safest was chosen over the backwards-compatible for the consistent function.

Agreed. It should be clarified in the docs.

> src/backend/access/gist/gistget.c:505:
>
>     /* Recheck distance from heap tuple if needed */
>     if (GISTSearchItemIsHeap(*item) &&
>         searchTreeItemNeedDistanceRecheck(scan, so->curTreeItem))
>     {
>         searchTreeItemDistanceRecheck(scan, so->curTreeItem, item);
>         continue;
>     }
>
> Why is so->curTreeItem passed to these functions? They can use scan->opaque->curTreeItem.

I didn't get the difference. A few lines before:

    GISTScanOpaque so = (GISTScanOpaque) scan->opaque;

> src/backend/access/gist/gistscan.c:49:
>
>     /*
>      * When all distance values are the same, items without recheck
>      * can be immediately returned.  So they are placed first.
>      */
>     if (recheckCmp == 0 && distance_a.recheck != distance_b.recheck)
>         recheckCmp = distance_a.recheck ? 1 : -1;
>
> I don't understand why items without recheck can be immediately returned. Do you think it will work correctly when there is an operator class which will return recheck true and false for the items under the same page?

Yes, I believe so. An item with recheck can't decrease its distance; it can only increase it. In the corner case an item can have the same distance after recheck as it had before. And items whose distances are the same can be returned in any order anyway.
> src/backend/access/index/indexam.c:258:
>
>     /* Prepare data structures for getting original indexed values from heap */
>     scan->indexInfo = BuildIndexInfo(scan->indexRelation);
>     scan->estate = CreateExecutorState();
>     scan->slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));
>
> With the changes in indexam.c, heap access becomes legal for all index access methods. I think it is better than the previous version, but I am leaving the judgement to someone more experienced.
>
> I will try to summarize the pros and cons of sorting the rows in the GiST access method, as far as I understand them.
>
> Pros:
>
> * It does not require another queue. It should be effective to sort the rows inside the queue the GiST access method already has.
> * It does not complicate the index access method infrastructure.
>
> Cons:
>
> * It could be done without additional heap access.
> * Other access methods could make use of the sorting infrastructure one day.
> * It could be more transparent to the users. Sorting information could be shown in the explain output. It would also be nice to show some information about KNN itself.
> * A more suitable data structure like a binary heap could be used for the queue to sort the rows.

A binary heap seems to be a better data structure for the whole of KNN-GiST. But that's a subject for a separate patch: replacing the RB-tree with a heap in KNN-GiST. It's not related to the recheck stuff.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Wed, Sep 17, 2014 at 12:30 PM, Emre Hasegeli e...@hasegeli.com wrote: I managed to break it again by ordering rows only by the second column of the index. Test script attached. I was confused. It is undefined behavior. Sorry for the noise. No problem. Thanks a lot for testing. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Mon, Sep 15, 2014 at 11:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan p...@heroku.com writes: On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov aekorot...@gmail.com wrote: Rename such opclasses and make them not default. Create new default opclasses with bitwise comparison functions. Write a recommendation to re-create indexes with the default opclasses into the documentation. I certainly think this should be fixed if at all possible, but I'm not sure about this plan. Can we really rename an opclass without consequence, including having that respected across pg_upgrade? No. And we don't know how to change the default opclass without breaking things, either. See previous discussions about how we might fix the totally-broken default gist opclass that btree_gist creates for the inet type [1]. The motivation for getting rid of that is *way* stronger than "it might be slow", but there's no apparent way to make something else be the default without creating havoc. I've read the thread about the gist opclass for the inet type. But that case is more difficult, because the conflict is between a built-in opclass and a contrib opclass. This case seems to be much simpler: we need to change a built-in opclass to a built-in opclass, and a contrib opclass to a contrib opclass. I realized that it's problematic to rename a built-in opclass due to pg_upgrade. However, it still seems possible to create a new opclass and make it the default. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Tue, Sep 16, 2014 at 11:29 AM, Emre Hasegeli e...@hasegeli.com wrote: Changing the default opclasses should work if we make pg_dump --binary-upgrade dump the default opclasses with indexes and exclusion constraints. I think it makes sense to do so in --binary-upgrade mode. I can try to come up with a patch for this. Can you explain it in a bit more detail? I didn't get it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Tue, Sep 16, 2014 at 12:14 PM, Emre Hasegeli e...@hasegeli.com wrote: Changing the default opclasses should work if we make pg_dump --binary-upgrade dump the default opclasses with indexes and exclusion constraints. I think it makes sense to do so in --binary-upgrade mode. I can try to come up with a patch for this. Can you explain it in a bit more detail? I didn't get it. pg_upgrade uses pg_dump --binary-upgrade to dump the schema of the old database. Now, it generates CREATE INDEX statements without an explicit opclass if the opclass is the default. We can change pg_dump to generate the statements with the opclass even if the opclass is the default in --binary-upgrade mode. Thanks, I get it. I checked the pg_dump implementation. It appears to be not as easy as it could be. pg_dump doesn't form the index definition by itself; it calls the pg_get_indexdef function. This function has no option to dump the names of default opclasses. Since we can't change the behaviour of old postgres versions, we would have to make pg_dump form the index definition by itself. -- With best regards, Alexander Korotkov.
Re: [HACKERS] PoC: Partial sort
On Sun, Sep 14, 2014 at 7:39 AM, Peter Geoghegan p...@heroku.com wrote: On Fri, Sep 12, 2014 at 2:19 PM, Alexander Korotkov aekorot...@gmail.com wrote: Actually, higher cardinality skip columns are better. Sorting of smaller groups is faster than sorting larger groups of the same size. Also, with smaller groups you reach the limit more accurately (on average), i.e. you sort a smaller number of total rows. Higher cardinality leading key columns are better for what? Do you mean they're better in that those cases are more sympathetic to your patch, or better in the general sense that they'll perform better for the user? The first example query that you chose to demonstrate your patch had a leading, indexed column (column v1) with much lower cardinality/n_distinct than the column that had to be sorted on (column v2). That was what my comments were based on. When this feature is used, there will often be a significantly lower cardinality in the leading, indexed column (as in your example). Otherwise, the user might well have been happy to just order on the indexed/leading column alone. That isn't definitely true, but it's often true. I just meant that higher cardinality makes the partial sort cheaper. We may have misunderstood each other because the notions of "high" and "low" are relative. When the cardinality is 1, partial sort seems to be useless. When the cardinality is low, it could still be cheaper to do a sequential scan + sort rather than an index scan + partial sort. When the cardinality is close to the number of rows, then, as you mentioned, the user probably doesn't need to sort by the rest of the columns. At least one exception is when the user needs to force uniqueness of the order. So, we need to target something in the middle of these two corner cases. I'm not sure if that's worth it to more or less duplicate heap_tuple_attr_equals() to save a mere n expensive comparisons, but it's something to think about (actually, there are probably fewer than even n comparisons in practice because there'll be a limit). Not correct.
Smaller groups are not OK. Imagine that two representations of the same skip column value exist. The index may return them in any order, even alternating between them. In that case sorting on the other columns never takes place, while it should. I think I explained this badly - it wouldn't be okay to make the grouping smaller, but as you say we could tie-break with a proper B-Tree support function 1 comparison to make sure we really had reached the end of our grouping. FWIW I want all bttextcmp()/varstr_cmp() comparisons to try a memcmp() first, so the use of the equality operator with text in mind that you mention may soon not be useful at all. The evidence suggests that memcmp() is so much cheaper than special preparatory NUL-termination + strcoll() that we should always try it first when sorting text, even when we have no good reason to think it will work. The memcmp() is virtually free. And so, you see why it might be worth thinking about this when we already have reasonable confidence that many comparisons will indicate that datums are equal. Other datatypes will have expensive real comparators, not just text or numeric. When strings are not equal, bttextcmp still needs to use strcoll, while texteq doesn't. So there would still be an advantage to using the equality operator over the comparison function: the equality operator doesn't have to order unequal values. However, the real cost of this advantage depends on the cardinality of the presorted columns as well. -- With best regards, Alexander Korotkov.
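Peter's memcmp-before-strcoll idea can be sketched in isolation like this; plain strcmp() stands in for the expensive locale-aware path, and the function name is illustrative:

```c
#include <string.h>

/* Sketch of a collation-aware text comparison that tries a cheap
 * bytewise check first: when the lengths match and memcmp() finds the
 * bytes identical, the strings are equal under any collation, so the
 * expensive NUL-termination + strcoll() path is skipped entirely.
 * strcmp() stands in here for the locale-aware comparison. */
static int
text_cmp_sketch(const char *a, size_t lena, const char *b, size_t lenb)
{
    if (lena == lenb && memcmp(a, b, lena) == 0)
        return 0;               /* bytewise equal: no locale machinery */

    /* Bytewise unequal: fall back to the collation-aware compare. */
    return strcmp(a, b);
}
```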
Re: [HACKERS] PoC: Partial sort
On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan p...@heroku.com wrote: I think we might be better off if a tuplesort function was called shortly after tuplesort_begin_heap() is called. How top-n heap sorts work is something that largely lives in tuplesort's head. Today, we call tuplesort_set_bound() to hint to tuplesort: "by the way, this is a top-n heap sort applicable sort". I think that with this patch, we should then hint (where applicable): "by the way, you won't actually be required to sort those first n indexed attributes; rather, you can expect to scan those in logical order. You may work the rest out yourself, and may be clever about exploiting the sorted-ness of the first few columns". The idea of managing a bunch of tiny sorts from within ExecSort(), and calling the new function tuplesort_reset(), seems questionable. tuplesortstate is supposed to be private/opaque to nodeSort.c, and the current design strains that. I'd like to keep nodeSort.c simple. I think it's pretty clear that the guts of this do not belong within ExecSort(), in any case. Also, the additions there should be much better commented, wherever they finally end up. As I understand it, you propose to encapsulate the partial sort algorithm into tuplesort. However, in this case we still need a significant change to its interface: let tuplesort decide when it's able to return a tuple. Otherwise, we would miss a significant part of the LIMIT clause optimization. tuplesort_set_bound() can't solve all the cases. There could be other planner nodes between the partial sort and the LIMIT. -- With best regards, Alexander Korotkov.
[HACKERS] Triconsistent catalog declaration
Hi! 9.4 GIN introduces the new triconsistent method, which can return one of three values, in contrast to the plain consistent method. But in the catalog declaration triconsistent still returns the bool type. It doesn't affect anything, because nobody calls triconsistent from SQL. But I think it would be correct to declare it as returning int2. -- With best regards, Alexander Korotkov. tri-consistent-catalog.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Collation-aware comparisons in GIN opclasses
Hackers, some GIN opclasses use collation-aware comparisons even though they don't actually need collation-aware comparison. Examples are the text[] and hstore opclasses. Depending on the collation, this may make them much slower. See the example.

# show lc_collate;
 lc_collate
─────────────
 ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from generate_series(1,100) i group by (i-1)/10);
SELECT 10
# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: 26930,423 ms
# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: 5143,682 ms

Index creation with collation ru_RU.UTF-8 is 5 times slower, while the collation has absolutely no effect on index functionality. However, we can't just replace the comparison function for those opclasses, because that would break binary compatibility for pg_upgrade. I see the following solution: 1. Rename such opclasses and make them not default. 2. Create new default opclasses with bitwise comparison functions. 3. Write a recommendation to re-create indexes with the default opclasses into the documentation. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Collation-aware comparisons in GIN opclasses
On Mon, Sep 15, 2014 at 10:51 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov aekorot...@gmail.com wrote: some GIN opclasses use collation-aware comparisons even though they don't actually need collation-aware comparison. Examples are the text[] and hstore opclasses. Depending on the collation this may make them much slower. I'm glad that I saw how pointless this was with the jsonb GIN default opclass during development. Rename such opclasses and make them not default. Create new default opclasses with bitwise comparison functions. Write a recommendation to re-create indexes with the default opclasses into the documentation. I certainly think this should be fixed if at all possible, but I'm not sure about this plan. Can we really rename an opclass without consequence, including having that respected across pg_upgrade? Just renaming doesn't seem safe. Since pg_upgrade uses pg_dump, all indexes are linked to opclasses by their names. Therefore existing indexes would become linked to the new opclasses. It's likely we would need to execute an SQL script renaming the opclasses before pg_upgrade. Another option is to not rename the old opclasses, and just create new default opclasses with new names. Bruce, what is your opinion about pg_upgrade? Contrib opclasses would be safe to rename using a migration script. -- With best regards, Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Sun, Sep 14, 2014 at 10:09 PM, Emre Hasegeli e...@hasegeli.com wrote: I added the point to polygon distance operator patch to the open CommitFest as ready for committer and added myself as reviewer to both of the patches. Thanks. Cost estimation of GiST is a big problem anyway. It doesn't care (and can't) about the amount of recheck for regular operators. In this patch, the same would apply to knn recheck. The problem is that touching the heap from the access method breaks encapsulation. One idea about this is to do the sorting in other nodes. However, I wonder if that would be overengineering and overhead. In the attached patch I propose a different approach: put the code touching the heap into a separate index_get_heap_values function. Also the new version of the patch includes regression tests and some cleanup. While looking at it I found a bug. It returns the second column in the wrong order when both of the distance functions return recheck = true. A test script to run on the regression database is attached. I tried to fix it but could not. The searchTreeItemDistanceRecheck function is not very easy to follow. I think it deserves more comments. Fixed, thanks. It was a logical error in the comparison function implementation. -- With best regards, Alexander Korotkov. knn-gist-recheck-4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
analyze select x,y,z from b where z=1 order by x,y limit 10;

                                   QUERY PLAN
─────────────────────────────────────────────────────────────────────────────
 Limit  (cost=169374.43..263471.04 rows=10 width=12) (actual time=2237.082..2237.083 rows=10 loops=1)
   ->  Partial sort  (cost=169374.43..338748.34 rows=18 width=12) (actual time=2237.082..2237.083 rows=10 loops=1)
         Sort Key: x, y
         Presorted Key: x
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using b_x_idx on b  (cost=0.43..338748.13 rows=18 width=12) (actual time=0.047..2237.062 rows=10 loops=1)
               Filter: (z = 1)
               Rows Removed by Filter: 990
 Planning time: 0.089 ms
 Execution time: 2237.133 ms
(10 rows)

AFAICS, wrong selectivity estimations are a general problem that causes optimizer failures. But in your example with x+y=1: if an expression index on x+y existed, then statistics over x+y would be collected. So, in the case of an expression index, the estimation would be fine. -- With best regards, Alexander Korotkov. partial-sort-basic-2.patch Description: Binary data partial-sort-merge-2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] jsonb contains behaviour weirdness
Hi! Let's consider some examples.

# select '[1,2]'::jsonb @> '[1,2,2]'::jsonb;
 ?column?
----------
 f
(1 row)

One may think it's because the second jsonb array contains two 2s, i.e. that containment takes the count of equal elements into account.

# select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb;
 ?column?
----------
 t
(1 row)

But it doesn't. Jsonb containment only cares about the length of the array.

# select '[[1,2]]'::jsonb @> '[[1,2,2]]'::jsonb;
 ?column?
----------
 t
(1 row)

Even more weird :) The reason jsonb containment behaves this way is a check at the beginning of jsonb_contains. It makes a fast check of the jsonb type and element count before calling JsonbDeepContains.

if (JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) ||
    JB_ROOT_IS_OBJECT(val) != JB_ROOT_IS_OBJECT(tmpl))
    PG_RETURN_BOOL(false);

It's likely that JB_ROOT_COUNT(val) < JB_ROOT_COUNT(tmpl) should be checked only for objects, not arrays. Also, should JsonbDeepContains do the same fast check when it deals with nested objects? -- With best regards, Alexander Korotkov.
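The suggested fix — applying the element-count quick check only to objects — might look roughly like the following standalone sketch. The struct and function are illustrative stand-ins, not the actual jsonb code:

```c
#include <stdbool.h>

/* Minimal stand-in for a jsonb root: the element (or pair) count plus a
 * flag telling whether the root is an object rather than an array. */
typedef struct
{
    int     count;
    bool    is_object;
} JsonbRootSketch;

/* Quick-reject check for "val contains tmpl".  For objects, a container
 * with fewer pairs than the template can never contain it.  For arrays
 * the count check must be skipped: duplicate elements in the template
 * (as in '[1,2]' vs '[1,2,2]') make the counts incomparable. */
static bool
contains_quick_reject(JsonbRootSketch val, JsonbRootSketch tmpl)
{
    if (val.is_object != tmpl.is_object)
        return true;            /* object vs array can never match */
    if (val.is_object && val.count < tmpl.count)
        return true;            /* too few pairs to contain template */
    return false;               /* defer to the deep containment check */
}
```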
Re: [HACKERS] jsonb contains behaviour weirdness
On Fri, Sep 12, 2014 at 10:38 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Sep 12, 2014 at 11:21 AM, Josh Berkus j...@agliodbs.com wrote: # select '[1,1,2]'::jsonb @> '[1,2,2]'::jsonb; ?column? -- t (1 row) But it doesn't. Jsonb containment only cares about the length of the array. OK, now, that's messed up. To be clear: I don't think that this example is messed up (in isolation). I think it's the correct behavior. What I find objectionable is the inconsistency. I believe that this is Alexander's concern too. Alexander's first example exhibits broken behavior. Agree. I just tried to explain how the current behaviour could look to a user who sees it for the first time. -- With best regards, Alexander Korotkov.
Re: [HACKERS] PoC: Partial sort
On Sun, Jul 13, 2014 at 6:45 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov aekorot...@gmail.com wrote: Done. Patch is splitted. I took a quick look at this. Have you thought about making your new cmpSortSkipCols() function not use real comparisons? Since in the circumstances in which this optimization is expected to be effective (e.g. your original example) we can also expect a relatively low cardinality for the first n indexed attributes (again, as in your original example), in general when cmpSortSkipCols() is called there is a high chance that it will return true. If any pair of tuples (logically adjacent tuples fed in to cmpSortSkipCols() by an index scan in logical order) are not fully equal (i.e. their leading, indexed attributes are not equal) then we don't care about the details -- we just know that a new sort grouping is required. Actually, higher cardinality skip columns are better. Sorting of smaller groups is faster than sorting larger groups of the same size. Also, with smaller groups you reach the limit more accurately (on average), i.e. you sort a smaller number of total rows. The idea here is that you can get away with simple binary equality comparisons, as we do when considering HOT-safety. Of course, you might find that two bitwise unequal values are equal according to their ordinary B-Tree support function 1 comparator (e.g. two numerics that differ only in their display scale). AFAICT this should be okay, since that just means that you have smaller sort groupings than strictly necessary. I'm not sure if that's worth it to more or less duplicate heap_tuple_attr_equals() to save a mere n expensive comparisons, but it's something to think about (actually, there are probably fewer than even n comparisons in practice because there'll be a limit). Not correct. Smaller groups are not OK. Imagine that two representations of the same skip column value exist. The index may return them in any order, even alternating between them.
In that case sorting on the other columns never takes place, while it should. But some optimizations are still possible: 1. Use bitwise comparison first, then recheck. But there is no guarantee that acceleration will be achieved. 2. Use an equality check instead of the btree comparison. For the text datatype it would be rather faster because no locale-aware comparison is needed. -- With best regards, Alexander Korotkov.
Re: [HACKERS] gist vacuum gist access
On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/07/2014 05:11 PM, Костя Кузнецов wrote: Hello. I recoded vacuum for the gist index. All tests are OK. I also tested vacuum on a table of 2 million rows; all is OK. On my machine the old vacuum takes about 9 seconds, this version about 6-7 seconds. Review please. If I'm reading this correctly, the patch changes gistbulkdelete to scan the index in physical order, while the old code starts from the root and scans the index from left to right, in logical order. Scanning the index in physical order is wrong, if any index pages are split while vacuum runs. A page split could move some tuples to a lower-numbered page, so that the vacuum will not scan those tuples. In the b-tree code, we solved that problem back in 2006, so it can be done but requires a bit more code. In b-tree, we solved it with a vacuum cycle ID number that's set on the page halves when a page is split. That allows VACUUM to identify pages that have been split concurrently when it sees them, and jump back to vacuum them too. See commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80. It should be possible to do something similar in GiST, and in fact you might be able to reuse the NSN field that's already set on the page halves on split, instead of adding a new vacuum cycle ID. The idea is right. But in fact, does GiST ever recycle any page? It has an F_DELETED flag, but ISTM this flag is never set. So I think it's possible that this patch works correctly. However, GiST probably sometimes leaves a new page unused due to a server crash. Anyway, I'm not a fan of committing the patch in this shape. We need to let GiST recycle pages first, then implement VACUUM similarly to b-tree. -- With best regards, Alexander Korotkov.
Re: [HACKERS] gist vacuum gist access
On Mon, Sep 8, 2014 at 12:08 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/07/2014 05:11 PM, Костя Кузнецов wrote: Hello. I recoded vacuum for the gist index. All tests are OK. I also tested vacuum on a table of 2 million rows; all is OK. On my machine the old vacuum takes about 9 seconds, this version about 6-7 seconds. Review please. If I'm reading this correctly, the patch changes gistbulkdelete to scan the index in physical order, while the old code starts from the root and scans the index from left to right, in logical order. Scanning the index in physical order is wrong, if any index pages are split while vacuum runs. A page split could move some tuples to a lower-numbered page, so that the vacuum will not scan those tuples. In the b-tree code, we solved that problem back in 2006, so it can be done but requires a bit more code. In b-tree, we solved it with a vacuum cycle ID number that's set on the page halves when a page is split. That allows VACUUM to identify pages that have been split concurrently when it sees them, and jump back to vacuum them too. See commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80. It should be possible to do something similar in GiST, and in fact you might be able to reuse the NSN field that's already set on the page halves on split, instead of adding a new vacuum cycle ID. The idea is right. But in fact, does GiST ever recycle any page? It has an F_DELETED flag, but ISTM this flag is never set. So I think it's possible that this patch works correctly. However, GiST probably sometimes leaves a new page unused due to a server crash. Anyway, I'm not a fan of committing the patch in this shape. We need to let GiST recycle pages first, then implement VACUUM similarly to b-tree. Another note.
Assuming we have the NSN, which can play the role of a vacuum cycle ID, can we implement a sequential (with possible jump-back) index scan for GiST? -- With best regards, Alexander Korotkov.
Re: [HACKERS] gist vacuum gist access
On Mon, Sep 8, 2014 at 12:51 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/08/2014 11:19 AM, Alexander Korotkov wrote: On Mon, Sep 8, 2014 at 12:08 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Sep 8, 2014 at 11:13 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In the b-tree code, we solved that problem back in 2006, so it can be done but requires a bit more code. In b-tree, we solved it with a vacuum cycle ID number that's set on the page halves when a page is split. That allows VACUUM to identify pages that have been split concurrently when it sees them, and jump back to vacuum them too. See commit http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5749f6ef0cc1c67ef9c9ad2108b3d97b82555c80. It should be possible to do something similar in GiST, and in fact you might be able to reuse the NSN field that's already set on the page halves on split, instead of adding a new vacuum cycle ID. ... Another note. Assuming we have the NSN, which can play the role of a vacuum cycle ID, can we implement a sequential (with possible jump-back) index scan for GiST? Yeah, I think it would work. It's pretty straightforward, the page split code already sets the NSN, just when we need it. Vacuum needs to memorize the current NSN when it begins, and whenever it sees a page with a higher NSN (or the FOLLOW_RIGHT flag is set), follow the right-link if it points to a lower-numbered page. I mean that a full index scan feature for SELECT queries might be implemented, as well as the sequential VACUUM. -- With best regards, Alexander Korotkov.
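The scan order Heikki describes — physical order, jumping back along the right-link whenever a page's NSN shows it was split after the scan started — can be modeled with a toy structure. This is purely illustrative, not the actual GiST page layout:

```c
#define NO_LINK (-1)

/* Toy page: its split sequence number (NSN) and a right-link to the
 * page that received tuples during a split (NO_LINK if never split). */
typedef struct
{
    unsigned    nsn;
    int         rightlink;
} ToyPage;

/* Scan pages in physical order; whenever a page's NSN shows it was
 * split after the scan began and its right-link points to an
 * already-passed (lower-numbered) page, jump back and revisit it.
 * Returns the total number of page visits. */
static int
vacuum_scan(const ToyPage *pages, int npages, unsigned start_nsn)
{
    int     visits = 0;

    for (int blkno = 0; blkno < npages; blkno++)
    {
        int     cur = blkno;

        visits++;
        while (pages[cur].nsn > start_nsn &&
               pages[cur].rightlink != NO_LINK &&
               pages[cur].rightlink < blkno)
        {
            cur = pages[cur].rightlink;     /* revisit the moved tuples */
            visits++;
        }
    }
    return visits;
}
```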
Re: [HACKERS] jsonb format is pessimal for toast compression
On Fri, Aug 8, 2014 at 9:14 PM, Teodor Sigaev teo...@sigaev.ru wrote: Curious idea: we could swap the JEntry array and the values: values at the beginning of the type will be caught by pg_lzcompress. But we will need to know the offset of the JEntry array, so the header will grow to 8 bytes (actually, it will be a varlena header!) Maybe I wasn't clear: the jsonb type will start with the string collection instead of the JEntry array, and the JEntry array will be placed at the end of the object/array. So pg_lzcompress will find repeatable 4-byte pieces in the first 1024 bytes of the jsonb. Another idea I have is that storing an offset in each JEntry is not necessary to get the benefit of binary search. Namely, what if we store offsets in every 8th JEntry and lengths in the others? The speed of binary search will be about the same: the only overhead is calculating offsets within an 8-entry chunk. But lengths will probably repeat. -- With best regards, Alexander Korotkov.
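The every-8th-JEntry idea can be sketched as follows: entries at indices divisible by 8 hold an absolute end offset, the rest hold lengths, and any entry's end offset is reconstructed by summing at most seven lengths. The names and the stride are illustrative, not the actual jsonb on-disk format:

```c
#define ENTRY_CHUNK 8

/* Entry array convention for this sketch: at indices divisible by
 * ENTRY_CHUNK an entry stores the absolute end offset of its value;
 * everywhere else it stores just the value's length.  Lengths repeat
 * far more often than absolute offsets, so the array compresses better. */
static int
entry_end_offset(const int *entries, int i)
{
    int     base = i - i % ENTRY_CHUNK; /* nearest stored absolute offset */
    int     off = entries[base];

    /* Add the lengths of the entries between base and i: at most
     * ENTRY_CHUNK - 1 additions, so binary search stays cheap. */
    for (int j = base + 1; j <= i; j++)
        off += entries[j];
    return off;
}
```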
Re: [HACKERS] KNN-GiST with recheck
On Sun, Aug 3, 2014 at 5:48 PM, Emre Hasegeli e...@hasegeli.com wrote: 1. This patch introduces a new polygon <-> point operator. That seems useful on its own, with or without this patch. Yeah, but exact-knn can't come without at least one implementation. But it would be better for it to come in a separate patch. I tried to split them. Separated patches are attached. I changed the order of the arguments to point <-> polygon, because point was the first argument in all the others. Its commutator was required for the index, so I added it in the second patch. I also added tests for the operator. I think it is ready for committer as a separate patch. We can add it to the open CommitFest. I have made some cosmetic changes to the patches. I hope they are useful. I added support for the point <-> circle operator with the same GiST distance function you added for polygon. I can change it, if it is not the right way. Great, thanks! 2. I wonder how useful it really is to allow mixing exact and non-exact return values from the distance function. The distance function included in the patch always returns recheck=true. I have a feeling that all other distance functions will also always return either true or false. For geometrical datatypes, recheck variations in consistent methods are also very rare (I can't remember any). But imagine an opclass for arrays where keys have a different representation depending on the array length. For such an opclass, with knn on similarity, the recheck flag could be useful. I also wonder how useful it is. Your example is convincing, but maybe setting it index-wide will make the decisions on the framework easier. For example, how hard would it be to decide in the planner whether further sorting is required or not? I think that for most use cases, just some operators require further sorting and some of them don't. But it could turn out one day that some index gives part of its knn answers exactly and part of them inexactly. The same happened with recheck of regular operators: initially the recheck flag was defined in the opclass.
But later recheck became a runtime flag. 4. (as you mentioned in the other thread: ) It's a modularity violation that you peek into the heap tuple from gist. I think the proper way to do this would be to extend the IndexScan executor node to perform the re-shuffling of tuples that come from the index in the wrong order, or perhaps add a new node type for it. Of course that's exactly what your partial sort patch does :-). I haven't looked at that in detail, but I don't think the approach the partial sort patch takes will work here as is. In the KNN-GiST case, the index is returning tuples roughly in the right order, but a tuple that it returns might in reality belong somewhere later in the ordering. In the partial sort patch, the input stream of tuples is divided into non-overlapping groups, so that the tuples within the group are not sorted, but the groups are. I think the partial sort case is a special case of the KNN-GiST case, if you consider the lower bound of each tuple to be the leading keys that you don't need to sort. Yes. But, for instance, btree accesses the heap for unique checking. Is it really so criminal? :-) This is not only a question of a new node or extending an existing node. We need to teach the planner/executor that an access method can return the value of some expression which is a lower bound of another expression. AFAICS, now an access method can return only original indexed datums and TIDs. So, I'm afraid that enormous infrastructure changes are required. And I can hardly imagine what they should look like. Unfortunately, I am not experienced enough to judge your implementation. As far as I understand, the problem is partially sorting rows in the index scan node. It can lead the planner to choose non-optimal plans, because it does not take the cost of sorting into account. Cost estimation of GiST is a big problem anyway. It doesn't care (and can't) about the amount of recheck for regular operators. In this patch, the same would apply to knn recheck.
The problem is that touching the heap from the access method breaks encapsulation. One idea about this is to do the sorting in other nodes. However, I wonder if that would be overengineering and overhead. In the attached patch I propose a different approach: put the code touching the heap into a separate index_get_heap_values function. Also the new version of the patch includes regression tests and some cleanup. -- With best regards, Alexander Korotkov. knn-gist-recheck-3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Distance from point to box
Hackers, while reading code written by my GSoC student for knn-spgist, I faced many questions about calculating the distance from a point to a box. 1) dist_pb calls close_pb, which calls on_pb, dist_ps_internal, close_ps and so on. So, it's a very complex way to calculate a very simple value. I see this way has some mathematical beauty, but is it acceptable in practice? 2) computeDistance in knn-gist uses its own quite simplified implementation of this calculation. But why does it have its own implementation instead of simplifying dist_pb? 3) The computeDistance implementation still looks very complex to me. The right (simple and fast) way to calculate the point-to-box distance seems to me to be this:

double dx = 0.0, dy = 0.0;

if (point->x < box->low.x)
    dx = box->low.x - point->x;
if (point->x > box->high.x)
    dx = point->x - box->high.x;
if (point->y < box->low.y)
    dy = box->low.y - point->y;
if (point->y > box->high.y)
    dy = point->y - box->high.y;

return HYPOT(dx, dy);

I feel quite tangled. Could anybody clarify it for me? Did I miss something? Thanks. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Distance from point to box
On Wed, Jul 30, 2014 at 4:06 PM, Fabien COELHO coe...@cri.ensmp.fr wrote:

double dx = 0.0, dy = 0.0;

if (point->x < box->low.x)
    dx = box->low.x - point->x;
if (point->x > box->high.x)
    dx = point->x - box->high.x;
if (point->y < box->low.y)
    dy = box->low.y - point->y;
if (point->y > box->high.y)
    dy = point->y - box->high.y;

return HYPOT(dx, dy);

I feel quite tangled. Could anybody clarify it for me? Did I miss something? Thanks. ISTM that you miss the projection on the segment if dx=0 or dy=0. I don't need to find the projection itself, I need only the distance. When dx = 0, the nearest point is on a horizontal side of the box, so the distance to it is dy. Same when dy = 0. When both of them are 0, the point is inside the box. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Distance from point to box
On Wed, Jul 30, 2014 at 7:26 PM, Fabien COELHO coe...@cri.ensmp.fr wrote: ISTM that you miss the projection on the segment if dx=0 or dy=0. I don't need to find the projection itself, I need only the distance. When dx = 0, the nearest point is on a horizontal side of the box, so the distance to it is dy. Same when dy = 0. When both of them are 0, the point is inside the box. Indeed. I thought that the box sides were not parallel to the axes, but they are. So I do not see why it should be more complex. Maybe there is a general algorithm which works for polygons, and they use it for boxes? Yeah, this answers question #1, but not #2 and #3 :) -- With best regards, Alexander Korotkov.
Re: [HACKERS] 9.5 CF1
On Wed, Jul 2, 2014 at 10:08 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: KNN-GiST with recheck Partial sort Some earlier discussion, but no conclusions, and as far as I know nobody is working on these patches at the moment. I'm now working on the next revision of KNN-GiST with recheck. Also, I think partial sort needs a look from somebody more familiar with the planner than me and Marti. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Negative impact of maintenance_work_mem to GIN size
On Tue, May 20, 2014 at 4:50 AM, Alexander Korotkov aekorot...@gmail.com wrote: I found that sometimes larger maintenance_work_mem leads to a larger GIN index. That is quite strange. ISTM that it's related to posting list compression but I can't figure out how exactly. It appears not to be related to posting list compression. I did reproduce it on 9.2.

create table test as (select array[(random()*1000)::int]::int[] as v from generate_series(1,1000000) g);
set maintenance_work_mem = '2GB';
create index test_idx1 on test using gin(v);
set maintenance_work_mem = '16MB';
create index test_idx2 on test using gin(v);

 Schema |   Name    | Type  | Owner  | Table |  Size  | Description
--------+-----------+-------+--------+-------+--------+-------------
 public | test_idx1 | index | smagen | test  | 392 MB |
 public | test_idx2 | index | smagen | test  | 268 MB |
(2 rows)

The reason is that we fill the entry tree by plain insertion without any special optimization. Large maintenance_work_mem gives us ascending order of insertion. Thus insertion is always performed into the rightmost page, leaving the rest of the pages half-empty. Small maintenance_work_mem gives us a more random order of insertion. Such insertions make pages about 75% full on average. Posting trees have a special optimization for this case while the entry tree doesn't. -- With best regards, Alexander Korotkov.
[HACKERS] Negative impact of maintenance_work_mem to GIN size
Hi! I found that sometimes larger maintenance_work_mem leads to a larger GIN index. That is quite strange. ISTM that it's related to posting list compression but I can't figure out how exactly. See the example on the delicious bookmarks dataset. http://mira.sai.msu.ru/~megera/tmp/js.copy.gz

set maintenance_work_mem = '1GB';
create index js_idx1 on js using gin(v jsonb_path_ops);
set maintenance_work_mem = '16MB';
create index js_idx2 on js using gin(v jsonb_path_ops);

                     List of relations
 Schema |  Name   | Type  | Owner  | Table |  Size  | Description
--------+---------+-------+--------+-------+--------+-------------
 public | js_idx1 | index | smagen | js    | 432 MB |
 public | js_idx2 | index | smagen | js    | 309 MB |
(2 rows)

-- With best regards, Alexander Korotkov.
[HACKERS] Lossy bitmap scan is broken in GIN
Hackers, GIN hangs on lossy bitmap scan. Here is a test case:

create extension btree_gin;
create table test as (select random() v from generate_series(1,1000000));
create index test_idx on test using gin(v);
set work_mem = '16MB';
select count(*) from test where v > 0.9;
 count
───────
 99916
(1 row)

Time: 63,142 ms

set work_mem = '64kB';
select count(*) from test where v > 0.9;

The last query hangs. I've debugged it. It's another bug in this cursed loop in entryGetItem. A fix is attached. -- With best regards, Alexander Korotkov. gin_lossy_bitmap_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Clean up jsonb code.
On Thu, May 8, 2014 at 11:45 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 05/08/2014 02:25 AM, Peter Geoghegan wrote: findJsonbValueFromSuperHeader()'s lowbound argument previously served to establish a low bound for searching when searching for multiple keys (so the second and subsequent user-supplied key could skip much of the object). Got that. In the case of jsonb_exists_any(), say, if you only have a reasonable expectation that about 1 key exists, and that happens to be the last key that the user passed to the text[] argument (to the existence/? operator), then n - 1 calls to what is now findJsonbValueFromContainer() (which now does not accept a lowbound) are wasted. Check. That's elem_count - 1 top-level binary searches of the entire jsonb. Or elem_count such calls rather than 1 call (plus 1 sort of the supplied array) in the common case where jsonb_exists_any() will return false. Ok, but I don't see any big difference in that regard. It still called findJsonbValueFromContainer() elem_count times, before this commit. Each call was somewhat cheaper, because the lower bound of the binary search was initialized to where the previous search ended, but you still had to perform the search. Granted, that might not be that bad right now, given that it's only ever (say) elem_count or elem_count - 1 wasted binary searches through the *top* level, but that might not always be true. If we are ever to perform a deep search, I think we'll want to do much more to optimize that than just keep track of the lower bound. Like, start an iterator of tree and check for all of the keys in one go. And even today, sorting a presumably much smaller user-passed lookup array once has to be cheaper than searching through the entire jsonb perhaps elem_count times per call. Well, the quick testing I did suggested otherwise. It's not a big difference, but sorting has all kinds of overhead, like pallocing the array to sort, copying the elements around etc. 
For a small array, the startup cost of sorting trumps the savings in the binary searches. Possibly the way the sorting was done was not optimal, and you could reduce the copying and allocations involved in that, but it's hardly worth the trouble. With current head I can't load the delicious dataset into jsonb format. I get a segfault. It looks like memory corruption.

$ gzip -c -d js.copy.gz | psql postgres -c 'copy js from stdin;'
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x14a846000, chunk 0x14a84d278
CONTEXT: COPY js, line 246766
WARNING: problem in alloc set ExprContext: bad single-chunk 0x14a804b18 in block 0x14a7e6000
CONTEXT: COPY js, line 1009820
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x14a7e6000, chunk 0x14a804b18
CONTEXT: COPY js, line 1009820
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.

You can get the dataset here: http://mira.sai.msu.ru/~megera/tmp/js.copy.gz -- With best regards, Alexander Korotkov.
Re: [HACKERS] Sequential disk access during VACUUM for GiST/GIN
Hi! On Tue, Apr 29, 2014 at 2:34 PM, Костя Кузнецов chapae...@yandex.ru wrote: There is a task Sequential disk access during VACUUM for GiST/GIN in the GSOC14 list. Is nobody working on this task? I didn't hear that anybody is working on it. Do I understand this task correctly? I must recode gistbulkdelete. GistBDItem *stack must have items with sequential blkno as far as possible. Yes, make gistbulkdelete and ginbulkdelete access the disk sequentially, while currently the tree is traversed in logical order. So these functions need to be completely reworked: I'm not sure GistBDItem will survive :) The challenge is concurrency. Vacuum shouldn't block concurrent readers and writers. You can look at btbulkdelete, which supports sequential disk access now. I have a question: where is the access to disk in this function? ReadBufferExtended? Yes, this function reads a buffer into shared memory (if it isn't there already) and pins it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] 9.4 Proposal: Initdb creates a single table
On Wed, Apr 23, 2014 at 10:11 AM, Simon Riggs si...@2ndquadrant.com wrote: We start with a database called Postgres and a schema called Public. Yet we don't start up with any usable tables. I propose we add a single table called Postgres when we Initdb CREATE TABLE Postgres (Id Integer, Data Jsonb); COMMENT ON TABLE Postgres IS 'Single table for quick start usage - design your database'; The purpose of this is to make the database immediately usable. By including this table in the default initdb it will mean that programs can rely on the existence of this table and begin working quicker. By now, some of you will be doubled over laughing as if this is an April fool joke. I don't mean it to be at all. I can propose a contrib PostgreNoSQL providing the following: 1) Table postgres as you proposed. 2) Functions: get_postgres(id integer) returns jsonb, set_postgres(id integer, data jsonb) returns void, search_postgres(query jsonb) returns setof postgres. search_postgres will have the semantics of the @> jsonb operator 3) Background workers which provide an HTTP wrapper over those functions. -- With best regards, Alexander Korotkov.
[HACKERS] Partial match fix for fast scan
Hi, GIN partial match appears to be broken after fast scan. The following simple test case raises an assertion failure.

create extension btree_gin;
create table test as (select id, random() as val from generate_series(1,100) id);
create index test_idx on test using gin (val);
vacuum test;
select * from test where val between 0.1 and 0.9;

The attached patch fixes bugs in the entryGetItem function. I would especially point out that continue; checks the while condition even when it's a postfix while (do/while loop). That's why I surrounded tbm_iterate with another while. -- With best regards, Alexander Korotkov. ginfastscanfix.patch Description: Binary data
Re: [HACKERS] Partial match fix for fast scan
On Thu, Apr 10, 2014 at 8:22 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Thu, Apr 10, 2014 at 11:09 AM, Alexander Korotkov aekorot...@gmail.com wrote: Hi, GIN partial match appears to be broken after fast scan. Following simple test case raises assertion failure. create extension btree_gin; create table test as (select id, random() as val from generate_series(1,100) id); create index test_idx on test using gin (val); vacuum test; select * from test where val between 0.1 and 0.9; Attached patch fixes bugs in entryGetItem function. I would especially point that continue; checks while condition even if it's postfix while. That's why I surrounded tbm_iterate with another while. Interesting... to me (using current master) your test case doesn't fail... fabrizio=# select * from test where val between 0.1 and 0.9; id |val +--- 1 | 0.554413774050772 2 | 0.767866868525743 3 | 0.601187175605446 ... But fail if I change the values of between clause: fabrizio=# select * from test where val between 0.1 and 0.19; ERROR: tuple offset out of range: 8080 It must be compiled with --enable-cassert to fail on assertion. -- With best regards, Alexander Korotkov.
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The ship has clearly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults. In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index keys, values, key-value pairs and paths separately, instead of the current json_ops and json_hash_ops opclasses which index all of those in the same index. That way, if you, e.g., only ever query on the existence of a key, you'd only need to index the keys. I don't understand how we ended up with the current dichotomy of json_ops and json_hash_ops... +1 for parameterizable opclasses -- With best regards, Alexander Korotkov.
Re: default opclass for jsonb (was Re: [HACKERS] Call for GIST/GIN/SP-GIST opclass documentation)
On Wed, Apr 9, 2014 at 12:40 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 9, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I didn't say that. On the contrary, I think the shotgun approach jsonb_ops and jsonb_hash_ops take is too broad. It should be possible to specify what to index in a more detailed fashion. It is - use an expression index. That's by far the most important way to specify what to index in a more detailed fashion. There are others, but that's the major one. Beyond that, yes, it's necessary to carefully write your query predicate a certain way. However, a similar situation exists in MongoDB, where there is a distinction between indexes on embedded fields (which must be accessed using special dot notation) and indexes on subdocuments (which cannot be accessed using dot notation). It's late here, but I'm pretty sure that's a feature and not a limitation. I believe that a serious limitation we have now is that we actually specify the kind of index to be used in the SQL query. For example, you need to find objects with active = true. You can write: js @> {active: true} and then a GIN index on js can be used. Also you can write: js->'active' = true and then a btree expression index on (js->'active') can be used. For sure, one can write js @> {active: true} AND js->'active' = true This query can use either of the indexes, but it is: 1) A kludge 2) An excess recheck 3) If both indexes are present, an excess BitmapAnd. Having to choose the index in the SQL query makes our SQL more imperative and less declarative. Similar things can happen without json/hstore (a user has to rewrite SQL in order to use an expression index), but now it could become very common. My opinion is that we have to do something in the planner to make it understand at least these two kinds of queries to be equivalent. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC 2014 proposal
On Wed, Apr 2, 2014 at 2:22 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The BIRCH algorithm as described in the paper describes building a tree in memory. If I understood correctly, you're suggesting to use a pre-built GiST index instead. Interesting idea! There are a couple of significant differences between the CF tree described in the paper and GiST: 1. In GiST, a leaf item always represents one heap tuple. In the CF tree, a leaf item represents a cluster, which consists of one or more tuples. So the CF tree doesn't store an entry for every input tuple, which makes it possible to keep it in memory. 2. In the CF tree, all entries in a leaf node must satisfy a threshold requirement, with respect to a threshold value T: the diameter (or radius) has to be less than T. GiST imposes no such restrictions. An item can legally be placed anywhere in the tree; placing it badly will just lead to degraded search performance, but it's still a legal GiST tree. 3. A GiST index, like any other index in PostgreSQL, holds entries also for deleted tuples, until the index is vacuumed. So you cannot just use information from a non-leaf node and use it in the result, as the information summarized at a non-leaf level includes noise from the dead tuples. Can you elaborate how you are planning to use a GiST index to implement BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more strict in where in the tree an item can be stored, and lets the operator class specify exactly when a node is split etc. Hmmm, it's likely I've imagined something quite outside of this paper, and even already suggested it to Ivan... :) I need a little time to rethink it. Using GiST we can implement BIRCH-like clustering like so: 1) Build a CF tree as a GiST index without the restriction of the T threshold value. 2) Scan the CF tree with threshold T using some auxiliary operator.
If the consistent method sees a CF entry whose diameter is greater than T then it returns true. Otherwise it returns false and puts this CF entry into the output area (which could be either in-memory or a temporary table). 3) Process the other steps of the algorithm as usual. This modification would have the following advantages: 1) A user can build the GiST index once and then try clustering with different parameters. The initial GiST index build would be the slowest operation, while the other steps are expected to be fast. 2) We use the GiST infrastructure and automatically get buffering build. The drawback is that building a GiST index is more expensive than building an in-memory CF tree with a given threshold T (assuming T is well chosen). Does it make any sense? -- With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC 2014 proposal
On Thu, Apr 3, 2014 at 11:21 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 04/03/2014 04:15 PM, Alexander Korotkov wrote: On Wed, Apr 2, 2014 at 2:22 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The BIRCH algorithm as described in the paper describes building a tree in memory. If I understood correctly, you're suggesting to use a pre-built GiST index instead. Interesting idea! There are a couple of significant differences between the CF tree described in the paper and GiST: 1. In GiST, a leaf item always represents one heap tuple. In the CF tree, a leaf item represents a cluster, which consists of one or more tuples. So the CF tree doesn't store an entry for every input tuple, which makes it possible to keep it in memory. 2. In the CF tree, all entries in a leaf node must satisfy a threshold requirement, with respect to a threshold value T: the diameter (or radius) has to be less than T. GiST imposes no such restrictions. An item can legally be placed anywhere in the tree; placing it badly will just lead to degraded search performance, but it's still a legal GiST tree. 3. A GiST index, like any other index in PostgreSQL, holds entries also for deleted tuples, until the index is vacuumed. So you cannot just use information from a non-leaf node and use it in the result, as the information summarized at a non-leaf level includes noise from the dead tuples. Can you elaborate how you are planning to use a GiST index to implement BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more strict in where in the tree an item can be stored, and lets the operator class specify exactly when a node is split etc. Hmmm, it's likely I've imagined something quite outside of this paper, and even already suggested it to Ivan... :) I need a little time to rethink it.
Using GiST we can implement BIRCH-like clustering like so: 1) Build a CF tree as a GiST index without the restriction of the T threshold value. 2) Scan the CF tree with threshold T using some auxiliary operator. If the consistent method sees a CF entry whose diameter is greater than T then it returns true. Otherwise it returns false and puts this CF entry into the output area (which could be either in-memory or a temporary table). 3) Process the other steps of the algorithm as usual. I still don't understand how that would work. You can't trust the non-leaf entries, because their CF value can contain deleted entries. So you have to scan every tuple anyway. Once you do that, what's the point of the index? Yeah, it is a limitation of this idea. It's not going to be an auto-updatable CF tree. A user can build the index on a freshly vacuumed table and play with clustering for some time. Updates on the table during that time would be either an accepted clustering error or prohibited. Another potential solution is to let this index hold some snapshot of the data. But it seems not possible to do in an extension now. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC 2014 proposal
On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The BIRCH algorithm as described in the paper describes building a tree in memory. If I understood correctly, you're suggesting to use a pre-built GiST index instead. Interesting idea! There are a couple of significant differences between the CF tree described in the paper and GiST: 1. In GiST, a leaf item always represents one heap tuple. In the CF tree, a leaf item represents a cluster, which consists of one or more tuples. So the CF tree doesn't store an entry for every input tuple, which makes it possible to keep it in memory. 2. In the CF tree, all entries in a leaf node must satisfy a threshold requirement, with respect to a threshold value T: the diameter (or radius) has to be less than T. GiST imposes no such restrictions. An item can legally be placed anywhere in the tree; placing it badly will just lead to degraded search performance, but it's still a legal GiST tree. 3. A GiST index, like any other index in PostgreSQL, holds entries also for deleted tuples, until the index is vacuumed. So you cannot just use information from a non-leaf node and use it in the result, as the information summarized at a non-leaf level includes noise from the dead tuples. Can you elaborate how you are planning to use a GiST index to implement BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more strict in where in the tree an item can be stored, and lets the operator class specify exactly when a node is split etc. Hmmm, it's likely I've imagined something quite outside of this paper, and even already suggested it to Ivan... :) I need a little time to rethink it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Cube extension kNN support
On Mon, Mar 31, 2014 at 10:01 PM, Sergey Konoplev gray...@gmail.com wrote: On Thu, Mar 27, 2014 at 3:26 PM, Sergey Konoplev gray...@gmail.com wrote: On Sun, Sep 22, 2013 at 4:38 PM, Stas Kelvich stas.kelv...@gmail.com wrote: Here is the patch that introduces kNN search for cubes with euclidean, taxicab and chebyshev distances. What is the status of this patch? Referring to our private conversation with Alexander Korotkov, the patch is in WIP state currently, and, hopefully, will be ready by 9.5. I'm ready to actively participate in its testing on a real world production set of data. I'm not sure if it is doable at all, but is there any possibility to implement here, or, what would be just great, any ready/half-ready solutions of a Hamming distance based kNN search? Cube deals with float8 numbers. There is another patch making it work with other number types. But Hamming distance is for bit vectors, isn't it? With best regards, Alexander Korotkov.
[HACKERS] Proposal: fix range queries in btree_gin
Hackers, after reading Heikki Linnakangas' presentation about GIN from Nordic PGDay, I figured out that btree_gin became much more attractive. http://hlinnaka.iki.fi/presentations/NordicPGDay2014-GIN.pdf But it has one weird behaviour: when you have a range restriction like col > const1 AND col < const2, it's handled as an intersection of two separate partial matches col > const1 and col < const2. So, it's awfully slow :( The opclass can't handle it better because it only deals with col > const1 and col < const2 separately. These two restrictions are separately passed to gin_extract_query. This problem is known, but now I can propose a solution. We have range types, and a restriction col <@ range can be correctly handled by gin_extract_query, because it is passed there as a single restriction. This is a workaround itself, but it's weird to force users to express queries like this. We can add some logic to GIN. If it sees: 1) The query contains both col > const1 and col < const2 restrictions. 2) There is a range type for this type and comparison operator. 3) The opclass supports col <@ range then rewrite these two restrictions as col <@ range(const1, const2) -- With best regards, Alexander Korotkov.
Re: [HACKERS] gsoc knn spgist
On Tue, Mar 25, 2014 at 8:16 PM, Костя Кузнецов chapae...@yandex.ru wrote: Hello. I submit a proposal. But Heikki Linnakangas wrote comments that I don't have a plan of implementation. My project is knn for spgist. Can I ask you a few questions? 1. I researched the committed gist knn implementation. In the gist implementation, in the role of queue is used rtree (with distance comparator); in the spgist implementation this is a List. Can I use rtree in spgist? KNN-GiST uses an RB-tree for the queue. An RB-tree is very different from an R-tree. And yes, it can be used in SP-GiST. However, an alternative is a heap (an in-memory structure, not the table heap). I don't know why GiST doesn't use a heap instead of an RB-tree. With best regards, Alexander Korotkov.
Re: [HACKERS] jsonb and nested hstore
I've noticed two commits on github.

commit b8199ee3c2506ab81b47a0b440363fc90c0d6956
Author: Peter Geoghegan p...@heroku.com
Date: Wed Mar 19 02:02:16 2014 -0700

    For jsonb_hash_ops, hash less

    By limiting the GIN entries to the least-nested level, the delicious.com sample JSON dataset index shrinks in size from 382MB to 255MB without any apparent downside.

commit 2cea5213dba011625fc0d5c6b447e838080087b1
Author: Peter Geoghegan p...@heroku.com
Date: Wed Mar 19 02:13:42 2014 -0700

    Revert "For jsonb_hash_ops, hash less"

    This might be workable with another approach, but leave it for now. This reverts commit b8199ee3c2506ab81b47a0b440363fc90c0d6956.

Besides the implementation, what was the idea here? To me, it's impossible to skip any single element, because it's possible for a query to include only this element. If we skip that element, we can't answer the corresponding query any more. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC proposal. Index-only scans for GIST
On Tue, Mar 18, 2014 at 5:12 PM, Anastasia Lubennikova lubennikov...@gmail.com wrote: 2. gistget algorithm (several parts omitted): Check the key: gistindex_keytest() - does this index tuple satisfy the scan key(s)? Scan all items on the GiST index page and insert them into the queue (or directly to output areas): plain scan - heap tuple TIDs are returned into so->pageData[]; ordered scan - heap tuple TIDs are pushed into individual search queue items. If the fetch() is specified by the developer, then using it, the algorithm can retrieve the data directly to output areas at this stage, without reference to the heap. I think the following changes need to be made to the GiST code: 1) When the next consistent IndexTuple is found, extract the original Datums using the fetch method. 2) Put those original Datums into the queue. 3) When returning the next ItemPointer from the queue, put the original Datums into IndexScanDesc (into xs_itup). 3. Create function gistcanreturn to check whether fetch() is specified by the user. Amcanreturn - function to check whether the index supports index-only scans, or zero if none. There is the question of supporting index-only scans for multicolumn indexes. Probably it would require extending the interface to add separate column checking. To solve this question I need the community's help. 4. Add details for index-only scans into the gistcostestimate function. Also, another part of the work to be mentioned is implementing the fetch function for all suitable opclasses. With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC proposal. Index-only scans for GIST
Josh, Anastasia has already consulted me in person. It is not a big proposal. But for a newbie who is not familiar with the PostgreSQL code base, and especially GiST, it seems fair enough. With best regards, Alexander Korotkov. On Tue, Mar 18, 2014 at 9:16 PM, Josh Berkus j...@agliodbs.com wrote: Alexander, Can you comment on the below proposal? I'd like your opinion on how difficult it will be. On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote: Hello! Here is the text of my proposal which I've applied to GSoC. (and link http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120 ) Any suggestions and comments are welcome. *Project name* Support for index-only scans for GIST index *Brief review* Currently GiST indexes don't have index-only scan functionality. Index-only scans are a major performance feature added to Postgres 9.2. They allow certain types of queries to be satisfied just by retrieving data from indexes, and not from tables. This feature for B-trees (implemented since PostgreSQL 9.2) allows doing certain types of queries significantly faster. This is achieved by a reduction in the amount of I/O necessary to satisfy queries. I think it will be useful to implement this feature for user-defined data types that use a GiST index. *Benefits to the PostgreSQL Community* Faster GiST index search would be relevant for many PostgreSQL applications (for example some GIS systems). *Quantifiable results* Acceleration of GiST index search. *Project details* 1. The GiST is a balanced tree structure like a B-tree, containing key, pointer pairs. A GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. The GiST provides a possibility to create custom data types with indexed access methods and an extensible set of queries. There are seven methods that an index operator class for GiST must provide, and an eighth that is optional.
-consistent -union -compress -decompress -penalty -picksplit -equal -distance (optional) I'm going to create a new optional method fetch() in addition to the existing ones. Thus the user can create a method of retrieving data from the index without loss. It will be used when performing search queries to speed up data retrieval. 2. gistget algorithm (several parts omitted): Check the key: gistindex_keytest() - does this index tuple satisfy the scan key(s)? Scan all items on the GiST index page and insert them into the queue (or directly to output areas): plain scan - heap tuple TIDs are returned into so->pageData[]; ordered scan - heap tuple TIDs are pushed into individual search queue items. If the fetch() is specified by the developer, then using it, the algorithm can retrieve the data directly to output areas at this stage, without reference to the heap. 3. Create function gistcanreturn to check whether fetch() is specified by the user. Amcanreturn - function to check whether the index supports index-only scans, or zero if none. There is the question of supporting index-only scans for multicolumn indexes. Probably it would require extending the interface to add separate column checking. To solve this question I need the community's help. 4. Add details for index-only scans into the gistcostestimate function *Links* 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search trees for database systems. - September, 1995. 2) http://www.sai.msu.su/~megera/postgres/gist/ 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST Indexes. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/15/2014 08:40 PM, Fujii Masao wrote: Hi, I executed the following statements in HEAD and 9.3, and compared the size of WAL which were generated by data insertion in GIN index. - CREATE EXTENSION pg_trgm; CREATE TABLE hoge (col1 text); CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off); CHECKPOINT; SELECT pg_switch_xlog(); SELECT pg_switch_xlog(); SELECT pg_current_xlog_location(); INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); SELECT pg_current_xlog_location(); - The results of WAL size are 960 MB (9.3) 2113 MB (HEAD) The WAL size in HEAD was more than two times bigger than that in 9.3. Recently the source code of GIN index has been changed dramatically. Is the increase in GIN-related WAL intentional or a bug? It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged. Which generates much larger WAL records. That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... Let me know if you want me to write a patch addressing this issue. -- With best regards, Alexander Korotkov.
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 13, 2014 at 1:21 PM, Greg Stark st...@mit.edu wrote: Well these are just normal gin and gist indexes. If we want to come up with new index operator classes we can still do that and keep the old ones if necessary. Even that seems pretty unlikely from past experience. I'm actually pretty sanguine even about keeping the GIST opclass. If it has bugs then the bugs only affect people who use this non-default opclass and we can fix them. It doesn't risk questioning any basic design choices in the patch. It does sound like the main question here is which opclass should be the default. From the discussion there's a jsonb_hash_ops which works on all input values but supports fewer operators and a jsonb_ops which supports more operators but can't handle json with larger individual elements. Perhaps it's better to make jsonb_hash_ops the default so at least it's always safe to create a default gin index? A couple of thoughts from me: 1) We can evade the length limitation of the GIN index by truncating long values and setting the recheck flag. We can introduce some indicator of a truncated value, like a zero byte at the end. 2) jsonb_hash_ops can be extended to handle key queries too. We can reserve one bit in the hash as a flag indicating whether it's a hash of a key or a hash of a path to a value. For sure, such an index would be a bit larger. Also, jsonb_hash_ops can be split into two: with and without keys. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Mar 13, 2014 at 8:58 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/12/2014 07:52 PM, Alexander Korotkov wrote: * I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, rather than GIN_TRUE. The equivalent boolean version returns 'true' without recheck. Is that a typo, or was there some reason for the discrepancy? Actually, there is no difference in the current implementation, but I implemented it so that trueTriConsistentFn can correctly work with shimBoolConsistentFn. In that case it should return GIN_MAYBE when there is a GIN_MAYBE in the input (as an analogue of setting the recheck flag). So, it could return GIN_TRUE only after checking that the input contains no GIN_MAYBE. However, that checking would be just a waste of cycles, so I ended up with just GIN_MAYBE :-) I don't understand that. As it is, it's inconsistent with the boolean trueConsistent function. trueConsistent always returns TRUE with recheck=false. And in GIN_SEARCH_MODE_EVERYTHING mode, there are no regular scan keys. Ok, I see. I just messed it up. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Mar 12, 2014 at 8:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 03/12/2014 12:09 AM, Tomas Vondra wrote: Hi all, a quick question that just occurred to me - do you plan to tweak the cost estimation for GIN indexes in this patch? IMHO it would be appropriate, given the improvements and gains, but it seems to me gincostestimate() was not touched by this patch. Good point. We have done two major changes to GIN in this release cycle: changed the data page format and made it possible to skip items without fetching all the keys (fast scan). gincostestimate doesn't know about either change. Adjusting gincostestimate for the more compact data page format seems easy. When I hacked on that, I assumed all along that gincostestimate doesn't need to be changed as the index will just be smaller, which will be taken into account automatically. But now that I look at gincostestimate, it assumes that the size of one item on a posting tree page is a constant 6 bytes (SizeOfIptrData), which is no longer true. I'll go fix that. Adjusting for the effects of skipping is harder. gincostestimate needs to do the same preparation steps as startScanKey: sort the query keys by frequency, and call the consistent function to split the keys into required and additional sets. And then model that the additional entries only need to be fetched when the other keys match. That's doable in principle, but requires a bunch of extra code. Alexander, any thoughts on that? It's getting awfully late to add new code for that, but it sure would be nice to somehow take fast scan into account. The preparation we do in startScanKey requires knowledge of the estimated sizes of the posting lists/trees. We obtain that estimate by descending to the leaf pages. I think gincostestimate is expected to be much cheaper, so we probably need some rougher estimate there, don't we? -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Mar 12, 2014 at 8:02 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/26/2014 11:25 PM, Alexander Korotkov wrote: On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 12:11 PM, Alexander Korotkov wrote: I've rebased catalog changes with last master. Patch is attached. I've rerun my test suite with both last master ('committed') and attached patch ('ternary-consistent'). Thanks! method | sum +-- committed | 143491.71501 fast-scan-11 | 126916.11199 fast-scan-light| 137321.211 fast-scan-light-heikki | 138168.02801 master | 446976.288 ternary-consistent | 125923.514 I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8 to 4. Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make sure we get similar behavior in Tomas' tests that used 6 search terms. But I always felt that it was too large for real queries, once we have the catalog changes, that's why I lowered to 4 when committing. If an opclass benefits greatly from fast scan, it should provide the ternary consistent function, and not rely on the shim implementation. I'm not sure about decision to reserve separate procedure number for ternary consistent. Probably, it would be better to add ginConfig method. It would be useful for post 9.4 improvements. Hmm, it might be useful for an opclass to provide both, a boolean and ternary consistent function, if the boolean version is significantly more efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a quick check through the array to see if there are any MAYBE arguments, within the consistent function. But I'm inclined to keep the possibility to provide both versions. As long as we support the boolean version at all, there's not much difference in terms of the amount of code to support having them both for the same opclass. 
A ginConfig could be useful for many other things, but I don't think it's worth adding it now. What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We discussed that earlier, but didn't reach any conclusion. That needs to be clarified in the docs. One possibility is to document that they're equivalent. Another is to forbid one of them. Yet another is to assign a different meaning to each. I've been thinking that it might be useful to define them so that a MAYBE result from the tri-consistent function means that it cannot decide if you have a match or not, because some of the inputs were MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The practical difference would be that if the tri-consistent function returns TRUE+recheck, ginget.c wouldn't need to bother fetching the other entries, it could just return the entry with recheck=true immediately. While with MAYBE result, it would fetch the other entries and call tri-consistent again. ginget.c doesn't currently use the tri-consistent function that way - it always fetches all the entries for a potential match before calling tri-consistent, but it could. I had it do that in some of the patch versions, but Tomas' testing showed that it was a big loss on some queries, because the consistent function was called much more often. Still, something like that might be sensible in the future, so it might be good to distinguish those cases in the API now. Note that ginarrayproc is already using the return values like that: in GinContainedStrategy, it always returns TRUE+recheck regardless of the inputs, but in other cases it uses GIN_MAYBE. Next revision of patch is attached. In this version opclass should provide at least one consistent function: binary or ternary. It's expected to achieve best performance when opclass provide both of them. However, tests shows opposite :( I've to recheck it carefully. However, it's not! 
This revision of patch completes my test case in 123330 ms. This is slightly faster than previous revision. Great. Committed, I finally got around to it. Some minor changes: I reworded the docs and also updated the table of support functions in xindex.sgml. I refactored the query in opr_sanity.sql, to make it easier to read, and to check more carefully that the required support functions are present. I also added a runtime check to avoid crashing if neither is present. A few things we ought to still discuss: * I just noticed that the dummy trueTriConsistentFn returns GIN_MAYBE, rather than GIN_TRUE. The equivalent boolean version returns 'true' without recheck. Is that a typo, or was there some reason for the discrepancy? Actually
Re: [HACKERS] jsonb and nested hstore
On Tue, Mar 11, 2014 at 5:19 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 10, 2014 at 4:19 AM, Alexander Korotkov aekorot...@gmail.com wrote: Here it is. So it looks like what you have here is analogous to the other problems that I fixed with both GiST and GIN. That isn't surprising, and this does fix my test-case. I'm not terribly happy about the lack of explanation for the hashing in that loop, though. Why use COMP_CRC32() at all, for one thing? Why do this for non-primitive jsonb hashing? COMP_CRC32(stack->hash_state, PATH_SEPARATOR, 1); Where PATH_SEPARATOR is: #define PATH_SEPARATOR ("\0") Actually, come to think of it, why not just use one hashing function everywhere? i.e., jsonb_hash(PG_FUNCTION_ARGS)? It's already very similar. Pretty much every hash operator support function 1 (i.e. a particular type's hash function) is implemented with hash_any(). Can't we just do the same here? In any case it isn't obvious why the requirements for those two things (the hashing mechanism used by the jsonb_hash_ops GIN opclass, and the hash operator class support function 1 hash function) cannot be the same thing. It's because the CRC32 interface allows incremental calculation, while hash_any requires a single chunk of memory. I don't think that unfolding everything into one chunk is a good idea. But we could implement an incremental interface for hash_any. -- With best regards, Alexander Korotkov.
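The incremental-vs-one-shot distinction is easy to see with zlib's CRC32, which, like PostgreSQL's COMP_CRC32 macro, threads a running state through successive calls:

```python
import zlib

def crc32_incremental(chunks):
    """Feed data piece by piece, as a recursive jsonb walk can do,
    without materializing the whole key in one buffer (which is
    what a hash_any-style interface would require)."""
    state = 0
    for chunk in chunks:
        state = zlib.crc32(chunk, state)  # previous state is the seed
    return state
```

The incremental result is identical to hashing the concatenated buffer in one call, which is exactly the property a single-chunk interface like hash_any lacks.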
Re: [HACKERS] jsonb and nested hstore
On Mon, Mar 10, 2014 at 1:18 PM, Peter Geoghegan p...@heroku.com wrote: * The jsonb_hash_ops non-default GIN opclass is broken. It has its own idiosyncratic notion of what constitutes containment, that sees it only return, say, jsonb arrays that have a matching string as their leftmost element (if we ask it if it contains within it another array with the same string). Because of the limited number of indexable operators (only @>), I'd put this opclass in the same category as GiST in terms of my willingness to forgo it for a release, even if it did receive a loud applause at pgConf.EU. Again, it might be some disparity between the operators as they existed in hstore2 at one time, and as they exist in the core code now, but I doubt it, not least since the regression tests didn't pick this up, and it's such a basic thing. Perhaps Oleg and Teodor just need to explain this to me. I didn't get the comment about the leftmost element. There is no distinguishing between array elements at all: all elements are extracted into the same keys independent of their indexes. This seems not to have changed since I wrote hstore_hash_ops. Could you share a test case to illustrate what you mean? -- With best regards, Alexander Korotkov.
Re: [HACKERS] jsonb and nested hstore
On Mon, Mar 10, 2014 at 2:19 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 10, 2014 at 3:00 AM, Alexander Korotkov aekorot...@gmail.com wrote: I didn't get the comment about the leftmost element. There is no distinguishing between array elements at all: all elements are extracted into the same keys independent of their indexes. This seems not to have changed since I wrote hstore_hash_ops. Could you share a test case to illustrate what you mean? I don't have time to post that at the moment, but offhand I *think* your confusion may be due to the fact that the json_hash_ops opclass (as I call it) was previously consistent with the behavior of the other GIN opclass (the default). The problem is that they (well, at least the default GIN and GiST opclasses) were inconsistent with how the containment operator behaved in respect of jsonb array elements generally. Here is the commit on our feature branch where I fixed the problem for the default GIN opclass: https://github.com/feodor/postgres/commit/6f5e4fe9fc34f9512919b1c8b6a54952ab288640 If it doesn't explain the problem, you may still wish to comment on the correctness of this fix. I am still waiting on feedback from Oleg and Teodor. Apparently, there is a bug in the calculation of hashes. Array elements were hashed incrementally, while each of them should be hashed separately. That causes array elements to be distinguished by their indexes. Not sure when this bug was introduced. A fix is attached. -- With best regards, Alexander Korotkov. jsonb-hash-ops-fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
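A toy model of the bug, using CRC32 as a stand-in for the real hashing: when the hash state carries over from one array element to the next, the key extracted for an element depends on everything before it, so only a leftmost match can be found. Hashing each element from a fresh state fixes that:

```python
import zlib

def buggy_element_keys(arr):
    """Bug: the CRC state is carried across elements, so each key
    depends on the preceding elements (i.e. on the element's index)."""
    state, keys = 0, []
    for element in arr:
        state = zlib.crc32(element.encode(), state)  # state leaks across elements
        keys.append(state)
    return keys

def fixed_element_keys(arr):
    """Fix: each element is hashed separately, from a fresh state,
    so equal elements always produce equal index keys."""
    return [zlib.crc32(element.encode()) for element in arr]
```

With the buggy keys, a containment query for ["b"] cannot match the "b" stored inside ["a", "b"], because the stored key for "b" was contaminated by "a"; with the fixed keys it matches regardless of position.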
Re: [HACKERS] jsonb and nested hstore
On Mon, Mar 10, 2014 at 3:04 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Mar 10, 2014 at 3:47 AM, Alexander Korotkov aekorot...@gmail.com wrote: Fix is attached. Could you post a patch with regression tests, please? Here it is. -- With best regards, Alexander Korotkov. jsonb-hash-ops-fix-2.patch Description: Binary data
Re: [HACKERS] trgm regex index peculiarity
On Mon, Feb 10, 2014 at 1:01 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Thu, Jan 16, 2014 at 3:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: I looked at this patch a bit. It seems like this:

+ * BLANK_COLOR_SIZE - How much blank character is more frequent than
+ *                    other character in average
+ #define BLANK_COLOR_SIZE 32

is a drastic overestimate. OK, I would like to give more reasoning for the penalty. Let's consider a word of length n. It contains n+1 trigrams, including:

1      __x trigram
1      _xx trigram
1      xx_ trigram
n - 2  xxx trigrams

Assuming an alphabet of size m, those trigrams have the following average frequencies:

__x  1/((n+1)*m)
_xx  1/((n+1)*m^2)
xx_  1/((n+1)*m^2)
xxx  (n-2)/((n+1)*m^3)

Hmm, but you're assuming that the m letters are all equally frequent, which is surely not true in normal text. I didn't have a machine-readable copy of Hemingway or Tolstoy at hand, but I do have a text file with the collected works of H.P. Lovecraft, so I tried analyzing the trigrams in that. I find that:

* The average word length is 4.78 characters.
* There are 5652 distinct trigrams (discounting some containing digits), the most common one (' t') occurring 81222 times; the average occurrence count is 500.0.
* Considering only trigrams not containing any blanks, there are 4937 of them, the most common one ('the') occurring 45832 times, with an average count of 266.9.
* There are (unsurprisingly) exactly 26 trigrams of the form '  x', with an average count of 19598.5.
* There are 689 trigrams of the form ' xx' or 'xx ', the most common one (' th') occurring 58200 times, with an average count of 1450.1.

So, we've rediscovered the fact that 'the' is the most common word in English text ;-). But I think the significant conclusion for our purposes here is that single-space trigrams are on average about 1450.1/266.9 = 5.4 times more common than the average space-free trigram, and two-space trigrams about 19598.5/266.9 = 73.4 times more common.
So this leads me to the conclusion that we should be using a BLANK_COLOR_SIZE value around 5 or 6 (with maybe something other than a square-law rule for two-space trigrams). Even using your numbers, it shouldn't be 32. The next revision of the patch is attached. The changes are: 1) The notion of penalty is used instead of size. 2) We try to reduce the total penalty to WISH_TRGM_PENALTY, but the hard restriction is MAX_TRGM_COUNT on the total trigram count. 3) Penalties are assigned to particular color trigram classes, i.e. separate penalties for __a, _aa, _a_, and aa_. They are based on an analysis of trigram frequencies in Oscar Wilde's writings. We could end up with different numbers, but I don't think they will be dramatically different. -- With best regards, Alexander Korotkov. trgm-regex-optimize.3.patch Description: Binary data
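For anyone wanting to repeat this kind of corpus analysis, pg_trgm-style trigram extraction (two leading blanks and one trailing blank per word) is easy to model; this is an illustrative script, not the tool Tom used:

```python
import re
from collections import Counter

def word_trigrams(word):
    """pg_trgm pads each word with two leading and one trailing blank,
    then takes every 3-character window."""
    padded = "  " + word + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def trigram_counts(text):
    """Count trigram occurrences over all alphabetic words in a text."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word_trigrams(word))
    return counts
```

Splitting the counts into blank-containing and blank-free buckets and comparing the average counts per distinct trigram reproduces, on a large corpus, ratios of the kind quoted above (roughly 5x for single-blank and 70x for double-blank trigrams).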
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 12:11 PM, Alexander Korotkov wrote: I've rebased catalog changes with last master. Patch is attached. I've rerun my test suite with both last master ('committed') and attached patch ('ternary-consistent'). Thanks! method | sum +-- committed | 143491.71501 fast-scan-11 | 126916.11199 fast-scan-light| 137321.211 fast-scan-light-heikki | 138168.02801 master | 446976.288 ternary-consistent | 125923.514 I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8 to 4. Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make sure we get similar behavior in Tomas' tests that used 6 search terms. But I always felt that it was too large for real queries, once we have the catalog changes, that's why I lowered to 4 when committing. If an opclass benefits greatly from fast scan, it should provide the ternary consistent function, and not rely on the shim implementation. I'm not sure about decision to reserve separate procedure number for ternary consistent. Probably, it would be better to add ginConfig method. It would be useful for post 9.4 improvements. Hmm, it might be useful for an opclass to provide both, a boolean and ternary consistent function, if the boolean version is significantly more efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a quick check through the array to see if there are any MAYBE arguments, within the consistent function. But I'm inclined to keep the possibility to provide both versions. As long as we support the boolean version at all, there's not much difference in terms of the amount of code to support having them both for the same opclass. A ginConfig could be useful for many other things, but I don't think it's worth adding it now. What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We discussed that earlier, but didn't reach any conclusion. 
That needs to be clarified in the docs. One possibility is to document that they're equivalent. Another is to forbid one of them. Yet another is to assign a different meaning to each. I've been thinking that it might be useful to define them so that a MAYBE result from the tri-consistent function means that it cannot decide if you have a match or not, because some of the inputs were MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The practical difference would be that if the tri-consistent function returns TRUE+recheck, ginget.c wouldn't need to bother fetching the other entries, it could just return the entry with recheck=true immediately. While with MAYBE result, it would fetch the other entries and call tri-consistent again. ginget.c doesn't currently use the tri-consistent function that way - it always fetches all the entries for a potential match before calling tri-consistent, but it could. I had it do that in some of the patch versions, but Tomas' testing showed that it was a big loss on some queries, because the consistent function was called much more often. Still, something like that might be sensible in the future, so it might be good to distinguish those cases in the API now. Note that ginarrayproc is already using the return values like that: in GinContainedStrategy, it always returns TRUE+recheck regardless of the inputs, but in other cases it uses GIN_MAYBE. The next revision of the patch is attached. In this version an opclass should provide at least one consistent function: binary or ternary. It's expected to achieve the best performance when an opclass provides both of them. However, tests show the opposite :( I have to recheck it carefully. I've removed the recheck flag from the ternary consistent function; it has to return GIN_MAYBE instead. -- With best regards, Alexander Korotkov. 
gin-ternary-consistent-2.patch Description: Binary data
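The relationship between the boolean and ternary consistent functions that this thread keeps circling around can be modeled with a small sketch: a shim ternary function can be derived from a boolean one by trying every TRUE/FALSE substitution for the MAYBE inputs. (This is a simplified model; the real shim caps the number of MAYBE entries it will expand, per the MAX_MAYBE_ENTRIES discussion above.)

```python
from itertools import product

GIN_FALSE, GIN_TRUE, GIN_MAYBE = 0, 1, 2

def shim_tri_consistent(bool_consistent, args):
    """Derive a ternary answer from a boolean consistent function by
    evaluating it for every TRUE/FALSE assignment of the MAYBE inputs."""
    maybe_idx = [i for i, a in enumerate(args) if a == GIN_MAYBE]
    results = set()
    for bits in product((True, False), repeat=len(maybe_idx)):
        trial = [a == GIN_TRUE for a in args]
        for i, b in zip(maybe_idx, bits):
            trial[i] = b
        results.add(bool(bool_consistent(trial)))
    if results == {True}:
        return GIN_TRUE   # a match regardless of the unknown entries
    if results == {False}:
        return GIN_FALSE  # no match regardless of the unknown entries
    return GIN_MAYBE      # the MAYBE inputs actually matter
```

With an AND-style consistent function (`all`), [GIN_TRUE, GIN_MAYBE] yields GIN_MAYBE, while [GIN_FALSE, GIN_MAYBE] yields GIN_FALSE without ever needing the second entry - which is exactly the skipping opportunity fast scan exploits.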
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov aekorot...@gmail.comwrote: On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 12:11 PM, Alexander Korotkov wrote: I've rebased catalog changes with last master. Patch is attached. I've rerun my test suite with both last master ('committed') and attached patch ('ternary-consistent'). Thanks! method | sum +-- committed | 143491.71501 fast-scan-11 | 126916.11199 fast-scan-light| 137321.211 fast-scan-light-heikki | 138168.02801 master | 446976.288 ternary-consistent | 125923.514 I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8 to 4. Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make sure we get similar behavior in Tomas' tests that used 6 search terms. But I always felt that it was too large for real queries, once we have the catalog changes, that's why I lowered to 4 when committing. If an opclass benefits greatly from fast scan, it should provide the ternary consistent function, and not rely on the shim implementation. I'm not sure about decision to reserve separate procedure number for ternary consistent. Probably, it would be better to add ginConfig method. It would be useful for post 9.4 improvements. Hmm, it might be useful for an opclass to provide both, a boolean and ternary consistent function, if the boolean version is significantly more efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a quick check through the array to see if there are any MAYBE arguments, within the consistent function. But I'm inclined to keep the possibility to provide both versions. As long as we support the boolean version at all, there's not much difference in terms of the amount of code to support having them both for the same opclass. A ginConfig could be useful for many other things, but I don't think it's worth adding it now. What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? 
We discussed that earlier, but didn't reach any conclusion. That needs to be clarified in the docs. One possibility is to document that they're equivalent. Another is to forbid one of them. Yet another is to assign a different meaning to each. I've been thinking that it might be useful to define them so that a MAYBE result from the tri-consistent function means that it cannot decide if you have a match or not, because some of the inputs were MAYBE. And TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or FALSE, the result would be the same, TRUE+recheck. The practical difference would be that if the tri-consistent function returns TRUE+recheck, ginget.c wouldn't need to bother fetching the other entries, it could just return the entry with recheck=true immediately. While with MAYBE result, it would fetch the other entries and call tri-consistent again. ginget.c doesn't currently use the tri-consistent function that way - it always fetches all the entries for a potential match before calling tri-consistent, but it could. I had it do that in some of the patch versions, but Tomas' testing showed that it was a big loss on some queries, because the consistent function was called much more often. Still, something like that might be sensible in the future, so it might be good to distinguish those cases in the API now. Note that ginarrayproc is already using the return values like that: in GinContainedStrategy, it always returns TRUE+recheck regardless of the inputs, but in other cases it uses GIN_MAYBE. Next revision of patch is attached. In this version opclass should provide at least one consistent function: binary or ternary. It's expected to achieve best performance when opclass provide both of them. However, tests shows opposite :( I've to recheck it carefully. However, it's not! This revision of patch completes my test case in 123330 ms. This is slightly faster than previous revision. -- With best regards, Alexander Korotkov.
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 13, 2014 at 1:54 AM, Marti Raudsepp ma...@juffo.org wrote: I think the 1st patch now has a bug in initial_cost_mergejoin; you still pass the presorted_keys argument to cost_sort, making it calculate a partial sort cost, but generated plans never use partial sort. I think 0 should be passed instead. Patch attached, needs to be applied on top of partial-sort-basic-1 and then reverse-applied on partial-sort-merge-1. It doesn't look so to me. Merge join doesn't look for partial sort specifically, but if a path with some presorted pathkeys happens to be selected, then partial sort will be used. See the create_mergejoin_plan function. So, I think this cost_sort call is consistent with create_mergejoin_plan. If we don't want partial sort to be used in such rare cases, then we should revert it in both places. However, I doubt that it adds any overhead, so we can leave it as is. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Small GIN optimizations (after 9.4)
On Wed, Feb 12, 2014 at 2:58 AM, Bruce Momjian br...@momjian.us wrote: On Sun, Feb 9, 2014 at 02:17:12PM +0400, Alexander Korotkov wrote: On Thu, Feb 6, 2014 at 8:31 PM, PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at wrote: i think there is one more thing which would be really good in GIN and which would solve a ton of issues. atm GIN entries are sorted by item pointer. if we could sort them by a column it would fix a couple of real work issues such as ... SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10 ... or so. it many cases you want to search for a, say, product and find the cheapest / most expensive one. if the tsearch_query yields a high number of rows (which it often does) the subsequent sort will kill you. This is not intended to be a small change. However, some solution might be possible in post 9.4 gin improvements or in new secret indexing project which will be presented at PGCon :-) Would any of the listed changes cause backward-incompatible changes to the on-disk format, causing problems for pg_upgrade? None of them. -- With best regards, Alexander Korotkov.
Re: [HACKERS] PoC: Partial sort
On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote: On Sun, Feb 9, 2014 at 7:37 PM, Alexander Korotkov aekorot...@gmail.com wrote: This is not the only place that worries me about planning overhead. See get_cheapest_fractional_path_for_pathkeys. I had to estimate the number of groups for each sorting column in order to get the right fractional path. AFAICT this only happens once per plan and the overhead is O(n) to the number of pathkeys? I can't get worried about that, but I guess it's better to test anyway. PS: You didn't answer my questions about splitting the patch. I guess I'll have to do that anyway to run the tests. Done. The patch is split. -- With best regards, Alexander Korotkov. partial-sort-basic-1.patch.gz Description: GNU Zip compressed data partial-sort-merge-1.patch.gz Description: GNU Zip compressed data
Re: [HACKERS] GSoC 2014 - mentors, students and admins
Hi! On Tue, Jan 28, 2014 at 9:34 PM, Thom Brown t...@linux.com wrote: And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Thanks for your work. I would like to see you as admin again this year. Who would be up for mentoring this year? And are there any project ideas folk would like to suggest? I would like to be a mentor. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Feb 7, 2014 at 5:33 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/06/2014 01:22 PM, Alexander Korotkov wrote: Difference is very small. For me, it looks ready for commit. Great, committed! Now, to review the catalog changes... I've rebased the catalog changes onto the latest master. Patch is attached. I've rerun my test suite with both the latest master ('committed') and the attached patch ('ternary-consistent'):

 method                 |     sum
------------------------+---------------
 committed              | 143491.71501
 fast-scan-11           | 126916.11199
 fast-scan-light        | 137321.211
 fast-scan-light-heikki | 138168.02801
 master                 | 446976.288
 ternary-consistent     | 125923.514

I explain the regression in the latest master by the change of MAX_MAYBE_ENTRIES from 8 to 4. However, I'm not sure why ternary-consistent shows such good results. Probably it's because of some optimizations you did in the committed version, which weren't visible before because of the change of MAX_MAYBE_ENTRIES. I'm not sure about the decision to reserve a separate procedure number for the ternary consistent function. Probably, it would be better to add a ginConfig method. It would be useful for post-9.4 improvements. -- With best regards, Alexander Korotkov. gin-ternary-consistent.patch Description: Binary data
Re: [HACKERS] Small GIN optimizations (after 9.4)
On Thu, Feb 6, 2014 at 8:31 PM, PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at wrote: i think there is one more thing which would be really good in GIN and which would solve a ton of issues. atm GIN entries are sorted by item pointer. if we could sort them by a column it would fix a couple of real work issues such as ... SELECT ... FROM foo WHERE tsearch_query ORDER BY price DESC LIMIT 10 ... or so. it many cases you want to search for a, say, product and find the cheapest / most expensive one. if the tsearch_query yields a high number of rows (which it often does) the subsequent sort will kill you. This is not intended to be a small change. However, some solution might be possible in post 9.4 gin improvements or in new secret indexing project which will be presented at PGCon :-) -- With best regards, Alexander Korotkov.
Re: [HACKERS] Small GIN optimizations (after 9.4)
On Thu, Feb 6, 2014 at 3:36 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: While hacking on the GIN patches, I've come up with a few different ideas for improving performance. It's too late for 9.4, but I'll list them here if someone wants to work on them later: * Represent ItemPointers as uint64's, to speed up comparisons. ginCompareItemPointers is inlined into only a few instructions, but it's still more expensive than a single-instruction 64-bit comparison. ginCompareItemPointers is called very heavily in a GIN scan, so even a small improvement there would make for a noticeable speedup. It might be an improvement in code clarity, too. * Keep the entry streams of a GinScanKey in a binary heap, to quickly find the minimum curItem among them. I did this in various versions of the fast scan patch, but then I realized that the straightforward way of doing it is wrong, because a single GinScanEntry can be part of multiple GinScanKeys. If an entry's curItem is updated as part of advancing one key, and the entry is in a heap of another key, updating the curItem can violate the heap property of the other entry's heap. * Build a truth table (or cache) of consistent-function's results, and use that instead of calling consistent for every item. Caching seems more appropriate to me. Intuition tells me that when the number of entries is high, far from all consistent combinations will actually be used. However, intuition might be false :-) * Deduce AND or OR logic from the consistent function. Or have the opclass provide a tree of AND/OR/NOT nodes directly, instead of a consistent function. For example, if the query is foo & bar, we could avoid calling the consistent function altogether, and only return items that match both. I also had this idea, but this solution doesn't cover similarity queries: if you have 20 entries and the consistent function returns true when at least 10 of 20 are present, then the representation in AND/OR/NOT nodes would be enormous, and so useless. 
* Delay decoding segments during a scan. Currently, we decode all segments of a posting tree page into a single array at once. But with fast scan, we might be able to skip over all entries in some of the segments. So it would be better to copy the segments into backend-private memory in compressed format, and decode them one segment at a time (or maybe even one item at a time), when needed. That would avoid the unnecessary decoding of segments that can be skipped over, and would also reduce the memory usage of a scan. I would like to add another one as a continuation of fast scan: * Skip scanning some entries entirely, forcing recheck instead. The right decision should be made based on costs. However, I'm not sure about the design, because it's like a planning feature. How appropriate is it to do this inside GIN? -- With best regards, Alexander Korotkov.
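The truth-table/cache idea from the list above can be sketched like this: memoize the consistent function on the tuple of per-entry match flags, so a combination that has been seen before never pays the function-call cost again. A simplified model, keyed on boolean flags only (the real thing would live in ginget.c and would also need to handle MAYBE states):

```python
def make_cached_consistent(consistent):
    """Wrap a consistent function with a cache keyed on the tuple of
    entry-match flags; with k query entries there are at most 2^k keys."""
    cache = {}
    def cached(check):
        key = tuple(check)
        if key not in cache:
            cache[key] = consistent(key)
        return cache[key]
    return cached
```

This matches the intuition stated above: the cache only wins if the number of distinct flag combinations actually seen during the scan is much smaller than the number of items, which is plausible when the entry count is high.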
Re: [HACKERS] PoC: Partial sort
On Thu, Feb 6, 2014 at 12:39 PM, Marti Raudsepp ma...@juffo.org wrote: On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, sounds a little steep. Why is it so expensive? I'm probably missing something here, because I would have thought that planner support for partial sorts would consist mostly of considering the same sorts we consider today, but with the costs reduced by the batching. I guess it's because the patch undoes some optimizations in the mergejoin planner wrt caching merge clauses and adds a whole lot of code to find_mergeclauses_for_pathkeys. In other code paths the overhead does seem to be negligible. Notice the removal of:
/* Select the right mergeclauses, if we didn't already */
/*
 * Avoid rebuilding clause list if we already made one;
 * saves memory in big join trees...
 */
This is not the only place that worries me about planning overhead. See get_cheapest_fractional_path_for_pathkeys. I had to estimate the number of groups for each sorting column in order to get the right fractional path. For a partial sort path, the cost of the first batch should be included in the startup cost. If we don't do so, the optimizer can pick strange plans based on the assumption that it needs only a few rows from the inner node. See an example.
create table test1 as (
    select id, (random()*100)::int as v1, (random()*1)::int as v2
    from generate_series(1,100) id);
create table test2 as (
    select id, (random()*100)::int as v1, (random()*1)::int as v2
    from generate_series(1,100) id);
create index test1_v1_idx on test1 (v1);

Plan without fraction estimation in get_cheapest_fractional_path_for_pathkeys:

postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 order by t1.v1, t1.id limit 10;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=198956893.20..198956913.33 rows=10 width=24)
   ->  Partial sort  (cost=198956893.20..19909637942.82 rows=9791031169 width=24)
         Sort Key: t1.v1, t1.id
         Presorted Key: t1.v1
         ->  Nested Loop  (cost=0.42..19883065506.84 rows=9791031169 width=24)
               Join Filter: (t1.v1 = t2.v1)
               ->  Index Scan using test1_v1_idx on test1 t1  (cost=0.42..47600.84 rows=100 width=12)
               ->  Materialize  (cost=0.00..25289.00 rows=100 width=12)
                     ->  Seq Scan on test2 t2  (cost=0.00..15406.00 rows=100 width=12)
(9 rows)

Current version of patch:

postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 order by t1.v1, t1.id limit 10;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Limit  (cost=3699913.43..3699913.60 rows=10 width=24)
   ->  Partial sort  (cost=3699913.43..173638549.67 rows=9791031169 width=24)
         Sort Key: t1.v1, t1.id
         Presorted Key: t1.v1
         ->  Merge Join  (cost=150444.79..147066113.70 rows=9791031169 width=24)
               Merge Cond: (t1.v1 = t2.v1)
               ->  Index Scan using test1_v1_idx on test1 t1  (cost=0.42..47600.84 rows=100 width=12)
               ->  Materialize  (cost=149244.84..154244.84 rows=100 width=12)
                     ->  Sort  (cost=149244.84..151744.84 rows=100 width=12)
                           Sort Key: t2.v1
                           ->  Seq Scan on test2 t2  (cost=0.00..15406.00 rows=100 width=12)
(11 rows)

I don't compare actual execution times because I didn't wait for the first plan to finish executing :-) But anyway the costs are extraordinary, and the inner sequential scan of 100 rows is odd. -- With best regards, Alexander Korotkov.
Re: [HACKERS] trgm regex index peculiarity
On Thu, Jan 16, 2014 at 3:34 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: Revised version of patch with necessary comments. I looked at this patch a bit. It seems like this:
+ * BLANK_COLOR_SIZE - How much blank character is more frequent than
+ * other character in average
+ #define BLANK_COLOR_SIZE 32
is a drastic overestimate. Using the program Erik provided to generate some sample data, I find that blanks in trigrams are maybe about three times more common than other characters. Possibly Erik's data isn't very representative of real text, but if that's the case we'd better get a more representative sample case before we start optimizing ... Now maybe the above number is okay for this purpose anyway, but if so it needs some different justification; maybe call it a penalty? But nonetheless it seems like a darn large penalty, particularly for __x type trigrams where the value would effectively be squared. Compared to the proposed values of MAX_TRGM_SIZE and WISH_TRGM_SIZE, it seems like you might as well forget the sizing approach and just flat out reject trigrams containing COLOR_BLANK, because they'll never get through the size filter. That seems like the right decision to me when we have other trigrams that can reject them. But there are cases when we can't. I'm inclined to think you need a larger MAX_TRGM_SIZE; and WISH_TRGM_SIZE seems mighty small as well, compared to what the code would have done before. OK, let me give more detailed reasoning for the penalty. Let's consider a word of length n.
It contains n+1 trigrams, including:
  1 __x trigram
  1 _xx trigram
  1 xx_ trigram
  n-2 xxx trigrams
Assuming an alphabet of size m, those trigrams have the following average frequencies:
  __x: 1/((n+1)*m)
  _xx: 1/((n+1)*m^2)
  xx_: 1/((n+1)*m^2)
  xxx: (n-2)/((n+1)*m^3)
The ratios of these blank-containing frequencies to the frequency of xxx are:
  __x: m^2/(n-2)
  _xx and xx_: m/(n-2)
In order to estimate n, I've analyzed some classics (average word length):
  Ernest Hemingway, A Farewell to Arms - 3.8
  Leo Tolstoy, War and Peace - 5.0
In English, with alphabet size = 26, we have:
  __x: m^2/(n-2) = 375
  _xx and xx_: m/(n-2) = 14.4
In Russian, with alphabet size = 33, we have:
  __x: m^2/(n-2) = 363
  _xx and xx_: m/(n-2) = 11
Since these calculations are approximate, we can safely use the same values for both languages. Does such reasoning make sense? Also, surely this bit: ! trgmNFA->totalTrgmCount = (int) totalTrgmSize; is flat out wrong? expandColorTrigrams() expects that trgmNFA->totalTrgmCount is the truth, not some penalty-bloated abstraction. The fact that the patch hasn't failed on you completely demonstrates that you've not tested any cases where blank-containing trigrams got through the filter, reinforcing my feeling that it's probably too strict now. Oh, I wonder how I could leave such a weird bug in the patch :-( I wonder if it would be more appropriate to continue to measure the limit MAX_TRGM_COUNT in terms of actual trigram counts, and just use the penalized size as the sort key while determining which color trigrams to discard first. Agreed. But I would like to keep some equivalent of WISH_TRGM_SIZE. Another thought is that it's not clear that you should apply the same penalty to blanks in all three positions. Because of the padding settings, any one word will normally lead to padded strings "  a", " ab", "yz ", so that spaces in the first position are about twice as common as spaces in the other positions.
(It's a little less than that because single-letter words produce only "  a" and " a "; but I'd think that such words aren't very common in text that trigrams are effective for.) So I'm thinking the penalty ought to take that into account. I'm also inclined to think that it might be worth adding a size field to ColorTrgmInfo rather than repeatedly calculating the size estimate. We only allow a thousand and some ColorTrgmInfos at most, so the extra space wouldn't be that large, and I'm concerned by the expense the current patch adds to the sort comparator. Agreed. Another point is that this comment:
 * Note: per-color-trigram counts cannot overflow an int so long as
 * COLOR_COUNT_LIMIT is not more than the cube root of INT_MAX, ie about
 * 1290.  However, the grand total totalTrgmCount might conceivably
 * overflow an int, so we use int64 for that within this routine.
is no longer valid, or at least it fails to demonstrate that the result of getColorTrigramSize() can't overflow an int; indeed I rather fear it can. The patch failed to even update the variable name in this comment, let alone address the substantive question. There are some other cosmetic things I didn't like, for instance colorTrgmInfoCountCmp() is no longer comparing counts but you didn't rename it. That's about it for substantive comments though. Thanks, will be fixed. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Feb 6, 2014 at 2:21 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/05/2014 12:42 PM, Alexander Korotkov wrote: Attached patch is a light version of fast scan. It does extra consistent function calls only on startScanKey, no extra calls during the scan of the index. It finds a subset of the rarest entries whose absence guarantees a false consistent-function result. Sounds good. Since the extra consistent calls are only performed at startup, it's unlikely to cause any noticeable performance regression, even when it's not helping. I've run real-life tests based on postgresql.org logs (33318 queries). Here is a table with the total time of running the whole test case.
=# select method, sum(time) from test_result group by method order by method;
       method        |     sum
---------------------+--------------
 fast-scan-11        | 126916.11199
 fast-scan-light     | 137321.211
 heikki              | 466751.693
 heikki-true-ternary | 454113.62397
 master              | 446976.288
(6 rows)
where 'heikki' is gin-ternary-logic binary-heap preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is the version with my catalog changes promoting the ternary consistent function to the opclass. Looks good. We can see fast-scan-light gives almost the same performance benefit on these queries. However, I tested only the CPU effect, not the IO effect. Any thoughts? This test doesn't handle lossy pages correctly:
/*
 * Check if all items marked as scanEntryUseForSkip are not present in
 * minItem. If so, we can correct advancePast.
 */
if (ginCompareItemPointers(minItem, minItemSkip) < 0)
{
    advancePast = minItemSkip;
    advancePast.ip_posid--;
    continue;
}
If minItemSkip is a lossy page, we skip all exact items on the same page.
That's wrong, here's a test case to demonstrate:
CREATE TABLE foo (ts tsvector);
INSERT INTO foo SELECT to_tsvector('foo bar'||g) from generate_series(1, 100) g;
CREATE INDEX i_foo ON foo USING gin (ts);
set work_mem='64'; -- to force lossy pages
-- should return 11
select count(*) from foo where ts @@ to_tsquery('foo & bar4:*');
After some fiddling (including fixing the above), I ended up with the attached version of your patch. I still left out the catalog changes, again not because I don't think we should have them, but to split this into smaller patches that can be reviewed separately. I extended the "both ways" shim function to work with more than one "maybe" input. Of course, that is O(n^2), where n is the number of "maybe" arguments, so that won't scale, but it's OK for testing purposes. And many if not most real world queries, too. I had a very hard time understanding what it really means for an entry to be in the scanEntryUseForSkip array. What does it mean to "use an entry for skipping"? So I refactored that logic, to hopefully make it more clear. In essence, the entries are divided into two groups, required and other items. If none of the entries in the required set are true, then we know that the overall qual cannot match. See the comments for a more detailed explanation. I'm not wedded to the term "required" here; one might think that *all* the entries in the required set must be TRUE for a match, while it's actually that at least one of them must be TRUE. In keyGetItem, we fetch the minimum item among all the entries in the requiredEntries set. That's the next item we're going to return, regardless of any items in the otherEntries set. But before calling the consistent function, we advance the other entries up to that point, so that we know whether to pass TRUE or FALSE to the consistent function. IOW, otherEntries only provide extra information for the consistent function to decide if we have a match or not - they don't affect which items we return to the caller.
I think this is pretty close to committable state now. I'm slightly disappointed that we can't do the skipping in more scenarios. For example, in "foo & bar", we can currently skip the "bar" entry up to the next "foo", but we cannot skip "foo" up to the next "bar". But this is fairly simple, and since we sort the entries by estimated frequency, it should already cover all the pathological "rare & frequent" type queries. So that's OK. Sorry for my scanty explanation. Your version of the patch looks good to me. I've rerun my test case:
=# select method, sum(time) from test_result group by method order by method;
         method         |     sum
------------------------+--------------
 fast-scan-11           | 126916.11199
 fast-scan-light        | 137321.211
 fast-scan-light-heikki | 138168.02801
 heikki                 | 466751.693
 heikki-tru-ternary     | 454113.62397
 master                 | 446976.288
 tri-state
Re: [HACKERS] GIN improvements part2: fast scan
On Wed, Feb 5, 2014 at 1:23 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Feb 3, 2014 at 6:31 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.com wrote: Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches:
# select count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)
Time: 21,252 ms
In the original fast scan:
# select count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)
Time: 3,382 ms
I'm trying to get deeper into it. I had two guesses about why it has become so much slower than my original fast scan: 1) Not using the native consistent function 2) Not sorting entries. I attach two patches which roll back these two features (sorry for the awful quality of the second). The native consistent function accelerates things significantly, as expected. It seems that sorting entries has almost no effect. However, it's still not as fast as the initial fast scan:
# select count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)
Time: 5,381 ms
Tomas, could you rerun your tests with the first and both of these patches applied against the patches by Heikki? I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces a ternary consistent function to the opclass but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So, it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? The right version of these two patches, combined into one against current head, is attached.
I've rerun tests with it; the results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun the postprocessing, including graph drawing? Sometimes test cases are not what we expect. For example:
=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
                                            QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────
 Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
   ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
         Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
 Planning time: 0.257 ms
(5 rows)
5alpha1-initdb'd is 3 GIN entries with different frequencies. Also, these patches are not intended to change the speed of relevance ordering. When the number of results is high, most of the time is spent on relevance calculation and sorting. I propose removing the ORDER BY clause from the test cases to see the scan speed more clearly. I've got a dump of postgresql.org search queries from Magnus. We can add them to our test case. It turns out that version 10 contained a bug in the ternary consistent function implementation for tsvector. Fixed in the attached version. The attached patch is a light version of fast scan. It does extra consistent function calls only on startScanKey, no extra calls during the scan of the index. It finds a subset of the rarest entries whose absence guarantees a false consistent-function result. I've run real-life tests based on postgresql.org logs (33318 queries). Here is a table with the total time of running the whole test case.
=# select method, sum(time) from test_result group by method order by method;
       method        |     sum
---------------------+--------------
 fast-scan-11        | 126916.11199
 fast-scan-light     | 137321.211
 heikki              | 466751.693
 heikki-true-ternary | 454113.62397
 master              | 446976.288
(6 rows)
where 'heikki' is gin-ternary-logic binary-heap preconsistent-only-on-new-page.patch and 'heikki-true-ternary' is the version with my catalog changes promoting the ternary consistent function to the opclass. We can see fast-scan-light gives almost the same performance benefit on these queries. However, I tested only the CPU effect, not the IO effect. Any thoughts? -- With best regards, Alexander Korotkov.
gin-fast-scan-light.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix picksplit with nan values
On Sat, Feb 1, 2014 at 7:50 AM, Bruce Momjian br...@momjian.us wrote: Where are we on this? I found that I have an empty draft letter from November with a new version of the patch attached. I'll return to this once we have some resolution of the GIN fast scan challenge. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 6:31 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/26/2014 08:24 AM, Tomas Vondra wrote: Hi! On 25.1.2014 22:21, Heikki Linnakangas wrote: Attached is a new version of the patch set, with those bugs fixed. I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines, and indeed I got 100% match. I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this: SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)') ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC; i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - sort by a list of words and give me the best ranked results. On 9.3, the script was running for ~23 seconds, on patched HEAD it was ~40. It's perfectly reproducible, I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k). Attached is a simple chart with x-axis used for durations measured on 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty obvious from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e.
if the query took 100ms on 9.3, it takes 150ms on HEAD). Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI Clearly the main difference is in the Bitmap Index Scan which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this:
34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
 9.36% postgres [.] TS_execute
 5.36% postgres [.] check_stack_depth
 3.57% postgres [.] FunctionCall8Coll
while on 9.4 it looks like this:
28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
 8.08% postgres [.] check_stack_depth
 7.11% postgres [.] FunctionCall8Coll
 4.34% postgres [.] shimTriConsistentFn
Not sure how to interpret that, though. For example, where did the ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not. Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it. The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little feature I explained earlier: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful.
It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect. Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Jan 27, 2014 at 7:30 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jan 27, 2014 at 2:32 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/26/2014 08:24 AM, Tomas Vondra wrote: Hi! On 25.1.2014 22:21, Heikki Linnakangas wrote: Attached is a new version of the patch set, with those bugs fixed. I've done a bunch of tests with all the 4 patches applied, and it seems to work now. I've done tests with various conditions (AND/OR, number of words, number of conditions) and so far I did not get any crashes, infinite loops or anything like that. I've also compared the results to 9.3 - by dumping the database and running the same set of queries on both machines, and indeed I got 100% match. I also did some performance tests, and that's when I started to worry. For example, I generated and ran 1000 queries that look like this: SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(header & 53 & 32 & useful & dropped)') ORDER BY ts_rank(body_tsvector, to_tsquery('english','(header & 53 & 32 & useful & dropped)')) DESC; i.e. in this case the query always was 5 words connected by AND. This query is a pretty common pattern for fulltext search - sort by a list of words and give me the best ranked results. On 9.3, the script was running for ~23 seconds, on patched HEAD it was ~40. It's perfectly reproducible, I've repeated the test several times with exactly the same results. The test is CPU bound, there's no I/O activity at all. I got the same results with more queries (~100k). Attached is a simple chart with x-axis used for durations measured on 9.3.2, y-axis used for durations measured on patched HEAD. It's obvious a vast majority of queries is up to 2x slower - that's pretty obvious from the chart. Only about 50 queries are faster on HEAD, and 700 queries are more than 50% slower on HEAD (i.e. if the query took 100ms on 9.3, it takes 150ms on HEAD).
Typically, the EXPLAIN ANALYZE looks something like this (on 9.3): http://explain.depesz.com/s/5tv and on HEAD (same query): http://explain.depesz.com/s/1lI Clearly the main difference is in the Bitmap Index Scan which takes 60ms on 9.3 and 120ms on HEAD. On 9.3 the perf top looks like this:
34.79% postgres [.] gingetbitmap
28.96% postgres [.] ginCompareItemPointers
 9.36% postgres [.] TS_execute
 5.36% postgres [.] check_stack_depth
 3.57% postgres [.] FunctionCall8Coll
while on 9.4 it looks like this:
28.20% postgres [.] gingetbitmap
21.17% postgres [.] TS_execute
 8.08% postgres [.] check_stack_depth
 7.11% postgres [.] FunctionCall8Coll
 4.34% postgres [.] shimTriConsistentFn
Not sure how to interpret that, though. For example, where did the ginCompareItemPointers go? I suspect it's thanks to inlining, and that it might be related to the performance decrease. Or maybe not. Yeah, inlining makes it disappear from the profile, and spreads that time to the functions calling it. The profile tells us that the consistent function is called a lot more than before. That is expected - with the fast scan feature, we're calling consistent not only for potential matches, but also to refute TIDs based on just a few entries matching. If that's effective, it allows us to skip many TIDs and avoid consistent calls, which compensates, but if it's not effective, it's just overhead. I would actually expect it to be fairly effective for that query, so that's a bit surprising. I added counters to see where the calls are coming from, and it seems that about 80% of the calls are actually coming from this little feature I explained earlier: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item.
I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. Attached is a quick patch to remove that, on top of all the other patches, if you want to test the effect. Every single change you did in fast scan seems to be reasonable, but testing shows that something went wrong. Simple test with 3 words of different selectivities. After applying your patches
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 30, 2014 at 8:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/30/2014 01:53 AM, Tomas Vondra wrote: (3) A file with explain plans for 4 queries suffering ~2x slowdown, and explain plans with 9.4 master and Heikki's patches is available here: http://www.fuzzy.cz/tmp/gin/queries.txt All the queries have 6 common words, and the explain plans look just fine to me - exactly like the plans for other queries. Two things now caught my eye. First, some of these queries actually have words repeated - either exactly, like "database & database", or in negated form, like "!anything & anything". Second, while generating the queries, I use dumb frequency, where only exact matches count. I.e. write != written etc. But the actual number of hits may be much higher - for example "write" matches exactly just 5% of documents, but using @@ it matches more than 20%. I don't know if that's the actual cause though. I tried these queries with the data set you posted here: http://www.postgresql.org/message-id/52e4141e.60...@fuzzy.cz. The reason for the slowdown is the extra consistent calls it causes. That's expected - the patch certainly does call consistent in situations where it didn't before, and if the pre-consistent checks are not able to eliminate many tuples, you lose. So, what can we do to mitigate that? Some ideas:
1. Implement the catalog changes from Alexander's patch. That ought to remove the overhead, as you only need to call the consistent function once, not both ways. OTOH, currently we only call the consistent function if there is only one unknown column. If, with the catalog changes, we always call the consistent function even if there are more unknown columns, we might end up calling it even more often.
2. Cache the result of the consistent function.
3. Make the consistent function cheaper. (How? Magic?)
4. Use some kind of a heuristic, and stop doing the pre-consistent checks if they're not effective. Not sure what the heuristic would look like.
I'm fiddling around with the following idea for such a heuristic:
1) Sort entries in ascending order of their TID-count estimates. Assume the number of entries to be n.
2) Sequentially call the ternary consistent function with the first m entries set to GIN_FALSE and the rest n-m set to GIN_MAYBE, until it returns GIN_FALSE.
3) Decide whether to use fast scan depending on the number of TIDs in the first m entries and the number of TIDs in the rest n-m entries. Something like
   number_of_TIDs_in_first_m_entries * k < number_of_TIDs_in_rest_n-m_entries
where k is some quotient. For sure, it's a rough method, but it should be fast. Without a native implementation of the ternary consistent function, the algorithm will be slightly different: only m values close to n should be checked. I'm going to implement this heuristic against the last version of your patch. -- With best regards, Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Tue, Jan 28, 2014 at 9:32 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/28/2014 04:12 PM, Alexander Korotkov wrote: On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: 4. (as you mentioned in the other thread: ) It's a modularity violation that you peek into the heap tuple from gist. I think the proper way to do this would be to extend the IndexScan executor node to perform the re-shuffling of tuples that come from the index in wrong order, or perhaps add a new node type for it. Of course that's exactly what your partial sort patch does :-). I haven't looked at that in detail, but I don't think the approach the partial sort patch takes will work here as is. In the KNN-GiST case, the index is returning tuples roughly in the right order, but a tuple that it returns might in reality belong somewhere later in the ordering. In the partial sort patch, the input stream of tuples is divided into non-overlapping groups, so that the tuples within the group are not sorted, but the groups are. I think the partial sort case is a special case of the KNN-GiST case, if you consider the lower bound of each tuple to be the leading keys that you don't need to sort. Yes. But, for instance, btree accesses the heap for unique checking. Is it really so criminal? :-) Well, it is generally considered an ugly hack in b-tree too. I'm not 100% opposed to doing such a hack in GiST, but would very much prefer not to. This is not only a question of a new node or of extending an existing node. We need to teach the planner/executor that an access method can return the value of some expression which is a lower bound of another expression. AFAICS, an access method can now return only the original indexed datums and TIDs. So, I'm afraid that enormous infrastructure changes are required. And I can hardly imagine what they should look like. Yeah, I'm not sure either. Maybe a new field in IndexScanDesc, along with xs_itup. Or as an attribute of xs_itup itself. This shouldn't look like a hack either.
Otherwise I see no point in it: it's better to have an isolated hack in the access method than a hack in the planner/executor. So I see the following changes as needed to implement this the right way: 1) Implement a new relation between operators: operator1 is a lower bound of operator2. 2) Extend the am interface to let it return values of operators. 3) Implement a new node for knn-sorting. However, this requires a lot of changes to PostgreSQL infrastructure and may turn out not to be general enough either (we won't know until we have another application). -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 7:24 PM, Tomas Vondra t...@fuzzy.cz wrote: On 3 Únor 2014, 15:31, Alexander Korotkov wrote: I found my patch 0005-Ternary-consistent-implementation.patch to be completely wrong. It introduces the ternary consistent function to the opclass, but doesn't use it, because I forgot to include the ginlogic.c change in the patch. So it shouldn't make any impact on performance. However, the testing results with that patch differ significantly. That makes me very uneasy. Can we now reproduce the exact same results? Do I understand it correctly that the 0005 patch should give exactly the same performance as the 9.4-heikki branch (as it was applied on top of it, and effectively did no change)? This wasn't exactly what I measured, although the differences were not that significant. Do I understand correctly that it's 9.4-heikki and 9.4-alex-1 here: http://www.fuzzy.cz/tmp/gin/# For some queries the times differ. I wonder why. I can rerun the tests, if that's what you're asking for. I'll improve the test a bit - e.g. I plan to average multiple runs, to filter out random noise (which might be significant for such short queries). The right version of these two patches, combined into one against current head, is attached. I've rerun tests with it; results are in /mnt/sas-raid10/gin-testing/queries/9.4-fast-scan-10. Could you rerun postprocessing, including graph drawing? Yes, I'll do that. However I'll have to rerun the other tests too, because the previous runs were done on a different machine. I'm a bit confused right now. The previous patches (0005 + 0007) were supposed to be applied on top of the 4 from Heikki (0001-0004), right? AFAIK those were not committed yet, so why is this version against HEAD?
To summarize, I know of these patch sets:

9.4-heikki (old version)
  0001-Optimize-GIN-multi-key-queries.patch
  0002-Further-optimize-the-multi-key-GIN-searches.patch
  0003-Further-optimize-GIN-multi-key-searches.patch
  0004-Add-the-concept-of-a-ternary-consistent-check-and-us.patch
9.4-alex-1 (based on 9.4-heikki)
  0005-Ternary-consistent-implementation.patch
9.4-alex-1 (based on 9.4-alex-1)
  0006-Sort-entries.patch

From these patches I only need to compare 9.4-heikki (old version) and 9.4-alex-1 to resolve my doubts.

9.4-heikki (new version)
  gin-ternary-logic+binary-heap+preconsistent-only-on-new-page.patch
9.4-alex-2 (new version)
  gin-fast-scan.10.patch.gz

Or did I get that wrong? Only, you mentioned 9.4-alex-1 twice; I'm afraid there's some mess in the numbering. Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
   ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
         Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
 Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words doesn't mean 3 gin entries. Also, these patches are not intended to change relevance-ordering speed. When the number of results is high, most of the time goes to relevance calculation and sorting. I propose to remove the ORDER BY clause from the test cases to see the scan speed more clearly. Sure, I can do that. Or maybe one set of queries with ORDER BY, the other one without it. Good. I've got a dump of postgresql.org search queries from Magnus.
We can add them to our test case. You mean the search queries from the mailing list archives search? Sure, we can add that. Yes. I'll transform it into tsquery and send it to you privately. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Mon, Feb 3, 2014 at 8:19 PM, Tomas Vondra t...@fuzzy.cz wrote: Sometimes test cases are not what we expect. For example:

=# explain SELECT id FROM messages WHERE body_tsvector @@ to_tsquery('english','(5alpha1-initdb''d)');
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on messages  (cost=84.00..88.01 rows=1 width=4)
   Recheck Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
   ->  Bitmap Index Scan on messages_body_tsvector_idx  (cost=0.00..84.00 rows=1 width=0)
         Index Cond: (body_tsvector @@ '''5alpha1-initdb'' & ''5alpha1'' & ''initdb'' & ''d'''::tsquery)
 Planning time: 0.257 ms
(5 rows)

5alpha1-initdb'd is 3 gin entries with different frequencies. Why do you find that strange? The way the query is formed or the way it's evaluated? The query generator certainly is not perfect, so it may produce some strange queries. I just mean that in this case 3 words doesn't mean 3 gin entries. Isn't that expected? I mean, that's what to_tsquery may do, right? Everything is absolutely correct. :-) It just may not be what you expect if you don't get into the details. -- With best regards, Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/13/2014 07:17 PM, Alexander Korotkov wrote: Here goes a description of this patch, the same as in the original thread. KNN-GiST provides the ability to get ordered results from an index, but this order is based only on index information. For instance, a GiST index contains bounding rectangles for polygons, and we can't get the exact distance to a polygon from the index (the situation is similar in PostGIS). In the attached patch, the GiST distance method can set a recheck flag (similar to the consistent method). This flag means that the distance method returned a lower bound of the distance, and we should recheck it from the heap. See an example.

create table test as (select id, polygon(3+(random()*10)::int, circle(point(random(), random()), 0.0003 + random()*0.001)) as p from generate_series(1,100) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test order by p <-> point(0.5,0.5) limit 10;
   id   |       ?column?
--------+----------------------
 755611 | 0.000405855808916853
 807562 | 0.000464123777564343
 437778 | 0.000738524708741959
 947860 |  0.00076250998760724
 389843 | 0.000886362723569568
  17586 | 0.000981960100555216
 411329 |  0.00145338112316853
 894191 |  0.00149399559703506
 391907 |   0.0016647896049741
 235381 |  0.00167554614889509
(10 rows)

It's fast, using just an index scan.

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230 rows=10 loops=1)
   ->  Index Scan using test_idx on test  (cost=0.29..157672.29 rows=100 width=36) (actual time=0.179..0.228 rows=10 loops=1)
         Order By: (p <-> '(0.5,0.5)'::point)
 Total runtime: 0.305 ms
(4 rows)

Nice! Some thoughts: 1. This patch introduces a new polygon <-> point operator. That seems useful on its own, with or without this patch. Yeah, but exact-knn can't come without at least one implementation. But it would better come as a separate patch. 2.
I wonder how useful it really is to allow mixing exact and non-exact return values from the distance function. The distance function included in the patch always returns recheck=true. I have a feeling that all other distance functions will also always return either true or false. For geometrical datatypes, recheck variations in consistent methods are also very rare (I can't remember any). But imagine an opclass for arrays where keys have different representations depending on array length. For such an opclass, with KNN on similarity, the recheck flag could be useful. 3. A binary heap would be a better data structure to buffer the rechecked values. A Red-Black tree allows random insertions and deletions, but in this case you need to insert arbitrary values but only remove the minimum item. That's exactly what a binary heap excels at. We have a nice binary heap implementation in the backend that you can use, see src/backend/lib/binaryheap.c. Hmm. For me, a binary heap would be a better data structure for KNN-GiST in general :-) 4. (as you mentioned in the other thread: ) It's a modularity violation that you peek into the heap tuple from gist. I think the proper way to do this would be to extend the IndexScan executor node to perform the re-shuffling of tuples that come from the index in the wrong order, or perhaps add a new node type for it. Of course that's exactly what your partial sort patch does :-). I haven't looked at that in detail, but I don't think the approach the partial sort patch takes will work here as is. In the KNN-GiST case, the index is returning tuples roughly in the right order, but a tuple that it returns might in reality belong somewhere later in the ordering. In the partial sort patch, the input stream of tuples is divided into non-overlapping groups, so that the tuples within a group are not sorted, but the groups are.
I think the partial sort case is a special case of the KNN-GiST case, if you consider the lower bound of each tuple to be the leading keys that you don't need to sort. Yes. But, for instance, btree accesses the heap for unique checking. Is it really so criminal? :-) This is not only a question of a new node or extending an existing node. We need to teach the planner/executor that an access method can return the value of some expression which is a lower bound of another expression. AFAICS, right now an access method can return only the original indexed datums and TIDs. So I'm afraid that enormous infrastructure changes are required, and I can hardly imagine what they should look like. -- With best regards, Alexander Korotkov.
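The invariant the patch relies on - the distance computed from the GiST key (a bounding rectangle) can never exceed the exact distance to the indexed polygon, so it is a valid lower bound for the recheck - can be demonstrated standalone. This is a simplified flat-geometry sketch, not the PostgreSQL/PostGIS code, and it compares squared distances to stay dependency-free:

```c
#include <assert.h>

typedef struct { double x, y; } Pt;
typedef struct { double xmin, ymin, xmax, ymax; } Box;

/* Squared distance from a point to an axis-aligned bounding box: this
 * is what the index can compute from the GiST key, and it is a lower
 * bound of the true distance to the indexed polygon. */
static double
box_dist2(Box b, Pt p)
{
    double dx = 0.0, dy = 0.0;

    if (p.x < b.xmin) dx = b.xmin - p.x;
    else if (p.x > b.xmax) dx = p.x - b.xmax;
    if (p.y < b.ymin) dy = b.ymin - p.y;
    else if (p.y > b.ymax) dy = p.y - b.ymax;
    return dx * dx + dy * dy;
}

/* Squared distance from a point to a segment (used by the recheck). */
static double
seg_dist2(Pt a, Pt b, Pt p)
{
    double vx = b.x - a.x, vy = b.y - a.y;
    double len2 = vx * vx + vy * vy;
    double t = len2 > 0.0 ? ((p.x - a.x) * vx + (p.y - a.y) * vy) / len2 : 0.0;
    double ex, ey;

    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    ex = a.x + t * vx - p.x;
    ey = a.y + t * vy - p.y;
    return ex * ex + ey * ey;
}

/* Exact squared distance from a point to a polygon boundary. */
static double
poly_dist2(const Pt *verts, int n, Pt p)
{
    double d = seg_dist2(verts[n - 1], verts[0], p);

    for (int i = 1; i < n; i++)
    {
        double di = seg_dist2(verts[i - 1], verts[i], p);

        if (di < d) d = di;
    }
    return d;
}
```

Because box_dist2 <= poly_dist2 always holds, the scan can order by the index value and recheck only when the flag says the value is merely a lower bound.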
Re: [HACKERS] GIN improvements part2: fast scan
count(*) from fts_test where fti @@ plainto_tsquery('english', 'gin index select');
 count
───────
   627
(1 row)

Time: 3,382 ms

I'm trying to get deeper into it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Sun, Jan 26, 2014 at 8:14 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: In addition to that, I'm using the ternary consistent function to check if minItem is a match, even if we haven't loaded all the entries yet. That's less important, but I think for something like rare1 | (rare2 & frequent) it might be useful. It would allow us to skip fetching 'frequent', when we already know that 'rare1' matches for the current item. I'm not sure if that's worth the cycles, but it seemed like an obvious thing to do, now that we have the ternary consistent function. So, that clearly isn't worth the cycles :-). At least not with an expensive consistent function; it might be worthwhile if we pre-build the truth-table, or cache the results of the consistent function. I believe caching consistent function results is much the same as a lazy truth-table. I think it's a good option to use with the two-state consistent function. However, I don't think it's a reason to give up the three-state consistent function, because the number of entries could be large. -- With best regards, Alexander Korotkov.
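For readers following along, the three-valued (Kleene) logic behind the ternary consistent function can be modeled like this. This is a toy model of the idea, not the actual GIN API; entries not yet loaded are passed as MAYBE, and Heikki's example query rare1 | (rare2 & frequent) resolves to TRUE as soon as rare1 is known to match:

```c
#include <assert.h>

typedef enum { GIN_FALSE, GIN_TRUE, GIN_MAYBE } GinTernaryValue;

/* Kleene three-valued AND: FALSE dominates, then MAYBE. */
static GinTernaryValue
tern_and(GinTernaryValue a, GinTernaryValue b)
{
    if (a == GIN_FALSE || b == GIN_FALSE) return GIN_FALSE;
    if (a == GIN_MAYBE || b == GIN_MAYBE) return GIN_MAYBE;
    return GIN_TRUE;
}

/* Kleene three-valued OR: TRUE dominates, then MAYBE. */
static GinTernaryValue
tern_or(GinTernaryValue a, GinTernaryValue b)
{
    if (a == GIN_TRUE || b == GIN_TRUE) return GIN_TRUE;
    if (a == GIN_MAYBE || b == GIN_MAYBE) return GIN_MAYBE;
    return GIN_FALSE;
}

/* Ternary "consistent" for the query  rare1 | (rare2 & frequent).
 * Entries we have not loaded yet are passed as GIN_MAYBE. */
static GinTernaryValue
consistent(GinTernaryValue rare1, GinTernaryValue rare2,
           GinTernaryValue frequent)
{
    return tern_or(rare1, tern_and(rare2, frequent));
}
```

If rare1 is TRUE, the result is TRUE regardless of the MAYBE inputs, so the expensive 'frequent' entry need not be fetched; if everything that matters is FALSE, the item can be skipped entirely.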
Re: [HACKERS] PoC: Partial sort
Hi! On Tue, Jan 21, 2014 at 3:24 AM, Marti Raudsepp ma...@juffo.org wrote: On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote: I've been trying it out in a few situations. I implemented a new enable_partialsort GUC to make it easier to turn on/off, this way it's a lot easier to test. The attached patch applies on top of partial-sort-5.patch I thought about such an option - generally not because of testing convenience, but because of planning overhead. The way you implemented it is quite naive :) I don't understand. I had another look at this and cost_sort still seems like the best place to implement this, since that's where the patch decides how many pre-sorted columns to use. Both mergejoin and simple order-by plans call into it. If enable_partialsort=false then I skip all pre-sorted options except full sort, making cost_sort behave pretty much like it did before the patch. I could change pathkeys_common to return 0, but that seems like a generic function that shouldn't be tied to partialsort. The old code paths called pathkeys_contained_in anyway, which has similar complexity. (Apart from initial_cost_mergejoin, but that doesn't seem special enough to make an exception for). Or should I use: enable_partialsort ? pathkeys_common(...) : 0 ? For instance, merge join relies on partial sort, which will be replaced with a simple sort. Are you saying that enable_partialsort=off should keep partialsort-based mergejoins enabled? Or are you saying that merge joins shouldn't use simple sort at all? But merge join was already able to use full Sort nodes before your patch. Sorry that I didn't explain it. In particular, I mean the following: 1) With enable_partialsort = off, all mergejoin logic should behave as it did without the partial sort patch. 2) With the partial sort patch, the get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute.
With enable_partialsort = off it should be as cheap as without the partial sort patch. I'll try to implement this option this week. For now, I have attempted to fix the extra-columns-in-mergejoin problem. It would be nice if you could test it. -- With best regards, Alexander Korotkov. partial-sort-7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
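As context for this thread: the core of partial sort is that the input already arrives sorted on the leading key(s), so only runs of rows with equal leading keys need sorting. A toy model of that idea (illustrative only, not the patch's executor code):

```c
#include <stdlib.h>
#include <assert.h>

typedef struct { int a; int b; } Row;   /* input is presorted by a */

/* Comparator for the trailing key b. */
static int
cmp_b(const void *x, const void *y)
{
    const Row *rx = x, *ry = y;

    return (rx->b > ry->b) - (rx->b < ry->b);
}

/* Sort rows by (a, b), exploiting that they already arrive sorted by
 * a: find each run of equal a and sort only within the run.  Each run
 * can be sorted (and emitted) without seeing the rest of the input. */
static void
partial_sort(Row *rows, int n)
{
    int     start = 0;

    for (int i = 1; i <= n; i++)
    {
        if (i == n || rows[i].a != rows[start].a)
        {
            qsort(rows + start, i - start, sizeof(Row), cmp_b);
            start = i;
        }
    }
}
```

This also shows why the KNN-GiST case is harder: here the group boundaries are exact (runs of equal leading keys), whereas with a lower-bound distance a tuple may still be overtaken by a later one, so a heap must be kept instead of discrete groups.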
Re: [HACKERS] PoC: Partial sort
On Tue, Jan 28, 2014 at 7:41 AM, Marti Raudsepp ma...@juffo.org wrote: On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov aekorot...@gmail.com wrote: For now, I have attempted to fix the extra-columns-in-mergejoin problem. It would be nice if you could test it. Yes, it solves the test cases I was trying with, thanks. 1) With enable_partialsort = off, all mergejoin logic should behave as it did without the partial sort patch. 2) With the partial sort patch, the get_cheapest_fractional_path_for_pathkeys function is much more expensive to execute. With enable_partialsort = off it should be as cheap as without the partial sort patch. When it comes to planning time, I really don't think you should bother. The planner enable_* settings are meant for troubleshooting, debugging and learning about the planner. You should not expect people to disable them in a production setting. It's not worth complicating the code for that rare case. This is stated in the documentation (http://www.postgresql.org/docs/current/static/runtime-config-query.html) and repeatedly on the mailing lists. But some benchmarks of planning performance are certainly warranted. I didn't test it, but I worry that the overhead might be high. If that's true, then it could be like the constraint_exclusion option, which is off by default because of planning overhead. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part 1: additional information
On Thu, Jan 23, 2014 at 9:27 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:17 PM, Alexander Korotkov wrote: We already spent a lot of time with compression. Now we need to figure out the result we want to see. I spent quite a long time debugging the varbyte encoding without segments. Finally, it passed very many tests without any problems. Now it is just a piece of junk. I'm afraid that we will have to reimplement everything from scratch another two or three times because the code doesn't look perfect. For sure, that's incompatible with getting something into 9.4. That's a bit harsh :-). But yes, I understand what you're saying. It's quite common for large patches like this to be rewritten several times before being committed; you just don't know what works best until you've tried many designs. Remember we also have the subsequent fast-scan, which is badly needed for hstore and other applications. I propose to make final decisions now and concentrate our forces on making a committable patch with these decisions. And don't change these decisions even if somebody has an idea how to improve code readability 100 times and potential extendability 1000 times. I propose the following decisions: 1) Leave the uncompressed area, allow duplicates in it 2) Don't introduce Items on the page. Well, I got the insertions to work now without the uncompressed area, so in the spirit of Just Getting This Damn Patch Committed, I'm going to go ahead with that. We can add the uncompressed area later if performance testing shows it to be necessary. And I agree about the Items on page idea - we can come back to that too still in the 9.4 timeframe if necessary, but probably not. So, committed. It's the same design as in the most recent patch I posted, but with a bunch of bugs fixed, and cleaned up all over.
I'm going to move to the fast scan patch now, but please do test and review the committed version to see if I missed something. Great! Thanks a lot! The assertion in dataPlaceToPageLeafRecompress is too strong: a page can contain GinDataLeafMaxContentSize bytes. A patch is attached. My test-suite doesn't run correctly; I'm debugging now. It seems I found this bug: ginVacuumPostingTreeLeaf re-encodes only some segments. Others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy there can lead to platform-dependent page corruption. Another solution is to re-palloc the segments in ginVacuumPostingTreeLeaf. -- With best regards, Alexander Korotkov. gin-memmove-fix.patch Description: Binary data
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 12:50 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/24/2014 10:03 AM, Alexander Korotkov wrote: It seems I found this bug: ginVacuumPostingTreeLeaf re-encodes only some segments. Others are not even re-palloced; they are moved left in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that with memcpy; the proper solution is memmove. Using memcpy there can lead to platform-dependent page corruption. Another solution is to re-palloc the segments in ginVacuumPostingTreeLeaf. Good catch. Thanks, committed, changing memcpy to memmove. Will have to switch to pallocing everything in the future, if we make leafRepackItems smarter, so that it doesn't rewrite all the segments after the first modified one. OK. What about the previous fix to the assertion? -- With best regards, Alexander Korotkov.
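The committed fix can be illustrated in isolation: when segments are shifted left within the same page buffer, source and destination overlap, and memcpy has undefined behavior for overlapping regions while memmove is defined to work. A standalone demonstration, not the GIN code:

```c
#include <string.h>
#include <assert.h>

/* Shift the tail of a buffer left by 'gap' bytes, in place.  The
 * source (buf + gap) and destination (buf) overlap whenever gap < len,
 * so memmove is required: it copies as if through a temporary buffer.
 * memcpy has undefined behavior when the regions overlap, which may
 * happen to "work" on one platform and corrupt data on another. */
static void
shift_left(char *buf, size_t gap, size_t len)
{
    memmove(buf, buf + gap, len);
    /* memcpy(buf, buf + gap, len);   -- undefined behavior here */
}
```

This is exactly the situation in dataPlaceToPageLeafRecompress: untouched segments after a re-encoded one are slid toward the start of the page, within the same buffer.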
Re: [HACKERS] GIN improvements part 1: additional information
On Fri, Jan 24, 2014 at 1:19 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/24/2014 10:53 AM, Alexander Korotkov wrote: OK. What about the previous fix to the assertion? Ah right, fixed that too now. Good, now my test-suite passes. The results are as follows.

Time of operations
         event         |     period
-----------------------+-----------------
 index_build           | 00:01:45.709546
 index_build_recovery  | 00:00:09
 index_update          | 00:05:45.732214
 index_update_recovery | 00:01:17
 search_new            | 00:24:02.191049
 search_updated        | 00:26:54.122852
(6 rows)

Index sizes
     label     |   size
---------------+-----------
 new           | 387547136
 after_updates | 610877440
(2 rows)

Before compression the results were the following.

Time of operations
         event         |     period
-----------------------+-----------------
 index_build           | 00:01:51.116722
 index_build_recovery  | 00:00:08
 index_update          | 00:05:12.124758
 index_update_recovery | 00:01:43
 search_new            | 00:23:44.281424
 search_updated        | 00:25:41.96251
(6 rows)

Index sizes
     label     |    size
---------------+------------
 new           | 884514816
 after_updates | 1595252736
(2 rows)

So, we can see a small regression in speed. However, I'm not sure it's very significant. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part2: fast scan
On Thu, Jan 23, 2014 at 8:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/14/2014 05:35 PM, Alexander Korotkov wrote: The attached version is rebased against the last version of packed posting lists. Thanks! I think we're missing a trick with multi-key queries. We know that when multiple scan keys are used, they are ANDed together, so we can do the skip optimization even without the new tri-state consistent function. To get started, I propose the three attached patches. These only implement the optimization for the multi-key case, which doesn't require any changes to the consistent functions and hence no catalog changes. Admittedly this isn't anywhere near as useful in practice as the single key case, but let's go for the low-hanging fruit first. This nevertheless introduces some machinery that will be needed by the full patch anyway. I structured the code somewhat differently than your patch. There is no separate fast-path for the case where the optimization applies. Instead, I'm passing the advancePast variable all the way down to where the next batch of items are loaded from the posting tree. keyGetItem is now responsible for advancing the entry streams, and the logic in scanGetItem has been refactored so that it advances advancePast aggressively, as soon as one of the key streams lets us conclude that no items before a certain point can match. scanGetItem might yet need to be refactored when we get to the full preconsistent check stuff, but one step at a time. The first patch is the most interesting one, and contains the scanGetItem changes. The second patch allows seeking to the right segment in a posting tree page, and the third allows starting the posting tree scan from the root, when skipping items (instead of just following the right-links). Here are some simple performance test results, demonstrating the effect of each of these patches. This is a best-case scenario.
I don't think these patches have any adverse effects even in the worst-case scenario, although I haven't actually tried hard to measure that. I used this to create a test table:

create table foo (intarr int[]);
-- Every row contains 0 (frequent term), and a unique number.
insert into foo select array[0,g] from generate_series(1, 1000) g;
-- Add another tuple with 0, 1 combo physically to the end of the table.
insert into foo values (array[0,1]);

The query I used is this:

postgres=# select count(*) from foo where intarr @> array[0] and intarr @> array[1];
 count
-------
     2
(1 row)

I measured the time that query takes, and the number of pages hit, using explain (analyze, buffers true):

 patches       | time (ms) | buffers
---------------+-----------+---------
 unpatched     | 650       | 1316
 patch 1       | 0.52      | 1316
 patches 1+2   | 0.50      | 1316
 patches 1+2+3 | 0.13      | 15

So, the second patch isn't doing much in this particular case. But it's trivial, and I think it will make a difference in other queries where you have the opportunity to skip, but return a lot of tuples overall. In summary, these are fairly small patches, and useful on their own, so I think these should be committed now. But please take a look and see if the logic in scanGetItem/keyGetItem looks correct to you. After this, I think the main fast scan logic will go into keyGetItem. Good, thanks! Now I can reimplement fast scan based on these patches. PS. I find it a bit surprising that in your patch, you're completely bailing out if there are any partial-match keys involved. Is there some fundamental reason for that, or is it just not implemented? Just not implemented. I think there are two possible approaches to handle it: 1) Handle partial-match keys like OR on matching keys. 2) Implement keyAdvancePast for the bitmap. The first approach seems useful with a low number of keys. Probably, we should implement automatic switching between them. -- With best regards, Alexander Korotkov.
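The skip optimization for ANDed keys can be modeled with plain sorted arrays: instead of advancing one item at a time, each stream is advanced past the other's current candidate, analogous to the advancePast variable described above. An illustrative sketch with invented names, using galloping plus binary search for the skip:

```c
#include <assert.h>

/* Advance within a sorted list from 'pos' to the first index whose
 * value is >= target.  Gallop (double the step) until we overshoot,
 * then binary-search inside the last bracket. */
static int
advance_past(const int *list, int n, int pos, int target)
{
    int     step = 1, lo = pos, hi = pos;

    while (hi < n && list[hi] < target)
    {
        lo = hi;
        hi += step;
        step *= 2;
    }
    if (hi > n) hi = n;
    while (lo < hi)
    {
        int     mid = (lo + hi) / 2;

        if (list[mid] < target) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* AND (intersect) two sorted posting lists.  The rarer stream drives,
 * and the frequent one is skipped over in large strides rather than
 * being scanned item by item. */
static int
intersect(const int *a, int na, const int *b, int nb, int *out)
{
    int     i = 0, j = 0, nout = 0;

    while (i < na && j < nb)
    {
        if (a[i] == b[j]) { out[nout++] = a[i]; i++; j++; }
        else if (a[i] < b[j]) i = advance_past(a, na, i, b[j]);
        else j = advance_past(b, nb, j, a[i]);
    }
    return nout;
}
```

In the real patches the "skip" is even cheaper, because the posting tree lets the scan seek to the right segment or restart from the root instead of touching the intervening pages at all.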
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Jan 24, 2014 at 6:48 AM, Tomas Vondra t...@fuzzy.cz wrote: I plan to do more thorough testing over the weekend, but I'd like to make sure I understand what to expect. My understanding is that this patch should: - give the same results as the current code (e.g. the fulltext search should not return different rows / change the ts_rank etc.) - improve the performance of fulltext queries Are there any obvious rules about which queries will benefit most from this? The queries generated by the tool I'm using for testing are mostly of this form: SELECT id FROM messages WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...') ORDER BY ts_rank(...) DESC LIMIT :n; with varying numbers of words and LIMIT values. During the testing today I haven't noticed any obvious performance difference, but I haven't spent much time on that. These patches optimize only queries with multiple WHERE clauses. For instance: SELECT id FROM messages WHERE body_tsvector @@ plainto_tsquery('english', 'word1') AND body_tsvector @@ plainto_tsquery('english', 'word2') ORDER BY ts_rank(...) DESC LIMIT :n; Optimizations inside a single clause will be provided as a separate patch. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part 1: additional information
On Wed, Jan 22, 2014 at 12:30 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 09:25 AM, Alexander Korotkov wrote: On Wed, Jan 22, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/21/2014 11:35 AM, Heikki Linnakangas wrote: Oh, I see what's going on. I had assumed that there cannot be duplicate insertions into the posting tree, but that's dead wrong. The fast insertion mechanism depends on a duplicate insertion to do nothing. Ok, this turned out to be a much bigger change than I thought... In principle, it's not difficult to eliminate duplicates. However, to detect a duplicate, you have to check if the item is present in the uncompressed items array, or in the compressed lists. That requires decoding the segment where it should be. But if we decode the segment, what's the purpose of the uncompressed items array? Its purpose was to speed up insertions, by buffering them so that we don't need to decode and re-encode the page/segment on every inserted item. But if we're paying the price of decoding it anyway, we might as well include the new item and re-encode the segment. The uncompressed area saves some effort in WAL-logging, as the record of inserting an entry into the uncompressed area is much smaller than that of re-encoding part of the page, but if that really is a concern, we could track more carefully which parts of the page are modified, and only WAL-log the required parts. And hopefully, the fast-update lists buffer inserts so that you insert many items at a time to the posting tree, and the price of re-encoding is only paid once. So, now that I think about this once more, I don't think the uncompressed area in every leaf page is a good idea. I didn't get it: why not insert duplicate TIDs into the uncompressed area and eliminate them during re-encoding? It would be somewhat more work during updates, more work during scans, and more WAL records. But is it really significant for real-life workloads?
Hmm, so you would merrily insert duplicate TIDs into the uncompressed area, and weed them out when reading or recompressing the page? I had not thought of that. Yeah, it might be a good trade-off; duplicates are not expected to happen very often. I refactored the way the recompression routine works again. It is now more clearly a multi-step process. First, the existing page is disassembled into an in-memory struct, which is a linked list of all the segments. In memory, each segment can be represented as an array of item pointers, or in the compressed format. In the next phase, all the new items are added to the segments where they belong. This naturally can lead to overly large segments; in the third phase, the items are redistributed among the segments, and compressed back to the compressed format. Finally, the finished segments are written back to the page, or pages if it had to be split. The same subroutines to disassemble and recompress are also used in vacuum. Attached is what I've got now. This is again quite heavily changed from the previous version - sorry for repeatedly rewriting this. I'll continue testing and re-reviewing this tomorrow. Let's clarify where we are. We had a well-debugged and tested version with hard-wired varbyte encoding. Now we're reimplementing and debugging a segmented version of varbyte encoding, in the hope that one day we can easily replace it with something better that meets our restrictions (but we haven't found it yet). Is that right? The segmentation was a sensible change on code-readability grounds alone. Yes, it made it easier to experiment with different encodings, and will make it easier to replace the encoding in the future, but that wasn't the only reason for doing it. It's nice to have the encoding/decoding stuff in ginpostinglists.c, so that the rest of the code just passes around opaque GinPostingList structs (previously known as PostingListSegment).
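For readers following the encoding discussion: the varbyte scheme stores each item as a delta from its predecessor, with seven payload bits per byte and the high bit as a continuation flag. A simplified standalone sketch (the real code in ginpostinglists.c packs 6-byte ItemPointers; plain integers are used here):

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Encode an ascending sequence as varbyte-coded deltas.  Returns the
 * number of bytes written to 'out'. */
static size_t
encode_deltas(const uint64_t *items, int n, unsigned char *out)
{
    size_t  len = 0;
    uint64_t prev = 0;

    for (int i = 0; i < n; i++)
    {
        uint64_t delta = items[i] - prev;   /* items must be ascending */

        prev = items[i];
        while (delta >= 0x80)
        {
            out[len++] = (unsigned char) (delta | 0x80); /* continuation */
            delta >>= 7;
        }
        out[len++] = (unsigned char) delta;              /* final byte */
    }
    return len;
}

/* Decode back into absolute values.  Returns the item count. */
static int
decode_deltas(const unsigned char *in, size_t len, uint64_t *out)
{
    int     n = 0;
    uint64_t val = 0, cur = 0;
    int     shift = 0;

    for (size_t i = 0; i < len; i++)
    {
        cur |= (uint64_t) (in[i] & 0x7F) << shift;
        if (in[i] & 0x80)
            shift += 7;             /* more bytes of this delta follow */
        else
        {
            val += cur;             /* delta complete: accumulate */
            out[n++] = val;
            cur = 0;
            shift = 0;
        }
    }
    return n;
}
```

Small deltas (dense posting lists) compress to one byte per item, which is why the encoding pays off; the segmenting discussed above bounds how much of this stream must be re-encoded on an insert.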
One thing I have been pondering, though: Instead of storing the posting lists one after each other on the leaf page, it might be better to store them as Items on the page, with normal ItemIds pointing to each. So the page layout would be more standard, and you could use PageAddItem/PageIndexMultiDelete to add/remove posting lists to a page. The immediate advantage of that would be that it would make it possible to do a binary search of the segments, to quickly locate the right segment where a tuple belongs. That might not be significant in practice - linearly scanning 32 items is not slow either. And it would add some overhead, one line pointer per segment, 4 * 32 = 128 bytes per page with the current segment size of 256 bytes. But then again, it might be a good idea just because it would make the pages look more like any other page, which is generally a good thing. We already spent a lot of time with compression. Now we need to figure out the result we want to see. I spent quite
Re: [HACKERS] GIN improvements part 1: additional information
On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/22/2014 02:17 PM, Alexander Korotkov wrote: We already spent a lot of time with compression. Now we need to figure out the result we want to see. I spent quite a long time debugging the varbyte encoding without segments. Finally, it passed very many tests without any problems. Now it is just a piece of junk. I'm afraid that we will have to reimplement everything from scratch another two or three times because the code doesn't look perfect. For sure, that's incompatible with getting something into 9.4. That's a bit harsh :-). But yes, I understand what you're saying. It's quite common for large patches like this to be rewritten several times before being committed; you just don't know what works best until you've tried many designs. Remember we also have the subsequent fast-scan, which is badly needed for hstore and other applications. I propose to make final decisions now and concentrate our forces on making a committable patch with these decisions. And don't change these decisions even if somebody has an idea how to improve code readability 100 times and potential extendability 1000 times. I propose the following decisions: 1) Leave the uncompressed area, allow duplicates in it 2) Don't introduce Items on the page. Well, I got the insertions to work now without the uncompressed area, so in the spirit of Just Getting This Damn Patch Committed, I'm going to go ahead with that. We can add the uncompressed area later if performance testing shows it to be necessary. And I agree about the Items on page idea - we can come back to that too still in the 9.4 timeframe if necessary, but probably not. So, committed. It's the same design as in the most recent patch I posted, but with a bunch of bugs fixed, and cleaned up all over. I'm going to move to the fast scan patch now, but please do test and review the committed version to see if I missed something. Great! Thanks a lot!
The assertion in dataPlaceToPageLeafRecompress is too strong: a page can contain GinDataLeafMaxContentSize bytes. A patch is attached. My test-suite doesn't run correctly; I'm debugging now. -- With best regards, Alexander Korotkov. gin-assert-fix.patch Description: Binary data
Re: [HACKERS] GIN improvements part 1: additional information
On Mon, Jan 20, 2014 at 10:30 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 01/17/2014 08:49 PM, Alexander Korotkov wrote: On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/17/2014 01:05 PM, Alexander Korotkov wrote: Seems to be fixed in the attached version of the patch. Another improvement in this version: only the modified segments and those after them are re-encoded; the left part of the page is left untouched.

I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the whole page. While correct, it'd be a pretty obvious optimization to only WAL-log the modified part.

Oh, I overlooked that. I wrote code in ginRedoInsertData which finds the appropriate place for the data, but didn't notice that I still wrote the whole page to the xlog record.

Yeah. I think ginRedoInsertData was too cute to be bug-free. I added an explicit unmodifiedsize field to the WAL record, so that ginRedoInsertData doesn't need to calculate it.

* When new items are appended to the end of the page so that only the last existing compressed segment is re-encoded, and the page has to be split, the items aren't divided 50/50 between the pages. The loop that moves segments destined for the left page to the right won't move any existing, untouched segments.

I think this loop will move one segment. However, that's too little.

Implemented this.

I noticed that the gin vacuum redo routine is dead code, except for the data-leaf page handling, because we never remove entries or internal nodes (page deletion is a separate WAL record type). And the data-leaf case is functionally equivalent to heap newpage records. I removed the dead code and made it clearer that it resembles heap newpage.

Attached is yet another version, with more bugs fixed and more comments added and updated. I would appreciate some heavy testing of this patch now. If you could re-run the tests you've been using, that would be great.
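The unmodifiedsize optimization discussed above — WAL-logging only the modified tail of the page — can be sketched like this. This is a simplified model with invented names, not the actual GIN WAL record format:

```python
def make_wal_record(old_page: bytes, new_page: bytes):
    """Log only the modified tail of the page: measure the length of the
    unchanged prefix (unmodifiedsize) and store just the bytes after it."""
    unmodified = 0
    limit = min(len(old_page), len(new_page))
    while unmodified < limit and old_page[unmodified] == new_page[unmodified]:
        unmodified += 1
    return {"unmodifiedsize": unmodified, "tail": new_page[unmodified:]}

def redo(old_page: bytes, record) -> bytes:
    """Replay: keep the unmodified prefix of the on-disk page and append
    the logged tail, reconstructing the new page image."""
    return old_page[:record["unmodifiedsize"]] + record["tail"]
```

Carrying unmodifiedsize explicitly in the record means the redo routine never has to re-derive which part of the page changed, which is exactly the class of "too cute" calculation the message above is warning about.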
I've tested the WAL replay by replicating GIN operations over streaming replication. That doesn't guarantee it's correct, but it's a good smoke test.

I tried my test-suite, but it hangs in an infinite loop on an index scan. I re-tried it on my laptop with -O0. I found it to crash on update and vacuum in some random places, like:

Assert(GinPageIsData(page)); in xlogVacuumPage
Assert(ndecoded == totalpacked); in ginCompressPostingList

Trying to debug it. -- With best regards, Alexander Korotkov.
Re: [HACKERS] GIN improvements part 1: additional information
On Tue, Jan 21, 2014 at 4:28 PM, Alexander Korotkov aekorot...@gmail.com wrote:

I noticed that the gin vacuum redo routine is dead code, except for the data-leaf page handling, because we never remove entries or internal nodes (page deletion is a separate WAL record type). And the data-leaf case is functionally equivalent to heap newpage records. I removed the dead code and made it clearer that it resembles heap newpage. Attached is yet another version, with more bugs fixed and more comments added and updated. I would appreciate some heavy testing of this patch now. If you could re-run the tests you've been using, that would be great.

I've tested the WAL replay by replicating GIN operations over streaming replication. That doesn't guarantee it's correct, but it's a good smoke test. I tried my test-suite, but it hangs in an infinite loop on an index scan. I re-tried it on my laptop with -O0. I found it to crash on update and vacuum in some random places, like: Assert(GinPageIsData(page)); in xlogVacuumPage Assert(ndecoded == totalpacked); in ginCompressPostingList Trying to debug it.

Another question is about dataPlaceToPageLeaf:

    while ((Pointer) seg < segend)
    {
        if (ginCompareItemPointers(minNewItem, seg->first) < 0)
            break;

Shouldn't we adjust seg to the previous segment? If minNewItem is less than seg->first, we should insert it into the previous segment. -- With best regards, Alexander Korotkov.
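The question above — stepping back to the previous segment when minNewItem sorts before the current segment's first item — amounts to choosing the last segment whose first item does not exceed the new item. A simplified illustration of that selection logic (not the actual dataPlaceToPageLeaf code; the function name is invented):

```python
def find_target_segment(segment_firsts, new_item):
    """Choose the segment a new item belongs to: the last segment whose
    first item is <= new_item. Items that sort before every segment's
    first item fall into segment 0.
    segment_firsts: first item of each segment, in ascending order."""
    target = 0
    for i, first in enumerate(segment_firsts):
        if new_item < first:
            break          # new_item precedes this segment's range...
        target = i         # ...so it belongs to the previous one
    return target
```

The subtlety the email points at: stopping the scan at the first segment whose first item exceeds the new item is not enough by itself — the insertion target is the segment *before* the one the loop broke on.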