Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-25 Thread Anastasia Lubennikova
2014-03-18 18:47 GMT+04:00 Robert Haas robertmh...@gmail.com


  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.

 This seems to be the crux of your proposal, but it seems vague: what
 exactly do you mean by retrieve the data directly to output areas?
  What data are you going to retrieve and where are you going to put it?


 I meant Datum that storages in Gist-tree nodes. Now gistgettuple() returns
xs_ctup.t_self (item pointer). I'm going to add index-only scan
functionality: gistsettuple() will return pointer and Datum itself as
xs_itup . So queue will contain both the pointer and the Datum. If
visibilitymap_test returns true then Datum from xs_itup would be added into
queue. Otherwise page must be scanned.

Another question to consider is: which operator classes do you
 anticipate that this will work for and which ones do you anticipate
 that it will not work for?  Any operator class that lossifies that
 input data before storing it in the index is presumably doomed, but
 which ones do that, and which do not?


about amcanreturn:
I'm going to create function gistcanreturn() = If fetch() is defined for
all indexed columns?

And last point of my project is to implement fetch() for existing opclasses
based on GIST.

-- 
Best regards,
Lubennikova Anastasia


[HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Anastasia Lubennikova
Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
Any suggestions and comments are welcome.

*Project name*

Support for index-only scans for GIST index



*Brief review*

Currently GiST index don't have index-only scan functionality. Index-only
scans are a major performance feature added to Postgres 9.2. They allow
certain types of queries to be satisfied just by retrieving data from
indexes, and not from tables. This feature for B-trees (implemented since
PostgreSQL-9.2) allows doing certain types of queries significantly faster.
This is achieved by reduction in the amount of I/O necessary to satisfy
queries. I think it will be useful to implement this feature for user
defined data types that use GiST index.



*Benefits to the PostgreSQL Community*

Faster GiST index search would be actual for many PostgreSQL applications
(for example some GIS systems).


 *Quantifiable results*

Acceleration of GiST index search.


*Project details*

1. The GiST is a balanced tree structure like a B-tree, containing key,
pointer pairs. GiST key is a member of a user-defined class, and
represents some property that is true of all data items reachable from the
pointer associated with the key. The GiST provides a possibility to create
custom data types with indexed access methods and extensible set of queries.

There are seven methods that an index operator class for GiST must provide,
and an eighth that is optional.

-consistent

-union

-compress

-decompress

-penalty

-picksplit

-equal

-distance (optional)

I'm going to create new optional method fetch() in addition to existing.
Thus user can create a method of retrieving data from the index without
loss. It will be used when performing search queries to speed data
retrieving.



2. gistget algorithm (several parts omitted):

Check the key
gistindex_keytest() - does this index tuple satisfy the scan key(s)?

Scan all items on the GiST index page and insert them into the queue (or
directly to output areas)

plain scan

Heap tuple TIDs are returned into so-pageData[]

ordered scan

Heap tuple TIDs are pushed into individual search queue items

If the fetch() is specified by the developer, then using it, algorithm can
retrieve the data directly to output areas at this stage, without reference
to the heap.


3. Create function gistcanreturn to check whether fetch() is specified by
user.

Amcanreturn - Function to check whether index supports index-only scans, or
zero if none

There is the question of support index-only scans for multicolumn indexes.
Probably it would require extend the interface to add separate columns
checking.

To solve this question I need the community's help.


4. Add details for index only scans into gistcostestimate function



 *Links*

1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
trees for database systems. - September, 1995.

2) http://www.sai.msu.su/~megera/postgres/gist/

3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
Indexes.

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Robert Haas
On Tue, Mar 18, 2014 at 9:12 AM, Anastasia Lubennikova
lubennikov...@gmail.com wrote:
 Support for index-only scans for GIST index

This is a cool idea, if it can be made to work.

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.

This seems to be the crux of your proposal, but it seems vague: what
exactly do you mean by retrieve the data directly to output areas?
What data are you going to retrieve and where are you going to put it?

Another question to consider is: which operator classes do you
anticipate that this will work for and which ones do you anticipate
that it will not work for?  Any operator class that lossifies that
input data before storing it in the index is presumably doomed, but
which ones do that, and which do not?

Tom Lane previously proposed extending SP-GiST to support index-only
scans.  You might find that discussing worth reading, or perhaps
consider it as an alternative if GiST doesn't work out:

http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Tom Lane previously proposed extending SP-GiST to support index-only
 scans.  You might find that discussing worth reading, or perhaps
 consider it as an alternative if GiST doesn't work out:
 http://www.postgresql.org/message-id/10839.1323885...@sss.pgh.pa.us

That wasn't just a proposal, see commits
3695a555136a6d179cac8ae48d5f90171d5b30e9 and
92203624934095163f8b57b5b3d7bbd2645da2c8.  But yeah, that might be a
useful reference for what is likely to be involved with making GIST
do it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
Alexander,

Can you comment on the below proposal?  I'd like your opinion on how
difficult it will be.

On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
 Hello!
 Here is the text of my proposal which I've applied to GSoC.
 (and link
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
 Any suggestions and comments are welcome.
 
 *Project name*
 
 Support for index-only scans for GIST index
 
 
 
 *Brief review*
 
 Currently GiST index don't have index-only scan functionality. Index-only
 scans are a major performance feature added to Postgres 9.2. They allow
 certain types of queries to be satisfied just by retrieving data from
 indexes, and not from tables. This feature for B-trees (implemented since
 PostgreSQL-9.2) allows doing certain types of queries significantly faster.
 This is achieved by reduction in the amount of I/O necessary to satisfy
 queries. I think it will be useful to implement this feature for user
 defined data types that use GiST index.
 
 
 
 *Benefits to the PostgreSQL Community*
 
 Faster GiST index search would be actual for many PostgreSQL applications
 (for example some GIS systems).
 
 
  *Quantifiable results*
 
 Acceleration of GiST index search.
 
 
 *Project details*
 
 1. The GiST is a balanced tree structure like a B-tree, containing key,
 pointer pairs. GiST key is a member of a user-defined class, and
 represents some property that is true of all data items reachable from the
 pointer associated with the key. The GiST provides a possibility to create
 custom data types with indexed access methods and extensible set of queries.
 
 There are seven methods that an index operator class for GiST must provide,
 and an eighth that is optional.
 
 -consistent
 
 -union
 
 -compress
 
 -decompress
 
 -penalty
 
 -picksplit
 
 -equal
 
 -distance (optional)
 
 I'm going to create new optional method fetch() in addition to existing.
 Thus user can create a method of retrieving data from the index without
 loss. It will be used when performing search queries to speed data
 retrieving.
 
 
 
 2. gistget algorithm (several parts omitted):
 
 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)
 
 plain scan
 
 Heap tuple TIDs are returned into so-pageData[]
 
 ordered scan
 
 Heap tuple TIDs are pushed into individual search queue items
 
 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.
 
 
 3. Create function gistcanreturn to check whether fetch() is specified by
 user.
 
 Amcanreturn - Function to check whether index supports index-only scans, or
 zero if none
 
 There is the question of support index-only scans for multicolumn indexes.
 Probably it would require extend the interface to add separate columns
 checking.
 
 To solve this question I need the community's help.
 
 
 4. Add details for index only scans into gistcostestimate function
 
 
 
  *Links*
 
 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
 trees for database systems. - September, 1995.
 
 2) http://www.sai.msu.su/~megera/postgres/gist/
 
 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
 Indexes.
 


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
On Tue, Mar 18, 2014 at 5:12 PM, Anastasia Lubennikova 
lubennikov...@gmail.com wrote:

 2. gistget algorithm (several parts omitted):

 Check the key
 gistindex_keytest() - does this index tuple satisfy the scan key(s)?

 Scan all items on the GiST index page and insert them into the queue (or
 directly to output areas)

 plain scan

 Heap tuple TIDs are returned into so-pageData[]

 ordered scan

 Heap tuple TIDs are pushed into individual search queue items

 If the fetch() is specified by the developer, then using it, algorithm can
 retrieve the data directly to output areas at this stage, without reference
 to the heap.


I think there are following changes to be made to GiST code:
1) When next consistent IndexTuple is found extract original Datums using
fetch method.
2) Put those original Datums to queue.
3) When returning next ItemPointer from queue put original Datums to
IndexScanDesc (into xs_itup).

 3. Create function gistcanreturn to check whether fetch() is specified by
 user.

 Amcanreturn - Function to check whether index supports index-only scans,
 or zero if none

 There is the question of support index-only scans for multicolumn indexes.
 Probably it would require extend the interface to add separate columns
 checking.

 To solve this question I need the community's help.


 4. Add details for index only scans into gistcostestimate function


Also, another part of work to be mentioned is to implement fetch function
for all suitable opclasses.


With best regards,
Alexander Korotkov.


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Alexander Korotkov
Josh,

Anastasia has already consulted to me in person. It is not big proposal.
But for newbie who is not familiar with PostgreSQL code base and especially
GiST it seems fair enough.


With best regards,
Alexander Korotkov.

On Tue, Mar 18, 2014 at 9:16 PM, Josh Berkus j...@agliodbs.com wrote:

 Alexander,

 Can you comment on the below proposal?  I'd like your opinion on how
 difficult it will be.

 On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
  Hello!
  Here is the text of my proposal which I've applied to GSoC.
  (and link
 
 http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120
 )
  Any suggestions and comments are welcome.
 
  *Project name*
 
  Support for index-only scans for GIST index
 
 
 
  *Brief review*
 
  Currently GiST index don't have index-only scan functionality. Index-only
  scans are a major performance feature added to Postgres 9.2. They allow
  certain types of queries to be satisfied just by retrieving data from
  indexes, and not from tables. This feature for B-trees (implemented since
  PostgreSQL-9.2) allows doing certain types of queries significantly
 faster.
  This is achieved by reduction in the amount of I/O necessary to satisfy
  queries. I think it will be useful to implement this feature for user
  defined data types that use GiST index.
 
 
 
  *Benefits to the PostgreSQL Community*
 
  Faster GiST index search would be actual for many PostgreSQL applications
  (for example some GIS systems).
 
 
   *Quantifiable results*
 
  Acceleration of GiST index search.
 
 
  *Project details*
 
  1. The GiST is a balanced tree structure like a B-tree, containing key,
  pointer pairs. GiST key is a member of a user-defined class, and
  represents some property that is true of all data items reachable from
 the
  pointer associated with the key. The GiST provides a possibility to
 create
  custom data types with indexed access methods and extensible set of
 queries.
 
  There are seven methods that an index operator class for GiST must
 provide,
  and an eighth that is optional.
 
  -consistent
 
  -union
 
  -compress
 
  -decompress
 
  -penalty
 
  -picksplit
 
  -equal
 
  -distance (optional)
 
  I'm going to create new optional method fetch() in addition to existing.
  Thus user can create a method of retrieving data from the index without
  loss. It will be used when performing search queries to speed data
  retrieving.
 
 
 
  2. gistget algorithm (several parts omitted):
 
  Check the key
  gistindex_keytest() - does this index tuple satisfy the scan key(s)?
 
  Scan all items on the GiST index page and insert them into the queue (or
  directly to output areas)
 
  plain scan
 
  Heap tuple TIDs are returned into so-pageData[]
 
  ordered scan
 
  Heap tuple TIDs are pushed into individual search queue items
 
  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.
 
 
  3. Create function gistcanreturn to check whether fetch() is specified by
  user.
 
  Amcanreturn - Function to check whether index supports index-only scans,
 or
  zero if none
 
  There is the question of support index-only scans for multicolumn
 indexes.
  Probably it would require extend the interface to add separate columns
  checking.
 
  To solve this question I need the community's help.
 
 
  4. Add details for index only scans into gistcostestimate function
 
 
 
   *Links*
 
  1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
  trees for database systems. - September, 1995.
 
  2) http://www.sai.msu.su/~megera/postgres/gist/
 
  3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
  Indexes.
 


 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Josh Berkus
On 03/18/2014 12:11 PM, Alexander Korotkov wrote:
 Josh,
 
 Anastasia has already consulted to me in person. It is not big proposal.
 But for newbie who is not familiar with PostgreSQL code base and especially
 GiST it seems fair enough.
 

Thanks, that's what I wanted to know.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers