[ https://issues.apache.org/jira/browse/PHOENIX-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kadir Ozdemir resolved PHOENIX-6832. ------------------------------------ Fix Version/s: 5.2.0 Resolution: Fixed > Uncovered Global Secondary Indexes > ---------------------------------- > > Key: PHOENIX-6832 > URL: https://issues.apache.org/jira/browse/PHOENIX-6832 > Project: Phoenix > Issue Type: Improvement > Reporter: Kadir Ozdemir > Assignee: Kadir Ozdemir > Priority: Major > Fix For: 5.2.0 > > > An index can be called an uncovered index if the index cannot serve a query > alone. The sole purpose of an uncovered index would be identifying the data > table rows to be scanned for the query. This implies that the DDL for an > uncovered index does not have the INCLUDE clause. > Then an index is called a covered index if the index can serve a query alone. > Please note that a covered index does not mean that it can cover all queries. > It just means that it can cover a query. A covered index can still cover some > queries even if the index DDL does not have the INCLUDE clause. This is > because a given query may reference only PK and/or indexed columns, and thus > a covered index without any included columns can serve this query by itself > (i.e., without joining index rows with data table rows). Another use case > for covered indexes without included columns is the count(*) queries. > Currently Phoenix uses indexes for count(*) queries by default. > Since uncovered indexes will be used to identify data table rows affected by > a given query and the column values will be picked up from the data table, we > can provide a solution that is much simpler than the solution for covered > indexes by taking the advantage of the fact that the data table is the source > of truth, and an index table is used to only map secondary keys to the > primary keys to eliminate full table scans. The correctness of such a > solution is ensured if for every data table row, there exists an index row. > Then our solution to update the data tables and their indexes in a consistent > fashion for global secondary indexes would be a two-phase update approach, > where we first insert the index table rows, and only if they are successful, > then we update the data table rows. > This approach does not require reading the existing data table rows which is > currently required for covered indexes. Also, it does not require two-phase > commit writes for updating and maintaining global secondary index table rows. > Eliminating a data table read operation and an RPC call to update the index > row verification status on the corresponding index row would cut down index > write latency overhead by at least 50% for global uncovered indexes when > compared to global covered indexes. This is because global covered indexes > require one data table read and two index write operations for every data > table update whereas global uncovered indexes would require only one index > write. For batch writes, the expected performance and latency improvement > would be much higher than 50% since a batch of random row updates would not > anymore require random seeks on the data table for reading existing data > table rows. > PHOENIX-6458, PHOENIX-6501 and PHOENIX-6663 improve the performance and > efficiency of joining index rows with their data table rows when a covered > index cannot cover a given query. We can further leverage it to support > uncovered indexes. > The uncovered indexes would be a significant performance improvement for > write intensive workloads. Also a common use case where uncovered indexes > will be desired is the upsert select use case on the data table, where a > subset of rows are updated in a batch. In this use case, the select query > performance is greatly improved via a covered index but the upsert part > suffers due to the covered index write overhead especially when the selected > data table rows are not consecutively stored on disk which is the most common > case. > As mentioned before, the DDL for index creation does not include the INCLUDE > clause. We can add the UNCOVERED keyword to indicate the index to be created > is an uncovered index, for example, CREATE UNCOVERED INDEX. > As in the case of covered indexes, we can do read repair for uncovered > indexes too. The difference is that instead of using the verify status for > index rows, we would check if the corresponding data table row exists for a > given index row. Since we would always retrieve the data table rows to join > back with index rows for uncovered indexes, the read repair cost would occur > only for deleting invalid index rows. Also, the existing index reverse > verification and repair feature supported by IndexTool can be used to do bulk > repair operations from time to time. -- This message was sent by Atlassian Jira (v8.20.10#820010)