Re: Phoenix ignoring index and index hint with some queries over mapped hbase tables.
On Fri, Mar 3, 2017 at 7:52 PM, James Taylorwrote: > Hi Marcin, > There's a few things going on here: > 1) An index created on a view over HBase data won't be maintained by > Phoenix. You'd need to maintain it yourself through some other external > means. If you create a table that maps to your HBase data, then it will be > maintained. UPSERTING data into the view will also work, right? > 2) An index would only be used if you match against a constant on the > right-hand side (while you're matching against the "number" column). For > example, the following query would use the index and limit the scan to only > rows in which "number" starts with 'Queen': > > 0: jdbc:phoenix:> explain select * from "traces" where > regexp_substr("number", 'Q.*') = 'Queen'; > +--+ > |PLAN > | > +--+ > | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces > [1,'Queen'] | > | SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1) > = | > +--+ > > Note that with local indexes, interpreting when the index is used is a bit > subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The > index may be used if it's a full table scan (since the data contained in the > index table may be smaller than that in the data table), but that won't > buying you very much. > 3) The index would only be used if your REGEXP_SUBSTR has a constant string > before any wildcard matches in the second argument. You could also > potentially use a function index [1], but it'd only use the index if the > REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was > used when the functional index was created. > > HTH. Thanks, > > James > > [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes So, if I understand correctly, my best bet would be to use some heuristics to hopefully extract a coarse prefix range from the regex? That way I can do a preliminary range scan on the index and refine the filtering on the returned data. I assume I could implement this as a UDF with an appropriate newKeyPart method? > > On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz > wrote: >> >> Hi, >> >> I have a table in hbase and created a view of it in phoenix, along >> with a local index: >> >> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR, >> "cf"."number" VARCHAR, "cf"."class" VARCHAR); >> create local index idxf on "dmstab_vehicle_traces" ("cf"."number", >> "cf"."class", rowkey); >> >> I need to filter rows based on a regex condition on the "number" >> column, so I use queries like these: >> >> explain select * from "traces" where regexp_substr("number", 'Q.*') = >> "number"; >> >> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2] >> SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', >> 1) = "number" >> >> >> It's a little ugly and less efficient than using LIKE, but the >> performance is still relatively acceptable thanks to the index. >> >> However, if I want to range of rowkeys to include, Phoenix stops using >> the index, which slows down the query significantly: >> >> explain select * from "traces" where regexp_substr("number", 'Q.*') = >> "number" and rowkey < 'BY'; >> >> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN >> RANGE SCAN OVER traces [*] - ['BY'] >> SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number" >> >> Using an index hint doesn't change anything. Is there a way to make >> this work, and is this a bug? > >
Re: Phoenix ignoring index and index hint with some queries over mapped hbase tables.
Hi Marcin, There's a few things going on here: 1) An index created on a view over HBase data won't be maintained by Phoenix. You'd need to maintain it yourself through some other external means. If you create a table that maps to your HBase data, then it will be maintained. 2) An index would only be used if you match against a constant on the right-hand side (while you're matching against the "number" column). For example, the following query would use the index and limit the scan to only rows in which "number" starts with 'Queen': 0: jdbc:phoenix:> explain select * from "traces" where regexp_substr("number", 'Q.*') = 'Queen'; +--+ |PLAN | +--+ | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces [1,'Queen'] | | SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1) = | +--+ Note that with local indexes, interpreting when the index is used is a bit subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The index may be used if it's a full table scan (since the data contained in the index table may be smaller than that in the data table), but that won't buying you very much. 3) The index would only be used if your REGEXP_SUBSTR has a constant string before any wildcard matches in the second argument. You could also potentially use a function index [1], but it'd only use the index if the REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was used when the functional index was created. HTH. Thanks, James [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz < katamaran...@gmail.com> wrote: > Hi, > > I have a table in hbase and created a view of it in phoenix, along > with a local index: > > create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR, > "cf"."number" VARCHAR, "cf"."class" VARCHAR); > create local index idxf on "dmstab_vehicle_traces" ("cf"."number", > "cf"."class", rowkey); > > I need to filter rows based on a regex condition on the "number" > column, so I use queries like these: > > explain select * from "traces" where regexp_substr("number", 'Q.*') = > "number"; > > CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2] > SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', > 1) = "number" > > > It's a little ugly and less efficient than using LIKE, but the > performance is still relatively acceptable thanks to the index. > > However, if I want to range of rowkeys to include, Phoenix stops using > the index, which slows down the query significantly: > > explain select * from "traces" where regexp_substr("number", 'Q.*') = > "number" and rowkey < 'BY'; > > CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN > RANGE SCAN OVER traces [*] - ['BY'] > SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number" > > Using an index hint doesn't change anything. Is there a way to make > this work, and is this a bug? >
Phoenix ignoring index and index hint with some queries over mapped hbase tables.
Hi, I have a table in hbase and created a view of it in phoenix, along with a local index: create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR, "cf"."number" VARCHAR, "cf"."class" VARCHAR); create local index idxf on "dmstab_vehicle_traces" ("cf"."number", "cf"."class", rowkey); I need to filter rows based on a regex condition on the "number" column, so I use queries like these: explain select * from "traces" where regexp_substr("number", 'Q.*') = "number"; CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2] SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1) = "number" It's a little ugly and less efficient than using LIKE, but the performance is still relatively acceptable thanks to the index. However, if I want to range of rowkeys to include, Phoenix stops using the index, which slows down the query significantly: explain select * from "traces" where regexp_substr("number", 'Q.*') = "number" and rowkey < 'BY'; CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces [*] - ['BY'] SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number" Using an index hint doesn't change anything. Is there a way to make this work, and is this a bug?