Hi Kevin,

Did you get an answer to your question, maybe over on hbase-user?
As it seems you're aware, HBase is built on a single index: the rowkey. You may be able to implement something like MySQL's composite indexing on HBase if the algorithm can be mapped to a 1-dimensional linear index. You would have to implement this yourself, as HBase doesn't offer it out of the box. Such an encoding would be an interesting contribution to HBase; it might sit next to our other data encoding "types" in `org.apache.hadoop.hbase.types`.

As for why your filtered queries are slow, you're the best person to start answering that question. Is your data local to the region server that's hosting it, or do you have multiple network hops and serialize/deserialize steps in your hot path? Is your index optimized for your query (sounds like maybe not, based on the first question)? Have you seen the Profiling Servlet [0]? You can start by setting that up, isolating the workload, and collecting some FlameGraphs to analyze.

Thanks,
Nick

[0]: https://hbase.apache.org/book.html#profiler

On Mon, Apr 12, 2021 at 10:26 AM Kevin Wright <kevinwright1...@gmail.com> wrote:
> Hi!
>
> Our application requires fast read queries that specify two ranges: one
> range on timestamps, and another on ids. We are currently using Apache
> HBase as our db, but we're unsure how to optimally design the row keys /
> schemas. Currently, scanning over the row key (the ids) with a filter on
> time ranges is taking more time than we expect. A typical query would
> have, say, 200 rows that match the id range and about 10 rows that match
> both ranges, and we currently have on the order of 10s of millions of
> rows.
>
> We're wondering if there's something we can do to increase throughput with
> HBase (e.g., is there something like composite indexing in MySQL?).
> Not sure if this is the best place to ask this, but if anyone could point
> us in the right direction, that would be great!
>
> Thank you!
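P.S. For illustration, here is a minimal sketch of what mapping a composite (id, timestamp) pair onto HBase's single 1-dimensional rowkey could look like. It assumes both values are non-negative signed 64-bit longs; the class and method names are hypothetical and not part of any HBase API:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CompositeKeySketch {

    // Encode (id, timestamp) as a 16-byte rowkey: id first, then timestamp,
    // both big-endian. For non-negative longs, unsigned byte-wise comparison
    // of these keys matches numeric order, which is what HBase's
    // lexicographic rowkey ordering gives you.
    static byte[] encode(long id, long timestamp) {
        return ByteBuffer.allocate(16).putLong(id).putLong(timestamp).array();
    }

    public static void main(String[] args) {
        byte[] a = encode(42L, 1_000L);
        byte[] b = encode(42L, 2_000L);
        byte[] c = encode(43L, 0L);

        // Within one id, keys sort by timestamp; across ids, the id dominates.
        if (Arrays.compareUnsigned(a, b) >= 0 || Arrays.compareUnsigned(b, c) >= 0) {
            throw new AssertionError("composite key ordering broken");
        }
        System.out.println("ok");
    }
}
```

Note the caveat: a Scan bounded by encode(idStart, tsStart) and encode(idEnd, tsEnd) still visits every timestamp for the intermediate ids, so a timestamp filter is still needed; the leading-id encoding mainly keeps the scan range tight when the id range is narrow.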