Istvan, I think we should also involve dev@hbase and see what guidelines we are recommending so far for coprocs that would like to implement timeout features for long-running scans. WDYT?

On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:

> Thank you for starting this thread, Istvan!
>
> This is an important issue. I have recently come across data correctness
> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
> thinking about the heartbeat and dummy cell overlap leading to possible
> data correctness issues.
>
> > I propose dropping the dummy cell mechanics from Phoenix, and using the
> > HBase keepalive/cursor mechanics instead (we may not even need the
> > cursors).
>
> +1
>
> > If we cannot find a better way to shortcut some processing in Phoenix we
> > may need to keep dummy cells internally, but we have to make sure that
> > they never appear on the wire and reach the client.
>
> I don't think it is possible for Phoenix to ensure a dummy cell never
> reaches the HBase client.
>
> > in that case we'd need
> > to check and convert to a heartbeat scan result somehow
>
> This needs changes in HBase only, which I don't think HBase would (should)
> allow.
>
> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
> > clients to HBase 3 is even a possibility?
>
> Yes, wire compatibility is important. When this happens, the only thing we
> can do is set the page timeout high enough that we never have to send the
> dummy result to the client, or disable the paging feature.
>
>
> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
>
>> I've been struggling with errors on the region moving tests on my HBase
>> 3.0 WIP branch and have finally tracked the problems down to Phoenix's
>> dummy Cells (as well as some built-in assumptions in Phoenix which are
>> not true for HBase 3, see PHOENIX-7728
>> <https://issues.apache.org/jira/browse/PHOENIX-7728>).
>>
>> HBase is not aware that these are dummy cells, and is considering the
>> rows as already processed when retrying scans after the region goes away
>> from under the scan, i.e. it restarts the scan from AFTER the dummy
>> cell's rowkey, leading to the scan skipping rows.
>>
>> I have been able to fix the tests by hacking HBase to ignore these dummy
>> cells (and fixing the Phoenix side problems described in PHOENIX-7728
>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
>> that hacking HBase to work with dummy cells is the way to go (or even if
>> that would be accepted by HBase).
>>
>> AFAIU the dummy cells were added back in the HBase 1.x days, when there
>> was no other way to ensure timely responses from the server.
>>
>> HBase 2 has introduced the keepalive/cursor mechanics, which IIUC serve
>> the exact same purpose as the Phoenix dummy cells.
>>
>> I propose dropping the dummy cell mechanics from Phoenix, and using the
>> HBase keepalive/cursor mechanics instead (we may not even need the
>> cursors).
>>
>> If we cannot find a better way to shortcut some processing in Phoenix we
>> may need to keep dummy cells internally, but we have to make sure that
>> they never appear on the wire and reach the client (i.e. in that case
>> we'd need to check and convert to a heartbeat scan result somehow).
>>
>> We will also need to consider backwards compatibility.
>>
>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
>> clients to HBase 3 is even a possibility?
>>
>> Do we want to support that?
>>
>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive
>> mechanics, will old clients work with that without changes, or do we
>> need to keep sending dummy cells for older clients?
>>
>> Looking forward to hearing your take,
>>
>> Istvan
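
For anyone joining from dev@hbase, the row-skipping problem described in the quoted message can be sketched with a toy model. This is not Phoenix or HBase code: the class, the COUNT(*)-style scan, the page size, and the way the dummy rowkey is chosen are all assumptions made up for illustration. The only behaviour it tries to mirror is the one described above: the client treats any rowkey it has seen as processed and, after a region move, restarts the scan after that rowkey.

```java
import java.util.Arrays;
import java.util.List;

public class DummyCellSkipDemo {
    // Toy region: five sorted rowkeys; the scan is a server-side COUNT(*).
    static final List<String> ROWS = Arrays.asList("r1", "r2", "r3", "r4", "r5");
    // Hypothetical page limit: the server must answer after this many rows.
    static final int PAGE_SIZE = 2;

    /**
     * Simulates one client scan during which the region moves exactly once,
     * right after the first page limit is hit.
     * useDummyCells = true  : the keepalive carries a rowkey (dummy cell).
     * useDummyCells = false : the keepalive is an empty heartbeat with no rowkey.
     */
    static int countRows(boolean useDummyCells) {
        String resumeAfter = null;   // client side: last rowkey seen in any result
        boolean regionMoved = false;
        while (true) {
            int partialCount = 0;    // server-side state, lost when the region moves
            boolean restarted = false;
            for (String row : ROWS) {
                if (resumeAfter != null && row.compareTo(resumeAfter) <= 0) {
                    continue;        // client believes these rows were already processed
                }
                partialCount++;
                if (partialCount % PAGE_SIZE == 0) {
                    // Page limit hit: the server has to send something to the client.
                    if (useDummyCells) {
                        resumeAfter = row;  // client records the dummy cell's rowkey
                    }
                    if (!regionMoved) {
                        regionMoved = true; // region goes away; partialCount is lost
                        restarted = true;
                        break;              // client retries from its resume point
                    }
                }
            }
            if (!restarted) {
                return partialCount;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("dummy cells: " + countRows(true));  // prints "dummy cells: 3"
        System.out.println("heartbeats:  " + countRows(false)); // prints "heartbeats:  5"
    }
}
```

In this model the dummy-cell run undercounts because r1 and r2 are skipped on retry, while the heartbeat run repeats some work but returns the full count. Real HBase client behaviour (especially with cursors enabled) is more involved, so treat this only as a picture of the failure mode, not as a description of either code base.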
