Hi Istvan,

I agree that instead of using dummy cells, we should rely on the keepalive/cursor mechanics. We have been working towards that: as part of PHOENIX-7707, I propagated the scanner context all the way down to the Phoenix scanners, so we can leverage that.
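The quoted thread describes HBase restarting the scan AFTER the dummy cell's rowkey. Here is a toy Java model of that failure. It is not Phoenix or HBase code. The COUNT(*) scan, the `budget` page limit, and the names `DummyCellRetryModel`, `EarlyResponse` and `countAfterRegionMove` are all made up for illustration. The model also makes two simplifying assumptions. First, the dummy result carries the key of the last row the server scanned. Second, after a region move the client resumes after the last row key it received, or at the start row if it received none. Both are simplifications of the behavior discussed in the thread, not a description of the real client code.

```java
import java.util.List;

// Toy model, NOT Phoenix or HBase code. Hypothetical assumptions:
//  - the server computes COUNT(*) over unique, sorted row keys and keeps the
//    running count only in the server-side scanner;
//  - after `budget` rows the server must answer early (page timeout);
//  - the region then moves, the server-side state is lost, and the client
//    reopens the scan AFTER the row key of the last result it received,
//    or at the original start row if it has received no row yet.
final class DummyCellRetryModel {

    enum EarlyResponse { DUMMY_CELL, HEARTBEAT }

    static int countAfterRegionMove(List<String> rows, int budget, EarlyResponse mode) {
        if (budget <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        if (budget >= rows.size()) {
            return rows.size(); // scan completes before any early response is needed
        }

        // Row key of the last result the client has seen; null means "none yet".
        String lastReturnedRow = null;
        if (mode == EarlyResponse.DUMMY_CELL) {
            // The dummy result carries the key of the last row the server scanned,
            // although that row's contribution to the count was never sent.
            lastReturnedRow = rows.get(budget - 1);
        }
        // HEARTBEAT (plain, no cursor): the response carries no row key,
        // so lastReturnedRow stays null.

        // Region moves: the partial count dies with the old scanner. The new
        // scanner starts after lastReturnedRow (exclusive) or at the start.
        int restartIndex = (lastReturnedRow == null) ? 0 : rows.indexOf(lastReturnedRow) + 1;
        return rows.size() - restartIndex; // the fresh scanner counts what is left
    }
}
```

Consider rows r1 to r6 with a budget of 3. The DUMMY_CELL path returns 3, while the HEARTBEAT path returns 6. In this model, the rows counted before the region move are silently lost when the client treats the dummy row key as already processed. A row-less heartbeat leaves the restart point untouched.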
Tanuj

On Wed, 19 Nov 2025 at 20:09, Istvan Toth <[email protected]> wrote:

> Thanks for your thoughtful response, Viraj.
>
> I have added my thoughts below.
>
> On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]> wrote:
>
> > We also need to understand: what happens when the HBase client gets a heartbeat and the region moves?
> >
>
> I have checked that code in HBase, and the HBase client seems to handle this case transparently.
> We may of course find bugs, but handling that is part of the design.
>
> >
> > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:
> >
> > > Istvan, I think we should also involve dev@hbase and see what guidelines we are recommending so far for coprocs that would like to implement timeout features for long-running scans, wdyt?
> > >
>
> Based on my current understanding, if the Scan / ScannerContext is correctly set up (allows partial rows, sets the time limit and requests a cursor), HBase will honor that and the Scan will return a heartbeat result when it times out.
>
> I THINK that's all we need. Of course, if we get stuck, we should ask for help.
>
> > >
> > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
> > >
> > >> Thank you for starting this thread, Istvan!
> > >>
> > >> This is an important issue. I have recently come across data correctness issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me thinking about the heartbeat and dummy cell overlap leading to possible data correctness issues.
> > >>
> > >> > I propose dropping the dummy cell mechanics from Phoenix, and using the HBase keepalive/cursor mechanics instead (we may not even need the cursors).
> > >>
> > >> +1
> > >>
> > >> > If we cannot find a better way to shortcut some processing in Phoenix we may need to keep dummy cells internally, but we have to make sure that they never appear on the wire and reach the client.
> > >>
> > >> I don't think it is possible for Phoenix to ensure a dummy cell never reaches the HBase client.
>
> I think if nothing else works, we can still catch and filter/convert them in RegionObserver.postScannerNext().
> Of course, ideally we would never generate any dummy cells in the first place.
>
> > >>
> > >> > in that case we'd need to check and convert to a heartbeat scan result somehow
> > >>
> > >> This needs changes in HBase only, which I don't think HBase would (should) allow.
> > >>
> > >> > Is HBase 2/3 wire compatible enough that connecting HBase 2.x clients to HBase 3 is even a possibility?
> > >>
> > >> Yes, wire compatibility is important. When this happens, the only thing we can do is set the page timeout high enough that we never have to send the dummy result to the client, or disable the paging feature.
> > >>
> > >>
> > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
> > >>
> > >>> I've been struggling with errors on the region moving tests on my HBase 3.0 WIP branch and have finally tracked the problems down to Phoenix's dummy cells (as well as some built-in assumptions in Phoenix which are not true for HBase 3, see PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>).
> > >>>
> > >>> HBase is not aware that these are dummy cells, and considers the rows already processed when retrying scans after the region goes away from under the scan, i.e. it restarts the scan from AFTER the dummy cell's rowkey, leading to the scan skipping rows.
> > >>>
> > >>> I have been able to fix the tests by hacking HBase to ignore these dummy cells (and fixing the Phoenix-side problems described in PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think that hacking HBase to work with dummy cells is the way to go (or even if that would be accepted by HBase).
> > >>>
> > >>> AFAIU the dummy cells were added back in the HBase 1.x days, when there was no other way to ensure timely responses from the server.
> > >>>
> > >>> HBase 2 has introduced the keepalive/cursor mechanics, which IUC serve the exact same purpose as the Phoenix dummy cells.
> > >>>
> > >>> I propose dropping the dummy cell mechanics from Phoenix, and using the HBase keepalive/cursor mechanics instead (we may not even need the cursors).
> > >>>
> > >>> If we cannot find a better way to shortcut some processing in Phoenix we may need to keep dummy cells internally, but we have to make sure that they never appear on the wire and reach the client (i.e. in that case we'd need to check and convert to a heartbeat scan result somehow).
> > >>>
> > >>> We will also need to consider backwards compatibility.
> > >>>
> > >>> Is HBase 2/3 wire compatible enough that connecting HBase 2.x clients to HBase 3 is even a possibility?
> > >>>
> > >>> Do we want to support that?
> > >>>
> > >>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive mechanics, will old clients work with that without changes, or do we need to keep sending dummy cells for older clients?
> > >>>
> > >>> Looking forward to hearing your take,
> > >>>
> > >>> Istvan
> > >>>
> > >>
> > >
>
> --
> *István Tóth* | Sr.
> Staff Software Engineer
> *Email*: [email protected]
> cloudera.com <https://www.cloudera.com>
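Istvan's fallback of catching dummy results in RegionObserver.postScannerNext() could look roughly like the Java sketch below. It is deliberately not written against the HBase coprocessor API. `T` stands in for `Result`, and the `isDummy` predicate stands in for however Phoenix would recognise its dummy results. `DummyResultFilter` and `stripDummies` are hypothetical names. The sketch only removes the dummies and reports when a batch became empty while rows remain. That is the case which would still need to be converted into a heartbeat somehow.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the "filter them in postScannerNext()" fallback.
// NOT the HBase RegionObserver signature: T stands in for
// org.apache.hadoop.hbase.client.Result, and isDummy stands in for whatever
// check identifies a Phoenix dummy result.
final class DummyResultFilter {

    // Removes dummy results from the batch in place, so they never go on the
    // wire. Returns true when the batch is now empty but the scanner still has
    // rows, i.e. the case that would have to reach the client as a heartbeat
    // rather than as a data batch.
    static <T> boolean stripDummies(List<T> results, Predicate<? super T> isDummy, boolean hasMoreRows) {
        results.removeIf(isDummy);
        return results.isEmpty() && hasMoreRows;
    }
}
```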
