Thanks for your thoughtful response, Viraj.

I have added my thoughts below.

On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]> wrote:

> We also need to understand: what happens when hbase client gets heartbeat
> and the region moves?
>

I have checked that code in HBase, and the HBase client seems to handle
this case transparently.
We may of course find bugs, but handling region moves during a scan is part
of the design.


>
> On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:
>
> > Istvan, I think we should also involve dev@hbase and see what guidelines
> > we are recommending so far for coprocs that would like to implement
> > timeout features for long running scans, wdyt?
>

Based on my current understanding, if the Scan / ScannerContext is
correctly set up (allows partial rows, sets the time limit and requests a
cursor), HBase will honor that and the Scan will return a heartbeat result
when it times out.

I THINK that's all we need. Of course if we get stuck we should ask for
help.
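
To make the idea concrete, here is a rough toy model of a time-limited scan
that answers with a heartbeat plus a resume position instead of a fake row.
It does not use the HBase API at all: HeartbeatScanSketch, Batch, next and
scanAll are names I made up for illustration, and a row budget stands in for
the real time limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only: none of these types are HBase or Phoenix classes. */
final class HeartbeatScanSketch {

    /** One scan response: matching rows, where to resume, and a heartbeat flag. */
    static final class Batch {
        final List<String> rows;   // rows that passed the filter (may be empty)
        final int resumeIndex;     // stands in for the cursor: first row not yet examined
        final boolean heartbeat;   // true if the budget ran out before the region ended

        Batch(List<String> rows, int resumeIndex, boolean heartbeat) {
            this.rows = rows;
            this.resumeIndex = resumeIndex;
            this.heartbeat = heartbeat;
        }
    }

    /** Examines at most `budget` rows; the budget stands in for the scan time limit. */
    static Batch next(List<String> region, int start, int budget, Predicate<String> filter) {
        List<String> matched = new ArrayList<>();
        int end = Math.min(region.size(), start + budget);
        for (int i = start; i < end; i++) {
            if (filter.test(region.get(i))) {
                matched.add(region.get(i));
            }
        }
        return new Batch(matched, end, end < region.size());
    }

    /** Client loop: collects real rows and counts heartbeats; it never sees a fake row. */
    static List<String> scanAll(List<String> region, int budget, Predicate<String> filter,
                                int[] heartbeatCount) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (true) {
            Batch b = next(region, pos, budget, filter);
            out.addAll(b.rows);
            pos = b.resumeIndex;       // resume position comes from the cursor, not from a data row
            if (!b.heartbeat) {
                return out;
            }
            heartbeatCount[0]++;
        }
    }

    public static void main(String[] args) {
        List<String> region = Arrays.asList("a1", "b1", "b2", "b3", "a2");
        int[] heartbeats = new int[1];
        List<String> rows = scanAll(region, 2, r -> r.startsWith("a"), heartbeats);
        System.out.println(rows + " heartbeats=" + heartbeats[0]);
    }
}
```

The point of the model is only that the resume position travels separately
from the data rows, so an empty or partial response cannot be mistaken for a
processed row.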


> >
> > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
> >
> >> Thank you for starting this thread, Istvan!
> >>
> >> This is an important issue. I have recently come across data correctness
> >> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
> >> thinking about the heartbeat and dummy cell overlap leading to possible
> >> data correctness issues.
> >>
> >> > I propose dropping the dummy cell mechanics from Phoenix, and using the
> >> > HBase keepalive/cursor mechanics instead (we may not even need the
> >> > cursors).
> >>
> >> +1
> >>
> >> > If we cannot find a better way to shortcut some processing in Phoenix we
> >> > may need to keep dummy cells internally, but we have to make sure that they
> >> > never appear on the wire and reach the client.
> >>
> >> I don't think it is possible for Phoenix to ensure a dummy cell never
> >> reaches the HBase client.
>

I think if nothing else works, we can still catch and filter/convert them
in RegionObserver.postScannerNext().
Of course, ideally we would never generate any dummy cells in the first
place.
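
As a very rough illustration of the filter/convert fallback, the toy sketch
below strips dummy rows out of a batch and records that a heartbeat should be
sent in their place. It is not coprocessor code: DummyCellFilterSketch, Row,
Filtered, stripDummies and the marker value are all hypothetical, and the real
dummy cell encoding is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: Row and the marker below are hypothetical, not Phoenix or HBase types. */
final class DummyCellFilterSketch {

    /** Hypothetical marker value; the real dummy cell encoding is not modelled here. */
    static final String DUMMY_MARKER = "__dummy__";

    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }

        boolean isDummy() {
            return DUMMY_MARKER.equals(value);
        }
    }

    /** Outcome of filtering: the real rows, plus whether a heartbeat should replace a dummy. */
    static final class Filtered {
        final List<Row> rows;
        final boolean needsHeartbeat;

        Filtered(List<Row> rows, boolean needsHeartbeat) {
            this.rows = rows;
            this.needsHeartbeat = needsHeartbeat;
        }
    }

    /** Drops every dummy row so that none would reach the wire; keeps the order of real rows. */
    static Filtered stripDummies(List<Row> batch) {
        List<Row> real = new ArrayList<>();
        boolean sawDummy = false;
        for (Row r : batch) {
            if (r.isDummy()) {
                sawDummy = true;
            } else {
                real.add(r);
            }
        }
        return new Filtered(real, sawDummy);
    }

    public static void main(String[] args) {
        List<Row> batch = Arrays.asList(
                new Row("r1", "v1"), new Row("r2", DUMMY_MARKER), new Row("r3", "v3"));
        Filtered f = stripDummies(batch);
        System.out.println(f.rows.size() + " real rows, needsHeartbeat=" + f.needsHeartbeat);
    }
}
```

How the "convert to a heartbeat" half would actually be done on the server is
exactly the open question, so the sketch only flags it.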


> >>
> >> > in that case we'd need
> >> > to check and convert to a heartbeat scan result somehow
> >>
> >> This needs changes in HBase only, which I don't think HBase would
> >> (should) allow.
> >>
> >> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x clients
> >> > to HBase 3 is even a possibility?
> >>
> >> Yes, wire compatibility is important. When this happens, the only thing
> >> we can do is set the page timeout high enough that we never have to send
> >> the dummy result to the client, or disable the paging feature.
> >>
> >>
> >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
> >>
> >>> I've been struggling with errors on the region moving tests on my HBase 3.0
> >>> WIP branch and have finally tracked the problems down to Phoenix's dummy
> >>> cells (as well as some built-in assumptions in Phoenix which are not true
> >>> for HBase 3, see PHOENIX-7728
> >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>)
> >>>
> >>> HBase is not aware that these are dummy cells, and is considering the rows
> >>> as already processed when retrying scans after the region goes away from
> >>> under the scan, i.e. it restarts the scan from AFTER the dummy cell's
> >>> rowkey, leading to the scan skipping rows.
> >>>
> >>> I have been able to fix the tests by hacking HBase to ignore these dummy
> >>> cells (and fixing the Phoenix side problems described in PHOENIX-7728
> >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
> >>> that hacking HBase to work with dummy cells is the way to go (or even if
> >>> that would be accepted by HBase).
> >>>
> >>> AFAIU the dummy cells were added back in HBase 1.x, when there was no
> >>> other way to ensure timely responses from the server.
> >>>
> >>> HBase 2 has introduced the keepalive/cursor mechanics, which IIUC serve the
> >>> exact same purpose as the Phoenix dummy cells.
> >>>
> >>> I propose dropping the dummy cell mechanics from Phoenix, and using the
> >>> HBase keepalive/cursor mechanics instead (we may not even need the
> >>> cursors).
> >>>
> >>> If we cannot find a better way to shortcut some processing in Phoenix we
> >>> may need to keep dummy cells internally, but we have to make sure that they
> >>> never appear on the wire and reach the client. (i.e. in that case we'd need
> >>> to check and convert to a heartbeat scan result somehow)
> >>>
> >>> We will also need to consider backwards compatibility.
> >>>
> >>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x clients
> >>> to HBase 3 is even a possibility?
> >>>
> >>> Do we want to support that?
> >>>
> >>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive
> >>> mechanics, will old clients work with that without changes, or do we need
> >>> to keep sending dummy cells for older clients?
> >>>
> >>> Looking forward to hearing your take,
> >>>
> >>> Istvan
> >>>
> >>
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: [email protected]