We also need to understand what happens when the HBase client receives a
heartbeat and the region then moves.
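
To make the question concrete, here is a toy Java model of the failure
Istvan describes in the quoted message below: the client treats the dummy
cell's rowkey as "already processed" and, after the region moves, reopens
the scan from after that key. None of this is HBase or Phoenix code. The
class, method, and parameter names are made up for illustration, and the
assumption that the server-side partial state (here, a row count) is lost
when the region moves is part of the sketch, not something taken from the
HBase source.

```java
import java.util.List;

public class DummyCellSkipSketch {

    /**
     * Toy model of a paged, server-side row count with one region move.
     * Returns the count the client ends up with.
     * Assumption: the partial count held by the open scanner is lost on the move.
     */
    static int scanWithOneRegionMove(List<String> rows, int pageSize,
                                     boolean dummyAdvancesResumeKey) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Client side: last rowkey the client considers fully processed.
        String resumeAfter = null;

        // First RPC: the server scans one page and runs out of page time.
        int scanned = Math.min(pageSize, rows.size());
        if (scanned == rows.size()) {
            return scanned; // finished within one page, no dummy cell needed
        }
        // The server answers with a dummy cell carrying the last scanned rowkey.
        String dummyRowKey = rows.get(scanned - 1);
        if (dummyAdvancesResumeKey) {
            resumeAfter = dummyRowKey; // client treats the dummy cell as real data
        }

        // Region moves: the client reopens the scan after resumeAfter,
        // and the new scanner counts only the rows it actually sees.
        int count = 0;
        for (String row : rows) {
            if (resumeAfter == null || row.compareTo(resumeAfter) > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5", "r6");
        // Dummy cell advances the resume key: r1..r3 are never counted.
        System.out.println(scanWithOneRegionMove(rows, 3, true));  // prints 3
        // Response does not advance the resume key: the scan restarts cleanly.
        System.out.println(scanWithOneRegionMove(rows, 3, false)); // prints 6
    }
}
```

Whether an HBase heartbeat response behaves like the second case when the
region moves is exactly what we need to confirm.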


On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]> wrote:

> Istvan, I think we should also involve dev@hbase and see what guidelines
> we are recommending so far for coprocs that would like to implement timeout
> features for long running scans, wdyt?
>
> On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]> wrote:
>
>> Thank you for starting this thread, Istvan!
>>
>> This is an important issue. I have recently come across data correctness
>> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also got me
>> thinking about the heartbeat and dummy cell overlap leading to possible
>> data correctness issues.
>>
>> > I propose dropping the dummy cell mechanics from Phoenix, and using the
>> > HBase keepalive/cursor mechanics instead (we may not even need the
>> > cursors).
>>
>> +1
>>
>> > If we cannot find a better way to shortcut some processing in Phoenix we
>> > may need to keep dummy cells internally, but we have to make sure that
>> > they never appear on the wire and reach the client.
>>
>> I don't think it is possible for Phoenix to ensure a dummy cell never
>> reaches the HBase client.
>>
>> > in that case we'd need
>> > to check and convert to a heartbeat scan result somehow
>>
>> This needs changes in HBase only, which I don't think HBase would
>> (should) allow.
>>
>> > Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
>> > clients to HBase 3 is even a possibility?
>>
>> Yes, wire compatibility is important. When this happens, the only thing
>> we can do is set the page timeout high enough that we never have to send
>> the dummy result to the client, or disable the paging feature.
>>
>>
>> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]> wrote:
>>
>>> I've been struggling with errors in the region moving tests on my HBase
>>> 3.0 WIP branch and have finally tracked the problems down to Phoenix's
>>> dummy cells (as well as some built-in assumptions in Phoenix which are
>>> not true for HBase 3, see PHOENIX-7728
>>> <https://issues.apache.org/jira/browse/PHOENIX-7728>).
>>>
>>> HBase is not aware that these are dummy cells, and is considering the
>>> rows as already processed when retrying scans after the region goes away
>>> from under the scan, i.e. it restarts the scan from AFTER the dummy
>>> cell's rowkey, leading to the scan skipping rows.
>>>
>>> I have been able to fix the tests by hacking HBase to ignore these dummy
>>> cells (and fixing the Phoenix side problems described in PHOENIX-7728
>>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I don't think
>>> that hacking HBase to work with dummy cells is the way to go (or that it
>>> would even be accepted by HBase).
>>>
>>> AFAIU the dummy cells were added back in the HBase 1.x days, when there
>>> was no other way to ensure timely responses from the server.
>>>
>>> HBase 2 has introduced the keepalive/cursor mechanics, which IIUC serve
>>> the exact same purpose as the Phoenix dummy cells.
>>>
>>> I propose dropping the dummy cell mechanics from Phoenix, and using the
>>> HBase keepalive/cursor mechanics instead (we may not even need the
>>> cursors).
>>>
>>> If we cannot find a better way to shortcut some processing in Phoenix we
>>> may need to keep dummy cells internally, but we have to make sure that
>>> they never appear on the wire and reach the client. (i.e. in that case
>>> we'd need to check and convert to a heartbeat scan result somehow)
>>>
>>> We will also need to consider backwards compatibility.
>>>
>>> Is HBase 2/3 wire compatible enough that connecting with HBase 2.x
>>> clients to HBase 3 is even a possibility?
>>>
>>> Do we want to support that?
>>>
>>> When using HBase 2.x, if Phoenix starts to use the HBase keepalive
>>> mechanics, will old clients work with that without changes, or do we need
>>> to keep sending dummy cells for older clients?
>>>
>>> Looking forward to hearing your take,
>>>
>>> Istvan
>>>
>>
