I'm glad that you as the original designer of the feature has joined the
discussion, Kadir.

On Wed, Nov 19, 2025 at 10:56 PM Kadir Ozdemir <[email protected]> wrote:

> Istvan,
>
> When I introduced server paging and the dummy result, Phoenix did not
> support ScannerContext. Now that Phoenix supports ScannerContext, we can
> think about leveraging it better for server paging.
>

I realize that it was necessary for HBase 1.x. This was a good design when
HBase 1
support was a requirement, but specifically the dummy cell implementation
detail
is redundant now that HBase 2+ has native support for the same
functionality.


>
> "HBase is not aware that these are dummy cells, and is considering the rows
> as already processed when retrying scans after the region goes away from
> under the scan, i.e. it restarts the scan from AFTER the dummy cell's
> rowkey, leading to the scan skipping rows."
>

This assumption is no longer true in HBase 3.

The client side heartbeat logic in HBase 3 is thrown off by the dummy cells
generated by Phoenix.

I had to add this hack to get some tests in Phoenix to pass:
https://github.com/stoty/hbase/tree/PHOENIX_DUMMY_CELL_WORKAROUND


>
> That is the whole purpose of the dummy result, that is, not to scan the
> rows that have been scanned already. This allows Phoenix to make progress
> in the presence of table region movements, otherwise every time a region
> moves or splits, Phoenix has to scan the region from the row key of the
> last valid result from this region instead of the last scanned row. What is
> the problem with this? Consider a large region and a scan with a very
> selective filter such that a large number of rows need to be scanned before
> returning a valid row. One can create a sequence of region movements that
> prevents Phoenix from making any progress for this scan


Thanks for the explanation.

I'm not questioning the usefulness of the paging design.
The HBase community also agrees, so they have added this feature natively
in
HBase 2 in the form of the heartbeat/cursor feature.

.
>
> Please note that Phoenix has some complex logic on the server side for
> handling various SQL language features including grouping, aggregating,
> sorting and joining. Implementing paging is much more complex in Phoenix
> than implementing keep alive and ScannerContext in HBase. Either you
> discovered an issue in Phoenix paging or a compatibility issue between
> HBase 2 and HBase 3. I suggest that we understand what the issue is first
> before replacing the dummy result.
>

It is the latter.

The internal heartbeat retry logic in the HBase 3 client sees the dummy row
and concludes that
it should continue after an error (i.e. region move) from AFTER that row.
(see my HBase hack above)

This is different from the HBase 2 logic, which does not do this.

In a way, this is related to, and sometimes casued by another Phoenix
change I have made for HBase 3:
PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>
https://github.com/stoty/phoenix/blob/62112097bc1f050a760225663001fc0f084d4fb4/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java#L482
<https://issues.apache.org/jira/browse/PHOENIX-7728>

However, without removing the plus/minus row logic there even more tests
were failing, so
Hbase 3 doesn't work with the current Phoenix dummy row logic either.

<https://issues.apache.org/jira/browse/PHOENIX-7728>I agree that Phoenix
still needs to be aware of paging, and will need logic to convert the
Cursor rowkeys returned from inner scanners into rowkeys that make sense
for the outer scanners and client, but
my expectation is that we can simply? convert the current Dummy cell logic
that handles this to work with the
cursor value instead on the server side.


>
>
>
> On Wed, Nov 19, 2025 at 7:23 AM Tanuj Khurana <[email protected]> wrote:
>
> > Hi Istvan,
> >
> > I agree that instead of using dummy cells, we should rely on
> > keepalive/cursor mechanics. We have been working towards that. As part of
> > PHOENIX-7707, I  propagated the scanner context all the way down to
> phoenix
> > scanners. We can leverage that.
> >
> > Tanuj
> >
> > On Wed, 19 Nov 2025 at 20:09, Istvan Toth <[email protected]>
> > wrote:
> >
> > > Thanks for your thoughtful response, Viraj.
> > >
> > > I have added my thoughts below.
> > >
> > > On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]>
> wrote:
> > >
> > > > We also need to understand: what happens when hbase client gets
> > heartbeat
> > > > and the region moves?
> > > >
> > > > I have checked that code in HBase, and the HBase client seems to
> handle
> > > this case transparently.
> > > We may of course find bugs, but handling that is part of the design.
> > >
> > >
> > > >
> > > > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <[email protected]>
> > wrote:
> > > >
> > > > > Istvan, I think we should also involve dev@hbase and see what
> > > guidelines
> > > > > we are recommending so far for coprocs that would like to implement
> > > > timeout
> > > > > features for long running scans, wdyt?
> > > >
> > >
> > > Based on my current understanding, if the Scan / ScannerContext is
> > > correctly set up (allows partial rows, sets the time limit and
> requests a
> > > cursor),
> > > HBase will honor that and the Scan will return a heartbeat result when
> it
> > > times out.
> > >
> > > I THINK that's all we need. Of course if we get stuck we should ask for
> > > help.
> > >
> > >
> > > > >
> > > > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <[email protected]>
> > > wrote:
> > > > >
> > > > >> Thank you for starting this thread, Istvan!
> > > > >>
> > > > >> This is an important issue. I have recently come across data
> > > correctness
> > > > >> issues with PHOENIX-7733, to be fixed by HBASE-29722. This also
> got
> > me
> > > > >> thinking about the heartbeat and dummy cell overlap leading to
> > > possible
> > > > >> data correctness issues.
> > > > >>
> > > > >> > I propose dropping the dummy cell mechanics from Phoenix, and
> > using
> > > > the
> > > > >> > HBase keepalive/cursor mechanics instead (we may not even need
> the
> > > > >> cursors).
> > > > >>
> > > > >> +1
> > > > >>
> > > > >> > If we cannot find a better way to shortcut some processing in
> > > Phoenix
> > > > we
> > > > >> > may need to keep dummy cells internally, but we have to make
> sure
> > > that
> > > > >> they
> > > > >> > never appear on the wire and reach the client.
> > > > >>
> > > > >> I don't think it is possible for Phoenix to ensure a dummy cell
> > never
> > > > >> reaches the HBase client.
> > > >
> > >
> > > I think if nothing else works, we can still catch and filter/convert
> them
> > > in RegionObserver.postScannerNext().
> > > Of course ideally we would never generate any Dummy cells in the first
> > > place.
> > >
> > >
> > > > >>
> > > > >> > in that case we'd need
> > > > >> > to check and convert to a heartbeat scan result somehow
> > > > >>
> > > > >> This needs changes in HBase only, which I don't think HBase would
> > > > >> (should) allow.
> > > > >>
> > > > >> > Is Hbase 2/3 wire compatible enough that connecting with HBase
> 2.x
> > > > >> clients
> > > > >> > to Hbase 3 even a possibility ?
> > > > >>
> > > > >> Yes, wire compatibility is important. When this happens, the only
> > > thing
> > > > >> we can do is set the page timeout high enough that we never have
> to
> > > send
> > > > >> the dummy result to the client, or disable the paging feature.
> > > > >>
> > > > >>
> > > > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <[email protected]>
> > > wrote:
> > > > >>
> > > > >>> I've been struggling with errors on the region moving tests on my
> > > HBase
> > > > >>> 3.0
> > > > >>> WIP branch and have finally tracked the problems down to
> Phoenix's
> > > > dummy
> > > > >>> Cells (as well as some built-in assumptions in Phoenix which are
> > not
> > > > true
> > > > >>> for Hbase 3, see PHOENIX-7728
> > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>)
> > > > >>>
> > > > >>> HBase is not aware that these are dummy cells, and is considering
> > the
> > > > >>> rows
> > > > >>> as already processed when retrying scans after the region goes
> away
> > > > from
> > > > >>> under the scan, i.e. it restarts the scan from AFTER the dummy
> > cell's
> > > > >>> rowkey, leading to the scan skipping rows.
> > > > >>>
> > > > >>> I have been able to fix the tests by hacking Hbase to ignore
> these
> > > > dummy
> > > > >>> cells (and fixing the phoenix side problems described in
> > PHOENIX-7728
> > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I
> don't
> > > > think
> > > > >>> that hacking HBase to work with dummy cells is the way to go (or
> > even
> > > > if
> > > > >>> that would be accepted by HBase).
> > > > >>>
> > > > >>> AFAIU the dummy cells were added back in the HBase 1.x when there
> > was
> > > > no
> > > > >>> other way to ensure timely responses from the server.
> > > > >>>
> > > > >>> HBase 2 has introduced the keepalive/cursor mechanics, which IUC
> > > serves
> > > > >>> the
> > > > >>> exact same purpose at the Phoenix dummy cells.
> > > > >>>
> > > > >>> I propose dropping the dummy cell mechanics from Phoenix, and
> using
> > > the
> > > > >>> HBase keepalive/cursor mechanics instead (we may not even need
> the
> > > > >>> cursors).
> > > > >>>
> > > > >>> If we cannot find a better way to shortcut some processing in
> > Phoenix
> > > > we
> > > > >>> may need to keep dummy cells internally, but we have to make sure
> > > that
> > > > >>> they
> > > > >>> never appear on the wire and reach the client. (i.e. in that case
> > > we'd
> > > > >>> need
> > > > >>> to check and convert to a heartbeat scan result somehow)
> > > > >>>
> > > > >>> We will also need to consider backwards compatibility.
> > > > >>>
> > > > >>> Is Hbase 2/3 wire compatible enough that connecting with HBase
> 2.x
> > > > >>> clients
> > > > >>> to Hbase 3 even a possibility ?
> > > > >>>
> > > > >>> Do we want to support that ?
> > > > >>>
> > > > >>> When using Hbase 2.x, if Phoenix starts to use the HBase
> keepalive
> > > > >>> mechanics, will old clients work with that without changes, or do
> > we
> > > > need
> > > > >>> to keep sending Dummy cells for older clients ?
> > > > >>>
> > > > >>> Looking forward to hearing your take,
> > > > >>>
> > > > >>> Istvan
> > > > >>>
> > > > >>
> > > >
> > >
> > >
> > > --
> > > *István Tóth* | Sr. Staff Software Engineer
> > > *Email*: [email protected]
> > > cloudera.com <https://www.cloudera.com>
> > > [image: Cloudera] <https://www.cloudera.com/>
> > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
> > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image:
> > Cloudera
> > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > ------------------------------
> > > ------------------------------
> > >
> >
>


-- 
*István Tóth* | Sr. Staff Software Engineer
*Email*: [email protected]
cloudera.com <https://www.cloudera.com>
[image: Cloudera] <https://www.cloudera.com/>
[image: Cloudera on Twitter] <https://twitter.com/cloudera> [image:
Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera
on LinkedIn] <https://www.linkedin.com/company/cloudera>
------------------------------
------------------------------

Reply via email to