Hi Istvan,

As part of PHOENIX-7707, Phoenix extends the ScannerContext so that we can
add custom fields to it.
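
To make the idea concrete, here is a minimal sketch of what "extending the context with a custom field" could look like. It uses stand-in classes only: BaseScannerContext and PagingScannerContext are hypothetical names, not the real HBase ScannerContext or the actual PHOENIX-7707 code, and the clock value is passed in explicitly to keep the example deterministic.

```java
/** Stand-in for HBase's ScannerContext; NOT the real class, just enough to subclass. */
class BaseScannerContext {
    private long rowsScanned;

    void incrementRowsScanned() {
        rowsScanned++;
    }

    long getRowsScanned() {
        return rowsScanned;
    }
}

/** Hypothetical Phoenix-side subclass that carries one custom field: the page time. */
final class PagingScannerContext extends BaseScannerContext {
    private final long pageTimeMs; // custom paging time budget for one scanner call
    private final long startMs;    // when the current page started

    PagingScannerContext(long pageTimeMs, long startMs) {
        if (pageTimeMs <= 0) {
            throw new IllegalArgumentException("pageTimeMs must be positive");
        }
        this.pageTimeMs = pageTimeMs;
        this.startMs = startMs;
    }

    long getPageTimeMs() {
        return pageTimeMs;
    }

    /** True once the page time is used up and the scanner should return early. */
    boolean isPageTimeExpired(long nowMs) {
        return nowMs - startMs >= pageTimeMs;
    }
}
```

A scanner that receives such a context could check isPageTimeExpired() between rows and return a paging signal once it turns true, whatever form that signal ends up taking.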

On Fri, 21 Nov 2025 at 10:53, Istvan Toth <[email protected]> wrote:

> Thanks for these points, Kadir.
>
> On Thu, Nov 20, 2025 at 8:59 PM Kadir Ozdemir <
> [email protected]>
> wrote:
>
> > The row key of the dummy result is not simply the last row that was
> > scanned by RegionScannerImpl. It is computed by Phoenix coprocs based on
> > the query. For example, for an ordered group by query, it should be the
> > last row of the last group computed. For an unordered group by query, it
> > never changes until the entire region is processed. For Phoenix to be able
> > to use the HBase cursor, the coprocs need to be able to change the cursor
> > value. Otherwise, there will be data integrity issues.
> >
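
The ordered group by case above can be sketched as follows. This is only one reading of "the last row of the last group computed", not Phoenix's actual coprocessor code: GroupByCursorSketch, Row and orderedGroupByCursor are hypothetical names, and the sketch assumes that the group containing the most recently scanned row is still open, because the server cannot know it is complete until it sees a row from the next group.

```java
import java.util.List;

final class GroupByCursorSketch {
    /** One scanned row: its row key and the group it aggregates into. */
    static final class Row {
        final String rowKey;
        final String group;

        Row(String rowKey, String group) {
            this.rowKey = rowKey;
            this.group = group;
        }
    }

    /**
     * Rows are sorted so that each group is contiguous. After scannedCount rows
     * have been read, returns the row key that is safe to report as the paging
     * position: the last row of the last completed group, or null if no group
     * has been completed yet.
     */
    static String orderedGroupByCursor(List<Row> rows, int scannedCount) {
        if (scannedCount < 0 || scannedCount > rows.size()) {
            throw new IllegalArgumentException("scannedCount out of range");
        }
        if (scannedCount == 0) {
            return null;
        }
        // The group of the last scanned row is treated as still open.
        String openGroup = rows.get(scannedCount - 1).group;
        int i = scannedCount - 1;
        // Walk back over the open group to the last row of the previous group.
        while (i >= 0 && rows.get(i).group.equals(openGroup)) {
            i--;
        }
        return i < 0 ? null : rows.get(i).rowKey;
    }
}
```

If the reported position were the last scanned row instead, a retry after a region move would resume in the middle of an open group, and the rows of that group already scanned would be missing from its aggregate.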
>
> We can create a new synthetic Cursor Result and return that the same way we
> create and return a new dummy Cell now.
> In this regard I see no difference between the two.
>
>
> >
> > Another reason for the dummy result is to provide end-to-end fair
> > scheduling for Phoenix in the future. Without a Phoenix-level signal (the
> > dummy result), the Phoenix client would not know whether the server has
> > already spent the page time for a given query. I was thinking that we may
> > be able to leverage this to decide if the currently blocked thread should
> > be released. This is a secondary concern, but I want to make sure we all
> > understand the implications of replacing this Phoenix-level concept.
> >
>
> Good point.
>
> My current understanding is that if we set the *needCursorResult* flag on
> the scan, then HBase will return all cursor results to the client, and we
> can use those the same way we use the dummy cells, so I see no problem here
> either.
>
> In fact, the more I look at the HBase heartbeat/cursor implementation, the
> more it feels like it was tailor-made for implementing Phoenix paging (even
> though it did not come from Phoenix developers).
>
> The only snag I've found so far is that HBase creates the default
> ScannerContext and there is no easy way to set a custom paging time on it.
>
>
> > On Wed, Nov 19, 2025 at 9:27 PM Istvan Toth <[email protected]>
> > wrote:
> >
> > > I'm glad that you, as the original designer of the feature, have joined
> > > the discussion, Kadir.
> > >
> > > On Wed, Nov 19, 2025 at 10:56 PM Kadir Ozdemir <[email protected]>
> wrote:
> > >
> > > > Istvan,
> > > >
> > > > When I introduced server paging and the dummy result, Phoenix did not
> > > > support ScannerContext. Now that Phoenix supports ScannerContext, we
> > > > can think about leveraging it better for server paging.
> > > >
> > >
> > > I realize that it was necessary for HBase 1.x. This was a good design
> > > when HBase 1 support was a requirement, but specifically the dummy cell
> > > implementation detail is redundant now that HBase 2+ has native support
> > > for the same functionality.
> > >
> > >
> > > >
> > > > "HBase is not aware that these are dummy cells, and is considering
> > > > the rows as already processed when retrying scans after the region
> > > > goes away from under the scan, i.e. it restarts the scan from AFTER
> > > > the dummy cell's rowkey, leading to the scan skipping rows."
> > > >
> > >
> > > This assumption is no longer true in HBase 3.
> > >
> > > The client side heartbeat logic in HBase 3 is thrown off by the dummy
> > > cells generated by Phoenix.
> > >
> > > I had to add this hack to get some tests in Phoenix to pass:
> > > https://github.com/stoty/hbase/tree/PHOENIX_DUMMY_CELL_WORKAROUND
> > >
> > >
> > > >
> > > > That is the whole purpose of the dummy result, that is, not to scan
> > > > the rows that have been scanned already. This allows Phoenix to make
> > > > progress in the presence of table region movements; otherwise, every
> > > > time a region moves or splits, Phoenix has to scan the region from the
> > > > row key of the last valid result from this region instead of the last
> > > > scanned row. What is the problem with this? Consider a large region
> > > > and a scan with a very selective filter such that a large number of
> > > > rows need to be scanned before returning a valid row. One can create a
> > > > sequence of region movements that prevents Phoenix from making any
> > > > progress for this scan.
> > >
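
The progress argument above can be illustrated with a small simulation. It is a toy model under stated assumptions, not HBase or Phoenix code: PagingProgressSketch and attemptsToFinish are hypothetical names, scanned rows stand in for elapsed time, a region move is modeled as a fixed row budget per attempt, and only a single row (matchRow) passes the filter.

```java
final class PagingProgressSketch {
    /**
     * Simulates a scan over rows 0, 1, 2, ... where only matchRow passes the filter.
     * Every attempt is cut short by a region move after rowsPerAttempt rows.
     * If resumeFromPageMarker is true, the client learns the scan position every
     * pageRows rows (the role of the dummy result or cursor) and resumes from there;
     * otherwise it can only resume from the last valid result, which here means row 0.
     * Returns the attempt that reaches matchRow, or -1 if maxAttempts is exhausted.
     */
    static int attemptsToFinish(int matchRow, int rowsPerAttempt, int pageRows,
                                boolean resumeFromPageMarker, int maxAttempts) {
        if (rowsPerAttempt <= 0 || pageRows <= 0) {
            throw new IllegalArgumentException("rowsPerAttempt and pageRows must be positive");
        }
        int resumeRow = 0; // first row the next attempt will scan
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int row = resumeRow;
            int scanned = 0;
            while (scanned < rowsPerAttempt) {
                if (row == matchRow) {
                    return attempt; // the only matching row was reached
                }
                row++;
                scanned++;
                // Page boundary: the server reports how far it has scanned.
                if (resumeFromPageMarker && scanned % pageRows == 0) {
                    resumeRow = row;
                }
            }
            // The region moved: this attempt is lost and the client restarts at resumeRow.
        }
        return -1;
    }
}
```

With matchRow = 1000, rowsPerAttempt = 300 and pageRows = 100, the variant that resumes from page markers finishes on the fourth attempt, while the variant that can only resume from the last valid result rescans rows 0 to 299 on every attempt and never finishes.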
> > >
> > > Thanks for the explanation.
> > >
> > > I'm not questioning the usefulness of the paging design.
> > > The HBase community also agrees, so they have added this feature
> > > natively in HBase 2 in the form of the heartbeat/cursor feature.
> > >
> > > .
> > > >
> > > > Please note that Phoenix has some complex logic on the server side
> > > > for handling various SQL language features including grouping,
> > > > aggregating, sorting and joining. Implementing paging is much more
> > > > complex in Phoenix than implementing keep alive and ScannerContext in
> > > > HBase. Either you discovered an issue in Phoenix paging or a
> > > > compatibility issue between HBase 2 and HBase 3. I suggest that we
> > > > understand what the issue is first before replacing the dummy result.
> > > >
> > >
> > > It is the latter.
> > >
> > > The internal heartbeat retry logic in the HBase 3 client sees the dummy
> > > row and concludes that it should continue after an error (i.e. region
> > > move) from AFTER that row (see my HBase hack above).
> > >
> > > This is different from the HBase 2 logic, which does not do this.
> > >
> > > In a way, this is related to, and sometimes caused by, another Phoenix
> > > change I have made for HBase 3:
> > > PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728>
> > >
> > > https://github.com/stoty/phoenix/blob/62112097bc1f050a760225663001fc0f084d4fb4/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java#L482
> > >
> > > However, without removing the plus/minus row logic there, even more
> > > tests were failing, so HBase 3 doesn't work with the current Phoenix
> > > dummy row logic either.
> > >
> > > I agree that Phoenix still needs to be aware of paging, and will need
> > > logic to convert the Cursor rowkeys returned from inner scanners into
> > > rowkeys that make sense for the outer scanners and client, but my
> > > expectation is that we can simply(?) convert the current dummy cell
> > > logic that handles this to work with the cursor value instead on the
> > > server side.
> > >
> > >
> > > >
> > > >
> > > >
> > > > On Wed, Nov 19, 2025 at 7:23 AM Tanuj Khurana <[email protected]>
> > > wrote:
> > > >
> > > > > Hi Istvan,
> > > > >
> > > > > I agree that instead of using dummy cells, we should rely on
> > > > > keepalive/cursor mechanics. We have been working towards that. As
> > > > > part of PHOENIX-7707, I propagated the scanner context all the way
> > > > > down to Phoenix scanners. We can leverage that.
> > > > >
> > > > > Tanuj
> > > > >
> > > > > On Wed, 19 Nov 2025 at 20:09, Istvan Toth
> <[email protected]
> > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for your thoughtful response, Viraj.
> > > > > >
> > > > > > I have added my thoughts below.
> > > > > >
> > > > > > On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > > We also need to understand: what happens when the HBase client
> > > > > > > gets a heartbeat and the region moves?
> > > > > >
> > > > > > I have checked that code in HBase, and the HBase client seems to
> > > > > > handle this case transparently.
> > > > > > We may of course find bugs, but handling that is part of the design.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Istvan, I think we should also involve dev@hbase and see what
> > > > > > > > guidelines we are recommending so far for coprocs that would
> > > > > > > > like to implement timeout features for long running scans,
> > > > > > > > wdyt?
> > > > > > >
> > > > > >
> > > > > > Based on my current understanding, if the Scan / ScannerContext is
> > > > > > correctly set up (allows partial rows, sets the time limit and
> > > > > > requests a cursor), HBase will honor that and the Scan will return
> > > > > > a heartbeat result when it times out.
> > > > > >
> > > > > > I THINK that's all we need. Of course if we get stuck we should
> > > > > > ask for help.
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thank you for starting this thread, Istvan!
> > > > > > > >>
> > > > > > > >> This is an important issue. I have recently come across data
> > > > > > > >> correctness issues with PHOENIX-7733, to be fixed by
> > > > > > > >> HBASE-29722. This also got me thinking about the heartbeat
> > > > > > > >> and dummy cell overlap leading to possible data correctness
> > > > > > > >> issues.
> > > > > > > >>
> > > > > > > >> > I propose dropping the dummy cell mechanics from Phoenix,
> > and
> > > > > using
> > > > > > > the
> > > > > > > >> > HBase keepalive/cursor mechanics instead (we may not even
> > need
> > > > the
> > > > > > > >> cursors).
> > > > > > > >>
> > > > > > > >> +1
> > > > > > > >>
> > > > > > > >> > If we cannot find a better way to shortcut some processing
> > in
> > > > > > Phoenix
> > > > > > > we
> > > > > > > >> > may need to keep dummy cells internally, but we have to
> make
> > > > sure
> > > > > > that
> > > > > > > >> they
> > > > > > > >> > never appear on the wire and reach the client.
> > > > > > > >>
> > > > > > > >> I don't think it is possible for Phoenix to ensure a dummy
> > > > > > > >> cell never reaches the HBase client.
> > > > > > >
> > > > > >
> > > > > > I think if nothing else works, we can still catch and
> > > > > > filter/convert them in RegionObserver.postScannerNext().
> > > > > > Of course, ideally we would never generate any dummy cells in the
> > > > > > first place.
> > > > > >
> > > > > >
> > > > > > > >>
> > > > > > > >> > in that case we'd need
> > > > > > > >> > to check and convert to a heartbeat scan result somehow
> > > > > > > >>
> > > > > > > >> This needs changes in HBase only, which I don't think HBase
> > > > > > > >> would (should) allow.
> > > > > > > >>
> > > > > > > >> > Is HBase 2/3 wire compatible enough that connecting with
> > > > > > > >> > HBase 2.x clients to HBase 3 is even a possibility?
> > > > > > > >>
> > > > > > > >> Yes, wire compatibility is important. When this happens, the
> > > > > > > >> only thing we can do is set the page timeout high enough that
> > > > > > > >> we never have to send the dummy result to the client, or
> > > > > > > >> disable the paging feature.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth <
> > [email protected]>
> > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> I've been struggling with errors on the region moving tests
> > > > > > > >>> on my HBase 3.0 WIP branch and have finally tracked the
> > > > > > > >>> problems down to Phoenix's dummy Cells (as well as some
> > > > > > > >>> built-in assumptions in Phoenix which are not true for
> > > > > > > >>> HBase 3, see PHOENIX-7728
> > > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>)
> > > > > > > >>>
> > > > > > > >>> HBase is not aware that these are dummy cells, and is
> > > > > > > >>> considering the rows as already processed when retrying
> > > > > > > >>> scans after the region goes away from under the scan, i.e.
> > > > > > > >>> it restarts the scan from AFTER the dummy cell's rowkey,
> > > > > > > >>> leading to the scan skipping rows.
> > > > > > > >>>
> > > > > > > >>> I have been able to fix the tests by hacking HBase to ignore
> > > > > > > >>> these dummy cells (and fixing the Phoenix side problems
> > > > > > > >>> described in PHOENIX-7728
> > > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), but I
> > > > > > > >>> don't think that hacking HBase to work with dummy cells is
> > > > > > > >>> the way to go (or that it would even be accepted by HBase).
> > > > > > > >>>
> > > > > > > >>> AFAIU the dummy cells were added back in HBase 1.x, when
> > > > > > > >>> there was no other way to ensure timely responses from the
> > > > > > > >>> server.
> > > > > > > >>>
> > > > > > > >>> HBase 2 has introduced the keepalive/cursor mechanics, which
> > > > > > > >>> IIUC serve the exact same purpose as the Phoenix dummy cells.
> > > > > > > >>>
> > > > > > > >>> I propose dropping the dummy cell mechanics from Phoenix,
> > > > > > > >>> and using the HBase keepalive/cursor mechanics instead (we
> > > > > > > >>> may not even need the cursors).
> > > > > > > >>>
> > > > > > > >>> If we cannot find a better way to shortcut some processing
> > > > > > > >>> in Phoenix we may need to keep dummy cells internally, but
> > > > > > > >>> we have to make sure that they never appear on the wire and
> > > > > > > >>> reach the client. (i.e. in that case we'd need to check and
> > > > > > > >>> convert to a heartbeat scan result somehow)
> > > > > > > >>>
> > > > > > > >>> We will also need to consider backwards compatibility.
> > > > > > > >>>
> > > > > > > >>> Is HBase 2/3 wire compatible enough that connecting with
> > > > > > > >>> HBase 2.x clients to HBase 3 is even a possibility?
> > > > > > > >>>
> > > > > > > >>> Do we want to support that?
> > > > > > > >>>
> > > > > > > >>> When using HBase 2.x, if Phoenix starts to use the HBase
> > > > > > > >>> keepalive mechanics, will old clients work with that without
> > > > > > > >>> changes, or do we need to keep sending dummy cells for older
> > > > > > > >>> clients?
> > > > > > > >>>
> > > > > > > >>> Looking forward to hearing your take,
> > > > > > > >>>
> > > > > > > >>> Istvan
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *István Tóth* | Sr. Staff Software Engineer
> > > > > > *Email*: [email protected]
> > > > > > cloudera.com <https://www.cloudera.com>
> > > > > > [image: Cloudera] <https://www.cloudera.com/>
> > > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera>
> > [image:
> > > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera>
> [image:
> > > > > Cloudera
> > > > > > on LinkedIn] <https://www.linkedin.com/company/cloudera>
> > > > > > ------------------------------
> > > > > > ------------------------------
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
>
