Hi Istvan, As part of PHOENIX-7707, Phoenix extends the ScannerContext so that we can add custom fields to it.
On Fri, 21 Nov 2025 at 10:53, Istvan Toth <[email protected]> wrote: > Thanks for these points, Kadir. > > On Thu, Nov 20, 2025 at 8:59 PM Kadir Ozdemir < > [email protected]> > wrote: > > > The row key of the dummy result is not simply the last row that was > scanned > > by RegionScannerImpl. It is computed by Phoenix coprocs based on the > query. > > For example, for an ordered group by query, it should be the last row of > > the last group computed. For an unordered group by query, it nevers > changes > > until the entire region is processed. For Phoenix to be able to use the > > HBase cursor, the coprocs needs to be able to change the cursor value. > > Otherwise, there will be data integrity issues. > > > > We can create a new synthetic Cursor Result and return that the same way we > create and return a new dummy Cell now. > In this regard I see no difference between the two. > > > > > > Another reason for the dummy result is to provide an end-to-end fair > > scheduling for Phoenix in future. Without a Phoenix level signal (the > dummy > > result), the Phoenix client would not know if the server already spent > the > > page time for a given query. I was thinking that we may be able to > leverage > > this to decide if the current blocked thread should be released. This is > a > > secondary concern but I want to make sure we all understand the > > implications of replacing this Phoenix level concept. > > > > Good point. > > My current understanding is that if we set the *needCursorResult* flag on > the scan, > then HBase will return all cursor results to the client, and we can use > those the same way > we use the dummy cells, so I see no problem here either. > > In fact the more I look the Hbase Heartbeat/cursor implementation, the more > it feels like it was > taylor made for implementing Phoenix paging (even though it was not coming > from Phoenix developers) > > The only snag I've found so far is that HBase creates the default > ScannerContext and there is no easy way > to set a custom paging time on it. > > > > On Wed, Nov 19, 2025 at 9:27 PM Istvan Toth <[email protected]> > > wrote: > > > > > I'm glad that you as the original designer of the feature has joined > the > > > discussion, Kadir. > > > > > > On Wed, Nov 19, 2025 at 10:56 PM Kadir Ozdemir <[email protected]> > wrote: > > > > > > > Istvan, > > > > > > > > When I introduced server paging and the dummy result, Phoenix did not > > > > support ScannerContext. Now that Phoenix supports ScannerContext, we > > can > > > > think about leveraging it better for server paging. > > > > > > > > > > I realize that it was necessary for HBase 1.x. This was a good design > > when > > > HBase 1 > > > support was a requirement, but specifically the dummy cell > implementation > > > detail > > > is redundant now that HBase 2+ has native support for the same > > > functionality. > > > > > > > > > > > > > > "HBase is not aware that these are dummy cells, and is considering > the > > > rows > > > > as already processed when retrying scans after the region goes away > > from > > > > under the scan, i.e. it restarts the scan from AFTER the dummy cell's > > > > rowkey, leading to the scan skipping rows." > > > > > > > > > > This assumption is no longer true in HBase 3. > > > > > > The client side heartbeat logic in HBase 3 is thrown off by the dummy > > cells > > > generated by Phoenix. > > > > > > I had to add this hack to get some tests in Phoenix to pass: > > > https://github.com/stoty/hbase/tree/PHOENIX_DUMMY_CELL_WORKAROUND > > > > > > > > > > > > > > That is the whole purpose of the dummy result, that is, not to scan > the > > > > rows that have been scanned already. This allows Phoenix to make > > progress > > > > in the presence of table region movements, otherwise every time a > > region > > > > moves or splits, Phoenix has to scan the region from the row key of > the > > > > last valid result from this region instead of the last scanned row. > > What > > > is > > > > the problem with this? Consider a large region and a scan with a very > > > > selective filter such that a large number of rows need to be scanned > > > before > > > > returning a valid row. One can create a sequence of region movements > > that > > > > prevents Phoenix from making any progress for this scan > > > > > > > > > Thanks for the explanation. > > > > > > I'm not questioning the usefulness of the paging design. > > > The HBase community also agrees, so they have added this feature > natively > > > in > > > HBase 2 in the form of the heartbeat/cursor feature. > > > > > > . > > > > > > > > Please note that Phoenix has some complex logic on the server side > for > > > > handling various SQL language features including grouping, > aggregating, > > > > sorting and joining. Implementing paging is much more complex in > > Phoenix > > > > than implementing keep alive and ScannerContext in HBase. Either you > > > > discovered an issue in Phoenix paging or a compatibility issue > between > > > > HBase 2 and HBase 3. I suggest that we understand what the issue is > > first > > > > before replacing the dummy result. > > > > > > > > > > It is the latter. > > > > > > The internal heartbeat retry logic in the HBase 3 client sees the dummy > > row > > > and concludes that > > > it should continue after an error (i.e. region move) from AFTER that > row. > > > (see my HBase hack above) > > > > > > This is different from the HBase 2 logic, which does not do this. > > > > > > In a way, this is related to, and sometimes casued by another Phoenix > > > change I have made for HBase 3: > > > PHOENIX-7728 <https://issues.apache.org/jira/browse/PHOENIX-7728> > > > > > > > > > https://github.com/stoty/phoenix/blob/62112097bc1f050a760225663001fc0f084d4fb4/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/GroupedAggregateRegionObserver.java#L482 > > > <https://issues.apache.org/jira/browse/PHOENIX-7728> > > > > > > However, without removing the plus/minus row logic there even more > tests > > > were failing, so > > > Hbase 3 doesn't work with the current Phoenix dummy row logic either. > > > > > > <https://issues.apache.org/jira/browse/PHOENIX-7728>I agree that > Phoenix > > > still needs to be aware of paging, and will need logic to convert the > > > Cursor rowkeys returned from inner scanners into rowkeys that make > sense > > > for the outer scanners and client, but > > > my expectation is that we can simply? convert the current Dummy cell > > logic > > > that handles this to work with the > > > cursor value instead on the server side. > > > > > > > > > > > > > > > > > > > > > > On Wed, Nov 19, 2025 at 7:23 AM Tanuj Khurana <[email protected]> > > > wrote: > > > > > > > > > Hi Istvan, > > > > > > > > > > I agree that instead of using dummy cells, we should rely on > > > > > keepalive/cursor mechanics. We have been working towards that. As > > part > > > of > > > > > PHOENIX-7707, I propagated the scanner context all the way down to > > > > phoenix > > > > > scanners. We can leverage that. > > > > > > > > > > Tanuj > > > > > > > > > > On Wed, 19 Nov 2025 at 20:09, Istvan Toth > <[email protected] > > > > > > > > wrote: > > > > > > > > > > > Thanks for your thoughtful response, Viraj. > > > > > > > > > > > > I have added my thoughts below. > > > > > > > > > > > > On Wed, Nov 19, 2025 at 2:38 PM Viraj Jasani <[email protected] > > > > > > wrote: > > > > > > > > > > > > > We also need to understand: what happens when hbase client gets > > > > > heartbeat > > > > > > > and the region moves? > > > > > > > > > > > > > > I have checked that code in HBase, and the HBase client seems > to > > > > handle > > > > > > this case transparently. > > > > > > We may of course find bugs, but handling that is part of the > > design. > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Nov 19, 2025 at 7:05 PM Viraj Jasani < > [email protected] > > > > > > > > wrote: > > > > > > > > > > > > > > > Istvan, I think we should also involve dev@hbase and see > what > > > > > > guidelines > > > > > > > > we are recommending so far for coprocs that would like to > > > implement > > > > > > > timeout > > > > > > > > features for long running scans, wdyt? > > > > > > > > > > > > > > > > > > > Based on my current understanding, if the Scan / ScannerContext > is > > > > > > correctly set up (allows partial rows, sets the time limit and > > > > requests a > > > > > > cursor), > > > > > > HBase will honor that and the Scan will return a heartbeat result > > > when > > > > it > > > > > > times out. > > > > > > > > > > > > I THINK that's all we need. Of course if we get stuck we should > ask > > > for > > > > > > help. > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Nov 19, 2025 at 6:51 PM Viraj Jasani < > > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > > > > > >> Thank you for starting this thread, Istvan! > > > > > > > >> > > > > > > > >> This is an important issue. I have recently come across data > > > > > > correctness > > > > > > > >> issues with PHOENIX-7733, to be fixed by HBASE-29722. This > > also > > > > got > > > > > me > > > > > > > >> thinking about the heartbeat and dummy cell overlap leading > to > > > > > > possible > > > > > > > >> data correctness issues. > > > > > > > >> > > > > > > > >> > I propose dropping the dummy cell mechanics from Phoenix, > > and > > > > > using > > > > > > > the > > > > > > > >> > HBase keepalive/cursor mechanics instead (we may not even > > need > > > > the > > > > > > > >> cursors). > > > > > > > >> > > > > > > > >> +1 > > > > > > > >> > > > > > > > >> > If we cannot find a better way to shortcut some processing > > in > > > > > > Phoenix > > > > > > > we > > > > > > > >> > may need to keep dummy cells internally, but we have to > make > > > > sure > > > > > > that > > > > > > > >> they > > > > > > > >> > never appear on the wire and reach the client. > > > > > > > >> > > > > > > > >> I don't think it is possible for Phoenix to ensure a dummy > > cell > > > > > never > > > > > > > >> reaches the HBase client. > > > > > > > > > > > > > > > > > > > I think if nothing else works, we can still catch and > > filter/convert > > > > them > > > > > > in RegionObserver.postScannerNext(). > > > > > > Of course ideally we would never generate any Dummy cells in the > > > first > > > > > > place. > > > > > > > > > > > > > > > > > > > >> > > > > > > > >> > in that case we'd need > > > > > > > >> > to check and convert to a heartbeat scan result somehow > > > > > > > >> > > > > > > > >> This needs changes in HBase only, which I don't think HBase > > > would > > > > > > > >> (should) allow. > > > > > > > >> > > > > > > > >> > Is Hbase 2/3 wire compatible enough that connecting with > > HBase > > > > 2.x > > > > > > > >> clients > > > > > > > >> > to Hbase 3 even a possibility ? > > > > > > > >> > > > > > > > >> Yes, wire compatibility is important. When this happens, the > > > only > > > > > > thing > > > > > > > >> we can do is set the page timeout high enough that we never > > have > > > > to > > > > > > send > > > > > > > >> the dummy result to the client, or disable the paging > feature. > > > > > > > >> > > > > > > > >> > > > > > > > >> On Thu, Nov 13, 2025 at 11:22 PM Istvan Toth < > > [email protected]> > > > > > > wrote: > > > > > > > >> > > > > > > > >>> I've been struggling with errors on the region moving tests > > on > > > my > > > > > > HBase > > > > > > > >>> 3.0 > > > > > > > >>> WIP branch and have finally tracked the problems down to > > > > Phoenix's > > > > > > > dummy > > > > > > > >>> Cells (as well as some built-in assumptions in Phoenix > which > > > are > > > > > not > > > > > > > true > > > > > > > >>> for Hbase 3, see PHOENIX-7728 > > > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>) > > > > > > > >>> > > > > > > > >>> HBase is not aware that these are dummy cells, and is > > > considering > > > > > the > > > > > > > >>> rows > > > > > > > >>> as already processed when retrying scans after the region > > goes > > > > away > > > > > > > from > > > > > > > >>> under the scan, i.e. it restarts the scan from AFTER the > > dummy > > > > > cell's > > > > > > > >>> rowkey, leading to the scan skipping rows. > > > > > > > >>> > > > > > > > >>> I have been able to fix the tests by hacking Hbase to > ignore > > > > these > > > > > > > dummy > > > > > > > >>> cells (and fixing the phoenix side problems described in > > > > > PHOENIX-7728 > > > > > > > >>> <https://issues.apache.org/jira/browse/PHOENIX-7728>), > but I > > > > don't > > > > > > > think > > > > > > > >>> that hacking HBase to work with dummy cells is the way to > go > > > (or > > > > > even > > > > > > > if > > > > > > > >>> that would be accepted by HBase). > > > > > > > >>> > > > > > > > >>> AFAIU the dummy cells were added back in the HBase 1.x when > > > there > > > > > was > > > > > > > no > > > > > > > >>> other way to ensure timely responses from the server. > > > > > > > >>> > > > > > > > >>> HBase 2 has introduced the keepalive/cursor mechanics, > which > > > IUC > > > > > > serves > > > > > > > >>> the > > > > > > > >>> exact same purpose at the Phoenix dummy cells. > > > > > > > >>> > > > > > > > >>> I propose dropping the dummy cell mechanics from Phoenix, > and > > > > using > > > > > > the > > > > > > > >>> HBase keepalive/cursor mechanics instead (we may not even > > need > > > > the > > > > > > > >>> cursors). > > > > > > > >>> > > > > > > > >>> If we cannot find a better way to shortcut some processing > in > > > > > Phoenix > > > > > > > we > > > > > > > >>> may need to keep dummy cells internally, but we have to > make > > > sure > > > > > > that > > > > > > > >>> they > > > > > > > >>> never appear on the wire and reach the client. (i.e. in > that > > > case > > > > > > we'd > > > > > > > >>> need > > > > > > > >>> to check and convert to a heartbeat scan result somehow) > > > > > > > >>> > > > > > > > >>> We will also need to consider backwards compatibility. > > > > > > > >>> > > > > > > > >>> Is Hbase 2/3 wire compatible enough that connecting with > > HBase > > > > 2.x > > > > > > > >>> clients > > > > > > > >>> to Hbase 3 even a possibility ? > > > > > > > >>> > > > > > > > >>> Do we want to support that ? > > > > > > > >>> > > > > > > > >>> When using Hbase 2.x, if Phoenix starts to use the HBase > > > > keepalive > > > > > > > >>> mechanics, will old clients work with that without changes, > > or > > > do > > > > > we > > > > > > > need > > > > > > > >>> to keep sending Dummy cells for older clients ? > > > > > > > >>> > > > > > > > >>> Looking forward to hearing your take, > > > > > > > >>> > > > > > > > >>> Istvan > > > > > > > >>> > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > *István Tóth* | Sr. Staff Software Engineer > > > > > > *Email*: [email protected] > > > > > > cloudera.com <https://www.cloudera.com> > > > > > > [image: Cloudera] <https://www.cloudera.com/> > > > > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> > > [image: > > > > > > Cloudera on Facebook] <https://www.facebook.com/cloudera> > [image: > > > > > Cloudera > > > > > > on LinkedIn] <https://www.linkedin.com/company/cloudera> > > > > > > ------------------------------ > > > > > > ------------------------------ > > > > > > > > > > > > > > > > > > > > > > > > -- > > > *István Tóth* | Sr. Staff Software Engineer > > > *Email*: [email protected] > > > cloudera.com <https://www.cloudera.com> > > > [image: Cloudera] <https://www.cloudera.com/> > > > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > > > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: > > Cloudera > > > on LinkedIn] <https://www.linkedin.com/company/cloudera> > > > ------------------------------ > > > ------------------------------ > > > > > > > > -- > *István Tóth* | Sr. Staff Software Engineer > *Email*: [email protected] > cloudera.com <https://www.cloudera.com> > [image: Cloudera] <https://www.cloudera.com/> > [image: Cloudera on Twitter] <https://twitter.com/cloudera> [image: > Cloudera on Facebook] <https://www.facebook.com/cloudera> [image: Cloudera > on LinkedIn] <https://www.linkedin.com/company/cloudera> > ------------------------------ > ------------------------------ >
