[jira] [Commented] (PHOENIX-4018) HashJoin may produce nulls for LHS table columns

Ankit Singhal (JIRA) Fri, 14 Jul 2017 02:20:31 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087074#comment-16087074 ]


Ankit Singhal commented on PHOENIX-4018:
----------------------------------------

Thanks, [~sergey.soldatov], for the patch. The problem is probably not visible 
with other region scanners because we override 
BaseRegionScanner#next(List<Cell>,ScannerContext) to call 
BaseRegionScanner#next(List<Cell>), which never looks at the scanner context. 
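A minimal sketch of why such scanners never return partial rows. Cell, SizeContext, BaseScanner and SingleRowScanner are hypothetical, simplified stand-ins, not the real Phoenix or HBase classes. The context-aware overload only delegates to next(List), so the size budget is never consulted and the whole row comes back even when it is larger than the budget.

```java
import java.util.ArrayList;
import java.util.List;

final class DelegatingScannerDemo {
    /** Hypothetical cell: a column name and its size in abstract units. */
    static final class Cell {
        final String column;
        final long size;
        Cell(String column, long size) { this.column = column; this.size = size; }
    }

    /** Hypothetical context that only carries a size budget. */
    static final class SizeContext {
        final long sizeLimit;
        SizeContext(long sizeLimit) { this.sizeLimit = sizeLimit; }
    }

    /** Stand-in for BaseRegionScanner: the context-aware overload delegates to next(List). */
    abstract static class BaseScanner {
        abstract boolean next(List<Cell> out);

        boolean next(List<Cell> out, SizeContext ctx) {
            // ctx is ignored, so the budget can never cut a row in half.
            return next(out);
        }
    }

    /** Hypothetical scanner over a single row. */
    static final class SingleRowScanner extends BaseScanner {
        private final List<Cell> row;
        private boolean done;
        SingleRowScanner(List<Cell> row) { this.row = row; }

        @Override
        boolean next(List<Cell> out) {
            if (done) return false;
            out.addAll(row);
            done = true;
            return false; // no more rows after this one
        }
    }

    /** Scans a 3-cell row (1 unit per cell) under the given budget; returns the cell count. */
    static int cellsReturnedWithLimit(long sizeLimit) {
        List<Cell> row = List.of(new Cell("a", 1), new Cell("b", 1), new Cell("c", 1));
        List<Cell> out = new ArrayList<>();
        new SingleRowScanner(row).next(out, new SizeContext(sizeLimit));
        return out.size();
    }

    public static void main(String[] args) {
        // Even with a 1-unit budget, all three cells of the row come back together.
        System.out.println(cellsReturnedWithLimit(1)); // 3
    }
}
```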

This might already be affecting scanner heartbeats. Although we have a thread 
on the client that renews scanners, we should fix this too.

Also, since we are already avoiding RPC chunking at the cell level, IMO, to 
leverage "hbase.client.scanner.max.result.size" (applied at the row level), we 
should increment the limits in the scannerContext only once we are done with 
all the region scanners, because other wrapped region scanners (like 
OffsetRegionScanner) might subsequently skip rows. Probably the best place 
would be 
BaseScannerRegionObserver.RegionScannerHolder#next(List<Cell>,ScannerContext). 
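A minimal sketch of the suggestion above, assuming the chain of wrapped scanners can be modeled as an iterator of complete row sizes. RpcBudget, skipRows, nextBatch and batchSizes are hypothetical names, not Phoenix APIs: RpcBudget only stands in for the limits carried by the ScannerContext, and skipRows for an OffsetRegionScanner-style wrapper. Because the budget is charged only in the outermost nextBatch, rows dropped by the offset never consume it, and every batch ends on a row boundary.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class HolderLimitDemo {
    /** Hypothetical stand-in for the size limit carried by the RPC's ScannerContext. */
    static final class RpcBudget {
        private final long maxResultSize;
        private long used;
        RpcBudget(long maxResultSize) { this.maxResultSize = maxResultSize; }
        void addRow(long bytes) { used += bytes; }
        boolean exhausted() { return used >= maxResultSize; }
    }

    /** Hypothetical OffsetRegionScanner-like wrapper: drops the first `offset` rows. */
    static Iterator<Long> skipRows(Iterator<Long> rows, int offset) {
        for (int i = 0; i < offset && rows.hasNext(); i++) {
            rows.next(); // skipped rows are never charged to any budget
        }
        return rows;
    }

    /**
     * Outermost "holder": pulls complete rows from the wrapped scanners and charges
     * each one to the budget, so a batch can only end on a row boundary. At least
     * one row is returned per call so an oversized row cannot stall the scan.
     */
    static List<Long> nextBatch(Iterator<Long> rowSizes, RpcBudget budget) {
        List<Long> batch = new ArrayList<>();
        while (rowSizes.hasNext() && (batch.isEmpty() || !budget.exhausted())) {
            long size = rowSizes.next();
            batch.add(size);
            budget.addRow(size);
        }
        return batch;
    }

    /** Number of rows in each batch, using a fresh budget per RPC. */
    static List<Integer> batchSizes(int totalRows, int offset, long rowBytes, long maxResultSize) {
        List<Long> region = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) region.add(rowBytes);
        Iterator<Long> rows = skipRows(region.iterator(), offset);
        List<Integer> sizes = new ArrayList<>();
        while (rows.hasNext()) {
            sizes.add(nextBatch(rows, new RpcBudget(maxResultSize)).size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(5, 0, 1, 2)); // [2, 2, 1]
        System.out.println(batchSizes(6, 2, 1, 2)); // [2, 2]
    }
}
```

In the second call the two skipped rows are dropped before the holder sees them, so the remaining four rows still split evenly into two full batches.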

If you think this would take a lot of effort and testing, let's do it in a 
separate JIRA; for now we can close this one by just removing scannerContext 
from the next() call.



> HashJoin may produce nulls for LHS table columns
> ------------------------------------------------
>
>                 Key: PHOENIX-4018
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4018
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Critical
>         Attachments: PHOENIX-4018-1.patch
>
>
> Here is the problem: in HashJoinRegionScanner methods (nextRow, for example) 
> we are using the same scanner context that was created in RSRpcServices. It 
> has limits (e.g. a 2MB size limit). Let's say that we have a 3MB region and 
> the only key that matches the join condition is located at the end of the 
> region. In HashJoinRegionScanner#nextRow, as we iterate through the region 
> rows, once we have reached the 2MB limit, every call to the region scanner's 
> nextRow returns a single cell and the scanner context is in the 
> SIZE_LIMIT_REACHED_MID_ROW state. But we don't have any logic that checks for 
> that, so this single cell is treated as a complete row in which every column 
> except one is null. 
> How to fix it: 
> 1. For the region scanner we may provide a NoLimitScannerContext, so we will 
> never get a partial result. 
> 2. We need to update the scanner context that we got from RSRpcServices with 
> the real data, based on the size of the results we are going to return. 
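A self-contained simulation of the failure mode described in the issue. Everything in it (Cell, Ctx, LimitedScanner, region, join) is a hypothetical stand-in, not Phoenix or HBase code: each cell costs 1 abstract size unit, a 3-unit budget plays the role of the 2MB limit, and the midRow flag plays the role of SIZE_LIMIT_REACHED_MID_ROW. The naive join loop never checks midRow, so the only matching row (at the end of the region) is joined with its "v" column set to null. Passing Long.MAX_VALUE as the budget approximates fix 1, a context that never stops mid-row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartialRowJoinDemo {
    /** Hypothetical cell: row key, column and value; every cell costs 1 size unit. */
    static final class Cell {
        final String row, column, value;
        Cell(String row, String column, String value) {
            this.row = row; this.column = column; this.value = value;
        }
    }

    /** Hypothetical context: a size budget plus a "stopped mid-row" flag. */
    static final class Ctx {
        final long limit;
        long used;
        boolean midRow; // analogue of SIZE_LIMIT_REACHED_MID_ROW
        Ctx(long limit) { this.limit = limit; }
    }

    /** Hypothetical region scanner that, like a size-limited scan, may stop inside a row. */
    static final class LimitedScanner {
        private final List<Cell> cells;
        private int pos;
        LimitedScanner(List<Cell> cells) { this.cells = cells; }

        /** Appends cells of the current row to out; returns false once no cells remain. */
        boolean nextRow(List<Cell> out, Ctx ctx) {
            ctx.midRow = false;
            if (pos >= cells.size()) return false;
            String row = cells.get(pos).row;
            while (pos < cells.size() && cells.get(pos).row.equals(row)) {
                out.add(cells.get(pos++));
                ctx.used++; // the shared budget is never reset, so it stays exhausted
                boolean rowContinues = pos < cells.size() && cells.get(pos).row.equals(row);
                if (ctx.used >= ctx.limit && rowContinues) {
                    ctx.midRow = true; // partial row: caller should not treat it as complete
                    break;
                }
            }
            return true;
        }
    }

    /** Three LHS rows with columns k and v; only the last key, "c", matches the RHS. */
    static List<Cell> region() {
        List<Cell> cells = new ArrayList<>();
        for (String r : new String[] {"a", "b", "c"}) {
            cells.add(new Cell("r" + r, "k", r));
            cells.add(new Cell("r" + r, "v", "v" + r));
        }
        return cells;
    }

    /** Naive hash join: every chunk is treated as a complete row; ctx.midRow is never checked. */
    static List<String[]> join(long limit) {
        Map<String, String> rhs = new HashMap<>();
        rhs.put("c", "rhs-c");
        LimitedScanner scanner = new LimitedScanner(region());
        Ctx ctx = new Ctx(limit);
        List<String[]> joined = new ArrayList<>();
        List<Cell> chunk = new ArrayList<>();
        while (scanner.nextRow(chunk, ctx)) {
            Map<String, String> cols = new HashMap<>();
            for (Cell c : chunk) cols.put(c.column, c.value);
            String key = cols.get("k");
            if (key != null && rhs.containsKey(key)) {
                joined.add(new String[] {key, cols.get("v"), rhs.get(key)}); // v may be null
            }
            chunk.clear();
        }
        return joined;
    }

    public static void main(String[] args) {
        // Small budget: the matching row arrives split, so its "v" column is null.
        System.out.println(join(3).get(0)[1]);              // null
        // Fix 1: an effectively unlimited context never yields a partial row.
        System.out.println(join(Long.MAX_VALUE).get(0)[1]); // vc
    }
}
```

Fix 2 is not modeled here: with an unlimited inner context, the size of the rows actually returned would still have to be reported back to the context that came from RSRpcServices.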



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
