[ 
https://issues.apache.org/jira/browse/PHOENIX-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434489#comment-15434489
 ] 

chenglei commented on PHOENIX-2900:
-----------------------------------

I created a new issue 
[PHOENIX-3199|https://issues.apache.org/jira/browse/PHOENIX-3199] to fix the 
ServerCacheClient problem.

> Unable to find hash cache once a salted table 's first region has split
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2900
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2900
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.7.0
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Critical
>             Fix For: 4.8.0
>
>         Attachments: PHOENIX-2900_addendum1.patch, PHOENIX-2900_v1.patch, 
> PHOENIX-2900_v2.patch, PHOENIX-2900_v3.patch, PHOENIX-2900_v4.patch, 
> PHOENIX-2900_v5.patch, PHOENIX-2900_v6.patch, PHOENIX-2900_v7.patch
>
>
> When I join a salted table (which was split after creation) with another 
> table in my business system, I get the following error, even though I clear 
> the salted table's TableRegionCache: 
> {code:borderStyle=solid} 
>  org.apache.phoenix.exception.PhoenixIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for 
> joinId: %�����2. The cache might have expired and have been removed.
>       at 
> org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:98)
>       at 
> org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:218)
>       at 
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:201)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1203)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1556)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1198)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3178)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
>       at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
>       at java.lang.Thread.run(Thread.java:745)
>       at 
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
>       at 
> org.apache.phoenix.iterate.TableResultIterator.initScanner(TableResultIterator.java:127)
>       at 
> org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:108)
>       at 
> org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:103)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for 
> joinId: %�����2. The cache might have expired and have been removed.
>       at 
> org.apache.phoenix.coprocessor.HashJoinRegionScanner.<init>(HashJoinRegionScanner.java:98)
>       at 
> org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:218)
>       at 
> org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:201)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1203)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1517)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1592)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1556)
>       at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1198)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3178)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
>       at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
>       at java.lang.Thread.run(Thread.java:745)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>       at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
>       at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:316)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:164)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:283)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:188)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:183)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:110)
>       at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:739)
>       at 
> org.apache.phoenix.iterate.TableResultIterator.initScanner(TableResultIterator.java:123)
>       ... 7 more
> {code}
> I wrote a unit test that reproduces this error, as follows: 
> {code:borderStyle=solid} 
>     public void testSaltTableJoinError() throws Exception {
>         // 1. create the LHS salted table SALT_TEST
>         this.jdbcTemplate.update("drop table if exists SALT_TEST");
>         this.jdbcTemplate.update(
>                 "create table SALT_TEST" +
>                 "(" +
>                 "id UNSIGNED_INT not null primary key," +
>                 "appId VARCHAR" +
>                 ") SALT_BUCKETS=2");
>
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(1,'app1')");
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(2,'app2')");
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(3,'app3')");
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(4,'app4')");
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(5,'app5')");
>         this.jdbcTemplate.update("upsert into SALT_TEST(id,appId) values(6,'app6')");
>
>         // 2. split SALT_TEST at row key 3, i.e. split the first region
>         byte[] id3 = Bytes.toBytes(3);
>         byte[] rowKey3 = new byte[1 + 4];
>         System.arraycopy(id3, 0, rowKey3, 1, 4);
>         byte salt3 = SaltingUtil.getSaltingByte(rowKey3, 1, rowKey3.length - 1, 2);
>         rowKey3[0] = salt3;
>         HBaseAdmin hbaseAdmin = this.getHBaseAdmin();
>         hbaseAdmin.split(Bytes.toBytes("SALT_TEST"), rowKey3);
>
>         // 3. wait for the split of SALT_TEST to complete
>         while (hbaseAdmin.getTableRegions(Bytes.toBytes("SALT_TEST")).size() < 3) {
>             Thread.sleep(1000);
>         }
>
>         // 4. make sure region0 and region1 are not on the same region server,
>         //    so they do not share the GlobalCache
>         HRegionInfo region1Info = hbaseAdmin.getTableRegions(Bytes.toBytes("SALT_TEST")).get(1);
>         String region1EncodedName = region1Info.getEncodedName();
>         System.out.println(region1EncodedName);
>         hbaseAdmin.move(Bytes.toBytes(region1EncodedName), null);
>
>         // 5. create the RHS table RIGHT_TEST
>         this.jdbcTemplate.update("drop table if exists RIGHT_TEST");
>         this.jdbcTemplate.update(
>                 "create table RIGHT_TEST" +
>                 "(" +
>                 "appId VARCHAR not null primary key," +
>                 "createTime VARCHAR" +
>                 ")");
>         this.jdbcTemplate.update("upsert into RIGHT_TEST(appId,createTime) values('app2','201601')");
>         this.jdbcTemplate.update("upsert into RIGHT_TEST(appId,createTime) values('app3','201602')");
>         this.jdbcTemplate.update("upsert into RIGHT_TEST(appId,createTime) values('app4','201603')");
>         this.jdbcTemplate.update("upsert into RIGHT_TEST(appId,createTime) values('app5','201604')");
>
>         // 6. clear SALT_TEST's TableRegionCache, so the join sees all 3 of
>         //    SALT_TEST's newest regions
>         ((PhoenixConnection) this.dataSource.getConnection())
>                 .getQueryServices().clearTableRegionCache(Bytes.toBytes("SALT_TEST"));
>
>         // 7. join the salted table SALT_TEST with RIGHT_TEST; this throws the
>         //    exception above
>         String sql = "select * from SALT_TEST a inner join RIGHT_TEST b on a.appId=b.appId where a.id>=3 and a.id<=5";
>         List<Map<String, Object>> result = this.jdbcTemplate.queryForList(sql, new Object[0]);
>         assertTrue(result.size() == 3);
>     }
> {code}
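The salt byte computation in step 2 of the test can be sketched in isolation. This is a minimal, self-contained approximation of what SaltingUtil.getSaltingByte does (hash the row key bytes and map the result onto one of the salt buckets); the hash function below is assumed for illustration, not copied from Phoenix:

```java
public class SaltByteSketch {
    // Illustrative stand-in for SaltingUtil.getSaltingByte: hash the key bytes
    // and reduce the result modulo the bucket count. Phoenix's exact hash may
    // differ; the point is that the salt byte is derived from the key itself.
    static byte saltByte(byte[] key, int offset, int length, int bucketNum) {
        int hash = 1;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + key[i];
        }
        return (byte) Math.abs(hash % bucketNum);
    }

    public static void main(String[] args) {
        // UNSIGNED_INT id=3 serializes to 4 big-endian bytes; the salt byte is
        // prepended, so the full row key is 1 + 4 bytes, as in the test above.
        byte[] rowKey3 = new byte[1 + 4];
        byte[] id3 = {0x00, 0x00, 0x00, 0x03};
        System.arraycopy(id3, 0, rowKey3, 1, 4);
        rowKey3[0] = saltByte(rowKey3, 1, rowKey3.length - 1, 2);
        // With SALT_BUCKETS=2 the leading byte can only be 0x00 or 0x01.
        if (rowKey3[0] != 0x00 && rowKey3[0] != 0x01) throw new AssertionError();
        System.out.println("salt byte: " + rowKey3[0]);
    }
}
```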
> I debugged the source code and found the problem in the addServerCache 
> method of the org.apache.phoenix.cache.ServerCacheClient class, shown below. 
> In line 166 we get all three regions of the SALT_TEST table, but in line 176 
> the keyRanges is [[[\x00], [[\x00\x00\x00\x03 - \x00\x00\x00\x05]]]], so 
> only the second region [\x00\x00\x00\x00\x03,\x01\x00\x00\x00\x00) can pass 
> the keyRanges' intersects check in line 176. As a result, the RHS is only 
> sent to the second region. However, SALT_TEST is a salted table, so the 
> third region [\x01\x00\x00\x00\x00,+∞) also has records matching the where 
> condition, and the RHS should be sent to it as well. Therefore, the correct 
> keyRanges should be [[[\x00-\x01], [[\x00\x00\x00\x03 - \x00\x00\x00\x05]]]], 
> not [[[\x00], [[\x00\x00\x00\x03 - \x00\x00\x00\x05]]]].
> {code:borderStyle=solid}
>  164  try {
>  165      final PTable cacheUsingTable = cacheUsingTableRef.getTable();
>  166      List<HRegionLocation> locations = services.getAllTableRegions(cacheUsingTable.getPhysicalName().getBytes());
>  167      int nRegions = locations.size();
>  168      // Size these based on worst case
>  169      futures = new ArrayList<Future<Boolean>>(nRegions);
>  170      Set<HRegionLocation> servers = new HashSet<HRegionLocation>(nRegions);
>  171      for (HRegionLocation entry : locations) {
>  172          // Keep track of servers we've sent to and only send once
>  173          byte[] regionStartKey = entry.getRegionInfo().getStartKey();
>  174          byte[] regionEndKey = entry.getRegionInfo().getEndKey();
>  175          if ( ! servers.contains(entry) && 
>  176                 keyRanges.intersects(regionStartKey, regionEndKey,
>  177                     cacheUsingTable.getIndexType() == IndexType.LOCAL ?
>  178                         ScanUtil.getRowKeyOffset(regionStartKey, regionEndKey) : 0, true)) {
>  179              // Call RPC once per server
>  180              servers.add(entry);
>  181              if (LOG.isDebugEnabled()) {LOG.debug(addCustomAnnotations("Adding cache entry to be sent for " + entry, connection));}
>  182              final byte[] key = entry.getRegionInfo().getStartKey();
>  183              final HTableInterface htable = services.getTable(cacheUsingTableRef.getTable().getPhysicalName().getBytes());
>  184              closeables.add(htable);
>  185              futures.add(executor.submit(new JobCallable<Boolean>() {
>  186
>  187                  @Override
>  188                  public Boolean call() throws Exception {
>  189                      final Map<byte[], AddServerCacheResponse> results;
>  190                      try {
>  191                          results = htable.coprocessorService(ServerCachingService.class, key, key, 
> {code}
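To see concretely why only the second region passes the check in line 176, here is a simplified flat-range sketch of the region/scan-range overlap test. The real keyRanges.intersects operates on per-slot skip-scan ranges, so this only illustrates the boundary arithmetic, not Phoenix's implementation:

```java
public class RegionIntersectSketch {
    // Unsigned lexicographic comparison, the order HBase uses for row keys.
    static int cmp(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Does region [regionStart, regionEnd) overlap scan range [scanStart, scanEnd]?
    // An empty regionEnd means the region extends to the end of the table.
    static boolean intersects(byte[] regionStart, byte[] regionEnd,
                              byte[] scanStart, byte[] scanEnd) {
        return cmp(regionStart, scanEnd) <= 0
                && (regionEnd.length == 0 || cmp(regionEnd, scanStart) > 0);
    }

    public static void main(String[] args) {
        // The third region [\x01\x00\x00\x00\x00, +inf) after the split.
        byte[] r3Start = {0x01, 0x00, 0x00, 0x00, 0x00};
        byte[] r3End = {};
        // Buggy overall scan range: salt fixed at \x00, ids 3..5.
        byte[] scanStart = {0x00, 0x00, 0x00, 0x00, 0x03};
        byte[] buggyEnd  = {0x00, 0x00, 0x00, 0x00, 0x05};
        // Corrected overall scan range: salt slot spans \x00-\x01.
        byte[] fixedEnd  = {0x01, 0x00, 0x00, 0x00, 0x05};

        System.out.println(intersects(r3Start, r3End, scanStart, buggyEnd)); // false
        System.out.println(intersects(r3Start, r3End, scanStart, fixedEnd)); // true
    }
}
```

With the salt slot fixed at \x00 the third region never intersects the scan range, so the RHS cache is never sent there.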
> I think the incorrect keyRanges is produced by the following code in the 
> constructor of org.apache.phoenix.compile.ScanRanges (around line 178). In 
> my opinion, the if statement should be if (isSalted && !isPointLookup).
> {code:borderStyle=solid} 
>         if (useSkipScanFilter && isSalted && !isPointLookup) {
>             ranges.set(0, SaltingUtil.generateAllSaltingRanges(bucketNum));
>         }
> {code}
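The idea behind replacing the leading slot with all salting ranges can be sketched as follows. The real SaltingUtil.generateAllSaltingRanges returns Phoenix KeyRange objects; this illustration merely enumerates the salt bytes the slot must cover:

```java
import java.util.ArrayList;
import java.util.List;

public class AllSaltBytesSketch {
    // For a table created with SALT_BUCKETS=N, every row key starts with one
    // of the salt bytes 0x00 .. (byte)(N-1), so the leading key slot must
    // cover all of them for the scan ranges to reach every salted region.
    static List<byte[]> allSaltBytes(int bucketNum) {
        List<byte[]> salts = new ArrayList<>();
        for (int i = 0; i < bucketNum; i++) {
            salts.add(new byte[] {(byte) i});
        }
        return salts;
    }

    public static void main(String[] args) {
        List<byte[]> salts = allSaltBytes(2);
        // SALT_BUCKETS=2 -> salt bytes \x00 and \x01, matching the corrected
        // keyRanges [[[\x00-\x01], ...]] described above.
        System.out.println(salts.size() + " salt bytes: "
                + salts.get(0)[0] + ", " + salts.get(1)[0]);
    }
}
```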
> Another question is why the join SQL executes normally when the salted 
> table's first region has not been split. I think this is caused by a subtle 
> error in the addServerCache method of the 
> org.apache.phoenix.cache.ServerCacheClient class:
> {code:borderStyle=solid} 
>  171      for (HRegionLocation entry : locations) {
>  172          // Keep track of servers we've sent to and only send once
>  173          byte[] regionStartKey = entry.getRegionInfo().getStartKey();
>  174          byte[] regionEndKey = entry.getRegionInfo().getEndKey();
>  175          if ( ! servers.contains(entry) && 
>  176                 keyRanges.intersects(regionStartKey, regionEndKey,
>  177                     cacheUsingTable.getIndexType() == IndexType.LOCAL ?
>  178                         ScanUtil.getRowKeyOffset(regionStartKey, regionEndKey) : 0, true)) {
>  179              // Call RPC once per server
>  180              servers.add(entry);
>  181              if (LOG.isDebugEnabled()) {LOG.debug(addCustomAnnotations("Adding cache entry to be sent for " + entry, connection));}
>  182              final byte[] key = entry.getRegionInfo().getStartKey();
>  183              final HTableInterface htable = services.getTable(cacheUsingTableRef.getTable().getPhysicalName().getBytes());
>  184              closeables.add(htable);
>  185              futures.add(executor.submit(new JobCallable<Boolean>() {
>  186
>  187                  @Override
>  188                  public Boolean call() throws Exception {
>  189                      final Map<byte[], AddServerCacheResponse> results;
>  190                      try {
>  191                          results = htable.coprocessorService(ServerCachingService.class, key, key, 
> {code}
> In the above code, if the first region has not been split, the first region 
> (-∞,\x01\x00\x00\x00\x00) always passes the if test in line 176, and the 
> startKey of the first region is byte[0] (an empty array), so in line 191, 
> when we invoke the htable's coprocessorService method, the startKey and 
> endKey parameters are both byte[0]. 
> However, in the getKeysAndRegionsInRange method of 
> org.apache.hadoop.hbase.client.HTable below, when the endKey parameter 
> equals HConstants.EMPTY_END_ROW (which is also byte[0]), the boolean 
> variable endKeyIsEndOfTable is set to true. This causes all regions to be 
> returned, which is obviously not what is expected. Eventually, the RHS is 
> sent to all of the LHS's regions, and of course the join SQL then runs 
> normally:
> {code:borderStyle=solid} 
> private Pair<List<byte[]>, List<HRegionLocation>> getKeysAndRegionsInRange(
>       final byte[] startKey, final byte[] endKey, final boolean includeEndKey,
>       final boolean reload) throws IOException {
>     final boolean endKeyIsEndOfTable = Bytes.equals(endKey, HConstants.EMPTY_END_ROW);
>     if ((Bytes.compareTo(startKey, endKey) > 0) && !endKeyIsEndOfTable) {
>       throw new IllegalArgumentException(
>         "Invalid range: " + Bytes.toStringBinary(startKey) +
>         " > " + Bytes.toStringBinary(endKey));
>     }
>     List<byte[]> keysInRange = new ArrayList<byte[]>();
>     List<HRegionLocation> regionsInRange = new ArrayList<HRegionLocation>();
>     byte[] currentKey = startKey;
>     do {
>       HRegionLocation regionLocation = getRegionLocation(currentKey, reload);
>       keysInRange.add(currentKey);
>       regionsInRange.add(regionLocation);
>       currentKey = regionLocation.getRegionInfo().getEndKey();
>     } while (!Bytes.equals(currentKey, HConstants.EMPTY_END_ROW)
>         && (endKeyIsEndOfTable || Bytes.compareTo(currentKey, endKey) < 0
>             || (includeEndKey && Bytes.compareTo(currentKey, endKey) == 0)));
>     return new Pair<List<byte[]>, List<HRegionLocation>>(keysInRange,
>         regionsInRange);
>   }
> {code}
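The empty-end-key behavior is easy to demonstrate on its own. When the first region's start key (a zero-length array) is passed as both startKey and endKey, the endKeyIsEndOfTable check above turns the call into a scan to the end of the table. A minimal sketch, where Arrays.equals stands in for Bytes.equals and a local empty array for HConstants.EMPTY_END_ROW:

```java
import java.util.Arrays;

public class EmptyEndKeySketch {
    // Stand-in for HConstants.EMPTY_END_ROW, which is a zero-length byte array.
    static final byte[] EMPTY_END_ROW = new byte[0];

    // Mirrors the check at the top of getKeysAndRegionsInRange.
    static boolean endKeyIsEndOfTable(byte[] endKey) {
        return Arrays.equals(endKey, EMPTY_END_ROW);
    }

    public static void main(String[] args) {
        // The first region's start key is byte[0]; used as the endKey it is
        // indistinguishable from "end of table", so every region is returned.
        byte[] firstRegionStartKey = new byte[0];
        System.out.println(endKeyIsEndOfTable(firstRegionStartKey)); // true
        // Any other region's start key is non-empty and bounds the scan.
        System.out.println(endKeyIsEndOfTable(new byte[] {0x01}));   // false
    }
}
```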



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
