[ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-1289:
---------------------------------
    Labels: pull-request-available  (was: )

             Key: HUDI-1289
             URL: https://issues.apache.org/jira/browse/HUDI-1289
         Project: Apache Hudi
      Issue Type: Bug
         Summary: Using hbase index in spark hangs in Hudi 0.6.0
        Reporter: Ryan Pifer
        Priority: Major
          Labels: pull-request-available
         Fix For: 0.6.1

In Hudi 0.6.0 there was a change to shade the HBase dependencies into the hudi-spark-bundle jar. When the HBASE index is used with only the hudi-spark-bundle jar specified in the Spark session, there are several issues:

1. Dependencies are not resolved correctly: HBase's default status listener class is defined by the class name before relocation.

{code:java}
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2427)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:656)
    ... 39 more
Caused by: java.lang.RuntimeException: class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener not org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$Listener
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2421)
    ... 40 more
{code}

https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java#L72-L73

This can be fixed by overriding the status listener class in the HBase configuration used by Hudi:

{code:java}
hbaseConfig.set("hbase.status.listener.class", "org.apache.hudi.org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener");
{code}

https://github.com/apache/hudi/blob/master/hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java#L134

2. After applying the override above, executors hang when trying to connect to HBase and fail after about 45 minutes.

{code:java}
Caused by: org.apache.hudi.org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Thu Sep 17 23:59:42 UTC 2020, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68536: row 'hudiindex,12345678,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-81-236-56.ec2.internal,16020,1600130997457, seqNum=0
    at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:212)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:186)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1275)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1122)
    at org.apache.hudi.org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
    at org.apache.hudi.org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
    at org.apache.hudi.org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:75)
    at org.apache.hudi.org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
    ...
35 more
{code}

When investigating the executor logs I was able to find the following:

{code:java}
20/09/18 21:35:48 TRACE TransportClient: Sending RPC to ip-10-31-253-39.ec2.internal/10.31.253.39:46825
20/09/18 21:35:48 TRACE TransportClient: Sending request RPC 7802669247197305083 to ip-10-31-253-39.ec2.internal/10.31.253.39:46825 took 0 ms
20/09/18 21:35:48 TRACE MessageDecoder: Received message RpcResponse: RpcResponse{requestId=7802669247197305083, body=NettyManagedBuffer{buf=SimpleLeakAwareByteBuf(PooledUnsafeDirectByteBuf(ridx: 21, widx: 102, cap: 128))}}
20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looking up meta region location in ZK, connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae
20/09/18 21:35:53 TRACE ZKUtil: hconnection-0x4f596c31-0x10000036821007a, quorum=ip-10-31-253-39.ec2.internal:2181, baseZNode=/hbase Retrieved 51 byte(s) of data from znode /hbase/meta-region-server; data=PBUF\x0A)\x0A\x1Dip-10-16-254...
20/09/18 21:35:53 TRACE ZooKeeperRegistry: Looked up meta region location, connection=org.apache.hudi.org.apache.hadoop.hbase.client.ZooKeeperRegistry@268d3bae; servers = ip-10-16-254-233.ec2.internal,16020,1600298383776
20/09/18 21:35:53 TRACE MetaCache: Merged cached locations: [region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0]
20/09/18 21:35:53 DEBUG RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false
20/09/18 21:35:53 DEBUG RpcClientImpl: Connecting to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: starting, connections 1
20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: marking at should close, reason: null
20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: closing ipc connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020
20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: ipc connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 closed
20/09/18 21:35:53 TRACE RpcClientImpl: IPC Client (669392975) connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 from hadoop: stopped, connections 0
20/09/18 21:35:53 INFO RpcRetryingCaller: MESSAGE: Call to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 failed on local exception: org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. Call id=418, waitTime=2
20/09/18 21:35:53 INFO RpcRetryingCaller: STACKTRACE [Ljava.lang.StackTraceElement;@20efcd07
20/09/18 21:35:53 INFO RpcRetryingCaller: CAUSE org.apache.hudi.org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to ip-10-16-254-233.ec2.internal/10.16.254.233:16020 is closing. Call id=418, waitTime=2
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1089)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:865)
    at org.apache.hudi.org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:582)
20/09/18 21:35:53 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=38363 ms ago, cancelled=false, msg=row 'huditest,12345678,99999999999999' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0
20/09/18 21:35:53 TRACE MetaCache: Removed region=hbase:meta,,1.1588230740, hostname=ip-10-16-254-233.ec2.internal,16020,1600298383776, seqNum=0 from cache
{code}

Even after adding the HBase jars to the session it continues to hang. I was able to resolve the hanging issue by building the hudi-spark-bundle jar without shading the HBase-related dependencies and adding them explicitly when launching my spark-shell, so it appears to be a problem with the relocation.
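Since the conclusion above points at the shading relocation, one quick sanity check is to list the bundle jar's entries and confirm where ClusterStatusListener landed. A minimal sketch (the stand-in jar built here is purely illustrative, to keep the example self-contained; in practice, set BUNDLE to the real hudi-spark-bundle jar):

```shell
# Sketch: check where the HBase classes ended up after shading. For
# illustration this fabricates a stand-in jar containing the relocated
# path; against a real build, point BUNDLE at the actual bundle instead.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/org/apache/hudi/org/apache/hadoop/hbase/client"
touch "$tmp/org/apache/hudi/org/apache/hadoop/hbase/client/ClusterStatusListener.class"
(cd "$tmp" && python3 -m zipfile -c bundle.jar org)
BUNDLE="$tmp/bundle.jar"
# Relocated classes appear under org/apache/hudi/...; if a config value
# still names the unrelocated class, it cannot be loaded from the bundle.
python3 -m zipfile -l "$BUNDLE" | grep 'org/apache/hudi/org/apache/hadoop/hbase/client/ClusterStatusListener'
```

If the listener only exists under the `org.apache.hudi.` prefix while the configuration default still names `org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener`, the class-cast failure in issue 1 follows directly.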
Example of using the HBase index successfully:

{code:java}
spark-shell --jars /usr/lib/hudi/cli/lib/hbase-client-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-common-1.2.3.jar,/usr/lib/hudi/cli/lib/hbase-protocol-1.2.3.jar,/usr/lib/hudi/cli/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hudi/cli/lib/metrics-core-2.2.0.jar,hudi-spark-bundle_2.11-0.6.0-amzn-0.jar,/usr/lib/spark/external/lib/spark-avro.jar --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" --conf "spark.sql.hive.convertMetastoreParquet=false"
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)