Re: Phoenix CSV Bulk Load fails to load a large file
Thanks, setting hbase.bulkload.retries.retryOnIOException to true in the configuration worked. My HBase cluster is colocated with the YARN cluster on EMR.

On Thu, Sep 7, 2017 at 4:08 AM, Ankit Singhal wrote:
> bq. This runs successfully if I split this into 2 files, but I'd like to
> avoid doing that.
>
> Do you run a different job for each file?
>
> If your HBase cluster is not co-located with your YARN cluster, then it
> may be that copying of the large HFile is timing out (this may happen due
> to too few regions in the HBase table, or to hot-spotting). You can check
> your output directory for file sizes and see if you are hitting this
> problem. Consider increasing the timeout or splitting the hot region.
>
> On Thu, Sep 7, 2017 at 5:25 AM, Ted Yu wrote:
>> bq. hbase.bulkload.retries.retryOnIOException is disabled. Unable to
>> recover
>>
>> The above is from HBASE-17165.
>>
>> See if the load can pass after enabling the config.
>>
>> [quoted stack trace snipped]
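For reference, the setting that made the load recover is the one introduced by HBASE-17165; a minimal sketch of the hbase-site.xml entry on the machine running the bulk load client (property name as cited in this thread):

```xml
<property>
  <name>hbase.bulkload.retries.retryOnIOException</name>
  <value>true</value>
</property>
```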
Re: Phoenix CSV Bulk Load fails to load a large file
bq. This runs successfully if I split this into 2 files, but I'd like to avoid doing that.

Do you run a different job for each file?

If your HBase cluster is not co-located with your YARN cluster, then it may be that copying of the large HFile is timing out (this may happen due to too few regions in the HBase table, or to hot-spotting). You can check your output directory for file sizes and see if you are hitting this problem. Consider increasing the timeout or splitting the hot region.

On Thu, Sep 7, 2017 at 5:25 AM, Ted Yu wrote:
> bq. hbase.bulkload.retries.retryOnIOException is disabled. Unable to
> recover
>
> The above is from HBASE-17165.
>
> See if the load can pass after enabling the config.
>
> On Wed, Sep 6, 2017 at 3:11 PM, Sriram Nookala wrote:
>> It finally times out with these exceptions
>>
>> [stack trace snipped]
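To follow the suggestion of checking the output directory for file sizes, a short script can flag an oversized HFile (a sign of too few regions or hot-spotting). This is a sketch that assumes the job output has been copied to a local path; for HFiles still on HDFS, `hdfs dfs -du -h <dir>` gives the same information.

```python
import os

def hfile_sizes(output_dir):
    """Walk a bulk-load output directory and return (path, size_in_bytes)
    pairs sorted largest first, so a single huge HFile stands out."""
    sizes = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            path = os.path.join(root, name)
            sizes.append((path, os.path.getsize(path)))
    return sorted(sizes, key=lambda entry: entry[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical output path; use whatever --output the bulk load job wrote to.
    for path, size in hfile_sizes("/tmp/phoenix-bulkload-output"):
        print(f"{size:>15,}  {path}")
```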
Re: Phoenix CSV Bulk Load fails to load a large file
bq. hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover

The above is from HBASE-17165.

See if the load can pass after enabling the config.

On Wed, Sep 6, 2017 at 3:11 PM, Sriram Nookala wrote:
> It finally times out with these exceptions
>
> [stack trace snipped]
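Since the stack trace shows CsvBulkLoadTool running through Hadoop's ToolRunner, the config can presumably also be enabled per-job with a generic -D option instead of editing hbase-site.xml. A sketch, with hypothetical jar, table, and input names:

```
hadoop jar phoenix-4.11.0-HBase-1.3-client.jar \
    org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    -Dhbase.bulkload.retries.retryOnIOException=true \
    --table MY_TABLE \
    --input /data/my_file.csv
```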
Re: Phoenix CSV Bulk Load fails to load a large file
It finally times out with these exceptions

Wed Sep 06 21:38:07 UTC 2017, RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35}, java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:956)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:594)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:590)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:292)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1274)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:35408)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1676)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:656)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:645)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:137)
        ... 7 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1248)
        ... 14 more

17/09/06 21:38:07 ERROR mapreduce.LoadIncrementalHFiles: hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover
17/09/06 21:38:07 INFO zookeeper.ZooKeeper: Session: 0x15e58ca21fc004c closed
17/09/06 21:38:07 INFO zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" java.io.IOException: BulkLoad encountered an unrecoverable problem
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:614)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Wed Sep 06 20:55:36 UTC 2017, RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35}, java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=9, waitTime=60002, operationTimeout=6 expired.
On Wed, Sep 6, 2017 at 5:01 PM, Sriram Nookala wrote:
> Phoenix 4.11.0, HBase 1.3.1
>
> This is what I get from jstack
>
> [jstack output snipped]
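The numbers in the trace above also explain why the job hung so long before failing: with pause=100 and retries=35, HBase's RpcRetryingCaller sleeps between attempts according to a backoff multiplier table (HConstants.RETRY_BACKOFF in HBase 1.x), and each attempt can itself wait out the 60 s call timeout. A rough back-of-the-envelope sketch, ignoring the small random jitter HBase adds to each pause:

```python
# Backoff multipliers as in HConstants.RETRY_BACKOFF (HBase 1.x);
# attempts beyond the table reuse the last entry.
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]

def worst_case_seconds(pause_ms=100, retries=35, call_timeout_s=60):
    """Worst-case wall-clock time before RetriesExhaustedException:
    every attempt times out, plus the backoff sleep after each one."""
    sleep_ms = sum(
        pause_ms * RETRY_BACKOFF[min(i, len(RETRY_BACKOFF) - 1)]
        for i in range(retries)
    )
    return retries * call_timeout_s + sleep_ms / 1000.0

# ~2628 s, i.e. roughly 44 minutes -- consistent with the first failed
# attempt at 20:55:36 and the final failure at 21:38:07.
print(worst_case_seconds())
```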
Re: Phoenix CSV Bulk Load fails to load a large file
Phoenix 4.11.0, HBase 1.3.1

This is what I get from jstack:

"main" #1 prio=5 os_prio=0 tid=0x7fb3d0017000 nid=0x5de7 waiting on condition [0x7fb3d75f7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xf588> (a java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:604)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

On Wed, Sep 6, 2017 at 4:16 PM, Sergey Soldatov wrote:
> Do you have more details on the version of Phoenix/HBase you are using as
> well as how it hangs (exceptions/messages that may help to understand the
> problem)?
>
> Thanks,
> Sergey
>
> On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala wrote:
>> I'm trying to load a 3.5G file with 60 million rows using
>> CsvBulkLoadTool. It hangs while loading HFiles. This runs successfully
>> if I split this into 2 files, but I'd like to avoid doing that. This is
>> on Amazon EMR; is this an issue due to disk space or memory? I have a
>> single master and 2 region server configuration with 16 GB memory on
>> each node.
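The jstack shows the client's main thread parked on a FutureTask inside bulkLoadPhase, i.e. waiting for the server-side bulkLoadHFile RPCs to finish, which matches the CallTimeoutExceptions seen in this thread. If the HFile copy legitimately needs more than the default 60 s per call, one option is to raise the client-side timeouts in hbase-site.xml; the values below are illustrative, not recommendations:

```xml
<!-- Illustrative: allow each bulkLoadHFile RPC up to 10 minutes. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>hbase.client.operation.timeout</name>
  <value>600000</value>
</property>
```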
Re: Phoenix CSV Bulk Load fails to load a large file
Do you have more details on the version of Phoenix/HBase you are using, as well as how it hangs (exceptions/messages that may help to understand the problem)?

Thanks,
Sergey

On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala wrote:
> I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
> It hangs while loading HFiles. This runs successfully if I split this into
> 2 files, but I'd like to avoid doing that. This is on Amazon EMR; is this
> an issue due to disk space or memory? I have a single master and 2 region
> server configuration with 16 GB memory on each node.