Re: Phoenix CSV Bulk Load fails to load a large file
bq. hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover

The above is from HBASE-17165. See if the load can pass after enabling the config.

On Wed, Sep 6, 2017 at 3:11 PM, Sriram Nookala wrote:
> It finally times out with these exceptions
> ...
> 17/09/06 21:38:07 ERROR mapreduce.LoadIncrementalHFiles: hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover
> Exception in thread "main" java.io.IOException: BulkLoad encountered an unrecoverable problem
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
> ...
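For concreteness, a minimal sketch of enabling that property before launching the load; the table name, input path, and ZooKeeper quorum below are hypothetical placeholders, and the same flag can equally be set in hbase-site.xml or passed with -D on the CsvBulkLoadTool command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class RetryingCsvBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The HBASE-17165 behaviour is off by default, which is what produces the
        // "retryOnIOException is disabled. Unable to recover" message above.
        conf.setBoolean("hbase.bulkload.retries.retryOnIOException", true);
        // Drive the same MapReduce bulk load discussed in this thread;
        // --table, --input and --zookeeper are standard CsvBulkLoadTool options.
        int exitCode = ToolRunner.run(conf, new CsvBulkLoadTool(), new String[] {
                "--table", "MY_TABLE",
                "--input", "/data/my_large_file.csv",
                "--zookeeper", "zk-host:2181"
        });
        System.exit(exitCode);
    }
}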
Re: Phoenix CSV Bulk Load fails to load a large file
It finally times out with these exceptions

Wed Sep 06 21:38:07 UTC 2017, RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35}, java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:956)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:594)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:590)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:292)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1274)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.bulkLoadHFile(ClientProtos.java:35408)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.bulkLoadHFile(ProtobufUtil.java:1676)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:656)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3.call(LoadIncrementalHFiles.java:645)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:137)
        ... 7 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=77, waitTime=60001, operationTimeout=6 expired.
        at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:73)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1248)
        ... 14 more

17/09/06 21:38:07 ERROR mapreduce.LoadIncrementalHFiles: hbase.bulkload.retries.retryOnIOException is disabled. Unable to recover
17/09/06 21:38:07 INFO zookeeper.ZooKeeper: Session: 0x15e58ca21fc004c closed
17/09/06 21:38:07 INFO zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" java.io.IOException: BulkLoad encountered an unrecoverable problem
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:614)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Wed Sep 06 20:55:36 UTC 2017, RpcRetryingCaller{globalStartTime=1504731276347, pause=100, retries=35}, java.io.IOException: Call to ip-10-123-0-60.ec2.internal/10.123.0.60:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=9, waitTime=60002, operationTimeout=6 expired.

On Wed, Sep 6, 2017 at 5:01 PM, Sriram Nookala wrote:
> Phoenix 4.11.0, HBase 1.3.1
>
> This is what I get from jstack
> ...
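As background on the failure itself: the waitTime=60001 expiry in the trace lines up with the default 60 s HBase client RPC timeout. Below is a hedged sketch of the standard client-side knobs that govern it; the values are illustrative only, and the thread does not establish that raising them fixes the hang.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BulkLoadClientTimeouts {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Illustrative values: per-RPC timeout and overall operation timeout for
        // the client that runs LoadIncrementalHFiles / CsvBulkLoadTool. The same
        // properties can be set in hbase-site.xml or passed with -D.
        conf.setInt("hbase.rpc.timeout", 600000);              // milliseconds
        conf.setInt("hbase.client.operation.timeout", 600000); // milliseconds
        System.out.println("hbase.rpc.timeout = " + conf.get("hbase.rpc.timeout"));
    }
}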
Re: Phoenix CSV Bulk Load fails to load a large file
Phoenix 4.11.0, HBase 1.3.1

This is what I get from jstack

"main" #1 prio=5 os_prio=0 tid=0x7fb3d0017000 nid=0x5de7 waiting on condition [0x7fb3d75f7000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0xf588> (a java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.bulkLoadPhase(LoadIncrementalHFiles.java:604)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:463)
        at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:373)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.completebulkload(AbstractBulkLoadTool.java:355)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.submitJob(AbstractBulkLoadTool.java:332)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:270)
        at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

On Wed, Sep 6, 2017 at 4:16 PM, Sergey Soldatov wrote:
> Do you have more details on the version of Phoenix/HBase you are using as
> well as how it hangs (Exceptions/messages that may help to understand the
> problem)?
>
> Thanks,
> Sergey
>
> On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala wrote:
>> I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
>> ...
Re: Phoenix CSV Bulk Load fails to load a large file
Do you have more details on the version of Phoenix/HBase you are using, as well as how it hangs (exceptions/messages that may help to understand the problem)?

Thanks,
Sergey

On Wed, Sep 6, 2017 at 1:13 PM, Sriram Nookala wrote:
> I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool.
> It hangs while loading HFiles. This runs successfully if I split it into
> 2 files, but I'd like to avoid doing that. This is on Amazon EMR; is this
> an issue due to disk space or memory? I have a single master and 2 region
> server configuration with 16 GB of memory on each node.
Phoenix CSV Bulk Load fails to load a large file
I'm trying to load a 3.5G file with 60 million rows using CsvBulkLoadTool. It hangs while loading HFiles. This runs successfully if I split it into 2 files, but I'd like to avoid doing that. This is on Amazon EMR; is this an issue due to disk space or memory? I have a single master and 2 region server configuration with 16 GB of memory on each node.
Re: Phoenix CSV Bulk Load Tool Date format for TIMESTAMP
I'm still trying to set those up in Amazon EMR. However, setting `phoenix.query.dateFormatTimeZone` wouldn't fix the issue for all files, since we could receive a different date format in some other types of files. Is there an option to write a custom mapper to transform the date?

On Tue, Sep 5, 2017 at 2:50 PM, Josh Elser wrote:
> Sriram,
>
> Did you set the timezone and date-format configuration properties
> correctly for your environment?
>
> See `phoenix.query.dateFormatTimeZone` and `phoenix.query.dateFormat` as
> described at http://phoenix.apache.org/tuning.html
>
> On 9/5/17 2:05 PM, Sriram Nookala wrote:
>> I'm trying to bulk load data using the CsvBulkLoadTool. One of the columns
>> is a date in the format YYYYMMDD, for example 20160912. I don't get an
>> error, but the parsing is wrong, and when I use sqlline I see the date show
>> up as 20160912-01-01 00:00:00.000. I had assumed, as per the fix for
>> https://issues.apache.org/jira/browse/PHOENIX-1127, that all date values
>> would be parsed correctly.
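To make the suggestion concrete, a hedged sketch of wiring the two properties into the bulk load job configuration; the format string, time zone, table name, and input path are illustrative, and whether the tool's CSV parser honours them for every column type is exactly the open question in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.phoenix.mapreduce.CsvBulkLoadTool;

public class DateAwareCsvBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Illustrative values: a pattern matching inputs like 20160912 and the
        // time zone those values should be interpreted in.
        conf.set("phoenix.query.dateFormat", "yyyyMMdd");
        conf.set("phoenix.query.dateFormatTimeZone", "UTC");
        System.exit(ToolRunner.run(conf, new CsvBulkLoadTool(), new String[] {
                "--table", "MY_TABLE",
                "--input", "/data/events.csv"
        }));
    }
}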
Re: Support of OFFSET in Phoenix 4.7
Hi Sumanta,

Here you have the answer. You already asked the same question some months ago :)

https://mail-archives.apache.org/mod_mbox/phoenix-user/201705.mbox/browser

From 4.8.

regards,
rafa

On Wed, Sep 6, 2017 at 9:19 AM, Sumanta Gh wrote:
> Hi,
> From which version of Phoenix is pagination with OFFSET supported? It
> seems this is not supported in 4.7.
>
> https://phoenix.apache.org/paged.html
>
> regards,
> Sumanta
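A small illustrative JDBC sketch of the LIMIT/OFFSET paging available from Phoenix 4.8 onwards; the connection URL, table, and columns are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PagedQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             // Third page of a result set, 20 rows per page.
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, name FROM MY_TABLE ORDER BY id LIMIT 20 OFFSET 40")) {
            while (rs.next()) {
                System.out.println(rs.getLong("id") + "\t" + rs.getString("name"));
            }
        }
    }
}

Each page re-runs the query, so a deterministic ORDER BY (ideally on the primary key) is what keeps the offsets consistent between requests.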
Support of OFFSET in Phoenix 4.7
Hi,

From which version of Phoenix is pagination with OFFSET supported? It seems this is not supported in 4.7.

https://phoenix.apache.org/paged.html

regards,
Sumanta
Re: How to speed up write performance
Hi Hef,

Have you had a chance to read our Tuning Guide [1] yet? There's a lot of good, general guidance there.

There are some optimizations for write performance that depend on how you expect/allow your data and schema to change:

1) Is your data write-once? Make sure to declare your table with the IMMUTABLE_ROWS=true property [2]. That will lower the overhead of a secondary index, as it's not necessary to read the data row (to get the old value) prior to writing it when there are secondary indexes.

2) Does your schema only change in an append-only manner? For example, are columns only added, but never removed? If so, you can declare your table as APPEND_ONLY_SCHEMA as described here [2].

3) Does your schema never change, or change only rarely at known times? If so, you can declare an UPDATE_CACHE_FREQUENCY property as described here [2] to reduce the RPC traffic.

4) Can you bulk load data [3] and then add or rebuild the index afterwards?

5) Have you investigated using local indexes [4]? They're optimized for write speed since they ensure that the index data is on the same region server as the data (i.e. all writes are local to the region server, with no cross-region-server calls, but there's some overhead at read time).

6) Have you considered not using secondary indexes and just letting your less common queries be slower? Keep in mind that with secondary indexes you're essentially writing your data twice, so you'll need to expect that your write performance will drop.

As usual, there's a set of tradeoffs that you need to understand and choose from according to your requirements.

Thanks,
James

[1] https://phoenix.apache.org/tuning_guide.html
[2] https://phoenix.apache.org/language/index.html#options
[3] https://phoenix.apache.org/bulk_dataload.html
[4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes

On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser wrote:
> 500 writes/second seems very low to me. On my wimpy laptop, I can easily
> see over 10K writes/second depending on the schema.
>
> The first check is to make sure that you have autocommit disabled.
> Otherwise, every update you make via JDBC will trigger an HBase RPC.
> Batching of RPCs to HBase is key to optimal performance via Phoenix.
>
> Regarding #2, unless you have intimate knowledge of how Phoenix writes
> data to HBase, do not investigate this approach.
>
> On 9/5/17 5:56 AM, Hef wrote:
>> Hi guys,
>> I'm evaluating using Phoenix to replace MySQL for better scalability.
>> The version I'm evaluating is 4.11-HBase-1.2, with some dependencies
>> modified to match the CDH 5.9 we are using.
>>
>> The problem I'm having is that the write performance to Phoenix from JDBC
>> is too poor, only 500 writes/second, while our data's throughput is almost
>> 50,000/s. My questions are:
>> 1. Is the 500/s TPS a normal speed? What rate can you achieve in your
>> production?
>> 2. Can I write directly into HBase with the mutation API and read from
>> Phoenix? That could be fast, but I don't see the secondary index being
>> created automatically in this case.
>>
>> Regards,
>> Hef
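To make the autocommit/batching point concrete, here is a minimal sketch of the write pattern Josh describes, combined with the table options from points 1 and 3 above; the table, schema, connection URL, and batch size are illustrative, not a tuned benchmark.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BatchedUpsertExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
            // Points 1 and 3: write-once rows and a schema that rarely changes.
            try (Statement ddl = conn.createStatement()) {
                ddl.execute("CREATE TABLE IF NOT EXISTS EVENTS ("
                        + " ID BIGINT NOT NULL PRIMARY KEY,"
                        + " PAYLOAD VARCHAR)"
                        + " IMMUTABLE_ROWS=true, UPDATE_CACHE_FREQUENCY=300000");
            }
            // Josh's point: keep autocommit off and commit in batches so many rows
            // reach HBase in one round trip instead of one RPC per upsert.
            conn.setAutoCommit(false);
            final int batchSize = 1000; // illustrative; tune for your row size
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)")) {
                for (long i = 0; i < 100_000; i++) {
                    ps.setLong(1, i);
                    ps.setString(2, "row-" + i);
                    ps.executeUpdate();
                    if ((i + 1) % batchSize == 0) {
                        conn.commit();
                    }
                }
            }
            conn.commit(); // flush the final partial batch
        }
    }
}

With autocommit off, each executeUpdate only buffers the mutation on the client; the commit is what sends the accumulated batch to HBase.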