Re: HBase failing to restart in single-user mode
Same for me. I had faced similar issues, especially on my virtual machines, since I restart them more often than my host machine. Moving ZK out of /tmp, which can get cleared on reboots, fixed the issue for me.

Thanks,
Viral

On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com wrote:

I noticed similar ZK-related issues, but those went away after changing the ZK directory to a permanent directory along with the HBase root directory. Both now point to a location in my home folder, and restarts work fine. Not much help, but I wanted to at least state that.

Lars

Sent from my iPhone

On 18 May 2015, at 05:55, tsuna tsuna...@gmail.com wrote:

Hi all,

For testing on my laptop (OS X with JDK 1.7.0_45) I usually build the latest version from branch-1.0 and use the following config:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.online.schema.update.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>30</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>200</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
</configuration>

Since at least a month ago (perhaps longer, I don't remember exactly) I can't restart HBase.
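The fix described in the replies, moving both the HBase root and the ZooKeeper data directory out of /tmp, amounts to overriding two properties. A sketch (the property names are the standard HBase ones; the paths are example placeholders to adapt):

```xml
<property>
  <name>hbase.rootdir</name>
  <!-- permanent location instead of file:///tmp/... -->
  <value>file:///Users/you/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <!-- keeps ZK state across reboots so restarts find the old znodes -->
  <value>/Users/you/zookeeper</value>
</property>
```

With both directories surviving reboots, the "works the first time, fails on restart" symptom described in this thread should go away.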
The very first time it starts up fine, but subsequent startup attempts all fail with:

2015-05-17 20:39:19,024 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: starting
2015-05-17 20:39:19,024 INFO [RpcServer.listener,port=49809] ipc.RpcServer: RpcServer.listener,port=49809: starting
2015-05-17 20:39:19,029 INFO [main] http.HttpRequestLog: Http request log for http.requests.regionserver is not defined
2015-05-17 20:39:19,030 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context regionserver
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2015-05-17 20:39:19,031 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2015-05-17 20:39:19,033 INFO [main] http.HttpServer: Jetty bound to port 49811
2015-05-17 20:39:19,033 INFO [main] mortbay.log: jetty-6.1.26
2015-05-17 20:39:19,157 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:49811
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4f708099 connecting to ZooKeeper ensemble=localhost:2181
2015-05-17 20:39:19,222 INFO [M:0;localhost:49807] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=1 watcher=hconnection-0x4f7080990x0, quorum=localhost:2181, baseZNode=/hbase
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-17 20:39:19,223 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /127.0.0.1:49812
2015-05-17 20:39:19,223 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Client attempting to establish new session at /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [SyncThread:0] server.ZooKeeperServer: Established session 0x14d651aaec2 with negotiated timeout 400 for client /127.0.0.1:49812
2015-05-17 20:39:19,224 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d651aaec2, negotiated timeout = 400
2015-05-17 20:39:19,249 INFO [M:0;localhost:49807] regionserver.HRegionServer: ClusterId : 6ad7eddd-2886-4ff0-b377-a2ff42c8632f
2015-05-17 20:39:49,208 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not active after 30 seconds
	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:194)
	at
Re: AsyncHBase 1.5.0 has been released
Update the per-RPC header for HBase 0.95+.
Refactor how RPC objects know what RPC method they're for.
Compile Java code generated from protobuf files separately.
Kill some trailing whitespaces.
Add a helper function to de-serialize Protocol Buffer VarInt32.
Add a semi-evil helper class to avoid copying byte arrays from protobufs.
De-serialize RPC responses from HBase 0.95+.
Add a helper function to de-serialize protocol buffers.
Handle META lookups with 0.95+.
Dedup byte arrays when deserializing KeyValues from 0.95+.
Make sure we have all the data we need before de-serializing.
Convert GetRequest to HBase 0.95+.
Convert AtomicIncrementRequest to HBase 0.95+.
Make the run target depend on the jar.
Sync protobuf files with changes made in the HBase 0.95 branch.
Add some make variables for the compilers used.
Add support for cell blocks.
Avoid unnecessary string copies on each RPC.
Convert scanners and filters to HBase 0.95+.
Add a missing accessor to retrieve the max number of KVs on a Scanner.
Expose HBase 0.95's ability to limit the number of bytes returned by a scanner.
Log the channel when logging outgoing RPCs.
Convert single-put RPCs to HBase 0.95+.
Ensure that RPCs that shouldn't get cell blocks don't.
Convert the CAS RPC to HBase 0.95+.
Convert single-Delete RPCs to HBase 0.95+.
Catch up with the protobuf changes made in 0.96
Explicit row locks are no longer supported in HBase 0.95 and up.
The 'getProtocolVersion' RPC no longer exists in 0.95+.
Add support for batched edits with HBase 0.96+.
Optimize multi-action a bit with HBase 0.95+ by sorting less.
Remove some warnings that come up when compiling with JDK 7.
Note that ColumnRangeFilter was added in HBase 0.92.
Upgrade to logback 1.0.13.
Upgrade to SLF4J 1.7.5.
Make sure to not schedule timers with a negative timeout.
Add some constructor overloads for GetRequest.
Upgrade to async 1.4.0.
Add a couple integration tests covering NoSuchColumnFamilyException.
Fix distclean: remove com/google/protobuf from the build directory.
Update NEWS for the v1.5.0 release.
Prevent warnings due to generated protobuf classes.
Properly flush queued up RPCs upon connecting.
Make sure we compile for Java 6.
Add a note regarding some infuriating limitations of the JRE.
Fix an edge case in HBase 0.96 one-shot scanners.
Add/improve a few toString() methods.
Fix accesses to hbase:meta.
Make Scanner usable with -ROOT- / hbase:meta.
Handle an edge case with `prefetchMeta' related to 0.96 compatibility.
Update NEWS/THANKS/AUTHORS.
Fix the distclean rule so we can distclean twice in a row.
Have pom.xml cleaned during `distclean'.
Upgrade to Netty 3.8.0.
Fix bug in deserialization of the `Multi' RPC in HBase 0.96
Add additional logging when decoding fails with an uncaught exception.
Release 1.5.0.

Brandon Forehand (1):
  Add support for prefetching the meta region.
Phil Smith (1):
  Here's some one-liners to compile and run tests.
St.Ack (1):
  Make mvn build accomodate protobuf files
Viral Bajaria (2):
  Initial commit for ScanFilter.
  Add more scanner filters.
Xun Liu (2):
  Properly honor timestamps in DeleteRequest.
  Add support to delete a value at the specified timestamp.

--
Benoit "tsuna" Sigoure
Re: Online/Realtime query with filter and join?
Pradeep, correct me if I am wrong, but has PrestoDB released the HBase plugin yet? Or did they release it and I missed the announcement?

I agree with what Doug is saying here: you can't achieve 100ms on every kind of query on HBase unless you design the rowkey in a way that helps you reduce your I/O. A full scan of a table with billions of rows and columns can take forever, but good indexing (via the rowkey or secondary indexes) can speed things up.

Thanks,
Viral

On Mon, Dec 2, 2013 at 11:01 AM, Pradeep Gollakota pradeep...@gmail.com wrote:

In addition to Impala and Phoenix, I'm going to throw PrestoDB into the mix. :) http://prestodb.io/

On Mon, Dec 2, 2013 at 10:58 AM, Doug Meil doug.m...@explorysmedical.com wrote:

You are going to want to figure out a rowkey (or a set of tables with rowkeys) to restrict the number of I/Os. If you just slap Impala in front of HBase (or even Phoenix, for that matter) you can write SQL against it, but if it winds up doing a full scan of an HBase table underneath you won't get your 100ms response time. Note: I'm not saying you can't do this with Impala or Phoenix, I'm just saying start with the rowkeys first so that you limit the I/O, then start adding frameworks as needed (and/or build a schema with Phoenix in the same rowkey exercise). Such response-time requirements make me think that this is for application support, so why the requirement for SQL? You might want to start by writing it as a Java program first.

On 11/29/13 4:32 PM, Mourad K mourad...@gmail.com wrote:

You might want to consider something like Impala or Phoenix. I presume you are trying to do some report query for a dashboard or UI? MapReduce is certainly not adequate, as there is too much latency on startup. If you want to give this a try, CDH4 and Impala are a good start.

Mouradk

On 29 Nov 2013, at 10:33, Ramon Wang ra...@appannie.com wrote:

The general performance requirement for each query is less than 100 ms, that's the average level.
Sounds crazy, but yes, we need to find a way for it.

Thanks
Ramon

On Fri, Nov 29, 2013 at 5:01 PM, yonghu yongyong...@gmail.com wrote:

The question is what you mean by real-time. What is your performance requirement? In my opinion, I don't think MapReduce is suitable for real-time data processing.

On Fri, Nov 29, 2013 at 9:55 AM, Azuryy Yu azury...@gmail.com wrote:

You can try Phoenix.

On 2013-11-29 3:44 PM, Ramon Wang ra...@appannie.com wrote:

Hi Folks,

It seems to be impossible, but I still want to check if there is a way we can do complex queries on HBase with ORDER BY, JOIN, etc. like we have with a normal RDBMS. We are asked to provide such a solution; any ideas? Thanks for your help. BTW, I think maybe Impala from CDH would be a way to go, but I haven't had time to check it yet.

Thanks
Ramon
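To make the rowkey-design advice in this thread concrete, here is a minimal, stdlib-only sketch of a composite rowkey of (salt, entityId, reversed timestamp). The salt byte spreads entities across regions while keeping each entity's rows contiguous; the reversed timestamp makes newer cells sort first, so "latest N events for entity X" becomes a short scan instead of a full-table scan. The layout and every name here are illustrative assumptions, not anything prescribed by HBase.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeys {
    static final int SALT_BUCKETS = 16;

    public static byte[] rowKey(String entityId, long epochMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + id.length + 8);
        // Salt derived from the entity id: same entity -> same bucket,
        // so all rows for one entity stay in one contiguous key range.
        buf.put((byte) Math.floorMod(entityId.hashCode(), SALT_BUCKETS));
        buf.put(id);
        // Big-endian (MAX_VALUE - ts) makes newer rows sort first
        // under HBase's lexicographic byte ordering.
        buf.putLong(Long.MAX_VALUE - epochMillis);
        return buf.array();
    }
}
```

A prefix scan on (salt byte + entityId) then returns that entity's rows newest-first, which is the kind of bounded I/O Doug is arguing for.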
Re: AsyncHBase 1.5.0-rc1 available for download and testing (HBase 0.96 compatibility inside)
De-serialize the HBase 0.95+ znode that points to META.
Fix the process of META lookups for HBase 0.95 and up.
Send the proper hello message to HBase 0.95 and up.
Remove some unused helper code to create buffers.
Update the per-RPC header for HBase 0.95+.
Refactor how RPC objects know what RPC method they're for.
Compile Java code generated from protobuf files separately.
Kill some trailing whitespaces.
Add a helper function to de-serialize Protocol Buffer VarInt32.
Add a semi-evil helper class to avoid copying byte arrays from protobufs.
De-serialize RPC responses from HBase 0.95+.
Add a helper function to de-serialize protocol buffers.
Handle META lookups with 0.95+.
Dedup byte arrays when deserializing KeyValues from 0.95+.
Make sure we have all the data we need before de-serializing.
Convert GetRequest to HBase 0.95+.
Convert AtomicIncrementRequest to HBase 0.95+.
Make the run target depend on the jar.
Sync protobuf files with changes made in the HBase 0.95 branch.
Add some make variables for the compilers used.
Add support for cell blocks.
Avoid unnecessary string copies on each RPC.
Convert scanners and filters to HBase 0.95+.
Add a missing accessor to retrieve the max number of KVs on a Scanner.
Expose HBase 0.95's ability to limit the number of bytes returned by a scanner.
Log the channel when logging outgoing RPCs.
Convert single-put RPCs to HBase 0.95+.
Ensure that RPCs that shouldn't get cell blocks don't.
Convert the CAS RPC to HBase 0.95+.
Convert single-Delete RPCs to HBase 0.95+.
Catch up with the protobuf changes made in 0.96
Explicit row locks are no longer supported in HBase 0.95 and up.
The 'getProtocolVersion' RPC no longer exists in 0.95+.
Add support for batched edits with HBase 0.96+.
Optimize multi-action a bit with HBase 0.95+ by sorting less.
Remove some warnings that come up when compiling with JDK 7.
Fix de-serialization of multi-packet responses.
Note that ColumnRangeFilter was added in HBase 0.92.
Fix up a logging message.
Upgrade to logback 1.0.13.
Upgrade to SLF4J 1.7.5.
Make sure to not schedule timers with a negative timeout.
Add some constructor overloads for GetRequest.
Upgrade to async 1.4.0.
Add a couple integration tests covering NoSuchColumnFamilyException.
Fix distclean: remove com/google/protobuf from the build directory.
Make sure we compile for Java 6.
Update NEWS for the v1.5.0 release.
Prevent warnings due to generated protobuf classes.
Upgrade to Netty 3.7.0.

Viral Bajaria (2):
  Initial commit for ScanFilter.
  Add more scanner filters.
Xun Liu (1):
  Properly honor timestamps in DeleteRequest.

--
Benoit "tsuna" Sigoure
Re: Scan performance
Hi Tony, I know it's been a while and I am not sure if you already figured out the issue, but try taking a look at HBASE-9079 and see if it's similar to the problem you are facing with FuzzyRowFilter. I have attached a patch to that ticket too and have verified that it fixed things for me in production.

Thanks,
Viral

On Tue, Jul 16, 2013 at 8:07 PM, Tony Dean tony.d...@sas.com wrote:

I was able to test scan performance with 0.94.9 with around 6000 rows x 40 columns, and FuzzyRowFilter gave us 2-4 times better performance. I was able to test this offline without any problems. However, once I turned it on in our development cluster, we noticed that some row keys that should have matched were not matching. After reverting back to SingleColumnValueFilter, the cases that were failing began to work again. We thought that the anomaly was due to certain data in the row key, but we managed to create identical row keys in a different table and see the scan work. So, bottom line, I can't explain this behavior. Has anyone seen this behavior, and does anyone have debugging tips? Thanks.
Re: FilterList: possible bug in getNextKeyHint
Attached are two patches: TestFail.patch, which shows that the behavior is not as expected, and a second patch with the changes I made to FilterList, after which the behavior is as expected. I have tested the state maintenance with two filters that implement getNextKeyHint, but not with three; I doubt that would break, though. The part I am most skeptical about is MUST_PASS_ONE, since we can't bail out of the filterKeyValue call as soon as we see the SEEK_NEXT_USING_HINT, and so we need that extra if statement before returning the ReturnCode. Let me know if something is not clear or not as expected.

Thanks,
Viral

On Mon, Jul 29, 2013 at 10:49 AM, Ted Yu yuzhih...@gmail.com wrote:

Looking into FilterList#filterKeyValue() and FilterList#getNextKeyHint(), they both iterate through all the filters. Suppose there are 3 or more filters in the FilterList which implement getNextKeyHint(). How would the state be maintained?

Cheers
Re: FilterList: possible bug in getNextKeyHint
I attached the two test patches to this JIRA: https://issues.apache.org/jira/browse/HBASE-9079

On Mon, Jul 29, 2013 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote:

Can you log a JIRA and attach the patches there? Your attachments did not go through.
FilterList: possible bug in getNextKeyHint
Hi,

I hit a weird issue/bug and am able to reproduce the error consistently. The problem arises when a FilterList has two filters, each of which implements the getNextKeyHint method.

The way the current implementation works is: StoreScanner calls matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in turn calls filter.getNextKeyHint(), which at this stage is of type FilterList. The implementation in FilterList iterates through all the filters and keeps the max KeyValue that it sees. All is fine if only one of the filters wrapped in the FilterList implements getNextKeyHint, but if more than one implements it, things get weird. For example:

- Create two filters: a FuzzyRowFilter and a ColumnRangeFilter. Both implement getNextKeyHint.
- Wrap them in a FilterList with MUST_PASS_ALL.
- FuzzyRowFilter will seek to the correct first row and then pass it to ColumnRangeFilter, which will return the SEEK_NEXT_USING_HINT code.
- Now, when getNextKeyHint is called on the FilterList, it calls the one on FuzzyRowFilter first, which basically says what the next row should be, while in reality we want the ColumnRangeFilter to give the seek hint.
- The above behavior skips data that should be returned, which I have verified by using a RowFilter with RegexStringComparator.

I updated FilterList to maintain state on which filter returned the SEEK_NEXT_USING_HINT, and in getNextKeyHint I invoke the method on the saved filter and reset that state. I tested it with my current queries and it works fine, but I need to run the entire test suite to make sure I have not introduced any regression. In addition, I need to figure out what the behavior should be when the operation is MUST_PASS_ONE, but I doubt it should be any different.

Is my understanding that this is a bug correct? Or am I trivializing it and ignoring something very important?
If it's tough to wrap your head around the explanation, then I can open a JIRA and upload a patch against 0.94 head. Thanks, Viral
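For readers without the patch in front of them, here is a toy, stdlib-only model of the fix being described: remember which filter returned SEEK_NEXT_USING_HINT and let only that filter supply the hint, instead of taking the max across all filters. The types below loosely mirror the 0.94 Filter API but are simplified stand-ins, not the real HBase classes.

```java
public class FilterListSketch {
    public enum ReturnCode { INCLUDE, SEEK_NEXT_USING_HINT }

    public interface Filter {
        ReturnCode filterKeyValue(String rowKey);
        String getNextKeyHint(String rowKey);   // next row this filter wants
    }

    // A trivial filter stub for demonstration purposes.
    public static class StubFilter implements Filter {
        final ReturnCode code; final String hint;
        public StubFilter(ReturnCode code, String hint) {
            this.code = code; this.hint = hint;
        }
        public ReturnCode filterKeyValue(String rowKey) { return code; }
        public String getNextKeyHint(String rowKey) { return hint; }
    }

    private final Filter[] filters;
    private Filter seekHintFilter;              // the state the patch adds

    public FilterListSketch(Filter... filters) { this.filters = filters; }

    // MUST_PASS_ALL semantics: the filter that asks for a seek is recorded
    // so getNextKeyHint() later asks that filter, not all of them.
    public ReturnCode filterKeyValue(String rowKey) {
        for (Filter f : filters) {
            if (f.filterKeyValue(rowKey) == ReturnCode.SEEK_NEXT_USING_HINT) {
                seekHintFilter = f;
                return ReturnCode.SEEK_NEXT_USING_HINT;
            }
        }
        return ReturnCode.INCLUDE;
    }

    public String getNextKeyHint(String rowKey) {
        String hint = seekHintFilter.getNextKeyHint(rowKey);
        seekHintFilter = null;                  // reset, as the patch does
        return hint;
    }
}
```

Under the buggy max-of-all-hints behavior, the first filter's (wrong) hint could win; here the hint comes from exactly the filter that requested the seek.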
Re: optimizing block cache requests + eviction
Thanks guys for going through that never-ending email! I will create the JIRA for block cache eviction and the regionserver assignment command. Ted already pointed to the JIRA which tries to go to a different datanode if the primary is busy (I will add comments to that one).

To answer Andrew's questions:
- I am using HBase 0.94.4.
- I tried taking a stack trace using jstack, but after the dump it crashed the regionserver. I also did not take the dump on the offending regionserver; rather, I took it on the regionservers that were making the block requests. I will take a stack trace on the offending server. Is there any other tool besides jstack? I don't want to crash my regionserver.
- The HBase client workload is fairly random, and I write to a table every 4-5 seconds. I have varying workloads for different tables, but I do a lot of batching on the client side and group similar rowkeys together before doing a GET/PUT. For example: in the best case I end up doing ~100 puts every second to a region, and in the worst case ~5K puts every second. But again, the workload is fairly random. Currently the clients for the table which had the most data have been disabled, and yet I see the heavy loads.

To answer Vladimir's points:
- The data access pattern definitely turns out to be uniform over a period of time.
- I just did a sweep of my code base and found a few places where Scanners are using the block cache. I will disable that and see how it goes.

Thanks,
Viral
Re: optimizing block cache requests + eviction
I was able to reproduce the same regionserver asking for the same local block over 300 times within the same 2-minute window by running one of my heavy workloads. Let me try and gather some stack dumps. I agree that jstack crashing the JVM is concerning, but there is nothing in the errors to indicate why it happened. I will keep that conversation out of here.

As an addendum, I am using asynchbase as my client. I am not sure whether the arrival of multiple requests for rowkeys that could be in the same non-cached block causes HBase to queue up a non-cached block read via SCR, and whether, since the box is under load, it queues up multiple of these and makes the problem worse.

Thanks,
Viral

On Mon, Jul 8, 2013 at 3:53 PM, Andrew Purtell apurt...@apache.org wrote:

but unless the behavior you see is the _same_ regionserver asking for the _same_ block many times consecutively, it's probably workload related.
Re: optimizing block cache requests + eviction
Good question. When I looked at the logs, it's not clear from them whether it's reading a meta or data block. Is there any kind of log line that indicates that? Given that it says it's reading from a startOffset, I would assume this is a data block. A question that comes to mind: is this read doing a seek to that position directly, or is it going to cache the block? It looks like it is not caching the block if it's reading directly from a given offset. Or am I wrong? Following is a sample line that I used while debugging:

2013-07-08 22:58:55,221 DEBUG org.apache.hadoop.hdfs.DFSClient: New BlockReaderLocal for file /mnt/data/current/subdir34/subdir26/blk_-448970697931783518 of size 67108864 startOffset 13006577 length 54102287 short circuit checksum true

On Mon, Jul 8, 2013 at 4:37 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

Do you know if it's a data or meta block?
Re: optimizing block cache requests + eviction
We haven't disabled the block cache, so I doubt that's the problem.

On Mon, Jul 8, 2013 at 4:50 PM, Varun Sharma va...@pinterest.com wrote:

FYI, if you disable your block cache, you will ask for index blocks on every single request. So such a high rate of requests is plausible for index blocks even when your requests are totally random over your data.

Varun
Re: question about clienttrace logs in hdfs and shortcircuit read
Asaf, the hdfsBlocksLocalityIndex is around 76, and it's 86 for the regionserver which is under the heaviest I/O load.

Ram, I saw that you updated the JIRA saying the checksum metrics are available in the regionserver. What group are they published under? I checked my Ganglia stats and can't find them under hbase.RegionServerDynamicStatistics. After looking at the code I found that the metrics are registered under the regionserver, but on my Ganglia side I never see any metrics that are tagged as regionserver. Am I missing a config to send those metrics out to Ganglia?

Thanks,
Viral

On Thu, Jul 4, 2013 at 3:50 AM, Asaf Mesika asaf.mes...@gmail.com wrote:

What's the HDFS data locality metric? And remote read and local read?
Re: question about clienttrace logs in hdfs and shortcircuit read
I saw the same code and also saw the following in RegionServerMetrics.java:

/**
 * Number of times checksum verification failed.
 */
public final MetricsLongValue checksumFailuresCount =
    new MetricsLongValue("checksumFailuresCount", registry);

The registry is then registered in JMX in the constructor via:

// export for JMX
statistics = new RegionServerStatistics(this.registry, name);

And in doUpdates() I see the following:

this.checksumFailuresCount.pushMetric(this.metricsRecord);

All of which makes me feel the metrics are exported. But for some reason in my Ganglia all I see is hbase.RegionServerDynamicStatistics, which comes from a different class. I don't see a group for RegionServerStatistics, which is strange. Let me dig deeper and see if the metrics are getting suppressed somewhere.

Thanks,
Viral

On Thu, Jul 4, 2013 at 11:21 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:

I am not sure on the Ganglia side, but looking at the code I found this:

{code}
// For measuring number of checksum failures
static final AtomicLong checksumFailures = new AtomicLong();
{code}

And it was getting updated too. Let me check on the JMX side if it is really getting sent out to the metrics system. If not, we can address that in the JIRA.

Regards
Ram
Re: question about clienttrace logs in hdfs and shortcircuit read
Yes, I was checking the 0.94 code. And sorry for the brain fart, I just spotted the metric in Ganglia. There are just too many metrics in Ganglia and I skipped this one! It was under the group hbase.regionserver, while I was expecting it to be under hbase.regionserver.RegionServerStatistics. The chart shows no failures (i.e. it has no data to show).

So now I need to figure out whether shortcircuit is on or not. Is there an easier way to do this via JMX, if not by logs? I tried to grep the logs for "shortcircuit" and "circuit" and didn't find anything in the last 2 days. I restarted a regionserver yesterday and would have expected it to show up on that one at least.

On Fri, Jul 5, 2013 at 12:30 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote:

Are you checking with 0.94 code? I believe so.
Re: question about clienttrace logs in hdfs and shortcircuit read
No worries, Anoop. Here is some clarification for this chain. It started with trying to figure out whether SCR is effective at the RS or not. I could not find the metric anywhere in Ganglia/JMX, didn't find any RegionServer-level metric either, and so started looking at my DN logs. I saw a lot of clienttrace logs, which are basically logs for HDFS_READ and HDFS_WRITE ops. I wanted to see whether it's a valid assumption that SCR is working if I don't see any clienttrace logs for the RS that is hosted on the same box as the DN. Hopefully that clarifies it.

On Fri, Jul 5, 2013 at 12:55 AM, Anoop John anoop.hb...@gmail.com wrote:

Agreed that right now the HBase-handled checksum will work only with SCR, but it might work without SCR too later (as support comes from HDFS). So I am not getting the direct relation between the SCR metric and this one. Have I missed something in the mail loop? Sorry, just trying to be clear on this :)
Re: question about clienttrace logs in hdfs and shortcircuit read
Sweet! I enabled debug logging for org.apache.hadoop.hdfs.DFSClient and found the "New BlockReaderLocal" log line. That's verification that SCR is on and working fine. Regarding no clienttrace lines in the DN, I verified that too. Last time I saw a few lines because I forgot to filter out the HDFS_WRITE lines; after I typed my last email I realized that mistake.

On Fri, Jul 5, 2013 at 1:07 AM, Anoop John anoop.hb...@gmail.com wrote:

As I said, please check the DFS client side log (RS log) for BlockReaderLocal. AFAIK there should not be HDFS_READ op logs on the DN side when SCR is happening.
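For anyone else hunting the same signal: the DFSClient debug logging mentioned above can be turned on in the RegionServer's log4j.properties with a single logger line (standard log4j 1.x syntax):

```properties
# RegionServer log4j.properties: surface the "New BlockReaderLocal" lines,
# which confirm short-circuit reads are actually being used
log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
```

Remember to revert it afterwards; DFSClient at DEBUG is chatty on a busy regionserver.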
question about clienttrace logs in hdfs and shortcircuit read
Hi, if I have enabled shortcircuit reads, should I ever be seeing clienttrace logs in the datanode for the regionserver DFSClient that is co-located with the datanode? Besides that, is there any other way to verify that my setting for short-circuit reads is working fine?

Thanks,
Viral
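For context, short-circuit reads in this era are enabled on the HDFS side roughly as follows. The property names are the standard HDFS ones, but exact requirements differ between Hadoop 1.x and 2.x, and the socket path below is only an example:

```xml
<!-- hdfs-site.xml, also visible to the RegionServer's client config -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<!-- Hadoop 2.x style: a domain socket shared by the DN and local clients -->
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<!-- Hadoop 1.x style instead whitelisted the reading user -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
```

If SCR is active, local reads bypass the DN's data transfer path, which is why the clienttrace lines asked about here should mostly disappear for the co-located RS.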
Re: question about clienttrace logs in hdfs and shortcircuit read
I looked up the Ganglia metrics that I have set up for the cluster (both HBase and HDFS) and don't see it there. Is it not published to Ganglia?

On Wed, Jul 3, 2013 at 11:33 PM, Asaf Mesika asaf.mes...@gmail.com wrote:

I think there is a metric in HBase and HDFS (JMX) reflecting that. If you find it and find it useful, do tell...
Re: question about clienttrace logs in hdfs and shortcircuit read
Currently the datanode shows a lot of clienttrace logs for DFSClient. I did a quick command-line check to see how many clienttrace lines I get per active RegionServer, and it seems the local RegionServer had very few (< 1%). Given that the datanode logs are too noisy with clienttrace, I was hoping to find the metrics in an easier way, but maybe I will hunt the logs now. Any pointers on what type of log line to look for?

Thanks,
Viral

On Thu, Jul 4, 2013 at 1:57 AM, Azuryy Yu azury...@gmail.com wrote:

If SCR takes effect, you can see related logs in the datanode log.
Re: question about clienttrace logs in hdfs and shortcircuit read
Created the JIRA at: https://issues.apache.org/jira/browse/HBASE-8868 Sorry if I got a few fields wrong, will learn from this one to open better JIRAs going forward. Thanks, Viral On Thu, Jul 4, 2013 at 2:02 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: I think we should have some explicit indicator for this feature. Mind filing a JIRA for this? Regards Ram
Re: HBASE-7846 : is it safe to use on 0.94.4 ?
I ended up writing a tool which helps merge a table's regions down to a target number of regions. For example, if you want to go from N to N/8 regions, the tool figures out the grouping and merges them in one pass. I will put it up in a GitHub repo soon and share it here.

The sad part of this approach is the downtime required. It's taking over 2 hours on my test cluster, which holds less than 30% of the production table size. In absolute terms, the table has over 100 regions, I am merging it down to 20 or so, and it has 20GB of LZO-compressed data. Is there a better way to achieve this? If not, should I open a JIRA to explore running the Merge util on a disabled table rather than having to shut down the entire cluster? It would also be great to skip compaction while merging and do it as a later step, since that can happen online. Just throwing some ideas out here.

Thanks,
Viral

On Tue, Jul 2, 2013 at 11:22 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:

Hi Viral, it was working fine when I did it. I'm not sure you can still apply it to a recent HBase version because some code changed. But I can take a look to see if I can rebase it...

JM
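The grouping step such a tool needs is simple to sketch: partition the sorted list of region names into `target` contiguous groups of near-equal size. Adjacent regions must be merged together to keep the key space contiguous, which is why a contiguous partition suffices instead of bin packing. This is a hypothetical stdlib-only helper, not code from any HBase release or from the tool mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionGrouper {
    // Split `regions` (assumed sorted by start key) into `target`
    // contiguous groups; the first (n % target) groups get one extra region.
    public static List<List<String>> group(List<String> regions, int target) {
        List<List<String>> groups = new ArrayList<>();
        int n = regions.size();
        int base = n / target, extra = n % target;
        int idx = 0;
        for (int g = 0; g < target; g++) {
            int size = base + (g < extra ? 1 : 0);
            groups.add(new ArrayList<>(regions.subList(idx, idx + size)));
            idx += size;
        }
        return groups;
    }
}
```

Each group would then be merged pairwise (or in one pass, as the tool described above does) since its members are adjacent in the key space.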
Re: HBASE-7846 : is it safe to use on 0.94.4 ?
Found this while going through the online merge JIRA: https://issues.apache.org/jira/browse/HBASE-8217

The comments were interesting, and as a user I would agree that supplying a patch is good and it's on me to decide whether to use it. The core committers are obviously pushing for 0.96, but given that we have clusters running older versions, I feel this tool would help my current case immensely. I am all for disabling a table and then doing a merge; shutting down the whole cluster and putting things in maintenance mode is something I would love to avoid. On a side note, it will help me avoid staying up all night :-)

I will give this patch a try and provide my feedback. Maybe it will help some other folks out there.

Thanks,
Viral

On Wed, Jul 3, 2013 at 4:21 AM, Ted Yu yuzhih...@gmail.com wrote:

Would online merge (https://issues.apache.org/jira/browse/HBASE-7403) help? The feature is not in 0.94 though.

Cheers
Re: How many column families in one table ?
When you did the scan, did you check what the bottleneck was? Was it I/O? Did you see any GC locks? How much RAM are you giving to your RS?

-Viral

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote:

To completely scan the table for all 140 columns, it takes around 30-40 minutes.
HBASE-7846 : is it safe to use on 0.94.4 ?
Hi, Just wanted to check if it's safe to use the JIRA mentioned in the subject i.e. https://issues.apache.org/jira/browse/HBASE-7846 Thanks, Viral
Re: How many column families in one table ?
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote:

Sorry for the typo, please ignore the previous mail. Here is the corrected one: 1) I have around 140 columns for each row; out of the 140, around 100 columns hold a Java primitive data type, and the remaining 40 columns contain a serialized Java object as a byte array (inside each object is an ArrayList). Yes, I do delete data, but the frequency is very low (1 out of 5K operations). I don't run any compactions.

This answers the type of data in each cell, not the size. Can you figure out the average size of the data that you insert in each cell? For example, what is the length of the byte array? Also, for the Java primitives, are they 8-byte longs? 4-byte ints? In addition, what is in the row key, and how long is it in bytes? Same for the column family: can you share the names of the column families? How about the qualifiers? If you have disabled major compactions, you should run one every few days (if not once a day) to consolidate the number of files that each scan will have to open.

2) I had run the scan keeping in mind the CPU, I/O and other system-related parameters. I found them to be normal, with the system load being 0.1-0.3.

How many disks do you have in your box? Have you ever benchmarked the hardware?

Thanks,
Viral
Re: how can i do compaction manually?
You can use the hbase shell and run major_compact 'tablename', or you could run echo "major_compact 'tablename'" | hbase shell On Sun, Jun 30, 2013 at 7:51 PM, ch huang justlo...@gmail.com wrote: I want to clean the data that is deleted; which command can I execute on the command line? Thanks all!
Re: 答复: flushing + compactions after config change
On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores. That's weird, because I see some of my regions which receive the least amount of data in a given time period flushing before the regions that are receiving data continuously. The reason I know this is because of the write pattern. Some of my tables are in catch-up mode, i.e. I am ingesting data from the past, and they always have something to do. Some tables are not in catch-up mode and sit idle most of the time, yet I see a high number of flushes for those regions too. I doubt that it's the fact that it's a major compaction that's making everything worse. When a minor gets promoted into a major it's because we're already going to compact all the files, so we might as well get rid of some deletes at the same time. They are all getting selected because the files are within the selection ratio. I would not focus on this to resolve your problem. I meant worse for my writes, not for HBase as a whole. I haven't been closely following this thread, but have you posted a log snippet somewhere? It's usually much more telling and we eliminate a few levels of interpretation. Make sure it's at DEBUG, and that you grab a few hours of activity. Get the GC log for the same time as well. Drop this on a web server or pastebin if it fits. The only log snippet that I posted was the flushing activity, and even that was not everything; I had grep'd a few lines out. Let me collect some more stats here and post again. I just enabled GC logging on this server; I initially deployed the wrong config, which had no GC logging. I am not sure how GC logs will help here given that I am at less than 50% heap space used, so I doubt a stop-the-world GC is happening. Are you trying to look for some other information?
Thx, J-D Thanks to everyone on this list for their time! -Viral
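As a side note, the size-based flush selection J-D describes in the thread above ("it picks the region with the most data in its memstores") can be sketched in a few lines. The region names and memstore sizes here are invented for illustration:

```python
# Under global memstore pressure the flusher picks the region with the
# biggest memstore, not a random one. Illustrative sizes, in bytes.
memstore_sizes = {
    "catchup-region-1": 120 * 1024 * 1024,
    "catchup-region-2": 95 * 1024 * 1024,
    "idle-region-1": 4 * 1024 * 1024,
}

def pick_flush_candidate(sizes):
    # Choose the region whose memstore frees the most heap when flushed.
    return max(sizes, key=sizes.get)

print(pick_flush_candidate(memstore_sizes))  # catchup-region-1
```

Note this only models memory-pressure flushes; the "too many hlogs" path discussed later in the thread targets whichever regions pin the oldest WAL, which is why idle regions can still get flushed.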
flushing + compactions after config change
Hi All, I wanted some help understanding what's going on with my current setup. I updated my config to the following settings:
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>50</value>
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
Prior to this, all the settings were default values. I wanted to increase the write throughput on my system and also control when major compactions happen. In addition, I wanted to make sure that my regions don't split quickly. After the change in settings, I am seeing a huge storm of memstore flushes and minor compactions, some of which get promoted to major compactions. The compaction queue is also way too high. A few of the lines that I see in the logs are here: http://pastebin.com/Gv1S9GKX The regionserver whose logs are pasted above keeps flushing and creating those small files, and shows the following metrics: memstoreSizeMB=657, compactionQueueSize=233, flushQueueSize=0, usedHeapMB=3907, maxHeapMB=10231 I am unsure why it's causing such a high number of small flushes (< 100m) even though the flush size is at 128m and there is no memory pressure. Any thoughts? Let me know if you need any more information; I also have ganglia running and can provide more metrics if needed. Thanks, Viral
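For what it's worth, the per-region blocking threshold implied by the settings in the message above can be sanity-checked with a little arithmetic (a sketch using the values from the posted config; updates to a region block once its memstore exceeds flush.size * block.multiplier):

```python
flush_size = 134217728  # hbase.hregion.memstore.flush.size (128 MB)
multiplier = 4          # hbase.hregion.memstore.block.multiplier

# A single region's memstore can grow to this many bytes before HBase
# blocks updates to it while the flush catches up.
blocking_threshold = flush_size * multiplier
print(blocking_threshold // (1024 * 1024))  # 512 (MB)
```

So memory pressure on any one region is not the limit here; the small flushes reported below turn out to come from the WAL-count path instead.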
Re: flushing + compactions after config change
Thanks for the quick response Anoop. The current memstore reservation (IIRC) would be 0.35 of the total heap, right? The RS total heap is 10231MB, used is at 5000MB. The total number of regions is 217; there are approx 150 regions with 2 families, ~60 with 1 family and the remaining with 3 families. How do I check if the flushes are due to too many WAL files? Does it get logged? Thanks, Viral On Thu, Jun 27, 2013 at 12:51 AM, Anoop John anoop.hb...@gmail.com wrote: You mean there is enough memstore-reserved heap in the RS, so that there won't be premature flushes because of global heap pressure? What is the RS max mem and how many regions and CFs in each? Can you check whether the flushes are happening because of too many WAL files? -Anoop-
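For reference, a quick back-of-envelope check of the global memstore headroom on the 10231MB heap mentioned above, assuming the 0.94 defaults of 0.4 for hbase.regionserver.global.memstore.upperLimit and 0.35 for lowerLimit (the 0.35 figure quoted in the thread):

```python
heap_mb = 10231  # RS heap from the message above

# Global memstore limits: flushes are forced above the upper limit and
# continue until usage drops below the lower limit.
upper_mb = 0.40 * heap_mb
lower_mb = 0.35 * heap_mb
print(round(upper_mb), round(lower_mb))
```

With ~3.5-4GB of memstore headroom and only 657MB in use, global heap pressure indeed can't explain the premature flushes, which points back at the WAL count.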
Re: 答复: flushing + compactions after config change
Thanks Liang! Found the logs. I had gone overboard with my greps and missed the "Too many hlogs" line for the regions that I was trying to debug. A few sample log lines:
2013-06-27 07:42:49,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 2 regions(s): 0e940167482d42f1999b29a023c7c18a, 3f486a879418257f053aa75ba5b69b14
2013-06-27 08:10:29,996 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 0e940167482d42f1999b29a023c7c18a
2013-06-27 08:17:44,719 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 2 regions(s): 0e940167482d42f1999b29a023c7c18a, e380fd8a7174d34feb903baa97564e08
2013-06-27 08:23:45,357 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 3 regions(s): 0e940167482d42f1999b29a023c7c18a, 3f486a879418257f053aa75ba5b69b14, e380fd8a7174d34feb903baa97564e08
Any pointers on the best practice for avoiding this scenario? Thanks, Viral On Thu, Jun 27, 2013 at 1:21 AM, 谢良 xieli...@xiaomi.com wrote: If the global memstore up-limit is reached, you'll find "Blocking updates on" in your logs (see MemStoreFlusher.reclaimMemStoreMemory); if it's caused by too many log files, you'll find "Too many hlogs: logs=" (see HLog.cleanOldLogs). Hope it's helpful for you :) Best, Liang
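One commonly suggested sizing rule for hbase.regionserver.maxlogs (a hedged rule of thumb, not official guidance) is to make the total WAL capacity roughly match the global memstore lower limit, so that memstore pressure rather than the hlog count drives flushes. A sketch using the numbers from this thread; the 64MB HDFS block size and the default 0.95 logroll multiplier are assumptions:

```python
heap_bytes = 10231 * 1024 * 1024          # RS heap from the thread
memstore_lower_fraction = 0.35            # hbase.regionserver.global.memstore.lowerLimit

# WALs roll at roughly blocksize * hbase.regionserver.logroll.multiplier.
hlog_size = int(64 * 1024 * 1024 * 0.95)  # assumed 64MB HDFS block, 0.95 multiplier

# Enough WALs to hold as much data as the memstores can, so the
# "Too many hlogs" path rarely fires before a size-based flush would.
max_logs = int(heap_bytes * memstore_lower_fraction / hlog_size)
print(max_logs)
```

For this heap that lands well above the default maxlogs=32 being hit in the logs, which matches the later suggestion in the thread to bump max hlogs.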
Re: 答复: flushing + compactions after config change
0.94.4 with plans to upgrade to the latest 0.94 release. On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com wrote: hey Viral, Which hbase version are you using?
Re: 答复: flushing + compactions after config change
I do have a heavy write operation going on. Actually, heavy is relative: not all tables/regions see the same amount of writes at the same time. There is definitely a burst of writes that can happen on some regions. In addition, there are some processing jobs which play catch-up and could be processing data from the past, and those can have heavier write operations. I think my main problem is that my writes are well distributed across regions. A batch of puts most probably ends up hitting every region since they get distributed fairly well. In that scenario, I am guessing I get a lot of WALs, though I am just speculating. Regarding the JVM options (minus some settings for remote profiling): -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M On Thu, Jun 27, 2013 at 2:48 AM, Azuryy Yu azury...@gmail.com wrote: Can you paste your JVM options here? And do you have extensive writes on your hbase cluster?
Re: 答复: flushing + compactions after config change
Thanks Azuryy. Look forward to it. Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created? Tried looking around but could not find the details. On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote: Your JVM options are not enough. I will give you some detail when I go back to the office tomorrow. --Send from my Sony mobile.
Re: 答复: flushing + compactions after config change
Hey JD, Thanks for the clarification. I also came across a previous thread which talks about a similar problem: http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E I guess my problem is similar in that my writes are well distributed, and at a given time I could be writing to a lot of regions. Some of the regions receive very little data, but since the flush algorithm chooses at random what to flush when "too many hlogs" is hit, it will flush a region with less than 10MB of data, causing too many small files. This in turn causes compaction storms where, even though major compaction is disabled, some of the minors get upgraded to majors, and that's when things start getting worse. My compaction queues are still the same, so I doubt I will come out of this storm without bumping up max hlogs for now. Reducing regions per server is one option, but then I will be wasting my resources since the servers at current load are at 30% CPU and 25% RAM. Maybe I can bump up heap space and give more memory to the memstore. Sorry, I am just thinking out loud. Thanks, Viral On Thu, Jun 27, 2013 at 2:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: No, all your data eventually makes it into the log, just potentially not as quickly :)
Re: NullPointerException when opening a region on new table creation
To close the loop here, I wasn't able to repro it again and it seems the issue was due to a race condition in create table commands. Eventually the tableinfo file was missing on HDFS and hence there was an exception storm. I fixed it by manually creating the file with the right definition and then eventually dropping the table. -Viral On Tue, Jun 25, 2013 at 5:20 PM, Viral Bajaria viral.baja...@gmail.comwrote: Hi JM, Yeah you are right about when the exception happens. I just went through all the logs of table creation and don't see an exception. Though there was a LONG pause when doing the create table. I cannot find any kind of logs on the hbase side as to why the long pause happened around that time. The bigger problem now is the table does not show up in the hbase ui and I can't do a drop on it. At the same time the regionserver logs are flooded with that exception. I think I will have to muck around with ZK and remove traces of that table. Will try to repro this issue but it seems weird since I am able to create other tables with no issue. Thanks, Viral
NullPointerException when opening a region on new table creation
Hi, I created a new table on my cluster today and hit a weird issue which I have not come across before. I wanted to run it by the list and see if anyone has seen this issue before; if not, should I open a JIRA for it? It's still unclear why it happened. I create the table programmatically using the HBaseAdmin APIs, not through the shell. hbase: 0.94.4 hadoop: 1.0.4 There are 2 stack traces back to back and I think one might be leading to the other, but I have to dive in deeper to confirm this. Thanks, Viral
===StackTrace===
2013-06-25 09:58:46,041 DEBUG org.apache.hadoop.hbase.util.FSTableDescriptors: Exception during readTableDecriptor. Current table name = test_table_id
org.apache.hadoop.hbase.TableInfoMissingException: No .tableinfo file under hdfs://ec2-54-242-168-35.compute-1.amazonaws.com:8020/hbase/test_table_id
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorModtime(FSTableDescriptors.java:416)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorModtime(FSTableDescriptors.java:408)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:163)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2829)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2802)
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
2013-06-25 09:58:46,094 WARN org.apache.hadoop.hbase.util.FSTableDescriptors: The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: test_table_id
2013-06-25 09:58:46,094 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0xaa3f4e93eed83504 Attempting to transition node 8eac5d6cf6ce4c61fb47bf357af60213 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-25 09:58:46,151 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0xaa3f4e93eed83504 Successfully transitioned node 8eac5d6cf6ce4c61fb47bf357af60213 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2013-06-25 09:58:46,151 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: {NAME => 'test_table_id,,1372103768783.8eac5d6cf6ce4c61fb47bf357af60213.', STARTKEY => '', ENDKEY => '', ENCODED => 8eac5d6cf6ce4c61fb47bf357af60213,}
2013-06-25 09:58:46,152 INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: System coprocessor org.apache.hadoop.hbase.coprocessors.GroupBy was loaded successfully with priority (536870911).
2013-06-25 09:58:46,152 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=test_table_id,,1372103768783.8eac5d6cf6ce4c61fb47bf357af60213., starting to roll back the global memstore size.
java.lang.IllegalStateException: Could not instantiate a region instance.
    at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3776)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3954)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3773)
    ... 7 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:159)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:151)
    at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:455)
    ... 11 more
Re: NullPointerException when opening a region on new table creation
Hi JM, Yeah you are right about when the exception happens. I just went through all the logs of table creation and don't see an exception. Though there was a LONG pause when doing the create table. I cannot find any kind of logs on the hbase side as to why the long pause happened around that time. The bigger problem now is the table does not show up in the hbase ui and I can't do a drop on it. At the same time the regionserver logs are flooded with that exception. I think I will have to muck around with ZK and remove traces of that table. Will try to repro this issue but it seems weird since I am able to create other tables with no issue. Thanks, Viral On Tue, Jun 25, 2013 at 4:22 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Viral, This exception is when you try to read/open the table, right? Any exception when you created it? Your table has not been fully created, so it's just normal for HBase to not open it. The issue is at creation time. Do you still have the logs? Thanks, JM 2013/6/25 Viral Bajaria viral.baja...@gmail.com: Hi, I created a new table on my cluster today and hit a weird issue which I have not come across before. I wanted to run it by the list and see if anyone has seen this issue before and if not should I open a JIRA for it. I create the table programmatically using the HBaseAdmin APIs and not through the shell. hbase: 0.94.4 hadoop: 1.0.4 There are 2 stack traces back to back and I think one might be leading to the other, but I have to dive in deeper to confirm this. Thanks, Viral
Re: querying hbase
The shell allows you to use filters just like the standard HBase API, but with jruby syntax. Have you tried that, or is that too painful and you want a simpler tool? -Viral On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote: Are there any tools out there that can help in visualizing data stored in Hbase? I know the shell lets you do basic stuff. But if I don't know what rowid I am looking for, or if I want rows with family say *name* (yes, SQL-like), are there any tools that can help with this? Not trying to use this on production (although that would be nice), just a dev env for now. Thank you for any suggestions
Re: GET performance degrades over time
Thanks for all the help in advance! Answers inline.. Hi Viral, some questions: Are you adding new data or deleting data over time? Yes, I am continuously adding new data. The puts have not slowed down, but that could also be an after-effect of deferred log flush. Do you have bloom filters enabled? Yes, bloom filters have been enabled: ROWCOL. Which version of Hadoop? Using 1.0.4. Anything funny in the Datanode logs? I haven't seen anything funny, not a lot of timeouts either, but I will look into it more. For some reason my datanode metrics refuse to show up in ganglia while regionserver metrics work fine. Thanks, Viral
Re: GET performance degrades over time
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll phobos...@gmail.com wrote: Look at how much hard disk utilization you have (IOPS / svctm). You may just be under-scaled for the QPS you desire for both read + write load. If you are performing random gets, you could expect around the low to mid 100s of IOPS/sec per HDD. Use bonnie++ / IOzone / ioping to verify. Also you could see how efficient your cache is (saving disk IOPS). Thanks for the tips Jeremy. I have used bonnie++ to benchmark both the fast and slow servers and the outputs of bonnie are very similar. I haven't tried running bonnie++ when the load was high, but I can try and do it later today since I just restarted my load test; it takes a few hours before the performance starts degrading. Regarding the IOPS/svctm, I ran iostat for a while when performance was bad and saw that the tps was pretty spiky. I have a striped RAID0 on my 4 disks and see the tps hovering anywhere between 100 and 4000, while each disk individually maxes out at 1000 tps. I checked another regionserver which handles almost equal amounts of data, but the rowkey size on that box is bigger by 8 bytes than on the box that is slow (fast server: rk is 24 bytes, cf is 1 byte, cq is 6 bytes, val can be 25 bytes to 1.5KB). That box shows a tps of max 200, and the GETs that are sent to that regionserver finish 10K requests in a second (not great but acceptable). Given the region sizes are almost the same (off by 300MB), I am still not clear what else to debug. Maybe I can try to split the region and see if that speeds things up, but I wanted to try that as my last option. Thanks, Viral
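A crude way to frame Jeremy's IOPS point: multiply the disk count by per-disk random IOPS to get a ceiling on uncached GETs. The per-disk figure below is an assumed mid-range HDD number, not a measurement from these boxes:

```python
disks = 4                 # the striped RAID0 described above
iops_per_disk = 120       # assumed figure for a 7.2k HDD doing random reads
cache_hit_ratio = 0.0     # worst case: every GET misses the block cache

# Random GETs per second the spindles can serve if nothing is cached.
disk_gets_per_sec = disks * iops_per_disk * (1 - cache_hit_ratio)
print(disk_gets_per_sec)
```

Against a ~5K GETs/sec/regionserver workload, a few hundred uncached reads per second means almost everything must come out of the block cache; a falling hit ratio alone could explain latencies blowing up over time.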
GET performance degrades over time
Hi, My setup is as follows: 24 regionservers (7GB RAM, 8-core CPU, 5GB heap space), hbase 0.94.4, 5-7 regions per regionserver. I am doing an avg of 4K-5K random gets per regionserver per second and the performance is acceptable in the beginning. I have also done ~10K gets against a single regionserver and got the results back in 600-800ms. After a while the performance of the GETs starts degrading: the same ~10K random gets start taking upwards of 9s-10s. With regards to the hbase settings that I have modified, I have disabled major compaction, increased the region size to 100G and bumped up the handler count to 100. I monitored ganglia for metrics that vary when the performance shifts from good to bad and found that fsPreadLatency_avg_time is almost 25x in the badly performing regionserver. fsReadLatency_avg_time is also higher, but not by as much (around 2x). I took a thread dump of the regionserver process and also did CPU utilization monitoring. The CPU cycles were being spent in org.apache.hadoop.hdfs.BlockReaderLocal.read, and the stack trace for threads running that function is below this email. Any pointers on why positional reads degrade over time? Or is this just an issue of disk I/O and I should start looking into that?
Thanks, Viral
Stack trace for one of the handlers doing a block read:
IPC Server handler 98 on 60020 - Thread t@147
java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:220)
    at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
    - locked 3215ed96 (a org.apache.hadoop.hdfs.BlockReaderLocal)
    at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
    - locked 3da12c8a (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
    - locked 3da12c8a (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
    - locked 74d81ea7 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Re: GET performance degrades over time
Have you checked your HBase environment? I think it perhaps comes from: 1) The system uses swap more frequently as you continue to execute Get operations? I have set swap to 0. AFAIK, that's a recommended practice. Let me know if that should not be followed for nodes running HBase. 2) Check the setting hfile.block.cache.size in your hbase-site.xml. It's the default, i.e. 0.25.
Re: GET performance degrades over time
This generally happens when the same block is accessed for the HFile. Are you seeing any contention on the HDFS side? When you say contention, what should I be looking for? Slow responses to data block requests? Or some specific metric in ganglia? -Viral
Re: GET performance degrades over time
Going from memory, the swap value setting of 0 is a suggestion. You may still actually swap, but I think it's a 'last resort' type of thing. When you look at top, at the top of the page, how much swap do you see? When I look at top it says: 0K total, 0K used, 0K free (as expected). I can try and add some swap but will do it as a last resort as suggested by you.
Re: GET performance degrades over time
If you're not swapping then don't worry about it. My comment was that even though you set the swap to 0, and I'm going from memory, it's possible for some swap to occur. (But I could be wrong.) Thanks for sharing this info. Will remember it for future debugging too. I checked vm.swappiness as suggested by Jean-Marc and it is definitely not set to 0. But since we are not swapping I doubt that's the issue here. You really don't have a lot of memory, and you have a 5GB heap... MSLABs on? Could you be facing a GC pause? MSLAB is on, or rather I have not modified it, and if I recall correctly it should be ON by default in 0.94.x. I have GC logs on and don't see stop-the-world GC pauses. GC logs are filling up quickly, but I have noticed that on my high-RAM instances too.
Cached an already cached block (HBASE-5285)
Hi, I have been consistently hitting the following error in one of my QA clusters. I came across two JIRAs: the first one (HBASE-3466) was closed as Cannot Reproduce, but a new one was opened as HBASE-5285. I am using HBase 0.94.4 and Hadoop 1.0.4, with 24 region servers (8 cores, 8GB RAM). In HBASE-5285, Ted Yu commented that it could be due to a hash code collision. But if caching is enabled, wouldn't it return the block with which its hash collides when we check the cache for the block's existence? It should not even reach the code path that puts the block into the cache unless there is some concurrency issue. Also, HBASE-5285 states that it occurred during compaction for the reporter, but in my cluster I have disabled compaction, so this error happens not just with compaction. Let me know if you need any more information. I can volunteer to submit a patch if we can find the root cause. Thanks, Viral
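On the hash-collision theory: a collision by itself shouldn't confuse a map-backed cache, because lookups fall back to an equality check after hashing. This toy Python analogue (a plain dict, not HBase's actual LruBlockCache) illustrates the reasoning in the message above:

```python
# Two cache keys forced to share a hash bucket still map to distinct
# entries, because the map compares keys with __eq__ after hashing.
class BlockKey:
    def __init__(self, name):
        self.name = name          # e.g. "hfilename + offset"
    def __hash__(self):
        return 42                 # force every key into the same bucket
    def __eq__(self, other):
        return isinstance(other, BlockKey) and self.name == other.name

cache = {
    BlockKey("hfile-A:0"): "block A",
    BlockKey("hfile-B:0"): "block B",
}
print(len(cache))  # 2 -- colliding hashes, but equality keeps them apart
```

Which is why a pure hashCode collision would surface as the wrong block being returned, not as "Cached an already cached block"; a race between two threads inserting the same key remains the more plausible path.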
Re: Cached an already cached block (HBASE-5285)
On Sun, May 5, 2013 at 10:45 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Just to confirm: you are getting this with LruBlockCache? If with LruBlockCache then the issue is critical, because we have faced a similar issue with OffHeapCache, but that is not yet stable as far as I know. Regards Ram Yes, it's with the LRU cache. My bad, should have copy/pasted the stack trace too. Here you go:
java.io.IOException: java.lang.RuntimeException: Cached an already cached block
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1192)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1181)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2041)
    at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
Caused by: java.lang.RuntimeException: Cached an already cached block
    at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:279)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:353)
    at org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:98)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesGeneralBloomFilter(StoreFile.java:1511)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesBloomFilter(StoreFile.java:1383)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:373)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:257)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:221)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:119)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1963)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3517)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1700)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1692)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1668)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4406)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
Re: HBase and Datawarehouse
On Mon, Apr 29, 2013 at 10:54 PM, Asaf Mesika asaf.mes...@gmail.com wrote: I think for Phoenix to truly succeed, it needs HBase to break the JVM heap barrier of 12GB that I saw mentioned in a couple of posts. Lots of analytics queries utilize memory, and since that memory is shared with HBase, there's only so much you can do in a 12GB heap. On the other hand, if Phoenix were implemented outside HBase on the same machine (like Drill or Impala do), you could have 60GB for that process, running many OLAP queries in parallel against the same data set. Can you shed more info on the 12GB heap barrier? -Viral
Re: max regionserver handler count
Thanks for getting back, Ted. I totally understand other priorities and will wait for some feedback. I am adding some more info to this post to allow better diagnosis of the performance. I hit my region servers with a lot of GET requests (~20K per second per regionserver) using asynchbase in my test environment; the storage pattern is very similar to OpenTSDB though with a lot more columns. Each row is around 45-50 bytes long. The regionservers have a lot of RAM available to them (48 out of 60 GB) and they are not sharing resources with anyone else, so memory is not under pressure. The total # of rows in the table is around 100M and growing (there is a put process too). GETs take over 15s for 16K rows, and I don't see any operationTooSlow logs in the regionserver logs either. PUTs take around 1s for 16K rows (deferred log flush is enabled though). I looked at the RPC stats and it seems the RPC threads were always doing something, so I assumed my requests were waiting on handlers and thought of experimenting by increasing the number of handlers. But as mentioned in my thread, going above 10K kills my regionserver. Thanks, Viral On Mon, Apr 29, 2013 at 9:43 PM, Ted Yu yuzhih...@gmail.com wrote: Viral: I am currently dealing with some high priority bugs, so I didn't have time to look deeper into your case. My feeling is that raising the max regionserver handler count shouldn't be the key to boosting performance. Cheers
Re: max regionserver handler count
I am using asynchbase, which does not have the notion of batch gets. It only allows you to batch at the rowkey level within a single get request. -Viral On Mon, Apr 29, 2013 at 11:29 PM, Anoop John anoop.hb...@gmail.com wrote: Are you making use of batch Gets? get(List<Get>) -Anoop-
Re: max regionserver handler count
Looked closely into the async API and there is no way to batch GETs to reduce the # of RPC calls and thus handlers. Will play around tomorrow with the handlers again and see if I can find anything interesting. On Tue, Apr 30, 2013 at 12:03 AM, Anoop John anoop.hb...@gmail.com wrote: If you can make use of the batch API, i.e. get(List<Get>), you can reduce the handlers (and the # of RPC calls too). One batch will use one handler. I am using asynchbase which does not have the notion of batch gets I have not checked with asynchbase. Just telling as a pointer.. -Anoop-
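For readers on the standard HTable-based client rather than asynchbase, the batch API Anoop mentions looks roughly like the sketch below. This is written against the 0.94-era client API; the `table` handle and the list of rowkeys are placeholders, not anything from this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class BatchGetSketch {
    // One get(List<Get>) call occupies a handler per batched request,
    // instead of one handler per individual Get.
    static Result[] batchGet(HTable table, List<byte[]> rowkeys)
            throws IOException {
        List<Get> gets = new ArrayList<Get>(rowkeys.size());
        for (byte[] row : rowkeys) {
            gets.add(new Get(row));
        }
        // The client groups the Gets by region server internally and
        // issues one RPC per server for the whole group.
        return table.get(gets);
    }
}
```

Whether one batched call maps to exactly one handler depends on how many region servers the rows span; the point is that handler usage scales with batches, not with individual rows.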
Re: max regionserver handler count
On Sun, Apr 28, 2013 at 7:37 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: So you mean that when the handler count is more than 5K this happens, and when it is lower it does not? Have you repeated this behaviour? What I suspect, when you say bouncing around different states, is that the ROOT assignment was a problem and something got messed up there. If the reason was the handler count then that needs different analysis. I think that if you can repeat the experiment and get the same behaviour, you can share the logs so that we can ascertain the exact problem.

Yeah, I have repeated the behavior. But it seems the issue is due to some weird pauses in the region server whenever I bump up the region handler count (logs are below). I doubt the issue is GC, since it should not take such a long time given this is happening on startup with a 48GB heap size, and there are no active clients either. I can safely say this is due to bumping up the region handler count because I started 3 regionservers with 5000 handlers and 3 with 15000 handlers. The ones with 15000 spun up all IPC handlers and then errored out as shown in the logs below. This is just the logs around the error; before the error there were a few more timeouts. I checked the zookeeper servers (I have a 3-node cluster) and they did not GC around the same time, nor were they under any kind of load.
Thanks, Viral

Region Server Logs:

2013-04-29 08:00:55,512 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
2013-04-29 08:02:35,674 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 40592ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 08:02:36,286 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.152.152.84:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 08:02:36,287 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.152.152.84:2181, initiating session
2013-04-29 08:02:36,288 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x703e48a8cfd81be6 has expired, closing socket connection
2013-04-29 08:03:16,287 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hostname,60020,1367221255417: regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6 regionserver:60020-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6-0x703e48a8cfd81be6 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:389)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-04-29 08:03:16,288 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hostname,60020,1367221255417: Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hostname,60020,1367221255417 as dead server
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hostname,60020,1367221255417 as dead server
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:880)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:748)
    at java.lang.Thread.run(Thread.java:662)
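The abort above is a ZK session expiring while the region server is paused. Independent of finding the root cause of the pauses, one knob worth knowing is the session timeout in hbase-site.xml. The value below is illustrative only, and note that ZooKeeper clamps the negotiated timeout into [2x, 20x] of its tickTime, so the server-side tickTime may also need raising for a large request to take effect:

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
```

HBase passes this value when creating its ZK session; whatever the server actually grants is printed as the "negotiated timeout" in the client logs, and that granted value is the one that matters.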
Re: max regionserver handler count
On Mon, Apr 29, 2013 at 2:25 AM, Ted Yu yuzhih...@gmail.com wrote: I noticed the 8 occurrences of 0x703e... following the region server name in the abort message. I wonder why the repetition? Cheers

Oh, good observation. I just stepped through the logs again and saw that the client timeout occurred 8 times before that exception. Maybe that explains the repetitive occurrence, but that is definitely weird? Logs pasted below. -Viral

===logs===
2013-04-29 07:41:03,227 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Serving as regionserver1,60020,1367221255417, RPC listening on regionserver1/10.155.234.3:60020, sessionid=0x703e48a8cfd81be6
2013-04-29 07:41:03,228 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker regionserver1,60020,1367221255417 starting
2013-04-29 07:41:03,230 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Registered RegionServer MXBean
2013-04-29 07:41:05,967 INFO org.apache.hadoop.hbase.util.ChecksumType: Checksum can use java.util.zip.CRC32
2013-04-29 07:43:47,739 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 27868ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 07:43:48,776 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.165.33.180:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 07:43:48,777 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.165.33.180:2181, initiating session
2013-04-29 07:43:48,782 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.165.33.180:2181, sessionid = 0x703e48a8cfd81be6, negotiated timeout = 4
2013-04-29 07:44:28,337 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 39427ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 07:44:28,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.152.152.84:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 07:44:28,736 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.152.152.84:2181, initiating session
2013-04-29 07:44:28,738 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.152.152.84:2181, sessionid = 0x703e48a8cfd81be6, negotiated timeout = 4
2013-04-29 07:46:29,080 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
2013-04-29 07:46:29,174 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 40354ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 07:46:29,682 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.147.128.158:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 07:46:29,682 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.147.128.158:2181, initiating session
2013-04-29 07:46:29,684 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.147.128.158:2181, sessionid = 0x703e48a8cfd81be6, negotiated timeout = 4
2013-04-29 07:49:10,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 39876ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 07:49:11,503 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.165.33.180:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 07:49:11,504 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.165.33.180:2181, initiating session
2013-04-29 07:49:11,506 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.165.33.180:2181, sessionid = 0x703e48a8cfd81be6, negotiated timeout = 4
2013-04-29 07:51:11,462 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Stats: total=98.34 MB, free=11.61 GB, max=11.71 GB, blocks=0, accesses=0, hits=0, hitRatio=0, cachingAccesses=0, cachingHits=0, cachingHitsRatio=0, evictions=0, evicted=0, evictedPerRun=NaN
2013-04-29 07:53:52,158 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 40195ms for sessionid 0x703e48a8cfd81be6, closing socket connection and attempting reconnect
2013-04-29 07:53:52,555 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.152.152.84:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 07:53:52,556 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to
Re: max regionserver handler count
On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote: After each zookeeper reconnect, I saw the same session Id (0x703e...). What version of zookeeper are you using? Can you search the zookeeper log for this session Id to see what happened? Thanks

Zookeeper version is 3.4.5; following are the logs from 2 zookeeper servers. The first one was the first time ever the regionserver connected to ZK, and after that all of them are retries. I doubt the issue is on the ZK side, since I have 3 other services running which run fine with the same quorum and have no issues. I feel I am hitting some other bug in HBase when the number of handlers is increased by a lot. Has anyone seen a high number of handlers in any production deployment out there? Thanks, Viral

===server with first session===
2013-04-29 07:40:55,574 [myid:1072011376] - INFO [CommitProcessor:1072011376:ZooKeeperServer@595] - Established session 0x703e48a8cfd81be6 with negotiated timeout 4 for client /10.155.234.3:53814
EndOfStreamException: Unable to read additional data from client sessionid 0x703e48a8cfd81be6, likely client has closed socket
2013-04-29 07:43:47,737 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.155.234.3:53814 which had sessionid 0x703e48a8cfd81be6
2013-04-29 07:46:29,679 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x703e48a8cfd81be6 at /10.155.234.3:53831
2013-04-29 07:46:29,679 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x703e48a8cfd81be6
2013-04-29 07:46:29,680 [myid:1072011376] - INFO [QuorumPeer[myid=1072011376]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - Established session 0x703e48a8cfd81be6 with negotiated timeout 4 for client /10.155.234.3:53831
EndOfStreamException: Unable to read additional data from client sessionid 0x703e48a8cfd81be6, likely client has closed socket
2013-04-29 07:49:10,401 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.155.234.3:53831 which had sessionid 0x703e48a8cfd81be6
2013-04-29 07:55:53,441 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x703e48a8cfd81be6 at /10.155.234.3:53854
2013-04-29 07:55:53,441 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x703e48a8cfd81be6
2013-04-29 07:55:53,442 [myid:1072011376] - INFO [QuorumPeer[myid=1072011376]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - Established session 0x703e48a8cfd81be6 with negotiated timeout 4 for client /10.155.234.3:53854
EndOfStreamException: Unable to read additional data from client sessionid 0x703e48a8cfd81be6, likely client has closed socket
2013-04-29 07:58:33,947 [myid:1072011376] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.155.234.3:53854 which had sessionid 0x703e48a8cfd81be6

===server with subsequent sessions===
2013-04-29 07:44:28,733 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x703e48a8cfd81be6 at /10.155.234.3:51353
2013-04-29 07:44:28,734 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x703e48a8cfd81be6 with negotiated timeout 4 for client /10.155.234.3:51353
EndOfStreamException: Unable to read additional data from client sessionid 0x703e48a8cfd81be6, likely client has closed socket
2013-04-29 07:46:29,206 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.155.234.3:51353 which had sessionid 0x703e48a8cfd81be6
2013-04-29 07:53:52,553 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x703e48a8cfd81be6 at /10.155.234.3:51376
2013-04-29 07:53:52,553 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x703e48a8cfd81be6 with negotiated timeout 4 for client /10.155.234.3:51376
EndOfStreamException: Unable to read additional data from client sessionid 0x703e48a8cfd81be6, likely client has closed socket
2013-04-29 07:55:53,049 [myid:54242244232] - INFO [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.155.234.3:51376 which had sessionid 0x703e48a8cfd81be6
2013-04-29 08:02:36,000 [myid:54242244232] - INFO [SessionTracker:ZooKeeperServer@325] - Expiring session 0x703e48a8cfd81be6, timeout of 4ms exceeded
2013-04-29 08:02:36,001 [myid:54242244232] - INFO [ProcessThread(sid:54242244232 cport:-1)::PrepRequestProcessor@476] - Processed session
Re: max regionserver handler count
Still unclear how and where the same session id is being re-used. Any thoughts? The ROOT region was just bouncing around states, since the RS would be marked as dead whenever these timeouts occurred and thus the ROOT region would be moved to a different server. Once ROOT got assigned to an RS which had fewer handlers (< 15K), it stopped bouncing around. I am surprised that bumping up handlers with 0 traffic on the cluster can cause this issue. -Viral On Mon, Apr 29, 2013 at 1:23 PM, Viral Bajaria viral.baja...@gmail.com wrote: On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote: After each zookeeper reconnect, I saw the same session Id (0x703e...). What version of zookeeper are you using? Can you search the zookeeper log for this session Id to see what happened? Thanks Zookeeper version is 3.4.5; following are the logs from 2 zookeeper servers. The first one was the first time ever the regionserver connected to ZK, and after that all of them are retries. I doubt the issue is on the ZK side, since I have 3 other services running which run fine with the same quorum and have no issues. I feel I am hitting some other bug in HBase when the number of handlers is increased by a lot. Has anyone seen a high number of handlers in any production deployment out there?
Thanks, Viral [ZooKeeper server logs, quoted verbatim from the previous message, snipped]
max regionserver handler count
Hi, I have been trying to play around with the regionserver handler count. What I noticed was, the cluster comes up fine up to a certain point, ~7500 regionserver handlers. But above that the system refuses to start up; the region servers just keep spinning. The ROOT region keeps bouncing around different states but never stabilizes. So the first question: what's the max that folks on the list have gone to with this setting? If anyone has gone above 10,000, have you done any special settings? Secondly, the setting is per regionserver (as the name suggests) and not per region, right? Following are my versions: HBase 0.94.5, Hadoop 1.0.4, Ubuntu 12.04. Let me know if you need any more information from my side. Thanks, Viral
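For reference, the setting being tuned throughout this thread lives in hbase-site.xml; the value below is just a placeholder, not a recommendation (the whole thread is about what happens at extreme values):

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
```

Each handler is an RPC worker thread on the region server, which is why tens of thousands of them means tens of thousands of threads per process.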
Re: max regionserver handler count
This is an experimental system right now and I will bump up the # of servers in production. But here are the specs: 6 regionservers (64GB RAM, 48GB allocated to HBase heap, some more allocated to the Datanode and other processes), 50-55 regions per server. Workload: 25K gets/second, 25K puts/second (puts are not consistently that high, but I am quoting the highest number). Handler count: 2.5K per regionserver. The data size is pretty compact (think TSDB style) and it should fit in memory (in the test environment). Yet I see long pauses when doing GETs. I feel those pauses happen when all the regionserver handlers are servicing RPC requests, and that makes sense. I could experiment with scaling out the cluster, but before doing that I want to bump the region handler count and see how far I can stretch it. But it seems I can't go beyond 5K right now. Thanks, Viral On Sun, Apr 28, 2013 at 3:19 PM, Ted Yu yuzhih...@gmail.com wrote: bq. the setting is per regionserver (as the name suggests) and not per region right ? That is correct. Can you give us more information about your cluster size, workload, etc ? Thanks On Mon, Apr 29, 2013 at 4:30 AM, Viral Bajaria viral.baja...@gmail.com wrote: [original message quoted in full; snipped]
Re: Coprocessors
Phoenix might be able to solve the problem if the keys are structured in the binary format that it understands; otherwise you are better off reloading that data into a table created via Phoenix. But I will let James tackle this question. Regarding your use-case, why can't you do the aggregation using observers? You should be able to do the aggregation and return a new Scanner to your client. And Lars is right about the range scans that Phoenix does: it restricts things and also does parallel scans for you based on what you select/filter. -Viral On Thu, Apr 25, 2013 at 3:12 PM, Michael Segel michael_se...@hotmail.com wrote: I don't think Phoenix will solve his problem. He also needs to explain more about his problem before we can start to think about it. On Apr 25, 2013, at 4:54 PM, lars hofhansl la...@apache.org wrote: You might want to have a look at Phoenix (https://github.com/forcedotcom/phoenix), which does that and more, and gives a SQL/JDBC interface. -- Lars From: Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN) skada...@bloomberg.net To: user@hbase.apache.org Sent: Thursday, April 25, 2013 2:44 PM Subject: Coprocessors Folks: This is my first post on the HBase user mailing list. I have the following scenario: I have an HBase table of up to a billion keys. I'm looking to support an application where, on some user action, I'd need to fetch multiple columns for up to 250K keys and do some sort of aggregation on them. Fetching all that data and doing the aggregation in my application takes about a minute. I'm looking to co-locate the aggregation logic with the region servers to a. distribute the aggregation and b. avoid having to fetch large amounts of data over the network (this could potentially be cross-datacenter). Neither observers nor aggregation endpoints work for this use case: observers don't return data back to the client, while aggregation endpoints work in the context of scans, not a multi-get (are these correct assumptions?).
I'm looking to write a service that runs alongside the region servers and acts as a proxy between my application and the region servers. I plan to use the logic in the HBase client's HConnectionManager to segment my request of 1M rowkeys into sub-requests per region server. These are sent over to the proxy, which fetches the data from the region server, aggregates locally, and sends the data back. Does this sound reasonable, or even a useful thing to pursue? Regards, -sudarshan
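The segmentation step Sudarshan describes mirrors what the HBase client does internally: region locations are kept sorted by region start key, and each rowkey is routed to the region with the greatest start key at or below it (a floor lookup). A self-contained sketch of that grouping, using plain Java collections and a hypothetical region layout and server names rather than the actual HBase classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionGrouper {
    // Groups rowkeys by the server hosting the region that contains them.
    // A rowkey belongs to the region with the greatest start key <= the
    // rowkey, which TreeMap.floorEntry gives us directly.
    static Map<String, List<String>> groupByServer(
            TreeMap<String, String> regionStartKeyToServer,
            List<String> rowkeys) {
        Map<String, List<String>> byServer = new HashMap<String, List<String>>();
        for (String key : rowkeys) {
            Map.Entry<String, String> region = regionStartKeyToServer.floorEntry(key);
            if (region == null) {
                // Rowkey sorts before every start key: first region hosts it.
                region = regionStartKeyToServer.firstEntry();
            }
            List<String> bucket = byServer.get(region.getValue());
            if (bucket == null) {
                bucket = new ArrayList<String>();
                byServer.put(region.getValue(), bucket);
            }
            bucket.add(key);
        }
        return byServer;
    }

    public static void main(String[] args) {
        // Two hypothetical regions: ["", "m") on rs1 and ["m", +inf) on rs2.
        TreeMap<String, String> regions = new TreeMap<String, String>();
        regions.put("", "rs1");
        regions.put("m", "rs2");
        System.out.println(groupByServer(regions,
                Arrays.asList("apple", "zebra", "mango")));
    }
}
```

The real client caches HRegionLocation entries and refreshes them when a region moves; this sketch only shows the routing arithmetic behind "sub-requests per region server".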
Re: RefGuide schema design examples
+1! On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda marcosluis2...@gmail.com wrote: Wow, great work, Doug. 2013/4/19 Doug Meil doug.m...@explorysmedical.com Hi folks, I reorganized the Schema Design case studies 2 weeks ago and consolidated them into here, plus added several cases common on the dist-list. http://hbase.apache.org/book.html#schema.casestudies Comments/suggestions welcome. Thanks! Doug Meil Chief Software Architect, Explorys doug.m...@explorysmedical.com -- Marcos Ortiz Valmaseda, *Data-Driven Product Manager* at PDVSA *Blog*: http://dataddict.wordpress.com/ *LinkedIn: *http://www.linkedin.com/in/marcosluis2186 *Twitter*: @marcosluis2186 http://twitter.com/marcosluis2186
Re: schema design: rows vs wide columns
I think this whole idea of not going over a certain number of column families is a 2+ year old story. I remember hearing numbers like 5 or 6 (not 3) come up when talking at Hadoop conferences with engineers at companies that were heavy HBase users. I agree with Andrew's suggestion that we should remove that text and replace it with benchmarks. Obviously we need to provide disclaimers that these are benchmarks based on a specific schema design, and so YMMV. I have run a cluster with some tables having upwards of 5 CFs, but the data was evenly spread across them. I don't think I saw any performance issues as such (or maybe they got masked), but 5 CFs was not a problem at all. Stack puts out an interesting stat, i.e. ~15 CFs at FB. Do they run their own HBase version? I feel they do, so they might have some enhancements which are not available to the community, or maybe that limit is no longer the case? Thanks, Viral On Sun, Apr 7, 2013 at 3:52 PM, Andrew Purtell apurt...@apache.org wrote: Is there a pointer to evidence/experiment backed analysis of this question? I'm sure there is some basis for this text in the book, but I recommend we strike it. We could replace it with YCSB or LoadTestTool driven latency graphs for different workloads maybe. Although that would also be a big simplification of 'schema design' considerations, it would not be so starkly lacking background. On Sunday, April 7, 2013, Ted Yu wrote: From http://hbase.apache.org/book.html#number.of.cfs : HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low. Cheers On Sun, Apr 7, 2013 at 3:04 PM, Stack st...@duboce.net wrote: On Sun, Apr 7, 2013 at 11:58 AM, Ted yuzhih...@gmail.com wrote: With regard to number of column families, 3 is the recommended maximum. How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it depend? If the latter, on what does it depend?
Thanks, St.Ack -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Remote Connection To Pseudo Distributed HBase (Deployed in aws ec2) Not Working
Are you sure that your HBase regionserver is registered with the external IP in ZooKeeper? Your client (laptop) might be trying to connect to the EC2 HBase using the internal host name, which will not get resolved. To do a quick test, just modify the /etc/hosts on your laptop, put both the EC2 external and internal names in it, and check if the error goes away. Can you also post the ERROR log? I only see DEBUG in your email.

Thanks,
Viral

On Fri, Apr 5, 2013 at 12:15 AM, Ajit Koti ajitk...@gmail.com wrote:

Hi,

I have deployed HBase in pseudo-distributed mode in AWS EC2, and I have also deployed my app in another EC2 instance which remotely connects to HBase, and it works fine. When I try to connect to HBase from the application deployed on my local machine (laptop), it does not connect. Here is the log:

2:06:53,320 DEBUG HConnectionManager$HConnectionImplementation:804 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@e611faa ; serverName=ip-10-145-184-6.ec2.internal,60020,1365142471429
12:06:55,666 DEBUG ClientCnxn:816 - Reading reply sessionid:0x13dd8d4f5c5, packet:: clientPath:null serverPath:null finished:false header:: 23,3 replyHeader:: 23,905,0 request:: '/hbase,F response:: s{3,3,1364049937977,1364049937977,0,48,0,0,0,12,838}
12:06:56,014 DEBUG ClientCnxn:816 - Reading reply sessionid:0x13dd8d4f5c5, packet:: clientPath:null serverPath:null finished:false header:: 24,4 replyHeader:: 24,905,0 request:: '/hbase/root-region-server,T response:: #0001c313532394069702d31302d3134352d3138342d36ffe718ffadffaeffd1712dffc569702d31302d3134352d3138342d362e6563322e696e7465726e616c2c36303032302c31333635313432343731343239,s{838,838,1365142511189,1365142511189,0,0,0,0,81,0,838}
12:06:56,015 DEBUG ZKUtil:1122 - hconnection-0x13dd8d4f5c5 Retrieved 48 byte(s) of data from znode /hbase/root-region-server and set watcher; ip-10-145-184-6.ec2.internal,...
12:06:56,016 DEBUG HConnectionManager$HConnectionImplementation:804 - Lookedup root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@e611faa ; serverName=ip-10-145-184-6.ec2.internal,60020,1365142471429
12:07:09,352 DEBUG ClientCnxn:727 - Got ping response for sessionid: 0x13dd8d4f5c5 after 336ms

Here is how /etc/hosts looks on the machine where HBase is deployed:

127.0.0.1 localhost
127.0.0.1 54.235.85.109

Tried various combinations mentioned across blogs; didn't get a working combination.

hbase-site.xml configuration:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>

Client code:

configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "54.235.85.109");

Any help is much appreciated; have been struggling for more than 3 days now. Please let me know if any more info is needed.

--
Thanks
Ajit Koti
about.me/ajitkoti
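Concretely, the quick test Viral suggests would map the internal name that appears in the DEBUG log (ip-10-145-184-6.ec2.internal) to the instance's external IP on the client laptop, along the lines of:

```
# /etc/hosts on the client laptop (not the EC2 box): resolve the internal
# EC2 hostname that ZooKeeper hands back to the externally reachable IP
54.235.85.109   ip-10-145-184-6.ec2.internal
```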
Re: HBase Client.
Most of the clients listed below are language-specific, so if your benchmarking scripts are written in Java, you are better off running the Java client. The HBase shell is more for running something interactive; not sure how you plan to benchmark that. REST is something that you could use, but I can't comment on its performance since I have not used it. HappyBase is for Python. Kundera, can't comment since I have not used it. You can look at AsyncHBase, if you don't mind wrapping your head around it, but it's a bigger rewrite since its API is not compatible with the existing client. On Tue, Mar 19, 2013 at 11:25 PM, Pradeep Kumar Mantha pradeep...@gmail.com wrote: Hi, I would like to benchmark HBase using some of our distributed applications using custom-developed benchmarking scripts/programs. I found the following clients are available. Could you please let me know which of the following provides the best performance.

1. Java direct interface to HBase.
2. HBase Shell
3. via REST
4. HappyBase
5. Kundera

Please let me know if there is any other client which provides better performance. thanks pradeep
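Whatever client is picked, the harness matters as much as the client. A minimal, client-agnostic timing sketch in Python (the dict lookup is a stand-in for a real client call, e.g. a HappyBase `table.row(...)`; names here are illustrative):

```python
import time

def benchmark(op, n):
    """Time n invocations of op() and return operations per second."""
    start = time.perf_counter()
    for _ in range(n):
        op()
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")

# Stand-in for a real client get; swap in the client under test here.
store = {b"row1": b"value1"}
rate = benchmark(lambda: store[b"row1"], 10000)
```

Running the same harness against each candidate client keeps the comparison apples-to-apples, since only the `op` callable changes.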
Re: Compaction time
How often do you run those jobs? Do they run periodically or are they running all the time? If you have predictable periodic behavior, you could disable automatic compaction and trigger it manually using a cron job (not the recommended approach, AFAIK). Or you could set the compaction to trigger at a set time of day when you know your jobs are not running. -Viral On Sun, Mar 3, 2013 at 10:44 PM, samar.opensource samar.opensou...@gmail.com wrote: Hi, We are running some high-load jobs which are mostly writes. During these jobs, compaction is triggered, which sometimes takes as long as 40 minutes to complete. This causes blocking (as others wait for compaction in the queue). Please suggest how much compaction time is reasonable for compacting 2 GB store files, and the best way to avoid long blocking compactions. Using Cloudera HBase version 3u3. Regards, Samar
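The disable-and-cron approach could look like the sketch below; verify the property name against the CDH3 docs for your exact version, and 'mytable' is a placeholder:

```xml
<!-- hbase-site.xml: a value of 0 disables time-based major compactions -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

A crontab entry such as `0 3 * * * echo "major_compact 'mytable'" | hbase shell` would then trigger the major compaction at 3 AM daily, outside the write-heavy window.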
Re: Updating from 0.90.2 to 0.94
Well, if you can afford a longer downtime, you can always distcp your existing HBase data. This way, if things get screwed up, you can always restore a 0.90.x on that old backup. You cannot distcp while the cluster is running since it will not be able to get locks on files (I think I faced that issue, but I'm not sure since I did this upgrade mid-2012). When I did an upgrade, I brought up a test 0.94 cluster using the distcp backup (I took 2-3 copies since I was very paranoid). Once I had run all my production jobs against the test instance and was happy with all the results, that's when I re-ran the script to do the migration. It was all scripted out, but I no longer have access to those scripts or else I would have shared them with you. Thanks, Viral On Tue, Feb 26, 2013 at 12:00 AM, Yusup Ashrap aph...@gmail.com wrote: Hi Kiran, thanks for the reply. From what I've read in online docs, downtime is inevitable for upgrading from 0.90.2 to 0.94, and I can afford some downtime. I cannot afford data loss, so I am concerned about potential problems with rolling back to 0.90.2 if I fail to upgrade. -- Best Regards Yusup Ashrap On Tuesday, February 26, 2013 at 3:53 PM, kiran wrote: Hi, We also upgraded the version very recently. If you can afford a couple of minutes of downtime then you can safely bring down the cluster and do the upgrade. As such, there will be no data loss, but be careful with splits. The default split policy has been changed in this version, if I am not wrong. It causes some weird things. Thanks Kiran On Tue, Feb 26, 2013 at 1:03 PM, Yusup Ashrap aph...@gmail.com (mailto: aph...@gmail.com) wrote: hi all, I am updating a production cluster from 0.90.2 to 0.94. My table's size is about 20TB+. The scheduled update includes upgrading both the HBase and Hadoop versions, and I am also changing the user with which I start up both the Hadoop and HBase clusters from user A to user B. It's a production environment, so I wanted to know what steps I should not miss regarding this upgrade.
The table is kinda big and I don't have a backup cluster to back up my data. I wanted to know whether there will be a data loss scenario if I roll back after having failed to upgrade. thanks. -- Best Regards Yusup Ashrap -- Thank you Kiran Sarvabhotla -Even a correct decision is wrong when it is taken late
Re: Announcing Phoenix v 1.1: Support for HBase v 0.94.4 and above
Cool !!! This is really good. I have a quick question though, is it possible to use Phoenix over existing tables ? I doubt it but just thought I will ask it on the list. On Tue, Feb 26, 2013 at 11:17 AM, Stack st...@duboce.net wrote: On Tue, Feb 26, 2013 at 10:02 AM, Graeme Wallace graeme.wall...@farecompare.com wrote: James, Are you or anyone involved with Phoenix going to be at the Strata conf in Santa Clara this week ? You might want to attend the hbase meetup where James will be talking Phoenix at the Intel Campus near Strata: http://www.meetup.com/hbaseusergroup/events/96584102/ St.Ack
Re: Custom HBase Filter : Error in readFields
Also the readFields is your implementation of how to read the byte array transferred from the client. So I think there has to be some issue in how you write the byte array to the network and what you are reading out of that i.e. the size of arrays might not be identical. But as Ted mentioned, looking at the code will help troubleshoot it better. On Wed, Feb 20, 2013 at 3:32 PM, Ted Yu yuzhih...@gmail.com wrote: If you show us the code for RowRangeFilter, that would help us troubleshoot. Cheers On Wed, Feb 20, 2013 at 2:05 PM, Bryan Baugher bjb...@gmail.com wrote: Hi everyone, I am trying to write my own custom Filter but I have been having issues. When there is only 1 region in my table the scan works as expected but when there is more, it attempts to create a new version of my filter and deserialize the information again but the data seems to be gone. I am running HBase 0.92.1-cdh4.1.1. 2013-02-20 15:39:53,220 DEBUG com.cerner.kepler.filters.RowRangeFilter: Reading fields 2013-02-20 15:40:08,612 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 15346ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired 2013-02-20 15:40:09,142 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:174) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readFully(DataInputStream.java:152) at com.cerner.kepler.filters.RowRangeFilter.readFields(RowRangeFilter.java:226) at org.apache.hadoop.hbase.client.Scan.readFields(Scan.java:548) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:652) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at 
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1254) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1183) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:719) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:511) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:486) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-02-20 15:40:17,498 WARN org.apache.hadoop.ipc.HBaseServer: Unable to read call parameters for client *** java.io.IOException: Error in readFields at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:655) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1254) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1183) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:719) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:511) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:486) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at java.io.ByteArrayInputStream.read(ByteArrayInputStream.java:174) at java.io.DataInputStream.readFully(DataInputStream.java:178) at java.io.DataInputStream.readFully(DataInputStream.java:152) at com.cerner.kepler.filters.RowRangeFilter.readFields(RowRangeFilter.java:226) at 
org.apache.hadoop.hbase.client.Scan.readFields(Scan.java:548) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:652) ... 9 more -Bryan
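The write/readFields symmetry Viral describes is the crux: the bytes consumed by readFields must match byte-for-byte what write emitted (in the real filter that would be Writable's write(DataOutput)/readFields(DataInput), e.g. via Bytes.writeByteArray). A toy Python sketch of the length-prefixed pattern, purely to illustrate the mirrored read/write:

```python
import struct
from io import BytesIO

def write_byte_array(out, data):
    # write side: length-prefix the array so the reader knows how much to consume
    out.write(struct.pack(">i", len(data)))  # 4-byte big-endian length
    out.write(data)

def read_byte_array(inp):
    # read side: exact mirror image of the write side
    (length,) = struct.unpack(">i", inp.read(4))
    return inp.read(length)

buf = BytesIO()
write_byte_array(buf, b"\x00\x00\x03\xe8")
buf.seek(0)
restored = read_byte_array(buf)
```

An ArrayIndexOutOfBoundsException inside readFully, as in the stack trace above, is exactly what you see when the two sides disagree about how many bytes were written.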
Re: PreSplit the table with Long format
The HBase shell is a JRuby shell, so you can invoke any Java commands from it. For example: import org.apache.hadoop.hbase.util.Bytes Bytes.toLong(Bytes.toBytes(1000)) Not sure if this works as expected since I don't have a terminal in front of me, but you could try (assuming the SPLITS keyword takes a byte array as input; I have never used SPLITS from the command line): create 'testTable', 'cf1', { SPLITS => [ Bytes.toBytes(1000), Bytes.toBytes(2000), Bytes.toBytes(3000) ] } Thanks, Viral On Tue, Feb 19, 2013 at 1:52 AM, Farrokh Shahriari mohandes.zebeleh...@gmail.com wrote: Hi there, As I use the rowkey in long format, I must pre-split the table in long format too. But when I run this command, it pre-splits the table with STRING format: create 'testTable','cf1',{SPLITS => [ '1000','2000','3000']} How can I pre-split the table with Long format? Farrokh
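For reference, HBase's Bytes.toBytes(long) produces 8 big-endian bytes, which is the layout long-format split keys need. A quick Python sketch of that encoding (struct.pack(">q", ...) mirrors the Java behavior):

```python
import struct

def long_to_bytes(value):
    # same layout as HBase Bytes.toBytes(long): 8 bytes, big-endian
    return struct.pack(">q", value)

splits = [long_to_bytes(v) for v in (1000, 2000, 3000)]
```

Big-endian encoding preserves numeric order for non-negative longs when the keys are compared as raw bytes, which is why these split points land in the intended order.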
Re: availability of 0.94.4 and 0.94.5 in maven repo?
I have come across this too, I think someone with authorization needs to perform a maven release to the apache maven repository and/or maven central. For now, I just end up compiling the dot release from trunk and deploy it to my local repository for other projects to use. Thanks, Viral On Tue, Feb 19, 2013 at 5:30 PM, James Taylor jtay...@salesforce.comwrote: Unless I'm doing something wrong, it looks like the Maven repository ( http://mvnrepository.com/**artifact/org.apache.hbase/**hbasehttp://mvnrepository.com/artifact/org.apache.hbase/hbase) only contains HBase up to 0.94.3. Is there a different repo I should use, or if not, any ETA on when it'll be updated? James
Re: Optimizing Multi Gets in hbase
Hi Varun, Are your gets around sequential keys? If so, you might benefit by doing scans with a start and stop row. If they are not sequential, I don't think there would be a better way, from the way you describe the problem. Besides that, some of the questions that come to mind:

- How many GET(s) are you issuing simultaneously?
- Are they hitting the same region and hotspotting it?
- Are these GET(s) on the same rowkey but trying to get different column families?

Thanks, Viral On Mon, Feb 18, 2013 at 1:57 AM, Varun Sharma va...@pinterest.com wrote: Hi, I am trying to do batched get(s) on a cluster. Here is the code: List<Get> gets = ... // Prepare my gets with the rows I need myHTable.get(gets); I have two questions about the above scenario: i) Is this the most optimal way to do this? ii) I have a feeling that if there are multiple gets in this case on the same region, then each one of those shall instantiate separate scan(s) over the region even though a single scan is sufficient. Am I mistaken here? Thanks Varun
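The sequential-keys suggestion amounts to collapsing the batch into a single scan range. A toy Python sketch of deriving start/stop rows from a batch of keys (the trailing zero byte is the usual trick to make HBase's exclusive stop row include the last key; key names are illustrative):

```python
def to_scan_range(rowkeys):
    """Collapse a batch of rowkeys into one (start, stop) scan range.

    Only worthwhile when the keys are dense/sequential; for sparse keys
    a single wide scan would read far more rows than the individual gets.
    """
    ordered = sorted(rowkeys)
    # HBase stop rows are exclusive; appending \x00 yields the smallest
    # key strictly greater than the last one, so the last key is included
    return ordered[0], ordered[-1] + b"\x00"

start, stop = to_scan_range([b"row-003", b"row-001", b"row-002"])
```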
RE: Using HBase for Deduping
Are all these dupe events expected to be within the same hour or they can happen over multiple hours ? Viral From: Rahul Ravindran Sent: 2/14/2013 11:41 AM To: user@hbase.apache.org Subject: Using HBase for Deduping Hi, We have events which are delivered into our HDFS cluster which may be duplicated. Each event has a UUID and we were hoping to leverage HBase to dedupe them. We run a MapReduce job which would perform a lookup for each UUID on HBase and then emit the event only if the UUID was absent and would also insert into the HBase table(This is simplistic, I am missing out details to make this more resilient to failures). My concern is that doing a Read+Write for every event in MR would be slow (We expect around 1 Billion events every hour). Does anyone use Hbase for a similar use case or is there a different approach to achieving the same end result. Any information, comments would be great. Thanks, ~Rahul.
Re: Using HBase for Deduping
You could do a two-pronged approach here, i.e. some MR and some HBase lookups. I don't think this is the best solution either, given the # of events you will get. FWIW, the solution below again relies on the assumption that if an event is duped in the same hour it won't have a dupe outside of that hour boundary. If it can, then you are better off running an MR job with the current hour + another 3 hours of data, or an MR job with the current hour + the HBase table as input to the job too (i.e. no HBase lookups, just read the HFiles directly)?

- Run an MR job which de-dupes events for the current hour, i.e. only runs on 1 hour's worth of data.
- Mark records which you were not able to de-dupe in the current run.
- For the records that you were not able to de-dupe, check against HBase whether you saw that event in the past. If you did, you can drop the current event or update the event to the new value (based on your business logic).
- Save all the de-duped events (via HBase bulk upload).

Sorry if I just rambled along, but without knowing the whole problem it's very tough to come up with a probable solution. So correct my assumptions and we can drill down more. Thanks, Viral On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com wrote: Most will be in the same hour. Some will be across 3-6 hours.
Re: Using HBase for Deduping
Given the size of the data (1B rows) and the frequency of the job run (once per hour), I don't think your most optimal solution is to look up HBase for every single event. You will benefit more by loading the HBase table directly in your MR job. In 1B rows, what's the cardinality? Is it 100M UUIDs? 99% unique UUIDs? Also, once you have done the de-dupe, are you going to use the data again in some other way, i.e. online serving of traffic or some other analysis? Or is this just to compute some unique #'s? It will be more helpful if you describe your final use case of the computed data too. Given the amount of back and forth, we can take it off-list too and summarize the conversation for the list. On Thu, Feb 14, 2013 at 1:07 PM, Rahul Ravindran rahu...@yahoo.com wrote: We can't rely on the assumption that event dupes will not dupe outside an hour boundary. So, your take is that doing a lookup per event within the MR job is going to be bad?
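The hour-bucketed scheme sketched earlier in this thread can be shown in miniature; here a Python set stands in for the HBase lookup and bulk-load steps, and the names are illustrative:

```python
def dedupe(events, seen_store):
    """events: iterable of (uuid, payload) pairs for the current hour.
    seen_store: set standing in for the historical HBase table."""
    emitted = []
    seen_this_hour = set()
    for uuid, payload in events:
        # first dedupe within the hour in memory, then fall back to the store
        if uuid in seen_this_hour or uuid in seen_store:
            continue  # duplicate: drop (or update, per business logic)
        seen_this_hour.add(uuid)
        emitted.append((uuid, payload))
    # bulk-load step in the real pipeline: record this hour's UUIDs
    seen_store.update(seen_this_hour)
    return emitted

store = {"u1"}  # "u1" was already seen in a past hour
out = dedupe([("u1", "a"), ("u2", "b"), ("u2", "c"), ("u3", "d")], store)
```

The point of the in-memory pass is that only UUIDs not resolved within the hour ever cost a store lookup, which is what keeps the per-event Read+Write off the critical path.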
question about pre-splitting regions
Hi, I am creating a new table, want to pre-split the regions, and am seeing some weird behavior. My table is designed as a composite of multiple fixed-length byte arrays separated by a control character (for simplicity's sake we can say the separator is _underscore_). The prefix of this rowkey is deterministic (i.e. a length of 8 bytes) and I know beforehand how many different prefixes I will see in the near future. The values after the prefix are not deterministic. I wanted to create a pre-split table based on the number of prefix combinations that I know. I ended up doing something like this: hbaseAdmin.createTable(tableName, Bytes.toBytes(1L), Bytes.toBytes(maxCombinationPrefixValue), maxCombinationPrefixValue) The create table worked fine and as expected it created the number of partitions. But when I write data to the table, I still see all the writes hitting a single region instead of hitting different regions based on the prefix. Is my thinking of splitting by prefix values flawed? Do I have to split by some real rowkeys (though it's impossible for me to know what rowkeys will show up, except the row prefix, which is much more deterministic)? For some reason I think I have a flawed understanding of the createTable API and that is causing the issue for me. Should I use the byte[][] prefixes method and not the one that I am using right now? Any suggestions/pointers? Thanks, Viral
Re: question about pre-splitting regions
I was able to figure it out. I had to use the createTable API which takes splitKeys instead of the startKey, endKey and numPartitions. If anyone comes across this issue and needs more feedback, feel free to ping me. Thanks, Viral
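A sketch of what the splitKeys variant implies for this schema: one 8-byte big-endian key per known prefix, so that composite rowkeys sort into the region owning their prefix. Python stands in for the Java byte[][] here, and bisect mimics the region lookup; values are illustrative:

```python
import struct
from bisect import bisect_right

def make_split_keys(prefix_values):
    # one 8-byte big-endian split key per known prefix (sorted, as HBase requires)
    return [struct.pack(">q", v) for v in sorted(prefix_values)]

splits = make_split_keys([1, 2, 3])

# a composite rowkey: 8-byte prefix, separator, arbitrary suffix
rowkey = struct.pack(">q", 2) + b"_" + b"suffix"

# region index = number of split keys <= rowkey (region 0 precedes the first split)
region = bisect_right(splits, rowkey)
```

Because each split key is exactly the encoded prefix, every rowkey with prefix 2 lands in the region that starts at that split, which is the per-prefix spread the original createTable(start, end, numRegions) call failed to produce.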
Re: Announcing Phoenix: A SQL layer over HBase
Congrats guys !!! This is something that was sorely missing in what I am trying to build... will definitely try it out... just out of curiosity, what kind of projects/tools at SalesForce uses this library ? On Wed, Jan 30, 2013 at 5:55 PM, Huanyou Chang mapba...@mapbased.comwrote: Great tool,I will try it later. thanks for sharing! 2013/1/31 Devaraj Das d...@hortonworks.com Congratulations, James. We will surely benefit from this tool. On Wed, Jan 30, 2013 at 1:04 PM, James Taylor jtay...@salesforce.com wrote: We are pleased to announce the immediate availability of a new open source project, Phoenix, a SQL layer over HBase that powers the HBase use cases at Salesforce.com. We put the SQL back in the NoSQL: * Available on GitHub at https://github.com/forcedotcom/phoenix * Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs. * Built for low latency queries through parallelization, the use of native HBase APIs, coprocessors, and custom filters. * Allows columns to be modelled as a multi-part row key or key/value cells. * Full query support with predicate push down and optimal scan key formation. * DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns. * Versioned schema repository. Snapshot queries use the schema that was in place when data was written. * DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows. * Limited transaction support through client-side batching. * Single table only - no joins yet and secondary indexes are a work in progress. 
* Follows ANSI SQL standards whenever possible * Requires HBase v 0.94.2 or above * BSD-like license * 100% Java Join our user groups: Phoenix HBase User: https://groups.google.com/forum/#!forum/phoenix-hbase-user Phoenix HBase Dev: https://groups.google.com/forum/#!forum/phoenix-hbase-dev and check out our roadmap: https://github.com/forcedotcom/phoenix/wiki#wiki-roadmap We welcome feedback and contributions from the community to Phoenix and look forward to working together. Regards, James Taylor @JamesPlusPlus
Re: Indexing Hbase Data
When you say indexing, are you referring to indexing the column qualifiers or the values that you are storing in the qualifier ? Regarding indexing, I remember someone had recommended this on the mailing list before: https://github.com/ykulbak/ihbase/wiki but it seems the development on that is not active anymore. -Viral On Mon, Jan 28, 2013 at 3:45 AM, Mohammad Tariq donta...@gmail.com wrote: Hello list, I would like to have some suggestions on Hbase data indexing. What would you prefer? I never faced such requirement till now. This is the first time when there is a need of indexing, so thought of getting some expert comments and suggestions. Thank you so much for your precious time. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com
Re: hbase 0.94.4 with hadoop 0.23.5
Thanks Vandana for reply. I tried that but no luck. It still throws the same error. I thought there might have been a typo and you meant -D and not -P but none of them worked. I verified that the hadoop-auth code base does not have KerberosUtil class anymore. So I am guessing there is some, but I am surprised no one has raised this point till now because on the list a few of them say that they are using hbase 0.94 with hadoop 0.23 Maybe I am doing something totally wrong and downloading via the downloads link and compiling is not the right thing to do and I should just get it from the source repository ? -Viral On Mon, Jan 28, 2013 at 7:43 AM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote: Hi viral, Try adding -Psecurity and then compiling. Thanks Vandana Sent from my iPhone On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com wrote: Hi, Is anyone running hbase 0.94.4 against hadoop 0.23.5 ? If yes, how did you end up compiling hbase for hadoop 0.23 ? I downloaded the hbase release and tried running mvn clean package -Dhadoop.profile=23 but I keep on getting a compilation error as follows: [INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ hbase --- [INFO] Compiling 738 source files to /home/viral/dev/downloads/hbase-0.94.4/target/classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/viral/dev/downloads/hbase-0.94.4/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:[41,53] cannot find symbol symbol : class KerberosUtil location: package org.apache.hadoop.security.authentication.util [ERROR] /home/viral/dev/downloads/hbase-0.94.4/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[45,15] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release [ERROR] /home/viral/dev/downloads/hbase-0.94.4/src/main/java/org/apache/hadoop/hbase/util/Bytes.java:[1029,19] sun.misc.Unsafe is Sun proprietary API and may be removed in a future release [ERROR] 
/home/viral/dev/downloads/hbase-0.94.4/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:[242,32] cannot find symbol symbol : variable KerberosUtil location: class org.apache.hadoop.hbase.zookeeper.ZKUtil.JaasConfiguration Tried with both the security release and the one without but no luck. Any pointers ? Thanks, Viral
Re: hbase 0.94.4 with hadoop 0.23.5
Tried all of it, I think I will have to defer this to the hadoop mailing list because it seems there is a missing class in hadoop 0.23 branches, not sure if that is intentional. The class exists in trunk and hadoop 2.0 branches. Though the surprising part is that it does not exist in 0.23. Does the hbase code base exclude certain files based on what hadoop profile we are targeting ? I will try to apply a patch of that class to 0.23.5 and see if it works and if it does then I will post it to the list and submit a patch to both hadoop and hbase (needs pom change). -Viral On Mon, Jan 28, 2013 at 5:34 PM, Stack st...@duboce.net wrote: The below seems like a good suggestion by Vandana. I will say that focus is on support for hadoop 1 and 2. There has not been much call for us to support 0.23.x If you can figure what needs fixing, we could try adding the fix to 0.94 (In trunk a patch to add a compatibility module for hadoop-0.23.x would be welcomed). St.Ack On Mon, Jan 28, 2013 at 5:09 PM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote: May be thats the issue. Try downloading the source from 0.94 branch and use the maven command with -Psecurity and -Dhadoop.profile=23. That should work. Thanks Vandana
Re: hbase 0.94.4 with hadoop 0.23.5
Just closing the loop here, it might help someone else to hand patch their build process before I get the patches in the hadoop branch, no changes required for hbase. I backported the latest version of KerberosUtil from hadoop 2.0 branch and recompiled hadoop-common/hadoop-auth and then installed the jar out to my local maven repository. Ran the command mvn clean package -Dhadoop.profile=23 and I was able to build hbase against hadoop 0.23. Now starts the more painful part of making sure everything works during runtime :-) Stack, I noticed that in all profiles except 0.23 there is hadoop-core or hadoop-common includes, while in 0.23 there is only hadoop-client as a dependency and there is no mention for hadoop-common or hadoop-auth anywhere, do they get pulled in due to other dependencies ? Just trying to understand the whole build process for hbase. Thanks, Viral On Mon, Jan 28, 2013 at 5:58 PM, Viral Bajaria viral.baja...@gmail.comwrote: Tried all of it, I think I will have to defer this to the hadoop mailing list because it seems there is a missing class in hadoop 0.23 branches, not sure if that is intentional. The class exists in trunk and hadoop 2.0 branches. Though the surprising part is that it does not exist in 0.23. Does the hbase code base exclude certain files based on what hadoop profile we are targeting ? I will try to apply a patch of that class to 0.23.5 and see if it works and if it does then I will post it to the list and submit a patch to both hadoop and hbase (needs pom change). -Viral On Mon, Jan 28, 2013 at 5:34 PM, Stack st...@duboce.net wrote: The below seems like a good suggestion by Vandana. I will say that focus is on support for hadoop 1 and 2. There has not been much call for us to support 0.23.x If you can figure what needs fixing, we could try adding the fix to 0.94 (In trunk a patch to add a compatibility module for hadoop-0.23.x would be welcomed). 
St.Ack

On Mon, Jan 28, 2013 at 5:09 PM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote:

May be that's the issue. Try downloading the source from the 0.94 branch and use the maven command with -Psecurity and -Dhadoop.profile=23. That should work.

Thanks,
Vandana

On Jan 28, 2013, at 11:48 AM, Viral Bajaria wrote:

Thanks Vandana for the reply. I tried that but no luck; it still throws the same error. I thought there might have been a typo and you meant -D and not -P, but neither of them worked. I verified that the hadoop-auth code base does not have the KerberosUtil class anymore. So I am guessing there is some issue there, but I am surprised no one has raised this point till now, because a few people on the list say that they are using hbase 0.94 with hadoop 0.23. Maybe I am doing something totally wrong, and downloading via the downloads link and compiling is not the right thing to do, and I should just get it from the source repository?

-Viral

On Mon, Jan 28, 2013 at 7:43 AM, Vandana Ayyalasomayajula avand...@yahoo-inc.com wrote:

Hi viral, Try adding -Psecurity and then compiling. Thanks Vandana

Sent from my iPhone

On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com wrote:

Hi, Is anyone running hbase 0.94.4 against hadoop 0.23.5 ? If yes, how did you end up compiling hbase for hadoop 0.23 ?
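For reference, the backport workflow Viral describes boils down to a command sequence roughly like the one below. This is a sketch only: the source-tree paths and Maven module names are illustrative assumptions, not details given in the thread.

```
# 1. Copy KerberosUtil.java from the hadoop 2.0 branch into the 0.23.5
#    source tree, keeping the same package
#    (org.apache.hadoop.security.authentication.util).
# 2. Rebuild and install the patched hadoop-auth/hadoop-common artifacts
#    into the local Maven repository:
cd hadoop-0.23.5-src
mvn install -DskipTests -pl hadoop-common-project/hadoop-auth,hadoop-common-project/hadoop-common
# 3. Build HBase against the patched 0.23 artifacts:
cd ../hbase-0.94.4
mvn clean package -Dhadoop.profile=23
```

For the security build, Vandana's suggestion adds the profile flag to step 3: mvn clean package -Psecurity -Dhadoop.profile=23.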
RE: Hbase Takes Long time to Restart - Whats the correct way to restart Hbase cluster?
Is that one unassigned task even getting assigned, or did it error out and won't even run on retries? I have seen this happen before when our region server crashed. On restart, there are a few splitlog tasks that never finish and error out. If you check the regionserver where the task should run, you will see that it is having trouble acquiring a lease on that file. Can you verify that?

Viral

From: Ameya Kantikar
Sent: 1/14/2013 4:34 PM
To: user@hbase.apache.org
Subject: Hbase Takes Long time to Restart - Whats the correct way to restart Hbase cluster?

I restarted the Hbase master, and it is taking a long time (at least 30 minutes) to come back. In the master-status page I am seeing over 400 regions in transition. In the hbase master log I am seeing the following:

DEBUG org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 1 unassigned = 1

(This is down from over 50 to 1 as in the log.)

My setup is as follows: I am running cdh4, hbase 0.92.1. My question is: what is the right way to restart the hbase-master machine? All the time the master is restarting, my cluster is unavailable. From the hbase shell, if I hit any table it gives an error such as:

INFO ipc.HBaseRPC: Server at hbase1/XX.XX.XX.XX:60020 could not be reached after 1 tries, giving up

Any ideas?

Ameya
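One way to follow up on Viral's question is to inspect the split log task znodes in ZooKeeper directly. A rough sketch (the znode path /hbase/splitlog is the default parent for split log tasks in this HBase generation; adjust it if your zookeeper.znode.parent differs, and <task-znode> below is a placeholder for whatever child name the listing shows):

```
# Connect to the ensemble and list outstanding split log tasks;
# each child znode names a WAL file that still needs to be replayed:
zkCli.sh -server localhost:2181
ls /hbase/splitlog
# The task data records its state and which regionserver owns it;
# check that server's log for lease-recovery errors on the named file:
get /hbase/splitlog/<task-znode>
```

If the same task keeps erroring out on retries, the owning regionserver's log will usually show it failing to acquire the HDFS lease on that WAL file, which is the symptom Viral describes.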
Re: DataXceiver java.io.EOFException
Hi,

Is your dfs.datanode.handler.count set to the default value of 3? I think I bumped it up when I got these exceptions, and the issue wasn't due to xcievers. I would recommend increasing it to 6 and seeing if the error goes away or the frequency of the error decreases.

Thanks,
Viral

On Wed, Nov 28, 2012 at 10:32 PM, Arati Patro arati.pa...@gmail.com wrote:

Hi,

I'm using hbase version 0.94.1 and hadoop version 1.0.3. I'm running HBase + HDFS on a 4 node cluster (48 GB RAM, 12 TB disk space on each node): 1 HMaster + NameNode and 3 HRegionServer + DataNode. Replication is set to 2, and I am running 6 MapReduce jobs (two of which run concurrently).

When MapReduce jobs are triggered, the datanode log shows exceptions like this:

2012-11-26 17:37:38,672 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-4043001352486758862_3090 received exception java.io.EOFException
2012-11-26 17:37:38,673 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.63.63.249:50010, storageID=DS-778870342-10.63.63.249-50010-1353922061110, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:351)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
    at java.lang.Thread.run(Thread.java:619)
2012-11-26 17:37:38,675 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_5001084339060873354_3090 src: /10.63.63.249:37109 dest: /10.63.63.249:50010

The xciever value is set as below in hdfs-site.xml:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>16384</value>
</property>

Could anyone shed some more light on why this is happening?

Thanks,
Arati Patro
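Viral's suggestion amounts to one more hdfs-site.xml entry alongside the existing xcievers setting. A minimal sketch of the combined config (the value 6 is just the starting point suggested above, not a general recommendation; tune for your hardware, and restart the DataNodes for it to take effect):

```xml
<!-- hdfs-site.xml: raise the DataNode request-handler thread count
     from the default of 3, per the suggestion above -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>6</value>
</property>
<!-- existing setting from the original report, kept as-is -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>16384</value>
</property>
```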