Re: hbase can't start: KeeperErrorCode = NoNode for /hbase
The problem is resolved. It was caused by corrupted ZooKeeper data, so I pointed the ZooKeeper data dir at another directory in hbase-site.xml and restarted HBase:

  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/zhouhh/myhadoop/zk</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.</description>
  </property>

Thanks to everyone.

andy

2012/8/2 abloz...@gmail.com

> Thank you, Keywal and Mohammad.
> I also think the data is corrupted, but the ZooKeeper is the one embedded in
> HBase, and I don't know how to change its data directory. I'll try this way.
> So if the java processes are killed abruptly, the data may get corrupted. But
> sometimes the stop shell script will not work.
>
> Here is my hbase-site.xml:
>
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://Hadoop48:54310/hbase1</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.master.port</name>
>     <value>6</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>Hadoop48</value>
>   </property>
>   <property>
>     <name>zookeeper.znode.parent</name>
>     <value>/hbase1</value>
>   </property>
> </configuration>
>
> Thanks!
>
> Andy zhou
>
> 2012/8/2 N Keywal
>
>> Hi,
>>
>> The issue is in ZooKeeper, not directly in HBase. It seems its data is
>> corrupted, so it cannot start. You can configure ZooKeeper to use another
>> data directory to make it start.
>>
>> N.
>>
>> On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com wrote:
>> > I even moved /hbase to hbase2, created a new dir /hbase1, and modified
>> > hbase-site.xml to:
>> >
>> >   <property>
>> >     <name>hbase.rootdir</name>
>> >     <value>hdfs://Hadoop48:54310/hbase1</value>
>> >   </property>
>> >   <property>
>> >     <name>zookeeper.znode.parent</name>
>> >     <value>/hbase1</value>
>> >   </property>
>> >
>> > But the error message is still KeeperErrorCode = NoNode for /hbase.
>> >
>> > Can anybody give any help?
>> > Thanks!
>> >
>> > Andy zhou
>> >
>> > 2012/8/2 abloz...@gmail.com
>> >
>> >> hi all,
>> >> After I killed all the java processes, I can't restart HBase; it reports:
>> >>
>> >> Hadoop46: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop46.out
>> >> Hadoop47: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop47.out
>> >> Hadoop48: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop48.out
>> >> Hadoop46: java.lang.RuntimeException: Unable to run quorum server
>> >> Hadoop46:   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >> Hadoop46:   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >> Hadoop46:   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >> Hadoop46:   at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> >> Hadoop46:   at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> >> Hadoop46: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
>> >> Hadoop46:   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >> Hadoop46:   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >> Hadoop46:   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >> Hadoop47: java.lang.RuntimeException: Unable to run quorum server
>> >> Hadoop47:   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >> Hadoop47:   at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >> Hadoop47:   at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >> Hadoop47:   at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> >> Hadoop47:   at org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> >> Hadoop47: Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /hbase
>> >> Hadoop47:   at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >> Hadoop47:   at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >> Hadoop47:   at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >>
>> >> while Hadoop48 is the HMaster,
>> >> and hdfs://xxx/hbase does exist:
>> >>
>> >> [zhouhh@Hadoop47 ~]$ hadoop fs -ls /hbase
>> >> Found 113 items
>> >> drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/-ROOT-
>> >> drwxr-xr-x - zhouhh supergroup 0 2012-07-03 19:24 /hbase/.META.
>> >> ...
>> >>
>> >> So what's the problem?
>> >> Thanks!
>> >>
>> >> andy
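The recovery Andy settles on (point hbase.zookeeper.property.dataDir at a fresh directory, or equivalently move the corrupted one aside) can be sketched as the following commands. This is a hedged sketch, not an exact transcript of his steps: /home/zhouhh/myhadoop/zk is the path from his mail, and ZK_DATA defaults to a scratch location here so the commands are safe to dry-run.

```shell
# Sketch of the recovery described above: after stopping HBase, replace the
# ZooKeeper data dir (hbase.zookeeper.property.dataDir) with an empty one.
# The default below is a scratch location, NOT the real dir from the thread.
ZK_DATA="${ZK_DATA:-/tmp/zk-recovery-demo/zk}"
mkdir -p "$ZK_DATA"                     # stands in for the existing, corrupted dir
mv "$ZK_DATA" "${ZK_DATA}.corrupt.bak"  # keep the old snapshots/txn logs for inspection
mkdir -p "$ZK_DATA"                     # fresh, empty data dir for ZooKeeper
# ./bin/stop-hbase.sh && ./bin/start-hbase.sh   # then restart HBase
ls -d "$ZK_DATA" "${ZK_DATA}.corrupt.bak"
```

Keeping the old directory around (rather than deleting it) lets you inspect the snapshot and transaction log files later if you want to understand what got corrupted.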
add_table.rb in -0.92.x
I just checked out hbase-0.92.1 and noticed that /bin/add_table.rb has been deleted (CHANGES.txt: "HBASE-2460 add_table.rb deletes any tables for which the target table name is a prefix"). I wonder if there's a replacement or a fixed version of it somewhere?

Thanks,
Holger
--
View this message in context: http://old.nabble.com/add_table.rb-in--0.92.x-tp34250060p34250060.html
Sent from the HBase User mailing list archive at Nabble.com.
Re: How to query by rowKey-infix
Hi Alex,

thanks a lot for the hint about setting the timestamp of the put. I didn't know that this was possible, but it solves the problem (the first test was successful). So I'm really glad that I don't need to apply a filter to extract the time and so on for every row.

Nevertheless I would like to see your custom filter implementation. It would be nice if you could provide it to help me get a bit into it.

And yes, that helped :)

regards
Chris

From: Alex Baranau
To: user@hbase.apache.org; Christian Schäfer
Sent: 0:57 Friday, 3 August 2012
Subject: Re: How to query by rowKey-infix

Hi Christian!

If we put off secondary indexes and assume you are going with "heavy scans", you can try the two following things to make it much faster. If this is appropriate to your situation, of course.

1.

> Is there a more elegant way to collect rows within time range X?
> (Unfortunately, the date attribute is not equal to the timestamp that is
> stored by hbase automatically.)

Can you set the timestamp of the Puts to the one you have in the row key, instead of relying on the one that HBase sets automatically (the current ts)? If you can, this will improve reading speed a lot by setting a time range on the scanner. Depending on how you are writing your data of course, but I assume that you mostly write data in a "time-increasing" manner.

2.

If your userId has fixed length, or you can change it so that it has fixed length, then you can actually use something like a "wildcard" in the row key. There's a way in a Filter implementation to fast-forward to the record with a specific row key and by doing this skip many records. This might be used as follows:
* suppose your userId is 5 characters in length
* suppose you are scanning for records with time between 2012-08-01 and 2012-08-08
* when you are scanning records and you face e.g. key "a_2012-08-09_3jh345j345kjh", where "a" is the user id, you can tell the scanner from your filter to fast-forward to key "b_2012-08-01", because you know that all remaining records of user "a" don't fall into the interval you need (as the time for its records will be >= 2012-08-09).

As of now, I believe you will have to implement your custom filter to do that.
Pointer: org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT

I believe I implemented a similar thing some time ago. If this idea works for you I could look for the implementation and share it if it helps. Or maybe even simply add it to the HBase codebase.

Hope this helps,

Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer wrote:
>
> Excuse my double posting.
> Here is the complete mail:
>
> OK,
>
> at first I will try the scans.
>
> If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2) to
> be able to use coprocessors.
>
> Currently I'm stuck at the scans because it requires two steps (therefore
> maybe some kind of filter chaining is required).
>
> The key: userId-dateInMillis-sessionId
>
> At first I need to extract dateInMillis with regex or substring (using
> special delimiters for date).
>
> Second, the extracted value must be parsed to Long and set to a RowFilter
> comparator like this:
>
> scan.setFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new
>     BinaryComparator(Bytes.toBytes((Long) dateInMillis))));
>
> How to chain that?
> Do I have to write a custom filter?
> (I would like to avoid that due to deployment.)
>
> regards
> Chris
>
> ----- Original Message -----
> From: Michael Segel
> To: user@hbase.apache.org
> CC:
> Sent: 13:52 Wednesday, 1 August 2012
> Subject: Re: How to query by rowKey-infix
>
> Actually w coprocessors you can create a secondary index in short order.
> Then your cost is going to be 2 fetches. Trying to do a partial table scan
> will be more expensive.
>
> On Jul 31, 2012, at 12:41 PM, Matt Corgan wrote:
>
>> When deciding between a table scan vs a secondary index, you should try to
>> estimate what percent of the underlying data blocks will be used in the
>> query. By default, each block is 64KB.
>>
>> If each user's data is small and you are fitting multiple users per block,
>> then you're going to need all the blocks, so a tablescan is better because
>> it's simpler. If each user has 1MB+ data then you will want to pick out
>> the individual blocks relevant to each date. The secondary index will help
>> you go directly to those sparse blocks, but with a cost in complexity,
>> consistency, and extra denormalized data that knocks primary data out of
>> your block cache.
>>
>> If latency is not a concern, I would start with the table scan. If that's
>> too slow you add the secondary index, and if you still need it faster you
>> do the primary key lookups in parallel as Jerry mentions.
>>
>> Matt
>>
>> On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam wrote:
>>
>>> Hi Chris:
>>>
>>> I'm thinking about building a secondary index for prim
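Alex's fast-forward suggestion in the thread above can be sketched as plain key arithmetic. This sketch deliberately avoids the HBase classes so it runs standalone; in a real 0.94-era filter the same logic would live in a FilterBase subclass, with filterKeyValue() returning ReturnCode.SEEK_NEXT_USING_HINT and getNextKeyHint() handing back the computed key. The key layout (userId "_" yyyy-MM-dd "_" sessionId) is the one from the example; the class and method names here are made up for illustration.

```java
// Hedged sketch of the SEEK_NEXT_USING_HINT idea, as pure key arithmetic.
public class SeekHint {

    /**
     * Returns the row key the scanner should fast-forward to, or null if
     * the current row may still fall inside [startDate, endDate].
     */
    static String nextHint(String rowKey, String startDate, String endDate) {
        int sep = rowKey.indexOf('_');
        String userId = rowKey.substring(0, sep);
        String date = rowKey.substring(sep + 1, sep + 11); // yyyy-MM-dd
        if (date.compareTo(endDate) > 0) {
            // Every remaining row of this user is past the range, so jump
            // straight to the next user at the start of the range.
            return nextUserId(userId) + "_" + startDate;
        }
        return null; // row may still match; let the scanner continue
    }

    /** Lexicographically next userId of the same width (no overflow handling). */
    static String nextUserId(String userId) {
        char[] c = userId.toCharArray();
        c[c.length - 1]++;
        return new String(c);
    }

    public static void main(String[] args) {
        // User "a" has already scanned past 2012-08-08, so skip ahead to "b".
        System.out.println(nextHint("a_2012-08-09_3jh345j345kjh",
                "2012-08-01", "2012-08-08")); // prints "b_2012-08-01"
    }
}
```

Because the hint skips every remaining row of a user once its dates run past the range, the scan cost grows with the number of users rather than the number of rows.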
Re: How to query by rowKey-infix
Hi Matt,

sure, I keep this in mind as a last option (at least on a limited subset of the data). Due to our estimate of some billions of rows a week, selective filtering needs to take place on the server side. But I agree that one could do fine filtering on the client side on a handy data subset, to avoid getting the hbase schema & indexing (by coprocessors) too complicated.

regards
Chris

----- Original Message -----
From: Matt Corgan
To: user@hbase.apache.org
CC:
Sent: 3:29 Friday, 3 August 2012
Subject: Re: How to query by rowKey-infix

Yeah - just thought i'd point it out since people often have small tables in their cluster alongside the big ones, and when generating reports, sometimes you don't care if it finishes in 10 minutes vs an hour.

On Thu, Aug 2, 2012 at 6:15 PM, Alex Baranau wrote:
> I think this is exactly what Christian is trying to (and should be trying
> to) avoid ;).
>
> I can't imagine a use-case when you need to filter something and you can do
> it with (at least) a server-side filter, and yet in this situation you want
> to try to do it on the client side... Doing filtering on the client side when
> you can do it on the server side just feels wrong. Esp. given that there's a
> lot of data in HBase (otherwise why would you use it).
>
> Alex Baranau
> --
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
> On Thu, Aug 2, 2012 at 7:09 PM, Matt Corgan wrote:
>
> > Also Christian, don't forget you can read all the rows back to the client
> > and do the filtering there using whatever logic you like. HBase Filters
> > can be thought of as an optimization (predicate push-down) over
> > client-side filtering. Pulling all the rows over the network will be
> > slower, but I don't think we know enough about your data or speed
> > requirements to rule it out.
> >
> > On Thu, Aug 2, 2012 at 3:57 PM, Alex Baranau wrote:
> >
> > > Hi Christian!
> > > > > > If to put off secondary indexes and assume you are going with "heavy > > > scans", you can try two following things to make it much faster. If > this > > is > > > appropriate to your situation, of course. > > > > > > 1. > > > > > > > Is there a more elegant way to collect rows within time range X? > > > > (Unfortunately, the date attribute is not equal to the timestamp that > > is > > > stored by hbase automatically.) > > > > > > Can you set timestamp of the Puts to the one you have in row key? > Instead > > > of relying on the one that HBase puts automatically (current ts). If > you > > > can, this will improve reading speed a lot by setting time range on > > > scanner. Depending on how you are writing your data of course, but I > > assume > > > that you mostly write data in "time-increasing" manner. > > > > > > 2. > > > > > > If your userId has fixed length, or you can change it so that it has > > fixed > > > length, then you can actually use smth like "wildcard" in row key. > > There's > > > a way in Filter implementation to fast-forward to the record with > > specific > > > row key and by doing this skip many records. This might be used as > > follows: > > > * suppose your userId is 5 characters in length > > > * suppose you are scanning for records with time between 2012-08-01 > > > and 2012-08-08 > > > * when you scanning records and you face e.g. key > > > "a_2012-08-09_3jh345j345kjh", where "a" is user id, you can > tell > > > the scanner from your filter to fast-forward to key "b_ > 2012-08-01". > > > Because you know that all remained records of user "a" don't fall > > into > > > the interval you need (as the time for its records will be >= > > 2012-08-09). > > > > > > As of now, I believe you will have to implement your custom filter to > do > > > that. > > > Pointer: > > > org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT > > > I believe I implemented similar thing some time ago. 
If this idea works > > for > > > you I could look for the implementation and share it if it helps. Or > may > > be > > > even simply add it to HBase codebase. > > > > > > Hope this helps, > > > > > > Alex Baranau > > > -- > > > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - > ElasticSearch > > - > > > Solr > > > > > > > > > On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer < > syrious3...@yahoo.de > > > >wrote: > > > > > > > > > > > > > > > Excuse my double posting. > > > > Here is the complete mail: > > > > > > > > > > > > OK, > > > > > > > > at first I will try the scans. > > > > > > > > If that's too slow I will have to upgrade hbase (currently > > 0.90.4-cdh3u2) > > > > to be able to use coprocessors. > > > > > > > > > > > > Currently I'm stuck at the scans because it requires two steps > > (therefore > > > > maybe some kind of filter chaining is required) > > > > > > > > > > > > The key: userId-dateInMillis-sessionId > > > > > > > > At first I need to extract dateInMllis with regex or substring (using > > > > special delimiters for date) >
Re: How to query by rowKey-infix
Hi,

What does your schema look like? Would it make sense to change the key to user_id '|' timestamp and then use the session_id in the column name?

On Aug 2, 2012, at 7:23 AM, Christian Schäfer wrote:

> OK,
>
> at first I will try the scans.
>
> If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2) to
> be able to use coprocessors.
>
> Currently I'm stuck at the scans because it requires two steps (therefore
> some kind of filter chaining).
>
> The key: userId-dateInMillis-sessionId
>
> At first I need to extract dateInMillis with regex or substring (using special
> delimiters for date).
>
> Second, the extracted value must be parsed to Long and set to a RowFilter
> comparator like this:
>
>
>
> ----- Original Message -----
> From: Michael Segel
> To: user@hbase.apache.org
> CC:
> Sent: 13:52 Wednesday, 1 August 2012
> Subject: Re: How to query by rowKey-infix
>
> Actually w coprocessors you can create a secondary index in short order.
> Then your cost is going to be 2 fetches. Trying to do a partial table scan
> will be more expensive.
>
> On Jul 31, 2012, at 12:41 PM, Matt Corgan wrote:
>
>> When deciding between a table scan vs a secondary index, you should try to
>> estimate what percent of the underlying data blocks will be used in the
>> query. By default, each block is 64KB.
>>
>> If each user's data is small and you are fitting multiple users per block,
>> then you're going to need all the blocks, so a tablescan is better because
>> it's simpler. If each user has 1MB+ data then you will want to pick out
>> the individual blocks relevant to each date. The secondary index will help
>> you go directly to those sparse blocks, but with a cost in complexity,
>> consistency, and extra denormalized data that knocks primary data out of
>> your block cache.
>>
>> If latency is not a concern, I would start with the table scan. If that's
>> too slow you add the secondary index, and if you still need it faster you
>> do the primary key lookups in parallel as Jerry mentions.
>>
>> Matt
>>
>> On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam wrote:
>>
>>> Hi Chris:
>>>
>>> I'm thinking about building a secondary index for primary key lookup, then
>>> querying using the primary keys in parallel.
>>>
>>> I'm interested to see if there is any other option too.
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>> On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer wrote:
>>>
>>>> Hello there,
>>>>
>>>> I designed a row key for queries that need best performance (~100 ms)
>>>> which looks like this:
>>>>
>>>> userId-date-sessionId
>>>>
>>>> These queries (scans) are always based on a userId and sometimes
>>>> additionally on a date, too. That's no problem with the key above.
>>>> However, another kind of query shall be based on a given time range
>>>> whereas the outermost left userId is not given or known. In this case I
>>>> need to get all rows covering the given time range with their date to
>>>> create a daily reporting.
>>>>
>>>> As I can't set wildcards at the beginning of a left-based index for the
>>>> scan, I only see the possibility to scan the index of the whole table to
>>>> collect the rowKeys that are inside the timerange I'm interested in.
>>>>
>>>> Is there a more elegant way to collect rows within time range X?
>>>> (Unfortunately, the date attribute is not equal to the timestamp that is
>>>> stored by hbase automatically.)
>>>>
>>>> Could/should one maybe leverage some kind of row key caching to
>>>> accelerate the collection process? Is that covered by the block cache?
>>>>
>>>> Thanks in advance for any advice.
>>>>
>>>> regards
>>>> Chris
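Michael's schema suggestion above (row key = user_id '|' timestamp, session_id as column qualifier) can be sketched as follows. This is a hedged illustration in plain JDK strings, not from any HBase API: in HBase these values would become byte[] via Bytes.toBytes(), and the zero-padding is the one extra detail the suggestion needs, since without a fixed width "9" would sort after "10" in the key's lexicographic order.

```java
// Sketch of the proposed key layout: userId '|' zero-padded event time.
public class KeyLayout {

    /** Row key: userId, '|', then the event time in millis, zero-padded to 13 digits. */
    static String rowKey(String userId, long eventMillis) {
        return userId + "|" + String.format("%013d", eventMillis);
    }

    public static void main(String[] args) {
        // All rows of one user are contiguous and time-ordered, so the per-user
        // time-range query becomes a plain scan over [user|t1, user|t2).
        String start = rowKey("user42", 1343779200000L); // 2012-08-01 UTC
        String stop  = rowKey("user42", 1344384000000L); // 2012-08-08 UTC
        System.out.println(start + " .. " + stop);
    }
}
```

With session_id moved into the qualifier, the row key no longer carries a third segment, so the time component sits directly after the userId and range scans need no filter at all for the per-user case.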
Never ending distributed log split
Hi,

I'm using HBase 0.94.0.

I stopped the cluster for some maintenance, and I'm having some trouble restarting it.

I'm getting one line every about

Start Time / Description / State / Status
Fri Aug 03 08:59:54 EDT 2012  Doing distributed log split in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
RUNNING (since 3sec ago)  Waiting for distributed tasks to finish.
scheduled=1 done=0 error=0 (since 0sec ago)

If I let it run, it will run like that for hours, adding lines and lines and lines until I stop it.
On the master logs, I can see this:

2012-08-03 09:02:49,788 INFO org.apache.hadoop.hbase.master.SplitLogManager: task
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
entered state err node4,60020,1343998592129
2012-08-03 09:02:49,788 WARN org.apache.hadoop.hbase.master.SplitLogManager: Error splitting
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
2012-08-03 09:02:49,788 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
installed = 1 but only 0 done
2012-08-03 09:02:49,788 WARN org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of
[latitude,60020,1343908057839, latitude,60020,1343998595290,
node1,60020,1343908057567, node1,60020,1343939284240, node1,60020,1343998593757,
node2,60020,1343908059614, node2,60020,1343939286369, node2,60020,1343998595830,
node3,60020,1343908054414, node3,60020,1343939282294, node3,60020,1343998590612,
node4,60020,1343908056186, node4,60020,1343939282889, node4,60020,1343998592129,
node5,60020,1343908059158, node5,60020,1343998594856,
phenom,60020,1343908053256, phenom,60020,1343939281065, phenom,60020,1343998580375]
java.io.IOException: error or interrupt while splitting logs in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/
Re: Never ending distributed log split
Here is the complete log. And it seems it's every 30 seconds, not every 20 seconds...

http://pastebin.com/gMiURnnj

2012/8/3, Jean-Marc Spaggiari :
> Hi,
>
> I'm using HBase 0.94.0.
>
> I stopped the cluster for some maintenance, and I'm having some trouble
> restarting it.
>
> I'm getting one line every about
>
> Start Time / Description / State / Status
> Fri Aug 03 08:59:54 EDT 2012  Doing distributed log split in
> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
> RUNNING (since 3sec ago)  Waiting for distributed tasks to finish.
> scheduled=1 done=0 error=0 (since 0sec ago)
>
> If I let it run, it will run like that for hours. Adding lines and
> lines and lines until I stop it.
> > > On the master logs, I can see that: > 2012-08-03 09:02:49,788 INFO > org.apache.hadoop.hbase.master.SplitLogManager: task > /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297 > entered state err node4,60020,1343998592129 > 2012-08-03 09:02:49,788 WARN > org.apache.hadoop.hbase.master.SplitLogManager: Error splitting > /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297 > 2012-08-03 09:02:49,788 WARN > org.apache.hadoop.hbase.master.SplitLogManager: error while splitting > logs in > [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting, > hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting, > hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting, > hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting, > hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting, > hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting, > hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting, > hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting, > hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting, > hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting, > hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting, > hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting, > hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting, > hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting, > hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting, > hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting, > hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting, > hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting, > 
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting] > installed = 1 but only 0 done > 2012-08-03 09:02:49,788 WARN > org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of > [latitude,60020,1343908057839, latitude,60020,1343998595290, > node1,60020,1343908057567, node1,60020,1343939284240, > node1,60020,1343998593757, node2,60020,1343908059614, > node2,60020,1343939286369, node2,60020,1343998595830, > node3,60020,1343908054414, node3,60020,1343939282294, > node3,60020,1343998590612, node4,60020,1343908056186, > node4,60020,1343939282889, node4,60020,1343998592129, > node5,60020,1343908059158, node5,60020,1343998594856, > phenom,60020,1343908053256, phenom,60020,1343939281065, > phenom,60020,1343998580375] > java.io.IOException: error or interrupt while splitting logs in > [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting, > hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting, > hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting, > hdfs://node3:9000/hbase/.logs/node1,60020,13439392842
Re: Never ending distributed log split
Me again ;)

I did some more investigation. And I found that:

http://pastebin.com/Bedm6Ldy

Seems that no region is serving my logs. That's strange because all my servers are up and fsck is telling me that the FS is clean.

Can I just delete those files? What's the impact of such a delete? I don't really worry about losing some data. It's a test environment. But I really need it to start again.

Thanks,

JM

2012/8/3, Jean-Marc Spaggiari :
> Here is the complete log. And it seems it's every 30 seconds, not
> every 20 seconds...
>
> http://pastebin.com/gMiURnnj
>
> 2012/8/3, Jean-Marc Spaggiari :
>> Hi,
>>
>> I'm using HBase 0.94.0.
>>
>> I stopped the cluster for some maintenance, and I'm having some trouble
>> restarting it.
>>
>> I'm getting one line every about
>>
>> Start Time / Description / State / Status
>> Fri Aug 03 08:59:54 EDT 2012  Doing distributed log split in
>> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
>> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
>>
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting, >> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting, >> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting] >> RUNNING (since 3sec ago)Waiting for distributed tasks to finish. >> scheduled=1 done=0 error=0 (since 0sec ago) >> >> If I let it run, it will run like that for hours. Adding lines and >> lines and lines until I stop it. >> >> >> On the master logs, I can see that: >> 2012-08-03 09:02:49,788 INFO >> org.apache.hadoop.hbase.master.SplitLogManager: task >> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297 >> entered state err node4,60020,1343998592129 >> 2012-08-03 09:02:49,788 WARN >> org.apache.hadoop.hbase.master.SplitLogManager: Error splitting >> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297 >> 2012-08-03 09:02:49,788 WARN >> org.apache.hadoop.hbase.master.SplitLogManager: error while splitting >> logs in >> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting, >> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting, >> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting, >> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting, >> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting, >> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting, >> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting, >> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting, >> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting, >> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting, >> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting, >> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting, >> 
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting, >> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting, >> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting, >> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting, >> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting, >> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting, >> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting] >> installed = 1 but only 0 done >> 2012-08-03 09:02:49,788 WARN >> org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of >> [latitude,60020,1343908057839, latitude,60020,1343998595290, >> node1,60020,1343908057567, node1,60020,1343939284240, >> node1,60020,1343998593757, node2,60020,1343908059614, >> node2,60020,1343939286369, node2,60020,1343998595830, >> node3,60020,1343908054414, node3,60020,1343939282294, >> node3,60020,1343998590612, node4,60020,1343908056186, >>
Re: add_table.rb in -0.92.x
hbck should be able to take care of it now. J-D On Fri, Aug 3, 2012 at 2:21 AM, holger.lewin wrote: > > I just checked out hbase-0.92.1 and noticed that /bin/add_table.rb has been > deleted. (CHANGES.txt: "HBASE-2460 add_table.rb deletes any tables for > which the target table name is a prefix"). I wonder if there's a replacement > or a fixed version of it somewhere? > > Thanks, > Holger > -- > View this message in context: > http://old.nabble.com/add_table.rb-in--0.92.x-tp34250060p34250060.html > Sent from the HBase User mailing list archive at Nabble.com. >
Re: Never ending distributed log split
On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari wrote: > Me again ;) > > I did some more investigation. It would really help to see the region server log although the fsck output might be enough. BTW you'll find 0.94.1 RC1 here: http://people.apache.org/~larsh/hbase-0.94.1-rc1/ > > And I found that: > > http://pastebin.com/Bedm6Ldy > > Seems that no region is serving my logs. That's strange because all my > servers are up and fsck is telling me that FS is clean. I don't get the "Seems that no region is serving my logs" part. A region doesn't serve logs, it serves HFiles. You meant to say DataNode? > > Can I just delete those files? What's the impact of such a delete? I > don't really worry about losing some data. It's a test environment. > But I really need it to start again. I wonder if it's related to: https://issues.apache.org/jira/browse/HBASE-6401 Did you remove a datanode from the cluster as part of the maintenance? If you want you can probably move that folder aside but whatever was in those logs is lost (if there ever was anything) until it gets replayed properly. Kinda weird that a file wouldn't have any blocks like that, would be interesting to see the log of the region server that created it. J-D
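For reference, "moving that folder aside" could look something like the sketch below. The source path is one of the `-splitting` directories from this thread; the destination directory name is made up, and as J-D notes, any edits in those WALs are lost:

```shell
# Hedged sketch only: park one of the stuck -splitting WAL directories out of
# /hbase/.logs so the master stops retrying the distributed log split.
hadoop fs -mkdir /hbase/.logs-aside
hadoop fs -mv '/hbase/.logs/node1,60020,1343908057567-splitting' /hbase/.logs-aside/
```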
Re: HBaseTestingUtility on windows
Hi Mohit: You might need to install Cygwin if the tool has a dependency on Linux commands like bash. Best Regards, Jerry On Friday, August 3, 2012, N Keywal wrote: > Hi Mohit, > > For simple cases, it works for me for hbase 0.94 at least. But I'm not > sure it works for all features. I've never tried to run hbase unit > tests on windows for example. > > N. > > On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia > > > wrote: > > I am trying to run mini cluster using HBaseTestingUtility Class from > hbase > > tests on windows, but I get "bash command error". Is it not possible to > run > > this utility class on windows? > > > > I followed this example: > > > > > http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/ >
Re: Never ending distributed log split
2012/8/3, Jean-Daniel Cryans : > On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari > wrote: >> Me again ;) >> >> I did some more investigation. > > It would really help to see the region server log although the fsck > output might be enough. I looked under every directory and only one contains a file. http://pastebin.com/8Fea2EnA It seems to be related to node1. On this server, it seems that everything started correctly: hadoop@node1:~$ /usr/local/jdk1.7.0_05/bin/jps 2211 DataNode 2938 Jps 2136 TaskTracker hbase@node1:~$ /usr/local/jdk1.7.0_05/bin/jps 2419 HRegionServer 3708 Jps On the node1 region server logs, I can see the same information, which is that the file is not hosted anywhere. 2012-08-03 15:01:31,216 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_4965382127800577452_15852 file=/hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2266) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2060) at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2221) at java.io.DataInputStream.read(DataInputStream.java:149) at java.io.DataInputStream.readFully(DataInputStream.java:195) at java.io.DataInputStream.readFully(DataInputStream.java:169) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1486) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175) at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:688) at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:850) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:763) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:384) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165) at java.lang.Thread.run(Thread.java:722) > BTW you'll find 0.94.1 RC1 here: > http://people.apache.org/~larsh/hbase-0.94.1-rc1/ Super, thanks! I will most probably try it instead of 0.94.0 >> And I found that: >> >> http://pastebin.com/Bedm6Ldy >> >> Seems that no region is serving my logs. That's strange because all my >> servers are up and fsck is telling me that FS is clean. > > I don't get the "Seems that no region is serving my logs" part. A > region doesn't serve logs, it serves HFiles. You meant to say > DataNode? I was talking about the files under /hbase/.logs. Based on the directory name I thought it was some logs. Whatever this file is supposed to be for, it seems it's not served by any datanode. >> Can I just delete those files? What's the impact of such a delete? I >> don't really worry about losing some data. It's a test environment. >> But I really need it to start again. > > I wonder if it's related to: > https://issues.apache.org/jira/browse/HBASE-6401 > > Did you remove a datanode from the cluster as part of the maintenance? It might be related to this Jira. Yes, I stopped all the datanodes for the maintenance (had to work on the power supply...). I had to do that promptly, so I "just" stopped everything with init 0. 
> > If you want you can probably move that folder aside but whatever was > in those logs is lost (if there ever was anything) until it gets > replayed properly. That's fine. Nothing was happening in the cluster for hours. So I'm not really expecting to lose anything. So I will try to delete the file... > Kinda weird that a file wouldn't have any blocks like that, would be > interesting to see the log of the region server that created it. Here are the logs where we can see the file creation: http://pastebin.com/HBc28zab Nothing weird in it I think. When I removed the file, the region server crashed and had to be restarted. Restart was not working: 2012-08-03 16:07:49,119 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: remote error telling master we are up org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.PleaseHoldException:
Need to fast-forward a scanner inside a coprocessor
I have a custom coprocessor that aggregates a selection of records from the table based on various criteria. For efficiency, I would like to make it skip a bunch of records. For example, if I don't need any "" records and I encounter "", I would like to tell it to skip everything until "AAAB.." I don't see any methods of the InternalScanner class that would give me that ability. Do I need to close the current scanner and open a new one? Does that add significant overhead (which would reduce any gains achieved by skipping small numbers of records)? I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this functionality. --Tom
Re: HBaseTestingUtility on windows
I ran the test from Cygwin but it fails here. Could someone help me with how to go about fixing this issue? java.io.IOException: Expecting a line not the end of stream at org.apache.hadoop.fs.DF.parseExecResult(DF.java:117) at org.apache.hadoop.util.Shell.runCommand(Shell.java:237) at org.apache.hadoop.util.Shell.run(Shell.java:182) at org.apache.hadoop.fs.DF.getFilesystem(DF.java:63) at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:93) at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.(NameNodeResourceChecker.java:73) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:354) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:333) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:465) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1251) at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:278) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:226) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:348) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:293) at com.intuit.cg.services.dp.analytics.hbase.SessionTimelineDAOTest.initCluster(SessionTimelineDAOTest.java:44) at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:61) at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:163) at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:112) at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:111) at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) at $Proxy0.invoke(Unknown Source) at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) On Fri, Aug 3, 2012 at 11:44 AM, Jerry Lam wrote: > Hi Mohit: > > You might need to install Cygwin if the tool has dependency on Linux > command like bash. > > Best Regards, > > Jerry > > On Friday, August 3, 2012, N Keywal wrote: > > > Hi Mohit, > > > > For simple cases, it works for me for hbase 0.94 at least. But I'm not > > sure it works for all features. I've never tried to run hbase unit > > tests on windows for example. > > > > N. > > > > On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia > > > wrote: > > > I am trying to run mini cluster using HBaseTestingUtility Class from > > hbase > > > tests on windows, but I get "bash command error". Is it not possible to > > run > > > this utility class on windows? > > > > > > I followed this example: > > > > > > > > > http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/ > > >
Re: HBaseTestingUtility on windows
https://issues.apache.org/jira/browse/HDFS-197 has a workaround (see last comment) On Fri, Aug 3, 2012 at 1:33 PM, Mohit Anchlia wrote: > I ran test from cygwin but it fails here. Could someone help me with how to > go about fixing this issue? > > java.io.IOException: Expecting a line not the end of stream > at org.apache.hadoop.fs.DF.parseExecResult(DF.java:117) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:237) > at org.apache.hadoop.util.Shell.run(Shell.java:182) > at org.apache.hadoop.fs.DF.getFilesystem(DF.java:63) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:93) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.(NameNodeResourceChecker.java:73) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:354) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:333) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:465) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1251) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:278) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:226) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:348) > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:293) > at > com.intuit.cg.services.dp.analytics.hbase.SessionTimelineDAOTest.initCluster(SessionTimelineDAOTest.java:44) > at > org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:61) > at > org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:163) > at > org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:112) > at > 
org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:111) > at > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) > at $Proxy0.invoke(Unknown Source) > at > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145) > at > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87) > at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) > > > On Fri, Aug 3, 2012 at 11:44 AM, Jerry Lam wrote: > >> Hi Mohit: >> >> You might need to install Cygwin if the tool has dependency on Linux >> command like bash. >> >> Best Regards, >> >> Jerry >> >> On Friday, August 3, 2012, N Keywal wrote: >> >> > Hi Mohit, >> > >> > For simple cases, it works for me for hbase 0.94 at least. But I'm not >> > sure it works for all features. I've never tried to run hbase unit >> > tests on windows for example. >> > >> > N. >> > >> > On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia > > >> > wrote: >> > > I am trying to run mini cluster using HBaseTestingUtility Class from >> > hbase >> > > tests on windows, but I get "bash command error". Is it not possible to >> > run >> > > this utility class on windows? >> > > >> > > I followed this example: >> > > >> > > >> > >> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/ >> > >>
Re: Need to fast-forward a scanner inside a coprocessor
We recently added a new API for that: RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately. So it depends specifically on where you hook this up. If you do it at RegionObserver.postScannerOpen you can reseek forward at any time. -- Lars - Original Message - From: Tom Brown To: user@hbase.apache.org Cc: Sent: Friday, August 3, 2012 1:27 PM Subject: Need to fast-forward a scanner inside a coprocessor I have a custom coprocessor that aggregates a selection of records from the table based various criteria. For efficiency, I would like to make it skip a bunch of records. For example, if I don't need any "" records and I encounter "", I would like to tell it to skip everything until "AAAB.." I don't see any methods of the InternalScanner class that would give me that ability. Do I need to close the current scanner and open a new one? Does that add significant overhead (which would reduce any gains achieved by skipping small numbers of records)? I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this functionality. --Tom
Problems starting HBase
Hi guys, I've been trying to set up HBase for OpenTSDB for a few days now and am completely stuck. I've gotten .92 running on a virtual machine but I am completely unable to deploy it to a real machine. Firstly, I've been following this guide: http://opentsdb.net/setup-hbase.html Here's what I've tried: 1) 0.92, which gives me a null error as discussed in this git issue: https://github.com/stumbleupon/opentsdb.net/pull/5 2) Seeing this, I decided to try 0.94. This seems to solve the null issue but now whenever I try to create a table in hbase shell it hangs. Here's a log for situation #2: https://gist.github.com/3251817 Thanks in advance! -- View this message in context: http://old.nabble.com/Problems-starting-HBase-tp34252988p34252988.html Sent from the HBase User mailing list archive at Nabble.com.
Re: How to query by rowKey-infix
Good! Submitted an initial patch of the fuzzy row key filter at https://issues.apache.org/jira/browse/HBASE-6509. You can just copy the filter class and include it in your code and use it in your setup as any other custom filter (no need to patch HBase). Please let me know if you try it out (or post your comments at HBASE-6509). Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer wrote: > Hi Alex, > > thanks a lot for the hint about setting the timestamp of the Put. > I didn't know that this would be possible but that solved the problem > (first test was successful). > So I'm really glad that I don't need to apply a filter to extract the time > and so on for every row. > > Nevertheless I would like to see your custom filter implementation. > It would be nice if you could provide it to help me get into it a bit. > > And yes that helped :) > > regards > Chris > > > > From: Alex Baranau > To: user@hbase.apache.org; Christian Schäfer > Sent: 0:57 Friday, 3 August 2012 > Subject: Re: How to query by rowKey-infix > > > Hi Christian! > If we put secondary indexes aside and assume you are going with "heavy > scans", you can try the two following things to make it much faster. If this is > appropriate to your situation, of course. > > 1. > > > Is there a more elegant way to collect rows within time range X? > > (Unfortunately, the date attribute is not equal to the timestamp that is > stored by hbase automatically.) > > Can you set the timestamp of the Puts to the one you have in the row key? Instead > of relying on the one that HBase puts automatically (current ts). If you > can, this will improve reading speed a lot by setting a time range on the > scanner. Depending on how you are writing your data of course, but I assume > that you mostly write data in a "time-increasing" manner. > > > 2. 
> > If your userId has a fixed length, or you can change it so that it has a fixed > length, then you can actually use something like a "wildcard" in the row key. There's > a way in the Filter implementation to fast-forward to the record with a specific > row key and by doing this skip many records. This might be used as follows: > * suppose your userId is 5 characters in length > * suppose you are scanning for records with time between 2012-08-01 > and 2012-08-08 > * when you are scanning records and you face e.g. key > "a_2012-08-09_3jh345j345kjh", where "a" is the user id, you can tell > the scanner from your filter to fast-forward to key "b_2012-08-01". > Because you know that all remaining records of user "a" don't fall into > the interval you need (as the time for its records will be >= 2012-08-09). > > As of now, I believe you will have to implement your own custom filter to do > that. > Pointer: org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT > I believe I implemented a similar thing some time ago. If this idea works > for you I could look for the implementation and share it if it helps. Or > maybe even simply add it to the HBase codebase. > > Hope this helps, > > > Alex Baranau > -- > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - > Solr > > > > On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer > wrote: > > > > > >Excuse my double posting. > >Here is the complete mail: > > > > > > > >OK, > > > >at first I will try the scans. > > > >If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2) > to be able to use coprocessors. 
> > > > > >Currently I'm stuck at the scans because it requires two steps (therefore > maybe some kind of filter chaining is required) > > > > > >The key: userId-dateInMillis-sessionId > > > > > >At first I need to extract dateInMillis with regex or substring (using > special delimiters for date) > > > >Second, the extracted value must be parsed to Long and set to a RowFilter > Comparator like this: > > > >scan.setFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new > BinaryComparator(Bytes.toBytes((Long) dateInMillis)))); > > > >How to chain that? > >Do I have to write a custom filter? > >(Would like to avoid that due to deployment) > > > >regards > >Chris > > > >- Original Message - > >From: Michael Segel > >To: user@hbase.apache.org > >CC: > >Sent: 13:52 Wednesday, 1 August 2012 > >Subject: Re: How to query by rowKey-infix > > > >Actually with coprocessors you can create a secondary index in short order. > >Then your cost is going to be 2 fetches. Trying to do a partial table > scan will be more expensive. > > > >On Jul 31, 2012, at 12:41 PM, Matt Corgan wrote: > > > >> When deciding between a table scan vs. a secondary index, you should try to > >> estimate what percent of the underlying data blocks will be used in the > >> query. By default, each block is 64KB. > >> > >> If each user's data is small and you are fitting multiple users per > block, > >> then you're going to need all the blocks, so a tablescan is better > becau
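Alex's fast-forward trick hinges on computing the hint key to jump to. A minimal, self-contained sketch of that computation, assuming a fixed 5-character userId and underscore-delimited keys as in his example; all class and method names below are made up for illustration, not HBase API:

```java
// Sketch of the seek-hint computation behind SEEK_NEXT_USING_HINT.
// Assumes keys shaped like "<userId>_<date>_<sessionId>" with a
// fixed-length userId; names here are hypothetical.
public class SeekHint {
    static final int USER_ID_LEN = 5;

    // Once we see a row whose date is past the scan's end date, the rest of
    // this user's rows can't match either, so jump to the "next" user id
    // combined with the scan's start date.
    static String nextUserHint(String currentRowKey, String startDate) {
        char[] userId = currentRowKey.substring(0, USER_ID_LEN).toCharArray();
        userId[userId.length - 1]++; // simplistic "increment"; ignores overflow
        return new String(userId) + "_" + startDate;
    }

    public static void main(String[] args) {
        // e.g. user "aaaaa", record dated past the scan window [2012-08-01, 2012-08-08]
        System.out.println(nextUserHint("aaaaa_2012-08-09_3jh345j345kjh", "2012-08-01"));
    }
}
```

In a real filter this string would be the row key returned from `getNextKeyHint` after the filter signals `SEEK_NEXT_USING_HINT`.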
Re: Need to fast-forward a scanner inside a coprocessor
So I understand I'll need to upgrade to 0.94 (which won't be a problem because the releases are binary-compatible). I see that the RegionScanner interface contains the new method "reseek(byte[] row)". I have a reference to a RegionScanner in my coprocessor because I'm using: getEnvironment().getRegion().getScanner(scan). What I don't understand is your conditional statement "it depends specifically on where you hook this up". I'm not doing anything with "postScannerOpen". Since I have an instance of a RegionScanner, should I expect "reseek" to work, as long as I'm seeking forward? Is the way I'm using it compatible with how it should work? --Tom On Fri, Aug 3, 2012 at 3:05 PM, lars hofhansl wrote: > We recently added a new API for that: > RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately. > > So it depends specifically on where you hook this up. If you do it at > RegionObserver.postScannerOpen you can reseek forward at any time. > > > -- Lars > > > > - Original Message - > From: Tom Brown > To: user@hbase.apache.org > Cc: > Sent: Friday, August 3, 2012 1:27 PM > Subject: Need to fast-forward a scanner inside a coprocessor > > I have a custom coprocessor that aggregates a selection of records > from the table based various criteria. For efficiency, I would like to > make it skip a bunch of records. For example, if I don't need any > "" records and I encounter "", I would like to tell it to > skip everything until "AAAB.." > > I don't see any methods of the InternalScanner class that would give > me that ability. Do I need to close the current scanner and open a new > one? Does that add significant overhead (which would reduce any gains > achieved by skipping small numbers of records)? > > I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this > functionality. > > --Tom >
Re: Need to fast-forward a scanner inside a coprocessor
Oh... I just meant you need to have your hands on a RegionScanner :) As long as you only scan forward it should work. - Original Message - From: Tom Brown To: user@hbase.apache.org; lars hofhansl Cc: Sent: Friday, August 3, 2012 5:47 PM Subject: Re: Need to fast-forward a scanner inside a coprocessor So I understand I'll need to upgrade to 0.94 (which won't be a problem because the releases are binary-compatible). I see that the RegionScanner interface contains the new method "reseek(byte[] row)". I have a reference to a RegionScanner in my coprocessor because I'm using: getEnvironment().getRegion().getScanner(scan). What I don't understand is your conditional statement "it depends specifically on where you hook this up". I'm not doing anything with "postScannerOpen". Since I have an instance of a RegionScanner, should I expect "reseek" to work, as long as I'm seeking forward? Is the way I'm using it compatible with how it should work? --Tom On Fri, Aug 3, 2012 at 3:05 PM, lars hofhansl wrote: > We recently added a new API for that: > RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately. > > So it depends specifically on where you hook this up. If you do it at > RegionObserver.postScannerOpen you can reseek forward at any time. > > > -- Lars > > > > - Original Message - > From: Tom Brown > To: user@hbase.apache.org > Cc: > Sent: Friday, August 3, 2012 1:27 PM > Subject: Need to fast-forward a scanner inside a coprocessor > > I have a custom coprocessor that aggregates a selection of records > from the table based various criteria. For efficiency, I would like to > make it skip a bunch of records. For example, if I don't need any > "" records and I encounter "", I would like to tell it to > skip everything until "AAAB.." > > I don't see any methods of the InternalScanner class that would give > me that ability. Do I need to close the current scanner and open a new > one? 
Does that add significant overhead (which would reduce any gains > achieved by skipping small numbers of records)? > > I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this > functionality. > > --Tom >
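For intuition, the forward-only reseek contract discussed above can be mimicked with a toy in-memory scanner. This is not the real RegionScanner API (that takes `byte[]` rows and lives inside a region); it is just a self-contained illustration of the skip behavior:

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy stand-in for a forward-only scanner, illustrating the reseek
// semantics of HBASE-5520. Not the HBase API -- a sketch only.
public class ReseekDemo {
    static class ToyScanner {
        private final NavigableSet<String> rows;
        private String pos = ""; // next() returns the first row >= pos

        ToyScanner(NavigableSet<String> rows) { this.rows = rows; }

        String next() {
            String r = rows.ceiling(pos);
            if (r != null) pos = r + "\0"; // advance just past the returned row
            return r;
        }

        // Forward-only fast-forward: later next() calls skip everything < row.
        void reseek(String row) {
            if (row.compareTo(pos) > 0) pos = row;
        }
    }

    public static void main(String[] args) {
        ToyScanner scanner = new ToyScanner(new TreeSet<>(Arrays.asList(
                "AAAA-1", "AAAA-2", "AAAA-3", "AAAB-1", "AAAC-1")));
        System.out.println(scanner.next()); // AAAA-1: suppose we need no more AAAA rows
        scanner.reseek("AAAB");             // skip the remaining AAAA records
        System.out.println(scanner.next()); // AAAB-1
        System.out.println(scanner.next()); // AAAC-1
    }
}
```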
Re: adding data
Well, if the file that you have contains TSV, you can directly use the ImportTsv utility of HBase to do a bulk load. More details about that can be found here: http://hbase.apache.org/book/ops_mgt.html#importtsv The other option for you is to run a MR job on the file that you have, to generate the HFiles, which you can later import to HBase using completebulkload. HFiles are created using the HFileOutputFormat class. The output of Map should be Put or KeyValue. For Reduce you need to use configureIncrementalLoad, which sets up the reduce tasks. Bijeet On Sat, Aug 4, 2012 at 8:13 AM, Rita wrote: > I have a file which has 13 billion rows of key and value which I would like > to place in Hbase. I was wondering if anyone has a good example to provide > with mapreduce for some sort of work like this. > > > tia > > > -- > --- Get your facts first, then you can distort them as you please.-- >
Re: adding data
Hi Rita, The HBase Bulk Loader is a viable solution for loading such a huge data set. Even if your import file has a separator other than tab you can use ImportTsv, as long as the separator is a single character. If you want to put in your business logic while writing the data to HBase then you can write your own mapper class and use it with the bulk loader. Hence, you can heavily customize the bulk loader as per your needs. These links might be helpful for you: http://hbase.apache.org/book.html#arch.bulk.load http://bigdatanoob.blogspot.com/2012/03/bulk-load-csv-file-into-hbase.html HTH, Anil Gupta On Fri, Aug 3, 2012 at 9:54 PM, Bijeet Singh wrote: > Well, if the file that you have contains TSV, you can directly use the > ImportTsv utility of HBase to do a bulk load. > More details about that can be found here: > > http://hbase.apache.org/book/ops_mgt.html#importtsv > > The other option for you is to run a MR job on the file that you have, to > generate the HFiles, which you can later import > to HBase using completebulkload. HFiles are created using the > HFileOutputFormat class. The output of Map should > be Put or KeyValue. For Reduce you need to use configureIncrementalLoad, > which sets up the reduce tasks. > > Bijeet > > > On Sat, Aug 4, 2012 at 8:13 AM, Rita wrote: > > > I have a file which has 13 billion rows of key and value which I would > like > > to place in Hbase. I was wondering if anyone has a good example to > provide > > with mapreduce for some sort of work like this. > > > > > > tia > > > > > > -- > > --- Get your facts first, then you can distort them as you please.-- > > > -- Thanks & Regards, Anil Gupta
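As a concrete sketch of the two-step bulk load described above, assuming a tab-separated input of (key, value) pairs; the table name, column family, and paths below are placeholders:

```shell
# Step 1: run ImportTsv in bulk-output mode so it writes HFiles
# instead of doing live Puts. 'mytable' must already exist with
# column family 'cf'; paths are hypothetical.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:value \
  -Dimporttsv.bulk.output=hdfs:///tmp/hfiles \
  mytable hdfs:///data/keyvalues.tsv

# Step 2: move the generated HFiles into the table (completebulkload).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  hdfs:///tmp/hfiles mytable
```

A non-tab single-character separator can be passed with `-Dimporttsv.separator`.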