Re: hbase can't start: KeeperErrorCode = NoNode for /hbase

2012-08-03 Thread abloz...@gmail.com
The problem is resolved. It was caused by corrupted ZooKeeper data, so I
changed the ZooKeeper data dir to another directory in hbase-site.xml and
restarted HBase:
  
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/zhouhh/myhadoop/zk</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
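
A rough sketch of the same recovery from the shell (the old dataDir path below
is only an example; adjust it to whatever your previous
hbase.zookeeper.property.dataDir pointed to):

# stop HBase and its managed ZooKeeper quorum
bin/stop-hbase.sh
# move the corrupted ZooKeeper data aside (old path is hypothetical)
mv /home/zhouhh/zookeeper-data /home/zhouhh/zookeeper-data.corrupt
# create the fresh directory referenced by the new dataDir setting
mkdir -p /home/zhouhh/myhadoop/zk
# restart; ZooKeeper rebuilds its snapshot and HBase recreates its znodes
bin/start-hbase.sh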

Thanks to everyone.

andy

2012/8/2 abloz...@gmail.com 

> Thank you, Keywal and Mohammad.
> I also think the data is corrupted, but ZooKeeper is managed inside HBase, and I
> don't know how to change the ZooKeeper data directory. I'll try this way.
> So if the java processes are killed abruptly, the data may get corrupted. But
> sometimes the stop shell script will not work.
>
> Here is my hbase-site.xml
>
> <configuration>
> <property>
> <name>hbase.rootdir</name>
> <value>hdfs://Hadoop48:54310/hbase1</value>
> </property>
> <property>
> <name>hbase.cluster.distributed</name>
> <value>true</value>
> </property>
> <property>
> <name>hbase.master.port</name>
> <value>6</value>
> </property>
> <property>
>   <name>hbase.zookeeper.quorum</name>
>   <value>Hadoop48</value>
> </property>
> <property>
> <name>zookeeper.znode.parent</name>
> <value>/hbase1</value>
> </property>
>
> </configuration>
>
> Thanks!
>
> Andy zhou
>
> 2012/8/2 N Keywal 
>
>> Hi,
>>
>> The issue is in ZooKeeper, not directly HBase. It seems its data is
>> corrupted, so it cannot start. You can configure zookeeper to another
>> data directory to make it start.
>>
>> N.
>>
>>
>> On Thu, Aug 2, 2012 at 11:11 AM, abloz...@gmail.com 
>> wrote:
>> > I even moved /hbase to hbase2, created a new dir /hbase1, and modified
>> > hbase-site.xml to:
>> > <property>
>> > <name>hbase.rootdir</name>
>> > <value>hdfs://Hadoop48:54310/hbase1</value>
>> > </property>
>> > <property>
>> > <name>zookeeper.znode.parent</name>
>> > <value>/hbase1</value>
>> > </property>
>> >
>> > But the error message is still KeeperErrorCode = NoNode for /hbase
>> >
>> > Can anybody give any help?
>> > Thanks!
>> >
>> > Andy zhou
>> >
>> > 2012/8/2 abloz...@gmail.com 
>> >
>> >> Hi all,
>> >> After I killed all the java processes, I can't restart HBase; it reports:
>> >>
>> >> Hadoop46: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop46.out
>> >> Hadoop47: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop47.out
>> >> Hadoop48: starting zookeeper, logging to
>> >> /home/zhouhh/hbase-0.94.0/logs/hbase-zhouhh-zookeeper-Hadoop48.out
>> >> Hadoop46: java.lang.RuntimeException: Unable to run quorum server
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >> Hadoop46:   at
>> >>
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> >> Hadoop46:   at
>> >> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> >> Hadoop46: Caused by: java.io.IOException: Failed to process transaction
>> >> type: 1 error: KeeperErrorCode = NoNode for /hbase
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >> Hadoop46:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >> Hadoop47: java.lang.RuntimeException: Unable to run quorum server
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
>> >>  Hadoop47:   at
>> >>
>> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:74)
>> >> Hadoop47:   at
>> >> org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:64)
>> >> Hadoop47: Caused by: java.io.IOException: Failed to process transaction
>> >> type: 1 error: KeeperErrorCode = NoNode for /hbase
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>> >> Hadoop47:   at
>> >>
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
>> >>
>> >> Hadoop48 is the HMaster.
>> >> But hdfs://xxx/hbase does exist:
>> >> [zhouhh@Hadoop47 ~]$ hadoop fs -ls /hbase
>> >> Found 113 items
>> >> drwxr-xr-x   - zhouhh supergroup  0 2012-07-03 19:24
>> /hbase/-ROOT-
>> >> drwxr-xr-x   - zhouhh supergroup  0 2012-07-03 19:24
>> /hbase/.META.
>> >> ...
>> >>
>> >> So what's the problem?
>> >> Thanks!
>> >>
>> >> andy
>> >>
>>
>
>


add_table.rb in -0.92.x

2012-08-03 Thread holger.lewin

I just checked out hbase-0.92.1 and noticed that /bin/add_table.rb has been
deleted. (CHANGES.txt: "HBASE-2460  add_table.rb deletes any tables for
which the target table name is a prefix"). I wonder if theres a replacement
or fixed version of it somewhere?

Thanks,
Holger
-- 
View this message in context: 
http://old.nabble.com/add_table.rb-in--0.92.x-tp34250060p34250060.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: How to query by rowKey-infix

2012-08-03 Thread Christian Schäfer
Hi Alex,

thanks a lot for the hint about setting the timestamp of the put.
I didn't know that this would be possible, but it solves the problem (the first 
test was successful).
So I'm really glad that I don't need to apply a filter to extract the time and 
so on for every row.

Nevertheless I would like to see your custom filter implementation.
It would be nice if you could provide it, to help me get a bit into it.

And yes that helped :)

regards
Chris



From: Alex Baranau 
To: user@hbase.apache.org; Christian Schäfer 
Sent: Friday, 3 August 2012, 0:57
Subject: Re: How to query by rowKey-infix


Hi Christian!
Setting secondary indexes aside and assuming you are going with "heavy scans", 
you can try the two following things to make it much faster. If this is appropriate 
to your situation, of course.

1.

> Is there a more elegant way to collect rows within time range X?
> (Unfortunately, the date attribute is not equal to the timestamp that is 
> stored by hbase automatically.)

Can you set the timestamp of the Puts to the one you have in the row key, instead of 
relying on the one that HBase sets automatically (the current ts)? If you can, this 
will improve reading speed a lot by setting a time range on the scanner. It depends on 
how you are writing your data of course, but I assume that you mostly write 
data in a "time-increasing" manner.


2.

If your userId has fixed length, or you can change it so that it has fixed 
length, then you can actually use something like a "wildcard" in the row key. There's a 
way in a Filter implementation to fast-forward to the record with a specific row 
key and, by doing this, skip many records. This might be used as follows:
* suppose your userId is 5 characters in length
* suppose you are scanning for records with time between 2012-08-01 
and 2012-08-08
* when you are scanning records and you face e.g. key 
"a_2012-08-09_3jh345j345kjh", where "a" is the user id, you can tell the 
scanner from your filter to fast-forward to key "b_2012-08-01", because 
you know that all remaining records of user "a" don't fall into the interval 
you need (as the time for its records will be >= 2012-08-09).

As of now, I believe you will have to implement your custom filter to do that. 
Pointer: org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT
I believe I implemented a similar thing some time ago. If this idea works for you 
I could look for the implementation and share it if it helps. Or maybe even 
simply add it to the HBase codebase.
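
A rough sketch of such a filter against the 0.92/0.94 Filter API (untested; the
5-char-userId + "_" + "yyyy-MM-dd" key layout and the class itself are only an
illustration of the SEEK_NEXT_USING_HINT mechanics):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.FilterBase;
import org.apache.hadoop.hbase.util.Bytes;

public class DateRangeSkipFilter extends FilterBase {
  private byte[] startDate;   // e.g. "2012-08-01"
  private byte[] endDate;     // e.g. "2012-08-08"
  private byte[] hint;        // row key to fast-forward to

  public DateRangeSkipFilter() {}                        // needed for deserialization
  public DateRangeSkipFilter(byte[] startDate, byte[] endDate) {
    this.startDate = startDate;
    this.endDate = endDate;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    byte[] row = kv.getRow();
    byte[] date = Arrays.copyOfRange(row, 6, 16);        // skip the 5-char userId and "_"
    if (Bytes.compareTo(date, endDate) > 0) {
      // every remaining row of this user is too new: jump to the next user at startDate
      byte[] nextUser = Arrays.copyOfRange(row, 0, 5);
      nextUser[4]++;                                     // naive "userId + 1", good enough for a sketch
      hint = Bytes.add(nextUser, Bytes.toBytes("_"), startDate);
      return ReturnCode.SEEK_NEXT_USING_HINT;
    }
    if (Bytes.compareTo(date, startDate) < 0) {
      return ReturnCode.NEXT_ROW;                        // too old, skip this row only
    }
    return ReturnCode.INCLUDE;
  }

  @Override
  public KeyValue getNextKeyHint(KeyValue currentKV) {
    return KeyValue.createFirstOnRow(hint);
  }

  // filters are Writables in 0.92/0.94, so the two dates must be serialized
  public void write(DataOutput out) throws IOException {
    Bytes.writeByteArray(out, startDate);
    Bytes.writeByteArray(out, endDate);
  }
  public void readFields(DataInput in) throws IOException {
    startDate = Bytes.readByteArray(in);
    endDate = Bytes.readByteArray(in);
  }
}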

Hope this helps,


Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr



On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer  wrote:


>
>Excuse my double posting.
>Here is the complete mail:
>
>
>
>OK,
>
>at first I will try the scans.
>
>If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2) to 
>be able to use coprocessors.
>
>
>Currently I'm stuck at the scans because it requires two steps (therefore 
>maybe some kind of filter chaining is required)
>
>
>The key:  userId-dateInMillis-sessionId
>
>
>At first I need to extract dateInMillis with regex or substring (using special 
>delimiters for date)
>
>Second, the extracted value must be parsed to Long and set to a RowFilter 
>Comparator like this:
>
>scan.setFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new 
>BinaryComparator(Bytes.toBytes((Long) dateInMillis))));
>
>How to chain that?
>Do I have to write a custom filter?
>(Would like to avoid that due to deployment)
>
>regards
>Chris
>
>
>- Original Message -
>From: Michael Segel 
>To: user@hbase.apache.org
>CC:
>Sent: Wednesday, 1 August 2012, 13:52
>Subject: Re: How to query by rowKey-infix
>
>Actually w coprocessors you can create a secondary index in short order.
>Then your cost is going to be 2 fetches. Trying to do a partial table scan 
>will be more expensive.
>
>On Jul 31, 2012, at 12:41 PM, Matt Corgan  wrote:
>
>> When deciding between a table scan vs secondary index, you should try to
>> estimate what percent of the underlying data blocks will be used in the
>> query.  By default, each block is 64KB.
>>
>> If each user's data is small and you are fitting multiple users per block,
>> then you're going to need all the blocks, so a tablescan is better because
>> it's simpler.  If each user has 1MB+ data then you will want to pick out
>> the individual blocks relevant to each date.  The secondary index will help
>> you go directly to those sparse blocks, but with a cost in complexity,
>> consistency, and extra denormalized data that knocks primary data out of
>> your block cache.
>>
>> If latency is not a concern, I would start with the table scan.  If that's
>> too slow you add the secondary index, and if you still need it faster you
>> do the primary key lookups in parallel as Jerry mentions.
>>
>> Matt
>>
>> On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam  wrote:
>>
>>> Hi Chris:
>>>
>>> I'm thinking about building a secondary index for prim

Re: How to query by rowKey-infix

2012-08-03 Thread Christian Schäfer
Hi Matt,

sure, I keep this in mind as a last option (at least on a limited subset of 
data).

Due to our estimate of some billions of rows a week, selective filtering needs 
to take place on the server side.

But I agree that one could do fine-grained filtering on the client side on a 
handy data subset to avoid making the HBase schema & indexing (by 
coprocessors) too complicated.

regards
Chris



- Original Message -
From: Matt Corgan 
To: user@hbase.apache.org
CC: 
Sent: Friday, 3 August 2012, 3:29
Subject: Re: How to query by rowKey-infix

Yeah - just thought I'd point it out since people often have small tables
in their cluster alongside the big ones, and when generating reports,
sometimes you don't care if it finishes in 10 minutes vs an hour.


On Thu, Aug 2, 2012 at 6:15 PM, Alex Baranau wrote:

> I think this is exactly what Christian is trying to (and should be trying
> to) avoid ;).
>
> I can't imagine a use-case where you need to filter something and you can do
> it with (at least) a server-side filter, and yet in this situation you want
> to try to do it on the client-side... Doing filtering on the client-side when
> you can do it on the server-side just feels wrong. Esp. given that there's a
> lot of data in HBase (otherwise why would you use it).
>
> Alex Baranau
> --
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
> On Thu, Aug 2, 2012 at 7:09 PM, Matt Corgan  wrote:
>
> > Also Christian, don't forget you can read all the rows back to the client
> > and do the filtering there using whatever logic you like.  HBase Filters
> > can be thought of as an optimization (predicate push-down) over
> client-side
> > filtering.  Pulling all the rows over the network will be slower, but I
> > don't think we know enough about your data or speed requirements to rule
> it
> > out.
> >
> >
> > On Thu, Aug 2, 2012 at 3:57 PM, Alex Baranau  > >wrote:
> >
> > > Hi Christian!
> > >
> > > If to put off secondary indexes and assume you are going with "heavy
> > > scans", you can try two following things to make it much faster. If
> this
> > is
> > > appropriate to your situation, of course.
> > >
> > > 1.
> > >
> > > > Is there a more elegant way to collect rows within time range X?
> > > > (Unfortunately, the date attribute is not equal to the timestamp that
> > is
> > > stored by hbase automatically.)
> > >
> > > Can you set timestamp of the Puts to the one you have in row key?
> Instead
> > > of relying on the one that HBase puts automatically (current ts). If
> you
> > > can, this will improve reading speed a lot by setting time range on
> > > scanner. Depending on how you are writing your data of course, but I
> > assume
> > > that you mostly write data in "time-increasing" manner.
> > >
> > > 2.
> > >
> > > If your userId has fixed length, or you can change it so that it has
> > fixed
> > > length, then you can actually use smth like "wildcard"  in row key.
> > There's
> > > a way in Filter implementation to fast-forward to the record with
> > specific
> > > row key and by doing this skip many records. This might be used as
> > follows:
> > > * suppose your userId is 5 characters in length
> > > * suppose you are scanning for records with time between 2012-08-01
> > > and 2012-08-08
> > > * when you scanning records and you face e.g. key
> > > "a_2012-08-09_3jh345j345kjh", where "a" is user id, you can
> tell
> > > the scanner from your filter to fast-forward to key "b_
> 2012-08-01".
> > > Because you know that all remained records of user "a" don't fall
> > into
> > > the interval you need (as the time for its records will be >=
> > 2012-08-09).
> > >
> > > As of now, I believe you will have to implement your custom filter to
> do
> > > that.
> > > Pointer:
> > > org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT
> > > I believe I implemented similar thing some time ago. If this idea works
> > for
> > > you I could look for the implementation and share it if it helps. Or
> may
> > be
> > > even simply add it to HBase codebase.
> > >
> > > Hope this helps,
> > >
> > > Alex Baranau
> > > --
> > > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase -
> ElasticSearch
> > -
> > > Solr
> > >
> > >
> > > On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer <
> syrious3...@yahoo.de
> > > >wrote:
> > >
> > > >
> > > >
> > > > Excuse my double posting.
> > > > Here is the complete mail:
> > > >
> > > >
> > > > OK,
> > > >
> > > > at first I will try the scans.
> > > >
> > > > If that's too slow I will have to upgrade hbase (currently
> > 0.90.4-cdh3u2)
> > > > to be able to use coprocessors.
> > > >
> > > >
> > > > Currently I'm stuck at the scans because it requires two steps
> > (therefore
> > > > maybe some kind of filter chaining is required)
> > > >
> > > >
> > > > The key:  userId-dateInMillis-sessionId
> > > >
> > > > At first I need to extract dateInMllis with regex or substring (using
> > > > special delimiters for date)
> 

Re: How to query by rowKey-infix

2012-08-03 Thread Michael Segel
Hi, 

What does your schema look like? 

Would it make sense to change the key to user_id '|' timestamp and then use 
the session_id in the column name? 



On Aug 2, 2012, at 7:23 AM, Christian Schäfer  wrote:

> OK,
> 
> at first I will try the scans.
> 
> If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2) to 
> be able to use coprocessors.
> 
> Currently I'm stuck at the scans because it requires two steps (therefore 
> some kind of filter chaining)
> 
> The key:  userId-dateInMillis-sessionId
> 
> At first I need to extract dateInMillis with regex or substring (using special 
> delimiters for date)
> 
> Second, the extracted value must be parsed to Long and set to a RowFilter 
> Comparator like this:
> 
> 
> 
> 
> 
> - Original Message -
> From: Michael Segel 
> To: user@hbase.apache.org
> CC: 
> Sent: Wednesday, 1 August 2012, 13:52
> Subject: Re: How to query by rowKey-infix
> 
> Actually w coprocessors you can create a secondary index in short order. 
> Then your cost is going to be 2 fetches. Trying to do a partial table scan 
> will be more expensive. 
> 
> On Jul 31, 2012, at 12:41 PM, Matt Corgan  wrote:
> 
>> When deciding between a table scan vs secondary index, you should try to
>> estimate what percent of the underlying data blocks will be used in the
>> query.  By default, each block is 64KB.
>> 
>> If each user's data is small and you are fitting multiple users per block,
>> then you're going to need all the blocks, so a tablescan is better because
>> it's simpler.  If each user has 1MB+ data then you will want to pick out
>> the individual blocks relevant to each date.  The secondary index will help
>> you go directly to those sparse blocks, but with a cost in complexity,
>> consistency, and extra denormalized data that knocks primary data out of
>> your block cache.
>> 
>> If latency is not a concern, I would start with the table scan.  If that's
>> too slow you add the secondary index, and if you still need it faster you
>> do the primary key lookups in parallel as Jerry mentions.
>> 
>> Matt
>> 
>> On Tue, Jul 31, 2012 at 10:10 AM, Jerry Lam  wrote:
>> 
>>> Hi Chris:
>>> 
>>> I'm thinking about building a secondary index for primary key lookup, then
>>> query using the primary keys in parallel.
>>> 
>>> I'm interested to see if there is other option too.
>>> 
>>> Best Regards,
>>> 
>>> Jerry
>>> 
>>> On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer >>> wrote:
>>> 
 Hello there,
 
 I designed a row key for queries that need best performance (~100 ms)
 which looks like this:
 
 userId-date-sessionId
 
 These queries(scans) are always based on a userId and sometimes
 additionally on a date, too.
 That's no problem with the key above.
 
 However, another kind of query shall be based on a given time range,
 where the leftmost userId is not given or known.
 In this case I need to get all rows covering the given time range with
 their date to create a daily reporting.
 
 As I can't set wildcards at the beginning of a left-based index for the
 scan,
 I only see the possibility to scan the index of the whole table to
>>> collect
 the
 rowKeys that are inside the timerange I'm interested in.
 
 Is there a more elegant way to collect rows within time range X?
 (Unfortunately, the date attribute is not equal to the timestamp that is
 stored by hbase automatically.)
 
 Could/should one maybe leverage some kind of row key caching to
>>> accelerate
 the collection process?
 Is that covered by the block cache?
 
 Thanks in advance for any advice.
 
 regards
 Chris
 
>>> 
> 



Never ending distributed log split

2012-08-03 Thread Jean-Marc Spaggiari
Hi,

I'm using HBase 0.94.0.

I stopped the cluster for some maintenance, and I'm having some trouble
restarting it.

I'm getting one line like the following about every 20 seconds:

Start Time  Description State   Status
Fri Aug 03 08:59:54 EDT 2012Doing distributed log split in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
RUNNING (since 3sec ago)Waiting for distributed tasks to finish.
scheduled=1 done=0 error=0 (since 0sec ago)

If I let it run, it will run like that for hours. Adding lines and
lines and lines until I stop it.


On the master logs, I can see that:
2012-08-03 09:02:49,788 INFO
org.apache.hadoop.hbase.master.SplitLogManager: task
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
entered state err node4,60020,1343998592129
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.SplitLogManager: Error splitting
/hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.SplitLogManager: error while splitting
logs in [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
installed = 1 but only 0 done
2012-08-03 09:02:49,788 WARN
org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of
[latitude,60020,1343908057839, latitude,60020,1343998595290,
node1,60020,1343908057567, node1,60020,1343939284240,
node1,60020,1343998593757, node2,60020,1343908059614,
node2,60020,1343939286369, node2,60020,1343998595830,
node3,60020,1343908054414, node3,60020,1343939282294,
node3,60020,1343998590612, node4,60020,1343908056186,
node4,60020,1343939282889, node4,60020,1343998592129,
node5,60020,1343908059158, node5,60020,1343998594856,
phenom,60020,1343908053256, phenom,60020,1343939281065,
phenom,60020,1343998580375]
java.io.IOException: error or interrupt while splitting logs in
[hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
hdfs://node3:9000/hbase/

Re: Never ending distributed log split

2012-08-03 Thread Jean-Marc Spaggiari
Here is the complete log. And it seems it's every 30 seconds and not
every 20 seconds...

http://pastebin.com/gMiURnnj

2012/8/3, Jean-Marc Spaggiari :
> Hi,
>
> I'm using HBase 0.94.0.
>
> I stopped the cluster for some maintenance, and I'm have some troubles
> to restart it.
>
> I'm getting one line every about
>
> Start TimeDescription State   Status
> Fri Aug 03 08:59:54 EDT 2012  Doing distributed log split in
> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
>   RUNNING (since 3sec ago)Waiting for distributed tasks to finish.
> scheduled=1 done=0 error=0 (since 0sec ago)
>
> If I let it run, it will run like that for hours. Adding lines and
> lines and lines until I stop it.
>
>
> On the master logs, I can see that:
> 2012-08-03 09:02:49,788 INFO
> org.apache.hadoop.hbase.master.SplitLogManager: task
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
> entered state err node4,60020,1343998592129
> 2012-08-03 09:02:49,788 WARN
> org.apache.hadoop.hbase.master.SplitLogManager: Error splitting
> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
> 2012-08-03 09:02:49,788 WARN
> org.apache.hadoop.hbase.master.SplitLogManager: error while splitting
> logs in
> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
> installed = 1 but only 0 done
> 2012-08-03 09:02:49,788 WARN
> org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of
> [latitude,60020,1343908057839, latitude,60020,1343998595290,
> node1,60020,1343908057567, node1,60020,1343939284240,
> node1,60020,1343998593757, node2,60020,1343908059614,
> node2,60020,1343939286369, node2,60020,1343998595830,
> node3,60020,1343908054414, node3,60020,1343939282294,
> node3,60020,1343998590612, node4,60020,1343908056186,
> node4,60020,1343939282889, node4,60020,1343998592129,
> node5,60020,1343908059158, node5,60020,1343998594856,
> phenom,60020,1343908053256, phenom,60020,1343939281065,
> phenom,60020,1343998580375]
> java.io.IOException: error or interrupt while splitting logs in
> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
> hdfs://node3:9000/hbase/.logs/node1,60020,13439392842

Re: Never ending distributed log split

2012-08-03 Thread Jean-Marc Spaggiari
Me again ;)

I did some more investigation.

And I found that:

http://pastebin.com/Bedm6Ldy

Seems that no region is serving my logs. That's strange because all my
servers are up and fsck is telling me that FS is clean.

Can I just delete those files? What's the impact of such a delete? I
don't really worry about losing some data. It's a test environment.
But I really need it to start again.

Thanks,

JM

2012/8/3, Jean-Marc Spaggiari :
> Here us the complete log. And seems it's every 30 seconds and not
> every 20 seconds...
>
> http://pastebin.com/gMiURnnj
>
> 2012/8/3, Jean-Marc Spaggiari :
>> Hi,
>>
>> I'm using HBase 0.94.0.
>>
>> I stopped the cluster for some maintenance, and I'm have some troubles
>> to restart it.
>>
>> I'm getting one line every about
>>
>> Start Time   Description State   Status
>> Fri Aug 03 08:59:54 EDT 2012 Doing distributed log split in
>> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
>> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
>>  RUNNING (since 3sec ago)Waiting for distributed tasks to finish.
>> scheduled=1 done=0 error=0 (since 0sec ago)
>>
>> If I let it run, it will run like that for hours. Adding lines and
>> lines and lines until I stop it.
>>
>>
>> On the master logs, I can see that:
>> 2012-08-03 09:02:49,788 INFO
>> org.apache.hadoop.hbase.master.SplitLogManager: task
>> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
>> entered state err node4,60020,1343998592129
>> 2012-08-03 09:02:49,788 WARN
>> org.apache.hadoop.hbase.master.SplitLogManager: Error splitting
>> /hbase/splitlog/hdfs%3A%2F%2Fnode3%3A9000%2Fhbase%2F.logs%2Fnode1%2C60020%2C1343908057567-splitting%2Fnode1%252C60020%252C1343908057567.1343914548297
>> 2012-08-03 09:02:49,788 WARN
>> org.apache.hadoop.hbase.master.SplitLogManager: error while splitting
>> logs in
>> [hdfs://node3:9000/hbase/.logs/latitude,60020,1343908057839-splitting,
>> hdfs://node3:9000/hbase/.logs/latitude,60020,1343998595290-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343908057567-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343939284240-splitting,
>> hdfs://node3:9000/hbase/.logs/node1,60020,1343998593757-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343908059614-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343939286369-splitting,
>> hdfs://node3:9000/hbase/.logs/node2,60020,1343998595830-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343908054414-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343939282294-splitting,
>> hdfs://node3:9000/hbase/.logs/node3,60020,1343998590612-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343908056186-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343939282889-splitting,
>> hdfs://node3:9000/hbase/.logs/node4,60020,1343998592129-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343908059158-splitting,
>> hdfs://node3:9000/hbase/.logs/node5,60020,1343998594856-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343908053256-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343939281065-splitting,
>> hdfs://node3:9000/hbase/.logs/phenom,60020,1343998580375-splitting]
>> installed = 1 but only 0 done
>> 2012-08-03 09:02:49,788 WARN
>> org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting of
>> [latitude,60020,1343908057839, latitude,60020,1343998595290,
>> node1,60020,1343908057567, node1,60020,1343939284240,
>> node1,60020,1343998593757, node2,60020,1343908059614,
>> node2,60020,1343939286369, node2,60020,1343998595830,
>> node3,60020,1343908054414, node3,60020,1343939282294,
>> node3,60020,1343998590612, node4,60020,1343908056186,
>>

Re: add_table.rb in -0.92.x

2012-08-03 Thread Jean-Daniel Cryans
hbck should be able to take care of it now.
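
For example (flag names vary a bit between 0.92.x releases, so check the usage
output of hbck on your version first):

# report inconsistencies, e.g. regions present in HDFS but missing from .META.
bin/hbase hbck
# attempt automatic repair of the reported problems
bin/hbase hbck -fix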

J-D

On Fri, Aug 3, 2012 at 2:21 AM, holger.lewin  wrote:
>
> I just checked out hbase-0.92.1 and noticed that /bin/add_table.rb has been
> deleted. (CHANGES.txt: "HBASE-2460  add_table.rb deletes any tables for
> which the target table name is a prefix"). I wonder if theres a replacement
> or fixed version of it somewhere?
>
> Thanks,
> Holger
> --
> View this message in context: 
> http://old.nabble.com/add_table.rb-in--0.92.x-tp34250060p34250060.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: Never ending distributed log split

2012-08-03 Thread Jean-Daniel Cryans
On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari
 wrote:
> Me again ;)
>
> I did some more investigation.

It would really help to see the region server log although the fsck
output might be enough.

BTW you'll find 0.94.1 RC1 here:
http://people.apache.org/~larsh/hbase-0.94.1-rc1/

>
> And I found that:
>
> http://pastebin.com/Bedm6Ldy
>
> Seems that no region is serving my logs. That's strange because all my
> servers are up and fsck is telling me that FS is clean.

I don't get the "Seems that no region is serving my logs" part. A
region doesn't serve logs, it serves HFiles. You meant to say
DataNode?

>
> Can I just delete those files? What's the impact of such delete? I
> don't really worrie about loosing some data. It's a test environment.
> But I really need it to start again.

I wonder if it's related to: https://issues.apache.org/jira/browse/HBASE-6401

Did you remove a datanode from the cluster as part of the maintenance?

If you want you can probably move that folder aside but whatever was
in those logs is lost (if there ever was anything) until it gets
replayed properly.
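
For example, something along these lines (the directory name is taken from the
logs earlier in this thread; the target path is made up):

hadoop fs -mkdir /hbase/.logs-aside
hadoop fs -mv /hbase/.logs/node1,60020,1343908057567-splitting /hbase/.logs-aside/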

Kinda weird that a file wouldn't have any blocks like that, would be
interesting to see the log of the region server that created it.

J-D


Re: HBaseTestingUtility on windows

2012-08-03 Thread Jerry Lam
Hi Mohit:

You might need to install Cygwin if the tool has a dependency on Linux
commands like bash.

Best Regards,

Jerry

On Friday, August 3, 2012, N Keywal wrote:

> Hi Mohit,
>
> For simple cases, it works for me for hbase 0.94 at least. But I'm not
> sure it works for all features. I've never tried to run hbase unit
> tests on windows for example.
>
> N.
>
> On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia 
> >
> wrote:
> > I am trying to run mini cluster using HBaseTestingUtility Class from
> hbase
> > tests on windows, but I get "bash command error". Is it not possible to
> run
> > this utility class on windows?
> >
> > I followed this example:
> >
> >
> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
>


Re: Never ending distributed log split

2012-08-03 Thread Jean-Marc Spaggiari
2012/8/3, Jean-Daniel Cryans :
> On Fri, Aug 3, 2012 at 8:15 AM, Jean-Marc Spaggiari
>  wrote:
>> Me again ;)
>>
>> I did some more investigation.
>
> It would really help to see the region server log although the fsck
> output might be enough.

I looked under every directory and only one contains a file.

http://pastebin.com/8Fea2EnA

It seems to be related to node1. On this server, it seems that everything
has started correctly:
hadoop@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2211 DataNode
2938 Jps
2136 TaskTracker

hbase@node1:~$ /usr/local/jdk1.7.0_05/bin/jps
2419 HRegionServer
3708 Jps

In the node1 region server logs, I can see the same information, i.e. the
file is not hosted anywhere.

2012-08-03 15:01:31,216 WARN org.apache.hadoop.hdfs.DFSClient: DFS
Read: java.io.IOException: Could not obtain block:
blk_4965382127800577452_15852
file=/hbase/.logs/node1,60020,1343908057567-splitting/node1%2C60020%2C1343908057567.1343914548297
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:2266)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2060)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2221)
at java.io.DataInputStream.read(DataInputStream.java:149)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
at 
org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1486)
at 
org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1475)
at 
org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1470)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.(SequenceFileLogReader.java:55)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:688)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:850)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(HLogSplitter.java:763)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:384)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFileToTemp(HLogSplitter.java:351)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:113)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:266)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:197)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:165)
at java.lang.Thread.run(Thread.java:722)

> BTW you'll find 0.94.1 RC1 here:
> http://people.apache.org/~larsh/hbase-0.94.1-rc1/

Super, thanks! I will most probably try it instead of the 0.94.0


>> And I found that:
>>
>> http://pastebin.com/Bedm6Ldy
>>
>> Seems that no region is serving my logs. That's strange because all my
>> servers are up and fsck is telling me that FS is clean.
>
> I don't get the "Seems that no region is serving my logs" part. A
> region doesn't serve logs, it serves HFiles. You meant to say
> DataNode?

I was talking about the files under /hbase/.logs. Based on the
directory name I thought they were some logs. Whatever this file is
supposed to be for, it seems it's not served by any datanode.


>> Can I just delete those files? What's the impact of such delete? I
>> don't really worrie about loosing some data. It's a test environment.
>> But I really need it to start again.
>
> I wonder if it's related to:
> https://issues.apache.org/jira/browse/HBASE-6401
>
> Did you remove a datanode from the cluster as part of the maintenance?

It might be related to this Jira. Yes, I stopped all the datanodes for
the maintenance (had to work on the power supply...). I had to do that
promptly, so I "just" stopped everything with init 0.

>
> If you want you can probably move that folder aside but whatever was
> in those logs is lost (if there ever was anything) until it gets
> replayed properly.

That's fine. Nothing was happening in the cluster for hours, so I'm not
really expecting to lose anything. So I will try to delete the
file...


> Kinda weird that a file wouldn't have any blocks like that, would be
> interesting to see the log of the region server that created it.
Here are the logs where we can see the file creation:
http://pastebin.com/HBc28zab Nothing weird in it I think.

When I removed the file, the region server crashed and had to be restarted.

Restart was not working:
2012-08-03 16:07:49,119 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: remote error
telling master we are up
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.PleaseHoldException:

Need to fast-forward a scanner inside a coprocessor

2012-08-03 Thread Tom Brown
I have a custom coprocessor that aggregates a selection of records
from the table based on various criteria. For efficiency, I would like to
make it skip a bunch of records. For example, if I don't need any
"" records and I encounter "", I would like to tell it to
skip everything until "AAAB.."

I don't see any methods of the InternalScanner class that would give
me that ability. Do I need to close the current scanner and open a new
one? Does that add significant overhead (which would reduce any gains
achieved by skipping small numbers of records)?

I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
functionality.

--Tom


Re: HBaseTestingUtility on windows

2012-08-03 Thread Mohit Anchlia
I ran the test from Cygwin but it fails here. Could someone help me with how to
go about fixing this issue?

java.io.IOException: Expecting a line not the end of stream
at org.apache.hadoop.fs.DF.parseExecResult(DF.java:117)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:237)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.fs.DF.getFilesystem(DF.java:63)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:93)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.(NameNodeResourceChecker.java:73)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:354)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:333)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:465)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1251)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:278)
at
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:226)
at
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:348)
at
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:293)
at
com.intuit.cg.services.dp.analytics.hbase.SessionTimelineDAOTest.initCluster(SessionTimelineDAOTest.java:44)
at
org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:61)
at
org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:163)
at
org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:112)
at
org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:111)
at
org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at $Proxy0.invoke(Unknown Source)
at
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145)
at
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)


On Fri, Aug 3, 2012 at 11:44 AM, Jerry Lam  wrote:

> Hi Mohit:
>
> You might need to install Cygwin if the tool has dependency on Linux
> command like bash.
>
> Best Regards,
>
> Jerry
>
> On Friday, August 3, 2012, N Keywal wrote:
>
> > Hi Mohit,
> >
> > For simple cases, it works for me for hbase 0.94 at least. But I'm not
> > sure it works for all features. I've never tried to run hbase unit
> > tests on windows for example.
> >
> > N.
> >
> > On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia  >
>  > wrote:
> > > I am trying to run mini cluster using HBaseTestingUtility Class from
> > hbase
> > > tests on windows, but I get "bash command error". Is it not possible to
> > run
> > > this utility class on windows?
> > >
> > > I followed this example:
> > >
> > >
> >
> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
> >
>


Re: HBaseTestingUtility on windows

2012-08-03 Thread Shrijeet Paliwal
https://issues.apache.org/jira/browse/HDFS-197 has a workaround (see
last comment)

On Fri, Aug 3, 2012 at 1:33 PM, Mohit Anchlia  wrote:
> I ran test from cygwin but it fails here. Could someone help me with how to
> go about fixing this issue?
>
> java.io.IOException: Expecting a line not the end of stream
> at org.apache.hadoop.fs.DF.parseExecResult(DF.java:117)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:237)
> at org.apache.hadoop.util.Shell.run(Shell.java:182)
> at org.apache.hadoop.fs.DF.getFilesystem(DF.java:63)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:93)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.(NameNodeResourceChecker.java:73)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:354)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:333)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:271)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:465)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1251)
> at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:278)
> at
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:226)
> at
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:348)
> at
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:293)
> at
> com.intuit.cg.services.dp.analytics.hbase.SessionTimelineDAOTest.initCluster(SessionTimelineDAOTest.java:44)
> at
> org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:61)
> at
> org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:163)
> at
> org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:112)
> at
> org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:111)
> at
> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
> at $Proxy0.invoke(Unknown Source)
> at
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145)
> at
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
>
>
> On Fri, Aug 3, 2012 at 11:44 AM, Jerry Lam  wrote:
>
>> Hi Mohit:
>>
>> You might need to install Cygwin if the tool has dependency on Linux
>> command like bash.
>>
>> Best Regards,
>>
>> Jerry
>>
>> On Friday, August 3, 2012, N Keywal wrote:
>>
>> > Hi Mohit,
>> >
>> > For simple cases, it works for me for hbase 0.94 at least. But I'm not
>> > sure it works for all features. I've never tried to run hbase unit
>> > tests on windows for example.
>> >
>> > N.
>> >
>> > On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia > >
>>  > wrote:
>> > > I am trying to run mini cluster using HBaseTestingUtility Class from
>> > hbase
>> > > tests on windows, but I get "bash command error". Is it not possible to
>> > run
>> > > this utility class on windows?
>> > >
>> > > I followed this example:
>> > >
>> > >
>> >
>> http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
>> >
>>


Re: Need to fast-forward a scanner inside a coprocessor

2012-08-03 Thread lars hofhansl
We recently added a new API for that:
RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately.

So it depends specifically on where you hook this up. If you do it at 
RegionObserver.postScannerOpen you can reseek forward at any time.
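
Roughly, a sketch of how it can look from inside a coprocessor (untested, 0.94
API; aggregate() and nextInterestingRow() are placeholders for your own logic):

HRegion region = ((RegionCoprocessorEnvironment) getEnvironment()).getRegion();
RegionScanner scanner = region.getScanner(new Scan());
List<KeyValue> kvs = new ArrayList<KeyValue>();
boolean more = true;
while (more) {
  kvs.clear();
  more = scanner.next(kvs);                  // one row's KeyValues per call
  aggregate(kvs);                            // placeholder: your aggregation
  byte[] jumpTo = nextInterestingRow(kvs);   // placeholder: null means "no jump"
  if (jumpTo != null) {
    scanner.reseek(jumpTo);                  // forward-only, cheaper than close/reopen
  }
}
scanner.close();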


-- Lars



- Original Message -
From: Tom Brown 
To: user@hbase.apache.org
Cc: 
Sent: Friday, August 3, 2012 1:27 PM
Subject: Need to fast-forward a scanner inside a coprocessor

I have a custom coprocessor that aggregates a selection of records
from the table based various criteria. For efficiency, I would like to
make it skip a bunch of records. For example, if I don't need any
"" records and I encounter "", I would like to tell it to
skip everything until "AAAB.."

I don't see any methods of the InternalScanner class that would give
me that ability. Do I need to close the current scanner and open a new
one? Does that add significant overhead (which would reduce any gains
achieved by skipping small numbers of records)?

I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
functionality.

--Tom



Problems starting HBase

2012-08-03 Thread sk101

Hi guys, I've been trying to set up HBase for OpenTSDB for a few days now and
am completely stuck. I've gotten 0.92 running on a virtual machine but I am
completely unable to deploy it to a real machine.

Firstly, I've been following this guide:
http://opentsdb.net/setup-hbase.html

Here's what I've tried:
1) 0.92, which gives me a null error as discussed in this git issue:
https://github.com/stumbleupon/opentsdb.net/pull/5
2) Seeing this, I decided to try 0.94. This seems to solve the null issue
but now whenever I try to create a table in hbase shell it hangs.

Here's a log for situation #2:

https://gist.github.com/3251817

Thanks in advance!
-- 
View this message in context: 
http://old.nabble.com/Problems-starting-HBase-tp34252988p34252988.html
Sent from the HBase User mailing list archive at Nabble.com.



Re: How to query by rowKey-infix

2012-08-03 Thread Alex Baranau
Good!

I submitted an initial patch of the fuzzy row key filter at
https://issues.apache.org/jira/browse/HBASE-6509. You can just copy the
filter class, include it in your code, and use it in your setup as any
other custom filter (no need to patch HBase).
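
A hedged usage sketch, in case it helps (constructor and mask semantics as in
the HBASE-6509 patch as I read it, so they may change; 0 = byte must match,
1 = any byte; assumes a fixed-length 5-char userId and a "yyyy-MM-dd" date):

// matches every user's rows for one particular day; add one Pair per day of the range
byte[] rowTemplate = Bytes.toBytes("?????_2012-08-01_");    // '?' bytes sit at fuzzy positions
byte[] fuzzyMask   = new byte[] {1,1,1,1,1, 0, 0,0,0,0,0,0,0,0,0,0, 0};
Scan scan = new Scan();
scan.setFilter(new FuzzyRowFilter(
    Arrays.asList(new Pair<byte[], byte[]>(rowTemplate, fuzzyMask))));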

Please let me know if you try it out (or post your comments at HBASE-6509).

Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer wrote:

> Hi Alex,
>
> thanks a lot for the hint about setting the timestamp of the put.
> I didn't know that this would be possible but that's solving the problem
> (first test was successful).
> So I'm really glad that I don't need to apply a filter to extract the time
> and so on for every row.
>
> Nevertheless I would like to see your custom filter implementation.
> Would be nice if you could provide it helping me to get a bit into it.
>
> And yes that helped :)
>
> regards
> Chris
>
>
> 
> From: Alex Baranau 
> To: user@hbase.apache.org; Christian Schäfer 
> Sent: Friday, 3 August 2012, 0:57
> Subject: Re: How to query by rowKey-infix
>
>
> Hi Christian!
> If to put off secondary indexes and assume you are going with "heavy
> scans", you can try two following things to make it much faster. If this is
> appropriate to your situation, of course.
>
> 1.
>
> > Is there a more elegant way to collect rows within time range X?
> > (Unfortunately, the date attribute is not equal to the timestamp that is
> stored by hbase automatically.)
>
> Can you set timestamp of the Puts to the one you have in row key? Instead
> of relying on the one that HBase puts automatically (current ts). If you
> can, this will improve reading speed a lot by setting time range on
> scanner. Depending on how you are writing your data of course, but I assume
> that you mostly write data in "time-increasing" manner.
>
>
> 2.
>
> If your userId has fixed length, or you can change it so that it has fixed
> length, then you can actually use smth like "wildcard"  in row key. There's
> a way in Filter implementation to fast-forward to the record with specific
> row key and by doing this skip many records. This might be used as follows:
> * suppose your userId is 5 characters in length
> * suppose you are scanning for records with time between 2012-08-01
> and 2012-08-08
> * when you scanning records and you face e.g. key
> "a_2012-08-09_3jh345j345kjh", where "a" is user id, you can tell
> the scanner from your filter to fast-forward to key "b_ 2012-08-01".
> Because you know that all remained records of user "a" don't fall into
> the interval you need (as the time for its records will be >= 2012-08-09).
>
> As of now, I believe you will have to implement your custom filter to do
> that.
> Pointer: org.apache.hadoop.hbase.filter.Filter.ReturnCode.SEEK_NEXT_USING_HINT
> I believe I implemented similar thing some time ago. If this idea works
> for you I could look for the implementation and share it if it helps. Or
> may be even simply add it to HBase codebase.
>
> Hope this helps,
>
>
> Alex Baranau
> --
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
>
>
> On Thu, Aug 2, 2012 at 8:40 AM, Christian Schäfer 
> wrote:
>
>
> >
> >Excuse my double posting.
> >Here is the complete mail:
> >
> >
> >
> >OK,
> >
> >at first I will try the scans.
> >
> >If that's too slow I will have to upgrade hbase (currently 0.90.4-cdh3u2)
> to be able to use coprocessors.
> >
> >
> >Currently I'm stuck at the scans because it requires two steps (therefore
> maybe some kind of filter chaining is required)
> >
> >
> >The key:  userId-dateInMillis-sessionId
> >
> >
> >At first I need to extract dateInMllis with regex or substring (using
> special delimiters for date)
> >
> >Second, the extracted value must be parsed to Long and set to a RowFilter
> Comparator like this:
> >
> >scan.setFilter(new RowFilter(CompareOp.GREATER_OR_EQUAL, new
> BinaryComparator(Bytes.toBytes((Long) dateInMillis))));
> >
> >How to chain that?
> >Do I have to write a custom filter?
> >(Would like to avoid that due to deployment)
> >
> >regards
> >Chris
> >
> >
> >- Original Message -
> >From: Michael Segel 
> >To: user@hbase.apache.org
> >CC:
> >Sent: Wednesday, 1 August 2012, 13:52
> >Subject: Re: How to query by rowKey-infix
> >
> >Actually w coprocessors you can create a secondary index in short order.
> >Then your cost is going to be 2 fetches. Trying to do a partial table
> scan will be more expensive.
> >
> >On Jul 31, 2012, at 12:41 PM, Matt Corgan  wrote:
> >
> >> When deciding between a table scan vs secondary index, you should try to
> >> estimate what percent of the underlying data blocks will be used in the
> >> query.  By default, each block is 64KB.
> >>
> >> If each user's data is small and you are fitting multiple users per
> block,
> >> then you're going to need all the blocks, so a tablescan is better
> becau

Re: Need to fast-forward a scanner inside a coprocessor

2012-08-03 Thread Tom Brown
So I understand I'll need to upgrade to 0.94 (which won't be a problem
because the releases are binary-compatible).  I see that the
RegionScanner interface contains the new method "reseek(byte[] row)".

I have a reference to a RegionScanner in my coprocessor because I'm
using: getEnvironment().getRegion().getScanner(scan).

What I don't understand is your conditional statement "it depends
specifically on where you hook this up".  I'm not doing anything with
"postScannerOpen".  Since I have an instance of a RegionScanner,
should I expect "reseek" to work, as long as I'm seeking forward? Is
the way I'm using it up compatible with how it should work?

--Tom

On Fri, Aug 3, 2012 at 3:05 PM, lars hofhansl  wrote:
> We recently added a new API for that:
> RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately.
>
> So it depends specifically on where you hook this up. If you do it at 
> RegionObserver.postScannerOpen you can reseek forward at any time.
>
>
> -- Lars
>
>
>
> - Original Message -
> From: Tom Brown 
> To: user@hbase.apache.org
> Cc:
> Sent: Friday, August 3, 2012 1:27 PM
> Subject: Need to fast-forward a scanner inside a coprocessor
>
> I have a custom coprocessor that aggregates a selection of records
> from the table based various criteria. For efficiency, I would like to
> make it skip a bunch of records. For example, if I don't need any
> "" records and I encounter "", I would like to tell it to
> skip everything until "AAAB.."
>
> I don't see any methods of the InternalScanner class that would give
> me that ability. Do I need to close the current scanner and open a new
> one? Does that add significant overhead (which would reduce any gains
> achieved by skipping small numbers of records)?
>
> I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
> functionality.
>
> --Tom
>


Re: Need to fast-forward a scanner inside a coprocessor

2012-08-03 Thread lars hofhansl
Oh... I just meant you need to have your hands on a RegionScanner :)
As long as you only scan forward it should work.



- Original Message -
From: Tom Brown 
To: user@hbase.apache.org; lars hofhansl 
Cc: 
Sent: Friday, August 3, 2012 5:47 PM
Subject: Re: Need to fast-forward a scanner inside a coprocessor

So I understand I'll need to upgrade to 0.94 (which won't be a problem
because the releases are binary-compatible).  I see that the
RegionScanner interface contains the new method "reseek(byte[] row)".

I have a reference to a RegionScanner in my coprocessor because I'm
using: getEnvironment().getRegion().getScanner(scan).

What I don't understand is your conditional statement "it depends
specifically on where you hook this up".  I'm not doing anything with
"postScannerOpen".  Since I have an instance of a RegionScanner,
should I expect "reseek" to work, as long as I'm seeking forward? Is
the way I'm using it up compatible with how it should work?

--Tom

On Fri, Aug 3, 2012 at 3:05 PM, lars hofhansl  wrote:
> We recently added a new API for that:
> RegionScanner.reseek(...). See HBASE-5520. 0.94+ only, unfortunately.
>
> So it depends specifically on where you hook this up. If you do it at 
> RegionObserver.postScannerOpen you can reseek forward at any time.
>
>
> -- Lars
>
>
>
> - Original Message -
> From: Tom Brown 
> To: user@hbase.apache.org
> Cc:
> Sent: Friday, August 3, 2012 1:27 PM
> Subject: Need to fast-forward a scanner inside a coprocessor
>
> I have a custom coprocessor that aggregates a selection of records
> from the table based various criteria. For efficiency, I would like to
> make it skip a bunch of records. For example, if I don't need any
> "" records and I encounter "", I would like to tell it to
> skip everything until "AAAB.."
>
> I don't see any methods of the InternalScanner class that would give
> me that ability. Do I need to close the current scanner and open a new
> one? Does that add significant overhead (which would reduce any gains
> achieved by skipping small numbers of records)?
>
> I am using HBase 0.92. Upgrading to 0.94 is possible if it gives this
> functionality.
>
> --Tom
>



Re: adding data

2012-08-03 Thread Bijeet Singh
Well, if the file that you have contains TSV, you can directly use the
ImportTSV utility of HBase to do a bulk load.
More details about that can be found here :

http://hbase.apache.org/book/ops_mgt.html#importtsv

The other option for you is to run an MR job on the file that you have to
generate the HFiles, which you can later import
into HBase using completebulkload. HFiles are created using the
HFileOutputFormat class. The output of the Map should
be Put or KeyValue. For the Reduce you need to use configureIncrementalLoad,
which sets up the reduce tasks.
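
A rough sketch of that second option (0.92/0.94 APIs; TsvToPutMapper, the table
name and the paths are placeholders):

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "hfile-generation");
job.setJarByClass(TsvToPutMapper.class);            // placeholder mapper: parse a line, emit a Put
job.setMapperClass(TsvToPutMapper.class);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
FileInputFormat.addInputPath(job, new Path("/data/keyvalues.txt"));
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));

// wires in the reducer, partitioner and HFileOutputFormat against the table's regions
HTable table = new HTable(conf, "mytable");
HFileOutputFormat.configureIncrementalLoad(job, table);
job.waitForCompletion(true);

// afterwards move the generated HFiles into the table:
//   hadoop jar hbase-<version>.jar completebulkload /tmp/hfiles mytable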

Bijeet


On Sat, Aug 4, 2012 at 8:13 AM, Rita  wrote:

> I have a file which has 13 billion rows of key and value which I would like
> to place in Hbase. I was wondering if anyone has a good example to provide
> with mapreduce for some sort of work like this.
>
>
> tia
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>


Re: adding data

2012-08-03 Thread anil gupta
Hi Rita,

The HBase bulk loader is a viable solution for loading such a huge data set. Even
if your import file has a separator other than tab, you can use ImportTsv as
long as the separator is a single character. If you want to apply your own
business logic while writing the data to HBase, you can write your own
mapper class and use it with the bulk loader. Hence, you can heavily
customize the bulk loader as per your needs.
These links might be helpful for you:
http://hbase.apache.org/book.html#arch.bulk.load
http://bigdatanoob.blogspot.com/2012/03/bulk-load-csv-file-into-hbase.html
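
For example, a possible invocation for a comma-separated key/value file (table
name, column mapping and paths are placeholders):

hadoop jar ${HBASE_HOME}/hbase-0.94.0.jar importtsv \
  -Dimporttsv.separator=, \
  -Dimporttsv.columns=HBASE_ROW_KEY,f:value \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /data/keyvalues.csv

With -Dimporttsv.bulk.output it writes HFiles for completebulkload; without it,
it writes directly to the table via Puts.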

HTH,
Anil Gupta

On Fri, Aug 3, 2012 at 9:54 PM, Bijeet Singh  wrote:

> Well, if the file that you have contains TSV, you can directly use the
> ImportTSV utility of HBase to do a bulk load.
> More details about that can be found here :
>
> http://hbase.apache.org/book/ops_mgt.html#importtsv
>
> The other option for you is to run a MR job on the file that you have, to
> generate the HFiles, which you can later import
> to HBase using completebulkload.  HFiles are created using the
> HFileOutputFormat class.The output of Map should
> be Put or KeyValue. For Reduce you need to use configureIncrementalLoad
> which sets up reduce tasks.
>
> Bijeet
>
>
> On Sat, Aug 4, 2012 at 8:13 AM, Rita  wrote:
>
> > I have a file which has 13 billion rows of key an value which I would
> like
> > to place in Hbase. I was wondering if anyone has a good example to
> provide
> > with mapreduce for some sort of work like this.
> >
> >
> > tia
> >
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
> >
>



-- 
Thanks & Regards,
Anil Gupta