Re: Please congratulate our new PMC Chair Misty Stanley-Jones

2017-09-25 Thread Gary Helmling
Congrats, Misty, and thanks for all your efforts!

On Fri, Sep 22, 2017 at 3:57 PM Umesh Agashe  wrote:

> Congratulations Misty!
>
>
>
> On Fri, Sep 22, 2017 at 11:41 AM, Esteban Gutierrez 
> wrote:
>
> > Thats awesome! Congratulations, Misty!
> >
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Fri, Sep 22, 2017 at 11:27 AM, Alexander Leblang <
> > alex.lebl...@cloudera.com> wrote:
> >
> > > Congrats Misty! Well done!
> > > On Fri, Sep 22, 2017 at 11:25 AM Wei-Chiu Chuang 
> > > wrote:
> > >
> > > > Congrats! Misty!!
> > > >
> > > > On Fri, Sep 22, 2017 at 7:50 AM, Jimmy Xiang 
> wrote:
> > > >
> > > > > Congrats! Misty!!
> > > > >
> > > > > On Fri, Sep 22, 2017 at 7:16 AM, Pankaj kr 
> > > wrote:
> > > > >
> > > > > > Congratulations Misty..!! :)
> > > > > >
> > > > > >
> > > > > > -Pankaj-
> > > > > >
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Andrew Purtell [mailto:apurt...@apache.org]
> > > > > > Sent: Friday, September 22, 2017 3:08 AM
> > > > > > To: d...@hbase.apache.org; user@hbase.apache.org
> > > > > > Subject: Please congratulate our new PMC Chair Misty
> Stanley-Jones
> > > > > >
> > > > > > At today's meeting of the Board, Special Resolution B changing
> the
> > > > HBase
> > > > > > project Chair to Misty Stanley-Jones was passed unanimously.
> > > > > >
> > > > > > Please join me in congratulating Misty on her new role!
> > > > > >
> > > > > > (If you need any help or advice please don't hesitate to ping
> me,
> > > > Misty,
> > > > > > but I suspect you'll do just fine and won't need it.)
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > A very happy Hadoop contributor
> > > >
> > >
> >
>


Re: Thrift server kerberos ticket refresh

2017-06-26 Thread Gary Helmling
A relogin from the keytab will happen in
RpcClientImpl.Connection.handleSaslConnectionFailure().  So if the Thrift
server fails to establish a connection to a regionserver to relay a client
request, it should perform a relogin from the configured keytab.  This is a
bit indirect though, and there may be a window where your credentials can
expire if you are trying to use kerberos to authenticate Thrift clients and
don't have any requests coming in.

Using something like the AuthUtil.getAuthChore() method would work, though the
current implementation is hard-coded to the
"hbase.client.(keytab.file|kerberos.principal)" configuration keys, so it would
need to be extended to allow plugging in the config keys for the Thrift server.

Alternately, I provided a patch to
https://issues.apache.org/jira/browse/HADOOP-9567 to have
UserGroupInformation launch a background renewal thread for keytab based
logins, but that doesn't seem to be gaining any traction.
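In the meantime, a background relogin task is easy enough to run yourself.
Below is a rough sketch only (not code from the Thrift server), assuming a
keytab-based UserGroupInformation login; the class name and the one-minute
check interval are arbitrary:

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabReloginTask {
  // Starts a task that keeps the keytab-based login fresh.
  public static void start() {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    exec.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        try {
          // No-op unless the TGT is close to expiring.
          UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
        } catch (IOException e) {
          // log and retry on the next tick
        }
      }
    }, 1, 1, TimeUnit.MINUTES);  // interval is illustrative
  }
}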

On Sun, Jun 25, 2017 at 10:35 PM Jerry He  wrote:

> Let's go to the JIRA ticket you opened.
> Please describe the problem more over there.  For example, give the
> exception or stack trace and where it comes from.
>
> Thanks,
>
> Jerry
>
> On Wed, Jun 21, 2017 at 12:15 AM, Steen Manniche 
> wrote:
> > I understand that the ticket renewal logic might be called indirectly
> > through some process/module that the thrift server is importing or
> > using, but after a thorough spelunking around the code-base, I was not
> > able to find any path to ticket renewal logic. Which is why I turned
> > to the list :)
> >
> > On Wed, Jun 21, 2017 at 5:06 AM, Jerry He  wrote:
> >> The right code can be hard to find and may not be even in the Thrift
> module.
> >>
> >> Did you encounter any problem, e.g. the Thrift server giving out errors
> due
> >> to expired Kerberos ticket?
> >>
> >> Thanks,
> >>
> >> Jerry
> >>
> >> On Tue, Jun 20, 2017 at 11:05 AM, Steen Manniche 
> wrote:
> >>
> >>> Hi Ted,
> >>>
> >>> thanks for the feedback. I created HBASE-18243
> >>>
> >>> Best regards,
> >>> Steen
> >>>
> >>> On Tue, Jun 20, 2017 at 5:03 PM, Ted Yu  wrote:
> >>> > I didn't find ticket renewal logic either.
> >>> >
> >>> > I think we can use facility similar to AuthUtil#getAuthChore().
> >>> >
> >>> > Mind logging a JIRA ?
> >>> >
> >>> > On Tue, Jun 20, 2017 at 4:17 AM, Steen Manniche 
> >>> wrote:
> >>> >
> >>> >> Hi all,
> >>> >>
> >>> >> I have been looking through the hbase-thrift code looking for where
> >>> >> the server performs renewals of kerberos tickets for the provided
> >>> >> principal/keytab. I cannot seem to find any trace of this?
> >>> >>
> >>> >> As an example, the hadoop-common provides the class
> >>> >> UserGroupInformation, which exposes the method
> >>> >> checkTGTAndReloginFromKeytab. I can see that the ThriftServerRunner
> >>> >> has a handle to the class
> >>> >> (https://github.com/apache/hbase/blob/master/hbase-
> >>> >> thrift/src/main/java/org/apache/hadoop/hbase/thrift/
> >>> >> ThriftServerRunner.java#L205),
> >>> >> but I do not see the ticket renewal logic being called anywhere. Am
> I
> >>> >> missing something about how this works?
> >>> >>
> >>> >>
> >>> >> Thanks for the time and best regards,
> >>> >> Steen
> >>> >>
> >>>
>


[ANNOUNCE] New Apache HBase committer Ashu Pachauri

2017-06-16 Thread Gary Helmling
On behalf of the Apache HBase PMC, I am pleased to announce that Ashu
Pachauri has accepted the PMC's invitation to become a committer on the
project.  We appreciate all of Ashu's generous contributions thus far and
look forward to his continued involvement.

Congratulations and welcome, Ashu!


Re: [DISCUSS] Status of the 0.98 release line

2017-04-10 Thread Gary Helmling
+1 to EOL, and thanks to Andrew for all of the RM'ing.

On Mon, Apr 10, 2017 at 12:27 PM Ted Yu  wrote:

> +1
>
> Andrew has done tremendous work.
>
> On Mon, Apr 10, 2017 at 12:17 PM, Mikhail Antonov 
> wrote:
>
> > +1 to EOL 0.98.
> >
> > Thanks Andrew for all the work maintaining it!
> >
> > -Mikhail
> >
> > On Mon, Apr 10, 2017 at 12:10 PM, Dima Spivak 
> > wrote:
> >
> > > +1
> > >
> > > -Dima
> > >
> > > On Mon, Apr 10, 2017 at 12:08 PM, Stack  wrote:
> > >
> > > > I agree we should EOL 0.98.
> > > > St.Ack
> > > >
> > > > On Mon, Apr 10, 2017 at 11:43 AM, Andrew Purtell <
> apurt...@apache.org>
> > > > wrote:
> > > >
> > > > > Please speak up if it is incorrect to interpret the lack of
> responses
> > > as
> > > > > indicating consensus on declaring 0.98 EOL.
> > > > >
> > > > > I believe we should declare 0.98 EOL.
> > > > >
> > > > >
> > > > > On Wed, Mar 29, 2017 at 6:56 AM, Sean Busbey 
> > > wrote:
> > > > >
> > > > > > Hi Folks!
> > > > > >
> > > > > > Back in January our Andrew Purtell stepped down as the release
> > > > > > manager for the 0.98 release line.
> > > > > >
> > > > > > On the resultant dev@hbase thread[1] folks seemed largely in
> favor
> > > of
> > > > > > declaring end-of-maintenance for the 0.98 line.
> > > > > >
> > > > > > Now that it's been a couple of months, does anyone have concerns
> > > about
> > > > > > pushing forward on that?
> > > > > >
> > > > > > Do folks who listen on user@hbase but not dev@hbase have any
> > > concerns?
> > > > > >
> > > > > > As with any end-of-maintenance branch, the PMC would consider on
> a
> > > > > > case-by-case basis doing a future release of the branch should a
> > > > > > critical security vulnerability show up.
> > > > > >
> > > > > >
> > > > > > [1]: https://s.apache.org/DjCi
> > > > > >
> > > > > > -busbey
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >- Andy
> > > > >
> > > > > If you are given a choice, you believe you have acted freely. -
> > Raymond
> > > > > Teller (via Peter Watts)
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Michael Antonov
> >
>


Re: HBase as a file repository

2017-04-05 Thread Gary Helmling
On Tue, Apr 4, 2017 at 11:00 AM Stack  wrote:

>
> What's the recommended approach to avoid or reduce the delay between when
> > HBase starts sending the response and when the application can act on it?
>
>
> As is, Cells are indivisible as are 'responses' when we promise a
> consistent view. Our implementation has us first realize the response in
> memory on the server-side before we ship it to the client. We do not have
> support for streaming responses (though this is an old request that has come
> up in many forms [1]). Until we have such support, there'll be this lag you
> describe whether MOB or not.
>
>
As Stack points out, this is the reason you're seeing higher
time-to-first-byte on the client side with larger files.  We don't stream
the response within a cell -- the full cell value is being shipped to the
client in a single response.

To improve this, you could try chunking files larger than some threshold
(1MB?) across multiple columns in the row that is stored on the server
side.  You would need to write an abstraction for this on the client side.
The columns could be named with an incrementing counter (fixed-width or
zero-padded so the names sort numerically past chunk 9), which will give the
chunks back to you in the right order:

row:cf:1 -> first 1MB
row:cf:2 -> second 1MB

etc.


Then when reading the row back, instead of performing a get, perform a scan
on that single row.  If you call:

Scan.setMaxResultSize(chunkSize)
Scan.setAllowPartialResults(true)

then the server will send back individual responses when the max result
size is exceeded, which will allow your client to see the column chunks in
individual calls to ResultScanner.next().  The downside is that you will
have more round trips to the server, so you should also look at total
response time. This may help you to implement a pseudo-streaming interface
back to the client.

You may have to play with the right chunk size and max result size values
to use, but this is the way that I would approach large file storage.
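For what it's worth, here is a rough sketch of the read side of such an
abstraction, assuming the 1.x client API; the table name, column family, and
chunk size are placeholders, and error handling is omitted:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ChunkedFileReader {
  private static final long CHUNK_SIZE = 1024 * 1024;        // placeholder chunk size
  private static final byte[] FAMILY = Bytes.toBytes("cf");  // placeholder family

  // Reassembles a file that was written as fixed-width chunk columns in one row.
  public static byte[] readFile(Connection conn, byte[] row) throws IOException {
    // Single-row scan instead of a get (the stop row is exclusive).
    Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));
    scan.addFamily(FAMILY);
    scan.setMaxResultSize(CHUNK_SIZE);    // server returns a response per ~chunk
    scan.setAllowPartialResults(true);    // allow a row to span multiple Results
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (Table table = conn.getTable(TableName.valueOf("files"));  // placeholder table
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result partial : scanner) {
        for (Cell cell : partial.rawCells()) {
          out.write(CellUtil.cloneValue(cell));  // chunks arrive in column order
        }
      }
    }
    return out.toByteArray();
  }
}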


Re: On signalling clients from coprocessors

2017-01-30 Thread Gary Helmling
>
> A jira sounds like a good idea.  Even if this is buried somewhere, it's
> clearly not prominent enough.
>
>
+1.  Clarifying this in the javadoc and reference guide seems like a good
idea.


Re: On signalling clients from coprocessors

2017-01-27 Thread Gary Helmling
Did you try throwing CoprocessorException or making your custom exception a
subclass of it?  These should be carried through to the client.

Yes, for exceptions outside of this hierarchy, there is no way to know if
the exception is recoverable or not, so the safe route is chosen and either
the regionserver will abort or the coprocessor will be unloaded, depending
on configuration.
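For reference, a minimal sketch of that pattern, assuming the 1.x
RegionObserver API; the observer class, exception type, and check logic are
made up for illustration:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class SanityCheckObserver extends BaseRegionObserver {

  // Hypothetical exception type; subclassing CoprocessorException lets the
  // rejection propagate to the client instead of being treated as a
  // coprocessor failure on the regionserver.
  public static class RejectedPutException extends CoprocessorException {
    public RejectedPutException(String msg) {
      super(msg);
    }
  }

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put,
      WALEdit edit, Durability durability) throws IOException {
    if (!passesSanityChecks(put)) {       // your validation logic
      throw new RejectedPutException("Put rejected by sanity checks");
    }
  }

  private boolean passesSanityChecks(Put put) {
    return true;  // placeholder
  }
}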

On Fri, Jan 27, 2017 at 12:55 AM Steen Manniche  wrote:

> We have been using coprocessors to create secondary indexes as well as
> doing some sanity checks on incoming data.
>
> Especially in the last case, we planned to use the prePut event to reject
> Puts to hbase that did not pass our sanity checks for incoming data. I had
> expected being able to signal the client ingesting data into hbase via
> throwing a custom exception, but learned that the regionserver the
> coprocessor is installed on basically only have two options in the face of
> a coprocessor throwing an exception: unloading the coprocessor or shutting
> down.
>
> This behaviour surprises me since coprocessors almost by definition contain
> application logic, and exceptions are a standard way of signalling in
> application logic. Are there any plans to make the signal handling work
> better wrt. clients in coprocessors? Or have I simply misunderstood the
> intention of coprocessors here?
>
> Thanks in advance and best regards,
> Steen Manniche
>


Re: [DISCUSS] EOL 1.1 Release Branch

2016-11-07 Thread Gary Helmling
>
> I'm not deeply familiar with the AssignmentManager. I see when we process
> split rollbacks in onRegionSplit() we only call regionOffline() on
> daughters if they are known to exist. However when processing merge
> rollbacks in the else case of onRegionMerge() we unconditionally call
> regionOffline() on the parent-being-merged. Shouldn't that likewise be
> conditional on regionStates holding a state for the parent-being-merged?
> Pardon if I've missed something.
>
>
> I'm really not familiar with the merge code, but this seems plausible to
> me.  I see that onRegionSplit() has an early out at the top of the method,
> but that will fail to evaluate if rs_a and rs_b are open and rs_p is null.
> So if it's called with a code of MERGE_REVERTED, I think we could wind up
> creating an offline meta entry for rs_p with no regioninfo, similar to
> HBASE-16093.  And that entry could wind up hiding the (still online)
> daughter regions.
>

s/onRegionSplit()/onRegionMerge()/ in that comment.


Re: [DISCUSS] EOL 1.1 Release Branch

2016-11-07 Thread Gary Helmling
>
> I'm not deeply familiar with the AssignmentManager. I see when we process
> split rollbacks in onRegionSplit() we only call regionOffline() on
> daughters if they are known to exist. However when processing merge
> rollbacks in the else case of onRegionMerge() we unconditionally call
> regionOffline() on the parent-being-merged. Shouldn't that likewise be
> conditional on regionStates holding a state for the parent-being-merged?
> Pardon if I've missed something.
>
>
I'm really not familiar with the merge code, but this seems plausible to
me.  I see that onRegionSplit() has an early out at the top of the method,
but that will fail to evaluate if rs_a and rs_b are open and rs_p is null.
So if it's called with a code of MERGE_REVERTED, I think we could wind up
creating an offline meta entry for rs_p with no regioninfo, similar to
HBASE-16093.  And that entry could wind up hiding the (still online)
daughter regions.


Re: [DISCUSS] EOL 1.1 Release Branch

2016-11-04 Thread Gary Helmling
>
> The behavior: Looks like failed split/compaction rollback: row(s) in META
> without HRegionInfo, regions deployed without valid meta entries (at
> first), regions on HDFS without valid meta entries (later, after RS
> carrying them are killed by chaos), holes in the region chain leading to
> timeouts and job failure.
>
>
The empty regioninfo in meta sounds like HBASE-16093, though that fix is in
1.2.  Interested to see if there are other problems around splits though.
Do you have a JIRA yet for tracking?


>
> You'll know you have found it when on the ITBLL console its meta scanner
> starts complaining about rows in meta without serialized HRegionInfo.
>
>
Will keep an eye out for this in our ITBLL runs here.


Re: [ANNOUNCE] Mikhail Antonov joins the Apache HBase PMC

2016-05-26 Thread Gary Helmling
Welcome Mikhail!

On Thu, May 26, 2016 at 11:47 AM Ted Yu  wrote:

> Congratulations, Mikhail !
>
> On Thu, May 26, 2016 at 11:30 AM, Andrew Purtell 
> wrote:
>
> > On behalf of the Apache HBase PMC I am pleased to announce that Mikhail
> > Antonov has accepted our invitation to become a PMC member on the Apache
> > HBase project. Mikhail has been an active contributor in many areas,
> > including recently taking on the Release Manager role for the upcoming
> > 1.3.x code line. Please join me in thanking Mikhail for his contributions
> > to date and anticipation of many more contributions.
> >
> > Welcome to the PMC, Mikhail!
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: Is it safe to set "hbase.coprocessor.abortonerror" to false in a production environment?

2015-04-30 Thread Gary Helmling
The effect of setting this to false is that, if any of your coprocessors
throw unexpected exceptions, instead of aborting, the region server will
log an error and remove the coprocessor from the list of loaded
coprocessors on the region / region server / master.

This allows HBase to continue running, but whether or not this is what you
want depends largely on what your coprocessor is doing.  If your
coprocessor is providing an essential service, such as access control, then
simply unloading the coprocessor compromises that service, in this case
security, which may be worse than simply failing fast.  Imagine a security
exploit where you can trigger an error in the security coprocessor and then
future requests can access any data with no access control being applied.
Similarly, if your coprocessor is transforming data that is being written
to a table (say updating secondary indexes), then unloading the coprocessor
on an error would remove it from the write path of any future requests,
allowing your data to become inconsistent.  Depending on what data you are
storing and how it is being used, this may be a worse outcome than simply
failing fast.

Since HBase cannot know how critical these situations are to you, and since
coprocessors are a server side extension mechanism, HBase makes the
conservative choice and defaults to failing fast in the face of coprocessor
errors.

The "hbase.coprocessor.abortonerror" configuration certainly works in
allowing HBase to continue running, but whether or not it is "safe" to use
in a given situation depends on your use of HBase and coprocessors and
understanding the consequences of the scenarios I outlined above.
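For completeness, the override itself is just the property named above set in
hbase-site.xml on the servers running your coprocessors:

  <property>
    <name>hbase.coprocessor.abortonerror</name>
    <value>false</value>
  </property>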


On Thu, Apr 30, 2015 at 8:04 AM 姚驰  wrote:

>   Hello, everyone. I'm new to coprocessors and I found that all
> regionservers would abort when I updated a broken coprocessor. To avoid
> this in a production environment,
> should I set "hbase.coprocessor.abortonerror" to false? I wonder if this
> option will have any bad effects on my HBase service; please tell me if
> there are, thanks very much.


Re: Please welcome new HBase committer Jing Chen (Jerry) He

2015-04-01 Thread Gary Helmling
Welcome, Jerry!


On Wed, Apr 1, 2015 at 12:04 PM Richard Ding  wrote:

> Congratulations!
>
> On Wed, Apr 1, 2015 at 11:28 AM, Demai Ni  wrote:
>
> > Jerry, congratulations! well deserved
> >
> > On Wed, Apr 1, 2015 at 11:23 AM, Esteban Gutierrez  >
> > wrote:
> >
> > > Congrats, Jerry!
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > > On Wed, Apr 1, 2015 at 10:55 AM, Ted Yu  wrote:
> > >
> > > > Congratulations, Jerry.
> > > >
> > > > On Wed, Apr 1, 2015 at 10:53 AM, Andrew Purtell  >
> > > > wrote:
> > > >
> > > > > On behalf of the Apache HBase PMC, I am pleased to announce that
> > Jerry
> > > He
> > > > > has accepted the PMC's invitation to become a committer on the
> > project.
> > > > We
> > > > > appreciate all of Jerry's hard work and generous contributions thus
> > > far,
> > > > > and look forward to his continued involvement.
> > > > >
> > > > > Congratulations and welcome, Jerry!
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > > Best regards,
> > > > >
> > > > >- Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > Hein
> > > > > (via Tom White)
> > > > >
> > > >
> > >
> >
>


Re: Timerange scan

2015-03-02 Thread Gary Helmling
Proving it to yourself is sometimes the hardest part!

On Mon, Mar 2, 2015 at 2:11 PM Nick Dimiduk  wrote:

> Gary to the rescue! Does it still count as being right even if you cannot
> prove it for yourself? ;)
>
> On Mon, Mar 2, 2015 at 2:06 PM, Gary Helmling  wrote:
>
> > >
> > > Sorry Kristoffer, but I believe my previous statement was mistaken. I
> > > cannot find a location where the timestamp is taken into account at the
> > > StoreFile level. I thought the above statement about metadata from the
> > > HFile headers was correct, but I cannot locate the code that takes such
> > > information into consideration.
> > >
> >
> > I believe the filtering happens in StoreScanner.selectScannersFrom(),
> which
> > calls StoreFileScanner.shouldUseScanner() for each store file.  See also
> > StoreFile.passesTimerangeFilter(), which does the check that the Scan's
> > time range is included in the time range from the store file metadata.
> >
> > So store files which fall completely outside of the Scan's min/max
> > timestamps should be excluded.
> >
>


Re: Timerange scan

2015-03-02 Thread Gary Helmling
>
> Sorry Kristoffer, but I believe my previous statement was mistaken. I
> cannot find a location where the timestamp is taken into account at the
> StoreFile level. I thought the above statement about metadata from the
> HFile headers was correct, but I cannot locate the code that takes such
> information into consideration.
>

I believe the filtering happens in StoreScanner.selectScannersFrom(), which
calls StoreFileScanner.shouldUseScanner() for each store file.  See also
StoreFile.passesTimerangeFilter(), which does the check that the Scan's
time range is included in the time range from the store file metadata.

So store files which fall completely outside of the Scan's min/max
timestamps should be excluded.
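For reference, the time range is specified on the Scan itself; a minimal
sketch (the column family name is a placeholder):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanExample {
  public static Scan buildScan(long minStamp, long maxStamp) throws IOException {
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));   // placeholder family
    // [minStamp, maxStamp) in milliseconds; store files whose metadata time
    // range falls entirely outside this window can be skipped server-side.
    scan.setTimeRange(minStamp, maxStamp);
    return scan;
  }
}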


Re: [ANNOUNCE] Apache HBase 1.0.0 is now available for download

2015-02-24 Thread Gary Helmling
Fantastic work!  Congrats everyone!

On Tue Feb 24 2015 at 9:45:24 AM Esteban Gutierrez 
wrote:

> Wow! Congrats, all!
>
> --
> Cloudera, Inc.
>
>
> On Tue, Feb 24, 2015 at 9:41 AM, Jerry He  wrote:
>
> > Congratulations on the milestone!
> >
>


Re: 1 table, 1 dense CF => N tables, 1 dense CF ?

2015-01-09 Thread Gary Helmling
ScanType is a parameter of RegionObserver preCompact() and
preCompactScannerOpen().  It seems like anything we are explicitly
providing to coprocessor hooks should be LimitedPrivate.

On Fri, Jan 9, 2015 at 12:26 PM, Ted Yu  wrote:

> w.r.t. ScanType, here is the logic used by DefaultCompactor:
>
> ScanType scanType =
>
> request.isAllFiles() ? ScanType.COMPACT_DROP_DELETES :
> ScanType.
> COMPACT_RETAIN_DELETES;
>
> BTW ScanType is currently marked InterfaceAudience.Private
>
> Should it be marked LimitedPrivate ?
>
> Cheers
>
> On Fri, Jan 9, 2015 at 12:19 PM, Gary Helmling 
> wrote:
>
> > >
> > >
> > > 2) is more expensive than 1).
> > > I'm wondering if we could use Compaction Coprocessor for 2)?  HBaseHUT
> > > needs to be able to grab N rows and merge them into 1, delete those N
> > rows,
> > > and just write that 1 new row.  This N could be several thousand rows.
> > > Could Compaction Coprocessor really be used for that?
> > >
> > >
> > It would depend on the details.  If you're simply aggregating the data
> into
> > one row, and:
> > * the thousands of rows are contiguous in the scan
> > * you can somehow incrementally update or emit the new row that you want
> to
> > create so that you don't need to retain all the old rows in memory
> > * the new row you want to emit would sort sequentially into the same
> > position
> >
> > Then overriding the scanner used for compaction could be a good solution.
> > This would allow you to transform the cells emitted during compaction,
> > including dropping the cells from the old rows and emitting new
> > (transformed) cells for the new row.
> >
> >
> > > Also, would that come into play during minor or major compactions or
> > both?
> > >
> > >
> > You can distinguish between them in your coprocessor hooks based on
> > ScanType.  So up to you.
> >
>


Re: 1 table, 1 dense CF => N tables, 1 dense CF ?

2015-01-09 Thread Gary Helmling
>
>
> 2) is more expensive than 1).
> I'm wondering if we could use Compaction Coprocessor for 2)?  HBaseHUT
> needs to be able to grab N rows and merge them into 1, delete those N rows,
> and just write that 1 new row.  This N could be several thousand rows.
> Could Compaction Coprocessor really be used for that?
>
>
It would depend on the details.  If you're simply aggregating the data into
one row, and:
* the thousands of rows are contiguous in the scan
* you can somehow incrementally update or emit the new row that you want to
create so that you don't need to retain all the old rows in memory
* the new row you want to emit would sort sequentially into the same
position

Then overriding the scanner used for compaction could be a good solution.
This would allow you to transform the cells emitted during compaction,
including dropping the cells from the old rows and emitting new
(transformed) cells for the new row.


> Also, would that come into play during minor or major compactions or both?
>
>
You can distinguish between them in your coprocessor hooks based on
ScanType.  So up to you.
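To make that concrete, a rough sketch of the hook, assuming the 0.98-era
RegionObserver API mentioned earlier in this thread; the actual row-merging
scanner is elided and the pass-through shown is only a placeholder:

import java.io.IOException;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;

public class MergeOnCompactObserver extends BaseRegionObserver {

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
      Store store, InternalScanner scanner, ScanType scanType) throws IOException {
    if (scanType == ScanType.COMPACT_DROP_DELETES) {
      // Major compaction (deletes are dropped): this is where you would return
      // a wrapping InternalScanner that merges the N source rows into one and
      // emits the combined cells in sorted order.  Pass-through shown here.
      return scanner;
    }
    // Minor compaction: leave the data untouched.
    return scanner;
  }
}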


Re: Programmatic HBase version detection/extraction

2014-11-12 Thread Gary Helmling
Yes, you can use the org.apache.hadoop.hbase.util.VersionInfo class.

From Java code, you can use VersionInfo.getVersion().  From shell
scripts, you can just run "hbase version" and parse the output.
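A minimal example of the Java route:

import org.apache.hadoop.hbase.util.VersionInfo;

public class PrintHBaseVersion {
  public static void main(String[] args) {
    // Prints the version string of the HBase client library on the classpath.
    System.out.println(VersionInfo.getVersion());
  }
}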

On Wed, Nov 12, 2014 at 1:37 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> Is there a way to detect which version of HBase one is running?
> Is there an API for that, or a constant with this value, or maybe an MBean
> or some other way to get to this info?
>
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/


Re: How to deploy coprocessor via HDFS

2014-10-27 Thread Gary Helmling
If you are configuring the coprocessor via
hbase.coprocessor.region.classes, then it is a region endpoint.

For the moment, only table-configured coprocessors support loading
from a jar file in HDFS. Coprocessors configured in hbase-site.xml
need to be resolvable on the regionserver's classpath.

I can't say exactly why you're only getting the ClassNotFoundException
when invoking the endpoint.  Is the class that you need packaged in
your jar file on HDFS?

On Mon, Oct 27, 2014 at 4:00 PM, Tom Brown  wrote:
> I tried to attach the coprocessor directly to a table, and it is able to
> load the coprocessor class. Unfortunately, when I try and use the
> coprocessor I get a ClassNotFoundException on one of the supporting classes
> required by the coprocessor.
>
> It's almost as if the ClassLoader used to load the coprocessor initially is
> not in use when the coprocessor is actually invoked.
>
> --Tom
>
> On Mon, Oct 27, 2014 at 3:42 PM, Tom Brown  wrote:
>
>> I'm not sure how to tell if it is a region endpoint or a region server
>> endpoint.
>>
>> I have not had to explicitly associate the coprocessor with the table
>> before (it is loaded via "hbase.coprocessor.region.classes" in
>> hbase-site.xml), so it might be a region server endpoint. However, the
>> coprocessor code knows to which table the request applies, so it might be a
>> region endpoint.
>>
>> If it helps, this is a 0.94.x cluster (and upgrading isn't doable right
>> now).
>>
>> Can both types of endpoint be loaded from HDFS, or just the table-based
>> one?
>>
>> --Tom
>>
>> On Mon, Oct 27, 2014 at 3:31 PM, Gary Helmling 
>> wrote:
>>
>>> Hi Tom,
>>>
>>> First off, are you talking about a region endpoint (vs. master
>>> endpoint or region server endpoint)?
>>>
>>> As long as you are talking about a region endpoint, the endpoint
>>> coprocessor can be configured as a table coprocessor, the same as a
>>> RegionObserver.  You can see an example and description in the HBase
>>> guide: http://hbase.apache.org/book/ch13s03.html
>>>
>>> From the HBase shell:
>>>
>>>   hbase> alter 't1',
>>>
>>> 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
>>>
>>> The arguments are: HDFS path, classname, priority, key=value
>>> parameters.  Arguments are separated by a '|' character.
>>>
>>> Using this configuration, your endpoint class should be loaded from
>>> the jar file in HDFS.  If it's not loaded, you can check the
>>> regionserver log of any of the servers hosting the table's regions.
>>> Just search for your endpoint classname and you should find an error
>>> message of what went wrong.
>>>
>>>
>>>
>>> On Mon, Oct 27, 2014 at 2:03 PM, Tom Brown  wrote:
>>> > Is it possible to deploy an endpoint coprocessor via HDFS or must I
>>> > distribute the jar file to each regionserver individually?
>>> >
>>> > In my testing, it appears the endpoint coprocessors cannot be loaded
>>> from
>>> > HDFS, though I'm not at all sure I'm doing it right (are delimiters ":"
>>> or
>>> > "|", when I use "hdfs:///" does that map to the root hdfs path or the
>>> hbase
>>> > hdfs path, etc).
>>> >
>>> > I have attempted to google this, and have not found any clear answer.
>>> >
>>> > Thanks in advance!
>>> >
>>> > --Tom
>>>
>>
>>


Re: How to deploy coprocessor via HDFS

2014-10-27 Thread Gary Helmling
Hi Tom,

First off, are you talking about a region endpoint (vs. master
endpoint or region server endpoint)?

As long as you are talking about a region endpoint, the endpoint
coprocessor can be configured as a table coprocessor, the same as a
RegionObserver.  You can see an example and description in the HBase
guide: http://hbase.apache.org/book/ch13s03.html

From the HBase shell:

  hbase> alter 't1',

'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'

The arguments are: HDFS path, classname, priority, key=value
parameters.  Arguments are separated by a '|' character.

Using this configuration, your endpoint class should be loaded from
the jar file in HDFS.  If it's not loaded, you can check the
regionserver log of any of the servers hosting the table's regions.
Just search for your endpoint classname and you should find an error
message of what went wrong.



On Mon, Oct 27, 2014 at 2:03 PM, Tom Brown  wrote:
> Is it possible to deploy an endpoint coprocessor via HDFS or must I
> distribute the jar file to each regionserver individually?
>
> In my testing, it appears the endpoint coprocessors cannot be loaded from
> HDFS, though I'm not at all sure I'm doing it right (are delimiters ":" or
> "|", when I use "hdfs:///" does that map to the root hdfs path or the hbase
> hdfs path, etc).
>
> I have attempted to google this, and have not found any clear answer.
>
> Thanks in advance!
>
> --Tom


[ANNOUNCE] Tephra 0.3.0 Release

2014-09-26 Thread Gary Helmling
Hi all,

I'm happy to announce the 0.3.0 release of Tephra.

This release is a renaming of the project from Continuuity Tephra to
Cask Tephra, and includes the following changes:

* All packages have changed from com.continuuity.tephra to co.cask.tephra
* The Maven group ID has changed from com.continuuity.tephra to co.cask.tephra
* The github repository has moved to https://github.com/caskdata/tephra

If you have a current clone of the Tephra repository, please be sure
to re-clone from https://github.com/caskdata/tephra or to update your
git remote URL.

If you are currently using Tephra as a Maven dependency in any
project, please make note of the change to the groupId.  You will need
to update your dependency settings to something like the following:


<dependency>
  <groupId>co.cask.tephra</groupId>
  <artifactId>tephra-api</artifactId>
  <version>0.3.0</version>
</dependency>
<dependency>
  <groupId>co.cask.tephra</groupId>
  <artifactId>tephra-core</artifactId>
  <version>0.3.0</version>
</dependency>
<dependency>
  <groupId>co.cask.tephra</groupId>
  <artifactId>tephra-hbase-compat-0.98</artifactId>
  <version>0.3.0</version>
</dependency>


Release artifacts are available for download from:
https://github.com/caskdata/tephra/releases/tag/v0.3.0

For any questions or to get involved, please email the Tephra mailing
list at: tephra-...@googlegroups.com


Re: Performance oddity between AWS instance sizes

2014-09-18 Thread Gary Helmling
What do you have HBASE_HEAPSIZE set to in hbase-env.sh?  Is it
possible that you're overcommitting memory and the instance is
swapping?  Just a shot in the dark, but I see that the m3.2xlarge
instance has 30G of memory vs. 15G for c3.2xlarge.

On Wed, Sep 17, 2014 at 3:28 PM, Ted Yu  wrote:
> bq. there's almost no activity on either side
>
> During this period, can you capture stack trace for the region server and
> pastebin the stack ?
>
> Cheers
>
> On Wed, Sep 17, 2014 at 3:21 PM, Josh Williams 
> wrote:
>
>> Hi, everyone.  Here's a strange one, at least to me.
>>
>> I'm doing some performance profiling, and as a rudimentary test I've
>> been using YCSB to drive HBase (originally 0.98.3, recently updated to
>> 0.98.6.)  The problem happens on a few different instance sizes, but
>> this is probably the closest comparison...
>>
>> On m3.2xlarge instances, works as expected.
>> On c3.2xlarge instances, HBase barely responds at all during workloads
>> that involve read activity, falling silent for ~62 second intervals,
>> with the YCSB throughput output resembling:
>>
>>  0 sec: 0 operations;
>>  2 sec: 918 operations; 459 current ops/sec; [UPDATE
>> AverageLatency(us)=1252778.39] [READ AverageLatency(us)=1034496.26]
>>  4 sec: 918 operations; 0 current ops/sec;
>>  6 sec: 918 operations; 0 current ops/sec;
>> 
>>  62 sec: 918 operations; 0 current ops/sec;
>>  64 sec: 5302 operations; 2192 current ops/sec; [UPDATE
>> AverageLatency(us)=7715321.77] [READ AverageLatency(us)=7117905.56]
>>  66 sec: 5302 operations; 0 current ops/sec;
>>  68 sec: 5302 operations; 0 current ops/sec;
>> (And so on...)
>>
>> While that happens there's almost no activity on either side, the CPU's
>> and disks are idle, no iowait at all.
>>
>> There isn't much that jumps out at me when digging through the Hadoop
>> and HBase logs, except that those 62-second intervals are often (but
>> note always) associated with ClosedChannelExceptions in the regionserver
>> logs.  But I believe that's just HBase finding that a TCP connection it
>> wants to reply on had been closed.
>>
>> As far as I've seen this happens every time on this or any of the larger
>> c3 class of instances, surprisingly.  The m3 instance class sizes all
>> seem to work fine.  These are built with a custom AMI that has HBase and
>> all installed, and run via a script, so the different instance type
>> should be the only difference between them.
>>
>> Anyone seen anything like this?  Any pointers as to what I could look at
>> to help diagnose this odd problem?  Could there be something I'm
>> overlooking in the logs?
>>
>> Thanks!
>>
>> -- Josh
>>
>>
>>


Re: adding new tokens to existing Hconnection instances

2014-09-10 Thread Gary Helmling
Authentication is only performed during RPC connection setup.  So
there isn't really a concept of token expiration for an existing RPC
connection.  The connection will be authenticated (will not expire)
for as long as it's held open.  When it's closed and re-opened, it
should pick up the latest tokens associated with the UGI.  So I think
this should work as expected, as long as you are adding the new
tokens to the existing UGI.

By the way, when testing this, you can set the value for
"hbase.auth.token.max.lifetime" to a smaller value (say 360 for
one hour) in your HBase configuration.  This would make it easier to
manually test riding over a token expiration.
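As a sketch, that test override is just the property above with a smaller
value in hbase-site.xml (one hour shown, matching the example):

  <property>
    <name>hbase.auth.token.max.lifetime</name>
    <value>3600000</value>
  </property>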

On Wed, Sep 10, 2014 at 11:06 AM, Parth Brahmbhatt
 wrote:
> Hi,
>
> The short question:
> Is there any way to update delegation tokens of an existing active 
> HConnection instance?
>
> Long story:
> This is a follow up to http://osdir.com/ml/general/2014-08/msg27210.html. To 
> recap storm is trying to get delegation tokens from Hbase on behalf of a user 
> who is trying to run a storm topology  and then distribute these tokens to 
> all the worker that would run the user topology. I was able to get delegation 
> tokens using TokenUtil.obtainAndCacheToken(hbaseConf, proxyUser) and then 
> read the token from the user credentials. I was hoping on worker host the 
> user code will just add these tokens to the User’s subject object and then 
> call createConnection(Configuration conf, User user).
>
> This seem to work fine until the token expires. Because Hbase do not support 
> token renewal , we have a renewal scheme where master just asks for new 
> tokens at regular interval and then pushes it to worker which again adds it 
> to ugi’s subject.
>
> During the code review of above feature it was pointed out that HConnection 
> implementation only contacts the UGI during initial connection establishment 
> and then caches it. This means even if UGI is updated by adding new tokens 
> the connection will not see these changes and will end up using old expired 
> tokens. I could not actually verify the behavior because token expiry is 7 
> days(anyway to change this?) and my token.cancel() methods are failing.
>
> I looked at RPCClient and HConnectionImplementation, and they both seem to 
> have a user instance which is set to the user instance passed during 
> “createConnection" call.  The only place the token are accessed are during 
> construction of Connection objects in RPCClient. Have I missed something 
> obvious here or there is no other alternative when token expires other then 
> abandoning all objects and connections and recreating a Connection instance?
>
> Thanks
> Parth
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.


[ANNOUNCE] Tephra 0.2.1 release

2014-08-28 Thread Gary Helmling
Hi all,

I'm happy to announce the 0.2.1 release of Tephra.

Tephra provides globally consistent transactions on top of Apache
HBase by leveraging HBase's native data versioning to provide
multi-versioned concurrency control (MVCC) for transactional reads and
writes. With MVCC capability, each transaction sees its own consistent
"snapshot" of data, providing snapshot isolation of concurrent
transactions.


This release fixes the following issues:

* TransactionProcessor use of FilterList on flush and compaction causes abort
* Support coverage report generation
* TransactionProcessor should use TransactionVisibilityFilter on flush
and compact
* Cleanup class and package names
* Use Configuration instead of HBaseConfiguration when possible
* TransactionVisibilityFilter should support additional filtering on
visible cells
* Assigned transaction IDs should reflect current timestamp
* Remove using of HBase Bytes in non-HBase related classes in tephra-core

Please note that a number of the Tephra packages and classes have been
renamed for clarity.  Any existing code will need to be updated.

Binary and source distributions of the release are available at:
https://github.com/continuuity/tephra/releases/tag/v0.2.1

For any questions or to get involved, please email the Tephra mailing
list at: tephra-...@googlegroups.com


Re: getting delegation token for hbase

2014-08-15 Thread Gary Helmling
The default expiration for HBase delegation tokens is 7 days.  But of
course that could be overridden for a given deployment.


On Fri, Aug 15, 2014 at 1:51 PM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Thanks for the reply. Storm topologies are by design suppose to run for
> ever, The only advantage I can think of having a renewal mechanism is that
> instead of distributing the tokens to all workers every "expiration millis”
> the master just renews it. When they eventually expire (in HDFS’s case I
> think its 7 days) the storm master still has to get and push the new tokens
> but the renewal reduces the push frequency thus reducing some work load
> from master.
>
> That is not to say Hbase should implement renewal but in the absence of it
> I hope the expiration is relatively a larger number.
>
> Thanks
> Parth
>
> On Aug 15, 2014, at 1:35 PM, Gary Helmling  wrote:
>
> >>
> >>
> >> I don’t think we need to support older versions of HBase. However there
> is
> >> one thing that still bugs me. How does token renewal work here?
> Generally
> >> in HDFS I have seen that you have to pass in the renewer user as an
> >> argument when you obtain a token. Here as renew user is not passed I am
> >> guessing it’s either some hardcoded Hbase value, or its derived from the
> >> UGI.
> >>
> >
> > HBase doesn't really handle token renewal the way that, say, HDFS does.
> > With HBase the token is simply valid for a fixed period.  In HDFS, the NN
> > retains a map of all current tokens in memory and updates the expiration
> > for a given token when it is renewed, but this is still subject to a max
> > age, so that token still eventually expires.  In HBase, the
> authentication
> > performed with the token is distributed (all regionservers can
> authenticate
> > clients with the token), so keeping all tokens synchronized in memory on
> > all nodes would be difficult.  I also don't think supporting renewal
> would
> > add a great deal of value for this case.
> >
> > So for a truly long running process which could live beyond the token
> > lifetime, you need to have your "delegator", which obtains the initial
> > tokens, periodically obtain new tokens for the processes and make those
> > available to the processes that need them.  The same will also be true
> for
> > HDFS delegation tokens when your processes could run for longer than the
> > token max age (maximum renewal time).
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: getting delegation token for hbase

2014-08-15 Thread Gary Helmling
>
>
> I don’t think we need to support older versions of HBase. However there is
> one thing that still bugs me. How does token renewal work here? Generally
> in HDFS I have seen that you have to pass in the renewer user as an
> argument when you obtain a token. Here as renew user is not passed I am
> guessing it’s either some hardcoded Hbase value, or its derived from the
> UGI.
>

HBase doesn't really handle token renewal the way that, say, HDFS does.
With HBase the token is simply valid for a fixed period.  In HDFS, the NN
retains a map of all current tokens in memory and updates the expiration
for a given token when it is renewed, but this is still subject to a max
age, so that token still eventually expires.  In HBase, the authentication
performed with the token is distributed (all regionservers can authenticate
clients with the token), so keeping all tokens synchronized in memory on
all nodes would be difficult.  I also don't think supporting renewal would
add a great deal of value for this case.

So for a truly long running process which could live beyond the token
lifetime, you need to have your "delegator", which obtains the initial
tokens, periodically obtain new tokens for the processes and make those
available to the processes that need them.  The same will also be true for
HDFS delegation tokens when your processes could run for longer than the
token max age (maximum renewal time).


Re: getting delegation token for hbase

2014-08-15 Thread Gary Helmling
Hi Parth,

The code that you outline here would just return credentials containing
tokens that have already been obtained for the given user.

As I understand it, what you are trying to do is have Storm do secure
impersonation in order to obtain a delegation token on behalf of another
user, which the proceses running on worker nodes will be able to use to
authenticate to HBase as that user.  Is this correct?

If so, then the next question is what versions of HBase do you want to
support?  If you only need to support HBase 0.96+ and current versions of
0.94 (0.94.19+), then you can make use of the
org.apache.hadoop.hbase.security.token.TokenUtil class.  You can call
TokenUtil.obtainToken(Configuration) to obtain a delegation token for the
current user.  Or you can call TokenUtil.obtainAndCacheToken(Configuration,
UserGroupInformation) to obtain a token for a specific UGI and add it to
the UGI's credentials.

If you really need to support older versions of HBase 0.94 (pre 0.94.19),
then you will need to add some reflection around this, since old versions
of 0.94 did not include the security classes (including TokenUtil) by
default.  This is why the User class exposes its own obtainToken...()
methods to provide the reflection support.  However, I'd recommend that you
avoid this and just stick with current versions of HBase as described above.
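For the current-version path, a rough sketch of obtaining a token on behalf of
the submitter and caching it in that user's credentials; the class and method
names wrapping the TokenUtil call are illustrative only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.security.UserGroupInformation;

public class HBaseTokenFetcher {
  // Obtains an HBase delegation token on behalf of submitterUser and adds it
  // to that user's credentials, which can then be shipped to the workers.
  public static UserGroupInformation fetchTokenFor(String submitterUser) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    UserGroupInformation realUser = UserGroupInformation.getCurrentUser();
    UserGroupInformation proxyUser =
        UserGroupInformation.createProxyUser(submitterUser, realUser);
    // Adds the delegation token to proxyUser's credentials.
    TokenUtil.obtainAndCacheToken(conf, proxyUser);
    return proxyUser;
  }
}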

--gh



On Wed, Aug 13, 2014 at 12:36 PM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Hi,
>
> I am working on https://issues.apache.org/jira/browse/STORM-444. The task
> is very similar to https://issues.apache.org/jira/browse/OOZIE-961.
> Basically in storm secure mode we would like to fetch topology/job
> submitter user’s credentials on behalf of them on our master node and auto
> populate these credentials on worker nodes. However I noticed that the only
> allowed methods supported by User class requires either a jobConf or a
> combination of kind and service (not real sure what those are). We do not
> have any job configuration because the user is probably just trying to talk
> to Hbase outside of any  map reduce context. The questions I have are
>
> Is there any value in adding a user.getDelegationToken that just returns
> all the tokens?
> In absence of the above API, given User class is just a wrapper around the
> UserGroupInformation class should the following be sufficient?
> if(UserGroupInformation.isSecurityEnabled) {
>   Configuration hbaseConf = HBaseConfiguration.create();
>   UserGroupInformation.setConfiguration(hbaseConf);
>   UserGroupInformation ugi =
> UserGroupInformation.getCurrentUser();
>   UserGroupInformation proxyUser =
> UserGroupInformation.createProxyUser(topologyOrJobSubmitterUser, ugi);
>   User u = User.create(ugi);
>   if(u.isHBaseSecurityEnabled()) {
>  Credentials credentials=
> proxyUser.getCredentials();
>   }
> }
> return credentails;
>
> Appreciate the help.
>
> Thanks
> Parth
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


ANNOUNCE: Tephra for HBase transactions

2014-07-19 Thread Gary Helmling
Hi all,

I would like to introduce a new open source project, Continuuity Tephra,
which provides scalable, distributed transactions for Apache HBase.

Tephra provides "snapshot isolation" for concurrent transactions spanning
multiple regions, tables, and RPC calls.  A central transaction manager
provides globally unique, time-based transaction IDs and efficient conflict
detection.  Multiple transaction manager instances can be run to provide
automated failover that is transparent to clients. A simple client API,
including a drop-in replacement for HTable, makes it easy to work with
transactional operations.

Tephra was built to leverage core HBase features in order to operate
efficiently.  HBase cell versions provide multi-version concurrency control
for isolation.  Efficient transactional reads are performed using
server-side filtering, minimizing the amount of data scanned.  Coprocessors
hook into region flush and compaction operations to clean up data from
invalidated or no longer visible transactions.

Tephra is licensed under the Apache License, Version 2.0.

Tephra on Github (including docs):
https://github.com/continuuity/tephra

User and Developer Mailing List:
https://groups.google.com/d/forum/tephra-dev


Please take a look at Tephra and get involved to make it even better!


Gary Helmling


Re: Themis : implements cross-row/corss-table transaction on HBase.

2014-07-08 Thread Gary Helmling
Hi Jianwei,

You may also want to take a look at the generic client transaction API
being proposed in HBASE-11447:
https://issues.apache.org/jira/browse/HBASE-11447

I think it would be useful to have the Themis perspective there, and
whether the proposed API meets your needs and requirements.



On Tue, Jul 8, 2014 at 9:10 AM, Ted Yu  wrote:

> Jianwei:
> You may want to update the comment for ThemisScan :
>
> //a wrapper class of Put in HBase which not expose timestamp to user
> public class ThemisScan extends ThemisRead {
>
> Is there plan to support append / increment as part of the transaction ?
>
> Currently Themis depends on 0.94.11
> Is there plan to support 0.96+ releases ?
>
> Thanks
>
>
> On Tue, Jul 8, 2014 at 12:34 AM, 崔建伟  wrote:
>
> > Hi everyone, I want to introduce our open-source project Themis which
> > implements cross-row/corss-table transaction on HBase.
> >
> > Themis follows google's percolator algorithm(
> > http://research.google.com/pubs/pub36726.html), which provides
> > ACID-compliant transaction and snapshot isolation. The cross-row
> > transaction is based on HBase's single-row atomic semantics and doesn't
> use
> > a central transaction server, so that supports linear-scalability.
> >
> > Themis depends on a timestamp server to provides global strictly
> > incremental timestamp to define the order of transactions, which will be
> > used to resolve the write-write and read-write conflicts. The timestamp
> > server is lightweight and could achieve hight throughput(500, 000 + qps),
> > and Themis will batch timestamp requests across transactions in one Rpc,
> so
> > that it won't become the bottleneck of the system even when processing
> > billions of transactions every day.
> >
> > Although Themis could be implemented totally in client-side, we adopt
> > coprocessor framework of HBase to achieve higher performance. Themis
> > includes a client-side library to provides transaction APIs, such as
> > themisPut/themisGet/themisScan/themisDelete, and a coprocessor library
> > loaded on regionserver. Therefore, Themis could be used without changing
> > the code and logic of HBase.
> >
> > We have been validating the correctness of Themis for a few months by a
> > AccountTransfer simulation program, which concurrently does cross-row
> > transactions by transferring money among different accounts(each account
> is
> > a row in HBase) and verifies total money of all accounts doesn't change
> in
> > the simulation. We have also run Themis on our production environment.
> >
> > We test the performance of Themis and get comparable result as
> percolator.
> > The single-column transaction represents the worst performance case for
> > Themis compared with HBase, the result is:
> > 1) For read, the performance of percolator is 90% of HBase;
> > 2) For write, the performance of percolator is 23% of HBase.
> > The write performance drops a lot because Themis uses two-phase commit
> > protocol to achieve ACID of transaction. For multi-row write, we improve
> > the performance by paralleling all writes of pre-write phase. For
> > single-row write, we are optimizing two-phase commit protocol to achieve
> > better performance and will update the result when it is ready. The
> details
> > of performance result could be found in github.
> >
> > The repository and introduction of Themis include:
> > 1. Themis github: https://github.com/XiaoMi/themis/. The source code,
> > performance test result and user guide could be found here.
> > 2. Themis jira : https://issues.apache.org/jira/browse/HBASE-10999
> > 3. Chronos github: https://github.com/XiaoMi/chronos. Chronos is our
> > open-source high-availability, high-performance timestamp server to
> provide
> > global strictly incremental timestamp for Themis.
> >
> > If you find Themis interesting, please leave us comment in the mail, jira
> > or github.
> >
> > Best
> > cuijianwei
> >
>


Re: problem access security hbase

2014-07-01 Thread Gary Helmling
Hi Cheney,

If you are obtaining kerberos credentials outside of your program (ie.
kinit), then you can use k5start, which will run your program after
performing a kinit and has a variety of options to relogin periodically.

If you use UGI.loginFromKeytab(), then if you get an authentication failure
performing a remote connection, the HBase client will automatically try to
relogin from the keytab file.  So your program should not need to do anything
to explicitly refresh the kerberos TGT.
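For the keytab route, a minimal sketch (the static method is
UserGroupInformation.loginUserFromKeytab(); the principal and keytab path are
placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLogin {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Make sure UGI sees the secure settings (hadoop.security.authentication, etc).
    UserGroupInformation.setConfiguration(conf);
    // Log in once from the keytab; the HBase client will relogin from it
    // automatically if a connection later hits an authentication failure.
    UserGroupInformation.loginUserFromKeytab(
        "myclient@EXAMPLE.COM", "/etc/security/keytabs/myclient.keytab");
    // ... create HBase connections and do work as usual ...
  }
}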


On Tue, Jul 1, 2014 at 10:16 PM, anil gupta  wrote:

> Hi Cheney,
>
> If you are using a java client and using kinit way to login then i don't
> have much idea about handling long running clients.
> We run long running clients using UserGroupInformation to login to cluster.
> I dont know the very specifics but it think there is a kerberos setting
> where you can setup in such a way that Ticket auto-renews. We run this
> client ranging from 2-4 weeks without any problem of security. Hope this
> helps.
>
> Thanks,
> Anil Gupta
>
>
> On Tue, Jul 1, 2014 at 7:12 PM, Cheney Sun  wrote:
>
> > Thanks Gary, Anil.
> >
> > Add this statement 'UserGroupInformation.setConfiguration(hbaseConf);'
> can
> > resolve the problem.
> >
> > I'm using the kinit way to login KDC. But I wonder if I switch to calling
> > UserGroupInformation.loginFromKeytab() in code, does it need to be
> > called periodically for a long running program, since the TGT obtained
> from
> > KDC will expire?
> >
> > Thanks,
> > Cheney
> >
> >
> > On Wed, Jul 2, 2014 at 1:20 AM, Gary Helmling 
> wrote:
> >
> > > Hi Cheney,
> > >
> > > Did you obtain kerberos credentials before running your program, either
> > by
> > > calling kinit before running the program, or by calling
> > > UserGroupInformation.loginFromKeytab() in your code?
> > >
> > >
> > > On Tue, Jul 1, 2014 at 8:44 AM, Cheney Sun 
> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I have setup a security hbase/hdfs/zookeeper, which was confirmed and
> > > work
> > > > normally.
> > > > I wrote a Java program to get/put data to a table and package the
> > > > core-site.xml / hbase-site.xml (which are obtained from the secure
> > > cluster)
> > > > into the jar file, and it worked correctly.
> > > >
> > > > But when I removed the core-site.xml and hbase-site.xml from the jar,
> > and
> > > > instead, I use the Configuration API to set the relevant settings in
> > the
> > > > program as below,
> > > > Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);
> > > > hbaseConf.set("hbase.zookeeper.quorum","slave-nodex");
> > > > hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
> > > > hbaseConf.set("hbase.rpc.engine",
> > > > "org.apache.hadoop.hbase.ipc.SecureRpcEngine");
> > > > hbaseConf.set("hbase.security.authentication", "kerberos");
> > > > hbaseConf.set("hbase.master.kerberos.principal", "hbase/_
> > h...@hadoop.com
> > > > ");
> > > >
> > hbaseConf.set("hbase.master.keytab.file","/etc/hbase/conf/hbase.keytab");
> > > > hbaseConf.set("hbase.regionserver.kerberos.principal", "hbase/_
> > > > h...@hadoop.com ");
> > > >
> > > >
> > >
> >
> hbaseConf.set("hbase.regionserver.keytab.file","/etc/hbase/conf/hbase.keytab");
> > > > hbaseConf.set("hadoop.security.authentication", "kerberos");
> > > > hbaseConf.set("hadoop.security.authorization", "true");
> > > >
> > > > It failed getting authenticated to access to the hbase with the error
> > > > message as:
> > > > org.apache.hadoop.ipc.RemoteException: Authentication is required
> > > > at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021)
> > > > ~[test-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> > > >  at
> > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:164)
> > > > ~[test-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> > > > at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source) ~[na:na]
> > > >
> > > > It looks like the settings through API in code doesn't work. Is is a
> > > known
> > > > issue or am I wrong somewhere?
> > > >
> > > > Thanks,
> > > > Cheney
> > > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>


Re: problem access security hbase

2014-07-01 Thread Gary Helmling
Hi Cheney,

Did you obtain kerberos credentials before running your program, either by
calling kinit before running the program, or by calling
UserGroupInformation.loginFromKeytab() in your code?


On Tue, Jul 1, 2014 at 8:44 AM, Cheney Sun  wrote:

> Hello all,
>
> I have setup a security hbase/hdfs/zookeeper, which was confirmed and work
> normally.
> I wrote a Java program to get/put data to a table and package the
> core-site.xml / hbase-site.xml (which are obtained from the secure cluster)
> into the jar file, and it worked correctly.
>
> But when I removed the core-site.xml and hbase-site.xml from the jar, and
> instead, I use the Configuration API to set the relevant settings in the
> program as below,
> Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);
> hbaseConf.set("hbase.zookeeper.quorum","slave-nodex");
> hbaseConf.set("hbase.zookeeper.property.clientPort", "2181");
> hbaseConf.set("hbase.rpc.engine",
> "org.apache.hadoop.hbase.ipc.SecureRpcEngine");
> hbaseConf.set("hbase.security.authentication", "kerberos");
> hbaseConf.set("hbase.master.kerberos.principal", "hbase/_h...@hadoop.com
> ");
> hbaseConf.set("hbase.master.keytab.file","/etc/hbase/conf/hbase.keytab");
> hbaseConf.set("hbase.regionserver.kerberos.principal", "hbase/_
> h...@hadoop.com ");
>
> hbaseConf.set("hbase.regionserver.keytab.file","/etc/hbase/conf/hbase.keytab");
> hbaseConf.set("hadoop.security.authentication", "kerberos");
> hbaseConf.set("hadoop.security.authorization", "true");
>
> It failed getting authenticated to access to the hbase with the error
> message as:
> org.apache.hadoop.ipc.RemoteException: Authentication is required
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1021)
> ~[test-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
>  at
>
> org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:164)
> ~[test-0.0.1-SNAPSHOT-jar-with-dependencies.jar:na]
> at com.sun.proxy.$Proxy7.getProtocolVersion(Unknown Source) ~[na:na]
>
> It looks like the settings through the API in code don't work. Is it a known
> issue or am I wrong somewhere?
>
> Thanks,
> Cheney
>


Re: HBase 0.94.3 with ACL RS won't start

2014-06-20 Thread Gary Helmling
Hi Demai,

Yes, even when using hbase.security.authentication=simple in 0.94, you need
to use SecureRpcEngine.  The default WritableRpcEngine does not pass the
username to the server at all, which can obviously cause problems for
authorization.

--gh


On Fri, Jun 20, 2014 at 10:21 AM, Demai Ni  wrote:

> hi, Andrew,
>
> I didn't setup the keytabs as the current setup is using a firewall instead
> of kerberos. so only use the authorization feature of hbase, and not
> authentication at this moment. A long story about why. :-(
>
> Anyway, I got a tip here
>
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_8_2.html
> and add this property on hbase-site.xml (I think that is different between
> 94 and 98)
>
> 
>  hbase.rpc.engine
>  org.apache.hadoop.hbase.ipc.SecureRpcEngine
> 
>
> And now hbase can start and I am able to grant auth like:
> --
> hbase(main):004:0> grant 'dn','R','t1_dn'
> 0 row(s) in 0.0700 seconds
>
> hbase(main):005:0> user_permission 't1_dn'
> User
> Table,Family,Qualifier:Permission
>  demai   t1_dn,,: [Permission:
> actions=READ,WRITE]
>  dn  t1_dn,,: [Permission:
> actions=READ]
>
> -
>
> Demai
>
>
> On Fri, Jun 20, 2014 at 10:11 AM, Andrew Purtell 
> wrote:
>
> > Have you set up keytabs for the server processes?
> >
> >
> > On Thu, Jun 19, 2014 at 9:40 PM, Demai Ni  wrote:
> >
> > > hi, folks,
> > >
> > > I am able to recreate the same error on another single node cluster.
> > >
> > > RS log pasted here: http://pastebin.com/iP9Mrz2T
> > > and
> > > hbase-site.xml is here: http://pastebin.com/ppnqfwGR
> > >
> > > the only thing changes is by adding the following property per
> > > http://hbase.apache.org/book/hbase.accesscontrol.configuration.html
> > >
> > >  hbase.coprocessor.master.classes
> > >
> > >
>  org.apache.hadoop.hbase.security.access.AccessController
> > >
> > >
> > >hbase.coprocessor.region.classes
> > >  org.apache.hadoop.hbase.security.token.TokenProvider,
> > >
> >  org.apache.hadoop.hbase.security.access.AccessController
> > >
> > >
> > > the same setting works on another hbase 98.2 cluster. So I am wondering
> > > what's missing here.
> > >
> > > BTW, I didn't follow the instruction here:
> > > http://hbase.apache.org/book/zk.sasl.auth.html for zookeeper as no
> > > Authentication is needed on this cluster.
> > >
> > > Any suggestion or pointers?
> > >
> > > Demai
> > >
> > >
> > > On Thu, Jun 19, 2014 at 2:59 PM, Enoch Hsu  wrote:
> > >
> > > >
> > > >
> > > > Hi All,
> > > >
> > > > I am running HBase 0.94.3 and trying to get ACL working on a single
> > node
> > > > cluster. I followed the steps in
> > > > http://hbase.apache.org/book/hbase.accesscontrol.configuration.html
> > step
> > > > 8.4.3 and added those 2 properties to my hbase-site.xml
> > > > After stopping and starting hbase, my regionserver is dying with
> > > following
> > > > error/stack trace
> > > >
> > > > 2014-06-19 14:51:00,430 WARN
> > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler:
> > Exception
> > > > running postOpenDeployTasks; region=1028785192
> > > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> > > Failed
> > > > 1 action: org.apache.hadoop.hbase.security.AccessDeniedException:
> > > > Insufficient permissions (table=-ROOT-, family: info, action=WRITE)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.security.access.AccessController.requirePermission
> > > > (AccessController.java:471)
> > > > at
> > > org.apache.hadoop.hbase.security.access.AccessController.prePut
> > > > (AccessController.java:878)
> > > > at
> > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.prePut
> > > > (RegionCoprocessorHost.java:800)
> > > > at
> > org.apache.hadoop.hbase.regionserver.HRegion.doPreMutationHook
> > > > (HRegion.java:2046)
> > > > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate
> > > > (HRegion.java:2022)
> > > > at org.apache.hadoop.hbase.regionserver.HRegionServer.multi
> > > > (HRegionServer.java:3573)
> > > > at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown
> Source)
> > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke
> > > > (DelegatingMethodAccessorImpl.java:37)
> > > > at java.lang.reflect.Method.invoke(Method.java:611)
> > > > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call
> > > > (WritableRpcEngine.java:364)
> > > > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run
> > > > (HBaseServer.java:1426)
> > > > : 1 time, servers with issues: bdvm081.svl.ibm.com:60020,
> > > > at org.apache.hadoop.hbase.client.HConnectionManager
> > > > $HConnectionImplementation.processBatchCallback
> > > > (HConnectionManager.java:1624)
> > > > at org.apache.hadoop.hbase.client.HConnectionManager
> > > > $HCo

Re: non default coprocessor port

2014-03-21 Thread Gary Helmling
As Anoop described, region observers don't use ZK directly.  Can you
describe more of what you are trying to do in your coprocessor -- how / why
you are connecting to zookeeper, or even provide sample code from your
coprocessor implementation?



On Fri, Mar 21, 2014 at 10:43 AM, Anoop John  wrote:

> region observer CP is not having any seperate con with zk. The CPs, which
> uses zk, use the same connection established by this RS. Which CP u want to
> use?
>
> -Anoop-
>
> On Thu, Mar 20, 2014 at 6:51 AM, Jignesh Patel  >wrote:
>
> > How to configure coprocessor-region observer to use non default port for
> > zookeeper?
> > Hbase Version:0.98
> > hadoop Version: 2.2.0
> >
>


Re: Problems with dynamic loading a Coprocessor

2014-03-21 Thread Gary Helmling
Todd,

Take a look at the regionserver log for the server hosting the "events"
table region.  If the server cannot load the coprocessor when opening the
region, you should see some errors in the log that will help pinpoint the
problem (like is the user running the regionserver process able to read the
file in HDFS?).


On Fri, Mar 21, 2014 at 10:38 AM, Ted Yu  wrote:

> HBASE-5258 dropped per-region coprocessor list from HServerLoad.
>
> Have you tried specifying namenode information in shell command. e.g.
> 'coprocessor'=>'hdfs://example0:8020...'
>
> Please also take a look at region server log around the time table was
> enabled.
>
> Cheers
>
>
> On Fri, Mar 21, 2014 at 9:38 AM, Todd Gruben  wrote:
>
> > I'm new to hbase and I'm trying to load my first region observer
> > coprocessor.  I working with cloudera's hbase 0.96.1.1-cdh5.0.0-beta-2
> >
> > The basic steps i've tried.  Its's about as basic a process as you can
> get,
> > I'm hoping just to put some stuff in the log and prevent a row from going
> > into the table.
> >
> > public class Exploader extends BaseRegionObserver {
> >  public static final Logger logger= Logger.getLogger(Exploader.class);
> > public void start(CoprocessorEnvironment env) throws IOException
> {
> > logger.info("Loaded Exploader");
> > }
> >
> > public void prePut(ObserverContext
> e,
> > Put put, WALEdit edit, boolean writeToWAL)throws IOException {
> > //alright so the goal here is to build a jar file that can pluging
> > logger.info("prePut Exploader");
> > e.complete(); //ignore and not install
> >
> > }
> >
> > }
> >
> >
> > I build the jar and put it into hdfs like so..
> >
> >  home>hadoop fs -copyFromLocal Exploader-0.0.jar /
> >
> > I then go to the hbase shell
> > hbase(main):037:0> disable 'events'
> >
> > hbase(main):038:0>alter 'events', METHOD => 'table_att',
> >
> >
> 'coprocessor'=>'hdfs:///Exploader-0.0.jar|umbel.hbase.coprocessor.Exploader|1001|'
> >
> > hbase(main):041:0>enable 'events'
> >
> > I see it there..
> >
> > hbase(main):039:0> describe 'events'
> > DESCRIPTION
> >ENABLED
> >  'events', {TABLE_ATTRIBUTES => {coprocessor$1 =>
> > 'hdfs:///Exploader-0.0.jar|umbel.hbase.coprocessor.Exploader|1001|'}
> false
> >  , {NAME => 'event', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> > REPLICATION_SCOPE => '0', VERSIONS => '1', C
> >  OMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> >  IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> >
> > I do a put into events and I see the item  present, and no entry in the
> log
> > file.
> >
> > It doesn't seem to be loaded when I look at the status 'details'
> >
> > hbase(main):042:0> status 'detailed'
> > version 0.96.1.1-cdh5.0.0-beta-2
> > 0 regionsInTransition
> > master coprocessors: []
> > 2 live servers
> > ch3.localdomain:60020 1395253537189
> > requestsPerSecond=4.0, numberOfOnlineRegions=3, usedHeapMB=132,
> > maxHeapMB=1541, numberOfStores=3, numberOfStorefiles=3,
> > storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0, readRequestsCount=230538, writeRequestsCount=28,
> > rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
> > totalCompactingKVs=39, currentCompactedKVs=39, compactionProgressPct=1.0,
> > coprocessors=[]
> > "events,,1395417302442.275cd6d13fce89a2040dd394792ba86e."
> > numberOfStores=1, numberOfStorefiles=0,
> > storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0, readRequestsCount=0, writeRequestsCount=0,
> > rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
> > totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
> > "hbase:meta,,1"
> > numberOfStores=1, numberOfStorefiles=2,
> > storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0, readRequestsCount=230527, writeRequestsCount=28,
> > rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
> > totalCompactingKVs=39, currentCompactedKVs=39, compactionProgressPct=1.0
> >
> "hbase:namespace,,1395245443099.1cc9f4eeda9c21b8d2bfcc3e63598224."
> > numberOfStores=1, numberOfStorefiles=1,
> > storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0, readRequestsCount=11, writeRequestsCount=0,
> > rootIndexSizeKB=0, totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0,
> > totalCompactingKVs=0, currentCompactedKVs=0, compactionProgressPct=NaN
> > ch2.localdomain:60020 1395253551442
> > requestsPerSecond=0.0, numberOfOnlineRegions=1, usedHeapMB=11,
> > maxHeapMB=1541, numberOfStores=1, numberOfStorefiles=2,
> > storefileUncompressedSizeMB=0, storefileSizeMB=0, memstoreSizeMB=0,
> > storefileIndexSizeMB=0, readRequestsCount=186, writeRequests

Re: org.apache.hadoop.hbase.ipc.SecureRpcEngine class not found in HBase jar

2014-03-04 Thread Gary Helmling
For HBase 0.94, you need a version of HBase built with the "security"
profile to get SecureRpcEngine and other security classes.  I'm not sure
that the published releases on maven central actually include this.

However, it's easily to build yourself, just add "-Psecurity" to the mvn
command line to get the security profile.

For HBase 0.96+ this is no longer necessary, as the security classes are
now part of the main build.


On Tue, Mar 4, 2014 at 10:02 AM, anil gupta  wrote:

> Hi All,
>
> If i create a maven project with the following maven dependency then the
> HBase jar doesn't have org.apache.hadoop.hbase.ipc.SecureRpcEngine class.
>  
> org.apache.hbase
> hbase
> 0.94.12
> 
>
> SecureRPCEngine class is used when the cluster is secured. Is there any
> other maven dependency i need to use to get that class?
>
> --
> Thanks & Regards,
> Anil Gupta
>


Re: enable/disable table permission

2014-02-25 Thread Gary Helmling
It looks like how the CREATE permission is applied changed with HBASE-6188,
which removed the concept of a table owner.  Prior to HBASE-6188, the
disable/enable table permission checks required either:

* ADMIN permission
or
* the user is the table owner AND has the CREATE permission

I believe the original intent here was that if you created a table, you
should be able to disable and modify it.

After HBASE-6188, the check in enable/disable table is simply for either
ADMIN or CREATE permission.  This seems to be the best compromise on
attempting to maintain some of the previous semantics.

Andrew Purtell commented on this in HBASE-6188:



CREATE -(DDL) CreateTable, AddColumn, DeleteColumn, DeleteTable,
ModifyColumn, ModifyTable, DisableTable, EnableTable

ADMIN - All of the above plus Flush, Split, Compact

It's not useful to give add/delete/modify schema privileges without
enable/disable to have them take effect. So either we do the above or we
get rid of CREATE. I think the above distinction is still useful.

Edit: I don't like that non-ADMIN can do enable/disable table, because it
can really affect the cluster if the table is large. However I think on
balance it would be more confusing than useful to remove EnableTable and
DisableTable from the set of operations CREATE permission allows until
online schema update-in-place without disable is always possible.


At this point, it may be useful to discuss if we're at the point yet where
online schema updates can be reliably done without a table disable.  In
this case, it might make sense to drop disable/enable table from CREATE
permission.  Though we now have backwards compatibility to consider as well.

If this could be better reflected in the security documentation, please do
open a JIRA describing how we can make it clearer.  And if you feel up to
it, a patch or updated text would be even better.


On Tue, Feb 25, 2014 at 12:30 PM, Alex Nastetsky wrote:

> I don't really understand how HBase permission is expected to work then. A
> user needs the Create permission in order to be able to create their own
> tables. But that permission also allows them to "drop" and "alter" the
> tables created by others. Even if those operations are set up to only work
> when a table is disabled, the ability to disable a table is also given by
> the Create permission. What am I missing?
>
>
> On Tue, Feb 25, 2014 at 3:25 PM, Alex Nastetsky  >wrote:
>
> > Sounds like either permission is sufficient. Either way, the
> documentation
> > could be improved.
> >
> > Thanks.
> >
> >
> > On Tue, Feb 25, 2014 at 3:22 PM, Ted Yu  wrote:
> >
> >> Here is related code from AccessController:
> >> {code}
> >>   public void
> >> preDisableTable(ObserverContext
> >> c, byte[] tableName)
> >> ...
> >> requirePermission("disableTable", tableName, null, null,
> Action.ADMIN,
> >> Action.CREATE);
> >> {code}
> >> requirePermission() iterates through the above permissions and would
> >> return
> >> error for the second permission (CREATE) if validation fails.
> >>
> >> Cheers
> >>
> >>
> >> On Tue, Feb 25, 2014 at 12:12 PM, Alex Nastetsky <
> anastet...@spryinc.com
> >> >wrote:
> >>
> >> > According to
> >> >
> >> >
> >>
> http://hbase.apache.org/book/hbase.accesscontrol.configuration.html#d2566e5780
> >> > ,
> >> > the Enable/Disable operation is controlled by the Admin permission.
> >> > However, it seems to be controlled instead by the Create permission.
> Is
> >> > this a bug or a typo in the documentation?
> >> >
> >> > hbase(main):002:0> disable 'foo'
> >> >
> >> > ERROR: org.apache.hadoop.hbase.security.AccessDeniedException:
> >> Insufficient
> >> > permissions (user=anastet...@spry.com, scope=foo, family=,
> >> action=CREATE)
> >> >
> >> > Thanks in advance,
> >> > Alex.
> >> >
> >>
> >
> >
>


Re: Coprocessor Client Blocking

2014-01-21 Thread Gary Helmling
Yes, the pre/post method calls for the Observer hooks (RegionObserver for
postPut()) are executed synchronously on the RPC calling path.  So the
RegionServer will not return the response to the client until your
postPut() method has returned.  In general, this means that for best
performance you should only load Observers that you need, and any Observers
you write should do their processing as efficiently as possible.
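For example, if the per-Put work is expensive, a common pattern is to keep
postPut() itself cheap and hand the work off to a background thread.  A
rough, untested sketch (0.94-style postPut signature; class and thread
names are made up):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class AsyncPostPutObserver extends BaseRegionObserver {
  private final BlockingQueue<Put> pending = new LinkedBlockingQueue<Put>();

  public AsyncPostPutObserver() {
    Thread worker = new Thread(new Runnable() {
      public void run() {
        try {
          while (true) {
            Put p = pending.take();
            // ... do the expensive processing of p here ...
          }
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
        }
      }
    }, "async-postput-worker");
    worker.setDaemon(true);   // don't block region server shutdown
    worker.start();
  }

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // This runs on the RPC handler thread, so just enqueue and return.
    pending.add(put);
  }
}

The client still waits for postPut() to return, but then that is only the
cost of an enqueue rather than the full processing.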


On Tue, Jan 21, 2014 at 4:32 PM, Pradeep Gollakota wrote:

> Hi All,
>
> In the blog describing the coprocessor there was sequence diagram walking
> through the lifecycle of a Get.
>
> https://blogs.apache.org/hbase/mediaresource/60b135e5-04c6-4197-b262-e7cd08de784b
>
> I'm wondering if the lifecycle of a Put follows the same sequence.
> Specifically for my use case, I'm doing some processing using a
> RegionObserver in the postPut() method. Does the client wait until the
> postPut() is executed? When is the control returned to the client in a Put?
>
> Thanks!
>


Re: How to set HBase ACL for groups ?

2014-01-08 Thread Gary Helmling
To grant privileges to a group, just prefix the group name with '@' in the
grant command.  For example, to grant global read/write privileges to the
group "mygroup" in the shell, you would use:

> grant '@mygroup', 'RW'


On Wed, Jan 8, 2014 at 7:59 PM, takeshi  wrote:

> Hi All,
>
> I read following section talking about ACL for group in HBase book
> > 3. HBase managed "roles" as collections of permissions: We will not model
> "roles" internally in HBase to begin with. We instead allow group names to
> be granted permissions, which allows external modeling of roles via group
> membership. Groups are created and manipulated externally to HBase, via the
> Hadoop group mapping service.
>
> So I'd like to try on the *group names to be granted permissions*, how do I
> do for the settings ?
>
> HBase version: 0.99.0
> reference: http://hbase.apache.org/book.html#d2907e5580
>
> Best regards
>
> takeshi
>


Re: HBase Client

2013-12-13 Thread Gary Helmling
Hi Andy,

I'm afraid you will have to ask MapR then what is supported. MapR M7 is a
proprietary application.  It is _not_ Apache HBase.


On Fri, Dec 13, 2013 at 2:27 PM, ados1...@gmail.com wrote:

> Am using MapR M7 HBase distribution (
> http://www.mapr.com/products/mapr-editions/m7-edition)
>
>
> On Fri, Dec 13, 2013 at 5:24 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi Andy,
> >
> > Then I'm not really sure what to recommend you.
> >
> > What version of HBase are you using with your mapR distribution? 0.94.x?
> > 0.92.x?
> >
> > JM
> >
> >
> > 2013/12/13 ados1...@gmail.com 
> >
> > > Thank you Ted and Jean,
> > >
> > > I should have been more clear stating "what is the best hbase client to
> > use
> > > with *mapR*?" Hue is not officially supported with mapR.
> > >
> > >
> > > On Fri, Dec 13, 2013 at 5:16 PM, Ted Yu  wrote:
> > >
> > > > Search for 'hbase sql client' gives this as top hit:
> > > > https://github.com/forcedotcom/phoenix
> > > >
> > > >
> > > > On Fri, Dec 13, 2013 at 2:14 PM, ados1...@gmail.com <
> > ados1...@gmail.com
> > > > >wrote:
> > > >
> > > > > Thanks Ted but am looking for something like toad/sql developer for
> > > > > querying/viewing data in hbase.
> > > > >
> > > > >
> > > > > On Fri, Dec 13, 2013 at 5:09 PM, Ted Yu 
> wrote:
> > > > >
> > > > > > Hi,
> > > > > > See http://hbase.apache.org/book.html#client
> > > > > > and http://hbase.apache.org/book.html#rest
> > > > > >
> > > > > > Cheers
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 13, 2013 at 2:06 PM, ados1...@gmail.com <
> > > > ados1...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > Hello All,
> > > > > > >
> > > > > > > I am newbie in hbase and wanted to see if there are any good
> > hbase
> > > > > client
> > > > > > > that i can use to query underlying hbase datastore or what is
> the
> > > > best
> > > > > > tool
> > > > > > > to use?
> > > > > > >
> > > > > > > I am using command line but looking for any other best
> > alternative.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Andy.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Strange Problem on using HRegion's getScanner Method in RegionServer

2013-12-05 Thread Gary Helmling
> Hi Asaf,
>    Thank you for your response. The RPC server in my application is a
> singleton instance. It is started in a region observer, and works as a
> single server in the HRegionServer, just like the RPC servers brought up
> in the RS's main() method. It is not attached to any table or regions; it
> can get all the HRegion instances on the HRegionServer.
>
>
That sounds like a pretty non-standard setup.  Why not just use the normal
coprocessor endpoint mechanism and execute a scan on the local region,
instead of going through a regionserver-scoped singleton?  Then you can
aggregate results on the client.

Within each coprocessor endpoint instance, you can simply call
RegionCoprocessorEnvironment.getRegion() to get a reference to the local
region.


Re: coprocessor status query

2013-10-23 Thread Gary Helmling
You are welcome to not use coprocessors.


> IMHO, the current implementation is DOA, primarily because it runs in the
> same JVM as the RS.
> (I'll have to see if I can open a JIRA and make comments.)
>
>
There has been a JIRA for out of process coprocessors for quite some time:
https://issues.apache.org/jira/browse/HBASE-4047

Patches welcome.


Re: coprocessor status query

2013-10-22 Thread Gary Helmling
>
> "The coprocessor class is of course still in memory on the
> regionserver,"
>
> That was kinda my point.
>
> You can't remove the class from the RS until you do a rolling restart.
>

Yes, understood.

However, your original statement that "You can't remove a coprocessor"
needed some clarification, in that the coprocessor that threw the exception
_is_ removed from the active set of coprocessors for that region.  So it is
no longer invoked for pre/post hooks on the call path for further requests.

From the original question, I take it that this invocation context is what
Wei cared about.


Re: How to create HTableInterface in coprocessor?

2013-10-22 Thread Gary Helmling
Within a coprocessor, you can just use the CoprocessorEnvironment instance
passed to start() method or any of the pre/post hooks, and call
CoprocessorEnvironment.getTable(byte[] tablename).
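A short, untested sketch of what that looks like from inside a
RegionObserver hook (0.94-style signature; "othertable" is just a
placeholder name):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class OtherTableObserver extends BaseRegionObserver {
  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // getTable() returns an HTableInterface managed by the environment.
    HTableInterface other =
        e.getEnvironment().getTable(Bytes.toBytes("othertable"));
    try {
      Result r = other.get(new Get(put.getRow()));
      // ... use r ...
    } finally {
      other.close();   // release the table back to the environment
    }
  }
}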


On Tue, Oct 22, 2013 at 9:41 AM, Ted Yu  wrote:

> Take a look at http://hbase.apache.org/book.html#client.connections ,
> especially 9.3.1.1.
>
>
> On Tue, Oct 22, 2013 at 9:37 AM, yonghu  wrote:
>
> > Hello,
> >
> > In the oldest version of HBase, I can get the HTableInterface by
> > HTablePool.getTable() method. However, in the latest Hbase
> version0.94.12,
> > HTablePool is deprecated. So, I tried to use HConnectionManager to create
> > HTableInterface, but it does not work. Can anyone tell me how to create
> > HTableInterface in new HBase version? By the way, there is no error
> message
> > when I run coprocessor.
> >
> > regards!
> >
> > Yong
> >
>


Re: coprocessor status query

2013-10-22 Thread Gary Helmling
Unfortunately, the per-region coprocessor list was dropped from HServerLoad
in HBASE-5258.  This doesn't leave any easy way for a client to list the
loaded coprocessors on a per-region basis, that I'm aware of.

If you feel like this would be useful to provide, please open a JIRA
describing what you'd like to see.  Bonus points for contributing a patch!


On Mon, Oct 21, 2013 at 8:32 PM, Wei Tan  wrote:

>
>
> Hi Gary, thanks!
> It seems that the "region observer has been removed" behavior is per region and
> NOT per coprocessor. So do I have to query each region to get the per
> region health status? Or, is there a table level API telling me something
> like, I have 10 regions and an observer has been removed in 2 out of the 10
> regions?
>
> Best,
> Wei
>
> From my iPhone
>
> > On Oct 21, 2013, at 10:06 PM, "Gary Helmling" 
> wrote:
> >
> > > You can't remove a coprocessor.
> > >
> > > Well, you can, but that would require a rolling restart.
> > >
> > > It still exists and is still loaded.
> > >
> > >
> > Assuming we are talking about RegionObserver coprocessors here, when a
> > coprocessor throws an exception (other than IOException), it is either:
> >
> > a) removed from the list of active RegionObservers being invoked on the
> > region's operations
> > b) or if "hbase.coprocessor.abortonerror" is "true", the regionserver
> aborts
> >
> > The coprocessor class is of course still in memory on the regionserver,
> but
> > that instance will no longer be invoked in any pre/post hooks for
> > operations on that region.
> >
> > Back to the original question, the coprocessor is only removed from the
> > list of active coprocessors for the region(s) where it has thrown an
> > exception.  It will still be active on any regions where it has not
> thrown
> > an exception.
>


Re: coprocessor status query

2013-10-21 Thread Gary Helmling
> You can't remove a coprocessor.
>
> Well, you can, but that would require a rolling restart.
>
> It still exists and is still loaded.
>
>
Assuming we are talking about RegionObserver coprocessors here, when a
coprocessor throws an exception (other than IOException), it is either:

a) removed from the list of active RegionObservers being invoked on the
region's operations
b) or if "hbase.coprocessor.abortonerror" is "true", the regionserver aborts

The coprocessor class is of course still in memory on the regionserver, but
that instance will no longer be invoked in any pre/post hooks for
operations on that region.

Back to the original question, the coprocessor is only removed from the
list of active coprocessors for the region(s) where it has thrown an
exception.  It will still be active on any regions where it has not thrown
an exception.


Re: Endpoint and Observer work together?

2013-10-02 Thread Gary Helmling
Your DemoObserver is not being invoked because DemoEndpoint is opening a
scanner directly on the region:

RegionCoprocessorEnvironment env
=(RegionCoprocessorEnvironment)getEnvironment();
InternalScanner scanner = env.getRegion().getScanner(scan);

The RegionObserver.postScannerNext() hook is invoked higher up in the
client call stack.

If the processing of these two coprocessors is so tightly related, then I'd
recommend just combining them to a single class (a RegionObserver can also
be an endpoint):

public class DemoObserver extends BaseRegionObserver implements
DemoProtocol  {

Or if for some reason this is difficult to do, then separate out your
KeyValue handling into a shared class that can be use by both
DemoObserver.postScannerNext() and the InternalScanner result handling in
DemoEndpoint.scanRows().




On Wed, Oct 2, 2013 at 7:21 AM, rgaimari  wrote:

> Hi,
>
> I've created some demo code to show the problem.  Here's the Observer:
>
> ...
> public class DemoObserver extends BaseRegionObserver {
>   byte[] personFamily = Bytes.toBytes("Person");
>
>   @Override
>   public boolean postScannerNext(
> ObserverContext<RegionCoprocessorEnvironment> e, InternalScanner s,
> List<Result> results, int limit, boolean hasMore)
> throws IOException {
> List<Result> newResults = new ArrayList<Result>();
> for (Result result : results) {
> List<KeyValue> newKVList = new ArrayList<KeyValue>();
> for (KeyValue kv : result.list()) {
>   String newVal = Bytes.toString(kv.getValue()).toUpperCase();
>   newKVList.add(new KeyValue(kv.getRow(), kv.getFamily(),
> kv.getQualifier(), kv.getTimestamp(),
> Bytes.toBytes(newVal)));
> }
> newResults.add(new Result(newKVList));
> }
> results.clear();
> results.addAll(newResults);
> return super.postScannerNext(e, s, results, limit, hasMore);
>   }
> }
>
>
> And here's the Endpoint:
>
> ...
> public class DemoEndpoint extends BaseEndpointCoprocessor implements
> DemoProtocol {
>
>   @Override
>   public List<KeyValue> scanRows(Filter filter) throws IOException {
> Scan scan = new Scan();
> scan.setFilter(filter);
> RegionCoprocessorEnvironment env =
> (RegionCoprocessorEnvironment)getEnvironment();
> InternalScanner scanner = env.getRegion().getScanner(scan);
>
> List<KeyValue> retValues = new ArrayList<KeyValue>();
> boolean more = false;
> List<KeyValue> res = new ArrayList<KeyValue>();
> do {
> res.clear();
> more = scanner.next(res);
> if (res != null)
>   retValues.addAll(res);
> } while (more);
>
> scanner.close();
> return retValues;
>   }
> }
>
> They are loaded in separate jar files, and they are both attached to the
> table:
>
> 1.9.3p448 :009 >   describe 'Demo'
> DESCRIPTIONENABLED
>  {NAME => 'Demo', coprocessor$2 => 'hdfs:///user/hduser/DemoEndpoi true
>  nt.jar|demo.DemoEndpoint|1001', coprocessor$1 => 'hdfs:///user/hd
>  user/DemoObserver.jar|demo.DemoObserver|1', FAMILIES => [{NAME =>
>   'Person', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
>  REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>   MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
>  'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DI
>  SK => 'true', BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.0880 seconds
>
> If I run a test where I do a scan directly on the client (with no filter),
> I
> get the following results:
>
> , , , <123 MAIN STREET>
> , , , 
> , , , <234 ELM STREET>
> , , , 
> , , , <345 SCOTT STREET>
> , , , 
>
> The values are all capitalized, as the Observer was supposed to do.
> However, if I then run a scan through the Endpoint coprocessor (with a
> filter just looking for the name "john"), I get the following results:
>
> , , , <123 main street>
> , , , 
> , , , <345 scott street>
> , , , 
>
> It's filtered properly, but the values don't go through the Observer and
> aren't capitalized.
>
> If there is any other info you need to help diagnose this, please let me
> know. Thanks.
>
> - Bob Gaimari
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/Endpoint-and-Observer-work-together-tp4051383p4051395.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: Base coprocessor arguments

2013-09-11 Thread Gary Helmling
Ben & George,

The arguments you provide when configuring the coprocessor should be
present in the Configuration object exposed through
CoprocessorEnvironment.  So, for example, in your RegionObserver.start()
method, you should be able to do:

public void start(CoprocessorEnvironment e) throws IOException {
  e.getConfiguration().get("arg1"); // should be "1"
}


On Wed, Sep 11, 2013 at 2:25 AM, Ben Kim  wrote:

> Hello George
>
> Have you found a solution to your question?
>
> I can't seem to get the arguments anywhere :(
>
> Best,
> Ben
>
> *Benjamin Kim*
> *benkimkimben at gmail*
>
>
> On Mon, Aug 20, 2012 at 10:48 PM, George Forman
> wrote:
>
> >
> >
> > Hi All,
> > I have extended BaseRegionObservers coprocessor. I want to know how to
> > access the arguments specified with the associated table: alter 't1',
> > METHOD => 'table_att',
> >
> >
> 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
> > Are they available in postPut via
> > ObserverContext ?
> > Thanks
> > George
>


Re: Please welcome our newest committer, Rajeshbabu Chintaguntla

2013-09-11 Thread Gary Helmling
Congrats, Rajesh!


On Wed, Sep 11, 2013 at 11:09 AM, Enis Söztutar  wrote:

> Congrats and welcome aboard.
>
>
> On Wed, Sep 11, 2013 at 10:08 AM, Jimmy Xiang  wrote:
>
> > Congrats!
> >
> >
> > On Wed, Sep 11, 2013 at 9:54 AM, Stack  wrote:
> >
> > > Hurray for Rajesh!
> > >
> > >
> > > On Wed, Sep 11, 2013 at 9:17 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Please join me in welcoming Rajeshbabu (Rajesh) as our new HBase
> > > committer.
> > > > Rajesh has been there for more than a year and has been solving some
> > very
> > > > good bugs around the Assignment Manger area.  He has been working on
> > > other
> > > > stuff like HBase-Mapreduce performance improvement, migration scripts
> > and
> > > > off late in the Secondary Index related things.
> > > >
> > > > Rajesh has made his first commit to the pom.xml already.
> > > > Once again, congratulations and welcome to this new role (smile).
> > > >
> > > > Cheers
> > > > Ram
> > > >
> > >
> >
>


Re: sqoop import into secure Hbase with kerberos

2013-08-05 Thread Gary Helmling
To further isolate the problem, try doing some simple commands from the
hbase shell after obtaining kerberos credentials:

1) kinit
2) hbase shell
3) in hbase shell:
  create 'testtable', 'f'
  put 'testtable', 'r1', 'f:col1', 'val1'
  get 'testtable', 'r1'

If these all work, then the HBase code is correctly recognizing and
utilizing your kerberos credentials, and there may be a problem with
getting the credentials passed through the sqoop process.  If these
commands do _not_ work, then there is a problem with your kerberos / HBase
security setup.  For example. it could be a mismatch in the encryption
ciphers (most kerberos configs will default to AES which is not or at least
was not previously supported in Java 6 without the "unlimited strength" JCE
policy files).


On Mon, Aug 5, 2013 at 6:14 PM, ssatish  wrote:

> I got the grant working for user by adding kuser1 to sudo users list by
> adding the following code to hbase-site.xml and restarting master and
> region
> servers.
>
> hbase.superuser
> kuser1
>
> But I still get the ImportTool error -  ERROR tool.ImportTool: Error during
> import: Can't get authentication token
>
>
>
> --
> View this message in context:
> http://apache-hbase.679495.n3.nabble.com/sqoop-import-into-secure-Hbase-with-kerberos-tp4048847p4048864.html
> Sent from the HBase User mailing list archive at Nabble.com.
>


Re: problem in testing coprocessor endpoint

2013-07-12 Thread Gary Helmling
Kim, Asaf,

I don't know where this conception comes from that endpoint coprocessors
must be loaded globally, but it is simply not true.  If you would like to
see how endpoints are registered, see RegionCoprocessorHost.java:

  @Override
  public RegionEnvironment createEnvironment(Class<?> implClass,
  Coprocessor instance, int priority, int seq, Configuration conf) {
// Check if it's an Endpoint.
// Due to current dynamic protocol design, Endpoint
// uses a different way to be registered and executed.
// It uses a visitor pattern to invoke registered Endpoint
// method.
for (Class<?> c : implClass.getInterfaces()) {
  if (CoprocessorProtocol.class.isAssignableFrom(c)) {
region.registerProtocol(c, (CoprocessorProtocol)instance);
break;
  }
}


If you would like some trivial test code that demonstrates invoking an
endpoint coprocessor configured on only a single table (coprocessor jar
loaded from HDFS), just let me know and I will send it to you.
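For reference, the client side looks roughly like this, reusing the
ColumnAggregationProtocol interface from the code quoted below (the table
and column names are placeholders; untested sketch):

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.util.Bytes;

public class EndpointClientExample {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    // The endpoint only needs to be configured on this one table.
    HTable table = new HTable(conf, "testtable");
    try {
      // Null start/end keys mean "every region of the table"; one result
      // is returned per region that was invoked.
      Map<byte[], Long> results = table.coprocessorExec(
          ColumnAggregationProtocol.class, null, null,
          new Batch.Call<ColumnAggregationProtocol, Long>() {
            public Long call(ColumnAggregationProtocol instance)
                throws IOException {
              return instance.sum(Bytes.toBytes("f"), Bytes.toBytes("q"));
            }
          });
      long total = 0;
      for (Long partial : results.values()) {
        total += partial;
      }
      System.out.println("sum = " + total);
    } finally {
      table.close();
    }
  }
}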

--gh


On Fri, Jul 12, 2013 at 10:06 AM, Kim Chew  wrote:

> No, Endpoint processor can be deployed via configuration only.
> In hbase-site.xml, there should be an entry like this,
>
> 
>   hbase.coprocessor.region.classes
>   myEndpointImpl
> 
>
> Also, you have to let HBase know where to find your class, so in
> hbase-env.sh
>
> export HBASE_CLASSPATH=${HBASE_HOME}/lib/AggregateCounterEndpoint.jar
>
>
> The trouble is you will need to restart RS. It would be nice to have APIs
> to load the Endpoint coprocessor dynamically.
>
> Kim
>
>
> On Fri, Jul 12, 2013 at 9:18 AM, Gary Helmling 
> wrote:
>
> > Endpoint coprocessors can be loaded on a single table.  They are no
> > different from RegionObservers in this regard.  Both are instantiated per
> > region by RegionCoprocessorHost.  You should be able to load the
> > coprocessor by setting it as a table attribute.  If it doesn't seem to be
> > loading, check the region server logs after you re-enable the table where
> > you have added it.  Do you see any log messages from
> RegionCoprocessorHost?
> >
> >
> > On Fri, Jul 12, 2013 at 4:33 AM, Asaf Mesika 
> > wrote:
> >
> > > You can't register and end point just for one table. It's like a stored
> > > procedure - you choose to run it and pass parameters to it.
> > >
> > > On Friday, July 12, 2013, ch huang wrote:
> > >
> > > > what your describe is how to load endpoint coprocessor for every
> region
> > > in
> > > > the hbase, what i want to do is just load it into my test table ,only
> > for
> > > > the regions of the table
> > > >
> > > > On Fri, Jul 12, 2013 at 12:07 PM, Asaf Mesika  >
> > > > wrote:
> > > >
> > > > > The only way to register endpoint coprocessor jars is by placing
> them
> > > in
> > > > > lib dir if hbase and modifying hbase-site.xml to point to it under
> a
> > > > > property name I forgot at the moment.
> > > > > What you described is a way to register an Observer type
> coprocessor.
> > > > >
> > > > >
> > > > > On Friday, July 12, 2013, ch huang wrote:
> > > > >
> > > > > > i am testing coprocessor endpoint function, here is my testing
> > > process
> > > > > ,and
> > > > > > error i get ,hope any expert on coprocessor can help me out
> > > > > >
> > > > > >
> > > > > > # vi ColumnAggregationProtocol.java
> > > > > >
> > > > > > import java.io.IOException;
> > > > > > import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
> > > > > > // A sample protocol for performing aggregation at regions.
> > > > > > public interface ColumnAggregationProtocol
> > > > > > extends CoprocessorProtocol {
> > > > > > // Perform aggregation for a given column at the region. The
> > > > aggregation
> > > > > > // will include all the rows inside the region. It can be
> extended
> > to
> > > > > > // allow passing start and end rows for a fine-grained
> aggregation.
> > > > > >public long sum(byte[] family, byte[] qualifier) throws
> > > IOException;
> > > > > > }
> > > > > >
> > > > > >
> > > > > > # vi ColumnAggregationEndpoint.java
> > > > > >
> > > > > >
> > > > > > import java.io.FileWriter;
> > &g

Re: problem in testing coprocessor endpoint

2013-07-12 Thread Gary Helmling
Endpoint coprocessors can be loaded on a single table.  They are no
different from RegionObservers in this regard.  Both are instantiated per
region by RegionCoprocessorHost.  You should be able to load the
coprocessor by setting it as a table attribute.  If it doesn't seem to be
loading, check the region server logs after you re-enable the table where
you have added it.  Do you see any log messages from RegionCoprocessorHost?
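For reference, an untested sketch of setting the table attribute from Java
instead of the shell (the jar path, class name and table name below are
taken from your alter command and are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class AddEndpointToTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("testtable");
    try {
      admin.disableTable(table);
      HTableDescriptor htd = admin.getTableDescriptor(table);
      // Registers the endpoint on this table only; the jar is read from HDFS.
      htd.addCoprocessor("ColumnAggregationEndpoint",
          new Path("hdfs:///alex/test.jar"), 1001, null);
      admin.modifyTable(table, htd);
      admin.enableTable(table);
    } finally {
      admin.close();
    }
  }
}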


On Fri, Jul 12, 2013 at 4:33 AM, Asaf Mesika  wrote:

> You can't register and end point just for one table. It's like a stored
> procedure - you choose to run it and pass parameters to it.
>
> On Friday, July 12, 2013, ch huang wrote:
>
> > what your describe is how to load endpoint coprocessor for every region
> in
> > the hbase, what i want to do is just load it into my test table ,only for
> > the regions of the table
> >
> > On Fri, Jul 12, 2013 at 12:07 PM, Asaf Mesika 
> > wrote:
> >
> > > The only way to register endpoint coprocessor jars is by placing them
> in
> > > lib dir if hbase and modifying hbase-site.xml to point to it under a
> > > property name I forgot at the moment.
> > > What you described is a way to register an Observer type coprocessor.
> > >
> > >
> > > On Friday, July 12, 2013, ch huang wrote:
> > >
> > > > i am testing coprocessor endpoint function, here is my testing
> process
> > > ,and
> > > > error i get ,hope any expert on coprocessor can help me out
> > > >
> > > >
> > > > # vi ColumnAggregationProtocol.java
> > > >
> > > > import java.io.IOException;
> > > > import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
> > > > // A sample protocol for performing aggregation at regions.
> > > > public interface ColumnAggregationProtocol
> > > > extends CoprocessorProtocol {
> > > > // Perform aggregation for a given column at the region. The
> > aggregation
> > > > // will include all the rows inside the region. It can be extended to
> > > > // allow passing start and end rows for a fine-grained aggregation.
> > > >public long sum(byte[] family, byte[] qualifier) throws
> IOException;
> > > > }
> > > >
> > > >
> > > > # vi ColumnAggregationEndpoint.java
> > > >
> > > >
> > > > import java.io.FileWriter;
> > > > import java.io.IOException;
> > > > import java.util.ArrayList;
> > > > import java.util.List;
> > > > import org.apache.hadoop.hbase.CoprocessorEnvironment;
> > > > import org.apache.hadoop.hbase.KeyValue;
> > > > import org.apache.hadoop.hbase.client.Scan;
> > > > import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
> > > > import
> > org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
> > > > import org.apache.hadoop.hbase.ipc.ProtocolSignature;
> > > > import org.apache.hadoop.hbase.regionserver.HRegion;
> > > > import org.apache.hadoop.hbase.regionserver.InternalScanner;
> > > > import org.apache.hadoop.hbase.util.Bytes;
> > > >
> > > > //Aggregation implementation at a region.
> > > >
> > > > public class ColumnAggregationEndpoint extends
> BaseEndpointCoprocessor
> > > >   implements ColumnAggregationProtocol {
> > > >  @Override
> > > >  public long sum(byte[] family, byte[] qualifier)
> > > >  throws IOException {
> > > >// aggregate at each region
> > > >  Scan scan = new Scan();
> > > >  scan.addColumn(family, qualifier);
> > > >  long sumResult = 0;
> > > >
> > > >  CoprocessorEnvironment ce = getEnvironment();
> > > >  HRegion hr = ((RegionCoprocessorEnvironment)ce).getRegion();
> > > >  InternalScanner scanner = hr.getScanner(scan);
> > > >
> > > >  try {
> > > >List<KeyValue> curVals = new ArrayList<KeyValue>();
> > > >boolean hasMore = false;
> > > >do {
> > > >  curVals.clear();
> > > >  hasMore = scanner.next(curVals);
> > > >  KeyValue kv = curVals.get(0);
> > > >  sumResult += Long.parseLong(Bytes.toString(kv.getValue()));
> > > >
> > > >} while (hasMore);
> > > >  } finally {
> > > >  scanner.close();
> > > >  }
> > > >  return sumResult;
> > > >   }
> > > >
> > > >   @Override
> > > >   public long getProtocolVersion(String protocol, long
> > clientVersion)
> > > >  throws IOException {
> > > >  // TODO Auto-generated method stub
> > > >  return 0;
> > > >   }
> > > >
> > > > > 192.168.10.22:9000/alex/test.jar|ColumnAggregationEndpoint|1001<
> >
> http://192.168.10.22:9000/alex/test.jar%7CColumnAggregationEndpoint%7C1001
> > >
> > > '
> > > >
> > > > here is my testing java code
> > > >
> > > > package com.testme.demo;
> > > > import java.io.IOException;
> > > > import java.util.Map;
> > > > import org.apache.hadoop.conf.Configuration;
> > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > import org.apache.hadoop.hbase.HTableDescriptor;
> > > > import org.apache.hadoop.hbase.client.*;
> > > > import org.apache.hadoop.hbase.coprocessor.ColumnAggregationProtocol;
> > > >

Re: ClusterId read in ZooKeeper is null

2013-07-09 Thread Gary Helmling
Is the HMaster process running correctly on the cluster?  Between the
missing cluster ID and meta region not being available, it looks like
HMaster may not have fully initialized.

Alternately, if HMaster is running correctly, did you override the default
value for zookeeper.znode.parent in your cluster configuration, but not set
this in your client code?
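If it was overridden, the client needs the same value.  Extending your
snippet (the "/hbase-unsecure" value here is only an example; use whatever
your cluster's hbase-site.xml has for zookeeper.znode.parent):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ZnodeParentExample {
  public static void main(String[] args) throws Exception {
    Configuration hConf = HBaseConfiguration.create();
    hConf.set("hbase.zookeeper.quorum", "cas-1,cas-2,cas-3");
    hConf.set("hbase.zookeeper.property.clientPort", "2181");
    // Must match the cluster; the default parent znode is "/hbase".
    hConf.set("zookeeper.znode.parent", "/hbase-unsecure");
    HTable hTable = new HTable(hConf, "tablename");
    hTable.close();
  }
}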


On Tue, Jul 9, 2013 at 10:05 AM, Brian Jeltema <
brian.jelt...@digitalenvoy.net> wrote:

> I'm new to HBase, and need a little guidance. I've set up a 6-node
> cluster, with 3 nodes
> running the ZooKeeper server. The database seems to be working from the
> hbase shell; I can create tables, insert,
> scan, etc.
>
> But when I try to perform operations in a Java app, I hang at:
>
> 13/07/09 12:40:34 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=cas-2:2181,cas-1:2181,cas-3:2181 sessionTimeout=9
> watcher=hconnection-0x6833f0de
> 13/07/09 12:40:34 INFO zookeeper.RecoverableZooKeeper: Process
> identifier=hconnection-0x6833f0de connecting to ZooKeeper
> ensemble=cas-2:2181,cas-1:2181,cas-3:2181
> 13/07/09 12:40:34 INFO zookeeper.ClientCnxn: Opening socket connection to
> server cas-1/10.4.0.1:2181. Will not attempt to authenticate using SASL
> (Unable to locate a login configuration)
> 13/07/09 12:40:34 INFO zookeeper.ClientCnxn: Socket connection established
> to cas-1/10.4.0.1:2181, initiating session
> 13/07/09 12:40:34 INFO zookeeper.ClientCnxn: Session establishment
> complete on server cas-1/10.4.0.1:2181, sessionid = 0x13fa5b5096e001f,
> negotiated timeout = 4
> 13/07/09 12:40:34 INFO client.ZooKeeperRegistry: ClusterId read in
> ZooKeeper is null
>
> The Java code is nothing more than:
>
> Configuration hConf = HBaseConfiguration.create();
> hConf.set("hbase.zookeeper.quorum", "cas-1,cas-2,cas-3");
> hConf.set("hbase.zookeeper.property.clientPort", "2181");
> HTable hTable = new HTable(hConf, "tablename");
>
> a thread dump shows the app blocked:
>
> "main" prio=10 tid=0x7f1424009000 nid=0x2976 waiting on condition
> [0x7f142a0c3000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1772)
> at
> org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:175)
> at
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:58)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:806)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:896)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:809)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:778)
> at
> org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:245)
> at org.apache.hadoop.hbase.client.HTable.(HTable.java:186)
> at org.apache.hadoop.hbase.client.HTable.(HTable.java:147)
>
> Any suggestions as to the cause?
>
> TIA
>
> Brian


Re: coprocessorExec got stucked with generic type

2013-06-11 Thread Gary Helmling
Does your NameAndDistance class implement org.apache.hadoop.io.Writable?
 If so, it _should_ be serialized correctly.  There was a past issue
handling generic types in coprocessor endpoints, but that was fixed way
back (long before 0.94.2).  So, as far as I know, this should all be
working, assuming that NameAndDistance can be serialized.
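In case it helps, here is an untested sketch of what a Writable
NameAndDistance could look like (the field types are my assumption; adjust
to your real class):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class NameAndDistance implements Writable {
  public String name;
  public float distance;

  public NameAndDistance() {}   // Writable deserialization needs a no-arg constructor

  public void write(DataOutput out) throws IOException {
    out.writeUTF(name);
    out.writeFloat(distance);
  }

  public void readFields(DataInput in) throws IOException {
    name = in.readUTF();
    distance = in.readFloat();
  }
}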


On Mon, Jun 10, 2013 at 9:36 AM, Pavel Hančar wrote:

>  It's org.apache.commons.lang.SerializationUtils
> I have it in hbase-0.94.2-cdh4.2.1/lib/commons-lang-2.5.jar
>  Pavel
>
>
> 2013/6/10 Ted Yu 
>
> > I searched for SerializationUtils class in hadoop (both branch-1 and
> > branch-2)
> > I also searched for SerializationUtils in hbase codebase.
> >
> > I didn't seem to find it.
> >
> > Is it an internal class of your project ?
> >
> > Cheers
> >
> > On Mon, Jun 10, 2013 at 6:11 AM, Pavel Hančar  > >wrote:
> >
> > >  I see, it's probably nonsense to return an ArrayList (or array) of
> > > other classes from a coprocessor, because it's a list of pointers. The
> > > solution is to serialize it to byte[] with SerializationUtils.serialize(
> > > Serializable obj); see
> > > http://java.sun.com/javase/6/docs/api/java/io/Serializable.html?is-external=true
> > >   Pavel
> > >
> > >
> > > 2013/6/10 Pavel Hančar 
> > >
> > > >   Hello,
> > > > can I return from an EndPoint a generic type? I try to return
> > > > ArrayList from an EndPoint method (where
> > NameAndDistance
> > > > is a simpe class with two public variables name and distance). But
> > when I
> > > > return  unempty ArrayList, the coprocessorExec get stucked.
> > > >   Thanks,
> > > >   Pavel Hančar
> > > >
> > >
> >
>


Re: Endpoint vs. Observer Coprocessors

2013-05-03 Thread Gary Helmling
A single class can act as both a RegionObserver and an endpoint.  The
Base... classes are just there for convenience.

To implement both, for example, you could:
1) have your class extend BaseRegionObserver, override postPut(), etc
2) define an interface that extends CoprocessorProtocol.  the methods you
define will be callable on the endpoint
3) have your same class from (1) implement your interface from (2)

Once the coprocessor is loaded, you should be able to call the interface
methods from (2) using HTable.coprocessorProxy() or
HTable.coprocessorExec().  This will fan out the method invocations on the
coprocessor instances loaded on the regions which contain the row key or
row key range you specify for the calls.
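A rough, untested sketch of steps 1-3 for your picture-index use case
(class, method and field names are invented; the two VersionedProtocol
methods at the bottom are the plumbing CoprocessorProtocol requires in the
0.94 line, so adjust to your exact version):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.ipc.ProtocolSignature;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

// (2) the endpoint interface: these methods become callable from the client
interface IndexProtocol extends CoprocessorProtocol {
  int search(byte[] queryVector) throws IOException;
}

// (1) + (3) a single class acting as both RegionObserver and endpoint
public class IndexCoprocessor extends BaseRegionObserver implements IndexProtocol {

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // ... add the new picture's vector to the shared in-memory index ...
  }

  public int search(byte[] queryVector) throws IOException {
    // ... query the same in-memory index and return a result ...
    return 0;
  }

  public long getProtocolVersion(String protocol, long clientVersion)
      throws IOException {
    return 1L;
  }

  public ProtocolSignature getProtocolSignature(String protocol,
      long clientVersion, int clientMethodsHash) throws IOException {
    return new ProtocolSignature(1L, null);
  }
}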


On Fri, May 3, 2013 at 5:55 AM, Pavel Hančar  wrote:

>  Hello,
> I'd like to have an object (index of vectors of pictures) in RAM of every
> regionserver. To have a variable of some Coprocessor class seems to be good
> way. I need to add a new vector to the index, when I add a picture to HBase
> (it's the postPut() method) and then I need to search through the index by
> some special API, when users click in my web application. The second task
> seems to be good for some endpoint. But I need to share the variable with
> index.
>   Thanks for the answers.
>   Pavel
>
>
> 2013/5/3 Anoop John 
>
> > >data in one common variable
> > Didn't follow u completely. Can u tell us little more on your usage. How
> > exactly the endpoint to be related with the CP hook (u said postPut)
> >
> > -Anoop-
> >
> >
> > On Fri, May 3, 2013 at 4:04 PM, Pavel Hančar 
> > wrote:
> >
> > > Hello,
> > > I've just started to discover coprocessors. Namely the classes
> > > BaseEndpointCoprocessor and BaseRegionObserver. I need the postPut()
> > method
> > > and than some special user calls. So I need both, but I want to have my
> > > data in one common variable. What's the easiest way to manage? Do I
> have
> > to
> > > Implement my own Base...Coprocessor class?
> > >  Best wishes,
> > >   Pavel Hančar
> > >
> >
>


Re: Coprocessors

2013-04-25 Thread Gary Helmling
> I'm looking to write a service that runs alongside the region servers and
> acts a proxy b/w my application and the region servers.
>
> I plan to use the logic in HBase client's HConnectionManager, to segment
> my request of 1M rowkeys into sub-requests per region-server. These are
> sent over to the proxy which fetches the data from the region server,
> aggregates locally and sends data back. Does this sound reasonable or even
> a useful thing to pursue?
>
>
This is essentially what coprocessor endpoints (called through
HTable.coprocessorExec()) do.  (One difference is that there is a
parallel request per-region, not per-region server, though that is a
potential optimization that could be made as well).

The tricky part I see for the case you describe is splitting your full set
of row keys up correctly per region.  You could send the full set of row
keys to each endpoint invocation, and have the endpoint implementation
filter down to only those keys present in the current region.  But that
would be a lot of overhead on the request side.  You could split the row
keys into per-region sets on the client side, but I'm not sure we provide
sufficient context for the Batch.Call instance you provide to
coprocessorExec() to determine which region it is being invoked against.
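If you do go the client-side route, a simple (untested) way to build the
per-region sets is to look up the hosting region for each key; the helper
name here is made up:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionKeySplitter {
  // Groups the requested row keys by the start key of the region that
  // currently hosts them, so each set can go to one endpoint invocation.
  // It ignores regions moving or splitting while the request is in flight.
  public static Map<byte[], List<byte[]>> splitByRegion(HTable table,
      List<byte[]> rowKeys) throws IOException {
    Map<byte[], List<byte[]>> perRegion =
        new TreeMap<byte[], List<byte[]>>(Bytes.BYTES_COMPARATOR);
    for (byte[] row : rowKeys) {
      byte[] regionStart =
          table.getRegionLocation(row).getRegionInfo().getStartKey();
      List<byte[]> keys = perRegion.get(regionStart);
      if (keys == null) {
        keys = new ArrayList<byte[]>();
        perRegion.put(regionStart, keys);
      }
      keys.add(row);
    }
    return perRegion;
  }
}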


Re: Coreprocessor always scan the whole table.

2013-04-19 Thread Gary Helmling
As others mention, HBASE-6870 is about coprocessorExec() always scanning the
full .META. table to determine region locations.  Is this what you mean or
are you talking about your coprocessor always scanning your full user table?

If you want to limit the scan within regions in your user table, you'll
need to pass startRow and endRow as parameters to your instance.GetList()
method.  Then when you create the region scanner in your coprocessor code,
you'll need to set the start and end row yourself in order to limit the
rows scanned.
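Roughly like this (untested sketch; the class and method names are made up
to mirror your GetList(), and the scan loop is the same as what you already
have):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

public class EndPointSA extends BaseEndpointCoprocessor {
  public List<KeyValue> getList(byte[] startRow, byte[] endRow)
      throws IOException {
    Scan scan = new Scan();
    scan.setStartRow(startRow);   // bound the scan inside this region
    scan.setStopRow(endRow);
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    InternalScanner scanner = env.getRegion().getScanner(scan);
    List<KeyValue> results = new ArrayList<KeyValue>();
    try {
      List<KeyValue> cur = new ArrayList<KeyValue>();
      boolean more;
      do {
        cur.clear();
        more = scanner.next(cur);
        results.addAll(cur);
      } while (more);
    } finally {
      scanner.close();
    }
    return results;
  }
}

The startRow/endRow you pass to coprocessorExec() on the client only select
which regions get invoked; they are not applied to the scan inside each
region, which is why the parameters above are needed as well.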


On Fri, Apr 19, 2013 at 5:59 AM, Ted Yu  wrote:

> Please upgrade to 0.94.6.1 which is more stable.
>
> Cheers
>
> On Apr 19, 2013, at 4:58 AM, GuoWei  wrote:
>
> >
> > We use base 0.94.1 in our production environment.
> >
> >
> > Best Regards / 商祺
> > 郭伟 Guo Wei
> >
> > 在 2013-4-19,下午6:01,Ted Yu  写道:
> >
> >> Which hbase version are you using ?
> >>
> >> Thanks
> >>
> >> On Apr 19, 2013, at 2:49 AM, GuoWei  wrote:
> >>
> >>> Hello,
> >>>
> >>> We use HBase coprocessor endpoints to process realtime data. But
> when I use the coprocessorExec method to scan a table and pass startRow and
> endRow, it always scans the whole table instead of the rows between the startRow
> and endRow.
> >>>
> >>> my code.
> >>>
> >>> results = table.coprocessorExec(IEndPoint_SA.class,  startrow, endrow,
> >>>  new Batch.Call<IEndPoint_SA, Hashtable>() {
> >>>public Hashtable call(IEndPoint_SA
> instance)throws IOException{
> >>>Hashtable s = null;
> >>>  try {
> >>>  s=instance.GetList();
> >>>  } catch (ParseException e) {
> >>>  // TODO Auto-generated catch block
> >>>  e.printStackTrace();
> >>>  }
> >>>  return s;
> >>>}
> >>>  });
> >>>
> >>>
> >>>
> >>> Best Regards / 商祺
> >>> 郭伟 Guo Wei
> >>> -
> >>
> >
>


Re: Hbase question

2013-04-09 Thread Gary Helmling
Hi Rami,

One thing to note for RegionObservers is that each table region gets its
own instance of each configured coprocessor.  So if your cluster has N
regions per region server, with your RegionObserver loaded on all tables,
then each region server will have N instances of your coprocessor.  You
should just be aware of this in case you, say, create a thread pool in your
coprocessor constructor.  An alternative in this case is to use a singleton
class per region server (aka per jvm) to manage the resources.

You do want to be sure that all threads are daemon threads, so that they
don't block region server shutdown.  Or else you'll need to ensure you
properly stop/join all the threads you've spawned on shutdown.
 RegionServerObserver.preStopRegionServer() may help there.
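A minimal untested sketch of such a per-JVM holder (pool size and names are
arbitrary):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public final class SharedCoprocessorThreads {
  // One pool shared by all coprocessor instances in this region server JVM.
  private static final ExecutorService POOL =
      Executors.newFixedThreadPool(4, new ThreadFactory() {
        public Thread newThread(Runnable r) {
          Thread t = new Thread(r, "coprocessor-shared-worker");
          t.setDaemon(true);   // won't block region server shutdown
          return t;
        }
      });

  private SharedCoprocessorThreads() {}

  public static ExecutorService pool() {
    return POOL;
  }
}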

--gh



On Tue, Apr 9, 2013 at 11:40 AM, Ted Yu  wrote:

> Rami:
> Can you tell us what coprocessor hook you plan to use ?
>
> Thanks
>
> On Tue, Apr 9, 2013 at 10:51 AM, Rami Mankevich  wrote:
>
> > First of all - thanks  for the quick response.
> >
> > Basically threads I want to open are for my own  internal structure
> > updates and I guess have no relations to HBase internal structures.
> > All I want is initiations for some asynchronous structure updates as part
> > of coprocessor execution in order  not to block user reponse.
> >
> > The only reason I was asking is to be sure Hbase will not kill those
> > threads.
> > As I understand - shouldn't be any issue with that. Am I correct?
> >
> > In addition - Is there any Hbase Thread pool I can use?
> >
> >
> > Thanks
> > From: Andrew Purtell [mailto:apurt...@apache.org]
> > Sent: Tuesday, April 09, 2013 6:53 PM
> > To: Rami Mankevich
> > Cc: apurt...@apache.org
> > Subject: Re: Hbase question
> >
> > Hi Rami,
> >
> > It is no problem to create threads in a coprocessor as a generic answer.
> > More specifically there could be issues depending on exactly what you
> want
> > to do, since coprocessor code changes HBase internals. Perhaps you could
> > say a bit more. I also encourage you to ask this question on
> > user@hbase.apache.org so other
> contributors
> > can chime in too.
> >
> > On Tuesday, April 9, 2013, Rami Mankevich wrote:
> > Hey
> > According to the Hbase documentation you are one of contrinuters to the
> > HBase project
> > I would like to raise some question when nobody can basically advice me:
> >
> > In context of coprocessors I want to raise some threads.
> > Do you see any problems with that?
> >
> > Thanks
> > This message and the information contained herein is proprietary and
> > confidential and subject to the Amdocs policy statement, you may review
> at
> > http://www.amdocs.com/email_disclaimer.asp
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: HBaseClient isn't reusing connections but creating a new one each time

2013-03-29 Thread Gary Helmling
Hi Jeff,

Yeah that is pretty bad.  User should definitely be implementing equals()
and hashCode().  Thanks for tracking this down and reporting it.

I opened https://issues.apache.org/jira/browse/HBASE-8222
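For what it's worth, a hypothetical sketch of your option 1 (not the actual
patch), assuming User keeps the wrapped UserGroupInformation in its ugi
field, would just delegate to the UGI, i.e. add to User:

  @Override
  public boolean equals(Object o) {
    if (o == this) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    return ugi.equals(((User) o).ugi);
  }

  @Override
  public int hashCode() {
    return ugi.hashCode();
  }

That way two User instances for the same logged-in user compare equal and
ConnectionId can find the cached connection.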


Gary


On Fri, Mar 29, 2013 at 11:41 AM, Jeff Whiting  wrote:

> After noticing a lot of threads, I turned on debugging logging for hbase
> client and saw this many times counting up constantly:
> HBaseClient:531 - IPC Client (687163870) connection to
> /10.1.37.21:60020from jeff: starting, having connections 1364
>
> At that point in my code it was up to 1364 different connections (and
> threads).  Those connections will eventually drop off after the idle time
> is reached "conf.getInt("hbase.ipc.client.connection.maxidletime", 1)".
> But during periods of activity the number of threads can get very high.
>
> Additionally I was able to confirm the large number of threads by doing:
>
> jstack  | grep IPC
>
>
> So I started digging around in the code...
>
> In HBaseClient.getConnection it attempts to reuse previous connections:
>
>  ConnectionId remoteId = new ConnectionId(addr, protocol, ticket,
> rpcTimeout);
> do {
>   synchronized (connections) {
> connection = connections.get(remoteId);
> if (connection == null) {
>   LOG.error("poolsize: "+getPoolSize(conf));
>   connection = new Connection(remoteId);
>   connections.put(remoteId, connection);
> }
>   }
> } while (!connection.addCall(call));
>
>
> It does this by using the connection id as the key to the pool. All of this
> seems good except ConnectionId never hashes to the same value so it cannot
> reuse any connection.
>
> From my understanding of the code here is why.
>
> In HBaseClient.ConnectionId
>
> @Override
> public boolean equals(Object obj) {
>  if (obj instanceof ConnectionId) {
>ConnectionId id = (ConnectionId) obj;
>return address.equals(id.address) && protocol == id.protocol &&
>   ((ticket != null && ticket.equals(id.ticket)) ||
>(ticket == id.ticket)) && rpcTimeout == id.rpcTimeout;
>  }
>  return false;
> }
>
> @Override  // simply use the default Object#hashcode() ?
> public int hashCode() {
>   return (address.hashCode() + PRIME * (
>   PRIME * System.identityHashCode(protocol) ^
>  (ticket == null ? 0 : ticket.hashCode()) )) ^ rpcTimeout;
> }
>
> It uses the protocol and the ticket in the both functions.  However going
> back through all of the layers I think I found the problem.
>
> Problem:
>
> HBaseRPC.java:  public static VersionedProtocol getProxy(Class<? extends VersionedProtocol> protocol,
>   long clientVersion, InetSocketAddress addr, Configuration conf,
>   SocketFactory factory, int rpcTimeout) throws IOException {
> return getProxy(protocol, clientVersion, addr,
> User.getCurrent(), conf, factory, rpcTimeout);
>   }
>
> User.getCurrent() always returns a new User object.  That user instance is
> eventually passed down to ConnectionId.  However the User object doesn't
> implement hash() or equals() so one ConnectionId won't ever match another
> ConnectionId.
>
>
> There are several possible solutions.
> 1. implement hashCode and equals for the User.
> 2. only create one User object and reuse it.
> 3. don't look at ticket in ConnectionId (probably a bad idea)
>
>
> Thoughts?  Has anyone else noticed this behavior?  Should I open up a jira
> issue?
>
> I originally ran into the problem due to OS X having a limited number of
> threads per user (and I was not able to increase the limit) and our unit
> tests making requests quick enough that I ran out of threads.  I tried out
> all three solutions and it worked fine for my application.  However I'm not
> sure what changing the behavior would do to other's applications especially
> those that use SecureHadoop.
>
>
>
> Thanks,
> ~Jeff
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> je...@qualtrics.com
>


Re: Regionserver goes down while endpoint execution

2013-03-12 Thread Gary Helmling
To expand on what Himanshu said, your endpoint is doing an unbounded scan
on the region, so with a region with a lot of rows it's taking more than 60
seconds to run to the region end, which is why the client side of the call
is timing out.  In addition you're building up an in memory list of all the
values for that qualifier in that region, which could cause you to bump
into OOM issues, depending on how big your values are and how sparse the
given column qualifier is.  If you trigger an OOMException, then the region
server would abort.

For this usage specifically, though -- scanning through a single column
qualifier for all rows -- you would be better off just doing a normal
client side scan, ie. HTable.getScanner().  Then you will avoid the client
timeout and potential server-side memory issues.
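
A rough sketch of that client-side version (table, family, and qualifier
names are placeholders; conf is your client Configuration):

  HTable table = new HTable(conf, "myTable");
  Scan scan = new Scan();
  scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
  scan.setCaching(500);  // fetch rows in batches to cut down on RPCs
  ResultScanner scanner = table.getScanner(scan);
  try {
    for (Result r : scanner) {
      byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
      // process each value as you go instead of buffering the whole list
    }
  } finally {
    scanner.close();
    table.close();
  }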


On Tue, Mar 12, 2013 at 9:29 AM, Ted Yu  wrote:

> From region server log:
>
> 2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error
> making BlockReader. Closing stale
> Socket[addr=/10.42.105.112,port=50010,localport=54114]
> java.io.EOFException: Premature EOF: no length prefix available
> at
> org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> at
> org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407)
>
> What version of HBase and hadoop are you using ?
> Do versions of hadoop on Eclipse machine and in your cluster match ?
>
> Cheers
>
> On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8  >wrote:
>
> >  Lars,
> >
> > I am getting following errors at datanode & region servers.
> >
> >
> > Regards,
> >
> > Deepak
> >
> >
> > *From:* Kumar, Deepak8 [CCC-OT_IT NE]
> > *Sent:* Tuesday, March 12, 2013 3:00 AM
> > *To:* Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars
> > hofhansl'
> >
> > *Subject:* RE: Regionserver goes down while endpoint execution
> >
> >
> > Lars,
> >
> > It is having following errors when I execute the Endpoint RPC client from
> > eclipse. It seems some of the regions at regionserver
> > vm-8aa9-fe74.nam.nsroot.net is taking more time to respond.
> >
> >
> > Could you guide how to fix it. I don’t find any option to set
> hbase.rpc.timeout
> > from hbase configuration menu in CDH4 CM server for hbase
> configuration.
> >
> >
> > Regards,
> >
> > Deepak
> >
> >
> > 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment
> complete
> > on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid =
> > 0x53d591b77090026, negotiated timeout = 6
> >
> > Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration
> > warnOnceIfDeprecated
> >
> > WARNING: hadoop.native.lib is deprecated. Instead, use
> > io.native.lib.available
> >
> > Mar 12, 2013 2:44:00 AM
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > processExecs
> >
> > WARNING: Error executing for row 153299:1362780381523:2932572079500658:
> > vm-ab1f-dd21.nam.nsroot.net:
> >
> > *java.util.concurrent.ExecutionException*: *
> > org.apache.hadoop.hbase.client.RetriesExhaustedException*: Failed after
> > attempts=10, exceptions:
> >
> > Tue Mar 12 02:34:15 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, *
> > java.net.SocketTimeoutException*: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: *java.net.SocketTimeoutException*: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271
> remote=
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:35:16 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, *
> > java.net.SocketTimeoutException*: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: *java.net.SocketTimeoutException*: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403
> remote=
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:36:18 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, *
> > java.net.SocketTimeoutException*: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: *java.net.SocketTimeoutException*: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465
> remote=
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:37:20 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, *
> > java.net.SocketTimeoutException*: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: *java.net.SocketTimeoutException*: 6 millis timeout while
> >

Re: endpoint coprocessor performance

2013-03-07 Thread Gary Helmling
> So should we close HBASE-5492 as a dup?
>
> Yes, that would make sense.  Done.


Re: endpoint coprocessor performance

2013-03-07 Thread Gary Helmling
> I profiled it and getStartKeysInRange is taking all the time. Recall I'm
> running 0.92.1. I think these factors are consistent with
> https://issues.apache.org/jira/browse/HBASE-5492, which was fixed in
> 0.92.3.
>
> We'll be upgrading soon, so I'll be able to verify the perf issue is gone.
>

Unfortunately it doesn't look like that issue was ever resolved, so the fix
version of 0.92.3 is not accurate.  I cleared the fix version to avoid
future confusion.

In any case, it looks like the same issue described in HBASE-6870, so if we
can get that in, it should solve your problem.


Re: endpoint coprocessor performance

2013-03-04 Thread Gary Helmling
I see this is HBASE-6870.  I thought that sounded familiar.


On Mon, Mar 4, 2013 at 6:23 PM, Gary Helmling  wrote:

>
> Check your logs for whether your end-point coprocessor is hitting
>> zookeeper on every invocation to figure out the region start key.
>> Unfortunately (at least last time I checked), the default way of invoking
>> an end point coprocessor doesn't use the meta cache. You can go through a
>> combination of the following instead:
>> HRegionLocation regionLocation = retried ?
>> connection.relocateRegion(**tableName, tableKey) :
>> connection.locateRegion(**tableName, tableKey);
>> ...
>> Then call HConnection.processExecs call, passing in the regionKeys from
>> above.
>> You can trap the error case of the region being relocated and try again
>> with retried = true and it'll update the meta data cache when
>> relocateRegion is called.
>>
>
>
> Any idea if we have an improvement logged in JIRA for this?  This is
> definitely something we should improve on.
>


Re: endpoint coprocessor performance

2013-03-04 Thread Gary Helmling
> Check your logs for whether your end-point coprocessor is hitting
> zookeeper on every invocation to figure out the region start key.
> Unfortunately (at least last time I checked), the default way of invoking
> an end point coprocessor doesn't use the meta cache. You can go through a
> combination of the following instead:
> HRegionLocation regionLocation = retried ?
> connection.relocateRegion(**tableName, tableKey) :
> connection.locateRegion(**tableName, tableKey);
> ...
> Then call HConnection.processExecs call, passing in the regionKeys from
> above.
> You can trap the error case of the region being relocated and try again
> with retried = true and it'll update the meta data cache when
> relocateRegion is called.
>


Any idea if we have an improvement logged in JIRA for this?  This is
definitely something we should improve on.


Re: endpoint coprocessor performance

2013-03-04 Thread Gary Helmling
>
> I'm running some experiments to understand where to use coprocessors. One
> interesting scenario is computing distinct values. I ran performance tests
> with two distinct value implementations: one using endpoint coprocessors,
> and one using just scans (computing distinct values client side only). I
> noticed that the endpoint coprocessor implementation averaged 80 ms slower
> than the scan implementation. Details of that are below for anyone
> interested.
>
> To drill into the performance, I instrumented the code and ultimately
> deployed a no-op endpoint coprocessor, to look at the overhead of simply
> calling it. I'm measuring around 100ms for calling my empty, no-op endpoint
> coprocessor.
>
>
100ms to do a single no-op coprocessor call seems very high.  Do you have
more details of where you see the code spending time?  Or even better, can
you post sample code somewhere?  Also, which version of HBase are you
testing with?

I need to do more tests, but I believe my tests are leading me to similar
> conclusions drawn here:
> http://hbase-coprocessor-experiments.blogspot.com/2011/05/extending.html
>
> I.e. if the query/scan is selective enough (I'll go out on a limb and
> estimate 50-100 rows), then it's better to just perform a scan and compute
> client side. Endpoint coprocessors will make sense for larger result sets
> and/or scans that hit multiple regions.
>
>
I would certainly agree with this.  Coprocessor endpoints are not a
replacement for the regular HBase client APIs.  They're really meant to
allow you to extend HBase with new capabilities.  Coprocessor endpoints
will allow you to parallelize operations across multiple regions, which can
be a powerful capability if you need it, or will allow you to maintain some
pre-computed state server-side and then easily retrieve it from the client.
 If you're scanning larger amounts of data and computing a much smaller
result, endpoints will also save transferring the full data set over the
network back to the client, but you'll still need to scan through the data
server-side.  In your case, are you applying the same scan options in the
coprocessor (start/end row, any filtering)?


> Before going too far, I wanted to check if anyone in this group has
> suggestions. I.e. perhaps there are just some configuration options I've
> not uncovered. Does this 100ms latency sound correct?
>

It would help to have more details of what your code is actually doing.
 Can you post an extract of what's running in the coprocessor?


--gh


Re: Please welcome our newest committer: Sergey Shelukhin

2013-02-22 Thread Gary Helmling
Congrats, Sergey!  Great work!


On Fri, Feb 22, 2013 at 2:10 PM, Enis Söztutar  wrote:

> Congrats. Well deserved.
>
>
> On Fri, Feb 22, 2013 at 1:57 PM, Andrew Purtell 
> wrote:
>
> > Congratulations Sergey!
> >
> >
> > On Fri, Feb 22, 2013 at 1:39 PM, Ted Yu  wrote:
> >
> > > Hi,
> > > Sergey has 51 issues under his name:
> > > *
> > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HBASE%20AND%20assignee%20%3D%20sershe%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > *
> > > *
> > > *
> > > * He was the driving force in finishing HBASE-5416 Improve performance
> of
> > > scans with some kind of filters.
> > > * He volunteers to be component owner for compaction
> > > * He has been using and improving integration tests
> > > * Several of his JIRAs improve dynamic config update
> > > * He has studied levelDB and come up with various plans to improve
> > > compaction performance.
> > >
> > > I am sure he is going to make more contributions to HBase.
> > >
> > > Keep up the great work, Sergey.
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: restrict clients

2013-02-11 Thread Gary Helmling
You can also use the service-level authorization support to control which
users/groups are allowed to connect at all.  It's configured via
hbase-policy.xml in the conf/ directory and functions similarly to the HDFS
implementation:
http://hadoop.apache.org/docs/r1.0.4/service_level_auth.html
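
As a rough example, an entry in hbase-policy.xml looks like this (the user
and group names are placeholders; the value is a comma-separated user list, a
space, then a comma-separated group list, and '*' means everyone -- check the
default hbase-policy.xml shipped with your version for the exact property
names):

  <property>
    <name>security.client.protocol.acl</name>
    <value>alice,bob appgroup</value>
  </property>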

But with ACLs already controlling who has read access, you can get
finer-grained support with ACLs directly.

If you want to control which hosts can connect to the cluster at all, start
with iptables, as Mike suggests.


On Mon, Feb 11, 2013 at 7:36 PM, Anoop Sam John  wrote:

> HBase supports Kerberos based authentication. Only those client nodes with
> a valid Kerberos ticket can connect with the HBase cluster.
>
> -Anoop-
> 
> From: Rita [rmorgan...@gmail.com]
> Sent: Monday, February 11, 2013 6:37 PM
> To: user@hbase.apache.org
> Subject: Re: restrict clients
>
> Hi,
>
> I am looking for more than an ACL. I want to control what clients can
> connect to the hbase cluster. Is that possible?
>
>
> On Fri, Feb 8, 2013 at 10:36 AM, Stas Maksimov  wrote:
>
> > Hi Rita,
> >
> > As far as I know ACL is on a user basis. Here's a link for you:
> > http://hbase.apache.org/book/hbase.accesscontrol.configuration.html
> >
> > Thanks,
> > Stas
> >
> >
> > On 8 February 2013 15:20, Rita  wrote:
> >
> > > Hi,
> > >
> > > In an enterprise deployment, how can I restrict who can access the
> data?
> > > For example, I want only certain servers able to GET,PUT data everyone
> > else
> > > should be denied. Is this possible?
> > >
> > >
> > >
> > > --
> > > --- Get your facts first, then you can distort them as you please.--
> > >
> >
>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>


Re: [ANNOUNCE] New Apache HBase Committer - Devaraj Das

2013-02-07 Thread Gary Helmling
Congrats, Devaraj!


On Thu, Feb 7, 2013 at 5:36 AM, Nicolas Liochon  wrote:

> Congrats, Devaraj!
>
>
> On Thu, Feb 7, 2013 at 2:26 PM, Marcos Ortiz  wrote:
>
> > Congratulations, Devaraj.
> >
> >
> > On 02/07/2013 02:20 AM, Lars George wrote:
> >
> >> Congrats! Welcome aboard.
> >>
> >> On Feb 7, 2013, at 6:19, Ted Yu  wrote:
> >>
> >>  Hi,
> >>> We've brought in one new Apache HBase Committer: Devaraj Das.
> >>>
> >>> On behalf of the Apache HBase PMC,  I am excited to welcome Devaraj as
> >>> committer.
> >>>
> >>> He has played a key role in unifying RPC engines for 0.96
> >>> He fixed some tricky replication-related bugs
> >>> There're 30 resolved HBase JIRAs under his name.
> >>>
> >>> Please join me in congratulating Devaraj for his new role.
> >>>
> >>
> > --
> > Marcos Ortiz Valmaseda,
> > Product Manager && Data Scientist at UCI
> > Blog: http://marcosluis2186.posterous.com
> > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
> > >
> >
>


Re: maven junit test of a coprocessor

2013-01-31 Thread Gary Helmling
If you're writing a junit test that spins up a mini cluster to test the
coprocessor, then there's no need to deploy the jar into HDFS just for
testing.  The coprocessor class should already be on your test classpath.
 In your test's setup method, you just need to either: a) add the
coprocessor class name to the configuration (as
hbase.coprocessor.region.classes), or b) add it to the HTableDescriptor
before you create the table.

Take a look at the following HBase test classes as an example:

org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
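
A rough sketch of such a setup (MyRegionObserver and the table/family names
are made up -- pick one of the two loading options, not both):

  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();

  @BeforeClass
  public static void setupCluster() throws Exception {
    // option (a): load the observer on every region in the mini cluster
    UTIL.getConfiguration().setStrings(
        CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
        MyRegionObserver.class.getName());
    UTIL.startMiniCluster();
  }

  @Test
  public void testObserver() throws Exception {
    // option (b): or load it only on the table under test
    HTableDescriptor htd = new HTableDescriptor("testtable");
    htd.addFamily(new HColumnDescriptor("f"));
    htd.addCoprocessor(MyRegionObserver.class.getName());
    UTIL.getHBaseAdmin().createTable(htd);

    HTable table = new HTable(UTIL.getConfiguration(), "testtable");
    // ... issue puts/gets here and assert on the observer's side effects ...
    table.close();
  }

  @AfterClass
  public static void tearDownCluster() throws Exception {
    UTIL.shutdownMiniCluster();
  }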




On Wed, Jan 30, 2013 at 11:27 PM, Wei Tan  wrote:

> Hi,
> I am writing a maven junit test for a HBase coprocessor. The problem is
> that, I want to write a junit test that deploy the cp jar into a cluster,
> and test its function. However, test is before install so I cannot get a
> cp jar to deploy at that time.
> Is this like a chicken-and-egg problem? Any advice? Thanks!
> Wei


Re: Announcing Phoenix: A SQL layer over HBase

2013-01-30 Thread Gary Helmling
Great stuff!  I've been waiting for this.  Congrats on open sourcing and
thanks for sharing!


On Wed, Jan 30, 2013 at 1:04 PM, James Taylor wrote:

> We are pleased to announce the immediate availability of a new open source
> project, Phoenix, a SQL layer over HBase that powers the HBase use cases at
> Salesforce.com. We put the SQL back in the NoSQL:
>
>  * Available on GitHub at 
> https://github.com/forcedotcom/phoenix
>  * Embedded JDBC driver implements the majority of java.sql interfaces,
>including the metadata APIs.
>  * Built for low latency queries through parallelization, the use of
>native HBase APIs, coprocessors, and custom filters.
>  * Allows columns to be modelled as a multi-part row key or key/value
>cells.
>  * Full query support with predicate push down and optimal scan key
>formation.
>  * DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for
>adding/removing columns.
>  * Versioned schema repository. Snapshot queries use the schema that
>was in place when data was written.
>  * DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT
>for mass data transfer between the same or different tables, and
>DELETE for deleting rows.
>  * Limited transaction support through client-side batching.
>  * Single table only - no joins yet and secondary indexes are a work in
>progress.
>  * Follows ANSI SQL standards whenever possible
>  * Requires HBase v 0.94.2 or above
>  * BSD-like license
>  * 100% Java
>
> Join our user groups:
> Phoenix HBase User: https://groups.google.com/**
> forum/#!forum/phoenix-hbase-**user
> Phoenix HBase Dev: https://groups.google.com/**
> forum/#!forum/phoenix-hbase-**dev
> and check out our roadmap:
> https://github.com/**forcedotcom/phoenix/wiki#wiki-**roadmap
>
> We welcome feedback and contributions from the community to Phoenix and
> look forward to working together.
>
> Regards,
>
> James Taylor
> @JamesPlusPlus
>


Re: Find the tablename in Observer

2013-01-28 Thread Gary Helmling
> >Will the CoprocessorEnvironment reference in the  start() method be
> instanceof RegionCoprocessorEnvironment too
>
> No. It will be reference of RegionEnvironment . This is not a public class
> so you wont be able to do the casting.
>

Since RegionEnvironment implements RegionCoprocessorEnvironment, you should
be able to do:

((RegionCoprocessorEnvironment)env).getRegion().getRegionInfo().getTableName();

within your start() method without a problem.
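
Spelled out a little more defensively (a sketch only; LOG is whatever logger
your observer already has):

  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    if (env instanceof RegionCoprocessorEnvironment) {
      byte[] tableName = ((RegionCoprocessorEnvironment) env)
          .getRegion().getRegionInfo().getTableName();
      LOG.info("Observer loaded for table " + Bytes.toString(tableName));
    }
  }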


Re: thrift client api supports filters and coprocessor

2013-01-02 Thread Gary Helmling
I'm not familiar with happybase, but with the recent conversion of
coprocessor endpoints to protocol buffer services in trunk, it should be
possible to implement calling endpoints from other languages that protobufs
support.  There is a ticket to enable endpoint calls over the REST gateway:

https://issues.apache.org/jira/browse/HBASE-6790

I'm not sure how you would get thrift and PB to play nice together, but it
would at least be possible to implement a thrift wrapper of the serialized
PB service request, which could be deserialized and invoked on the thrift
server.

If you're willing to experiment, though, and need coprocessor endpoints you
might try out REST as a thrift alternative.  HBASE-6790 still remains to be
done, but it shouldn't be too much work.  Just needs someone interested to
tackle it.


On Fri, Dec 28, 2012 at 5:29 AM, Shengjie Min  wrote:

> Hi guys,
>
> Sadly, My hbase client language is Python, I am using happybase for now
> which is based on thrift AFAIK. I know thrift so far is still not
> supporting filters, coprocessors (correct me if I am wrong here). Can some
> one point me any Jira items I can track the plan/progress if there is one?
> The only ones I can find is from "Hbase in Action":
>
> “Thrift server to match the new Java API”:
> https://issues.apache.org/jira/browse/HBASE-1744
>
> “Make Endpoint Coprocessors Available from Thrift,”
> https://issues.apache.org/jira/browse/HBASE-5600.
>
> The 1st one doesn't seem covering filters and the 2nd one hasn't been
> updated for a long while.
>
> --
> Shengjie Min
>


Re: [ANNOUNCE] New Apache HBase Committers - Matteo Bertozzi and Chunhui Shen

2013-01-02 Thread Gary Helmling
Congrats Matteo and Chunhui!  Keep up the good work!


On Wed, Jan 2, 2013 at 5:24 PM, Manoj Babu  wrote:

> Congraulations Matteo and Chunhui!
>
> Cheers!
> Manoj.
>
>
> On Thu, Jan 3, 2013 at 5:18 AM, lars hofhansl  wrote:
>
> > Congrats Matteo and Chunhui, glad to have you on board.
> > Don't break the 0.94 tests :)
> >
> >
> > -- Lars
> >
> >
> > - Original Message -
> > From: Jonathan Hsieh 
> > To: d...@hbase.apache.org; user@hbase.apache.org
> > Cc:
> > Sent: Wednesday, January 2, 2013 11:37 AM
> > Subject: [ANNOUNCE] New Apache HBase Committers - Matteo Bertozzi and
> > Chunhui Shen
> >
> > Along with bringing in the new year, we've brought in two new Apache
> > HBase Committers!
> >
> > On behalf of the Apache HBase PMC,  I am excited to welcome Matteo
> > Bertozzi and Chunhui Shen as committers!
> >
> > * Matteo's been working on table snapshots, updates to
> > security-related coprocessors, and some performance benchmarking
> > improvements.
> > * Chunhui's been working on lots of really important and tricky data
> > loss fixes, race condition fixes, region assignment fixes, and
> > availability fixes.
> >
> > Please join me in congratulating Matteo and Chunhui on their new roles!
> >
> > Jon.
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // j...@cloudera.com
> >
> >
>


Re: coprocessors and JMX

2012-10-15 Thread Gary Helmling
It sounds like you want to have the coprocessor expose its own
metrics as part of the HBase metrics?  If that's right, can you
describe some of the metrics you might want to expose?

We could possibly provide hooks to publish metrics through the
CoprocessorEnvironment, which could then get pushed out with the other
HBase metrics in some coprocessor namespace.  I haven't been following
the latest metrics changes, but I assume this would mainly be some
hooks for the coprocessor to register new metrics with the
environment, using the existing metrics types.

It would also make sense to expose some metrics on coprocessor
endpoint RPCs, same as we do for other RPC methods but named for the
endpoint method being called.  This is something I've considered in
the past, but I'm not sure if there's an existing JIRA for it.
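
In the meantime, nothing stops a coprocessor from registering its own plain
JMX MBean from start() -- this is standard JMX, not an HBase-provided hook,
and all the names below are made up for illustration:

  public interface MyCoprocessorMetricsMBean {
    long getRequestCount();
  }

  public class MyCoprocessorMetrics implements MyCoprocessorMetricsMBean {
    private final AtomicLong requests = new AtomicLong();
    public void increment() { requests.incrementAndGet(); }
    @Override
    public long getRequestCount() { return requests.get(); }
  }

  // in the observer:
  @Override
  public void start(CoprocessorEnvironment env) {
    try {
      ObjectName name = new ObjectName("my.coprocessor:type=Metrics");
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      // one coprocessor instance is created per region, so guard against
      // registering the same name more than once per region server
      if (!server.isRegistered(name)) {
        server.registerMBean(new MyCoprocessorMetrics(), name);
      }
    } catch (JMException e) {
      // metrics are best-effort; log and continue
    }
  }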


On Mon, Oct 15, 2012 at 4:45 PM, Elliott Clark  wrote:
> There's not a special way inside of the coprocessor to get metrics.
>
>
>- RegionObservers can get the HRegion, through the ObserverContext,
>which has lots of metrics hanging off of it that might be useful
>- You can get it through the normal jmx system.  However that's pretty
>verbose.
>- Or the easiest way, which is a little bit slower, is through the info
>server http://<host>:<port>/jmx.
>
>
> On Sun, Oct 14, 2012 at 11:35 AM, Grant Ingersoll wrote:
>
>> I have a Coprocessor, what's the best way to hook it into HBase's JMX
>> setup so that I can get stats on the coprocessor?
>>
>> Thanks,
>> Grant


Re: Allocating more heap for endpoint coprocessors

2012-08-31 Thread Gary Helmling
Maybe we need to add a coprocessors section to the ref guide.  I think
all the current documentation is in javadoc.  And if all the
potentially destabilizing issues of in-process coprocessor usage are
not yet called out (memory usage, cpu, etc), we could more explicitly
detail that.

If we really want to be able to support these cases, though, we
probably need to push HBASE-4047 (external coprocessor host) forward.
Without it there's not a lot we can do to prevent coprocessors from
taking down region servers.


On Thu, Aug 30, 2012 at 10:04 PM, lars hofhansl  wrote:
> Maybe we should be better with marking improvement as such in jira, and then 
> have a special list in the release announcements highlighting these 
> (otherwise they just be in the noise of all the other bug fixes).
>
> -- Lars
>
>
>
> 
>  From: Ramkrishna.S.Vasudevan 
> To: user@hbase.apache.org; 'lars hofhansl' 
> Sent: Thursday, August 30, 2012 9:59 PM
> Subject: RE: Allocating more heap for endpoint coprocessors
>
> @Lars/@Gary
>
> Do we need to document such things.  Recently someone was asking me a
> question like this, if my endpoint impl is so memory intensive it just
> affects a running cluster and already the RS
> has a huge memory heap associated with it.
>
> So its better we document saying if your endpoint consumes memory and
> because it runs along with RS based on your need add that extra amount of
> memory that will be taken up by the endpoint impl.
>
> Regards
> Ram
>
>> -Original Message-
>> From: lars hofhansl [mailto:lhofha...@yahoo.com]
>> Sent: Friday, August 31, 2012 3:04 AM
>> To: user@hbase.apache.org
>> Subject: Re: Allocating more heap for endpoint coprocessors
>>
>> In the upcoming 0.94.2 release will also have HBASE-6505, which allows
>> you to share state between RegionObservers (and Endpoints) within the
>> same RegionServer.
>>
>> -- Lars
>>
>>
>>
>> 
>>  From: Gary Helmling 
>> To: user@hbase.apache.org
>> Sent: Thursday, August 30, 2012 1:59 PM
>> Subject: Re: Allocating more heap for endpoint coprocessors
>>
>> Endpoint coprocessors are loaded and run within the HBase RegionServer
>> process.  Your endpoint coprocessors will be running on the region
>> servers hosting the regions for the table(s) on which the coprocessor
>> is configured.
>>
>> So the way to allocate more memory is by setting either HBASE_HEAPSIZE
>> or setting the max heap in HBASE_REGIONSERVER_OPTS in hbase-env.sh on
>> the region server.
>>
>> Note that a separate coprocessor instance is loaded for each table
>> region, so, say you want to allocate 10MB for your coprocessor, but
>> each region server hosts 20 regions, you would want to increase the
>> heap size by 200MB (20x10MB).
>>
>> --gh
>>
>> On Thu, Aug 30, 2012 at 1:45 PM, Young Kim 
>> wrote:
>> > Hi,
>> >
>> > We have some memory intensive endpoint coprocessors running on our
>> RegionServers. As a result, we want to allocate more heap for the
>> coprocessors, but there doesn't seem to be much documentation on which
>> Hbase processes are directly responsible for coprocessors. Does anyone
>> happen to know or direct me to some resource that does?
>> >
>> > Thanks,
>> > Young Kim
>> >


Re: Allocating more heap for endpoint coprocessors

2012-08-30 Thread Gary Helmling
Endpoint coprocessors are loaded and run within the HBase RegionServer
process.  Your endpoint coprocessors will be running on the region
servers hosting the regions for the table(s) on which the coprocessor
is configured.

So the way to allocate more memory is by setting either HBASE_HEAPSIZE
or setting the max heap in HBASE_REGIONSERVER_OPTS in hbase-env.sh on
the region server.

Note that a separate coprocessor instance is loaded for each table
region, so, say you want to allocate 10MB for your coprocessor, but
each region server hosts 20 regions, you would want to increase the
heap size by 200MB (20x10MB).
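
For example, in hbase-env.sh on each region server (the numbers are only
illustrative -- 20 regions x 10MB comes to roughly 200MB of extra headroom on
top of whatever you run with today):

  export HBASE_HEAPSIZE=4200
  # or, to adjust only the region server process:
  # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx4200m"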

--gh

On Thu, Aug 30, 2012 at 1:45 PM, Young Kim  wrote:
> Hi,
>
> We have some memory intensive endpoint coprocessors running on our 
> RegionServers. As a result, we want to allocate more heap for the 
> coprocessors, but there doesn't seem to be much documentation on which Hbase 
> processes are directly responsible for coprocessors. Does anyone happen to 
> know or direct me to some resource that does?
>
> Thanks,
> Young Kim
>


Re: hbase security

2012-05-17 Thread Gary Helmling
I could repost the "up and running with secure hadoop" one.  But it's
kind of out of date at this point.  I remember, back when the site was
still up, getting some comments on it about things that had already
changed in the 0.20.20X releases.

I can take a look and see how bad it is.


On Thu, May 17, 2012 at 1:22 PM, Stack  wrote:
> On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz  wrote:
>> http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/
>>
>> http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/
>>
>
> Anyone interested in porting these over to
> http://blogs.apache.org/hbase/? They have great stuff in them.
> St.Ack


Re: Best way to profile a coprocessor

2012-05-10 Thread Gary Helmling
> What is the best way to profile some co-processor code (running on the
> regionserver)? If you have done it successfully, what tips can you
> offer, and what unexpected problems did you encounter?
>

It depends on what exactly you want to look at, but ultimately I don't
think it's too different from profiling any other remote java process.

I did a fair amount of profiling of the AccessController coprocessor,
and the general process I used was:

0) bring up a cluster with my cp configuration (obviously)
1) find a specific table region I was interested in
2) manually assign this region to a given region server -- you may
also want to disable balancing so that your region stays put
("balance_switch false" in the shell)
3) attach your profiler to the region server
4) run your target load
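
For step 2 and the balancer toggle, the shell commands look roughly like this
(the encoded region name and the target server name, in host,port,startcode
form, are placeholders):

  hbase> balance_switch false
  hbase> move 'ENCODED_REGION_NAME', 'rs-host.example.com,60020,1336586000000'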

Only problem I ran into was that I had to hack up the hbase shell
script a little bit to disable the CMS collector -- the profiler
didn't work with CMS.


--gh


Re: Unable to run aggregation using AggregationClient in HBase0.92

2012-05-07 Thread Gary Helmling
Hi Anil,

> Does this mean that
> "hbase.coprocessor.region.classes" is not a client side configuration? I am
> just curious to know why it was not working when i was setting the conf
> through code.
>

That is correct.  This is a server-side only configuration.  Setting
it on the client side will have no effect.


Re: Unable to run aggregation using AggregationClient in HBase0.92

2012-05-07 Thread Gary Helmling
>
> org.apache.hadoop.hbase.ipc.HBaseRPC$UnknownProtocolException:
> org.apache.hadoop.hbase.ipc.HBaseRPC$UnknownProtocolException: No matching
> handler for protocol org.apache.hadoop.hbase.coprocessor.AggregateProtocol
> in region transactions,,1335223974116.e9190687f8a74b5083b39b6e5bd55705.

The exception indicates that the AggregateImplementation coprocessor
is not loaded on your region servers.  Setting
"hbase.coprocessor.region.classes" in the client configuration that
you use is not sufficient.  You need to either set this in the
hbase-site.xml file used by your region servers (and restart the
region servers), or you can enable the coprocessor for a specific
table by disabling the table, setting a property on the
HTableDescriptor and re-enabling the table.
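
In code, the per-table route is roughly this (a sketch; it assumes a
surrounding method that throws IOException, and the table name is taken from
your stack trace):

  HBaseAdmin admin = new HBaseAdmin(conf);
  admin.disableTable("transactions");
  HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("transactions"));
  htd.addCoprocessor("org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
  admin.modifyTable(Bytes.toBytes("transactions"), htd);
  admin.enableTable("transactions");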

For more details, see
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html#package_description


Re: Integrity constraints

2012-04-24 Thread Gary Helmling
Hi Vamshi,

See the ConstraintProcessor coprocessor that was added for just this
kind of case: 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint/package-summary.html

You would need to implement the Constraint interface and apply the
configuration to your tables via the Constraints utility.

Assuming the fields are being handled as strings on the client end,
your Constraint implementation could simply call Bytes.toString() and
apply some basic regexes for validation.  Or you could consider using a
more structured serialization format like protobufs.
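
A rough sketch for the phone number column (the family name is a placeholder,
and extending BaseConstraint is an assumption -- implementing Constraint
directly works too):

  public class PhoneNumberConstraint extends BaseConstraint {
    private static final byte[] FAMILY = Bytes.toBytes("cf");
    private static final byte[] PHONE = Bytes.toBytes("phoneNumber");

    @Override
    public void check(Put p) throws ConstraintException {
      for (KeyValue kv : p.get(FAMILY, PHONE)) {
        String value = Bytes.toString(kv.getValue());
        if (!value.matches("\\d{10}")) {
          throw new ConstraintException("phoneNumber must be 10 digits: " + value);
        }
      }
    }
  }

  // applied when creating the table:
  HTableDescriptor htd = new HTableDescriptor("people");
  htd.addFamily(new HColumnDescriptor("cf"));
  Constraints.add(htd, PhoneNumberConstraint.class);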

--gh

On Tue, Apr 24, 2012 at 9:35 PM, Vamshi Krishna  wrote:
> Hi all ,  here i am having one basic doubt about constraints on hbase
> table, after knowing there is no concept of data types in hbase and
> everything is stored in the bytes.
>  Suppose a table in hbase has 3 columns,(under same column family)
> 1st column is 'Name' which accepts only character strings not numbers and
> special symbols.
> 2nd column 'phoneNumber', which is numerals that too exactly 10 digits, and
> 3rd column 'city' which should accept only upper case character strings. If
> such is the situation, how to enforce the constraints on each of the
> columns of hbase table?
>
> Also Can anybody please tell how to write the equivalent  query in hbase
> shell and Java to do so?
> --
> *Regards*
> *
> Vamshi Krishna
> *


Re: A confusion of RegionCoprocessorEnvironment.getReion() method

2012-04-10 Thread Gary Helmling
Each and every HRegion on a given region server will have its own
distinct instance of your configured RegionObserver class.
RegionCoprocessorEnvironment.getRegion() returns a reference to the
HRegion containing the current coprocessor instance.

The hierarchy is essentially:

HRegionServer
  \_ HRegion
       \_ RegionCoprocessorHost
            \_ <your RegionObserver instance>

(repeated for each HRegion).

This blog post by Mingjie may help explain things a bit more:
https://blogs.apache.org/hbase/entry/coprocessor_introduction


--gh



On Tue, Apr 10, 2012 at 2:30 AM, yonghu  wrote:
> Hello,
>
> The description of this method is " /** @return the region associated
> with this coprocessor */" and the return value is an HRegion instance.
> If I configure the region-coprocessor class in hbase-site.xml.  It
> means that this coprocessor will be applied to every HRegion which
> resides on this Region Server (if I understand right).  Why this
> method only return one HRgion instance not a list of HRgion
> instances?Suppose that a region server has two HRegions, one is for
> table 'test1', the other is for table 'test2'.  Which HRgion instance
> will be returned if I call RegionCoprocessorEnvironment.getReion()?
>
> Thanks!
>
> Yong


Re: Thrift and coprocessors

2012-03-19 Thread Gary Helmling
Currently endpoint coprocessors are only callable via the java client.
 Please do open a JIRA describing what you would like to see here.  If
you'd like to try working up a patch, that would be even better!


On Mon, Mar 19, 2012 at 11:03 AM, Ben West  wrote:
> Hi all,
>
> We use thrift to access HBase, and I've been playing around with endpoint 
> coprocessors. I'm wondering how I can use thrift to access these - it seems 
> like they're mostly supported with Java clients.
>
> So far, I've just been adding each function to the thrift schema and then 
> manually editing the thrift server to run my coprocessor. Is there an easier 
> way?
>
> If none currently exist, I can write up a JIRA.
>
> Thanks,
> -Ben


Re: HBase coprocessors blog posted

2012-02-02 Thread Gary Helmling
Hi Andy,

Sure, I'd be happy to do a draft review and contribute whatever
comments I can.  For the byline, thanks, I appreciate the offer to
include me.  I'm fine with whatever you guys decide -- you're the ones
writing up the posts. :)

Let me know as soon as you or Eugene have something you want me to
take a look at.

--gh


On Wed, Feb 1, 2012 at 7:23 PM, Andrew Purtell  wrote:
> Gary, Eugene is resurrecting posts on security now. I was thinking you should 
> be in the byline because of your many contributions. Is that ok? Want to make 
> an edit pass?
>
>
>    - Andy
>
>
>
>>____
>> From: Gary Helmling 
>>To: d...@hbase.apache.org
>>Cc: user@hbase.apache.org
>>Sent: Wednesday, February 1, 2012 8:34 AM
>>Subject: Re: HBase coprocessors blog posted
>>
>>Great work, Mingjie!
>>
>>
>>On Wed, Feb 1, 2012 at 7:29 AM, Jonathan Hsieh  wrote:
>>> Mingjie,
>>>
>>> Great post.  It would be great if we have other major features written up
>>> at this level (what it means for users and some of the guts) posted on the
>>> Apache HBase blog!
>>>
>>> Jon.
>>>
>>> On Wed, Feb 1, 2012 at 12:26 AM, Mingjie Lai  wrote:
>>>
>>>> Hi hbasers.
>>>>
>>>> A hbase blog regarding coprocessors has been posted to apache blog site.
>>>> Here is the link:
>>>>
>>>> https://blogs.apache.org/hbase/entry/coprocessor_introduction
>>>>
>>>> Your comments are welcome.
>>>>
>>>> Thanks,
>>>> Mingjie
>>>>
>>>
>>>
>>>
>>> --
>>> // Jonathan Hsieh (shay)
>>> // Software Engineer, Cloudera
>>> // j...@cloudera.com
>>
>>
>>


Re: HBase coprocessors blog posted

2012-02-01 Thread Gary Helmling
Great work, Mingjie!


On Wed, Feb 1, 2012 at 7:29 AM, Jonathan Hsieh  wrote:
> Mingjie,
>
> Great post.  It would be great if we have other major features written up
> at this level (what it means for users and some of the guts) posted on the
> Apache HBase blog!
>
> Jon.
>
> On Wed, Feb 1, 2012 at 12:26 AM, Mingjie Lai  wrote:
>
>> Hi hbasers.
>>
>> A hbase blog regarding coprocessors has been posted to apache blog site.
>> Here is the link:
>>
>> https://blogs.apache.org/hbase/entry/coprocessor_introduction
>>
>> Your comments are welcome.
>>
>> Thanks,
>> Mingjie
>>
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // j...@cloudera.com


Re: AggregateProtocol Help

2012-01-01 Thread Gary Helmling
ailed
>> > >> on
>> > >> socket timeout exception: java.net.SocketTimeoutException: 6
>> millis
>> > >> timeout while waiting for channel to be ready for read. ch :
>> > >> java.nio.channels.SocketChannel[connected local=/10.0.0.235:5
>> > >> remote=namenode/10.0.0.235:60020]
>> > >> (8 more of these, making for 10 tries)
>> > >> Sat Dec 31 17:51:09 GMT 2011,
>> > >> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@1fc4f0f8,
>> > >> java.net.SocketTimeoutException: Call to namenode/10.0.0.235:60020
>> failed
>> > >> on
>> > >> socket timeout exception: java.net.SocketTimeoutException: 6
>> millis
>> > >> timeout while waiting for channel to be ready for read. ch :
>> > >> java.nio.channels.SocketChannel[connected local=/10.0.0.235:59364
>> > >> remote=namenode/10.0.0.235:60020]
>> > >>
>> > >>       at
>> > >> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>> > >>       at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> > >>       at
>> > >>
>> > >>
>> >
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.
>> > >> processExecs(HConnectionManager.java:1465)
>> > >>       at
>> > >>
>> org.apache.hadoop.hbase.client.HTable.coprocessorExec(HTable.java:1555)
>> > >>       at
>> > >>
>> > >>
>> >
>> org.apache.hadoop.hbase.client.coprocessor.AggregationClient.sum(Aggregation
>> > >> Client.java:229)
>> > >>       at EDRPAggregator.testSumWithValidRange(EDRPAggregator.java:51)
>> > >>       at EDRPAggregator.main(EDRPAggregator.java:77)
>> > >>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > >>       at
>> > >>
>> > >>
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39
>> > >> )
>> > >>       at
>> > >>
>> > >>
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl
>> > >> .java:25)
>> > >>       at java.lang.reflect.Method.invoke(Method.java:597)
>> > >>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>> > >>
>> > >>
>> > >> Looking at the log (.regionserver-namenode.log) I see this debug
>> > message:
>> > >>
>> > >> 2011-12-31 17:42:23,472 DEBUG
>> > >> org.apache.hadoop.hbase.coprocessor.AggregateImplementation: Sum from
>> > this
>> > >> region is
>> EDRPTestTbl,,1324485124322.7b9ee0d113db9b24ea9fdde90702d006.:
>> > 0
>> > >>
>> > >> Where the sum value looks reasonable which makes me think the sum of a
>> > >> CF:CQ
>> > >> worked. But I never see this value on stdout. Then I see this warning:
>> > >>
>> > >> 2011-12-31 17:42:23,476 WARN org.apache.hadoop.ipc.HBaseServer:
>> > >> (responseTooSlow): {"processingtimems":113146,"call":"execCoprocess$
>> > >> 2011-12-31 17:42:23,511 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> > Server
>> > >> Responder, call execCoprocessor([B@4b22fad6, getSum(org.$
>> > >> 2011-12-31 17:42:23,515 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> > Server
>> > >> handler 1 on 60020 caught: java.nio.channels.ClosedChann$
>> > >>       at
>> > >>
>> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>> > >>       at
>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>> > >>       at
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1651)
>> > >>       at
>> > >>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServe
>> > >> r.java:924)
>> > >>       at
>> > >>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java
>> > >> :1003)
>> > >>       at
>> > >>
>> > >>
>> >
>> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer
>

Re: AggregateProtocol Help

2011-12-23 Thread Gary Helmling
Hi Tom,

The test code is not really the best guide for configuration.

To enable the AggregateProtocol on all of your tables, add this to the
hbase-site.xml for the servers in your cluster:

  
  <property>
    <name>hbase.coprocessor.user.region.classes</name>
    <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
  </property>

If you only want to use the aggregate functions on a specific table
(or tables), then you can enable that individually for the table from
the shell:

1) disable the table
hbase> disable 'EDRP7'

2) add the coprocessor
hbase> alter 'EDRP7', METHOD => 'table_att',
  'coprocessor'=>'|org.apache.hadoop.hbase.coprocessor.AggregateImplementation||'

(Note that the pipes in the value string are required)

3) re-enable the table
hbase> enable 'EDRP7'


Either way should work.  With the second approach you will see the
coprocessor listed when you describe the table from the shell, as Ted
mentioned.  With the first approach you will not, but it should be
loaded all the same.

--gh


On Fri, Dec 23, 2011 at 7:04 AM, Ted Yu  wrote:
> I don't know why you chose HBaseTestingUtility to create the table.
> I guess you followed test code example.
>
> At least you should pass the conf to this ctor:
>  public HBaseTestingUtility(Configuration conf) {
>
> If coprocessor was installed correctly, you should see something like(from
> HBASE-5070):
> coprocessor$1 =>
> '|org.apache.hadoop.hbase.constraint.ConstraintProcessor|1073741823|'
>
> Cheers
>
> On Fri, Dec 23, 2011 at 3:02 AM, Tom Wilcox  wrote:
>
>> Hi,
>>
>> I am not sure how we load the AggregateImplementation into the table. When
>> we are creating a table, we use the same functions as the test as follows...
>>
>> ...
>> >              conf.set(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
>> >
>> > "org.apache.hadoop.hbase.coprocessor.AggregateImplementation");
>> >
>> >              // Utility.CreateHBaseTable(conf, otherArgs[1],
>> otherArgs[2],
>> > true);
>> >
>> >              HBaseTestingUtility util = new HBaseTestingUtility();
>> >              HTable table = util.createTable(EDRP_TABLE, EDRP_FAMILY);
>> >
>> >              AggregationClient aClient = new AggregationClient(conf);
>> ...
>>
>> Running DESCRIBE on a table produced shows the following output:
>>
>> hbase(main):002:0> describe 'EDRP7'
>> DESCRIPTION
>>                                            ENABLED
>>  {NAME => 'EDRP7', FAMILIES => [{NAME => 'advanceKWh', BLOOMFILTER =>
>> 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>  true
>>  '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
>> BLOCKSIZE => '65536', IN_MEMORY => 'false', B
>>  LOCKCACHE => 'true'}]}
>>
>> We are using the tip of 0.92 (cloned from the Git repo). See the version
>> string below:
>>
>> hbase(main):005:0> version
>> 0.92.0, r1208286, Thu Dec 15 13:16:03 GMT 2011
>>
>> We would really appreciate an example of how to create a table that is
>> enabled to handle Aggregation).
>>
>> Thanks
>>
>>
>> 
>> From: Ted Yu [yuzhih...@gmail.com]
>> Sent: 22 December 2011 17:03
>> To: user@hbase.apache.org
>> Subject: Re: AggregateProtocol Help
>>
>> Have you loaded AggregateImplementation into your table ?
>> Can you show us the contents of the following command in hbase shell:
>> describe 'your-table'
>>
>> BTW are you using the tip of 0.92 ?
>> HBASE-4946 would be of help for dynamically loaded coprocessors which you
>> might use in the future.
>>
>> Cheers
>>
>> On Thu, Dec 22, 2011 at 8:09 AM, Tom Wilcox  wrote:
>>
>> > Hi,
>> >
>> > We are trying to use the aggregation functionality in HBase 0.92  and we
>> > have managed to get the test code working using the following command:
>> >
>> > java -classpath junit-4.10.jar:build/*:$HBASELIBS/*
>> > org.junit.runner.JUnitCore
>> > org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
>> >
>> > Closer inspection of this test class has revealed that it uses a mini DFS
>> > cluster to populate and run the tests. These tests return successfully.
>> >
>> > However, when we attempt to run similar code on our development HDFS
>> > cluster we experience the following error:
>> >
>> > [sshexec] 11/12/22 15:46:28 WARN
>> > client.HConnectionManager$HConnectionImplementation: Error executing for
>> row
>> >  [sshexec] java.util.concurrent.ExecutionException:
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$UnknownProtocolException:
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$UnknownProtocolException: No
>> matching
>> > handler for protocol
>> org.apache.hadoop.hbase.coprocessor.AggregateProtocol
>> > in region EDRPTestTbl,,1324485124322.7b9ee0d113db9b24ea9fdde90702d006.
>> >  [sshexec]   at
>> > org.apache.hadoop.hbase.regionserver.HRegion.exec(HRegion.java:4010)
>> >  [sshexec]   at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.execCoprocessor(HRegionServer.java:3040)
>> >  [sshexec]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> Method)
>> >  [sshexec]   at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMetho

Re: Hmaster can't start for the latest trunk version

2011-10-31 Thread Gary Helmling
Yes, HBASE-4510 broke running under one set of conditions, and now the "fix"
in HBASE-4680 seems to have broken another.

Are the safe mode related changes from HBASE-4510 really necessary
right now?  Would it be possible to wait for HDFS-2413, when we have a
real API for checking safe mode?  Or do we absolutely need the change
in safe mode checking to run on 0.23?  The previous version of the
check at least did not have these issues.  If we don't strictly need
that change, I'd be in favor of reverting both HBASE-4680 and the
safe-mode related bits from HBASE-4510.


On Sun, Oct 30, 2011 at 9:44 PM, Harsh J  wrote:
> Hi Gaojinchao,
>
> -user (bcc)
> +dev
>
> This appears to be due to https://issues.apache.org/jira/browse/HBASE-4680 
> (and also HBASE-4510 which sourced the whole issue), and is caused by the 
> fact that /hbase doesn't exist yet when you first start it up. I've filed 
> https://issues.apache.org/jira/browse/HBASE-4705 for this.
>
> Workaround may be to "sudo -u hbase-user hadoop dfs -mkdir /hbase" before 
> starting, for now.
>
> On 31-Oct-2011, at 9:42 AM, Gaojinchao wrote:
>
>> The latest trunk version.
>> Throw this logs:
>> 2011-10-31 00:09:09,549 FATAL org.apache.hadoop.hbase.master.HMaster: 
>> Unhandled exception. Starting shutdown.
>> java.io.FileNotFoundException: File does not exist: hdfs://C3S31:9000/hbase
>>         at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:731)
>>         at 
>> org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:163)
>>         at 
>> org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:458)
>>         at 
>> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:301)
>>         at 
>> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:127)
>>         at 
>> org.apache.hadoop.hbase.master.MasterFileSystem.(MasterFileSystem.java:112)
>>         at 
>> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:426)
>>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
>>         at java.lang.Thread.run(Thread.java:662)
>> 2011-10-31 00:09:09,551 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> 2011-10-31 00:09:09,551 DEBUG org.apache.hadoop.hbase.master.HMaster: 
>> Stopping service threads
>
>


Re: sum, avg, count, etc...

2011-10-26 Thread Gary Helmling
Also, make sure that you're either setting a stop row on the scan, or
if you're using a filter, try wrapping it in a WhileMatchFilter.  This
tells the scanner it can stop as soon as the filter starts rejecting
rows.  Otherwise you can wind up getting back just the data you
expect, but still scanning all the way to the end of the table, just
filtering out all the remaining rows.
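
For the symbol-prefixed row keys described above, that looks roughly like
this (the '$' stop key works because '$' is the next byte after '#'):

  byte[] start = Bytes.toBytes("msft#");
  byte[] stop  = Bytes.toBytes("msft$");
  Scan scan = new Scan(start, stop);

  // or, if you prefer a filter, wrap it so the scan stops at the first non-match:
  Scan filtered = new Scan(start);
  filtered.setFilter(new WhileMatchFilter(new PrefixFilter(Bytes.toBytes("msft#"))));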

On Wed, Oct 26, 2011 at 6:18 AM, Doug Meil
 wrote:
> Hi there-
>
> First, make sure you aren't tripping on any of these issues..
>
> http://hbase.apache.org/book.html#perf.reading
>
>
>
>
>
> On 10/26/11 6:21 AM, "Rita"  wrote:
>
>>I am trying to do some simple statistics with my data but its taking
>>longer
>>than expected.
>>
>>
>>
>>Here is how my data is structured in hbase.
>>
>>keys (symbol#epoch time stamp)
>>msft#1319562974#NASDAQ
>>t#1319562974#NYSE
>>yhoo#1319562974#NASDAQ
>>msft#1319562975#NASDAQ
>>
>>The values look like this (for instance microsoft)
>>...
>>price=26.81
>>open=
>>close=
>>...
>>
>>there are about 300 values per each key.
>>
>>
>>So, for instance if I want to calculate avg price of msft I am setting up
>>a
>>start and stop filter and its able to calculate it by tick. But its taking
>>about 7 seconds to go thru 500 keys. Is that normal? Is there a faster way
>>to calculate sum,avg,count in hbase? would I need to redo my schema?
>>
>>tia
>>
>>
>>
>>
>>
>>--
>>--- Get your facts first, then you can distort them as you please.--
>
>


Re: A requirement to change time of the Hbase cluster.

2011-10-26 Thread Gary Helmling
At the same time, it might be simpler to get your customers/operators
to fix their ntp setups.

Not having synchronized clocks throughout the cluster will cause
problems in other areas as well.  It will make it very difficult to
correlate events in different server logs when troubleshooting
problems (you can't rely on the timestamp as a rough guideline).  The
Kerberos infrastructure used by Hadoop and HBase security also
requires synchronized clocks to operate correctly.  By default, if the
clock skew between two machines is greater than 5 minutes, it will
reject messages as invalid.  So you're likely to experience other
headaches even if you can coax HBase into operating the way you'd
like.


On Wed, Oct 26, 2011 at 9:02 AM, Stack  wrote:
> On Tue, Oct 25, 2011 at 7:36 PM, Gaojinchao  wrote:
>> So we hope to add a choice about metadata is not time-dependent. Just like 
>> use data can use a number as a timestamp .
>> If we can do this, the effect for time will be smaller. We don't use the ntp 
>> server, the cluster also can work normal ?
>> Can I open a file? I will try to make a patch and share my mind.
>>
>
> I suppose you could set an attribute on a table that says "use always
> increasing version rather than timestamp".  You'd have to then on a
> per region basis keep note of the most recent version and rather than
> use system time, do a +1 per edit coming in.
>
> I think hfile already records the version of the last edit added to
> the file.  On open of a region, you'd look at all hfiles and figure
> the highest verison and then set your version machine to start at
> highest-version +1.
>
> It might not be that hard to add.  You'd have to check the code but a
> while back we made it so we indirectly got version by going to an
> EnvironmentEdge class.  You could add your 'always increasing version'
> as an atomic long or something and then it would be available
> throughout.
>
> St.Ack
>


Re: FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append

2011-10-14 Thread Gary Helmling
I think the key part is this:


java.io.IOException: All datanodes 10.33.100.74:50010 are bad. Aborting...
>at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2680)
>at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2172)
>at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2371)
> 2011-10-13 14:35:48,002 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=0, regions=4, stores=6, storefiles=5, storefileIndexSize=0,
> memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=31,
> maxHeap=993, blockCacheSize=1011832, blockCacheFree=207366440,
> blockCacheCount=9, blockCacheHitCount=10644, blockCacheMissCount=9,
> blockCacheEvictedCount=0, blockCacheHitRatio=99,
> blockCacheHitCachingRatio=99
> 2011-10-13 14:35:48,002 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: IOE in log
> roller
> 2011-10-13 14:35:48,002 INFO
> org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>
>
This looks like the case that motivated HBASE-4222:
https://issues.apache.org/jira/browse/HBASE-4222

Did you by any chance restart the cluster DataNodes while the RegionServers
were running?  This can easily cause this issue to occur.  If not, look in
the DataNode logs for other errors which would indicate the underlying
problem.  In either case, the next release should be a bit more resilient to
this problem, as long as it is a transient error and you are not using
deferred log flushing on your tables.


Re: setTimeRange for HBase Increment

2011-10-04 Thread Gary Helmling
If you just need the increments to not be visible when > 30 days old, then
put the increment columns in their own column family and set TTL=2592000 (30
days in seconds).

Note that the timestamp is updated on each increment, so a column that
always receives increments before the TTL window runs out will never expire.

Is this the problem?  Are you looking to do rolling expiration of the
increment values?  If so you could do some combination of increments with
limited time ranges (always set minStamp to 12:00am of the current day to
roll over to a new version per day) or represent the truncated date in
either the column qualifier or row key.  This way you're incrementing
(aggregating) over limited periods to allow for data expiration, and can
easily do summing for the period you're concerned with.  Again, openTSDB
does some smart things with efficiently constructing keys for these types of
scenarios, so it's definitely worth looking at.

If neither of these really addresses what you're looking for, maybe you can
explain your requirements in a bit more detail?  HBase schema design is a
fine art, but it helps to be able to see the big picture.
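For reference, a minimal sketch of the TTL setup described above, assuming the
0.9x-era admin API; the table and family names are placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateCounterTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("events");          // placeholder table name
    HColumnDescriptor counters = new HColumnDescriptor("counters");  // increments kept in their own family
    counters.setTimeToLive(2592000);                                 // 30 days, in seconds
    desc.addFamily(counters);

    admin.createTable(desc);
  }
}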


--gh

On Tue, Oct 4, 2011 at 11:14 AM, Jameson Lopp  wrote:

> Thanks, that makes sense. Unfortunately, it sounds like this feature is
> unable to solve my particular problem...
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 10/04/2011 01:36 PM, Gary Helmling wrote:
>
>> Jameson,
>>
>> The TimeRange you set on the Increment is used in looking up the previous
>> value that you'll be incrementing.  It's not stored with the incremented
>> value as a data "lifetime" or anything.  If a previously stored value is
>> found within the given time range, it will be incremented.  If no value is
>> found within that range, a new value is stored using the value from
>> your Increment.
>>
>> As other have already covered, if you're looking for auto-cleanup of data
>> you would set a TTL on the column family.
>>
>> So let me tweak your scenario a bit to explain how it might work:
>>
>> 0) Say you have a previous value on column "c1" of 2, last incremented 31
>> days ago
>>
>> 1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
>> days, maxStamp = now
>>
>> 2) There is now a new version of "c1", with value=1, timestamp=now.  The
>> previous version, with value=2, timestamp=now - 31 days, still exists and
>> may be automatically cleaned up, subject to your settings for max versions
>> and TTL.  So you would have:
>>
>> c1:
>>   - v2: ts=now, value=1
>>   - v1: ts=now-31days, value=2
>>
>> 3) Reading the current value of "c1" will return 1
>>
>> 4a) If you repeat step #1 in 31 days from now, you would wind up with a
>> third version of "c1", again with value=1:
>>
>> c1:
>>   - v3: ts=now, value=1
>>   - v2: ts=now-31days, value=1
>>   - v1: ts=now-62days, value=2
>>
>> 4b) If you instead repeat step #1 31 days from now, but using minStamp=now
>> -
>> 60 days, maxStamp=now, then you would be incrementing the existing "v2" of
>> "c1", since it falls within the time range:
>>
>> c1:
>>   - v2: ts=now, value=2
>>   - v1: ts=now-62days, value=2
>>
>>
>> I hope this clarifies things.
>>
>> --gh
>>
>>
>> On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp
>>  wrote:
>>
>>  Thanks! Nevertheless, can anyone confirm / deny if the scenario I
>>> described
>>> would play out in that manner? Just want to make sure I understand the
>>> functionality.
>>>
>>>
>>> --
>>> Jameson Lopp
>>> Software Engineer
>>> Bronto Software, Inc
>>>
>>> On 09/29/2011 03:32 PM, Doug Meil wrote:
>>>
>>>
>>>> Here are a few links on table cleanup and major compactions...
>>>>
>>>> http://hbase.apache.org/book.html#schema.minversions
>>>>   (ttl related)
>>>>
>>>> http://hbase.apache.org/book.html#perf.deleting.queue
>>>>
>>>> http://hbase.apache.org/book.html#compaction

Re: setTimeRange for HBase Increment

2011-10-04 Thread Gary Helmling
Jameson,

The TimeRange you set on the Increment is used in looking up the previous
value that you'll be incrementing.  It's not stored with the incremented
value as a data "lifetime" or anything.  If a previously stored value is
found within the given time range, it will be incremented.  If no value is
found within that range, a new value is stored using the value from
your Increment.

As other have already covered, if you're looking for auto-cleanup of data
you would set a TTL on the column family.

So let me tweak your scenario a bit to explain how it might work:

0) Say you have a previous value on column "c1" of 2, last incremented 31
days ago

1) You perform an increment on "c1" with a value of 1, minStamp = now - 30
days, maxStamp = now

2) There is now a new version of "c1", with value=1, timestamp=now.  The
previous version, with value=2, timestamp=now - 31 days, still exists and
may be automatically cleaned up, subject to your settings for max versions
and TTL.  So you would have:

c1:
  - v2: ts=now, value=1
  - v1: ts=now-31days, value=2

3) Reading the current value of "c1" will return 1

4a) If you repeat step #1 in 31 days from now, you would wind up with a
third version of "c1", again with value=1:

c1:
  - v3: ts=now, value=1
  - v2: ts=now-31days, value=1
  - v1: ts=now-62days, value=2

4b) If you instead repeat step #1 31 days from now, but using minStamp=now -
60 days, maxStamp=now, then you would be incrementing the existing "v2" of
"c1", since it falls within the time range:

c1:
  - v2: ts=now, value=2
  - v1: ts=now-62days, value=2


I hope this clarifies things.
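In code, the increment from step 1 would look roughly like the following (a hedged
sketch against the 0.9x-era client API; the family and qualifier names are
placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class RollingIncrement {
  // Increment "cf:c1" by 1, treating only versions written in the last 30 days
  // as the previous value to add to.
  static void incrementLast30Days(HTable table, byte[] row) throws IOException {
    long now = System.currentTimeMillis();
    long thirtyDaysMs = 30L * 24 * 60 * 60 * 1000;

    Increment inc = new Increment(row);
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("c1"), 1L);
    inc.setTimeRange(now - thirtyDaysMs, now);   // minStamp, maxStamp
    table.increment(inc);
  }
}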

--gh


On Thu, Sep 29, 2011 at 12:40 PM, Jameson Lopp  wrote:

> Thanks! Nevertheless, can anyone confirm / deny if the scenario I described
> would play out in that manner? Just want to make sure I understand the
> functionality.
>
>
> --
> Jameson Lopp
> Software Engineer
> Bronto Software, Inc
>
> On 09/29/2011 03:32 PM, Doug Meil wrote:
>
>>
>> Here are a few links on table cleanup and major compactions...
>>
>> http://hbase.apache.org/book.html#schema.minversions
>>   (ttl related)
>>
>> http://hbase.apache.org/book.html#perf.deleting.queue
>>
>> http://hbase.apache.org/book.html#compaction
>>
>>
>>
>>
>>
>> On 9/29/11 2:29 PM, "Ted Yu"  wrote:
>>
>>  Doug Meil may point you to related doc.
>>>
>>> Take a look at this as well:
>>> https://issues.apache.org/jira/browse/HBASE-4241
>>>
>>> On Thu, Sep 29, 2011 at 11:22 AM, Jameson Lopp
>>>  wrote:
>>>
>>>  Hm, well I didn't mention a number of other requirements for the feature
 I'm building, but long story short, I need to keep track of millions to
 billions of these counters and need the lookup time to be as close to
 constant time as possible, thus I was really hoping to avoid doing table
 scans.

 I'll admit I know nothing of the dangers of auto-pruning; is there an
 article / documentation I could read about it? Google wasn't very
 helpful.


 --
 Jameson Lopp
 Software Engineer
 Bronto Software, Inc


 On 09/29/2011 02:12 PM, Jean-Daniel Cryans wrote:

  My advice usually regarding timestamps is if it's part of your data
> model, it should appear somewhere in an HBase key. 99% of the time
> overloading the HBase timestamps is a bad idea, especially with
> counters since there's auto-pruning done in the Memstore!
>
> I would suggest you make time part of your row key, maybe one counter
> per day, and then set the TTL on your table to 30 days. Then all you
> need to do is a sequential scan for those 30 days maybe with a prefix
> that refers to some event id.
>
> OpenTSDB is another way of doing it: http://opentsdb.net/
>
> J-D
>
> On Thu, Sep 29, 2011 at 11:04 AM, Jameson Lopp
>  wrote:
>
>  I wish to store a count of 30-day trailing event data (e.g. # of
>> clicks
>> in
>> past 30 days) and ended up reading the documentation for setTimeRange
>> in
>> the
>> Increment operation.
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Increment.html#getTimeRange%28%29
>>
>> I was hoping someone could clarify if it works as I'm imagining in
>> this
>> example scenario.
>>
>> 1) Current click count is 0
>>
>> 2) I process a click and I perform an increment operation with the
>> time
>> range set to minStamp = now and maxStamp = 30 days from now
>>
>> 3) I query for the value immediately and fi

Re: Dynamic addition of RegionServer

2011-09-20 Thread Gary Helmling
Working and consistent hostname resolution is a requirement for running an
HBase cluster.  Usually the easiest way to do this is with DNS.  You can
also use a hosts file, but you need to make sure the hosts file includes all
cluster hosts and have a way of synchronizing it throughout the cluster.
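For example, a shared hosts file might look something like this (the hostnames and
addresses here are made up):

# /etc/hosts -- must list every cluster host and be kept identical on all nodes
10.0.0.10   hbase-master1
10.0.0.21   hbase-rs1
10.0.0.22   hbase-rs2
10.0.0.23   hbase-rs3    # newly added regionserver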


On Tue, Sep 20, 2011 at 8:12 AM, Stuti Awasthi  wrote:

> Hi ,
> I was able to add a region server dynamically to the running cluster, but this
> works only when the hostname of the new node can be resolved to an IP by the
> running cluster. To achieve this I had to add the new node's IP and hostname to
> the /etc/hosts file and reboot the machine, then restart the cluster and add
> the node dynamically.
>
> The fact that HBase currently resolves only hostnames and not IPs is a
> drawback I see, because any regionserver I want to add dynamically
> must be present in the /etc/hosts file on each node, and this is not always
> possible.
>
> Any suggesstions if it can be fixed?
>
> Regards
> Stuti
>
> From: Stuti Awasthi
> Sent: Tuesday, September 20, 2011 10:30 AM
> To: user@hbase.apache.org
> Subject: Dynamic addition of RegionServer
>
> Hi all,
>
> I was trying to add a new regionserver node dynamically to my already running
> cluster. I updated the conf/regionservers file and added the IP (not hostname)
> of the new node and started the regionserver daemon on the new node. It started
> fine, but the HBase web UI does not show the new regionserver.
>
> As far as I know, HBase resolves the hostnames of the nodes. Now if I add the
> hostname to the conf/regionservers file then I will also need to update /etc/hosts
> and reboot the machines, otherwise it will give an "unknown host exception".
>
> Am I missing something ?
>
> -Stuti
>


Re: Cloudera BASE (+ZooKeeper), Hadoop HDFS, MapReduce, EC2 instances selection

2011-09-15 Thread Gary Helmling
Running on EC2 has been discussed on the list quite a bit in the past, so
you might want to do some searches on the archives.  Here are a few threads
I pulled up:

http://search-hadoop.com/m/paQmKTxSgj

http://search-hadoop.com/m/7E9PaA6U1V

http://search-hadoop.com/m/sGXTATdlIg2

For instance types, it appears that only c1.xlarge, m2.4xlarge and
cc1.xlarge instances will get you a physical server for each instance, so
you will pay the least IO virtualization "tax" using these with instance
storage.  But even with that expect reduced IO performance vs physical
hardware.

For the node layout, I'd suggest something like:

1 - NameNode, JobTracker, ZooKeeper, HMaster
1 - SecondaryNameNode, HMaster
3 - DataNode, TaskTracker, RegionServer

You could run more ZK instances on smaller instance types (m1.medium?), but
beware that these could be more subject to erratic IO throughput due to
other instances running on the same physical server, which could negatively
impact zookeeper performance and overall cluster stability.  So for a
cluster this small, I don't think I would bother.

For instance types, it'll depend on your workload and memory requirements.
I usually use c1.xlarge for HBase testing, but those have somewhat limited
memory, so you'll be constrained on the number of MR tasks you can run
without overcommitting memory (you want to avoid swapping at all costs).

I would say to do some testing with your workload and see what instance
types give you the best performance at an acceptable price.

--gh


On Thu, Sep 15, 2011 at 2:01 AM, Ronen Itkin  wrote:

>  Hi,
>
> I am wondering if someone can recommend a best practice for selecting
> the right Amazon EC2 instance combination for the following
> implementation:
>
> Cloudera Hadoop HDFS and MapReduce:
>
>   - 1 NameNode + JobTracker servers.
>   - 1 SecondaryNameNode server.
>   - 3 DataNodes + TaskTrackers.
>
>
> Cloudera HBase:
>
>   - 2 HMaster servers
>   - 3 ZooKeeper Servers
>   - 2 Region Servers.
>
>
> From your own experience what AMAZON EC2 instances should I choose?
> How would you combine and place the above implementation across the
> instances?
> Should I place datanode & task tracker with HRegionServer on the same
> instance?
>
> Thanks !
>
> --
> Ronen.
>
> 
>


Re: HBase and Cassandra on StackOverflow

2011-08-31 Thread Gary Helmling
> Since this is fairly off-topic at this point, I'll keep it short. The
> simple
> rule for Dynamo goes like this: if (R+W>N && W>=Quorum), then you're
> guaranteed a consistent result always. You get eventual consistency if
> W>=Quorum. If W<Quorum, conflicting writes have to be detected/fixed by readers (often using timestamps or similar techniques).
> Joe is right, enforcing (W=3, R=1, N=3) on a Dynamo system gives the same
> (provably identical?) behaviour as HBase, with respect to consistency.
>
>
For those interested in a comparison of the consistency behavior, there's an
older, but really excellent thread on quora with detailed analysis:
http://www.quora.com/How-does-HBase-write-performance-differ-from-write-performance-in-Cassandra-with-consistency-level-ALL

Don't miss the last answer in the thread.  It's unfortunately collapsed
due to some quora policy, but it contains some of the best details.
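
As a quick worked example of the rule quoted above: with N=3 replicas the quorum is
2, so choosing W=3 and R=1 gives R+W=4 > N and W >= quorum, meaning every read is
guaranteed to see the latest acknowledged write -- matching HBase, which serves all
reads and writes for a given row from a single regionserver.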


Re: RegionObserver and system tables (-ROOT-, .META.)

2011-08-26 Thread Gary Helmling
> That led me to question: Should a RegionObserver be allowed to interfere
> with the system tables?
>
>
Yes.

This is critical for the security implementation, for example.  We need to
perform authorization checks on access to -ROOT- and .META.  If this were
disallowed, then security couldn't be implemented on coprocessors alone.
I'm sure there are other applications lurking out there as well.

Coprocessors are very much an "experts only" feature right now.  It's
possible to completely bork your cluster with them.  We can make them a bit
safer to use, but going too far and neutering them only shoots ourselves in
the foot.

--gh


Re: Coprocessors and batch processing

2011-08-15 Thread Gary Helmling
Hi Lars,

> Should all RPC triggered by a coprocessor then be avoided (and hence the use of
> the env-provided HTableInterface be generally discouraged)?
>
>
I would generally avoid making synchronous RPC calls within a direct
coprocessor call path (blocking a handler thread waiting on the response).
Making the same calls asynchronously (say queuing puts for a secondary index
to be handled by a background thread) is generally better because you're not
tying up a constrained resource that other clients will be contending on.

The cp environment provided HTableInterface is there to be used, but it's
still up to you to use it wisely.  But providing that resource as part of
the coprocessor framework (instead of using a standard client HTable) will
potentially allow us to do other optimizations like using a separate
priority for coprocessor originating RPCs (handled by a different thread
pool than client RPCs), or short circuiting the RPC stack for calls to
regions residing on the same region server (and avoiding tying up another
handler thread).

Those are just a couple examples, but I think ultimately we'll want a bit
more constraint over what coprocessor code is allowed to do, for the sake of
better guaranteeing cluster stability.  Currently it's a void-your-warranty
type scenario. :)

I still think a RegionServer or RpcServer level "preRequest" and
> "postRequest" (or whatever) hooks would be useful for a variety of
> scenarios.
>
>
I could see that, but it's definitely not part of the RegionObserver
contract.  I do also worry that a profusion of too many different
coprocessor interfaces will lead to confusion about how to actually go about
implementing a given application's needs.  But we're still pretty early on
in the development of coprocessors and I don't pretend we have everything
covered.

Do you have some specific scenarios where you think a preRequest/postRequest
interface would be better suited?  Feel free to open up a JIRA where we can
walk through them!  We could try to model out a RPC listener or RPC filter
interface.  I think that interacting at the RPC layer (in front of all of
the core HBase code) will be a bit limited in what you have access to.  Many
of the current coprocessor hooks are situated in key points after a lot of
setup or initialization has gone on to provide the call context.  But for a
given set of needs that may not be an issue (or may even be an advantage).


Gary


Re: Why RowFilter plus BinaryPrefixComparator solution is so slow

2011-08-11 Thread Gary Helmling
On Thu, Aug 11, 2011 at 2:20 PM, Allan Yan  wrote:

> Hello,
>
> 1. Scan s = new Scan();
> 2. s.addFamily(myFamily);
> 3. s.setStartRow(startRow);
> 4. Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new
> BinaryPrefixComparator(startRow));
> 5. s.setFilter(rowFilter);
>
>
With this code, you're still only telling the scan what to filter from the
results it returns to you, not when to stop.  So your scan will continue from
startRow to the end of the table.

Try either setting stopRow in addition, or else wrap your row filter above
in WhileMatchFilter -- this tells the scan to stop as soon as your filter
rejects a row.

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/WhileMatchFilter.html
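
A minimal sketch of the WhileMatchFilter option (assuming the 0.90-era client API;
the family and start-row values are whatever you were already using):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.WhileMatchFilter;

public class PrefixScans {
  static Scan prefixScan(byte[] myFamily, byte[] startRow) {
    Scan s = new Scan();
    s.addFamily(myFamily);
    s.setStartRow(startRow);

    Filter prefixMatch = new RowFilter(CompareFilter.CompareOp.EQUAL,
        new BinaryPrefixComparator(startRow));
    // Wrapping in WhileMatchFilter ends the scan at the first row the inner
    // filter rejects, instead of silently reading on to the end of the table.
    s.setFilter(new WhileMatchFilter(prefixMatch));
    return s;
  }
}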


--gh


Re: Coprocessors and batch processing

2011-08-11 Thread Gary Helmling
On Wed, Aug 10, 2011 at 10:46 PM, lars hofhansl  wrote:

>
> I guess there could either be a {pre|post}Multi on RegionObserver (although
> HRegionServer.multi does a lot of munging).
> Or maybe a general {pre|post}Request with no arguments - in which case it
> would be at least possible to write code in the coprocessor
> to collect the puts/deletes/etc through the normal single
> prePut/preDelete/etc hooks and then batch-process them in postRequest().
>
>
This same question came up recently in an internal discussion as well.

Since multi() only exists at the HRegionServer level (there is no
HRegion.multi()), I don't think that this belongs in RegionObserver at all.
So this would be the main reason why there currently are no
preMulti()/postMulti() hooks.  There is something of a gap here between the
way the client sees things and the way the coprocessor sees things.  But I
really see this as more of an RPC-layer listener or filter.  After all,
"multi" is really just an RPC operation for efficiency.  It's not really a
core HBase operation.

As I see it there are a couple motivations behind the current limitation:

1) we generally only want a single set of coprocessor hooks to be involved
in a given operation
2) the coprocessor API should be as conceptually simple as possible, while
still reflecting what is happening in HBase


I think adding in some multi representation in the coprocessor API poses
challenges on each of these fronts.  Not necessarily unresolvable
challenges, but there are trade-offs involved.

Re (1): representing multi in cp hooks means that you now have layering in
the cp hooks handling a given operation.  Say all the actions in the multi
request are Puts, you go from having:

[prePut() ... postPut()] x N

to

preMulti() [prePut() ... postPut()] x N  postMulti()

For me, this implies some confusion about where I should handle the actions
in my coprocessor.  Should I put all handling in pre/postMulti() or do I
also need to implement pre/postPut()?

This gets at (2), for me it's easier to think of "multi" as just an
aggregation of operations, not as an operation in itself.  The actual
operation in the above example is a Put.  It's just that there are a lot of
them.


Back to the original situation you raise, I think it's a really bad idea to
immediately trigger RPC operations within a coprocessor, _especially_ in the
case of multi.  Say you are doing a secondary indexing transformation on the
Puts you receive.  You get a multi batch of 1000 puts.  You transform that
into a batch of 1000 secondary index puts, potentially going to every region
server in your cluster holding a region in the secondary indexing table.
You've just multiplied the RPC operations triggered by a single request and
exposed yourself to triggering a distributed deadlock, where the RPC handler
thread running in one RS is waiting for an RPC handler to become available
in another RS, which in turn has all handlers occupied waiting on other
servers.

I think the better approach to doing these kind of updates would be to have
the RegionObserver.pre/postPut() implementation queue them up and have them
batched and processed by a separate background thread so that you're not
tying up resources directly in the RPC handling path (and also making
clients wait on a synchronous response).
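
A rough sketch of that queue-and-background-thread pattern, assuming the 0.92-era
coprocessor API; the index table name and the row transformation are placeholders,
and retry/shutdown handling is left out:

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class AsyncIndexObserver extends BaseRegionObserver {
  private static final byte[] INDEX_TABLE = Bytes.toBytes("my_index");  // placeholder
  private final BlockingQueue<Put> pending = new LinkedBlockingQueue<Put>();

  @Override
  public void start(CoprocessorEnvironment e) throws IOException {
    final RegionCoprocessorEnvironment env = (RegionCoprocessorEnvironment) e;
    // A background thread drains the queue, so RPC handler threads never
    // block on index writes.
    Thread indexer = new Thread(new Runnable() {
      public void run() {
        try {
          HTableInterface indexTable = env.getTable(INDEX_TABLE);
          while (true) {
            indexTable.put(pending.take());
          }
        } catch (Exception ex) {
          // Real code would log and retry or shut down cleanly.
        }
      }
    });
    indexer.setDaemon(true);
    indexer.start();
  }

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> c,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    // Queue a (hypothetical) secondary-index transform of the incoming Put;
    // the handler thread returns immediately.
    pending.offer(toIndexPut(put));
  }

  private Put toIndexPut(Put dataPut) {
    // Placeholder: derive the secondary-index row from the data row.
    return new Put(dataPut.getRow());
  }
}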

It may be that a higher level (RegionServer or RpcServer level)
observer-type interface would be useful.  But I think that adds some
complexity to understanding what coprocessors are and how they interact.

--gh

