Re: Welcome our newest Committer Anoop

2013-03-12 Thread xkwang bruce
Congratulations, Anoop!


2013/3/13 Devaraj Das 

> Hey Anoop, Congratulations!
> Devaraj.
>
>
> On Mon, Mar 11, 2013 at 10:50 AM, Enis Söztutar 
> wrote:
>
> > Congrats and welcome.
> >
> >
> > On Mon, Mar 11, 2013 at 2:21 AM, Nicolas Liochon 
> > wrote:
> >
> > > Congrats, Anoop!
> > >
> > >
> > > On Mon, Mar 11, 2013 at 5:35 AM, rajeshbabu chintaguntla <
> > > rajeshbabu.chintagun...@huawei.com> wrote:
> > >
> > > > Congratulations Anoop!
> > > >
> > > > 
> > > > From: Anoop Sam John [anoo...@huawei.com]
> > > > Sent: Monday, March 11, 2013 9:00 AM
> > > > To: user@hbase.apache.org
> > > > Subject: RE: Welcome our newest Committer Anoop
> > > >
> > > > Thanks to all.. Hope to work more and more for HBase!
> > > >
> > > > -Anoop-
> > > >
> > > > 
> > > > From: Andrew Purtell [apurt...@apache.org]
> > > > Sent: Monday, March 11, 2013 7:33 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: Welcome our newest Committer Anoop
> > > >
> > > > Congratulations Anoop. Welcome!
> > > >
> > > >
> > > > On Mon, Mar 11, 2013 at 12:42 AM, ramkrishna vasudevan <
> > > > ramkrishna.s.vasude...@gmail.com> wrote:
> > > >
> > > > > Hi All
> > > > >
> > > > > Pls welcome Anoop, our newest committer.  Anoop's work in HBase has been
> > > > > great and he has helped a lot of users on the mailing list.
> > > > >
> > > > > He has contributed features related to Endpoints and CPs.
> > > > >
> > > > > Welcome Anoop and best wishes for your future work.
> > > > >
> > > > > Hope to see your continuing efforts in the community.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >- Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> >
>


RE: Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
St.Ack

I am not sure what the design idea behind it is. But if I invoke a major
compaction manually, I guess what I want is for all the separate files and the
memstore to be combined into one file. If I don't write anything new in the
meantime, then from the user's point of view I would expect to end up with a
single store file per region.
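
As a workaround, a minimal sketch of forcing that behavior by hand from the
Java API, assuming a 0.94-era client: flush first, then ask for the major
compaction. Both calls are on HBaseAdmin, majorCompact only queues the
request, and the table name below is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushThenMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String table = "mytable";   // placeholder table name
    admin.flush(table);         // push the memstore out to a store file first
    admin.majorCompact(table);  // then queue the major compaction (asynchronous)
    admin.close();
  }
}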


Best Regards,
Raymond Liu

> 
> Raymond:
> 
> Major compaction does not first flush.  Should it or should it be an option?
> 
> St.Ack
> 
> 
> On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond 
> wrote:
> 
> > I tried both hbase shell's major_compact cmd and java api
> > HBaseAdmin.majorCompact() on table name.
> > Neither flushes the memstore to disk; the compact command does not seem to
> > do it either.
> >
> > I haven't read enough of the related code, but I am wondering: is it
> > because there is a size threshold before a memstore is flushed, so a
> > user-invoked compaction doesn't force a flush?
> >
> > Best Regards,
> > Raymond Liu
> >
> > >
> > > Did you try from java api?  If flush does not happen we may need to
> > > fix
> > it.
> > >
> > > Regards
> > > RAm
> > >
> > > On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond
> > > 
> > > wrote:
> > >
> > > > It seems to me that a major_compact table command from the hbase shell
> > > > does not flush the memstore. When I am done with a major compact, there
> > > > is still some data in the memstore, and it is flushed out to disk when I
> > > > shut down the hbase cluster.
> > > >
> > > > Best Regards,
> > > > Raymond Liu
> > > >
> > > >
> >


Re: Does a major compact flush memstore?

2013-03-12 Thread Stack
Raymond:

Major compaction does not first flush.  Should it or should it be an option?

St.Ack


On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond  wrote:

> I tried both hbase shell's major_compact cmd and java api
> HBaseAdmin.majorCompact() on table name.
> Neither flushes the memstore to disk; the compact command does not seem to
> do it either.
>
> I haven't read enough of the related code, but I am wondering: is it because
> there is a size threshold before a memstore is flushed, so a user-invoked
> compaction doesn't force a flush?
>
> Best Regards,
> Raymond Liu
>
> >
> > Did you try from java api?  If flush does not happen we may need to fix
> it.
> >
> > Regards
> > RAm
> >
> > On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond 
> > wrote:
> >
> > > It seems to me that a major_compact table command from the hbase shell
> > > does not flush the memstore. When I am done with a major compact, there is
> > > still some data in the memstore, and it is flushed out to disk when I shut
> > > down the hbase cluster.
> > >
> > > Best Regards,
> > > Raymond Liu
> > >
> > >
>


RE: Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
I tried both hbase shell's major_compact cmd and java api 
HBaseAdmin.majorCompact() on table name.
Neither flushes the memstore to disk; the compact command does not seem to do
it either.

I haven't read enough of the related code, but I am wondering: is it because
there is a size threshold before a memstore is flushed, so a user-invoked
compaction doesn't force a flush?

Best Regards,
Raymond Liu

> 
> Did you try from java api?  If flush does not happen we may need to fix it.
> 
> Regards
> RAm
> 
> On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond 
> wrote:
> 
> > It seems to me that a major_compact table command from the hbase shell does
> > not flush the memstore. When I am done with a major compact, there is still
> > some data in the memstore, and it is flushed out to disk when I shut down
> > the hbase cluster.
> >
> > Best Regards,
> > Raymond Liu
> >
> >


Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."

2013-03-12 Thread Ryan Compton
Yes, I am having success with it.

We use Cloudera and there's a button, "Generate Client Configuration",
which generates a .zip file with a very informative README.txt. The
trick is to extract that .zip and export HBASE_CONF_DIR:

hbase(main):002:0> exit
-bash-3.2$ hostname
node33
-bash-3.2$ export HBASE_CONF_DIR=/data/cm-gui/hbase-global-clientconfig
-bash-3.2$ /usr/lib/hbase/bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

hbase(main):001:0> status
34 servers, 0 dead, 59.0882 average load


So far, this is working and I can get to the shell from other nodes. I
think the installation process for mapreduce/hdfs is the same (but the
guy who set it all up quit a few weeks ago).


On Tue, Mar 12, 2013 at 2:08 PM, Ted Yu  wrote:
> Have you seen Kevin's response ?
>
> Cheers
>
> On Tue, Mar 12, 2013 at 1:42 PM, Ryan Compton wrote:
>
>> When I try to access HBase from a cluster node which is not the
>> zookeeper server I have the following problem:
>>
>> -bash-3.2$ hostname
>> node33
>> -bash-3.2$ hbase shell
>> HBase Shell; enter 'help' for list of supported commands.
>> Type "exit" to leave the HBase Shell
>> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>>
>> hbase(main):001:0> status
>>
>> ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is
>> able to connect to ZooKeeper but the connection closes immediately.
>> This could be a sign that the server has too many connections (30 is
>> the default). Consider inspecting your ZK server logs for that error
>> and then make sure you are reusing HBaseConfiguration as often as you
>> can. See HTable's javadoc for more information.
>>
>> Here is some help for this command:
>> Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
>> default is 'summary'. Examples:
>>
>>   hbase> status
>>   hbase> status 'simple'
>>   hbase> status 'summary'
>>   hbase> status 'detailed'
>>
>>
>> hbase(main):002:0>
>>
>>
>> Now, if I log into the zookeeper server I can do:
>>
>> -bash-3.2$ hostname
>> master
>> -bash-3.2$ hbase shell
>> HBase Shell; enter 'help' for list of supported commands.
>> Type "exit" to leave the HBase Shell
>> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>>
>> hbase(main):001:0> status
>> 34 servers, 0 dead, 59.0882 average load
>>
>> hbase(main):002:0>
>>
>>
>> What's going on? I suspect that HBase should be able to read/write
>> from any node that I can modify hdfs from.
>>


Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."

2013-03-12 Thread Ted Yu
Have you seen Kevin's response ?

Cheers

On Tue, Mar 12, 2013 at 1:42 PM, Ryan Compton wrote:

> When I try to access HBase from a cluster node which is not the
> zookeeper server I have the following problem:
>
> -bash-3.2$ hostname
> node33
> -bash-3.2$ hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>
> hbase(main):001:0> status
>
> ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is
> able to connect to ZooKeeper but the connection closes immediately.
> This could be a sign that the server has too many connections (30 is
> the default). Consider inspecting your ZK server logs for that error
> and then make sure you are reusing HBaseConfiguration as often as you
> can. See HTable's javadoc for more information.
>
> Here is some help for this command:
> Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
> default is 'summary'. Examples:
>
>   hbase> status
>   hbase> status 'simple'
>   hbase> status 'summary'
>   hbase> status 'detailed'
>
>
> hbase(main):002:0>
>
>
> Now, if I log into the zookeeper server I can do:
>
> -bash-3.2$ hostname
> master
> -bash-3.2$ hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>
> hbase(main):001:0> status
> 34 servers, 0 dead, 59.0882 average load
>
> hbase(main):002:0>
>
>
> What's going on? I suspect that HBase should be able to read/write
> from any node that I can modify hdfs from.
>


"HBase is able to connect to ZooKeeper but the connection closes immediately..."

2013-03-12 Thread Ryan Compton
When I try to access HBase from a cluster node which is not the
zookeeper server I have the following problem:

-bash-3.2$ hostname
node33
-bash-3.2$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is
able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is
the default). Consider inspecting your ZK server logs for that error
and then make sure you are reusing HBaseConfiguration as often as you
can. See HTable's javadoc for more information.

Here is some help for this command:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
default is 'summary'. Examples:

  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'


hbase(main):002:0>


Now, if I log into the zookeeper server I can do:

-bash-3.2$ hostname
master
-bash-3.2$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

hbase(main):001:0> status
34 servers, 0 dead, 59.0882 average load

hbase(main):002:0>


What's going on? I suspect that HBase should be able to read/write
from any node that I can modify hdfs from.


Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."

2013-03-12 Thread Kevin O'dell
Hi Ryan,

  Make sure you have the correct client configuration on the node you are
trying to access from.  You will need hbase-site.xml and zoo.cfg to
make this work.
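
If shipping the config files is not an option, a minimal sketch of pointing a
Java client at the cluster by hand; the quorum host "master" is taken from the
transcripts below, and the table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ManualClientConfig {
  public static void main(String[] args) throws Exception {
    // Create one configuration and reuse it everywhere,
    // as the error message itself advises.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "master");             // ZK host from this thread
    conf.set("hbase.zookeeper.property.clientPort", "2181");  // default ZK client port
    HTable table = new HTable(conf, "demo_table");            // placeholder table name
    // Gets/Puts/Scans through this table now reach the right cluster.
    table.close();
  }
}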

On Tue, Mar 12, 2013 at 4:47 PM, Ryan Compton wrote:

> When I try to access HBase from a cluster node which is not the
> zookeeper server I have the following problem:
>
> -bash-3.2$ hostname
> node33
> -bash-3.2$ hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>
> hbase(main):001:0> status
>
> ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is
> able to connect to ZooKeeper but the connection closes immediately.
> This could be a sign that the server has too many connections (30 is
> the default). Consider inspecting your ZK server logs for that error
> and then make sure you are reusing HBaseConfiguration as often as you
> can. See HTable's javadoc for more information.
>
> Here is some help for this command:
> Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
> default is 'summary'. Examples:
>
>   hbase> status
>   hbase> status 'simple'
>   hbase> status 'summary'
>   hbase> status 'detailed'
>
>
> hbase(main):002:0>
>
>
> Now, if I log into the zookeeper server I can do:
>
> -bash-3.2$ hostname
> master
> -bash-3.2$ hbase shell
> HBase Shell; enter 'help' for list of supported commands.
> Type "exit" to leave the HBase Shell
> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012
>
> hbase(main):001:0> status
> 34 servers, 0 dead, 59.0882 average load
>
> hbase(main):002:0>
>
>
> What's going on? I suspect that HBase should be able to read/write
> from any node that I can modify hdfs from.
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera


"HBase is able to connect to ZooKeeper but the connection closes immediately..."

2013-03-12 Thread Ryan Compton
When I try to access HBase from a cluster node which is not the
zookeeper server I have the following problem:

-bash-3.2$ hostname
node33
-bash-3.2$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

hbase(main):001:0> status

ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is
able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is
the default). Consider inspecting your ZK server logs for that error
and then make sure you are reusing HBaseConfiguration as often as you
can. See HTable's javadoc for more information.

Here is some help for this command:
Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
default is 'summary'. Examples:

  hbase> status
  hbase> status 'simple'
  hbase> status 'summary'
  hbase> status 'detailed'


hbase(main):002:0>


Now, if I log into the zookeeper server I can do:

-bash-3.2$ hostname
master
-bash-3.2$ hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

hbase(main):001:0> status
34 servers, 0 dead, 59.0882 average load

hbase(main):002:0>


What's going on? I suspect that HBase should be able to read/write
from any node that I can modify hdfs from.


Re: Welcome our newest Committer Anoop

2013-03-12 Thread Devaraj Das
Hey Anoop, Congratulations!
Devaraj.


On Mon, Mar 11, 2013 at 10:50 AM, Enis Söztutar  wrote:

> Congrats and welcome.
>
>
> On Mon, Mar 11, 2013 at 2:21 AM, Nicolas Liochon 
> wrote:
>
> > Congrats, Anoop!
> >
> >
> > On Mon, Mar 11, 2013 at 5:35 AM, rajeshbabu chintaguntla <
> > rajeshbabu.chintagun...@huawei.com> wrote:
> >
> > > Congratulations Anoop!
> > >
> > > 
> > > From: Anoop Sam John [anoo...@huawei.com]
> > > Sent: Monday, March 11, 2013 9:00 AM
> > > To: user@hbase.apache.org
> > > Subject: RE: Welcome our newest Committer Anoop
> > >
> > > Thanks to all.. Hope to work more and more for HBase!
> > >
> > > -Anoop-
> > >
> > > 
> > > From: Andrew Purtell [apurt...@apache.org]
> > > Sent: Monday, March 11, 2013 7:33 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Welcome our newest Committer Anoop
> > >
> > > Congratulations Anoop. Welcome!
> > >
> > >
> > > On Mon, Mar 11, 2013 at 12:42 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasude...@gmail.com> wrote:
> > >
> > > > Hi All
> > > >
> > > > Pls welcome Anoop, our newest committer.  Anoop's work in HBase has been
> > > > great and he has helped a lot of users on the mailing list.
> > > >
> > > > He has contributed features related to Endpoints and CPs.
> > > >
> > > > Welcome Anoop and best wishes for your future work.
> > > >
> > > > Hope to see your continuing efforts in the community.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >- Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>


Re: Regionserver goes down while endpoint execution

2013-03-12 Thread Gary Helmling
To expand on what Himanshu said, your endpoint is doing an unbounded scan
on the region, so with a region with a lot of rows it's taking more than 60
seconds to run to the region end, which is why the client side of the call
is timing out.  In addition, you're building up an in-memory list of all the
values for that qualifier in that region, which could cause you to bump
into OOM issues, depending on how big your values are and how sparse the
given column qualifier is.  If you trigger an OOM exception, the region
server will abort.

For this usage specifically, though -- scanning through a single column
qualifier for all rows -- you would be better off just doing a normal
client-side scan, i.e. HTable.getScanner().  Then you will avoid the client
timeout and the potential server-side memory issues.
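
A minimal sketch of that client-side scan, assuming a 0.94-era API; the table,
family, and qualifier names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");               // placeholder
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));  // only the qualifier we need
    scan.setCaching(500);  // fetch rows in batches instead of one RPC per row
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        // process each value here instead of accumulating them all in memory
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

Processing each row as it streams back keeps the client's memory flat and each
next() call well under the RPC timeout.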


On Tue, Mar 12, 2013 at 9:29 AM, Ted Yu  wrote:

> From region server log:
>
> 2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error
> making BlockReader. Closing stale
> Socket[addr=/10.42.105.112,port=50010,localport=54114]
> java.io.EOFException: Premature EOF: no length prefix available
> at
> org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> at
> org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407)
>
> What version of HBase and hadoop are you using ?
> Do versions of hadoop on Eclipse machine and in your cluster match ?
>
> Cheers
>
> On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 wrote:
>
> >  Lars,
> >
> > I am getting the following errors at the datanode & region servers.
> >
> > Regards,
> >
> > Deepak
> >
> > From: Kumar, Deepak8 [CCC-OT_IT NE]
> > Sent: Tuesday, March 12, 2013 3:00 AM
> > To: Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars
> > hofhansl'
> >
> > Subject: RE: Regionserver goes down while endpoint execution
> >
> > Lars,
> >
> > It is having the following errors when I execute the Endpoint RPC client from
> > eclipse. It seems some of the regions at regionserver
> > vm-8aa9-fe74.nam.nsroot.net are taking more time to respond.
> >
> > Could you guide me on how to fix it? I don't find any option to set
> > hbase.rpc.timeout in the hbase configuration menu of the CDH4 CM server.
> >
> > Regards,
> >
> > Deepak
> >
> > 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete
> > on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid =
> > 0x53d591b77090026, negotiated timeout = 6
> >
> > Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration
> > warnOnceIfDeprecated
> >
> > WARNING: hadoop.native.lib is deprecated. Instead, use
> > io.native.lib.available
> >
> > Mar 12, 2013 2:44:00 AM
> > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> > processExecs
> >
> > WARNING: Error executing for row 153299:1362780381523:2932572079500658:
> > vm-ab1f-dd21.nam.nsroot.net:
> >
> > java.util.concurrent.ExecutionException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=10, exceptions:
> >
> > Tue Mar 12 02:34:15 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> > java.net.SocketTimeoutException: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: java.net.SocketTimeoutException: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271
> > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:35:16 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> > java.net.SocketTimeoutException: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: java.net.SocketTimeoutException: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403
> > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:36:18 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> > java.net.SocketTimeoutException: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: java.net.SocketTimeoutException: 6 millis timeout while
> > waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465
> > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
> >
> > Tue Mar 12 02:37:20 EDT 2013,
> > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> > java.net.SocketTimeoutException: Call to
> > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> > exception: java.net.SocketTimeoutException: 6 millis timeout while
> >

Re: Regionserver goes down while endpoint execution

2013-03-12 Thread Ted Yu
From the region server log:

2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error
making BlockReader. Closing stale
Socket[addr=/10.42.105.112,port=50010,localport=54114]
java.io.EOFException: Premature EOF: no length prefix available
at 
org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407)

What version of HBase and hadoop are you using ?
Do versions of hadoop on Eclipse machine and in your cluster match ?

Cheers

On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 wrote:

>  Lars,
>
> I am getting the following errors at the datanode & region servers.
>
> Regards,
>
> Deepak
>
> From: Kumar, Deepak8 [CCC-OT_IT NE]
> Sent: Tuesday, March 12, 2013 3:00 AM
> To: Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars
> hofhansl'
>
> Subject: RE: Regionserver goes down while endpoint execution
>
> Lars,
>
> It is having the following errors when I execute the Endpoint RPC client from
> eclipse. It seems some of the regions at regionserver
> vm-8aa9-fe74.nam.nsroot.net are taking more time to respond.
>
> Could you guide me on how to fix it? I don't find any option to set
> hbase.rpc.timeout in the hbase configuration menu of the CDH4 CM server.
>
> Regards,
>
> Deepak
>
> 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete
> on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid =
> 0x53d591b77090026, negotiated timeout = 6
>
> Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration
> warnOnceIfDeprecated
>
> WARNING: hadoop.native.lib is deprecated. Instead, use
> io.native.lib.available
>
> Mar 12, 2013 2:44:00 AM
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> processExecs
>
> WARNING: Error executing for row 153299:1362780381523:2932572079500658:
> vm-ab1f-dd21.nam.nsroot.net:
>
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=10, exceptions:
>
> Tue Mar 12 02:34:15 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:35:16 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:36:18 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:37:20 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:38:22 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:39:25 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.10

Re: Hbase Thrift DemoClient.php bug

2013-03-12 Thread Ted Yu
HBASE-8079 has been logged.

On Tue, Mar 12, 2013 at 9:55 AM, Ted Yu  wrote:

> I checked the trunk code - the parameter counts match.
>
> Looks like this is a problem in 0.94 only.
>
>
> On Mon, Mar 11, 2013 at 8:19 PM, Ted  wrote:
>
>> Thanks for reporting this.
>>
>> Mind opening a jira ?
>>
>> On Mar 11, 2013, at 7:13 PM, jung  wrote:
>>
>> > hiya
>> >
>> > Installed thrift and hbase, and testing php with hbase.
>> >
>> > I tested apache thrift 0.90 and 1.0-dev with the Hbase.thrift from hbase
>> > 0.94.2 and hbase 0.94.5
>> >
>> > $ php DemoClient.php failed like below
>> >
>> > 
>> > 
>> > DemoClient
>> > 
>> > 
>> > 
>> > scanning tables...
>> >  found: cars
>> >  found: demo_table
>> >disabling table: demo_table
>> >deleting table: demo_table
>> >  found: dnstest
>> >  found: hello_world
>> > creating table: demo_table
>> > column families in demo_table:
>> >  column: entry:, maxVer: 10
>> >  column: unused:, maxVer: 3
>> > PHP Warning:  Missing argument 4 for Hbase\HbaseClient::mutateRow(),
>> > called in /home/hbase/test/www-current/DemoClient.php on line 138 and
>> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php
>> > on line 1233
>> > PHP Notice:  Undefined variable: attributes in
>> > /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php on line
>> > 1235
>> > PHP Warning:  Missing argument 4 for Hbase\HbaseClient::mutateRow(),
>> > called in /home/hbase/test/www-current/DemoClient.php on line 147 and
>> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php
>> > on line 1233
>> >
>> > ---
>> > DemoClient.php uses 3 args:
>> >
>> > $client->mutateRow( $t, "foo", $mutations )
>> >
>> > but mutateRow in gen-php/Hbase.php has 4 args, like below:
>> >
>> > 40:  public function mutateRow($tableName, $row, $mutations,
>> $attributes);
>> > 41:  public function mutateRowTs($tableName, $row, $mutations,
>> > $timestamp, $attributes);
>> > 42:  public function mutateRows($tableName, $rowBatches, $attributes);
>> > 43:  public function mutateRowsTs($tableName, $rowBatches, $timestamp,
>> > $attributes);
>> > 1233:  public function mutateRow($tableName, $row, $mutations,
>> $attributes)
>> > 1290:  public function mutateRowTs($tableName, $row, $mutations,
>> > $timestamp, $attributes)
>> > 1348:  public function mutateRows($tableName, $rowBatches, $attributes)
>> > 1404:  public function mutateRowsTs($tableName, $rowBatches,
>> > $timestamp, $attributes)
>> >
>> >
>> > any ideas?
>> >
>> > thanks
>>
>
>


Re: Regionserver goes down while endpoint execution

2013-03-12 Thread Himanshu Vashishtha
I don't see the RS dying from this. It says that it is taking more time
than 60 sec (the default timeout for clients), and therefore it stops
processing the coprocessor call once the client has disconnected.
Is your cluster okay? How many rows are in the table? Does a normal scan
work well? Can you share more about your cluster details (nodes, tables,
regions, data size, etc.)?

Thanks,
Himanshu

On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8  wrote:
> Lars,
>
> I am getting the following errors at the datanode & region servers.
>
>
>
> Regards,
>
> Deepak
>
>
>
> From: Kumar, Deepak8 [CCC-OT_IT NE]
> Sent: Tuesday, March 12, 2013 3:00 AM
> To: Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars hofhansl'
>
>
> Subject: RE: Regionserver goes down while endpoint execution
>
>
>
> Lars,
>
> It is having the following errors when I execute the Endpoint RPC client from
> eclipse. It seems some of the regions at regionserver
> vm-8aa9-fe74.nam.nsroot.net are taking more time to respond.
>
>
>
> Could you guide me on how to fix it? I don't find any option to set
> hbase.rpc.timeout in the hbase configuration menu of the CDH4 CM server.
>
>
>
> Regards,
>
> Deepak
>
>
>
> 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete
> on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid =
> 0x53d591b77090026, negotiated timeout = 6
>
> Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration
> warnOnceIfDeprecated
>
> WARNING: hadoop.native.lib is deprecated. Instead, use
> io.native.lib.available
>
> Mar 12, 2013 2:44:00 AM
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> processExecs
>
> WARNING: Error executing for row
> 153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net:
>
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=10, exceptions:
>
> Tue Mar 12 02:34:15 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:35:16 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:36:18 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:37:20 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:38:22 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:39:25 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572
> remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
>
> Tue Mar 12 02:40:30 EDT 2013,
> org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f,
> java.net.SocketTimeoutException: Call to
> vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout
> exception: java.net.SocketTimeoutException: 6 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.So

Re: Hbase Thrift DemoClient.php bug

2013-03-12 Thread Ted Yu
I checked the trunk code - the parameter counts match.

Looks like this is a problem in 0.94 only.

On Mon, Mar 11, 2013 at 8:19 PM, Ted  wrote:

> Thanks for reporting this.
>
> Mind opening a jira ?
>
> On Mar 11, 2013, at 7:13 PM, jung  wrote:
>
> > hiya
> >
> > Installed thrift and hbase, and testing php with hbase.
> >
> > I tested apache thrift 0.90 and 1.0-dev with the Hbase.thrift from hbase
> > 0.94.2 and hbase 0.94.5
> >
> > $ php DemoClient.php failed like below
> >
> > 
> > 
> > DemoClient
> > 
> > 
> > 
> > scanning tables...
> >  found: cars
> >  found: demo_table
> >disabling table: demo_table
> >deleting table: demo_table
> >  found: dnstest
> >  found: hello_world
> > creating table: demo_table
> > column families in demo_table:
> >  column: entry:, maxVer: 10
> >  column: unused:, maxVer: 3
> > PHP Warning:  Missing argument 4 for Hbase\HbaseClient::mutateRow(),
> > called in /home/hbase/test/www-current/DemoClient.php on line 138 and
> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php
> > on line 1233
> > PHP Notice:  Undefined variable: attributes in
> > /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php on line
> > 1235
> > PHP Warning:  Missing argument 4 for Hbase\HbaseClient::mutateRow(),
> > called in /home/hbase/test/www-current/DemoClient.php on line 147 and
> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php
> > on line 1233
> >
> > ---
> > DemoClient.php uses 3 args:
> >
> > $client->mutateRow( $t, "foo", $mutations )
> >
> > but mutateRow in gen-php/Hbase.php has 4 args, like below:
> >
> > 40:  public function mutateRow($tableName, $row, $mutations,
> $attributes);
> > 41:  public function mutateRowTs($tableName, $row, $mutations,
> > $timestamp, $attributes);
> > 42:  public function mutateRows($tableName, $rowBatches, $attributes);
> > 43:  public function mutateRowsTs($tableName, $rowBatches, $timestamp,
> > $attributes);
> > 1233:  public function mutateRow($tableName, $row, $mutations,
> $attributes)
> > 1290:  public function mutateRowTs($tableName, $row, $mutations,
> > $timestamp, $attributes)
> > 1348:  public function mutateRows($tableName, $rowBatches, $attributes)
> > 1404:  public function mutateRowsTs($tableName, $rowBatches,
> > $timestamp, $attributes)
> >
> >
> > any ideas?
> >
> > thanks
>


HBase receivedBytes

2013-03-12 Thread Levin, Ohad
Hi all,

Monitoring HBase, we need to know the number of bytes received by each region
server (and the overall throughput).
The only metric I found for this is
RS.hadoop.service_HBase.name_RPCStatistics-60020.ReceivedBytes. However, this
metric is a delta and not a total. My question: is there a way to calculate
the total received bytes per region server from the available metrics?
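
If no cumulative counter is exposed, one option is to sample the delta on a
fixed interval and accumulate it yourself; a minimal sketch below, where
readReceivedBytesDelta() is a hypothetical stand-in for however you poll the
ReceivedBytes metric (JMX, Ganglia, etc.). The running total only covers the
period you have been sampling.

// Minimal sketch: turn periodic ReceivedBytes deltas into a running total.
public class ReceivedBytesAccumulator {
  private long totalBytes = 0L;

  // Call once per metrics period (e.g. every 60 s).
  public void poll() {
    totalBytes += readReceivedBytesDelta();
  }

  public long getTotalBytes() {
    return totalBytes;
  }

  // Hypothetical: wire this to your metrics source (JMX, Ganglia, etc.).
  private long readReceivedBytesDelta() {
    return 0L;  // placeholder
  }
}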

Thanks,
  Ohad.


RE: region server down when scanning using mapreduce

2013-03-12 Thread Azuryy Yu
How did you use the scanner? Paste some code here.
On Mar 12, 2013 4:13 PM, "Lu, Wei"  wrote:

>
> We turned the block cache off and tried again; region servers still
> crash one after another.
> There are a lot of scanner lease timeouts, and then the master logs:
> RegionServer ephemeral node deleted, processing expiration
> [rs21,60020,1363010589837]
> It seems the problem is not caused by the block cache.
>
>
> Thanks
>
> -Original Message-
> From: Azuryy Yu [mailto:azury...@gmail.com]
> Sent: Tuesday, March 12, 2013 1:41 PM
> To: user@hbase.apache.org
> Subject: Re: region server down when scanning using mapreduce
>
> please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) to
> get some background on the block cache.
>
>
> On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei  wrote:
>
> > No, does the block cache matter? Btw, the mr dump is an MR program we
> > implemented rather than the hbase tool.
> >
> > Thanks
> >
> > -Original Message-
> > From: Azuryy Yu [mailto:azury...@gmail.com]
> > Sent: Tuesday, March 12, 2013 1:18 PM
> > To: user@hbase.apache.org
> > Subject: Re: region server down when scanning using mapreduce
> >
> > did you close the block cache when you used the mr dump?
> > On Mar 12, 2013 1:06 PM, "Lu, Wei"  wrote:
> >
> > > Hi,
> > >
> > > When we use mapreduce to dump data from a pretty large table on hbase, one
> > > region server crashes and then another. Mapreduce is deployed together with
> > > hbase.
> > >
> > > 1) From the log of the region server, there are both "next" and "multi"
> > > operations ongoing. Is it because there is a write/read conflict that
> > > causes the scanner timeout?
> > > 2) The region server has 24 cores, and # max map tasks is 24 too; the table
> > > has about 30 regions (each of size 0.5G) on the region server. Is it
> > > because the cpu is all used by mapreduce, making the region server slow
> > > and then time out?
> > > 3) The current hbase.regionserver.handler.count is 10 by default; should it
> > > be enlarged?
> > >
> > > Please give us some advice.
> > >
> > > Thanks,
> > > Wei
> > >
> > >
> > > Log information:
> > >
> > >
> > > [Regionserver rs21:]
> > >
> > > 2013-03-11 18:36:28,148 INFO
> > > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/
> > > adcbg21.machine.wisdom.com
> > ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
> > > entries=22417, filesize=127539793.  for
> > >
> >
> /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
> > > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > > slept 28183ms instead of 3000ms, this is likely due to a long garbage
> > > collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56058
> > >
> >
> ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56529
> > >
> >
> ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56146
> > >
> >
> ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
> > > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56069
> > >
> >
> ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
> > > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56586
> > >
> >
> ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
> > > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a
> > ),
> > > rpc version=1, cl

Re: RegionServers Crashing every hour in production env

2013-03-12 Thread Pablo Musa

Guys,
thank you very much for the help.

Yesterday I spent 14 hours trying to tune the whole cluster.
The cluster is not ready yet and needs a lot of tuning, but at least it is
working.


My first big problem was namenode + datanode GC. They were not using
CMS and thus were taking "incremental" time to run: it started at 0.01 ms
and in 20 minutes was taking 150 secs.
After setting the CMS GC this time is much smaller, taking a maximum of 70
secs, which is VERY HIGH, but for now does not stop HBase.

With this issue solved, it was clear that the RS was doing a long GC pause,
taking up to 220 secs. Zookeeper expired the RS and it shut down.
I tried a lot of different flag configurations (MORE than 20), and could not
get small GCs. Eventually a GC would take more than 150 secs (the zookeeper
timeout) and the RS would shut down.

Finally I tried a config that so far (12 hours) is working with a maximum GC
time of 90 secs, which of course is still a terrible problem since HBase is a
database, but at least the cluster is stable while I tune it a little more.


In my opinion, my biggest problem is having a few "monster" machines in the
cluster instead of a bunch of commodity machines. I don't know if there are
a lot of companies using this kind of machine inside a hadoop cluster, but
a fast search on google could not find much on tuning GCs for big heaps.

I guess my next step will be to search for big-heap GC tuning.

Back to some questions ;)

> You have ganglia or tsdb running?

I use zabbix for now, and no, there is nothing going on when the big
pause happens.


> When you see the big pause above, can you see anything going on on the
> machine? (swap, iowait, concurrent fat mapreduce job?)
> What are you doing when the long GC happens? Reads or writes? If reading,
> what is the block cache size?

The cpu for the RS process goes to 100% and the logs "pause" until it
gets out.

Ex: [NewPar

IO and SWAP are normal. There is no MR running, just the normal database
load, which is very low. I am probably doing reads AND writes to the
database with the default block cache size.
One problem at the moment might be the big number of regions (1252), since
I am using only one RS to be able to track the problem.

The links and ideas were very helpful. Thank you very much guys.

I will post my further findings as I work toward a solution ;)

If you have more ideas or info (links, flag suggestions, etc.), please
post them :)


Abs,
Pablo

On 03/10/2013 11:24 PM, Andrew Purtell wrote:

Be careful with GC tuning, throwing changes at an application without
analysis of what is going on with the heap is shooting in the dark. One
particularly good treatment of the subject is here:
http://java.dzone.com/articles/how-tame-java-gc-pauses

If you have made custom changes to blockcache or memstore configurations,
back them out until you're sure everything else is ok.

Watch carefully for swapping. Set the vm.swappiness sysctl to 0. Monitor
for spikes in page scanning or any swap activity. Nothing brings on
"Juliette" pauses better than a JVM partially swapped out. The Java GC
starts collection by examining the oldest pages, and those are the first
pages the OS swaps out...



On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu  wrote:


Hi Pablo,
It's terrible to have such a long minor GC. I don't think there is swapping,
judging from your vmstat log,
but I would suggest you:
1) add following JVM options:
-XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19
-XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods

2) -Xmn is too small; your total Mem is 74GB, just make it -Xmn2g
3) What are you doing when the long GC happens? Reads or writes? If reading,
what is the block cache size?




On Mon, Mar 11, 2013 at 6:41 AM, Stack  wrote:


You could increase your zookeeper session timeout to 5 minutes while you
are figuring out why these long pauses happen.
http://hbase.apache.org/book.html#zookeeper.session.timeout

Above, there is an outage for almost 5 minutes:


We slept 225100ms instead of 3000ms, this is likely due to a long

You have ganglia or tsdb running?  When you see the big pause above, can
you see anything going on on the machine?  (swap, iowait, concurrent fat
mapreduce job?)

St.Ack



On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa  wrote:


Hi Sreepathi,
they say in the book (or on the site) we could try it to see if it is really
a timeout error or there is something more. But it is not recommended for
production environments.

I could give it a try if five minutes will tell us whether the problem is the
GC or elsewhere!! Anyway, I think it is hard to believe a GC is taking 2:30
minutes.

Abs,
Pablo


On 03/10/2013 04:06 PM, Sreepathi wrote:


Hi Stack/Ted/Pablo,

Should we increase the hbase.rpc.timeout property to 5 minutes (300000 ms)?

Regards,
- Sreepathi

On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa  wrote:

  That combo should be fine.

Great!!


  If JVM is full GC'ing, the application is stopped.

The below does not look like a full GC 

Re: Does a major compact flush memstore?

2013-03-12 Thread ramkrishna vasudevan
Did you try from java api?  If flush does not happen we may need to fix it.

Regards
RAm

On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond  wrote:

> It seems to me that a major_compact table command from the hbase shell does
> not flush the memstore. When I am done with a major compact, there is still
> some data in the memstore, and it is flushed out to disk when I shut down the
> hbase cluster.
>
> Best Regards,
> Raymond Liu
>
>


RE: region server down when scanning using mapreduce

2013-03-12 Thread Anoop Sam John
How is the GC pattern on your RSs that are going down? In the RS logs you might
be seeing YouAreDeadExceptions...
Pls try tuning your RS memory and GC opts.

-Anoop-

From: Lu, Wei [w...@microstrategy.com]
Sent: Tuesday, March 12, 2013 1:42 PM
To: user@hbase.apache.org
Subject: RE: region server down when scanning using mapreduce

We turned the block cache off and tried again; region servers still crash
one after another.
There are a lot of scanner lease timeouts, and then the master logs:
RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]
It seems the problem is not caused by the block cache.


Thanks

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com]
Sent: Tuesday, March 12, 2013 1:41 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) to
get some background on the block cache.


On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei  wrote:

> No, does the block cache matter? Btw, the mr dump is an MR program we
> implemented rather than the hbase tool.
>
> Thanks
>
> -Original Message-
> From: Azuryy Yu [mailto:azury...@gmail.com]
> Sent: Tuesday, March 12, 2013 1:18 PM
> To: user@hbase.apache.org
> Subject: Re: region server down when scanning using mapreduce
>
> did you close the block cache when you used the mr dump?
> On Mar 12, 2013 1:06 PM, "Lu, Wei"  wrote:
>
> > Hi,
> >
> > When we use mapreduce to dump data from a pretty large table on hbase, one
> > region server crashes and then another. Mapreduce is deployed together with
> > hbase.
> >
> > 1) From the log of the region server, there are both "next" and "multi"
> > operations ongoing. Is it because there is a write/read conflict that
> > causes the scanner timeout?
> > 2) The region server has 24 cores, and # max map tasks is 24 too; the table
> > has about 30 regions (each of size 0.5G) on the region server. Is it
> > because the cpu is all used by mapreduce, making the region server slow and
> > then time out?
> > 3) The current hbase.regionserver.handler.count is 10 by default; should it
> > be enlarged?
> >
> > Please give us some advice.
> >
> > Thanks,
> > Wei
> >
> >
> > Log information:
> >
> >
> > [Regionserver rs21:]
> >
> > 2013-03-11 18:36:28,148 INFO
> > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/
> > adcbg21.machine.wisdom.com
> ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
> > entries=22417, filesize=127539793.  for
> >
> /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
> > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > slept 28183ms instead of 3000ms, this is likely due to a long garbage
> > collecting pause and it's usually bad, see
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56058
> >
> ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56529
> >
> ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56146
> >
> ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
> > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56069
> >
> ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
> > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56586
> >
> ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
> > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> >
> {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d1

RE: region server down when scanning using mapreduce

2013-03-12 Thread Lu, Wei

We turned the block cache off and tried again; region servers still crash
one after another.
There are a lot of scanner lease timeouts, and then the master logs:
RegionServer ephemeral node deleted, processing expiration
[rs21,60020,1363010589837]
It seems the problem is not caused by the block cache.
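
For what it's worth, a minimal sketch of the scan setup we would try for an MR
dump, assuming a 0.94-era TableMapReduceUtil job (the table and class names are
placeholders). A modest caching value keeps each next() RPC short relative to
the scanner lease, and disabling block caching keeps a full-table scan from
churning the cache:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class DumpJob {
  static class DumpMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      // write the row out here
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table dump");
    job.setJarByClass(DumpJob.class);
    Scan scan = new Scan();
    scan.setCaching(100);        // keep each next() RPC small and fast
    scan.setCacheBlocks(false);  // don't pollute the block cache on a full scan
    TableMapReduceUtil.initTableMapperJob("mytable", scan, DumpMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
  }
}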


Thanks

-Original Message-
From: Azuryy Yu [mailto:azury...@gmail.com] 
Sent: Tuesday, March 12, 2013 1:41 PM
To: user@hbase.apache.org
Subject: Re: region server down when scanning using mapreduce

please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) to
get some background on the block cache.


On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei  wrote:

> No, does the block cache matter? Btw, the mr dump is an MR program we
> implemented rather than the hbase tool.
>
> Thanks
>
> -Original Message-
> From: Azuryy Yu [mailto:azury...@gmail.com]
> Sent: Tuesday, March 12, 2013 1:18 PM
> To: user@hbase.apache.org
> Subject: Re: region server down when scanning using mapreduce
>
> did you close the block cache when you used the mr dump?
> On Mar 12, 2013 1:06 PM, "Lu, Wei"  wrote:
>
> > Hi,
> >
> > When we use mapreduce to dump data from a pretty large table on hbase, one
> > region server crashes and then another. Mapreduce is deployed together with
> > hbase.
> >
> > 1) From the log of the region server, there are both "next" and "multi"
> > operations ongoing. Is it because there is a write/read conflict that
> > causes the scanner timeout?
> > 2) The region server has 24 cores, and # max map tasks is 24 too; the table
> > has about 30 regions (each of size 0.5G) on the region server. Is it
> > because the cpu is all used by mapreduce, making the region server slow and
> > then time out?
> > 3) The current hbase.regionserver.handler.count is 10 by default; should it
> > be enlarged?
> >
> > Please give us some advice.
> >
> > Thanks,
> > Wei
> >
> >
> > Log information:
> >
> >
> > [Regionserver rs21:]
> >
> > 2013-03-11 18:36:28,148 INFO
> > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/
> > adcbg21.machine.wisdom.com
> ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
> > entries=22417, filesize=127539793.  for
> >
> /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
> > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > slept 28183ms instead of 3000ms, this is likely due to a long garbage
> > collecting pause and it's usually bad, see
> > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56058
> >
> ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56529
> >
> ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
> > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56146
> >
> ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
> > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56069
> >
> ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
> > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc
> > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.127.21:56586
> >
> ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
> > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> >
> {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a
> ),
> > rpc version=1, client version=29, methodsFingerPrint=54742778","client":"
> > 10.20.109.21:35342
> >
> ","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > 2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer:
> > (responseTooSlow):
> >
> {"processingtimems":38813,"call":"multi(org.apac

Does a major compact flush memstore?

2013-03-12 Thread Liu, Raymond
It seems to me that a major_compact table command from the hbase shell does not
flush the memstore. When I am done with a major compact, there is still some
data in the memstore, and it is flushed out to disk when I shut down the hbase
cluster.

Best Regards,
Raymond Liu



RE: Regionserver goes down while endpoint execution

2013-03-12 Thread Kumar, Deepak8
Lars,
It is having the following errors when I execute the Endpoint RPC client from
eclipse. It seems some of the regions at regionserver
vm-8aa9-fe74.nam.nsroot.net are taking more time to respond.

Could you guide me on how to fix it? I don't find any option to set
hbase.rpc.timeout in the hbase configuration menu of the CDH4 CM server.
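
In the meantime, a minimal sketch of raising the timeout on the client side
only, by setting hbase.rpc.timeout on the client's Configuration before any
connection is created (the value and table name below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class LongRpcTimeoutClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Raise the RPC timeout for this client before any connection is made.
    conf.setLong("hbase.rpc.timeout", 300000L);  // 5 minutes, placeholder value
    HTable table = new HTable(conf, "mytable");  // placeholder table name
    // Endpoint (coprocessorExec) and scan calls made through this table
    // now use the longer timeout.
    table.close();
  }
}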

Regards,
Deepak

3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete on 
server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = 
0x53d591b77090026, negotiated timeout = 6
Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration 
warnOnceIfDeprecated
WARNING: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Mar 12, 2013 2:44:00 AM 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation 
processExecs
WARNING: Error executing for row 
153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net:
java.util.concurrent.ExecutionException: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=10, exceptions:
Tue Mar 12 02:34:15 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:35:16 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:36:18 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:37:20 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:38:22 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:39:25 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:40:30 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2606 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:41:34 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/150.110.96.212:2640 
remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020]
Tue Mar 12 02:42:43 EDT 2013, 
org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, 
java.net.SocketTimeoutException: Call to 
vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout 
exception: java.net.SocketTimeoutException: 6 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected