Re: Welcome our newest Committer Anoop
Congratulations, Anoop! 2013/3/13 Devaraj Das > Hey Anoop, Congratulations! > Devaraj. > > > On Mon, Mar 11, 2013 at 10:50 AM, Enis Söztutar > wrote: > > > Congrats and welcome. > > > > > > On Mon, Mar 11, 2013 at 2:21 AM, Nicolas Liochon > > wrote: > > > > > Congrats, Anoop! > > > > > > > > > On Mon, Mar 11, 2013 at 5:35 AM, rajeshbabu chintaguntla < > > > rajeshbabu.chintagun...@huawei.com> wrote: > > > > > > > Contratulations Anoop! > > > > > > > > > > > > From: Anoop Sam John [anoo...@huawei.com] > > > > Sent: Monday, March 11, 2013 9:00 AM > > > > To: user@hbase.apache.org > > > > Subject: RE: Welcome our newest Committer Anoop > > > > > > > > Thanks to all.. Hope to work more and more for HBase! > > > > > > > > -Anoop- > > > > > > > > > > > > From: Andrew Purtell [apurt...@apache.org] > > > > Sent: Monday, March 11, 2013 7:33 AM > > > > To: user@hbase.apache.org > > > > Subject: Re: Welcome our newest Committer Anoop > > > > > > > > Congratulations Anoop. Welcome! > > > > > > > > > > > > On Mon, Mar 11, 2013 at 12:42 AM, ramkrishna vasudevan < > > > > ramkrishna.s.vasude...@gmail.com> wrote: > > > > > > > > > Hi All > > > > > > > > > > Pls welcome Anoop, our newest committer. Anoop's work in HBase has > > > been > > > > > great and he has helped lot of users in the mailing list. > > > > > > > > > > He has contributed features related to Endpoints and CPs. > > > > > > > > > > Welcome Anoop and best wishes for your future work. > > > > > > > > > > Hope to see your continuing efforts to the community. > > > > > > > > > > Regards > > > > > Ram > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > > > > > > >- Andy > > > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet > > Hein > > > > (via Tom White) > > > > > > > > > >
RE: Does a major compact flush memstore?
St.Ack, I am not sure what the design idea behind it is. But if I invoke a major compaction manually, what I want is for all the separate store files and the memstore to be combined into one file. If I don't write anything new afterwards, then from the user's point of view I would assume each region ends up with a single store file. Best Regards, Raymond Liu > > Raymond: > > Major compaction does not first flush. Should it or should it be an option? > > St.Ack > > > On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond > wrote: > > > I tried both hbase shell's major_compact cmd and java api > > HBaseAdmin.majorCompact() on table name. > > They don't flush the memstore on to disk, compact cmd seems not doing > > that too. > > > > I hadn't read enough related code, While I am wondering, is that > > because there are size threshold before a memstore is flushed? Then a > > user invoked compact don't force to flush it? > > > > Best Regards, > > Raymond Liu > > > > > > > > Did you try from java api? If flush does not happen we may need to > > > fix > > it. > > > > > > Regards > > > RAm > > > > > > On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond > > > > > > wrote: > > > > > > > It seems to me that a major_compact table command from hbase shell > > > > do not fush memstore? When I done with major compact, still some > > > > data in memstore and will be flush out to disk when I shut down hbase > cluster. > > > > > > > > Best Regards, > > > > Raymond Liu > > > > > > > > > >
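A minimal sketch of the workaround implied by this thread, assuming the 0.94-era client API and a placeholder table name "mytable": flush the table explicitly with HBaseAdmin before requesting the major compaction, so the memstore contents land in a store file the compaction can pick up. Note both calls are asynchronous, so in practice the flush may need a moment to complete before the compaction selection runs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushThenMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Flush memstores to store files first; major_compact alone only
      // rewrites the store files that already exist on disk.
      admin.flush("mytable");            // "mytable" is a placeholder
      // Then request the major compaction (runs asynchronously on the RS).
      admin.majorCompact("mytable");
    } finally {
      admin.close();
    }
  }
}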
Re: Does a major compact flush memstore?
Raymond: Major compaction does not first flush. Should it or should it be an option? St.Ack On Tue, Mar 12, 2013 at 6:46 PM, Liu, Raymond wrote: > I tried both hbase shell's major_compact cmd and java api > HBaseAdmin.majorCompact() on table name. > They don't flush the memstore on to disk, compact cmd seems not doing that > too. > > I hadn't read enough related code, While I am wondering, is that because > there are size threshold before a memstore is flushed? Then a user invoked > compact don't force to flush it? > > Best Regards, > Raymond Liu > > > > > Did you try from java api? If flush does not happen we may need to fix > it. > > > > Regards > > RAm > > > > On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond > > wrote: > > > > > It seems to me that a major_compact table command from hbase shell do > > > not fush memstore? When I done with major compact, still some data in > > > memstore and will be flush out to disk when I shut down hbase cluster. > > > > > > Best Regards, > > > Raymond Liu > > > > > > >
RE: Does a major compact flush memstore?
I tried both the hbase shell's major_compact command and the Java API HBaseAdmin.majorCompact() on the table name. Neither flushes the memstore to disk; the compact command does not seem to do that either. I haven't read enough of the related code, but I am wondering: is it because there is a size threshold before a memstore is flushed, so a user-invoked compaction doesn't force a flush? Best Regards, Raymond Liu > > Did you try from java api? If flush does not happen we may need to fix it. > > Regards > RAm > > On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond > wrote: > > > It seems to me that a major_compact table command from hbase shell do > > not fush memstore? When I done with major compact, still some data in > > memstore and will be flush out to disk when I shut down hbase cluster. > > > > Best Regards, > > Raymond Liu > > > >
Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."
Yes, I am having success with it. We use Cloudera and there's a button, "Generate Client Configuration", which generated a .zip file with a very informative README.txt. The trick is to extract that .zip and export HBASE_CONF_DIR : hbase(main):002:0> exit -bash-3.2$ hostname node33 -bash-3.2$ export HBASE_CONF_DIR=/data/cm-gui/hbase-global-clientconfig -bash-3.2$ /usr/lib/hbase/bin/hbase shell HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 hbase(main):001:0> status 34 servers, 0 dead, 59.0882 average load So far, this is working and I can get to the shell from other nodes. I think the installation process for mapreduce/hdfs is the same (but the guy who set it all up quit a few weeks ago). On Tue, Mar 12, 2013 at 2:08 PM, Ted Yu wrote: > Have you seen Kevin's response ? > > Cheers > > On Tue, Mar 12, 2013 at 1:42 PM, Ryan Compton wrote: > >> When I try to access HBase from a cluster node which is not the >> zookeeper server I have the following problem: >> >> -bash-3.2$ hostname >> node33 >> -bash-3.2$ hbase shell >> HBase Shell; enter 'help' for list of supported commands. >> Type "exit" to leave the HBase Shell >> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 >> >> hbase(main):001:0> status >> >> ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is >> able to connect to ZooKeeper but the connection closes immediately. >> This could be a sign that the server has too many connections (30 is >> the default). Consider inspecting your ZK server logs for that error >> and then make sure you are reusing HBaseConfiguration as often as you >> can. See HTable's javadoc for more information. >> >> Here is some help for this command: >> Show cluster status. Can be 'summary', 'simple', or 'detailed'. The >> default is 'summary'. Examples: >> >> hbase> status >> hbase> status 'simple' >> hbase> status 'summary' >> hbase> status 'detailed' >> >> >> hbase(main):002:0> >> >> >> Now, if I log into the zookeeper server I can do: >> >> -bash-3.2$ hostname >> master >> -bash-3.2$ hbase shell >> HBase Shell; enter 'help' for list of supported commands. >> Type "exit" to leave the HBase Shell >> Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 >> >> hbase(main):001:0> status >> 34 servers, 0 dead, 59.0882 average load >> >> hbase(main):002:0> >> >> >> What's going on? I suspect that HBase should be able to read/write >> from any node that I can modify hdfs from. >>
Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."
Have you seen Kevin's response ? Cheers On Tue, Mar 12, 2013 at 1:42 PM, Ryan Compton wrote: > When I try to access HBase from a cluster node which is not the > zookeeper server I have the following problem: > > -bash-3.2$ hostname > node33 > -bash-3.2$ hbase shell > HBase Shell; enter 'help' for list of supported commands. > Type "exit" to leave the HBase Shell > Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 > > hbase(main):001:0> status > > ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is > able to connect to ZooKeeper but the connection closes immediately. > This could be a sign that the server has too many connections (30 is > the default). Consider inspecting your ZK server logs for that error > and then make sure you are reusing HBaseConfiguration as often as you > can. See HTable's javadoc for more information. > > Here is some help for this command: > Show cluster status. Can be 'summary', 'simple', or 'detailed'. The > default is 'summary'. Examples: > > hbase> status > hbase> status 'simple' > hbase> status 'summary' > hbase> status 'detailed' > > > hbase(main):002:0> > > > Now, if I log into the zookeeper server I can do: > > -bash-3.2$ hostname > master > -bash-3.2$ hbase shell > HBase Shell; enter 'help' for list of supported commands. > Type "exit" to leave the HBase Shell > Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 > > hbase(main):001:0> status > 34 servers, 0 dead, 59.0882 average load > > hbase(main):002:0> > > > What's going on? I suspect that HBase should be able to read/write > from any node that I can modify hdfs from. >
"HBase is able to connect to ZooKeeper but the connection closes immediately..."
When I try to access HBase from a cluster node which is not the zookeeper server I have the following problem: -bash-3.2$ hostname node33 -bash-3.2$ hbase shell HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 hbase(main):001:0> status ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information. Here is some help for this command: Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'. Examples: hbase> status hbase> status 'simple' hbase> status 'summary' hbase> status 'detailed' hbase(main):002:0> Now, if I log into the zookeeper server I can do: -bash-3.2$ hostname master -bash-3.2$ hbase shell HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 hbase(main):001:0> status 34 servers, 0 dead, 59.0882 average load hbase(main):002:0> What's going on? I suspect that HBase should be able to read/write from any node that I can modify hdfs from.
Re: "HBase is able to connect to ZooKeeper but the connection closes immediately..."
Hi Ryan, Make sure you have the correct client configurations on the node you are trying to access from. You will need the hbase-site and the zoo.cfg to make this work. On Tue, Mar 12, 2013 at 4:47 PM, Ryan Compton wrote: > When I try to access HBase from a cluster node which is not the > zookeeper server I have the following problem: > > -bash-3.2$ hostname > node33 > -bash-3.2$ hbase shell > HBase Shell; enter 'help' for list of supported commands. > Type "exit" to leave the HBase Shell > Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 > > hbase(main):001:0> status > > ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is > able to connect to ZooKeeper but the connection closes immediately. > This could be a sign that the server has too many connections (30 is > the default). Consider inspecting your ZK server logs for that error > and then make sure you are reusing HBaseConfiguration as often as you > can. See HTable's javadoc for more information. > > Here is some help for this command: > Show cluster status. Can be 'summary', 'simple', or 'detailed'. The > default is 'summary'. Examples: > > hbase> status > hbase> status 'simple' > hbase> status 'summary' > hbase> status 'detailed' > > > hbase(main):002:0> > > > Now, if I log into the zookeeper server I can do: > > -bash-3.2$ hostname > master > -bash-3.2$ hbase shell > HBase Shell; enter 'help' for list of supported commands. > Type "exit" to leave the HBase Shell > Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 > > hbase(main):001:0> status > 34 servers, 0 dead, 59.0882 average load > > hbase(main):002:0> > > > What's going on? I suspect that HBase should be able to read/write > from any node that I can modify hdfs from. > -- Kevin O'Dell Customer Operations Engineer, Cloudera
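For a client program (rather than the shell), the same fix applies: the client must be able to find the cluster's ZooKeeper quorum, either from an hbase-site.xml on its classpath or set programmatically. A minimal sketch, where the quorum host "master", the client port, and the table name "demo_table" are placeholders for your own values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ClientConfigCheck {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-default.xml plus any hbase-site.xml found on the classpath.
    Configuration conf = HBaseConfiguration.create();
    // If no hbase-site.xml is available, point the client at ZooKeeper directly.
    conf.set("hbase.zookeeper.quorum", "master");            // placeholder host
    conf.set("hbase.zookeeper.property.clientPort", "2181");
    HTable table = new HTable(conf, "demo_table");           // placeholder table
    try {
      System.out.println("Connected to table "
          + table.getTableDescriptor().getNameAsString());
    } finally {
      table.close();
    }
  }
}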
"HBase is able to connect to ZooKeeper but the connection closes immediately..."
When I try to access HBase from a cluster node which is not the zookeeper server I have the following problem: -bash-3.2$ hostname node33 -bash-3.2$ hbase shell HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 hbase(main):001:0> status ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information. Here is some help for this command: Show cluster status. Can be 'summary', 'simple', or 'detailed'. The default is 'summary'. Examples: hbase> status hbase> status 'simple' hbase> status 'summary' hbase> status 'detailed' hbase(main):002:0> Now, if I log into the zookeeper server I can do: -bash-3.2$ hostname master -bash-3.2$ hbase shell HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell Version 0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012 hbase(main):001:0> status 34 servers, 0 dead, 59.0882 average load hbase(main):002:0> What's going on? I suspect that HBase should be able to read/write from any node that I can modify hdfs from.
Re: Welcome our newest Committer Anoop
Hey Anoop, Congratulations! Devaraj. On Mon, Mar 11, 2013 at 10:50 AM, Enis Söztutar wrote: > Congrats and welcome. > > > On Mon, Mar 11, 2013 at 2:21 AM, Nicolas Liochon > wrote: > > > Congrats, Anoop! > > > > > > On Mon, Mar 11, 2013 at 5:35 AM, rajeshbabu chintaguntla < > > rajeshbabu.chintagun...@huawei.com> wrote: > > > > > Contratulations Anoop! > > > > > > > > > From: Anoop Sam John [anoo...@huawei.com] > > > Sent: Monday, March 11, 2013 9:00 AM > > > To: user@hbase.apache.org > > > Subject: RE: Welcome our newest Committer Anoop > > > > > > Thanks to all.. Hope to work more and more for HBase! > > > > > > -Anoop- > > > > > > > > > From: Andrew Purtell [apurt...@apache.org] > > > Sent: Monday, March 11, 2013 7:33 AM > > > To: user@hbase.apache.org > > > Subject: Re: Welcome our newest Committer Anoop > > > > > > Congratulations Anoop. Welcome! > > > > > > > > > On Mon, Mar 11, 2013 at 12:42 AM, ramkrishna vasudevan < > > > ramkrishna.s.vasude...@gmail.com> wrote: > > > > > > > Hi All > > > > > > > > Pls welcome Anoop, our newest committer. Anoop's work in HBase has > > been > > > > great and he has helped lot of users in the mailing list. > > > > > > > > He has contributed features related to Endpoints and CPs. > > > > > > > > Welcome Anoop and best wishes for your future work. > > > > > > > > Hope to see your continuing efforts to the community. > > > > > > > > Regards > > > > Ram > > > > > > > > > > > > > > > > -- > > > Best regards, > > > > > >- Andy > > > > > > Problems worthy of attack prove their worth by hitting back. - Piet > Hein > > > (via Tom White) > > > > > >
Re: Regionserver goes down while endpoint execution
To expand on what Himanshu said, your endpoint is doing an unbounded scan on the region, so with a region with a lot of rows it's taking more than 60 seconds to run to the region end, which is why the client side of the call is timing out. In addition you're building up an in memory list of all the values for that qualifier in that region, which could cause you to bump into OOM issues, depending on how big your values are and how sparse the given column qualifier is. If you trigger an OOMException, then the region server would abort. For this usage specifically, though -- scanning through a single column qualifier for all rows -- you would be better off just doing a normal client side scan, ie. HTable.getScanner(). Then you will avoid the client timeout and potential server-side memory issues. On Tue, Mar 12, 2013 at 9:29 AM, Ted Yu wrote: > From region server log: > > 2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error > making BlockReader. Closing stale > Socket[addr=/10.42.105.112,port=50010,localport=54114] > java.io.EOFException: Premature EOF: no length prefix available > at > org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162) > at > org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407) > > What version of HBase and hadoop are you using ? > Do versions of hadoop on Eclipse machine and in your cluster match ? > > Cheers > > On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 >wrote: > > > Lars, > > > > I am getting following errors at datanode & region servers. > > > > ** ** > > > > Regards, > > > > Deepak > > > > ** ** > > > > *From:* Kumar, Deepak8 [CCC-OT_IT NE] > > *Sent:* Tuesday, March 12, 2013 3:00 AM > > *To:* Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars > > hofhansl' > > > > *Subject:* RE: Regionserver goes down while endpoint execution > > > > ** ** > > > > Lars, > > > > It is having following errors when I execute the Endpoint RPC client from > > eclipse. It seems some of the regions at regionserver > > vm-8aa9-fe74.nam.nsroot.net is taking more time to reponse. > > > > ** ** > > > > Could you guide how to fix it. I don’t find any option to set > hbase.rpc.timeout > > from hbase configuration menu in CDH4 CM server for hbase > configuration.** > > ** > > > > ** ** > > > > Regards, > > > > Deepak > > > > ** ** > > > > 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment > complete > > on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = > > 0x53d591b77090026, negotiated timeout = 6 > > > > Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration > > warnOnceIfDeprecated > > > > WARNING: hadoop.native.lib is deprecated. Instead, use > > io.native.lib.available > > > > Mar 12, 2013 2:44:00 AM > > > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation > > processExecs > > > > WARNING: Error executing for row 153299:1362780381523:2932572079500658: > > vm-ab1f-dd21.nam.nsroot.net: > > > > *java.util.concurrent.ExecutionException*: * > > org.apache.hadoop.hbase.client.RetriesExhaustedException*: Failed after > > attempts=10, exceptions: > > > > Tue Mar 12 02:34:15 EDT 2013, > > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > > java.net.SocketTimeoutException*: Call to > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > > waiting for channel to be ready for read. 
ch : > > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 > remote= > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > > > Tue Mar 12 02:35:16 EDT 2013, > > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > > java.net.SocketTimeoutException*: Call to > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > > waiting for channel to be ready for read. ch : > > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 > remote= > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > > > Tue Mar 12 02:36:18 EDT 2013, > > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > > java.net.SocketTimeoutException*: Call to > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > > waiting for channel to be ready for read. ch : > > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 > remote= > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > > > Tue Mar 12 02:37:20 EDT 2013, > > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > > java.net.SocketTimeoutException*: Call to > > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > >
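A minimal sketch of the client-side scan suggested above, with placeholder table, column family, and qualifier names: restrict the scan to the single column so rows without it are skipped, and process each value as it streams back rather than accumulating everything in a list on the region server.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleColumnScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");              // placeholder names below
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
    scan.setCaching(500);        // fetch rows in batches to reduce RPC round trips
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
        // process each value here instead of collecting them all in memory
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}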
Re: Regionserver goes down while endpoint execution
>From region server log: 2013-03-12 03:07:22,605 DEBUG org.apache.hadoop.hdfs.DFSClient: Error making BlockReader. Closing stale Socket[addr=/10.42.105.112,port=50010,localport=54114] java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162) at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:407) What version of HBase and hadoop are you using ? Do versions of hadoop on Eclipse machine and in your cluster match ? Cheers On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 wrote: > Lars, > > I am getting following errors at datanode & region servers. > > ** ** > > Regards, > > Deepak > > ** ** > > *From:* Kumar, Deepak8 [CCC-OT_IT NE] > *Sent:* Tuesday, March 12, 2013 3:00 AM > *To:* Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars > hofhansl' > > *Subject:* RE: Regionserver goes down while endpoint execution > > ** ** > > Lars, > > It is having following errors when I execute the Endpoint RPC client from > eclipse. It seems some of the regions at regionserver > vm-8aa9-fe74.nam.nsroot.net is taking more time to reponse. > > ** ** > > Could you guide how to fix it. I don’t find any option to set > hbase.rpc.timeout > from hbase configuration menu in CDH4 CM server for hbase configuration.** > ** > > ** ** > > Regards, > > Deepak > > ** ** > > 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete > on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = > 0x53d591b77090026, negotiated timeout = 6 > > Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration > warnOnceIfDeprecated > > WARNING: hadoop.native.lib is deprecated. Instead, use > io.native.lib.available > > Mar 12, 2013 2:44:00 AM > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation > processExecs > > WARNING: Error executing for row 153299:1362780381523:2932572079500658: > vm-ab1f-dd21.nam.nsroot.net: > > *java.util.concurrent.ExecutionException*: * > org.apache.hadoop.hbase.client.RetriesExhaustedException*: Failed after > attempts=10, exceptions: > > Tue Mar 12 02:34:15 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:35:16 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:36:18 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. 
ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:37:20 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:38:22 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:39:25 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, * > java.net.SocketTimeoutException*: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: *java.net.SocketTimeoutException*: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572remote= > vm-8aa9-fe74.nam.nsroot.net/10.42.10
Re: Hbase Thrift DemoClient.php bug
HBASE-8079 has been logged. On Tue, Mar 12, 2013 at 9:55 AM, Ted Yu wrote: > I checked trunk code - parameter count matches. > > Looks like this is a problem in 0.94 only > > > On Mon, Mar 11, 2013 at 8:19 PM, Ted wrote: > >> Thanks for reporting this. >> >> Mind opening a jira ? >> >> On Mar 11, 2013, at 7:13 PM, jung wrote: >> >> > hiya >> > >> > installed thrift and hbase. and testing php with hbase. >> > >> > i tested apache thrift 0.90 and 1.0-dev with hbase 0.94.2 and hbase >> > 0.94.5 's Hbase.thrift >> > >> > $ php DemoClient.php failed like below >> > >> > >> > >> > DemoClient >> > >> > >> > >> > scanning tables... >> > found: cars >> > found: demo_table >> >disabling table: demo_table >> >deleting table: demo_table >> > found: dnstest >> > found: hello_world >> > creating table: demo_table >> > column families in demo_table: >> > column: entry:, maxVer: 10 >> > column: unused:, maxVer: 3 >> > PHP Warning: Missing argument 4 for Hbase\HbaseClient::mutateRow(), >> > called in /home/hbase/test/www-current/DemoClient.php on line 138 and >> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php >> > on line 1233 >> > PHP Notice: Undefined variable: attributes in >> > /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php on line >> > 1235 >> > PHP Warning: Missing argument 4 for Hbase\HbaseClient::mutateRow(), >> > called in /home/hbase/test/www-current/DemoClient.php on line 147 and >> > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php >> > on line 1233 >> > >> > --- >> > DemoClient.php using 3 args >> > >> > $client->mutateRow( $t, "foo", $mutations ) >> > >> > but mutateRow in gen-php/Hbase.php have a 4 args like below >> > >> > 40: public function mutateRow($tableName, $row, $mutations, >> $attributes); >> > 41: public function mutateRowTs($tableName, $row, $mutations, >> > $timestamp, $attributes); >> > 42: public function mutateRows($tableName, $rowBatches, $attributes); >> > 43: public function mutateRowsTs($tableName, $rowBatches, $timestamp, >> > $attributes); >> > 1233: public function mutateRow($tableName, $row, $mutations, >> $attributes) >> > 1290: public function mutateRowTs($tableName, $row, $mutations, >> > $timestamp, $attributes) >> > 1348: public function mutateRows($tableName, $rowBatches, $attributes) >> > 1404: public function mutateRowsTs($tableName, $rowBatches, >> > $timestamp, $attributes) >> > >> > >> > any ideas? >> > >> > thanks >> > >
Re: Regionserver goes down while endpoint execution
I don't see RS dying with this. It says that it is taking more time than 60 sec (default timeout for clients), and therefore it stops processing the coprocessor call as the client is disconnected. Is your cluster okay? how many rows in the table? Normal scan works good? Can you share more about your cluster details (nodes, tables, regions, data size, etc)? Thanks, Himanshu On Tue, Mar 12, 2013 at 4:46 AM, Kumar, Deepak8 wrote: > Lars, > > I am getting following errors at datanode & region servers. > > > > Regards, > > Deepak > > > > From: Kumar, Deepak8 [CCC-OT_IT NE] > Sent: Tuesday, March 12, 2013 3:00 AM > To: Kumar, Deepak8 [CCC-OT_IT NE]; 'user@hbase.apache.org'; 'lars hofhansl' > > > Subject: RE: Regionserver goes down while endpoint execution > > > > Lars, > > It is having following errors when I execute the Endpoint RPC client from > eclipse. It seems some of the regions at regionserver > vm-8aa9-fe74.nam.nsroot.net is taking more time to reponse. > > > > Could you guide how to fix it. I don’t find any option to set > hbase.rpc.timeout from hbase configuration menu in CDH4 CM server for hbase > configuration. > > > > Regards, > > Deepak > > > > 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete > on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = > 0x53d591b77090026, negotiated timeout = 6 > > Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration > warnOnceIfDeprecated > > WARNING: hadoop.native.lib is deprecated. Instead, use > io.native.lib.available > > Mar 12, 2013 2:44:00 AM > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation > processExecs > > WARNING: Error executing for row > 153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net: > > java.util.concurrent.ExecutionException: > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=10, exceptions: > > Tue Mar 12 02:34:15 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:35:16 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:36:18 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. 
ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:37:20 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:38:22 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:39:25 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 > remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] > > Tue Mar 12 02:40:30 EDT 2013, > org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, > java.net.SocketTimeoutException: Call to > vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout > exception: java.net.SocketTimeoutException: 6 millis timeout while > waiting for channel to be ready for read. ch : > java.nio.channels.So
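If the coprocessor call legitimately needs more than the 60-second default, one option (a sketch only, not CDH-specific advice) is to raise hbase.rpc.timeout in the client-side Configuration used to create the HTable, since the client connection reads that setting; the 300000 ms value below is purely illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class LongerRpcTimeoutClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Default hbase.rpc.timeout is 60000 ms; a coprocessor call that scans a
    // whole region can easily exceed that, so raise it for this client only.
    conf.setInt("hbase.rpc.timeout", 300000);                // illustrative value
    HTable table = new HTable(conf, "mytable");              // placeholder table name
    try {
      // ... issue the coprocessor exec / other long-running calls here ...
    } finally {
      table.close();
    }
  }
}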
Re: Hbase Thrift DemoClient.php bug
I checked trunk code - parameter count matches. Looks like this is a problem in 0.94 only On Mon, Mar 11, 2013 at 8:19 PM, Ted wrote: > Thanks for reporting this. > > Mind opening a jira ? > > On Mar 11, 2013, at 7:13 PM, jung wrote: > > > hiya > > > > installed thrift and hbase. and testing php with hbase. > > > > i tested apache thrift 0.90 and 1.0-dev with hbase 0.94.2 and hbase > > 0.94.5 's Hbase.thrift > > > > $ php DemoClient.php failed like below > > > > > > > > DemoClient > > > > > > > > scanning tables... > > found: cars > > found: demo_table > >disabling table: demo_table > >deleting table: demo_table > > found: dnstest > > found: hello_world > > creating table: demo_table > > column families in demo_table: > > column: entry:, maxVer: 10 > > column: unused:, maxVer: 3 > > PHP Warning: Missing argument 4 for Hbase\HbaseClient::mutateRow(), > > called in /home/hbase/test/www-current/DemoClient.php on line 138 and > > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php > > on line 1233 > > PHP Notice: Undefined variable: attributes in > > /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php on line > > 1235 > > PHP Warning: Missing argument 4 for Hbase\HbaseClient::mutateRow(), > > called in /home/hbase/test/www-current/DemoClient.php on line 147 and > > defined in /home/hbase/test/www-current/thrift/packages/Hbase/Hbase.php > > on line 1233 > > > > --- > > DemoClient.php using 3 args > > > > $client->mutateRow( $t, "foo", $mutations ) > > > > but mutateRow in gen-php/Hbase.php have a 4 args like below > > > > 40: public function mutateRow($tableName, $row, $mutations, > $attributes); > > 41: public function mutateRowTs($tableName, $row, $mutations, > > $timestamp, $attributes); > > 42: public function mutateRows($tableName, $rowBatches, $attributes); > > 43: public function mutateRowsTs($tableName, $rowBatches, $timestamp, > > $attributes); > > 1233: public function mutateRow($tableName, $row, $mutations, > $attributes) > > 1290: public function mutateRowTs($tableName, $row, $mutations, > > $timestamp, $attributes) > > 1348: public function mutateRows($tableName, $rowBatches, $attributes) > > 1404: public function mutateRowsTs($tableName, $rowBatches, > > $timestamp, $attributes) > > > > > > any ideas? > > > > thanks >
HBase receivedBytes
Hi all, While monitoring HBase, we need to know the number of bytes received by each region server (and the overall throughput). The only metric I found for this is RS.hadoop.service_HBase.name_RPCStatistics-60020.ReceivedBytes. However, this metric is a delta, not a running total. My question: is there a way to calculate the total received bytes per region server from the available metrics? Thanks, Ohad.
RE: region server down when scanning using mapreduce
how did you use scanner? paste some codes here. On Mar 12, 2013 4:13 PM, "Lu, Wei" wrote: > > We turned the block cache to false and tried again, regionserver still > crash one after another. > There are a lot of scanner lease time out, and then master log info: > RegionServer ephemeral node deleted, processing expiration > [rs21,60020,1363010589837] > Seems the problem is not caused by block cache > > > Thanks > > -Original Message- > From: Azuryy Yu [mailto:azury...@gmail.com] > Sent: Tuesday, March 12, 2013 1:41 PM > To: user@hbase.apache.org > Subject: Re: region server down when scanning using mapreduce > > please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) > to > get some background of block cache. > > > On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei wrote: > > > No, does block cache matter? Btw, the mr dump is a mr program we > > implemented rather than the hbase tool. > > > > Thanks > > > > -Original Message- > > From: Azuryy Yu [mailto:azury...@gmail.com] > > Sent: Tuesday, March 12, 2013 1:18 PM > > To: user@hbase.apache.org > > Subject: Re: region server down when scanning using mapreduce > > > > did you closed block cache when you used mr dump? > > On Mar 12, 2013 1:06 PM, "Lu, Wei" wrote: > > > > > Hi, > > > > > > When we use mapreduce to dump data from a pretty large table on hbase. > > One > > > region server crash and then another. Mapreduce is deployed together > with > > > hbase. > > > > > > 1) From log of the region server, there are both "next" and "multi" > > > operations on going. Is it because there is write/read conflict that > > cause > > > scanner timeout? > > > 2) Region server has 24 cores, and # max map tasks is 24 too; the table > > > has about 30 regions (each of size 0.5G) on the region server, is it > > > because cpu is all used by mapreduce and that case region server slow > and > > > then timeout? > > > 2) current hbase.regionserver.handler.count is 10 by default, should it > > be > > > enlarged? > > > > > > Please give us some advices. > > > > > > Thanks, > > > Wei > > > > > > > > > Log information: > > > > > > > > > [Regionserver rs21:] > > > > > > 2013-03-11 18:36:28,148 INFO > > > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/ > > > adcbg21.machine.wisdom.com > > ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488, > > > entries=22417, filesize=127539793. 
for > > > > > > /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052 > > > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We > > > slept 28183ms instead of 3000ms, this is likely due to a long garbage > > > collecting pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc > > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > > 10.20.127.21:56058 > > > > > > ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"} > > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc > > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > > 10.20.127.21:56529 > > > > > > ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"} > > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc > > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > > 10.20.127.21:56146 > > > > > > ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"} > > > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc > > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > > 10.20.127.21:56069 > > > > > > ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"} > > > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc > > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > > 10.20.127.21:56586 > > > > > > ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"} > > > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > > > > {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a > > ), > > > rpc version=1, cl
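A common mitigation for an MR dump of a large table is to bound how much work each scanner next() call does and to keep the scan out of the block cache. A minimal sketch, assuming a TableMapper-based dump job; the table name, the mapper, and the output path argument are placeholders, not the original program from this thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TableDumpJob {

  // Hypothetical dump mapper: emits the row key of every row it sees.
  public static class DumpMapper extends TableMapper<Text, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text(Bytes.toString(key.get())), NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table dump");
    job.setJarByClass(TableDumpJob.class);

    Scan scan = new Scan();
    scan.setCaching(100);         // each next() returns a bounded batch, well inside the scanner lease
    scan.setCacheBlocks(false);   // a full-table scan should not churn the block cache

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, DumpMapper.class,       // "mytable" is a placeholder
        Text.class, NullWritable.class, job);
    job.setNumReduceTasks(0);                    // map-only dump
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}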
Re: RegionServers Crashing every hour in production env
Guys, thank you very much for the help. Yesterday I spent 14 hours trying to tune the whole cluster. The cluster is not ready yet needs a lot of tunning, but at least is working. My first big problem was namenode + datanode GC. They were not using CMS and thus were taking "incremental" time to run. Ii started in 0.01 ms and in 20 minutes was taking 150 secs. After setting CMSGC this time is much smaller taking a maximum of 70 secs, which is VERY HIGH, but for now does not stop HBase. With this issue solved, it was clear that the RS was doing a long pause GC, taking up to 220 secs. Zookeeper expired the RS and it shutdown. I tried a lot of different flags configuration (MORE than 20), and could not get small gcs. Eventually it would take more than 150 secs (zookeeper timeout) and shutdown. Finally I tried a config that so far, 12 hours, is working with a maximum GC time of 90 secs. Which of course is a terrible problem since HBase is a database, but at least the cluster is stable while I can tune it a little more. In my opinion, my biggest problem is to have a few "monster" machines in the cluster instead of a bunch of commodities machines. I don't know if there are a lot companies using this kind of machines inside a hadoop cluster, but a fast search on google could not find a lot of tunes for big heap GCs. I guess my next step will be search for big heap gc tuning. Back to some questions ;) > You have ganglia or tsdb running? I use zabbix for now, and no there is nothing going on when the big pause happens. > When you see the big pause above, can you see anything going on on the > machine? (swap, iowait, concurrent fat mapreduce job?) > what are you doing during long GC happened? read or write? if reading, what > the block cache size? The cpu for the RS process goes to 100% and the logs "pause", until it gets out. Ex: [NewPar IO and SWAP are normal. There is no MR running, just normal database load, which is very low. I am probably doing reads AND writes to the database with default block cache size. One problem in this moment might be the big number of regions (1252) since I am using only one RS to be able to track the problem. The links and ideas were very helpful. Thank you very much guys. I will post my future researches as I find a solution ;) If you have more ideas or info (links, flag suggestions, etc.), please post it :) Abs, Pablo On 03/10/2013 11:24 PM, Andrew Purtell wrote: Be careful with GC tuning, throwing changes at an application without analysis of what is going on with the heap is shooting in the dark. One particular good treatment of the subject is here: http://java.dzone.com/articles/how-tame-java-gc-pauses If you have made custom changes to blockcache or memstore configurations, back them out until you're sure everything else is ok. Watch carefully for swapping. Set the vm.swappiness sysctl to 0. Monitor for spikes in page scanning or any swap activity. Nothing brings on "Juliette" pauses better than a JVM partially swapped out. The Java GC starts collection by examining the oldest pages, and those are the first pages the OS swaps out... On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu wrote: Hi Pablo, It'a terrible for a long minor GC. I don't think there are swaping from your vmstat log. 
but I just suggest you 1) add following JVM options: -XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 -XX:MaxTenuringThreshold=3 -XX:+UseFastAccessorMethods 2) -Xmn is two small, your total Mem is 74GB, just make -Xmn2g 3) what are you doing during long GC happened? read or write? if reading, what the block cache size? On Mon, Mar 11, 2013 at 6:41 AM, Stack wrote: You could increase your zookeeper session timeout to 5 minutes while you are figuring why these long pauses. http://hbase.apache.org/book.html#zookeeper.session.timeout Above, there is an outage for almost 5 minutes: We slept 225100ms instead of 3000ms, this is likely due to a long You have ganglia or tsdb running? When you see the big pause above, can you see anything going on on the machine? (swap, iowait, concurrent fat mapreduce job?) St.Ack On Sun, Mar 10, 2013 at 3:29 PM, Pablo Musa wrote: Hi Sreepathi, they say in the book (or the site), we could try it to see if it is really a timeout error or there is something more. But it is not recomended for production environments. I could give it a try if five minutes will ensure to us that the problem is the GC or elsewhere!! Anyway, I think it is hard to beleive a GC is taking 2:30 minutes. Abs, Pablo On 03/10/2013 04:06 PM, Sreepathi wrote: Hi Stack/Ted/Pablo, Should we increase the hbase.rpc.timeout property to 5 minutes ( 30 ms ) ? Regards, - Sreepathi On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa wrote: That combo should be fine. Great!! If JVM is full GC'ing, the application is stopped. The below does not look like a full GC
Re: Does a major compact flush memstore?
Did you try from java api? If flush does not happen we may need to fix it. Regards RAm On Tue, Mar 12, 2013 at 1:04 PM, Liu, Raymond wrote: > It seems to me that a major_compact table command from hbase shell do not > fush memstore? When I done with major compact, still some data in memstore > and will be flush out to disk when I shut down hbase cluster. > > Best Regards, > Raymond Liu > >
RE: region server down when scanning using mapreduce
How is the GC pattern in your RSs which are getting down? In RS logs you might be having YouAreDeadExceptions... Pls try tuning your RS memory and GC opts. -Anoop- From: Lu, Wei [w...@microstrategy.com] Sent: Tuesday, March 12, 2013 1:42 PM To: user@hbase.apache.org Subject: RE: region server down when scanning using mapreduce We turned the block cache to false and tried again, regionserver still crash one after another. There are a lot of scanner lease time out, and then master log info: RegionServer ephemeral node deleted, processing expiration [rs21,60020,1363010589837] Seems the problem is not caused by block cache Thanks -Original Message- From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Tuesday, March 12, 2013 1:41 PM To: user@hbase.apache.org Subject: Re: region server down when scanning using mapreduce please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) to get some background of block cache. On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei wrote: > No, does block cache matter? Btw, the mr dump is a mr program we > implemented rather than the hbase tool. > > Thanks > > -Original Message- > From: Azuryy Yu [mailto:azury...@gmail.com] > Sent: Tuesday, March 12, 2013 1:18 PM > To: user@hbase.apache.org > Subject: Re: region server down when scanning using mapreduce > > did you closed block cache when you used mr dump? > On Mar 12, 2013 1:06 PM, "Lu, Wei" wrote: > > > Hi, > > > > When we use mapreduce to dump data from a pretty large table on hbase. > One > > region server crash and then another. Mapreduce is deployed together with > > hbase. > > > > 1) From log of the region server, there are both "next" and "multi" > > operations on going. Is it because there is write/read conflict that > cause > > scanner timeout? > > 2) Region server has 24 cores, and # max map tasks is 24 too; the table > > has about 30 regions (each of size 0.5G) on the region server, is it > > because cpu is all used by mapreduce and that case region server slow and > > then timeout? > > 2) current hbase.regionserver.handler.count is 10 by default, should it > be > > enlarged? > > > > Please give us some advices. > > > > Thanks, > > Wei > > > > > > Log information: > > > > > > [Regionserver rs21:] > > > > 2013-03-11 18:36:28,148 INFO > > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/ > > adcbg21.machine.wisdom.com > ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488, > > entries=22417, filesize=127539793. 
for > > > /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052 > > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We > > slept 28183ms instead of 3000ms, this is likely due to a long garbage > > collecting pause and it's usually bad, see > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56058 > > > ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"} > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56529 > > > ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"} > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56146 > > > ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"} > > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56069 > > > ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"} > > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56586 > > > ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"} > > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > > {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d1
RE: region server down when scanning using mapreduce
We turned the block cache to false and tried again, regionserver still crash one after another. There are a lot of scanner lease time out, and then master log info: RegionServer ephemeral node deleted, processing expiration [rs21,60020,1363010589837] Seems the problem is not caused by block cache Thanks -Original Message- From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Tuesday, March 12, 2013 1:41 PM To: user@hbase.apache.org Subject: Re: region server down when scanning using mapreduce please read here http://hbase.apache.org/book.html (11.8.5. Block Cache) to get some background of block cache. On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei wrote: > No, does block cache matter? Btw, the mr dump is a mr program we > implemented rather than the hbase tool. > > Thanks > > -Original Message- > From: Azuryy Yu [mailto:azury...@gmail.com] > Sent: Tuesday, March 12, 2013 1:18 PM > To: user@hbase.apache.org > Subject: Re: region server down when scanning using mapreduce > > did you closed block cache when you used mr dump? > On Mar 12, 2013 1:06 PM, "Lu, Wei" wrote: > > > Hi, > > > > When we use mapreduce to dump data from a pretty large table on hbase. > One > > region server crash and then another. Mapreduce is deployed together with > > hbase. > > > > 1) From log of the region server, there are both "next" and "multi" > > operations on going. Is it because there is write/read conflict that > cause > > scanner timeout? > > 2) Region server has 24 cores, and # max map tasks is 24 too; the table > > has about 30 regions (each of size 0.5G) on the region server, is it > > because cpu is all used by mapreduce and that case region server slow and > > then timeout? > > 2) current hbase.regionserver.handler.count is 10 by default, should it > be > > enlarged? > > > > Please give us some advices. > > > > Thanks, > > Wei > > > > > > Log information: > > > > > > [Regionserver rs21:] > > > > 2013-03-11 18:36:28,148 INFO > > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/ > > adcbg21.machine.wisdom.com > ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488, > > entries=22417, filesize=127539793. 
for > > > /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052 > > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We > > slept 28183ms instead of 3000ms, this is likely due to a long garbage > > collecting pause and it's usually bad, see > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56058 > > > ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"} > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56529 > > > ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"} > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56146 > > > ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"} > > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56069 > > > ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"} > > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc > > version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.127.21:56586 > > > ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"} > > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > > {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a > ), > > rpc version=1, client version=29, methodsFingerPrint=54742778","client":" > > 10.20.109.21:35342 > > > ","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"} > > 2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer: > > (responseTooSlow): > > > {"processingtimems":38813,"call":"multi(org.apac
Does a major compact flush memstore?
It seems to me that a major_compact table command from the hbase shell does not flush the memstore? When I am done with the major compaction, there is still some data in the memstore, and it gets flushed out to disk when I shut down the HBase cluster. Best Regards, Raymond Liu
RE: Regionserver goes down while endpoint execution
Lars, It is having following errors when I execute the Endpoint RPC client from eclipse. It seems some of the regions at regionserver vm-8aa9-fe74.nam.nsroot.net is taking more time to reponse. Could you guide how to fix it. I don't find any option to set hbase.rpc.timeout from hbase configuration menu in CDH4 CM server for hbase configuration. Regards, Deepak 3/03/12 02:33:12 INFO zookeeper.ClientCnxn: Session establishment complete on server vm-15c2-3bbf.nam.nsroot.net/10.96.172.44:2181, sessionid = 0x53d591b77090026, negotiated timeout = 6 Mar 12, 2013 2:33:13 AM org.apache.hadoop.conf.Configuration warnOnceIfDeprecated WARNING: hadoop.native.lib is deprecated. Instead, use io.native.lib.available Mar 12, 2013 2:44:00 AM org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation processExecs WARNING: Error executing for row 153299:1362780381523:2932572079500658:vm-ab1f-dd21.nam.nsroot.net: java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions: Tue Mar 12 02:34:15 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2271 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:35:16 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2403 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:36:18 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2465 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:37:20 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2500 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:38:22 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2538 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:39:25 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2572 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:40:30 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2606 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:41:34 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/150.110.96.212:2640 remote=vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020] Tue Mar 12 02:42:43 EDT 2013, org.apache.hadoop.hbase.ipc.ExecRPCInvoker$1@39443f, java.net.SocketTimeoutException: Call to vm-8aa9-fe74.nam.nsroot.net/10.42.105.91:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 6 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected