[jira] [Created] (HBASE-11083) ExportSnapshot should provide capability to limit bandwidth consumption
Ted Yu created HBASE-11083: -- Summary: ExportSnapshot should provide capability to limit bandwidth consumption Key: HBASE-11083 URL: https://issues.apache.org/jira/browse/HBASE-11083 Project: HBase Issue Type: Improvement Components: snapshots Reporter: Ted Yu This capability was first brought up in this thread: http://search-hadoop.com/m/DHED4Td8Xb1 The rewritten distcp already provides this capability. See MAPREDUCE-2765. The distcp implementation utilizes ThrottledInputStream, which provides bandwidth throttling on a specified InputStream. As a first step, we can:
* add an option to ExportSnapshot which expresses bandwidth per map in MB
* utilize ThrottledInputStream in ExportSnapshot#ExportMapper#copyFile().
-- This message was sent by Atlassian JIRA (v6.2#6252)
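To illustrate the idea, here is a minimal, self-contained sketch of the kind of throttling wrapper ThrottledInputStream provides. The class and field names below are made up for illustration; the real class added by MAPREDUCE-2765 lives in the distcp tooling and its API may differ.
{code}
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Illustrative rate-limiting wrapper: sleep whenever the observed average
// read rate exceeds the configured cap. This is the basic mechanism a
// throttled copy in ExportSnapshot#ExportMapper#copyFile() could rely on.
public class RateLimitedInputStream extends FilterInputStream {
  private final long maxBytesPerSec;
  private final long startTime = System.currentTimeMillis();
  private long bytesRead = 0;

  public RateLimitedInputStream(InputStream in, long maxBytesPerSec) {
    super(in);
    this.maxBytesPerSec = maxBytesPerSec;
  }

  @Override
  public int read() throws IOException {
    throttle();
    int b = super.read();
    if (b >= 0) {
      bytesRead++;
    }
    return b;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    throttle();
    int n = super.read(b, off, len);
    if (n > 0) {
      bytesRead += n;
    }
    return n;
  }

  // Block until the average rate since the stream was opened drops below the cap.
  private void throttle() throws IOException {
    while (maxBytesPerSec > 0 && averageRate() > maxBytesPerSec) {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while throttling", e);
      }
    }
  }

  private long averageRate() {
    long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
    return bytesRead * 1000 / elapsedMs;
  }
}
{code}
A per-map bandwidth option expressed in MB would then simply be converted to bytes per second and passed to whatever wrapper is placed around the snapshot file's input stream.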
[jira] [Resolved] (HBASE-10957) HBASE-10070: HMaster can abort with NPE in #rebuildUserRegions
[ https://issues.apache.org/jira/browse/HBASE-10957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar resolved HBASE-10957. --- Resolution: Fixed Hadoop Flags: Reviewed I've committed this to branch. > HBASE-10070: HMaster can abort with NPE in #rebuildUserRegions > --- > > Key: HBASE-10957 > URL: https://issues.apache.org/jira/browse/HBASE-10957 > Project: HBase > Issue Type: Sub-task > Components: master >Affects Versions: hbase-10070 >Reporter: Nicolas Liochon >Assignee: Nicolas Liochon > Fix For: hbase-10070 > > Attachments: 10957.v1.patch > > > Seen during tests. The fix is to test this condition as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: The builds.apache.org grind
And yet the reason the builds.apache.org builds are failing, as opposed to tests I run on VMs elsewhere and locally, is because builds.apache.org is becoming more and more loaded over time. So give me a break about the "stability" of the 0.98 build. You give people a false impression. On Fri, Apr 25, 2014 at 3:47 PM, Ted Yu wrote: > Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed > builds out of the last 17 builds. > The success rate for > https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/ was even lower. > > I think effort of making the builds, especially hbase-0.98, more stable > should be considered. > > My two cents. > > > On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell > wrote: > > > Do we keep filing the "TestFoo occasionally fails on builds.apache.org > " > > type of issues as builds.apache.org gets slower and slower? We can see > the > > build results independent of JIRA so for documentary purposes the > rationale > > seems light. > > > > I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and > have > > not observed failures or zombies for a while now. Those EC2 VMs are > clearly > > reasonable test environments compared to builds.apache.org, sadly. I'm > > tempted to close any test issue reporting something on > > builds.apache.org that I don't see as Cannot Reproduce but wonder how > > common that feeling is. > > > > Of course small patches to increase a timeout here or retry more often > > there could be useful and acceptable. At the same time, do we increase > the > > tolerances for builds.apache.org and trade away the effectiveness of the > > test to catch real timing issues? > > > > > > -- > > Best regards, > > > > - Andy > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > > (via Tom White) > > > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: The builds.apache.org grind
My 2 cents: should the test runners have profiles like "ASF build" vs "EC2 large m/c" or something, from which the appropriate timeouts are derived, with ASF timeouts longer than for custom envs? Or would that make the whole test infra less trustworthy? -Mikhail 2014-04-25 15:47 GMT-07:00 Ted Yu : > Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed > builds out of the last 17 builds. > The success rate for > https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/ was even lower. > > I think effort of making the builds, especially hbase-0.98, more stable > should be considered. > > My two cents. > > > On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell > wrote: > > > Do we keep filing the "TestFoo occasionally fails on builds.apache.org > " > > type of issues as builds.apache.org gets slower and slower? We can see > the > > build results independent of JIRA so for documentary purposes the > rationale > > seems light. > > > > I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and > have > > not observed failures or zombies for a while now. Those EC2 VMs are > clearly > > reasonable test environments compared to builds.apache.org, sadly. I'm > > tempted to close any test issue reporting something on > > builds.apache.org that I don't see as Cannot Reproduce but wonder how > > common that feeling is. > > > > Of course small patches to increase a timeout here or retry more often > > there could be useful and acceptable. At the same time, do we increase > the > > tolerances for builds.apache.org and trade away the effectiveness of the > > test to catch real timing issues? > > > > > > -- > > Best regards, > > > > - Andy > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > > (via Tom White) > > > -- Thanks, Michael Antonov
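For concreteness, a small sketch of what such a profile switch might look like. The property name and multiplier values here are hypothetical, purely to illustrate the idea, not an existing HBase test-infra knob.

  // Hypothetical: derive the timeout budget from a build profile, e.g.
  // -Dtest.build.profile=asf on the shared Jenkins vs. the default on
  // dedicated EC2 boxes.
  String profile = System.getProperty("test.build.profile", "default");
  int multiplier = "asf".equals(profile) ? 3 : 1;
  long testTimeoutMs = 60000L * multiplier;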
[jira] [Resolved] (HBASE-10960) Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations
[ https://issues.apache.org/jira/browse/HBASE-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-10960. Resolution: Fixed Committed missing file, verified compilation. Thanks Srikanth. > Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations > --- > > Key: HBASE-10960 > URL: https://issues.apache.org/jira/browse/HBASE-10960 > Project: HBase > Issue Type: Improvement > Components: Thrift >Reporter: Srikanth Srungarapu >Assignee: Srikanth Srungarapu > Fix For: 0.99.0 > > Attachments: HBASE-10960.patch, hbase-10960.v3.patch > > > Both append, and checkAndPut functionalities are available in Thrift 2 > interface, but not in Thrift. So, adding the support for these > functionalities in Thrift1 too. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: The builds.apache.org grind
Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed builds out of the last 17 builds. The success rate for https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/ was even lower. I think effort of making the builds, especially hbase-0.98, more stable should be considered. My two cents. On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell wrote: > Do we keep filing the "TestFoo occasionally fails on builds.apache.org" > type of issues as builds.apache.org gets slower and slower? We can see the > build results independent of JIRA so for documentary purposes the rationale > seems light. > > I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and have > not observed failures or zombies for a while now. Those EC2 VMs are clearly > reasonable test environments compared to builds.apache.org, sadly. I'm > tempted to close any test issue reporting something on > builds.apache.org that I don't see as Cannot Reproduce but wonder how > common that feeling is. > > Of course small patches to increase a timeout here or retry more often > there could be useful and acceptable. At the same time, do we increase the > tolerances for builds.apache.org and trade away the effectiveness of the > test to catch real timing issues? > > > -- > Best regards, > > - Andy > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > (via Tom White) >
Re: The builds.apache.org grind
On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell wrote: > do we increase the tolerances for builds.apache.org and trade away the > effectiveness of the test to catch real timing issues? > I wonder about this often.
[jira] [Reopened] (HBASE-10960) Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations
[ https://issues.apache.org/jira/browse/HBASE-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reopened HBASE-10960: I think this commit broke the trunk build {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-thrift: Compilation failure: Compilation failure: [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java:[81,47] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: package org.apache.hadoop.hbase.thrift.generated [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[629,30] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: interface Iface [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java:[1493,30] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class HBaseHandler [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftUtilities.java:[40,47] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: package org.apache.hadoop.hbase.thrift.generated [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftUtilities.java:[215,40] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class ThriftUtilities [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[741,23] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: interface AsyncIface [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3666,23] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class AsyncClient [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3674,14] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_call [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3675,25] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_call [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[1951,30] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class Client [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[1957,28] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class Client [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53476,11] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53553,6] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53580,11] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53587,33] error: cannot find symbol [ERROR] 
symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53544,98] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53564,26] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53613,21] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_args [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53765,36] error: cannot find symbol [ERROR] symbol: class TAppend [ERROR] location: class append_argsStandardScheme [ERROR] /usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53824,30] error: cannot find symbol [ERROR] -> [Help 1] {noformat} > Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations > --- > > Key: HBASE-10960 > URL: https://i
The builds.apache.org grind
Do we keep filing the "TestFoo occasionally fails on builds.apache.org" type of issues as builds.apache.org gets slower and slower? We can see the build results independent of JIRA so for documentary purposes the rationale seems light. I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and have not observed failures or zombies for a while now. Those EC2 VMs are clearly reasonable test environments compared to builds.apache.org, sadly. I'm tempted to close any test issue reporting something on builds.apache.org that I don't see as Cannot Reproduce but wonder how common that feeling is. Of course small patches to increase a timeout here or retry more often there could be useful and acceptable. At the same time, do we increase the tolerances for builds.apache.org and trade away the effectiveness of the test to catch real timing issues? -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-11082) Potential unclosed TraceScope in FSHLog#replaceWriter()
Ted Yu created HBASE-11082: -- Summary: Potential unclosed TraceScope in FSHLog#replaceWriter() Key: HBASE-11082 URL: https://issues.apache.org/jira/browse/HBASE-11082 Project: HBase Issue Type: Bug Reporter: Ted Yu Priority: Minor In the finally block starting at line 924:
{code}
} finally {
  // Let the writer thread go regardless, whether error or not.
  if (zigzagLatch != null) {
    zigzagLatch.releaseSafePoint();
    // It will be null if we failed our wait on safe point above.
    if (syncFuture != null) blockOnSync(syncFuture);
  }
  scope.close();
{code}
If blockOnSync() throws IOException, the TraceScope would be left unclosed. -- This message was sent by Atlassian JIRA (v6.2#6252)
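A sketch of one possible shape for the fix (not the committed patch): give scope.close() its own finally so an IOException thrown by blockOnSync() cannot skip it.
{code}
} finally {
  try {
    // Let the writer thread go regardless, whether error or not.
    if (zigzagLatch != null) {
      zigzagLatch.releaseSafePoint();
      // It will be null if we failed our wait on safe point above.
      if (syncFuture != null) blockOnSync(syncFuture);
    }
  } finally {
    // Runs even if blockOnSync() above throws, so the TraceScope is always closed.
    scope.close();
  }
{code}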
[jira] [Created] (HBASE-11081) Trunk Master won't start; looking for Constructor that takes conf only
stack created HBASE-11081: - Summary: Trunk Master won't start; looking for Constructor that takes conf only Key: HBASE-11081 URL: https://issues.apache.org/jira/browse/HBASE-11081 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.99.0 Committing the Consensus Infra, we broke starting master. Small fix so constructMaster passes in a ConsensusProvider. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11080) TestZKSecretWatcher#testKeyUpdate occasionally fails
Ted Yu created HBASE-11080: -- Summary: TestZKSecretWatcher#testKeyUpdate occasionally fails Key: HBASE-11080 URL: https://issues.apache.org/jira/browse/HBASE-11080 Project: HBase Issue Type: Test Affects Versions: 0.98.1 Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/280/testReport/junit/org.apache.hadoop.hbase.security.token/TestZKSecretWatcher/testKeyUpdate/ :
{code}
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertNotNull(Assert.java:621)
  at org.junit.Assert.assertNotNull(Assert.java:631)
  at org.apache.hadoop.hbase.security.token.TestZKSecretWatcher.testKeyUpdate(TestZKSecretWatcher.java:221)
{code}
Here is the assertion that failed: {code} assertNotNull(newMaster); {code} Looks like new master did not come up within 5 tries. One potential fix is to increase the number of attempts. -- This message was sent by Atlassian JIRA (v6.2#6252)
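A hedged sketch of the suggested change, to sit inside the test method: poll more times, with a short pause, before asserting that a new active master was chosen. The retry count and helper name below are illustrative, not the actual test code.
{code}
// Instead of giving up after 5 tries, poll a few more times with a pause.
AuthenticationTokenSecretManager newMaster = null;
for (int i = 0; i < 10 && newMaster == null; i++) {
  newMaster = findActiveKeyMaster();  // hypothetical helper locating the current leader
  if (newMaster == null) {
    Thread.sleep(500);
  }
}
assertNotNull(newMaster);
{code}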
[jira] [Created] (HBASE-11079) Normalize test tools across branches
Andrew Purtell created HBASE-11079: -- Summary: Normalize test tools across branches Key: HBASE-11079 URL: https://issues.apache.org/jira/browse/HBASE-11079 Project: HBase Issue Type: Umbrella Reporter: Andrew Purtell Will be a challenge wherever the branches vary functionally, but it would be good to normalize the test tools (LoadTestTool and PerformanceEvaluation) as much as possible among the active branches so we can compare them. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10932) Improve RowCounter to allow mapper number set/control
[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-10932. Resolution: Won't Fix Resolving as won't fix. If you want to work on a more general solution, like adding this option to the TIF, please open a new jira. Thanks. > Improve RowCounter to allow mapper number set/control > - > > Key: HBASE-10932 > URL: https://issues.apache.org/jira/browse/HBASE-10932 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch > > > The typical use case of RowCounter is to do some kind of data integrity > checking, like after exporting some data from RDBMS to HBase, or from one > HBase cluster to another, making sure the row(record) number matches. Such > check commonly won't require much on response time. > Meanwhile, based on current impl, RowCounter will launch one mapper per > region, and each mapper will send one scan request. Assuming the table is > kind of big like having tens of regions, and the cpu core number of the whole > MR cluster is also enough, the parallel scan requests sent by mapper would be > a real burden for the HBase cluster. > So in this JIRA, we're proposing to make rowcounter support an additional > option "--maps" to specify mapper number, and make each mapper able to scan > more than one region of the target table. -- This message was sent by Atlassian JIRA (v6.2#6252)
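For reference, a rough sketch of the idea the (uncommitted) patch describes: group a table's contiguous regions into at most a fixed number of scan ranges so each mapper covers several regions. Names and structure here are illustrative only and do not reflect the attached patches.
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.Scan;

// Turn the table's regions into at most numMaps scan ranges. Assumes the
// region list is sorted and contiguous, as it is for a single table.
static List<Scan> groupRegionsIntoScans(List<HRegionInfo> regions, int numMaps) {
  int regionsPerMap = (int) Math.ceil((double) regions.size() / numMaps);
  List<Scan> scans = new ArrayList<Scan>();
  for (int i = 0; i < regions.size(); i += regionsPerMap) {
    HRegionInfo first = regions.get(i);
    HRegionInfo last = regions.get(Math.min(i + regionsPerMap, regions.size()) - 1);
    Scan scan = new Scan();
    scan.setStartRow(first.getStartKey());
    scan.setStopRow(last.getEndKey());
    scans.add(scan);
  }
  return scans;
}
{code}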
[jira] [Created] (HBASE-11077) [AccessController] Restore compatible early-out access denial
Andrew Purtell created HBASE-11077: -- Summary: [AccessController] Restore compatible early-out access denial Key: HBASE-11077 URL: https://issues.apache.org/jira/browse/HBASE-11077 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Assignee: Andrew Purtell Fix For: 0.99.0, 0.98.2 See parent for the whole story. For 0.98, to start, just put back the early out that was removed in 0.98.0 and allow it to be overridden with a table attribute. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: 0.98.2
Unlikely, there are open issues that could go in before or on Monday the 28th. On Fri, Apr 25, 2014 at 11:38 AM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Hey Andrew, any chance to get 0.98.2 today so I will have something to do > this week-end? ;) > > JM > > > 2014-04-19 11:28 GMT-04:00 lars hofhansl : > > > And 0.94.19 is due as well. Planning an RC on Monday, that way we do not > > have the RCs at the same time. > > > > -- Lars > > > > > > > > > > From: Andrew Purtell > > To: "dev@hbase.apache.org" > > Sent: Saturday, April 19, 2014 7:28 AM > > Subject: 0.98.2 > > > > > > I'd like to start the RC for 0.98.2 at the end of the month. I'm thinking > > next weekend with voting concluded (if nothing sinks the RC) by the > > following weekend, so the 3rd or 4th of May, just in time for HBaseCon. > > > > If there are any criticals or blockers for 0.98.2, can we get them in > this > > week? Thanks! > > > > -- > > Best regards, > > > >- Andy > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > > (via Tom White) > > > -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-11078) [AccessController] Consider new permission for "read visible"
Andrew Purtell created HBASE-11078: -- Summary: [AccessController] Consider new permission for "read visible" Key: HBASE-11078 URL: https://issues.apache.org/jira/browse/HBASE-11078 Project: HBase Issue Type: Sub-task Reporter: Andrew Purtell Fix For: 0.99.0 See parent for the whole story. Consider a new permission with the semantics "being able to read only granted cells", perhaps called READ_VISIBLE. Maybe consider a symmetric new permission for writes. The lack of default READ perm should prevent users from launching scanners. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: 0.98.2
Hey Andrew, any chance to get 0.98.2 today so I will have something to do this week-end? ;) JM 2014-04-19 11:28 GMT-04:00 lars hofhansl : > And 0.94.19 is due as well. Planning an RC on Monday, that way we do not > have the RCs at the same time. > > -- Lars > > > > > From: Andrew Purtell > To: "dev@hbase.apache.org" > Sent: Saturday, April 19, 2014 7:28 AM > Subject: 0.98.2 > > > I'd like to start the RC for 0.98.2 at the end of the month. I'm thinking > next weekend with voting concluded (if nothing sinks the RC) by the > following weekend, so the 3rd or 4th of May, just in time for HBaseCon. > > If there are any criticals or blockers for 0.98.2, can we get them in this > week? Thanks! > > -- > Best regards, > >- Andy > > Problems worthy of attack prove their worth by hitting back. - Piet Hein > (via Tom White) >
Re: Error in RS with 0.94.8
Did you set replication to 1? The following error message indicates that the default replication is set to 1: could only be replicated to 0 nodes, instead of 1 In that case, losing a datanode would mean blocks will be lost. Enis On Fri, Apr 25, 2014 at 1:32 AM, Álvaro Recuero wrote: > Data nodes are fine. Actually the Region server on that serverx is the > solely one dead afterwards. Datanode is up, and HDFS reporting healthy > status. Interesting that is possible. > > I have steadily come across the problem again testing a new HBase cluster, > so yes, I would bet the problem is in HDFS somehow. Probably something is > missing yes. > > 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error > Recovery for block null bad datanode[0] nodes == null > 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not > get block locations. Source file > > "/hbase/.logs/serverx,1398350408274/serverx%2C60020%2C1398350408274.1398350409004" > - Aborting... > 2014-04-24 17:59:30,003 ERROR > org.apache.hadoop.hbase.regionserver.wal.HLog: syncer encountered error, > will retry. txid=1 > org.apache.hadoop.ipc.RemoteException: java.io.IOException: File > > /hbase/.logs/serverx,60020,1398350408274/serverx%2C60020%2C1398350408274.1398350409004 > could only be replicated to 0 nodes, instead of 1 > at > > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) > at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) > at > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) > at java.security.AccessController.doPrivileged(Native Method) > at javax.securitWrite failed: Broken pipect.java:416) > > > On 5 April 2014 21:58, Álvaro Recuero wrote: > > > Yes Esteban I have checked the health of the datanodes from the master > > in the hadoop console. Nothing seems really wrong to cause this, even > > though one data-node is apparently lost along with the RS in the process > of > > inserting 50 Million updates... the other 11 are there, up and running so > > it should pick-up next and that is it (as long as it is replicating as it > > should through the HDFS pipelining process). I thought of HBase > > writes-key-hotspotting or some problem in the Hadoop namenode, so > checking > > this out now... > > > > I will keep investigating and let you know, in fact my first thought was > > same as yours too but ./hadoop fsck / is showing all "active" nodes are > > healthy nodes, and no file-system level inconsistencies are detected > (first > > thing I checked before sending the post). Of course running the HBase > hbck > > consistency check from the command line behaves differently, missing the > > mentioned RS in place and throws corresponding exception log that is > a > > weird one then... I might check the name node before I get back to you on > > this. I can't think of anything else as of now. Space is not unlimited, > yet > > sufficient in each of the data-nodes (12) but getting close to its limit > in > > the mentioned dead RS so yes writes are yet not very balanced but > > definitely not the issue as I understand. 
> > > > > > On 5 April 2014 19:16, Esteban Gutierrez wrote: > > > >> Álvaro, > >> > >> Have you checked for the health of HDFS? Maybe your cluster ran out of > >> space or you don't have data nodes running. > >> > >> Esteban > >> > >> > On Apr 5, 2014, at 10:11, haosdent wrote: > >> > > >> > From the log informations, it seems you lost blocks. > >> > 2014-4-6 上午12:38于 "Álvaro Recuero" 写道: > >> > > >> >> has anyone come across this before? there is still space in the RS > and > >> this > >> >> is not a problem of datanodes availability as I can confirm. cheers > >> >> > >> >> 2014-04-05 09:55:19,210 DEBUG > >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using > >> new > >> >> createWriter -- HADOOP-6840 > >> >> 2014-04-05 09:55:19,211 DEBUG > >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: > >> >> Path=hdfs:// > >> >> taurus-5.lyon.grid5000.fr: > >> >> > >> >> > >> > 9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.temp, > >> >> syncFs=true, hflush=false, compressi > >> >> on=false > >> >> 2014-04-05 09:55:19,211 DEBUG > >> >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating > writer > >> >> path=hdfs://taurus-5.lyon.grid5 > >> >> > >> >> > >> > 000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.tempregion=fc55e2d2d4bcec49d6fedf5 > >> >> a46935
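A quick way to confirm the point Enis raises above is to print the replication factor the client actually sees. This is a minimal sketch assuming only the standard Hadoop Configuration API; dfs.replication defaults to 3 in stock HDFS, and a value of 1 means a single lost datanode can make blocks unrecoverable.

  import org.apache.hadoop.conf.Configuration;

  public class CheckReplication {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Depending on the Hadoop version, hdfs-site.xml may need to be added explicitly.
      conf.addResource("hdfs-site.xml");
      System.out.println("dfs.replication = " + conf.getInt("dfs.replication", 3));
    }
  }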
[jira] [Resolved] (HBASE-10923) Control where to put meta region
[ https://issues.apache.org/jira/browse/HBASE-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-10923. - Resolution: Won't Fix Close it as Won't Fix. Let's keep meta together with master for now. > Control where to put meta region > > > Key: HBASE-10923 > URL: https://issues.apache.org/jira/browse/HBASE-10923 > Project: HBase > Issue Type: Improvement >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > > There is a concern on placing meta regions on the master, as in the comments > of HBASE-10569. I was thinking we should have a configuration for a load > balancer to decide where to put it. Adjusting this configuration we can > control whether to put the meta on master, or other region server. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11076) Update refguide on getting 0.94.x to run on hadoop 2.2.0+
Ted Yu created HBASE-11076: -- Summary: Update refguide on getting 0.94.x to run on hadoop 2.2.0+ Key: HBASE-11076 URL: https://issues.apache.org/jira/browse/HBASE-11076 Project: HBase Issue Type: Task Reporter: Ted Yu http://hbase.apache.org/book.html#d248e643 contains steps for rebuilding the 0.94 code base to run on hadoop 2.2.0+. However, the files under src/main/java/org/apache/hadoop/hbase/protobuf/generated were produced by protoc 2.4.0. These files need to be regenerated. See http://search-hadoop.com/m/DHED4j7Um02/HBase+0.94+on+hadoop+2.2.0&subj=Re+HBase+0+94+on+hadoop+2+2+0+2+4+0+ This issue is to update the refguide with this regeneration step. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11075) TestVisibilityLabelsWithDistributedLogReplay is failing in Precommit builds frequently
ramkrishna.s.vasudevan created HBASE-11075: -- Summary: TestVisibilityLabelsWithDistributedLogReplay is failing in Precommit builds frequently Key: HBASE-11075 URL: https://issues.apache.org/jira/browse/HBASE-11075 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan In the latest precommit builds I could see TestVisibilityLabelsWithDistributedLogReplay failing frequently. Need to identify the root cause and fix it. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Error in RS with 0.94.8
Data nodes are fine. Actually the Region server on that serverx is the solely one dead afterwards. Datanode is up, and HDFS reporting healthy status. Interesting that is possible. I have steadily come across the problem again testing a new HBase cluster, so yes, I would bet the problem is in HDFS somehow. Probably something is missing yes. 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/hbase/.logs/serverx,1398350408274/serverx%2C60020%2C1398350408274.1398350409004" - Aborting... 2014-04-24 17:59:30,003 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: syncer encountered error, will retry. txid=1 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /hbase/.logs/serverx,60020,1398350408274/serverx%2C60020%2C1398350408274.1398350409004 could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) at java.security.AccessController.doPrivileged(Native Method) at javax.securitWrite failed: Broken pipect.java:416) On 5 April 2014 21:58, Álvaro Recuero wrote: > Yes Esteban I have checked the health of the datanodes from the master > in the hadoop console. Nothing seems really wrong to cause this, even > though one data-node is apparently lost along with the RS in the process of > inserting 50 Million updates... the other 11 are there, up and running so > it should pick-up next and that is it (as long as it is replicating as it > should through the HDFS pipelining process). I thought of HBase > writes-key-hotspotting or some problem in the Hadoop namenode, so checking > this out now... > > I will keep investigating and let you know, in fact my first thought was > same as yours too but ./hadoop fsck / is showing all "active" nodes are > healthy nodes, and no file-system level inconsistencies are detected (first > thing I checked before sending the post). Of course running the HBase hbck > consistency check from the command line behaves differently, missing the > mentioned RS in place and throws corresponding exception log that is a > weird one then... I might check the name node before I get back to you on > this. I can't think of anything else as of now. Space is not unlimited, yet > sufficient in each of the data-nodes (12) but getting close to its limit in > the mentioned dead RS so yes writes are yet not very balanced but > definitely not the issue as I understand. > > > On 5 April 2014 19:16, Esteban Gutierrez wrote: > >> Álvaro, >> >> Have you checked for the health of HDFS? Maybe your cluster ran out of >> space or you don't have data nodes running. >> >> Esteban >> >> > On Apr 5, 2014, at 10:11, haosdent wrote: >> > >> > From the log informations, it seems you lost blocks. >> > 2014-4-6 上午12:38于 "Álvaro Recuero" 写道: >> > >> >> has anyone come across this before? 
there is still space in the RS and >> this >> >> is not a problem of datanodes availability as I can confirm. cheers >> >> >> >> 2014-04-05 09:55:19,210 DEBUG >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using >> new >> >> createWriter -- HADOOP-6840 >> >> 2014-04-05 09:55:19,211 DEBUG >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: >> >> Path=hdfs:// >> >> taurus-5.lyon.grid5000.fr: >> >> >> >> >> 9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.temp, >> >> syncFs=true, hflush=false, compressi >> >> on=false >> >> 2014-04-05 09:55:19,211 DEBUG >> >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer >> >> path=hdfs://taurus-5.lyon.grid5 >> >> >> >> >> 000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.tempregion=fc55e2d2d4bcec49d6fedf5 >> >> a469353b9 >> >> 2014-04-05 09:55:19,233 DEBUG >> >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or >> >> departed >> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient: >> DataStreamer >> >> Exception: org.apache.hadoop.ipc.RemoteException: java.i >> >> o.IOException: File >> >> >> >> >> /hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/2550921.temp >> >> could only be replica >> >> ted to 0 nodes, instead of 1 >> >>at >> >> >> >> >> org.apa