[jira] [Created] (HBASE-11083) ExportSnapshot should provide capability to limit bandwidth consumption

2014-04-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-11083:
--

 Summary: ExportSnapshot should provide capability to limit 
bandwidth consumption
 Key: HBASE-11083
 URL: https://issues.apache.org/jira/browse/HBASE-11083
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Reporter: Ted Yu


This capability was first brought up in this thread:
http://search-hadoop.com/m/DHED4Td8Xb1

The rewritten distcp already provides this capability.
See MAPREDUCE-2765

The distcp implementation utilizes ThrottledInputStream, which provides 
bandwidth throttling on a specified InputStream.

As a first step, we can
* add an option to ExportSnapshot which expresses bandwidth per map in MB
* utilize ThrottledInputStream in ExportSnapshot#ExportMapper#copyFile().
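The throttling technique can be sketched in plain Java. This is an illustrative rate-capped InputStream in the spirit of distcp's ThrottledInputStream, not the actual Hadoop class; the class and method names here are invented for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ThrottleSketch {
  // Wraps an InputStream and sleeps whenever the observed read rate
  // exceeds a bytes-per-second cap.
  static class ThrottledStream extends InputStream {
    private final InputStream in;
    private final long maxBytesPerSec;
    private final long startTime = System.nanoTime();
    private long bytesRead = 0;

    ThrottledStream(InputStream in, long maxBytesPerSec) {
      this.in = in;
      this.maxBytesPerSec = maxBytesPerSec;
    }

    @Override
    public int read() throws IOException {
      throttle();
      int b = in.read();
      if (b >= 0) bytesRead++;
      return b;
    }

    // Sleep in small increments until the rate drops back under the cap.
    private void throttle() throws IOException {
      while (rate() > maxBytesPerSec) {
        try {
          Thread.sleep(10);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("interrupted while throttling", e);
        }
      }
    }

    // Observed average rate in bytes per second since construction.
    long rate() {
      double elapsedSec = (System.nanoTime() - startTime) / 1e9;
      return elapsedSec <= 0 ? 0 : (long) (bytesRead / elapsedSec);
    }
  }

  public static void main(String[] args) throws IOException {
    ThrottledStream ts =
        new ThrottledStream(new ByteArrayInputStream(new byte[4096]), 1024 * 1024);
    int n = 0;
    while (ts.read() >= 0) n++;
    System.out.println(n); // 4096
  }
}
```

A per-map bandwidth option would then just pick the cap passed to the wrapper inside copyFile().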



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10957) HBASE-10070: HMaster can abort with NPE in #rebuildUserRegions

2014-04-25 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar resolved HBASE-10957.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

I've committed this to branch.

> HBASE-10070: HMaster can abort with NPE in #rebuildUserRegions 
> ---
>
> Key: HBASE-10957
> URL: https://issues.apache.org/jira/browse/HBASE-10957
> Project: HBase
>  Issue Type: Sub-task
>  Components: master
>Affects Versions: hbase-10070
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: hbase-10070
>
> Attachments: 10957.v1.patch
>
>
> Seen during tests. The fix is to test this condition as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: The builds.apache.org grind

2014-04-25 Thread Andrew Purtell
And yet the reason the builds.apache.org builds are failing, as opposed to
tests I run on VMs elsewhere and locally, is that builds.apache.org is
becoming more and more loaded over time. So give me a break about the
"stability" of the 0.98 build. You give people a false impression.


On Fri, Apr 25, 2014 at 3:47 PM, Ted Yu  wrote:

> Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed
> builds out of the last 17 builds.
> The success rate for https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/
> was even lower.
>
> I think an effort to make the builds, especially hbase-0.98, more stable
> should be considered.
>
> My two cents.
>
>
> On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell 
> wrote:
>
> > Do we keep filing the "TestFoo occasionally fails on builds.apache.org"
> > type of issues as builds.apache.org gets slower and slower? We can see the
> > build results independent of JIRA so for documentary purposes the
> > rationale seems light.
> >
> > I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and
> > have not observed failures or zombies for a while now. Those EC2 VMs are
> > clearly reasonable test environments compared to builds.apache.org, sadly.
> > I'm tempted to close any test issue reporting something on
> > builds.apache.org that I don't see as Cannot Reproduce but wonder how
> > common that feeling is.
> >
> > Of course small patches to increase a timeout here or retry more often
> > there could be useful and acceptable. At the same time, do we increase
> > the tolerances for builds.apache.org and trade away the effectiveness of
> > the test to catch real timing issues?
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: The builds.apache.org grind

2014-04-25 Thread Mikhail Antonov
My 2 cents...

Should the test runners have profiles like "ASF build" vs. "EC2 large machine"
or something, from which the appropriate timeouts are derived, with longer
timeouts for ASF than for custom envs? Or would that make the whole test
infra less trustworthy?
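In Maven terms the idea could look roughly like the following. This is a hypothetical sketch, not actual HBase build configuration; the profile ids and the property name are invented, and the timeout values are arbitrary.

```xml
<!-- Hypothetical sketch: pick a per-environment test timeout via Maven
     profiles, e.g. "mvn test -Pasf-jenkins". Profile and property names
     are invented for illustration. -->
<profiles>
  <profile>
    <id>asf-jenkins</id>
    <properties>
      <!-- heavily loaded shared builders get a longer timeout -->
      <test.timeout.seconds>3600</test.timeout.seconds>
    </properties>
  </profile>
  <profile>
    <id>dedicated-vm</id>
    <properties>
      <test.timeout.seconds>900</test.timeout.seconds>
    </properties>
  </profile>
</profiles>
```

The surefire fork timeout would then be wired to `${test.timeout.seconds}`, so only the profile choice differs between environments.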

-Mikhail


2014-04-25 15:47 GMT-07:00 Ted Yu :

> Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed
> builds out of the last 17 builds.
> The success rate for https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/
> was even lower.
>
> I think an effort to make the builds, especially hbase-0.98, more stable
> should be considered.
>
> My two cents.
>
>
> On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell 
> wrote:
>
> > Do we keep filing the "TestFoo occasionally fails on builds.apache.org"
> > type of issues as builds.apache.org gets slower and slower? We can see the
> > build results independent of JIRA so for documentary purposes the
> > rationale seems light.
> >
> > I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and
> > have not observed failures or zombies for a while now. Those EC2 VMs are
> > clearly reasonable test environments compared to builds.apache.org, sadly.
> > I'm tempted to close any test issue reporting something on
> > builds.apache.org that I don't see as Cannot Reproduce but wonder how
> > common that feeling is.
> >
> > Of course small patches to increase a timeout here or retry more often
> > there could be useful and acceptable. At the same time, do we increase
> > the tolerances for builds.apache.org and trade away the effectiveness of
> > the test to catch real timing issues?
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Thanks,
Michael Antonov


[jira] [Resolved] (HBASE-10960) Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations

2014-04-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-10960.


Resolution: Fixed

Committed missing file, verified compilation. Thanks Srikanth. 

> Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations
> ---
>
> Key: HBASE-10960
> URL: https://issues.apache.org/jira/browse/HBASE-10960
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 0.99.0
>
> Attachments: HBASE-10960.patch, hbase-10960.v3.patch
>
>
> Both append and checkAndPut are available in the Thrift 2 
> interface, but not in Thrift 1. So, adding support for these 
> operations in Thrift 1 too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: The builds.apache.org grind

2014-04-25 Thread Ted Yu
Looking at https://builds.apache.org/job/hbase-0.98/ , there were 9 failed
builds out of the last 17 builds.
The success rate for https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/
was even lower.

I think an effort to make the builds, especially hbase-0.98, more stable
should be considered.

My two cents.


On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell  wrote:

> Do we keep filing the "TestFoo occasionally fails on builds.apache.org"
> type of issues as builds.apache.org gets slower and slower? We can see the
> build results independent of JIRA so for documentary purposes the rationale
> seems light.
>
> I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and have
> not observed failures or zombies for a while now. Those EC2 VMs are clearly
> reasonable test environments compared to builds.apache.org, sadly. I'm
> tempted to close any test issue reporting something on
> builds.apache.org that I don't see as Cannot Reproduce but wonder how
> common that feeling is.
>
> Of course small patches to increase a timeout here or retry more often
> there could be useful and acceptable. At the same time, do we increase the
> tolerances for builds.apache.org and trade away the effectiveness of the
> test to catch real timing issues?
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: The builds.apache.org grind

2014-04-25 Thread Nick Dimiduk
On Fri, Apr 25, 2014 at 3:13 PM, Andrew Purtell  wrote:

> do we increase the tolerances for builds.apache.org and trade away the
> effectiveness of the test to catch real timing issues?
>

I wonder about this often.


[jira] [Reopened] (HBASE-10960) Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations

2014-04-25 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reopened HBASE-10960:



I think this commit broke the trunk build

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) 
on project hbase-thrift: Compilation failure: Compilation failure:
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java:[81,47]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: package org.apache.hadoop.hbase.thrift.generated
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[629,30]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: interface Iface
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java:[1493,30]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class HBaseHandler
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftUtilities.java:[40,47]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: package org.apache.hadoop.hbase.thrift.generated
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/ThriftUtilities.java:[215,40]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class ThriftUtilities
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[741,23]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: interface AsyncIface
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3666,23]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class AsyncClient
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3674,14]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_call
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[3675,25]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_call
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[1951,30]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class Client
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[1957,28]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class Client
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53476,11]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53553,6]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53580,11]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53587,33]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53544,98]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53564,26]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53613,21]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_args
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53765,36]
 error: cannot find symbol
[ERROR] symbol:   class TAppend
[ERROR] location: class append_argsStandardScheme
[ERROR] 
/usr/src/Hadoop/hbase/hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java:[53824,30]
 error: cannot find symbol
[ERROR] -> [Help 1]

{noformat}

> Enhance HBase Thrift 1 to include "append" and "checkAndPut" operations
> ---
>
> Key: HBASE-10960
> URL: https://i

The builds.apache.org grind

2014-04-25 Thread Andrew Purtell
Do we keep filing the "TestFoo occasionally fails on builds.apache.org"
type of issues as builds.apache.org gets slower and slower? We can see the
build results independent of JIRA so for documentary purposes the rationale
seems light.

I run the 0.98 unit test suite 20 times daily on JDK 6 and 7 boxes and have
not observed failures or zombies for a while now. Those EC2 VMs are clearly
reasonable test environments compared to builds.apache.org, sadly. I'm
tempted to close any test issue reporting something on
builds.apache.org that I don't see as Cannot Reproduce but wonder how
common that feeling is.

Of course small patches to increase a timeout here or retry more often
there could be useful and acceptable. At the same time, do we increase the
tolerances for builds.apache.org and trade away the effectiveness of the
test to catch real timing issues?


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[jira] [Created] (HBASE-11082) Potential unclosed TraceScope in FSHLog#replaceWriter()

2014-04-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-11082:
--

 Summary: Potential unclosed TraceScope in FSHLog#replaceWriter()
 Key: HBASE-11082
 URL: https://issues.apache.org/jira/browse/HBASE-11082
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


In the finally block starting at line 924:
{code}
} finally {
  // Let the writer thread go regardless, whether error or not.
  if (zigzagLatch != null) {
zigzagLatch.releaseSafePoint();
// It will be null if we failed our wait on safe point above.
if (syncFuture != null) blockOnSync(syncFuture);
  }
  scope.close();
{code}
If blockOnSync() throws IOException, the TraceScope would be left unclosed.
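The leak and the obvious fix can be sketched in isolation. TraceScope is stubbed here as a simple AutoCloseable and blockOnSync is a hypothetical stand-in; only the nested try/finally shape (which guarantees close() runs even when the sync throws) reflects the actual fix being suggested.

```java
import java.io.IOException;

public class ScopeLeakSketch {
  // Stand-in for HTrace's TraceScope: just records whether close() ran.
  static class TraceScope implements AutoCloseable {
    boolean closed = false;
    @Override public void close() { closed = true; }
  }

  // Stand-in for blockOnSync(); throws when asked to simulate a sync failure.
  static void blockOnSync(boolean fail) throws IOException {
    if (fail) throw new IOException("sync failed");
  }

  // Fixed shape: an inner try/finally guarantees scope.close() runs even if
  // blockOnSync() throws, unlike placing close() after the call in the same
  // finally block.
  static void replaceWriterTail(TraceScope scope, boolean syncThrows) throws IOException {
    try {
      blockOnSync(syncThrows);
    } finally {
      scope.close();
    }
  }

  public static void main(String[] args) {
    TraceScope scope = new TraceScope();
    try {
      replaceWriterTail(scope, true);
    } catch (IOException expected) {
      // the sync failure propagates, but the scope is still closed
    }
    System.out.println(scope.closed); // true
  }
}
```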



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11081) Trunk Master won't start; looking for Constructor that takes conf only

2014-04-25 Thread stack (JIRA)
stack created HBASE-11081:
-

 Summary: Trunk Master won't start; looking for Constructor that 
takes conf only
 Key: HBASE-11081
 URL: https://issues.apache.org/jira/browse/HBASE-11081
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.99.0


Committing the Consensus Infra, we broke starting master.  Small fix so 
constructMaster passes in a ConsensusProvider.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11080) TestZKSecretWatcher#testKeyUpdate occasionally fails

2014-04-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-11080:
--

 Summary: TestZKSecretWatcher#testKeyUpdate occasionally fails
 Key: HBASE-11080
 URL: https://issues.apache.org/jira/browse/HBASE-11080
 Project: HBase
  Issue Type: Test
Affects Versions: 0.98.1
Reporter: Ted Yu
Priority: Minor


From 
https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/280/testReport/junit/org.apache.hadoop.hbase.security.token/TestZKSecretWatcher/testKeyUpdate/ :
{code}
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at org.junit.Assert.assertNotNull(Assert.java:631)
at 
org.apache.hadoop.hbase.security.token.TestZKSecretWatcher.testKeyUpdate(TestZKSecretWatcher.java:221)
{code}
Here is the assertion that failed:
{code}
assertNotNull(newMaster);
{code}
Looks like the new master did not come up within 5 tries.

One potential fix is to increase the number of attempts.
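The "increase the number of attempts" fix amounts to widening a bounded poll loop. A minimal sketch of that pattern, with invented names (the actual test's loop and sleep interval may differ):

```java
import java.util.function.Supplier;

public class RetrySketch {
  // Poll until the supplier returns non-null or attempts run out.
  // Raising maxAttempts is the "increase the number of attempts" fix.
  static <T> T waitFor(Supplier<T> supplier, int maxAttempts, long sleepMs)
      throws InterruptedException {
    for (int i = 0; i < maxAttempts; i++) {
      T value = supplier.get();
      if (value != null) return value;
      Thread.sleep(sleepMs);
    }
    return null; // caller's assertNotNull() fails here
  }

  public static void main(String[] args) throws InterruptedException {
    final int[] calls = {0};
    // Simulated "new master" that only appears on the third poll.
    String result = waitFor(() -> ++calls[0] >= 3 ? "master" : null, 10, 1);
    System.out.println(result); // master
  }
}
```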



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11079) Normalize test tools across branches

2014-04-25 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-11079:
--

 Summary: Normalize test tools across branches
 Key: HBASE-11079
 URL: https://issues.apache.org/jira/browse/HBASE-11079
 Project: HBase
  Issue Type: Umbrella
Reporter: Andrew Purtell


It will be a challenge wherever the branches vary functionally, but it would be 
good to normalize the test tools (LoadTestTool and PerformanceEvaluation) as 
much as possible among the active branches so we can compare them. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10932) Improve RowCounter to allow mapper number set/control

2014-04-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-10932.


Resolution: Won't Fix

Resolving as won't fix. If you want to work on a more general solution, like 
adding this option to the TIF, please open a new jira. Thanks.

> Improve RowCounter to allow mapper number set/control
> -
>
> Key: HBASE-10932
> URL: https://issues.apache.org/jira/browse/HBASE-10932
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch
>
>
> The typical use case of RowCounter is to do some kind of data integrity 
> checking, like after exporting some data from an RDBMS to HBase, or from one 
> HBase cluster to another, making sure the row (record) counts match. Such 
> checks commonly don't have strict response-time requirements.
> Meanwhile, based on the current impl, RowCounter will launch one mapper per 
> region, and each mapper will send one scan request. Assuming the table is 
> kind of big, like having tens of regions, and the whole MR cluster also has 
> enough CPU cores, the parallel scan requests sent by the mappers would be a 
> real burden for the HBase cluster.
> So in this JIRA, we're proposing to make RowCounter support an additional 
> option "--maps" to specify the mapper number, and make each mapper able to 
> scan more than one region of the target table.
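The core of the proposed "--maps" behavior is grouping a table's regions into a fixed number of splits instead of one split per region. A minimal sketch of that grouping, with invented names and region identifiers (the real patch works on MapReduce InputSplits, not strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitGroupSketch {
  // Distribute region identifiers round-robin over "maps" groups, so each
  // mapper scans its group of regions sequentially instead of all regions
  // being scanned in parallel.
  static List<List<String>> groupRegions(List<String> regions, int maps) {
    List<List<String>> splits = new ArrayList<>();
    for (int i = 0; i < maps; i++) splits.add(new ArrayList<>());
    for (int i = 0; i < regions.size(); i++) {
      splits.get(i % maps).add(regions.get(i)); // round-robin assignment
    }
    return splits;
  }

  public static void main(String[] args) {
    List<String> regions = Arrays.asList("r0", "r1", "r2", "r3", "r4");
    List<List<String>> splits = groupRegions(regions, 2); // "--maps 2"
    System.out.println(splits); // [[r0, r2, r4], [r1, r3]]
  }
}
```

With tens of regions and a small `--maps`, only that many scans run concurrently, capping the load on the cluster.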



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11077) [AccessController] Restore compatible early-out access denial

2014-04-25 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-11077:
--

 Summary: [AccessController] Restore compatible early-out access 
denial
 Key: HBASE-11077
 URL: https://issues.apache.org/jira/browse/HBASE-11077
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.99.0, 0.98.2


See parent for the whole story.

For 0.98, to start, just put back the early out that was removed in 0.98.0 and 
allow it to be overridden with a table attribute. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: 0.98.2

2014-04-25 Thread Andrew Purtell
Unlikely, there are open issues that could go in before or on Monday the
28th.


On Fri, Apr 25, 2014 at 11:38 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hey Andrew, any chance to get 0.98.2 today so I will have something to do
> this week-end? ;)
>
> JM
>
>
> 2014-04-19 11:28 GMT-04:00 lars hofhansl :
>
> > And 0.94.19 is due as well. Planning an RC on Monday, that way we do not
> > have the RCs at the same time.
> >
> > -- Lars
> >
> >
> >
> > 
> >  From: Andrew Purtell 
> > To: "dev@hbase.apache.org" 
> > Sent: Saturday, April 19, 2014 7:28 AM
> > Subject: 0.98.2
> >
> >
> > I'd like to start the RC for 0.98.2 at the end of the month. I'm thinking
> > next weekend with voting concluded (if nothing sinks the RC) by the
> > following weekend, so the 3rd or 4th of May, just in time for HBaseCon.
> >
> > If there are any criticals or blockers for 0.98.2, can we get them in
> this
> > week? Thanks!
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[jira] [Created] (HBASE-11078) [AccessController] Consider new permission for "read visible"

2014-04-25 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-11078:
--

 Summary: [AccessController] Consider new permission for "read 
visible"
 Key: HBASE-11078
 URL: https://issues.apache.org/jira/browse/HBASE-11078
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
 Fix For: 0.99.0


See parent for the whole story.

Consider a new permission with the semantics "being able to read only granted 
cells", perhaps called READ_VISIBLE. 

Maybe consider a symmetric new permission for writes. 

The lack of default READ perm should prevent users from launching scanners.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: 0.98.2

2014-04-25 Thread Jean-Marc Spaggiari
Hey Andrew, any chance to get 0.98.2 today so I will have something to do
this week-end? ;)

JM


2014-04-19 11:28 GMT-04:00 lars hofhansl :

> And 0.94.19 is due as well. Planning an RC on Monday, that way we do not
> have the RCs at the same time.
>
> -- Lars
>
>
>
> 
>  From: Andrew Purtell 
> To: "dev@hbase.apache.org" 
> Sent: Saturday, April 19, 2014 7:28 AM
> Subject: 0.98.2
>
>
> I'd like to start the RC for 0.98.2 at the end of the month. I'm thinking
> next weekend with voting concluded (if nothing sinks the RC) by the
> following weekend, so the 3rd or 4th of May, just in time for HBaseCon.
>
> If there are any criticals or blockers for 0.98.2, can we get them in this
> week? Thanks!
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: Error in RS with 0.94.8

2014-04-25 Thread Enis Söztutar
Did you set replication to 1?

The following error message indicates that the default replication is set
to 1:

could only be replicated to 0 nodes, instead of 1


In that case, losing a datanode would mean blocks will be lost.
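For reference, the setting in question is HDFS's dfs.replication; a minimal sketch of restoring the usual default in hdfs-site.xml (3 is the stock Hadoop default; the value to use depends on cluster size):

```xml
<!-- hdfs-site.xml: with replication 1, losing the single datanode holding a
     block loses the block; 3 tolerates a datanode failure. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```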

Enis


On Fri, Apr 25, 2014 at 1:32 AM, Álvaro Recuero  wrote:

> Data nodes are fine. Actually, the region server on that serverx is the
> only one dead afterwards. The datanode is up, and HDFS is reporting healthy
> status. Interesting that that is possible.
>
> I have repeatedly come across the problem again testing a new HBase cluster,
> so yes, I would bet the problem is in HDFS somehow. Probably something is
> missing, yes.
>
> 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get block locations. Source file
>
> "/hbase/.logs/serverx,1398350408274/serverx%2C60020%2C1398350408274.1398350409004"
> - Aborting...
> 2014-04-24 17:59:30,003 ERROR
> org.apache.hadoop.hbase.regionserver.wal.HLog: syncer encountered error,
> will retry. txid=1
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
>
> /hbase/.logs/serverx,60020,1398350408274/serverx%2C60020%2C1398350408274.1398350409004
> could only be replicated to 0 nodes, instead of 1
> at
>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
> at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
>
>
> On 5 April 2014 21:58, Álvaro Recuero  wrote:
>
> > Yes Esteban, I have checked the health of the datanodes from the master
> > in the hadoop console. Nothing seems really wrong to cause this, even
> > though one data-node is apparently lost along with the RS in the process of
> > inserting 50 million updates... the other 11 are there, up and running, so
> > it should pick up next and that is it (as long as it is replicating as it
> > should through the HDFS pipelining process). I thought of HBase
> > write-key-hotspotting or some problem in the Hadoop namenode, so checking
> > this out now...
> >
> > I will keep investigating and let you know; in fact my first thought was
> > the same as yours too, but ./hadoop fsck / is showing all "active" nodes
> > are healthy nodes, and no file-system level inconsistencies are detected
> > (the first thing I checked before sending the post). Of course, running the
> > HBase hbck consistency check from the command line behaves differently,
> > missing the mentioned RS and throwing a corresponding exception log, which
> > is a weird one then... I might check the name node before I get back to you
> > on this. I can't think of anything else as of now. Space is not unlimited,
> > yet sufficient in each of the data-nodes (12), but getting close to its
> > limit on the mentioned dead RS, so yes, writes are not very balanced yet,
> > but definitely not the issue as I understand.
> >
> >
> > On 5 April 2014 19:16, Esteban Gutierrez  wrote:
> >
> >> Álvaro,
> >>
> >> Have you checked for the health of HDFS? Maybe your cluster ran out of
> >> space or you don't have data nodes running.
> >>
> >> Esteban
> >>
> >> > On Apr 5, 2014, at 10:11, haosdent  wrote:
> >> >
> >> > From the log information, it seems you lost blocks.
> >> > On 2014-4-6 at 12:38 AM, "Álvaro Recuero" wrote:
> >> >
> >> >> Has anyone come across this before? There is still space in the RS and
> >> >> this is not a problem of datanode availability, as I can confirm. Cheers
> >> >>
> >> >> 2014-04-05 09:55:19,210 DEBUG
> >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using
> >> new
> >> >> createWriter -- HADOOP-6840
> >> >> 2014-04-05 09:55:19,211 DEBUG
> >> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
> >> >> Path=hdfs://
> >> >> taurus-5.lyon.grid5000.fr:
> >> >>
> >> >>
> >>
> 9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.temp,
> >> >> syncFs=true, hflush=false, compressi
> >> >> on=false
> >> >> 2014-04-05 09:55:19,211 DEBUG
> >> >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating
> writer
> >> >> path=hdfs://taurus-5.lyon.grid5
> >> >>
> >> >>
> >>
> 000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.tempregion=fc55e2d2d4bcec49d6fedf5
> >> >> a46935

[jira] [Resolved] (HBASE-10923) Control where to put meta region

2014-04-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-10923.
-

Resolution: Won't Fix

Close it as Won't Fix. Let's keep meta together with master for now. 

> Control where to put meta region
> 
>
> Key: HBASE-10923
> URL: https://issues.apache.org/jira/browse/HBASE-10923
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> There is a concern about placing meta regions on the master, as in the comments 
> of HBASE-10569. I was thinking we should have a configuration for the load 
> balancer to decide where to put it.  By adjusting this configuration we can 
> control whether to put meta on the master or on another region server.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11076) Update refguide on getting 0.94.x to run on hadoop 2.2.0+

2014-04-25 Thread Ted Yu (JIRA)
Ted Yu created HBASE-11076:
--

 Summary: Update refguide on getting 0.94.x to run on hadoop 2.2.0+
 Key: HBASE-11076
 URL: https://issues.apache.org/jira/browse/HBASE-11076
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu


http://hbase.apache.org/book.html#d248e643 contains steps for rebuilding the 
0.94 code base to run on hadoop 2.2.0+.

However, the files under 
src/main/java/org/apache/hadoop/hbase/protobuf/generated were produced by 
protoc 2.4.0. These files need to be regenerated.

See 
http://search-hadoop.com/m/DHED4j7Um02/HBase+0.94+on+hadoop+2.2.0&subj=Re+HBase+0+94+on+hadoop+2+2+0+2+4+0+

This issue is to update refguide with this regeneration step.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11075) TestVisibilityLabelsWithDistributedLogReplay is failing in Precommit builds frequently

2014-04-25 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-11075:
--

 Summary: TestVisibilityLabelsWithDistributedLogReplay is failing 
in Precommit builds frequently
 Key: HBASE-11075
 URL: https://issues.apache.org/jira/browse/HBASE-11075
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


In the latest precommit builds I could see 
TestVisibilityLabelsWithDistributedLogReplay failing frequently.  Need to 
identify the root cause and fix it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Error in RS with 0.94.8

2014-04-25 Thread Álvaro Recuero
Data nodes are fine. Actually, the region server on that serverx is the
only one dead afterwards. The datanode is up, and HDFS is reporting healthy
status. Interesting that that is possible.

I have repeatedly come across the problem again testing a new HBase cluster,
so yes, I would bet the problem is in HDFS somehow. Probably something is
missing, yes.

2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block null bad datanode[0] nodes == null
2014-04-24 17:59:30,003 WARN org.apache.hadoop.hdfs.DFSClient: Could not
get block locations. Source file
"/hbase/.logs/serverx,1398350408274/serverx%2C60020%2C1398350408274.1398350409004"
- Aborting...
2014-04-24 17:59:30,003 ERROR
org.apache.hadoop.hbase.regionserver.wal.HLog: syncer encountered error,
will retry. txid=1
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/hbase/.logs/serverx,60020,1398350408274/serverx%2C60020%2C1398350408274.1398350409004
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)


On 5 April 2014 21:58, Álvaro Recuero  wrote:

> Yes Esteban, I have checked the health of the datanodes from the master
> in the hadoop console. Nothing seems really wrong to cause this, even
> though one data-node is apparently lost along with the RS in the process of
> inserting 50 million updates... the other 11 are there, up and running, so
> it should pick up next and that is it (as long as it is replicating as it
> should through the HDFS pipelining process). I thought of HBase
> write-key-hotspotting or some problem in the Hadoop namenode, so checking
> this out now...
>
> I will keep investigating and let you know; in fact my first thought was
> the same as yours too, but ./hadoop fsck / is showing all "active" nodes are
> healthy nodes, and no file-system level inconsistencies are detected (the
> first thing I checked before sending the post). Of course, running the HBase
> hbck consistency check from the command line behaves differently, missing
> the mentioned RS and throwing a corresponding exception log, which is a
> weird one then... I might check the name node before I get back to you on
> this. I can't think of anything else as of now. Space is not unlimited, yet
> sufficient in each of the data-nodes (12), but getting close to its limit on
> the mentioned dead RS, so yes, writes are not very balanced yet, but
> definitely not the issue as I understand.
>
>
> On 5 April 2014 19:16, Esteban Gutierrez  wrote:
>
>> Álvaro,
>>
>> Have you checked for the health of HDFS? Maybe your cluster ran out of
>> space or you don't have data nodes running.
>>
>> Esteban
>>
>> > On Apr 5, 2014, at 10:11, haosdent  wrote:
>> >
>> > From the log information, it seems you lost blocks.
>> > On 2014-4-6 at 12:38 AM, "Álvaro Recuero" wrote:
>> >
>> >> Has anyone come across this before? There is still space in the RS and
>> >> this is not a problem of datanode availability, as I can confirm. Cheers
>> >>
>> >> 2014-04-05 09:55:19,210 DEBUG
>> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: using
>> new
>> >> createWriter -- HADOOP-6840
>> >> 2014-04-05 09:55:19,211 DEBUG
>> >> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
>> >> Path=hdfs://
>> >> taurus-5.lyon.grid5000.fr:
>> >>
>> >>
>> 9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.temp,
>> >> syncFs=true, hflush=false, compressi
>> >> on=false
>> >> 2014-04-05 09:55:19,211 DEBUG
>> >> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer
>> >> path=hdfs://taurus-5.lyon.grid5
>> >>
>> >>
>> 000.fr:9000/hbase/usertable/fc55e2d2d4bcec49d6fedf5a469353b9/recovered.edits/2550928.tempregion=fc55e2d2d4bcec49d6fedf5
>> >> a469353b9
>> >> 2014-04-05 09:55:19,233 DEBUG
>> >> org.apache.hadoop.hbase.regionserver.SplitLogWorker: tasks arrived or
>> >> departed
>> >> 2014-04-05 09:55:19,233 WARN org.apache.hadoop.hdfs.DFSClient:
>> DataStreamer
>> >> Exception: org.apache.hadoop.ipc.RemoteException: java.i
>> >> o.IOException: File
>> >>
>> >>
>> /hbase/usertable/237859a0b1e47c86c25a6123506ccb2a/recovered.edits/2550921.temp
>> >> could only be replica
>> >> ted to 0 nodes, instead of 1
>> >>at
>> >>
>> >>
>> org.apa