Re: Thrift versions and generated code

2015-11-12 Thread Ted Yu
See https://issues.apache.org/jira/browse/HBASE-14172

On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:

> Hi,
>
> In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
> that the current libthrift dependency on master was at 0.9.2, but the
> generated code still has the 0.9.0 comments.
>
> Is there a reason for that? Should the libthrift version defined in the
> poms be the de-facto version used by that version of HBase?
>
> Thanks.
>
> - Josh
>


Re: Thrift versions and generated code

2015-11-12 Thread Josh Elser

Ahh, thanks, gentlemen.

Andrew Purtell wrote:

> Yeah, let's finish that.
>
> On Thu, Nov 12, 2015 at 11:49 AM, Ted Yu wrote:
>
> > See https://issues.apache.org/jira/browse/HBASE-14172
> >
> > On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:
> >
> > > Hi,
> > >
> > > In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
> > > that the current libthrift dependency on master was at 0.9.2, but the
> > > generated code still has the 0.9.0 comments.
> > >
> > > Is there a reason for that? Should the libthrift version defined in the
> > > poms be the de-facto version used by that version of HBase?
> > >
> > > Thanks.
> > >
> > > - Josh


[jira] [Reopened] (HBASE-14498) Master stuck in infinite loop when all Zookeeper servers are unreachable

2015-11-12 Thread Ted Yu (JIRA)

 [ https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reopened HBASE-14498:


> Master stuck in infinite loop when all Zookeeper servers are unreachable
> 
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Pankaj Kumar
>Priority: Blocker
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.4
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch, 
> HBASE-14498-V4.patch, HBASE-14498.patch
>
>
> We met a weird scenario in our production environment.
> In an HA cluster,
> > The active master (HM1) is not able to connect to any ZooKeeper server (due to 
> > a network breakdown between the master machine and the ZooKeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZooKeeper server, the session timeout 
> > didn't happen at the ZooKeeper server side and HM1 didn't abort.
> > On ZooKeeper session timeout, the standby master (HM2) registered itself as 
> > the active master. 
> > HM2 keeps waiting for region servers to report to it as part of active 
> > master initialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At the other end, region servers are reporting to HM1 at a 3 sec interval. Here 
> > region servers retrieve the master location from ZooKeeper only when they 
> > can't connect to the master (ServiceException).
> Under the current design, region servers will not report to HM2 unless HM1 
> aborts, so HM2 will exit (InitializationMonitor) and again wait for region 
> servers in a loop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14800) Expose checkAndMutate via Thrift2

2015-11-12 Thread Josh Elser (JIRA)
Josh Elser created HBASE-14800:
--

 Summary: Expose checkAndMutate via Thrift2
 Key: HBASE-14800
 URL: https://issues.apache.org/jira/browse/HBASE-14800
 Project: HBase
  Issue Type: Improvement
  Components: Thrift
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 2.0.0


Had a user ask why checkAndMutate wasn't exposed via Thrift2.

I see no good reason (since checkAndPut and checkAndDelete are already there), 
so let's add it.





[jira] [Created] (HBASE-14803) Add some debug logs to StoreFileScanner

2015-11-12 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-14803:
---

 Summary: Add some debug logs to StoreFileScanner
 Key: HBASE-14803
 URL: https://issues.apache.org/jira/browse/HBASE-14803
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Minor
 Fix For: 1.2.0


To validate some behaviors I had to add some logs to StoreFileScanner.

I think they can be useful for other people who are debugging, so I am sharing 
the modifications here.





[jira] [Created] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2015-11-12 Thread Zhan Zhang (JIRA)
Zhan Zhang created HBASE-14801:
--

 Summary: Enhance the Spark-HBase connector catalog with json format
 Key: HBASE-14801
 URL: https://issues.apache.org/jira/browse/HBASE-14801
 Project: HBase
  Issue Type: Improvement
Reporter: Zhan Zhang
Assignee: Zhan Zhang








Please look at HBASE-14803

2015-11-12 Thread Jean-Marc Spaggiari
Very simple one. Just adding some debug logs. Helped me to debug something,
so might help someone else.

Thanks,

JMS


Re: Thrift versions and generated code

2015-11-12 Thread Andrew Purtell
Yeah, let's finish that.


On Thu, Nov 12, 2015 at 11:49 AM, Ted Yu wrote:

> See https://issues.apache.org/jira/browse/HBASE-14172
>
> On Thu, Nov 12, 2015 at 11:46 AM, Josh Elser wrote:
>
> > Hi,
> >
> > In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw
> > that the current libthrift dependency on master was at 0.9.2, but the
> > generated code still has the 0.9.0 comments.
> >
> > Is there a reason for that? Should the libthrift version defined in the
> > poms be the de-facto version used by that version of HBase?
> >
> > Thanks.
> >
> > - Josh
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Thrift versions and generated code

2015-11-12 Thread Josh Elser

Hi,

In looking at https://issues.apache.org/jira/browse/HBASE-14800, I saw 
that the current libthrift dependency on master was at 0.9.2, but the 
generated code still has the 0.9.0 comments.


Is there a reason for that? Should the libthrift version defined in the 
poms be the de-facto version used by that version of HBase?


Thanks.

- Josh
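
The mismatch described above can be checked mechanically. The following is only a rough sketch: it assumes the generated sources start with a header comment of the form "Autogenerated by Thrift Compiler (x.y.z)", and the function names and the sample text are made up for illustration. The version to compare against would come from the libthrift entry in the poms.

```python
import re

# Assumed form of the version comment at the top of Thrift-generated sources.
HEADER_RE = re.compile(r"Autogenerated by Thrift Compiler \(([^)]+)\)")


def generated_thrift_version(source_text):
    """Return the compiler version named in the header comment, or None."""
    match = HEADER_RE.search(source_text)
    return match.group(1) if match else None


def version_mismatch(source_text, pom_version):
    """True when a header is present and names a different version than the pom."""
    found = generated_thrift_version(source_text)
    return found is not None and found != pom_version


# Hypothetical sample standing in for the top of a generated file.
sample = "/**\n * Autogenerated by Thrift Compiler (0.9.0)\n *\n * DO NOT EDIT\n */\n"

print(generated_thrift_version(sample))
print(version_mismatch(sample, "0.9.2"))
```

Running this over the generated source directories would list the files that were produced by a compiler older than the declared dependency.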


[jira] [Created] (HBASE-14802) Replaying server crash recovery procedure after a failover causes incorrect handling of deadservers

2015-11-12 Thread Ashu Pachauri (JIRA)
Ashu Pachauri created HBASE-14802:
-

 Summary: Replaying server crash recovery procedure after a 
failover causes incorrect handling of deadservers
 Key: HBASE-14802
 URL: https://issues.apache.org/jira/browse/HBASE-14802
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0, 1.2.0, 1.2.1
Reporter: Ashu Pachauri
Assignee: Ashu Pachauri


The way dead servers are processed is that a ServerCrashProcedure is launched 
for a server after it is added to the dead servers list. 
Every time a server is added to the dead list, a counter "numProcessing" is 
incremented, and it is decremented when a crash recovery procedure finishes. 
Since adding a dead server and recovering it are two separate events, the 
counter can become inconsistent.

If a master failover occurs in the middle of the crash recovery, the 
numProcessing counter resets but the ServerCrashProcedure is replayed by the 
new master. This causes the counter to go negative and makes the master think 
that dead servers are still in the process of recovery. 
As a result, the balancer ceases to run after such a failover.
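
The sequence above can be illustrated with a toy model. This is not the HBase code: the class and method names are hypothetical, and the "in progress" check is assumed to be a simple comparison of the counter against zero.

```python
class DeadServerTracker:
    """Toy stand-in for the master's dead-server bookkeeping (hypothetical)."""

    def __init__(self):
        # A freshly started master begins with no recoveries in flight.
        self.num_processing = 0

    def add_dead_server(self):
        # Called when a server is added to the dead servers list.
        self.num_processing += 1

    def finish_recovery(self):
        # Called when a crash recovery procedure completes.
        self.num_processing -= 1

    def dead_servers_in_progress(self):
        # Assumption: any non-zero count is read as "recovery still running".
        return self.num_processing != 0


# The old master records the dead server and starts the recovery procedure.
old_master = DeadServerTracker()
old_master.add_dead_server()

# Failover: the new master starts with a reset counter, but the replayed
# procedure still reports completion, so only the decrement is applied.
new_master = DeadServerTracker()
new_master.finish_recovery()

print(new_master.num_processing)
print(new_master.dead_servers_in_progress())
```

In this model the new master ends up at -1 and never returns to zero, which matches the symptom of the balancer not running.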





[jira] [Created] (HBASE-14804) HBase shell's create table command ignores 'NORMALIZATION_ENABLED' attribute

2015-11-12 Thread Romil Choksi (JIRA)
Romil Choksi created HBASE-14804:


 Summary: HBase shell's create table command ignores 
'NORMALIZATION_ENABLED' attribute
 Key: HBASE-14804
 URL: https://issues.apache.org/jira/browse/HBASE-14804
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 1.1.2
Reporter: Romil Choksi


I am trying to create a new table with NORMALIZATION_ENABLED set to true, 
but it seems the NORMALIZATION_ENABLED argument is being ignored, and the 
NORMALIZATION_ENABLED attribute is not displayed when running a desc command on 
that table:
hbase(main):020:0> create 'test-table-4', 'cf', {NORMALIZATION_ENABLED => 'true'}
An argument ignored (unknown or overridden): NORMALIZATION_ENABLED
0 row(s) in 4.2670 seconds

=> Hbase::Table - test-table-4
hbase(main):021:0> desc 'test-table-4'
Table test-table-4 is ENABLED
test-table-4
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0430 seconds
However, running an alter command on that table does set the 
NORMALIZATION_ENABLED attribute:
hbase(main):022:0> alter 'test-table-4', {NORMALIZATION_ENABLED => 'true'}
Unknown argument ignored: NORMALIZATION_ENABLED
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.3640 seconds

hbase(main):023:0> desc 'test-table-4'
Table test-table-4 is ENABLED
test-table-4, {TABLE_ATTRIBUTES => {NORMALIZATION_ENABLED => 'true'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0190 seconds
I think it would be better to have a single-step process that enables 
normalization while creating the table itself, rather than a two-step process 
that alters the table later on to enable normalization.





[jira] [Created] (HBASE-14805) status should show the master in shell

2015-11-12 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-14805:
-

 Summary: status should show the master in shell
 Key: HBASE-14805
 URL: https://issues.apache.org/jira/browse/HBASE-14805
 Project: HBase
  Issue Type: Improvement
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 1.2.0, 1.3.0


{{status 'simple'}} or {{'detailed'}} only shows the regionservers and regions, 
but not the active master. In fact, there seems to be no way to find out the 
active master from the shell.






[jira] [Created] (HBASE-14807) TestWALLockup is flakey

2015-11-12 Thread stack (JIRA)
stack created HBASE-14807:
-

 Summary: TestWALLockup is flakey
 Key: HBASE-14807
 URL: https://issues.apache.org/jira/browse/HBASE-14807
 Project: HBase
  Issue Type: Bug
  Components: flakey, test
Reporter: stack
Assignee: stack


Fails frequently. 

Looks like this:

{code}
2015-11-12 10:38:51,812 DEBUG [Time-limited test] regionserver.HRegion(3882): 
Found 0 recovered edits file(s) under 
/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d
2015-11-12 10:38:51,821 DEBUG [Time-limited test] 
regionserver.FlushLargeStoresPolicy(56): 
hbase.hregion.percolumnfamilyflush.size.lower.bound is not specified, use 
global config(16777216) instead
2015-11-12 10:38:51,880 DEBUG [Time-limited test] wal.WALSplitter(729): Wrote 
region 
seqId=/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad/data/default/testLockupWhenSyncInMiddleOfZigZagSetup/c8694b53368f3301a8d370089120388d/recovered.edits/2.seqid
 to file, newSeqId=2, maxSeqId=0
2015-11-12 10:38:51,881 INFO  [Time-limited test] regionserver.HRegion(868): 
Onlined c8694b53368f3301a8d370089120388d; next sequenceid=2
2015-11-12 10:38:51,994 ERROR [sync.1] wal.FSHLog$SyncRunner(1226): Error 
syncing, request close of WAL
java.io.IOException: FAKE! Failed to replace a bad datanode...SYNC
at 
org.apache.hadoop.hbase.regionserver.TestWALLockup$1DodgyFSLog$1.sync(TestWALLockup.java:162)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1222)
at java.lang.Thread.run(Thread.java:745)
2015-11-12 10:38:51,997 DEBUG [Thread-4] regionserver.LogRoller(139): WAL roll 
requested
2015-11-12 10:38:52,019 DEBUG [flusher] 
regionserver.FlushLargeStoresPolicy(100): Since none of the CFs were above the 
size, flushing all.
2015-11-12 10:38:52,192 INFO  [Thread-4] 
regionserver.TestWALLockup$1DodgyFSLog(129): LATCHED
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
at 
org.apache.hadoop.hbase.regionserver.TestWALLockup.testLockupWhenSyncInMiddleOfZigZagSetup(TestWALLockup.java:245)
2015-11-12 10:39:18,609 INFO  [main] regionserver.TestWALLockup(91): Cleaning 
test directory: 
/home/jenkins/jenkins-slave/workspace/HBase-1.2/jdk/latest1.7/label/Hadoop/hbase-server/target/test-data/8b8f8f12-1819-47e3-b1f1-8ffa789438ad
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.lang.Thread.run(Thread.java:745)

{code}

... then times out after being locked up for 30 seconds.  Writes 50+MB of logs 
while spinning.

Reported as this:

{code}
---
Test set: org.apache.hadoop.hbase.regionserver.TestWALLockup
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 198.23 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.regionserver.TestWALLockup
testLockupWhenSyncInMiddleOfZigZagSetup(org.apache.hadoop.hbase.regionserver.TestWALLockup)
  Time elapsed: 0.049 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds
at org.apache.log4j.Category.callAppenders(Category.java:205)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at 
org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:155)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1386)
at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1352)
at 

Re: [VOTE] First release candidate for HBase 1.1.3 (RC0) is available

2015-11-12 Thread Nick Dimiduk
Thank you Josh for taking time to evaluate this release candidate. A
reminder to others that the voting period is scheduled to expire in roughly
24 hours. I myself have not had time this week to evaluate this candidate
properly, so I would like to extend the voting period through the weekend.
If there are no objections I will tally the vote at 23:59 on Sunday,
November 15, Pacific time.

Thanks,
Nick

On Tuesday, November 10, 2015, Josh Elser wrote:

> +1 (non-binding)
>
> * Built from source
> * Ran tests (-PrunDevTests). o.a.h.h.r.TestRegionServerHostname was
> problematic, might have just been me.
> * Checked sigs/xsums
> * Checked the compat report (thanks for posting it, Nick)
> * Skimmed release notes looking for anything that might introduce new deps
> for licensing concerns (found none)
>
> Nick Dimiduk wrote:
>
>> I'm happy to announce the first release candidate of HBase 1.1.3
>> (HBase-1.1.3RC0) is available for download at
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.3RC0/
>>
>> Maven artifacts are also available in the staging repository
>> https://repository.apache.org/content/repositories/orgapachehbase-1117
>>
>> Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
>> available in the Apache keys directory
>> https://people.apache.org/keys/committer/ndimiduk.asc
>>
>> There's also a signed tag for this release at
>>
>> https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=16e905679e2dd5cb1b05ca8bc34a403e154a395f
>>
>> The detailed source and binary compatibility report vs 1.1.0 has been
>> published for your review, at
>> http://people.apache.org/~ndimiduk/1.1.0_1.1.3RC0_compat_report.html
>>
>> HBase 1.1.3 is the third patch release in the HBase 1.1 line, continuing on
>> the theme of bringing a stable, reliable database to the Hadoop and NoSQL
>> communities. This release includes over 120 bug fixes since the 1.1.2
>> release. Notable correctness fixes
>> include HBASE-14474, HBASE-14591, HBASE-14224,
>> HBASE-14431, HBASE-14407, HBASE-14313, HBASE-14621, HBASE-14501, and
>> HBASE-13250.
>>
>> The full list of fixes included in this release is available at
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753&version=12333152
>>   and in the CHANGES.txt file included in the distribution.
>>
>> Please try out this candidate and vote +/-1 by 23:59 Pacific time on
>> Friday, 2015-11-13 as to whether we should release these artifacts as
>> HBase 1.1.3.
>>
>> Thanks,
>> Nick
>>
>>
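
For anyone repeating the "Checked sigs/xsums" step from the checklist quoted above, the checksum half might look roughly like the sketch below. It makes several assumptions: SHA-512 is used only as an example algorithm, the published value is taken to be a hex string that may contain whitespace or uppercase letters, and the temporary file merely stands in for a downloaded artifact. Signature verification is a separate step and is not covered here.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha512"):
    """Compare against a published value, ignoring whitespace and case."""
    normalized = "".join(published_hex.split()).lower()
    return file_digest(path, algorithm) == normalized


# Stand-in for a downloaded release artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example artifact bytes")
    demo_path = tmp.name

expected = hashlib.sha512(b"example artifact bytes").hexdigest()
ok = matches_published(demo_path, expected.upper())
bad = matches_published(demo_path, "00")
os.remove(demo_path)

print(ok, bad)
```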