[jira] [Commented] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Kirubakaran Pakkirisamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755815#comment-13755815
 ] 

Kirubakaran Pakkirisamy commented on HBASE-9410:


Of course the latency is affected by the number of regions in the table and the 
total number of regions in use on the server. I expect some degradation, but not 
an exponential one. 

> Concurrent coprocessor endpoint executions slow down exponentially
> --
>
> Key: HBASE-9410
> URL: https://issues.apache.org/jira/browse/HBASE-9410
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.94.11
> Environment: Amazon ec2
>Reporter: Kirubakaran Pakkirisamy
> Attachments: jstack1.log, jstack2.log, jstack3.log, jstack.log, 
> SearchEndpoint.java, Search.java, SearchProtocol.java
>
>
> Multiple concurrent executions of coprocessor endpoints slow down 
> drastically. The slowdown is compounded further when more HTable connection 
> setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Kirubakaran Pakkirisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirubakaran Pakkirisamy updated HBASE-9410:
---

Attachment: jstack1.log
jstack2.log
jstack3.log

Andrew, I have set the RPC handler count to 10 (the default) and re-ran the test 
case. I have attached three jstacks taken during the run.
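
For reference, a minimal sketch of the setting in question, assuming the 
hbase.regionserver.handler.count property (normally set in hbase-site.xml on the 
region servers; 10 is the 0.94 default):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HandlerCountExample {
  public static void main(String[] args) {
    // Number of RPC handler threads per region server. The advice in this
    // issue is to keep it near the number of cores/spindles rather than 1000.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.regionserver.handler.count", 10);
    System.out.println(conf.getInt("hbase.regionserver.handler.count", -1));
  }
}
{code}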

> Concurrent coprocessor endpoint executions slow down exponentially
> --
>
> Key: HBASE-9410
> URL: https://issues.apache.org/jira/browse/HBASE-9410
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.94.11
> Environment: Amazon ec2
>Reporter: Kirubakaran Pakkirisamy
> Attachments: jstack1.log, jstack2.log, jstack3.log, jstack.log, 
> SearchEndpoint.java, Search.java, SearchProtocol.java
>
>
> Multiple concurrent executions of coprocessor endpoints slow down 
> drastically. The slowdown is compounded further when more HTable connection 
> setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2013-09-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755811#comment-13755811
 ] 

stack commented on HBASE-6581:
--

I'd imagine we'd get the method once on startup and then it would never change 
during the life of the WAL?

bq. FSHLog#syncer has some synchronized blocks, but still one call to 
PBLW.sync() is not in a synchronized block, so this may be a problem. I guess we 
can't change the unsynchronized code block in FSHLog (line 1068) like that 
without side effects (the logic there seems quite tricky).

I'm not the expert here. The less synchronization hereabouts the better, but the 
current state comes out of study by others.

Copy/paste is probably fine, especially if you feel it is 'safer'.
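
For illustration only, a minimal sketch of the "get the method once on startup" 
pattern being discussed; the hflush/sync method names and the helper are 
assumptions, not the actual FSHLog code:

{code}
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FSDataOutputStream;

public class SyncMethodCache {
  // Hypothetical sketch: resolve the output stream's sync/hflush method once,
  // at WAL startup, and reuse the cached Method for the life of the log.
  static Method findSyncMethod(FSDataOutputStream out) {
    try {
      return out.getClass().getMethod("hflush");   // newer Hadoop
    } catch (NoSuchMethodException e) {
      try {
        return out.getClass().getMethod("sync");   // older Hadoop
      } catch (NoSuchMethodException e2) {
        throw new IllegalStateException("No sync/hflush on " + out.getClass(), e2);
      }
    }
  }
}
{code}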

> Build with hadoop.profile=3.0
> -
>
> Key: HBASE-6581
> URL: https://issues.apache.org/jira/browse/HBASE-6581
> Project: HBase
>  Issue Type: Bug
>Reporter: Eric Charles
>Assignee: Eric Charles
> Attachments: HBASE-6581-1.patch, HBASE-6581-20130821.patch, 
> HBASE-6581-2.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, HBASE-6581.diff, 
> HBASE-6581.diff
>
>
> Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to a 
> change in the Hadoop Maven module naming (and also the usage of 3.0-SNAPSHOT 
> instead of 3.0.0-SNAPSHOT in hbase-common).
> I can provide a patch that moves most of the Hadoop dependencies into their 
> respective profiles and defines the correct Hadoop deps in the 3.0 profile.
> Please tell me if it's ok to go this way.
> Thx, Eric
> [1]
> $ mvn clean install -Dhadoop.profile=3.0
> [INFO] Scanning for projects...
> [ERROR] The build could not read 3 projects -> [Help 1]
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
> [ERROR]   
> [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
> (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
> [ERROR] 'dependencies.dependency.version' for 
> org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
> [ERROR] 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9314) Dropping a table always prints a TableInfoMissingException in the master log

2013-09-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755810#comment-13755810
 ] 

stack commented on HBASE-9314:
--

+1 on commit to 0.96 branch and trunk.

> Dropping a table always prints a TableInfoMissingException in the master log
> 
>
> Key: HBASE-9314
> URL: https://issues.apache.org/jira/browse/HBASE-9314
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.95.2, 0.94.10
>Reporter: Jean-Daniel Cryans
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.98.0, 0.94.12, 0.96.0
>
> Attachments: 9314-0.94.patch, 9314.patch
>
>
> Every time I drop a table I get the same stack trace in the master's log:
> {noformat}
> 2013-08-22 23:11:31,939 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Table 't' archived!
> 2013-08-22 23:11:31,939 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Removing 't' 
> descriptor.
> 2013-08-22 23:11:31,940 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Marking 't' as 
> deleted.
> 2013-08-22 23:11:31,944 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.zookeeper.lock.ZKInterProcessLockBase: Released 
> /hbase/table-lock/t/write-master:602
> 2013-08-22 23:11:32,024 DEBUG [RpcServer.handler=0,port=6] 
> org.apache.hadoop.hbase.util.FSTableDescriptors: Exception during 
> readTableDecriptor. Current table name = t
> org.apache.hadoop.hbase.TableInfoMissingException: No table descriptor file 
> under hdfs://jdec2hbase0403-1.vpc.cloudera.com:9000/hbase/data/default/t
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorAndModtime(FSTableDescriptors.java:503)
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorAndModtime(FSTableDescriptors.java:496)
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:170)
>   at 
> org.apache.hadoop.hbase.master.HMaster.getTableDescriptors(HMaster.java:2629)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterMonitorProtos$MasterMonitorService$2.callBlockingMethod(MasterMonitorProtos.java:4634)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>   at 
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1861)
> 2013-08-22 23:11:32,024 WARN  [RpcServer.handler=0,port=6] 
> org.apache.hadoop.hbase.util.FSTableDescriptors: The following folder is in 
> HBase's root directory and doesn't contain a table descriptor, do consider 
> deleting it: t
> {noformat}
> But the operation completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755805#comment-13755805
 ] 

Andrew Purtell commented on HBASE-9410:
---

Looks like you configured 1000 IPC handlers? This should be set to 
approximately the number of cores and spindles of the server hardware. 

Scrolling through some of that 7 MB log (can you attach only one jstack you 
think is relevant?) I don't see any IPC handlers doing any work, but it's too 
big to look at in total.

> Concurrent coprocessor endpoint executions slow down exponentially
> --
>
> Key: HBASE-9410
> URL: https://issues.apache.org/jira/browse/HBASE-9410
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.94.11
> Environment: Amazon ec2
>Reporter: Kirubakaran Pakkirisamy
> Attachments: jstack.log, SearchEndpoint.java, Search.java, 
> SearchProtocol.java
>
>
> Multiple concurrent executions of coprocessor endpoints slow down 
> drastically. The slowdown is compounded further when more HTable connection 
> setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently

2013-09-01 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755798#comment-13755798
 ] 

Matteo Bertozzi commented on HBASE-9397:


+1, v2 looks good to me.
I think our standard is to always use {} even if it is a single line, but I can 
fix that on commit.

I'll commit the patch tomorrow if there are no other comments.
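
A trivial illustration of the brace convention being referred to (hypothetical 
snippet, not from the patch):

{code}
// Preferred: braces even around a single-statement body.
if (done) {
  return;
}

// Rather than:
if (done) return;
{code}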

> Snapshots with the same name are allowed to proceed concurrently
> 
>
> Key: HBASE-9397
> URL: https://issues.apache.org/jira/browse/HBASE-9397
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 0.95.2, 0.94.11
>Reporter: Jerry He
>Assignee: Jerry He
> Fix For: 0.94.12, 0.96.0
>
> Attachments: HBASE-9397-0.94.patch, HBASE-9397-0.94-v2.patch, 
> HBASE-9397-trunk.patch, HBASE-9397-trunk-v2.patch
>
>
> Snapshots with the same name (but on different tables) are allowed to proceed 
> concurrently.
> This seems to be a loophole created by allowing multiple snapshots (on 
> different tables) to run concurrently.
> There are two checks in SnapshotManager, but they fail to catch this 
> particular case.
> In isSnapshotCompleted(), we only check the completed snapshot directory.
> In isTakingSnapshot(), we only check for the same table name.
> The end result is that concurrently running snapshots with the same name 
> overlap and mess up each other, for example by cleaning up each other's 
> snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
> {code}
> 2013-08-29 18:25:13,443 ERROR 
> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking 
> snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to 
> exception:Couldn't read snapshot info 
> from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
> snapshot info 
> from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
> at 
> org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321)
> at 
> org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123)
> {code}
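
For context, a minimal sketch of the kind of name check the description says is 
missing. This is hypothetical and not the attached patch; the helper name and 
its placement are assumptions:

{code}
import java.util.Collection;
import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.SnapshotDescription;

public class SnapshotNameCheck {
  // Hypothetical helper: reject a new snapshot if any in-progress snapshot
  // already uses the same name, regardless of which table it targets.
  static boolean sameNameInProgress(SnapshotDescription requested,
      Collection<SnapshotDescription> inProgress) {
    for (SnapshotDescription s : inProgress) {
      if (s.getName().equals(requested.getName())) {
        return true;
      }
    }
    return false;
  }
}
{code}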

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Kirubakaran Pakkirisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirubakaran Pakkirisamy updated HBASE-9410:
---

Attachment: jstack.log

Attaching jstack output from one run done specifically to capture the jstack output.

> Concurrent coprocessor endpoint executions slow down exponentially
> --
>
> Key: HBASE-9410
> URL: https://issues.apache.org/jira/browse/HBASE-9410
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.94.11
> Environment: Amazon ec2
>Reporter: Kirubakaran Pakkirisamy
> Attachments: jstack.log, SearchEndpoint.java, Search.java, 
> SearchProtocol.java
>
>
> Multiple concurrent executions of coprocessor endpoints slow down 
> drastically. The slowdown is compounded further when more HTable connection 
> setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Kirubakaran Pakkirisamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirubakaran Pakkirisamy updated HBASE-9410:
---

Attachment: SearchProtocol.java
SearchEndpoint.java
Search.java

Attached files which demonstrate the problem. The Thread.sleep in the client 
allows all clients to have created their HTable instances first. Each client then 
loops, say, 50 times. What is usually 10-20 msec per call suddenly jumps to a few 
hundred msec, and in some cases thousands. This is with 32 concurrent connections 
to a 4-node EC2 cluster with 32 cores in total.
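
For reference, a rough sketch of what such a client loop could look like against 
the 0.94 coprocessor endpoint API. SearchProtocol, its search() method, the table 
name, and the query string are assumptions based on the attachment names; the 
attached Search.java is authoritative:

{code}
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

public class SearchClientSketch {
  // Assumed shape of the attached SearchProtocol.java.
  public interface SearchProtocol extends CoprocessorProtocol {
    int search(String query) throws IOException;
  }

  public static void main(String[] args) throws Throwable {
    HTable table = new HTable(HBaseConfiguration.create(), "testtable");
    try {
      for (int i = 0; i < 50; i++) {                 // ~50 iterations, as described above
        long start = System.currentTimeMillis();
        Map<byte[], Integer> perRegion = table.coprocessorExec(
            SearchProtocol.class,
            HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,  // all regions
            new Batch.Call<SearchProtocol, Integer>() {
              public Integer call(SearchProtocol endpoint) throws IOException {
                return endpoint.search("query");     // assumed endpoint method
              }
            });
        System.out.println("round " + i + ": " + (System.currentTimeMillis() - start)
            + " ms across " + perRegion.size() + " regions");
      }
    } finally {
      table.close();
    }
  }
}
{code}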

> Concurrent coprocessor endpoint executions slow down exponentially
> --
>
> Key: HBASE-9410
> URL: https://issues.apache.org/jira/browse/HBASE-9410
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.94.11
> Environment: Amazon ec2
>Reporter: Kirubakaran Pakkirisamy
> Attachments: SearchEndpoint.java, Search.java, SearchProtocol.java
>
>
> Multiple concurrent executions of coprocessor endpoints slow down 
> drastically. The slowdown is compounded further when more HTable connection 
> setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9410) Concurrent coprocessor endpoint executions slow down exponentially

2013-09-01 Thread Kirubakaran Pakkirisamy (JIRA)
Kirubakaran Pakkirisamy created HBASE-9410:
--

 Summary: Concurrent coprocessor endpoint executions slow down 
exponentially
 Key: HBASE-9410
 URL: https://issues.apache.org/jira/browse/HBASE-9410
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.94.11
 Environment: Amazon ec2
Reporter: Kirubakaran Pakkirisamy


Multiple concurrent executions of coprocessor endpoints slow down drastically. 
The slowdown is compounded further when more HTable connection setups are happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9295) Allow test-patch.sh to detect TreeMap keyed by byte[] which doesn't use proper comparator

2013-09-01 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-9295:
--

Fix Version/s: 0.98.0

> Allow test-patch.sh to detect TreeMap keyed by byte[] which doesn't use 
> proper comparator
> -
>
> Key: HBASE-9295
> URL: https://issues.apache.org/jira/browse/HBASE-9295
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
> Fix For: 0.98.0
>
>
> There were two recent bug fixes (HBASE-9285 and HBASE-9238) for the case 
> where a TreeMap keyed by byte[] doesn't use the proper comparator:
> {code}
> new TreeMap()
> {code}
> test-patch.sh should be able to detect this situation and report accordingly.
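
For context, a small illustration of the pattern described above. 
Bytes.BYTES_COMPARATOR is HBase's lexicographic byte[] comparator; the 
surrounding class is hypothetical:

{code}
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayKeyedMapExample {
  public static void main(String[] args) {
    // Correct: supply the byte[] comparator so keys compare by content.
    Map<byte[], String> ok = new TreeMap<byte[], String>(Bytes.BYTES_COMPARATOR);
    ok.put(Bytes.toBytes("row-1"), "value");
    System.out.println(ok.get(Bytes.toBytes("row-1"))); // prints "value"

    // Broken pattern flagged above: byte[] is not Comparable, so this map
    // fails with a ClassCastException as soon as keys have to be compared.
    Map<byte[], String> broken = new TreeMap<byte[], String>();
    try {
      broken.put(Bytes.toBytes("row-1"), "a");
      broken.put(Bytes.toBytes("row-2"), "b");
    } catch (ClassCastException expected) {
      System.out.println("new TreeMap() without a comparator: " + expected);
    }
  }
}
{code}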

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9314) Dropping a table always prints a TableInfoMissingException in the master log

2013-09-01 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755690#comment-13755690
 ] 

Andrew Purtell commented on HBASE-9314:
---

All trunk and 0.96 tests pass, but 0.94 reports this:

{noformat}
Tests run: 34, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 232.361 sec 
<<< FAILURE!
testHbckFixOrphanTable(org.apache.hadoop.hbase.util.TestHBaseFsck)  Time 
elapsed: 3.031 sec  <<< FAILURE!
java.lang.AssertionError: expected:<[NO_TABLEINFO_FILE]> but was:<[]>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.hadoop.hbase.util.hbck.HbckTestingUtil.assertErrors(HbckTestingUtil.java:88)
at 
org.apache.hadoop.hbase.util.TestHBaseFsck.testHbckFixOrphanTable(TestHBaseFsck.java:433)
{noformat}

Looks like after this change hbck needs a fixup; it is likely expecting 
TableInfoMissingException somewhere.

> Dropping a table always prints a TableInfoMissingException in the master log
> 
>
> Key: HBASE-9314
> URL: https://issues.apache.org/jira/browse/HBASE-9314
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.95.2, 0.94.10
>Reporter: Jean-Daniel Cryans
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.98.0, 0.94.12, 0.96.0
>
> Attachments: 9314-0.94.patch, 9314.patch
>
>
> Every time I drop a table I get the same stack trace in the master's log:
> {noformat}
> 2013-08-22 23:11:31,939 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Table 't' archived!
> 2013-08-22 23:11:31,939 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Removing 't' 
> descriptor.
> 2013-08-22 23:11:31,940 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Marking 't' as 
> deleted.
> 2013-08-22 23:11:31,944 DEBUG 
> [MASTER_TABLE_OPERATIONS-jdec2hbase0403-1:6-0] 
> org.apache.hadoop.hbase.zookeeper.lock.ZKInterProcessLockBase: Released 
> /hbase/table-lock/t/write-master:602
> 2013-08-22 23:11:32,024 DEBUG [RpcServer.handler=0,port=6] 
> org.apache.hadoop.hbase.util.FSTableDescriptors: Exception during 
> readTableDecriptor. Current table name = t
> org.apache.hadoop.hbase.TableInfoMissingException: No table descriptor file 
> under hdfs://jdec2hbase0403-1.vpc.cloudera.com:9000/hbase/data/default/t
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorAndModtime(FSTableDescriptors.java:503)
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorAndModtime(FSTableDescriptors.java:496)
>   at 
> org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:170)
>   at 
> org.apache.hadoop.hbase.master.HMaster.getTableDescriptors(HMaster.java:2629)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterMonitorProtos$MasterMonitorService$2.callBlockingMethod(MasterMonitorProtos.java:4634)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>   at 
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1861)
> 2013-08-22 23:11:32,024 WARN  [RpcServer.handler=0,port=6] 
> org.apache.hadoop.hbase.util.FSTableDescriptors: The following folder is in 
> HBase's root directory and doesn't contain a table descriptor, do consider 
> deleting it: t
> {noformat}
> But the operation completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9393) HBase does not close a closed socket, resulting in many CLOSE_WAIT

2013-09-01 Thread Avi Zrachya (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755675#comment-13755675
 ] 

Avi Zrachya commented on HBASE-9393:


Yes, it is most definitely the regionserver, as you can see below.
PID 21592 is the PID holding the CLOSE_WAIT sockets, and as you can see, 21592 
is the regionserver.

{code}
[root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
13156
[root@hd2-region3 ~]# ps -ef |grep 21592
root 17255 17219 0 12:26 pts/0 00:00:00 grep 21592
hbase 21592 1 17 Aug29 ? 03:29:06 /usr/java/jdk1.6.0_26/bin/java 
-XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase 
-Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ..
{code}

> HBase does not close a closed socket, resulting in many CLOSE_WAIT 
> 
>
> Key: HBASE-9393
> URL: https://issues.apache.org/jira/browse/HBASE-9393
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.2
> Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 
> 7279 regions
>Reporter: Avi Zrachya
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase cannot 
> connect to the datanode because there are too many mapped sockets from one 
> host to another on the same port.
> The example below shows a low CLOSE_WAIT count because we had to restart 
> HBase to solve the problem; over time it will increase to 60-100K sockets 
> in CLOSE_WAIT.
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root  17255 17219  0 12:26 pts/0    00:00:00 grep 21592
> hbase 21592     1 17 Aug29 ?        03:29:06 
> /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m 
> -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
> -Dhbase.log.dir=/var/log/hbase 
> -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira