[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181192#comment-13181192
 ] 

Hadoop QA commented on HBASE-5088:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509671/5088-syncObj.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 79 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/684//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/684//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/684//console

This message is automatically generated.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized. If we use these methods to add/delete elements, it's ok.
> But in HConnectionManager#getCachedLocation, it uses headMap to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (like add/delete) in other threads, a concurrency issue may 
> occur.
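The race described above can be shown deterministically in a few lines. The sketch below is illustrative, not HBase's actual class: a plain TreeMap stands in for SoftValueSortedMap#internalMap, and a single thread simulates one thread mutating the backing map while another holds a headMap view.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal model of the issue (names are illustrative, not HBase's code):
// even if every method of a wrapper around TreeMap is synchronized, a
// headMap() view escapes the lock, and the backing map can change under it.
public class HeadMapRaceSketch {
  public static String demo() {
    TreeMap<String, String> internalMap = new TreeMap<>();
    internalMap.put("a", "1");
    internalMap.put("b", "2");

    SortedMap<String, String> view = internalMap.headMap("z"); // live view
    Iterator<String> it = view.keySet().iterator();
    it.next();

    // Simulates another thread adding to the backing map between reads.
    internalMap.put("c", "3");

    try {
      it.next(); // the view's iterator sees the structural modification
      return "no exception";
    } catch (ConcurrentModificationException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo()); // ConcurrentModificationException
  }
}
```

TreeMap's iterators are fail-fast, so the structural modification through the backing map invalidates the view's iterator even though every wrapper method was synchronized.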

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2012-01-06 Thread ramkrishna.s.vasudevan (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181195#comment-13181195
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-4397 at 1/6/12 8:14 AM:
---

Resolving as committed to trunk and 0.92.

  was (Author: ram_krish):
Resolving as committed to trunk and 0.92 long back.
  
> -ROOT-, .META. tables stay offline for too long in recovery phase after all 
> RSs are shutdown at the same time
> -
>
> Key: HBASE-4397
> URL: https://issues.apache.org/jira/browse/HBASE-4397
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4397-0.92.patch
>
>
> 1. Shutdown all RSs.
> 2. Bring all RS back online.
> The "-ROOT-" and ".META." tables stay in offline state until the timeout 
> monitor forces assignment 30 minutes later. That is because HMaster can't 
> find an RS to assign the tables to in the assign operation.
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
> Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
> trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
> at $Proxy9.openRegion(Unknown Source)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2011-09-13 13:25:52,743 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
> location to assign region -ROOT-,,0.70236052
> Possible fixes:
> 1. Have ServerManager handle the "server online" event, similar to how 
> RegionServerTracker.java calls ServerManager.expireServer in the case the 
> server goes down.
> 2. Make the timeoutMonitor handle the situation better. This is a special 
> situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4397) -ROOT-, .META. tables stay offline for too long in recovery phase after all RSs are shutdown at the same time

2012-01-06 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4397:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving as committed to trunk and 0.92 long back.

> -ROOT-, .META. tables stay offline for too long in recovery phase after all 
> RSs are shutdown at the same time
> -
>
> Key: HBASE-4397
> URL: https://issues.apache.org/jira/browse/HBASE-4397
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4397-0.92.patch
>
>
> 1. Shutdown all RSs.
> 2. Bring all RS back online.
> The "-ROOT-" and ".META." tables stay in offline state until the timeout 
> monitor forces assignment 30 minutes later. That is because HMaster can't 
> find an RS to assign the tables to in the assign operation.
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
> Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, 
> trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
> at 
> org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
> at $Proxy9.openRegion(Unknown Source)
> at 
> org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2011-09-13 13:25:52,743 WARN 
> org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable 
> location to assign region -ROOT-,,0.70236052
> Possible fixes:
> 1. Have ServerManager handle the "server online" event, similar to how 
> RegionServerTracker.java calls ServerManager.expireServer in the case the 
> server goes down.
> 2. Make the timeoutMonitor handle the situation better. This is a special 
> situation in the cluster; the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5125) Upgrade hadoop to 1.0.0

2012-01-06 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181204#comment-13181204
 ] 

Hudson commented on HBASE-5125:
---

Integrated in HBase-0.92-security #63 (See 
[https://builds.apache.org/job/HBase-0.92-security/63/])
HBASE-5125 Upgrade hadoop to 1.0.0

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml


> Upgrade hadoop to 1.0.0
> ---
>
> Key: HBASE-5125
> URL: https://issues.apache.org/jira/browse/HBASE-5125
> Project: HBase
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 5125.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181205#comment-13181205
 ] 

Hudson commented on HBASE-5081:
---

Integrated in HBase-0.92-security #63 (See 
[https://builds.apache.org/job/HBase-0.92-security/63/])
HBASE-5081  Distributed log splitting deleteNode races against splitLog 
retry (Prakash)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKSplitLog.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java


> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 5081-deleteNode-with-while-loop.txt, 
> HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, 
> distributed_log_splitting_screen_shot2.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screen shot.
> I looked into it and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.
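The "make deletion sync" fix described above can be sketched by blocking on the async callback with a latch, so a retry never races a stale delete. The code below is a hedged illustration, not HBase's SplitLogManager: a ConcurrentHashMap and a single-thread executor stand in for ZooKeeper, and all names are invented for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch: wrap an async delete so it does not return until the
// callback has fired, giving the retry a clean start.
public class SyncDeleteSketch {
  static final Map<String, String> znodes = new ConcurrentHashMap<>();
  static final ExecutorService zkCallbackPool = Executors.newSingleThreadExecutor();

  // Async delete: returns immediately; the callback fires later.
  static void deleteAsync(String node, Runnable callback) {
    zkCallbackPool.submit(() -> { znodes.remove(node); callback.run(); });
  }

  // Sync wrapper: block until the delete has completed, so a later
  // re-create of the same node cannot be clobbered by a stale delete.
  static void deleteSync(String node) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    deleteAsync(node, done::countDown);
    done.await();
  }

  public static void main(String[] args) throws InterruptedException {
    znodes.put("/hbase/splitlog/task1", "FAILED");
    deleteSync("/hbase/splitlog/task1");
    // The retry can now safely recreate the task.
    System.out.println(znodes.containsKey("/hbase/splitlog/task1")); // false
    zkCallbackPool.shutdown();
  }
}
```

Because deleteSync returns only after the callback has run, the hashmap bookkeeping in step 5 of the description can no longer fire after the retry has repopulated its tasks.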

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181217#comment-13181217
 ] 

Jieshan Bean commented on HBASE-5088:
-

If the new instance of SoftValueSortedMap holds a reference to "sync", then the 
problem can also be solved. It's a good approach :)
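The approach in this comment can be sketched as follows. This is a hypothetical illustration, not HBase's actual SoftValueSortedMap: the parent map and every headMap view share one "sync" lock object, so operations on a view and on the parent serialize on the same monitor.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the shared-"sync" approach; class and method
// names are illustrative, not HBase's API.
public class SharedLockSortedMap<K, V> {
  private final Object sync;
  private final SortedMap<K, V> internalMap;

  public SharedLockSortedMap() {
    this(new TreeMap<K, V>(), new Object());
  }

  private SharedLockSortedMap(SortedMap<K, V> map, Object sync) {
    this.internalMap = map;
    this.sync = sync;
  }

  public V put(K key, V value) {
    synchronized (sync) { return internalMap.put(key, value); }
  }

  public V get(K key) {
    synchronized (sync) { return internalMap.get(key); }
  }

  // The view wrapper reuses the SAME sync object, so operations on the
  // view and on the parent never interleave unsafely.
  public SharedLockSortedMap<K, V> headMap(K toKey) {
    synchronized (sync) {
      return new SharedLockSortedMap<>(internalMap.headMap(toKey), sync);
    }
  }
}
```

The key design point is that the private constructor threads the parent's lock into the view, which is exactly what a raw TreeMap.headMap view lacks.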

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized. If we use these methods to add/delete elements, it's ok.
> But in HConnectionManager#getCachedLocation, it uses headMap to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (like add/delete) in other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5135) TableMapReduceUtil should be using the DistributedCache API, not the 'tmpjars' config directly.

2012-01-06 Thread Harsh J (Created) (JIRA)
TableMapReduceUtil should be using the DistributedCache API, not the 'tmpjars' 
config directly.
---

 Key: HBASE-5135
 URL: https://issues.apache.org/jira/browse/HBASE-5135
 Project: HBase
  Issue Type: Improvement
Reporter: Harsh J


The jar-adding methods of TableMapReduceUtil seem to bypass the 
DistributedCache API by plugging the jar lists directly into the underlying 
property. This is not a good practice and should be avoided if possible.

_Observed during HBASE-3274_.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181229#comment-13181229
 ] 

ramkrishna.s.vasudevan commented on HBASE-4357:
---

@Ming Ma
+1 on patch.
When we started the discussion, the timeout monitor was not doing anything for 
unassign. But now it forcefully does an unassign as well. :)


> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to .META. table, the region has been assigned to from sea-esxi-0 to 
> sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5123) Provide more aggregate functions for Aggregations Protocol

2012-01-06 Thread Royston Sellman (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181366#comment-13181366
 ] 

Royston Sellman commented on HBASE-5123:


Re: 5123, I have also had some time to think about other aggregation functions. 
(Please be aware that I am new to HBase, Coprocessors, and the Aggregation 
Protocol, and I have little knowledge of distributed numerical algorithms!) It 
seems to me the pattern in AP is to return a SINGLE value from a SINGLE column 
(CF:CQ) of a table. In the future one might wish to extend AP to return 
MULTIPLE values from MULTIPLE columns, so it is good to keep this in mind for 
the SINGLE value/SINGLE column (SVSC) case. 

So, common SVSC aggregation functions:
currently supported:
min
max
sum
count
avg (arithmetic mean)
std

not currently supported:
median
mode 
quantile/ntile
mult/product

for column values of all numeric types, returning values of that type. Current 
support is only for Long type.

Some thoughts on the future possibilities:
An example of a future SINGLE value MULTIPLE column use case could be weighted 
versions of the above functions, i.e. a column of weights applied to the column 
of values, then the new aggregation derived.
(note: there is a very good description of Weighted Median in the R language 
documentation:
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)

An example of a future MULTIPLE value SINGLE column use case could be range: 
return all rows with a column value between two values. Maybe this is a bad 
example, because there could be better HBase ways to do it with filters/scans 
at a higher level. Perhaps binning is a better example? i.e. return an array 
containing values derived from applying one of the SVSC functions to a binned 
column, e.g.:
int bins = 100;
aClient.sum(table, ci, scan, bins); => {12.3, 14.5...}
Another example (common in several programming languages) is to map an 
arbitrary function over a column and return the new vector. Of course, again 
this may be a bad example in the case of long HBase columns but it seems like 
an appropriate thing to do with coprocessors.

MULTIPLE value MULTIPLE column examples are common in spatial data processing 
but I see there has been a lot of spatial/GIS discussion around HBase which I 
have not read yet. So I'll keep quiet for now.

I hope these thoughts strike a balance between my (special interest) use case 
of statistical/spatial functions on tables and general purpose (but coprocessor 
enabled/regionserver distributed) HBase.
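The binning idea sketched in this comment can be illustrated client-side. This is a hedged illustration only: plain arrays stand in for a table scan, and nothing here is the actual AggregateProtocol API.

```java
// Hedged sketch of the binning idea: apply an SVSC function (sum here)
// per bin over a column of values. Arrays stand in for a table scan.
public class BinnedSumSketch {
  static double[] binnedSum(double[] values, int bins, double min, double max) {
    double[] out = new double[bins];
    double width = (max - min) / bins;
    for (double v : values) {
      // Clamp so v == max lands in the last bin.
      int b = Math.min(bins - 1, (int) ((v - min) / width));
      out[b] += v;
    }
    return out;
  }

  public static void main(String[] args) {
    // Column values 1, 2, 3 fall in bin [0, 5); 9 falls in bin [5, 10].
    double[] sums = binnedSum(new double[] {1, 2, 3, 9}, 2, 0, 10);
    System.out.println(sums[0] + " " + sums[1]); // 6.0 9.0
  }
}
```

In a coprocessor setting, each region would compute its partial per-bin sums and the client would add the arrays element-wise, the same merge pattern the existing sum aggregate uses for a single value.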


> Provide more aggregate functions for Aggregations Protocol
> --
>
> Key: HBASE-5123
> URL: https://issues.apache.org/jira/browse/HBASE-5123
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhihong Yu
>
> Royston requested the following aggregates on top of what we already have:
> Median, Weighted Median, Mult
> See discussion entitled 'AggregateProtocol Help' on user list

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Created) (JIRA)
Redundant MonitoredTask instances in case of distributed log splitting retry


 Key: HBASE-5136
 URL: https://issues.apache.org/jira/browse/HBASE-5136
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu


In case of log splitting retry, the following code would be executed multiple 
times:
{code}
  public long splitLogDistributed(final List<Path> logDirs) throws IOException {
    MonitoredTask status = TaskMonitor.get().createStatus(
        "Doing distributed log split in " + logDirs);
{code}
leading to multiple MonitoredTask instances.

Users may get confused by multiple distributed log splitting entries for the 
same region server on the master UI.
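One way to avoid the duplicate entries is to cache the status per set of log dirs and hand the same instance back on retry. The sketch below is hypothetical: "Status" stands in for MonitoredTask, and the method names are invented for illustration rather than taken from HBase's TaskMonitor API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: reuse one status object across retries for the same
// log dirs instead of creating a new one per call.
public class StatusReuseSketch {
  // Stand-in for MonitoredTask: just carries a description here.
  static class Status {
    final String description;
    Status(String description) { this.description = description; }
  }

  private final Map<String, Status> inProgress = new ConcurrentHashMap<>();

  // A retry for the same log dirs gets the existing status instead of
  // creating a second MonitoredTask (and a second UI entry).
  Status statusFor(List<String> logDirs) {
    return inProgress.computeIfAbsent(String.valueOf(logDirs),
        k -> new Status("Doing distributed log split in " + k));
  }

  // Call when splitting finishes (success or abort) so a later,
  // genuinely new split gets a fresh status.
  void finish(List<String> logDirs) {
    inProgress.remove(String.valueOf(logDirs));
  }
}
```

The remove-on-finish step matters: without it, a later split of the same server's logs would silently reuse a stale status.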


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181396#comment-13181396
 ] 

stack commented on HBASE-5081:
--

@Jimmy After restart, can you get it to hang?  Let's open a new issue if you 
can.  How are you testing, out of interest, so I can try it over here.  Thanks.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 5081-deleteNode-with-while-loop.txt, 
> HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, 
> distributed_log_splitting_screen_shot2.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screen shot.
> I looked into it and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5081:
---

Attachment: distributed_log_splitting_screenshot3.png

After restart, it still doesn't work.  See the attached 3rd screen shot.

Probably we should commit this one and open a new JIRA.

@Stack, to reproduce it, you can set these properties and run the Bigtop 
TestLoadAndVerify: 

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1048576</value>
  </property>
  <property>
    <name>hbase.master.distributed.log.splitting</name>
    <value>true</value>
  </property>

  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <description>Hadoop setting</description>
  </property>
  <property>
    <name>hbase.balancer.period</name>
    <value>2000</value>
    <description>Period at which the region balancer runs in the Master.</description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>262144</value>
  </property>


> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 5081-deleteNode-with-while-loop.txt, 
> HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, 
> distributed_log_splitting_screen_shot2.png, 
> distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found distributed log splitting hangs 
> there forever.  Please see the attached screen shot.
> I looked into it and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found it out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter was one short, and the log splitting hangs there 
> waiting for the last task to finish, which is never going to happen.
> So I think the problem is step 2.  The fix is to make the deletion sync 
> instead of async, so that the retry has a clean start.
> An async deleteNode will mess up the split log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, some node 
> created during the retry could be deleted.
> deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181405#comment-13181405
 ] 

Hadoop QA commented on HBASE-5081:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12509690/distributed_log_splitting_screenshot3.png
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/685//console

This message is automatically generated.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181407#comment-13181407
 ] 

Jimmy Xiang commented on HBASE-5081:


It turns out all my region servers died.  I restarted them all, and things are 
looking better now.  One folder is complete; two more to go.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181425#comment-13181425
 ] 

stack commented on HBASE-5081:
--

@Jimmy And can you make it hang subsequently?  Thanks for the prescription on 
how to repro.  Will try it...

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>





[jira] [Updated] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4357:
--

Affects Version/s: 0.92.0
Fix Version/s: 0.94.0
   0.92.0

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch
>
>
> Got the following during testing:
> 1. On a given machine, kill the RS process, then kill the HMaster process.
> 2. Start the RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver", then start the HMaster via "bin/hbase-daemon.sh --config ./conf 
> start master".
> One region of a table stayed in the CLOSING state.
> According to ZooKeeper:
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 
> to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Updated] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4357:
--

Status: Patch Available  (was: Open)

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HBASE-4357-0.92.patch
>
>





[jira] [Created] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-06 Thread ramkrishna.s.vasudevan (Created) (JIRA)
MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
IOException


 Key: HBASE-5137
 URL: https://issues.apache.org/jira/browse/HBASE-5137
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


I am not sure if this bug was already raised in JIRA.
In our test cluster we had a scenario where an RS had gone down and 
ServerShutdownHandler started splitLog.
But since HDFS was down, the waitOnSafeMode check threw an IOException.
{code}
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf,
      conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
{code}
We catch the exception:
{code}
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
{code}
So the HLog split itself did not happen.  We encountered a case where about 4 
regions that were recently split on the crashed RS were lost.

Can we abort the Master in such scenarios?  Please suggest.
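A hedged sketch of the behavior the report asks about: if splitLog() fails with an IOException and the filesystem check also fails, the master should stop rather than log the error and move on. SplitLogAbortSketch and its methods are illustrative stand-ins, not the real MasterFileSystem code:

```java
import java.io.IOException;

public class SplitLogAbortSketch {
    // Simulate HDFS being down, as in the reported scenario.
    static boolean checkFileSystem() { return false; }

    static void splitLog() throws IOException {
        throw new IOException("NameNode in safe mode");
    }

    // Returns true if the split succeeded or the failure looks transient,
    // false if the master should abort.
    static boolean splitLogOrAbort() {
        try {
            splitLog();
            return true;
        } catch (IOException e) {
            if (!checkFileSystem()) {
                // The trunk fix calls Runtime.getRuntime().halt(1) here; the
                // sketch just reports the decision so it stays testable.
                return false;
            }
            return true; // FS is fine; the failure may be transient
        }
    }

    public static void main(String[] args) {
        System.out.println(splitLogOrAbort()); // false: master would abort
    }
}
```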





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181437#comment-13181437
 ] 

Hadoop QA commented on HBASE-4357:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509667/HBASE-4357-0.92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/686//console

This message is automatically generated.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch
>
>





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181438#comment-13181438
 ] 

Zhihong Yu commented on HBASE-4357:
---

+1 on patch, if tests pass.

Minor comment:
{code}
-   * @return Transition znode to CLOSED state.
+   * @return if Transition znode to RS_ZK_REGION_FAILED_OPEN state succeeds or
+   *  not
{code}
The above should read '@return whether znode transition to ...'

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch
>
>





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181440#comment-13181440
 ] 

Zhihong Yu commented on HBASE-4357:
---

Please fix the following and submit new patch:
{code}
Hunk #6 FAILED at 111.
2 out of 6 hunks FAILED -- saving rejects to file 
src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java.rej
{code}

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch
>
>





[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181442#comment-13181442
 ] 

Zhihong Yu commented on HBASE-5137:
---

Thanks for reporting this, Ram.
This has been fixed in TRUNK:
{code}
} catch (IOException ioe) {
  LOG.warn("Failed splitting of " + serverNames, ioe);
  if (!checkFileSystem()) {
    LOG.warn("Bad Filesystem, exiting");
    Runtime.getRuntime().halt(1);
  }
{code}
So the answer to your question is: yes.

> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
> IOException
> 
>
> Key: HBASE-5137
> URL: https://issues.apache.org/jira/browse/HBASE-5137
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181451#comment-13181451
 ] 

Jimmy Xiang commented on HBASE-5081:


Now, all logs are split. I am happy with the patch.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>





[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181456#comment-13181456
 ] 

jirapos...@reviews.apache.org commented on HBASE-4224:
--



bq.  On 2011-12-25 17:53:31, Ted Yu wrote:
bq.  > /src/main/java/org/apache/hadoop/hbase/ServerName.java, line 277
bq.  > 
bq.  >
bq.  > I think we should perform stricter checking on hostname, without 
using DNS.
bq.  > See 
http://regexlib.com/DisplayPatterns.aspx?cattabindex=1&categoryId=2&AspxAutoDetectCookieSupport=1

I am assuming the hostname can be either a DNS name or an IPv4 address. Is IPv6 
supported yet?
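For reference, a strict syntactic check along the lines Ted suggests — no DNS lookup, accepting only an RFC 1123-style hostname or a dotted-quad IPv4 address — could look like the sketch below. HostCheck and isValidHost are illustrative names, not ServerName's actual API, and IPv6 is deliberately left out, matching the open question above:

```java
import java.util.regex.Pattern;

public class HostCheck {
    // Labels: alphanumeric plus inner hyphens, 1-63 chars; whole name <= 253.
    private static final Pattern HOSTNAME = Pattern.compile(
        "^(?=.{1,253}$)([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\\.)*"
        + "[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$");
    // Four octets, each 0-255.
    private static final Pattern IPV4 = Pattern.compile(
        "^((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}"
        + "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)$");

    static boolean isValidHost(String s) {
        if (s == null) return false;
        // Strings made only of digits and dots must be valid IPv4 addresses;
        // otherwise "10.0.0.256" would slip through as a "hostname".
        if (s.matches("[0-9.]+")) return IPV4.matcher(s).matches();
        return HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHost("rs1.example.com")); // true
        System.out.println(isValidHost("10.0.0.256"));      // false: octet > 255
    }
}
```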


- Akash


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3308/#review4116
---


On 2011-12-24 04:31:50, Akash  Ashok wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3308/
bq.  ---
bq.  
bq.  (Updated 2011-12-24 04:31:50)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Flush by RegionServer
bq.  
bq.  
bq.  This addresses bug HBase-4224.
bq.  https://issues.apache.org/jira/browse/HBase-4224
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq./src/main/java/org/apache/hadoop/hbase/ServerName.java 1222902 
bq./src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1222902 
bq./src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1222902 
bq./src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
1222902 
bq.  
bq.  Diff: https://reviews.apache.org/r/3308/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Akash
bq.  
bq.



> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: stack
>Assignee: Akash Ashok
> Attachments: HBase-4224-v2.patch, HBase-4224.patch
>
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver; to let go of logs, we need all edits flushed from memory, but 
> flush is currently only available by table or region.  We need to be able to 
> flush a whole regionserver.  Need to add this.





[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181457#comment-13181457
 ] 

Jimmy Xiang commented on HBASE-5081:


@Stack, yes, it will screw up the cluster (7 nodes).

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 5081-deleteNode-with-while-loop.txt, 
> HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, 
> distributed_log_splitting_screen_shot2.png, 
> distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
> there forever.  Please see the attached screen shot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found this out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. The ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. The zookeeper watcher of one of the newly submitted tasks found that the 
> task was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed, but the task created in step 6 is an orphan, so 
> the batch.err counter came up one short, and the log splitting hangs there, 
> waiting forever for the last task to finish.
> So I think the problem is step 2.  The fix is to make the deletion synchronous 
> instead of asynchronous, so that the retry has a clean start.
> An async deleteNode interferes with the split-log retry.  In an extreme 
> situation, if the async deleteNode doesn't happen soon enough, a node created 
> during the retry could be deleted.
> deleteNode should be synchronous.
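
The step-2 fix described above can be sketched as a latch that turns a
callback-style asynchronous delete into a blocking call. This is a
self-contained toy, not the actual HBase/ZooKeeper code; the task map and
method names are hypothetical stand-ins:

```java
import java.util.concurrent.*;

public class SyncDelete {
    // Hypothetical stand-ins for the split-log task map and the async delete;
    // the real code deletes znodes via an asynchronous ZooKeeper callback.
    static ConcurrentMap<String, String> tasks = new ConcurrentHashMap<>();
    static ExecutorService pool = Executors.newSingleThreadExecutor();

    static void deleteAsync(String node, Runnable callback) {
        pool.submit(() -> { tasks.remove(node); callback.run(); });
    }

    // The fix sketched in the description: block on a latch until the async
    // callback fires, so the retry starts from a clean slate.
    static void deleteSync(String node) {
        CountDownLatch done = new CountDownLatch(1);
        deleteAsync(node, done::countDown);
        try {
            done.await();                       // wait for the callback
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        tasks.put("/hbase/splitlog/task1", "FAILED");
        deleteSync("/hbase/splitlog/task1");
        // Safe to recreate the task now; no pending delete can remove it later.
        System.out.println(tasks.isEmpty());    // true
        pool.shutdown();
    }
}
```

With the synchronous variant, step 4's recreated tasks can never be removed by
a delete that was still in flight from step 2.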

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181461#comment-13181461
 ] 

Zhihong Yu commented on HBASE-4224:
---

Advice to HBase users is to turn off IPv6; e.g., see the thread entitled 'hbase 
master is not starting @ 60010 on new ubutnu 11.04 system' on user@hbase.

You can leave a comment or TODO stating the need to revisit this when IPv6 
support is added.

> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: stack
>Assignee: Akash Ashok
> Attachments: HBase-4224-v2.patch, HBase-4224.patch
>
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver.  To let go of logs, we need all edits emptied from memory, but 
> the only flush available is by table or region.  We need to be able to flush a 
> whole regionserver.  Need to add this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181465#comment-13181465
 ] 

Lars Hofhansl commented on HBASE-5088:
--

Do you think you can run your test with this again for comparison?

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all the methods in this class 
> are synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.
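
The hazard described here mirrors the documented behavior of the JDK's
synchronized-map wrappers: headMap() returns a live view of the backing map,
and the caller must hold the wrapper's own lock while using that view. A
minimal sketch using a plain synchronizedSortedMap (class and method names are
illustrative, not HBase code):

```java
import java.util.*;

public class HeadMapSync {
    static final SortedMap<String, String> cache =
        Collections.synchronizedSortedMap(new TreeMap<>());

    // Look up the value for the greatest key strictly less than 'row'.
    // headMap() is a live view of the backing TreeMap, so per the JDK docs we
    // must hold the wrapper's lock while touching it; without the synchronized
    // block, a concurrent put/remove can corrupt the traversal -- the race
    // described in this issue.
    static String lastBefore(String row) {
        synchronized (cache) {                  // explicit lock around the view
            SortedMap<String, String> head = cache.headMap(row);
            return head.isEmpty() ? null : head.get(head.lastKey());
        }
    }

    public static void main(String[] args) {
        cache.put("row-a", "server1");
        cache.put("row-m", "server2");
        System.out.println(lastBefore("row-z"));  // server2
    }
}
```

The per-method synchronization of SoftValueSortedMap protects each call, but
nothing protects the returned view, which is why the fix needs a shared sync
object around every use of the view.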

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5121:
--

Attachment: 5121.90

Patch for 0.90 branch.

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, one row has KeyValues in two column families.
> We found an infrequent problem in scan's next() when a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> With more families, I think the scenario would be even stranger...
> We found the reason: StoreScanner.peek() changes after a majorCompaction if 
> there are delete-type KeyValues.
> This change means the PriorityQueue backing RegionScanner's heap is no longer 
> guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181487#comment-13181487
 ] 

Hadoop QA commented on HBASE-5121:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509698/5121.90
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/687//console

This message is automatically generated.

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, one row has KeyValues in two column families.
> We found an infrequent problem in scan's next() when a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> With more families, I think the scenario would be even stranger...
> We found the reason: StoreScanner.peek() changes after a majorCompaction if 
> there are delete-type KeyValues.
> This change means the PriorityQueue backing RegionScanner's heap is no longer 
> guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5138) [ref manual] Add a discussion on the number of regions

2012-01-06 Thread Jean-Daniel Cryans (Created) (JIRA)
[ref manual] Add a discussion on the number of regions
--

 Key: HBASE-5138
 URL: https://issues.apache.org/jira/browse/HBASE-5138
 Project: HBase
  Issue Type: Task
Reporter: Jean-Daniel Cryans


ntelford on IRC made the good point that we say people shouldn't have too many 
regions, but we don't say why. His problem currently is:

{quote}
09:21 < ntelford> problem is, if you're running MR jobs on a subset of that 
data, you need the regions to be as small as possible otherwise tasks don't get 
allocated in parallel much
09:22 < ntelford> so we've found we have to strike a balance between keeping 
them small for MR and keeping them large for HBase to behave well
09:22 < ntelford> we erred on the side of smaller regions because our MR issues 
were more immediate - we couldn't find any documentation or anecdotal evidence 
as to why HBase doesn't like lots of regions
{quote}

The three main issues I can think of when having too many regions are:

 - MSLAB requires 2MB per memstore (that's 2MB per family per region). 1000 
regions with 2 families each use 3.9GB of heap before storing any data. NB: 
the 2MB value is configurable.
 - if you fill all the regions at roughly the same rate, global memstore 
pressure forces tiny flushes when you have too many regions, which in turn 
generates compactions. Rewriting the same data tens of times is the last thing 
you want. For example, fill 1000 regions (with one family) equally, and take a 
lower bound for global memstore usage of 5GB (the region server would have a 
big heap). Once usage reaches 5GB, the server force-flushes the biggest region; 
at that point almost all regions hold about 5MB of data, so that is the amount 
flushed. Another 5MB of inserts later, it flushes another region that now has a 
bit over 5MB of data, and so on.
 - the new master is allergic to tons of regions and will take a lot of time 
assigning them and moving them around in batches. The reason is that it is 
heavy on ZK usage and not very async at the moment (this could really be 
improved).

Another issue is the effect of the number of regions on mapreduce jobs. Keeping 
5 regions per RS would be too low for a job, whereas 1000 will generate too 
many maps. This comes back to ntelford's problem of needing to scan portions of 
tables. To solve his problem, we discussed using a custom input format that 
generates many splits per region.
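
The heap arithmetic in the first bullet can be checked with a few lines; the
2MB-per-memstore figure is the default mentioned above and is configurable in
practice:

```java
public class RegionHeap {
    // Fixed MSLAB chunk reserved per memstore; there is one memstore per
    // family per region. 2 MB is the default (configurable).
    static final long MSLAB_BYTES = 2L * 1024 * 1024;

    // Heap reserved by MSLAB alone, in GiB, before any data is stored.
    static double mslabHeapGiB(int regions, int familiesPerRegion) {
        return regions * familiesPerRegion * MSLAB_BYTES
               / (1024.0 * 1024 * 1024);
    }

    public static void main(String[] args) {
        // 1000 regions with 2 families each, as in the bullet above.
        System.out.println(mslabHeapGiB(1000, 2));  // 3.90625, i.e. ~3.9GB
    }
}
```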

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5121:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509698/5121.90
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/687//console

This message is automatically generated.)

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, one row has KeyValues in two column families.
> We found an infrequent problem in scan's next() when a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> With more families, I think the scenario would be even stranger...
> We found the reason: StoreScanner.peek() changes after a majorCompaction if 
> there are delete-type KeyValues.
> This change means the PriorityQueue backing RegionScanner's heap is no longer 
> guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5121:
--

Priority: Critical  (was: Major)

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, one row has KeyValues in two column families.
> We found an infrequent problem in scan's next() when a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> With more families, I think the scenario would be even stranger...
> We found the reason: StoreScanner.peek() changes after a majorCompaction if 
> there are delete-type KeyValues.
> This change means the PriorityQueue backing RegionScanner's heap is no longer 
> guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-06 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181500#comment-13181500
 ] 

ramkrishna.s.vasudevan commented on HBASE-5137:
---

@Ted
One more thing: we should abort even without checking the file system. When we 
check the file system and it reports that the file system is fine, we don't 
abort, but the log split still has not happened.



> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
> IOException
> 
>
> Key: HBASE-5137
> URL: https://issues.apache.org/jira/browse/HBASE-5137
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> I am not sure if this bug was already raised in JIRA.
> In our test cluster we had a scenario where the RS had gone down and the 
> ServerShutDownHandler started splitLog.
> But as HDFS was down, the waitOnSafeMode check threw an IOException.
> {code}
> try {
> // If FS is in safe mode, just wait till out of it.
> FSUtils.waitOnSafeMode(conf,
>   conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));  
> splitter.splitLog();
>   } catch (OrphanHLogAfterSplitException e) {
> {code}
> We catch the exception
> {code}
> } catch (IOException e) {
>   checkFileSystem();
>   LOG.error("Failed splitting " + logDir.toString(), e);
> }
> {code}
> So the HLog split itself did not happen. We encountered a case where 4 
> regions that had recently been split on the crashed RS were lost.
> Can we abort the Master in such scenarios? Please suggest.
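
The suggested behavior can be sketched as a toy: on IOException, abort
unconditionally, even when the file-system check reports a healthy FS.
Everything here (method names, the aborted flag) is a hypothetical stand-in for
the real master code:

```java
public class SplitLogAbort {
    static boolean aborted = false;

    // Simulate HDFS being down during the safe-mode wait.
    static void waitOnSafeMode() throws java.io.IOException {
        throw new java.io.IOException("HDFS is down");
    }

    // The FS check may well report healthy by the time it runs -- which is
    // exactly why it must not gate the abort decision.
    static boolean checkFileSystem() { return true; }

    static void splitLog() {
        try {
            waitOnSafeMode();
            // splitter.splitLog() would run here
        } catch (java.io.IOException e) {
            checkFileSystem();   // still useful for diagnostics
            aborted = true;      // abort the master instead of just logging
        }
    }

    public static void main(String[] args) {
        splitLog();
        System.out.println(aborted);  // true: abort even though FS checks out
    }
}
```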

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5088:
--

Fix Version/s: 0.90.6

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all the methods in this class 
> are synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181502#comment-13181502
 ] 

ramkrishna.s.vasudevan commented on HBASE-5121:
---

@Ted
Thanks for the 0.90 patch.

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, one row has KeyValues in two column families.
> We found an infrequent problem in scan's next() when a majorCompaction 
> happens concurrently.
> Across two consecutive client calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> With more families, I think the scenario would be even stranger...
> We found the reason: StoreScanner.peek() changes after a majorCompaction if 
> there are delete-type KeyValues.
> This change means the PriorityQueue backing RegionScanner's heap is no longer 
> guaranteed to be sorted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-06 Thread Zhihong Yu (Created) (JIRA)
Compute (weighted) median using AggregateProtocol
-

 Key: HBASE-5139
 URL: https://issues.apache.org/jira/browse/HBASE-5139
 Project: HBase
  Issue Type: Sub-task
Reporter: Zhihong Yu


Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. This 
task finds out the median value among the values of cf:cq1 (See 
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)

This can be done in two passes.
The first pass utilizes AggregateProtocol where the following tuple is returned 
from each region:
(start-rowkey, partial-sum-of-values, partial-sum-of-weights)
The start-rowkey is used to sort the tuples. This way we can determine which 
region (call it R) contains the (weighted) median. partial-sum-of-weights can 
be 0 if the unweighted median is sought.

The second pass involves scanning region R and computing the partial (weighted) 
sum until the threshold of S/2 is crossed, where S is the total (weighted) sum 
from pass one. The (weighted) median is then returned.

However, this approach wouldn't work if there is mutation in the underlying 
table between pass one and pass two.

In that case, sequential scanning seems to be the solution which is slower than 
the above approach.
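
The second-pass threshold scan can be sketched as follows, assuming the
candidate values arrive sorted ascending (as a region scan would deliver them).
This illustrates the S/2 crossing rule only; it is not AggregateProtocol code:

```java
public class WeightedMedian {
    // Given values sorted ascending with non-negative weights, return the
    // first value at which the running weight reaches half the total weight S
    // (pass one of the approach above only locates the region holding this
    // crossing point; this is the scan inside that region).
    static double weightedMedian(double[] values, double[] weights) {
        double total = 0;
        for (double w : weights) total += w;       // S, the total weight
        double running = 0;
        for (int i = 0; i < values.length; i++) {
            running += weights[i];
            if (running >= total / 2) return values[i];   // crossed S/2
        }
        throw new IllegalArgumentException("total weight must be positive");
    }

    public static void main(String[] args) {
        double[] v = {1, 2, 3, 4};
        double[] w = {1, 1, 1, 5};   // heavy weight on 4 pulls the median up
        System.out.println(weightedMedian(v, w));  // 4.0
    }
}
```

With unit weights this reduces to the ordinary (lower) median, matching the
"partial-sum-of-weights can be 0" unweighted case once weights default to 1.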

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option

2012-01-06 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181512#comment-13181512
 ] 

jirapos...@reviews.apache.org commented on HBASE-4224:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3308/
---

(Updated 2012-01-06 18:48:11.359389)


Review request for hbase.


Changes
---

Changes:
1. Stricter hostname checking in ServerName.java
2. Flushes made completely async.
3. WAL Roll moved to HRegionServer.java only for RegionServer flushes
4. Executor Service with named ThreadFactory
5. Dependency on Google Guava removed


Summary
---

Flush by RegionServer


This addresses bug HBase-4224.
https://issues.apache.org/jira/browse/HBase-4224


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1226330 
  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
1226330 
  /src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1226330 
  /src/main/java/org/apache/hadoop/hbase/ServerName.java 1226330 

Diff: https://reviews.apache.org/r/3308/diff


Testing
---


Thanks,

Akash



> Need a flush by regionserver rather than by table option
> 
>
> Key: HBASE-4224
> URL: https://issues.apache.org/jira/browse/HBASE-4224
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: stack
>Assignee: Akash Ashok
> Attachments: HBase-4224-v2.patch, HBase-4224.patch
>
>
> This evening I needed to clean out logs on the cluster.  Logs are kept per 
> regionserver.  To let go of logs, we need all edits emptied from memory, but 
> the only flush available is by table or region.  We need to be able to flush a 
> whole regionserver.  Need to add this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181515#comment-13181515
 ] 

Lars Hofhansl commented on HBASE-5088:
--

I think we should check in the latest change. Since we cannot reliably 
reproduce the issue, we have to verify by inspection. The latest change 
(5088-syncObject.txt) is the smallest change from the original code and adds 
the least extra synchronization needed.

I'll fix up the patch slightly (pull in the interface cleanups from my other 
patch).

Ted, Stack, etc, please weigh in.


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all the methods in this class 
> are synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Created) (JIRA)
TableInputFormat subclass to allow N number of splits per region during MR jobs
---

 Key: HBASE-5140
 URL: https://issues.apache.org/jira/browse/HBASE-5140
 Project: HBase
  Issue Type: New Feature
  Components: mapreduce
Reporter: Josh Wymer
Priority: Trivial


In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I 
am working on a subclass of TableInputFormat that overrides getSplits in order 
to generate N splits per region and/or N splits per job. The idea is to convert 
each region's startKey and endKey from byte[] to BigDecimal, take the 
difference, divide by N, convert back to byte[], and generate splits at the 
resulting values. Assuming your keys are fully distributed, this should produce 
splits with nearly the same number of rows each. Any suggestions on this issue 
are welcome.
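
A sketch of the key arithmetic, using BigInteger rather than the BigDecimal
mentioned above (integer arithmetic suffices for whole keys); real row-key
padding and encoding details are elided, and the class name is hypothetical:

```java
import java.math.BigInteger;

public class RegionSplitter {
    // Interpret start/end row keys as unsigned integers, divide the range
    // into n equal strides, and emit the n-1 interior boundary keys.
    static byte[][] splitKeys(byte[] start, byte[] end, int n) {
        BigInteger lo = new BigInteger(1, start);   // unsigned interpretation
        BigInteger hi = new BigInteger(1, end);
        BigInteger stride = hi.subtract(lo).divide(BigInteger.valueOf(n));
        byte[][] boundaries = new byte[n - 1][];
        for (int i = 1; i < n; i++) {
            boundaries[i - 1] = lo.add(stride.multiply(BigInteger.valueOf(i)))
                                  .toByteArray();
        }
        return boundaries;
    }

    public static void main(String[] args) {
        // Split the key range [0x00, 0x40) into 4 pieces.
        byte[][] b = splitKeys(new byte[]{0x00}, new byte[]{0x40}, 4);
        for (byte[] key : b) {
            System.out.println(new BigInteger(1, key));  // prints 16, 32, 48
        }
    }
}
```

Note the "fully distributed keys" assumption from the description: the
boundaries are equal in key space, not necessarily equal in row count.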

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181515#comment-13181515
 ] 

Lars Hofhansl edited comment on HBASE-5088 at 1/6/12 7:00 PM:
--

I think we should check in the latest change. Since we cannot reliably 
reproduce the issue, we have to verify by inspection. The latest change 
(5088-syncObject.txt) is the smallest change from the original code and adds 
the least extra synchronization needed.

I'll fix up the patch slightly (pull in the interface cleanups from my other 
patch).

Ted, Stack, etc, please weigh in.


  was (Author: lhofhansl):
I think we should check in the latest change. Since we cannot reliably 
reproduce the issue, we have verify by inspection. The latest change 
(5088-syncObject.txt) will be least amount of change from the original code and 
adds the lowest least extra synchronization needed.

I'll fix up the patch slightly (pull in the interface cleanups from my other 
patch).

Ted, Stack, etc, please weigh in.

  
> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-syncObj.txt, 5088-useMapInterfaces.txt, 
> 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all the methods in this class 
> are synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Status: Open  (was: Patch Available)

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all the methods in this class 
> are synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181524#comment-13181524
 ] 

Zhihong Yu commented on HBASE-5088:
---

I think we'd better verify the performance first.

Since it is not easily reproducible, this issue is not critical enough to 
block 0.92 RC3.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate 
> on this view map (e.g. add/delete) in another thread, a concurrency issue may 
> occur.





[jira] [Updated] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Ming Ma (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HBASE-4357:
---

Attachment: HBASE-4357-0.92.patch

ok. try this one.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from 
> sea-esxi-0 to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Attachment: 5088-final.txt

Here's a proposal for a final patch.


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate 
> on this view map (e.g. add/delete) in another thread, a concurrency issue may 
> occur.





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181529#comment-13181529
 ] 

Lars Hofhansl commented on HBASE-5088:
--

@Ted: This change is the minimum we *must* do (in the sense that it is the 
minimum extra synchronization to make it correct), even if it takes a perf hit.


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements is fine.
> But HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate 
> on this view map (e.g. add/delete) in another thread, a concurrency issue may 
> occur.





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181531#comment-13181531
 ] 

Hadoop QA commented on HBASE-4357:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509708/HBASE-4357-0.92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/688//console

This message is automatically generated.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4357-0.92.patch, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from 
> sea-esxi-0 to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181536#comment-13181536
 ] 

Zhihong Yu commented on HBASE-5140:
---

We should consider the amount of computing involved in the map/reduce tasks.
The assumption expressed in the description may not be satisfied in various 
scenarios.

I think we can provide abstraction over key space partitioning by introducing 
an interface.
The BigDecimal idea would be one implementation.
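
Both suggestions can be sketched together: an interface abstracting key-space partitioning, with the numeric-difference idea from the issue description as one implementation. All names below are invented for illustration (they come from no patch), and BigInteger is used instead of BigDecimal for clarity; keys are right-padded to a common width and treated as unsigned integers.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical abstraction over key-space partitioning.
interface KeySpacePartitioner {
    /** Return the n-1 intermediate split points between start and end. */
    byte[][] split(byte[] startKey, byte[] endKey, int n);
}

// One implementation: interpret keys as unsigned integers and take
// n-1 evenly spaced points in [startKey, endKey).
public class NumericPartitioner implements KeySpacePartitioner {
    public byte[][] split(byte[] startKey, byte[] endKey, int n) {
        int width = Math.max(startKey.length, endKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, width));
        BigInteger range = hi.subtract(lo);
        byte[][] points = new byte[n - 1][];
        for (int i = 1; i < n; i++) {
            BigInteger p = lo.add(range.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(n)));
            points[i - 1] = toWidth(p, width);
        }
        return points;
    }

    // Left-pad the magnitude back out to the original key width.
    private static byte[] toWidth(BigInteger v, int width) {
        byte[] raw = v.toByteArray();   // may carry a leading sign byte
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[][] p = new NumericPartitioner()
            .split(new byte[] {0x00}, new byte[] {0x10}, 4);
        // [0x00, 0x10) cut four ways yields points 0x04, 0x08, 0x0c.
        System.out.println(Arrays.toString(p[1])); // [8]
    }
}
```

Swapping in a "first 8 bytes as a long" scheme would then just be another implementation of the same interface.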

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], 
> I am working on a subclass of the TableInputFormat class that overrides 
> getSplits in order to generate N splits per region and/or N splits per job. 
> The idea is to convert the startKey and endKey of each region from byte[] to 
> BigDecimal, take the difference, divide by N, convert back to byte[], and 
> generate splits on the resulting values. Assuming your keys are fully 
> distributed, this should produce nearly the same number of rows per split. 
> Any suggestions on this issue are welcome.





[jira] [Updated] (HBASE-5119) Set the TimeoutMonitor's timeout back down

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5119:
--

Fix Version/s: (was: 0.92.0)
   0.92.1

> Set the TimeoutMonitor's timeout back down
> --
>
> Key: HBASE-5119
> URL: https://issues.apache.org/jira/browse/HBASE-5119
> Project: HBase
>  Issue Type: Task
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
>
> The TimeoutMonitor used to be extremely racy and caused more troubles than it 
> fixed, but most of this has been fixed I believe in the context of 0.92 so I 
> think we should set it down back to a useful level. Currently it's 30 
> minutes, what should the new value be?
> I think 5 minutes should be good, will do some testing.





[jira] [Updated] (HBASE-5120) Timeout monitor races with table disable handler

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5120:
--

Fix Version/s: (was: 0.92.0)
   0.92.1

> Timeout monitor races with table disable handler
> 
>
> Key: HBASE-5120
> URL: https://issues.apache.org/jira/browse/HBASE-5120
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Zhihong Yu
>Priority: Blocker
> Fix For: 0.94.0, 0.92.1
>
> Attachments: HBASE-5120.patch
>
>
> Here is what J-D described here:
> https://issues.apache.org/jira/browse/HBASE-5119?focusedCommentId=13179176&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13179176
> I think I will retract my statement that it "used to be extremely racy 
> and caused more troubles than it fixed"; on my first test I got a stuck 
> region in transition instead of being able to recover. The timeout was set to 
> 2 minutes to be sure I hit it.
> First the region gets closed
> {quote}
> 2012-01-04 00:16:25,811 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
> sv4r5s38,62023,1325635980913 for region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> {quote}
> 2 minutes later it times out:
> {quote}
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636185810, server=null
> 2012-01-04 00:18:30,026 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
> PENDING_CLOSE for too long, running forced unassign again on 
> region=test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,027 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
> region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> (offlining)
> {quote}
> 100ms later the master finally gets the event:
> {quote}
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Handling 
> transition=RS_ZK_REGION_CLOSED, server=sv4r5s38,62023,1325635980913, 
> region=1a4b111bcc228043e89f59c4c3f6a791, which is more than 15 seconds late
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
> event for 1a4b111bcc228043e89f59c4c3f6a791
> 2012-01-04 00:18:30,129 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: Table being disabled so 
> deleting ZK node and removing from regions in transition, skipping assignment 
> of region test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791.
> 2012-01-04 00:18:30,129 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Deleting existing unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 that is in expected state RS_ZK_REGION_CLOSED
> 2012-01-04 00:18:30,166 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Successfully deleted unassigned node for 
> region 1a4b111bcc228043e89f59c4c3f6a791 in expected state RS_ZK_REGION_CLOSED
> {quote}
> At this point everything is fine, the region was processed as closed. But 
> wait, remember that line where it said it was going to force an unassign?
> {quote}
> 2012-01-04 00:18:30,322 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> master:62003-0x134589d3db03587 Creating unassigned node for 
> 1a4b111bcc228043e89f59c4c3f6a791 in a CLOSING state
> 2012-01-04 00:18:30,328 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Server null returned 
> java.lang.NullPointerException: Passed server is null for 
> 1a4b111bcc228043e89f59c4c3f6a791
> {quote}
> Now the master is confused: it recreated the RIT znode, but the region 
> doesn't even exist anymore. It even tries to shut it down but is blocked by 
> NPEs. Now this is what's going on.
> The late ZK notification that the znode was deleted (but it got recreated 
> after):
> {quote}
> 2012-01-04 00:19:33,285 DEBUG 
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. has been 
> deleted.
> {quote}
> Then it prints this, and much later tries to unassign it again:
> {quote}
> 2012-01-04 00:19:46,607 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, ts=1325636310328, server=null
> ...
> 2012-01-04 00:20:39,623 DEBUG 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Waiting on  region 
> to clear regions in transition; 
> test1,089cd0c9,1325635015491.1a4b111bcc228043e89f59c4c3f6a791. 
> state=PENDING_CLOSE, 

[jira] [Updated] (HBASE-5121) MajorCompaction may affect scan's correctness

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5121:
--

Fix Version/s: (was: 0.92.0)
   0.92.1

People expressed concern whether introducing a new exception is the best 
approach to take.

@Chunhui:
Please elaborate on other alternatives.

Thanks

> MajorCompaction may affect scan's correctness
> -
>
> Key: HBASE-5121
> URL: https://issues.apache.org/jira/browse/HBASE-5121
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.90.4
>Reporter: chunhui shen
>Assignee: chunhui shen
>Priority: Critical
> Fix For: 0.94.0, 0.92.1, 0.90.6
>
> Attachments: 5121-trunk-combined.txt, 5121.90, 
> hbase-5121-testcase.patch, hbase-5121.patch, hbase-5121v2.patch
>
>
> In our test, there are keyvalues for two families in one row.
> But we found an infrequent problem when doing scan's next() if 
> majorCompaction happens concurrently.
> In the client's two consecutive calls to scan.next():
> 1. The first next() returns a result where family A is null.
> 2. The second next() returns a result where family B is null.
> The two next() results have the same row.
> If there are more families, I think the scenario will be even stranger...
> We found the reason is that storescanner.peek() changes after 
> majorCompaction if there are delete-type KeyValues.
> This change means the PriorityQueue of the RegionScanner's heap 
> is no longer guaranteed to be sorted.





[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-01-06 Thread Kannan Muthukkaruppan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181546#comment-13181546
 ] 

Kannan Muthukkaruppan commented on HBASE-5104:
--

@Lars/Stack: In 89-fb, we had done this for adding a reliable "limit" mechanism 
(this works per-CF/per-row). Madhu had implemented this. The rev is here: 
http://svn.apache.org/viewvc?view=revision&revision=1181562. [I don't think 
this is ported to trunk yet.] We were thinking of extending/doing something 
similar for "offset".

Lars: The startColumn type of approach doesn't work in some cases, for example 
when you are filtering with a ColumnValueFilter instead of by column names. 
[See my previous post.]

Already, when we specify attributes such as timerange() or add a CF or specific 
column names, it applies to each row. So one way to think of this is that 
limit/offset are also applicable within each row the Scan encounters. Most 
folks are going to use it for "Get" (single row scans), but there is no need to 
preclude the functionality from a multi-row Scan either.

This is the API that was added in 89-fb:
{code}
 /**
   * Set the maximum number of values to return per row per Column Family
   * @param limit the maximum number of values returned / row / CF
   */
public void setMaxResultsPerColumnFamily(int limit)
{code}

The thought was we could add something like:
{code}
 /**
   * Skip the first offset values per row per Column Family
   * @param offset the number of values to be skipped per row / CF
   */
public void setOffsetPerColumnFamily(int offset)
{code}
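
The intended semantics of such an offset/limit pair can be modeled on a plain sorted list of column names (a sketch of behavior only; nothing below is HBase API, and the class name is invented): skip `offset` matching values within the row, then return at most `limit` of them.

```java
import java.util.ArrayList;
import java.util.List;

public class IntraRowPageSketch {
    // Return the page of columns [offset, offset + limit) from a row's
    // sorted column list, clamped to the row's actual size.
    static List<String> page(List<String> sortedColumns, int offset, int limit) {
        List<String> out = new ArrayList<>();
        for (int i = offset; i < sortedColumns.size() && out.size() < limit; i++) {
            out.add(sortedColumns.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("c1", "c2", "c3", "c4", "c5", "c6", "c7");
        // offset 5, limit 5: only two values remain in this row.
        System.out.println(page(cols, 5, 5)); // [c6, c7]
    }
}
```

Because the skipping happens per row per column family, a multi-row Scan would apply the same window independently within each row it encounters.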





> Provide a reliable intra-row pagination mechanism
> -
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Madhuwanti Vaidya
> Attachments: testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This 
> is not the case for ColumnPaginationFilter as its internal state gets updated 
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new 
> ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
> */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
> The FilterList filter does not handle this return code properly (it treats it 
> as INCLUDE).


[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181550#comment-13181550
 ] 

Josh Wymer commented on HBASE-5140:
---

We also talked about other methods such as using the first 8 bytes of the keys 
and converting to a long. This could indeed be solved by an interface.
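
The "first 8 bytes as a long" alternative could look like the sketch below (illustration only; the method name is invented here). One caveat worth noting: keys whose first byte is >= 0x80 come out negative, so a real implementation would need unsigned comparison.

```java
public class KeyPrefixSketch {
    // Read the first 8 bytes of a row key as a big-endian long,
    // right-padding short keys with zero bytes.
    static long keyPrefixAsLong(byte[] key) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (i < key.length ? (key[i] & 0xFFL) : 0);
        }
        return v;
    }

    public static void main(String[] args) {
        // 0x0000000000000100 == 256
        System.out.println(keyPrefixAsLong(new byte[] {0, 0, 0, 0, 0, 0, 1, 0})); // 256
    }
}
```

Arithmetic on longs is cheaper than BigDecimal, at the cost of ignoring everything past the eighth byte of the key.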

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], 
> I am working on a subclass of the TableInputFormat class that overrides 
> getSplits in order to generate N splits per region and/or N splits per job. 
> The idea is to convert the startKey and endKey of each region from byte[] to 
> BigDecimal, take the difference, divide by N, convert back to byte[], and 
> generate splits on the resulting values. Assuming your keys are fully 
> distributed, this should produce nearly the same number of rows per split. 
> Any suggestions on this issue are welcome.





[jira] [Commented] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181552#comment-13181552
 ] 

Zhihong Yu commented on HBASE-5104:
---

@Kannan:
The above methods look good.
Minor suggestion: I think the method names should reflect the nature of 
intra-row pagination. Currently people need to read the javadoc to get that.
Basically we should distinguish this feature from inter-row pagination support.

> Provide a reliable intra-row pagination mechanism
> -
>
> Key: HBASE-5104
> URL: https://issues.apache.org/jira/browse/HBASE-5104
> Project: HBase
>  Issue Type: Bug
>Reporter: Kannan Muthukkaruppan
>Assignee: Madhuwanti Vaidya
> Attachments: testFilterList.rb
>
>
> Addendum:
> Doing pagination (retrieving at most "limit" number of KVs at a particular 
> "offset") is currently supported via the ColumnPaginationFilter. However, it 
> is not a very clean way of supporting pagination.  Some of the problems with 
> it are:
> * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
> same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This 
> is not the case for ColumnPaginationFilter as its internal state gets updated 
> depending on whether or not Filter(A) returns TRUE/FALSE for a particular 
> cell.
> * When this Filter is used in combination with other filters (e.g., doing AND 
> with another filter using FilterList), the behavior of the query depends on 
> the order of filters in the FilterList. This is not ideal.
> * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
> versions of the cell as separate values even if another filter upstream or 
> the ScanQueryMatcher is going to reject the value for other reasons.
> Seems like we need a reliable way to do pagination. The particular use case 
> that prompted this JIRA is pagination within the same rowKey. For example, 
> for a given row key R, get columns with prefix P, starting at offset X (among 
> columns which have prefix P) and limit Y. Some possible fixes might be:
> 1) enhance ColumnPrefixFilter to support another constructor which supports 
> limit/offset.
> 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
> as a filter) [Like SQL].
> Original Post:
> Thanks Jiakai Liu for reporting this issue and doing the initial 
> investigation. Email from Jiakai below:
> Assuming that we have an index column family with the following entries:
> "tag0:001:thread1"
> ...
> "tag1:001:thread1"
> "tag1:002:thread2"
> ...
> "tag1:010:thread10"
> ...
> "tag2:001:thread1"
> "tag2:005:thread5"
> ...
> To get threads with "tag1" in range [5, 10), I tried the following code:
> ColumnPrefixFilter filter1 = new 
> ColumnPrefixFilter(Bytes.toBytes("tag1"));
> ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit 
> */, 5 /* offset */);
> FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
> filters.addFilter(filter1);
> filters.addFilter(filter2);
> Get get = new Get(USER);
> get.addFamily(COLUMN_FAMILY);
> get.setMaxVersions(1);
> get.setFilter(filters);
> Somehow it didn't work as expected. It returned the entries as if filter1 
> were not set.
> Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
> The FilterList filter does not handle this return code properly (it treats it 
> as INCLUDE).





[jira] [Updated] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5081:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Will open a new issue if another distributed log splitting bug is discovered.

> Distributed log splitting deleteNode races against splitLog retry 
> --
>
> Key: HBASE-5081
> URL: https://issues.apache.org/jira/browse/HBASE-5081
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Jimmy Xiang
>Assignee: Prakash Khemani
> Fix For: 0.92.0
>
> Attachments: 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> 5081-deleteNode-with-while-loop.txt, 
> HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
> distributed-log-splitting-screenshot.png, 
> distributed_log_splitting_screen_shot2.png, 
> distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
> hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
> hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
> patch_for_92_v3.txt
>
>
> Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
> forever.  Please see the attached screen shot.
> I looked into it, and here is what I think happened:
> 1. One RS died; the ServerShutdownHandler found out and started the 
> distributed log splitting;
> 2. All three tasks failed, so the three tasks were deleted, asynchronously;
> 3. ServerShutdownHandler retried the log splitting;
> 4. During the retry, it created these three tasks again and put them in a 
> hashmap (tasks);
> 5. The asynchronous deletion from step 2 finally happened for one task; in 
> the callback, it removed that task from the hashmap;
> 6. One of the newly submitted tasks' ZooKeeper watchers found that the task 
> was unassigned and not in the hashmap, so it created a new orphan task.
> 7. All three tasks failed again, but the task created in step 6 was an 
> orphan, so the batch.err counter came up one short, and the log splitting 
> hangs forever, waiting for the last task to finish, which never happens.
> So I think the problem is step 2.  The fix is to make the deletion synchronous 
> instead of asynchronous, so that the retry gets a clean start.
> An async deleteNode interferes with split log retries.  In the extreme case, 
> if the async deleteNode is delayed long enough, a node created during the 
> retry could be deleted.
> deleteNode should be synchronous.
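The race described above can be sketched as a deterministic, self-contained simulation. The names here are hypothetical stand-ins; the real code lives in SplitLogManager and uses ZooKeeper delete callbacks:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // "tasks" plays the role of SplitLogManager's task hashmap.
        Map<String, String> tasks = new HashMap<>();
        // Deferred runnables model ZooKeeper deleteNode callbacks that have
        // been requested but have not fired yet.
        List<Runnable> pendingDeleteCallbacks = new ArrayList<>();

        // Attempt 1: three tasks are created, fail, and are deleted *asynchronously*.
        for (String log : List.of("log1", "log2", "log3")) {
            tasks.put(log, "ATTEMPT1");
            pendingDeleteCallbacks.add(() -> tasks.remove(log)); // not fired yet
        }

        // Retry: the same three tasks are re-created before the callbacks fire.
        for (String log : List.of("log1", "log2", "log3")) {
            tasks.put(log, "RETRY");
        }

        // A stale delete callback finally fires and removes a *retried* task,
        // leaving the batch counter one short -- the hang described above.
        pendingDeleteCallbacks.get(0).run();

        System.out.println(tasks.size());
    }
}
```

With a synchronous delete, all callbacks would have completed before the retry re-populated the map, so no retried task could be lost.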

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181558#comment-13181558
 ] 

Zhihong Yu commented on HBASE-5088:
---

@Lars:
Agreed.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of the methods in this 
> class are synchronized, so adding/deleting elements through those methods is 
> fine.
> However, HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.
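A minimal standalone sketch of the hazard described above: even when the map itself is synchronized, a headMap view shares the backing TreeMap's fail-fast iterator state, so structural changes to the backing map break iteration of the view. This models the pattern, not SoftValueSortedMap itself:

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.SortedMap;
import java.util.TreeMap;

public class Main {
    public static void main(String[] args) {
        // A fully synchronized map does not protect a headMap *view*:
        // the view shares the backing TreeMap's fail-fast modCount.
        SortedMap<Integer, String> backing =
            Collections.synchronizedSortedMap(new TreeMap<>());
        for (int i = 0; i < 10; i++) backing.put(i, "v" + i);

        SortedMap<Integer, String> view = backing.headMap(5); // a view, not a copy

        boolean concurrentModificationSeen = false;
        try {
            for (Integer key : view.keySet()) {
                // Simulates another thread mutating the backing map while
                // this thread iterates the view.
                backing.remove(7);
                backing.put(100, "x");
            }
        } catch (ConcurrentModificationException e) {
            concurrentModificationSeen = true;
        }
        System.out.println(concurrentModificationSeen);
    }
}
```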





[jira] [Updated] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4357:
--

Attachment: 4357.txt

Manually resolved some conflicts.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 
> to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Updated] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4357:
--

Attachment: (was: HBASE-4357-0.92.patch)

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 
> to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181565#comment-13181565
 ] 

Josh Wymer commented on HBASE-5140:
---

One glaring issue is the lack of start and end keys for single-region tables. To 
get the start key we could do a quick scan of the first row and take its key. 
For the last region of a table, I'm not sure how we'll determine the end key 
other than setting it to the max value of whatever data type (e.g. long) we are 
using for the split calculations. Any suggestions other than this?

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.
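The key arithmetic described above can be sketched as follows. This is a simplified illustration using BigInteger rather than BigDecimal; the method names are hypothetical, and real region keys would need more careful padding and truncation of the generated split points:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class Main {
    // Interpolates n-1 split points between start and end: keys are
    // right-padded to a common width, treated as unsigned integers, the
    // difference is divided by n, and intermediate values become split keys.
    static List<byte[]> splitRange(byte[] start, byte[] end, int n) {
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, rightPad(start, width));
        BigInteger hi = new BigInteger(1, rightPad(end, width));
        BigInteger step = hi.subtract(lo).divide(BigInteger.valueOf(n));
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < n; i++) {
            splits.add(lo.add(step.multiply(BigInteger.valueOf(i))).toByteArray());
        }
        return splits;
    }

    static byte[] rightPad(byte[] key, int width) {
        byte[] padded = new byte[width]; // zero-filled beyond the key
        System.arraycopy(key, 0, padded, 0, key.length);
        return padded;
    }

    public static void main(String[] args) {
        // Splitting the range [0x00, 0x40) into 4 pieces yields 3 split points.
        for (byte[] split : splitRange(new byte[]{0x00}, new byte[]{0x40}, 4)) {
            System.out.println(new BigInteger(1, split));
        }
    }
}
```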





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181572#comment-13181572
 ] 

Zhihong Yu commented on HBASE-5140:
---

I suggest utilizing this method in HTable:
{code}
  public Pair<byte[][], byte[][]> getStartEndKeys() throws IOException {
{code}
i.e. start and end keys are passed to the splitter interface.

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.





[jira] [Issue Comment Edited] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181572#comment-13181572
 ] 

Zhihong Yu edited comment on HBASE-5140 at 1/6/12 8:40 PM:
---

I suggest utilizing this method in HTable:
{code}
  public Pair<byte[][], byte[][]> getStartEndKeys() throws IOException {
{code}
i.e. start and end keys returned by the above method are passed to the splitter 
interface.

  was (Author: zhi...@ebaysf.com):
I suggest utilizing this method in HTable:
{code}
  public Pair<byte[][], byte[][]> getStartEndKeys() throws IOException {
{code}
i.e. start and end keys are passed to the splitter interface.
  
> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181577#comment-13181577
 ] 

Hadoop QA commented on HBASE-4357:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509710/4357.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 79 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/689//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/689//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/689//console

This message is automatically generated.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 
> to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Josh Wymer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181579#comment-13181579
 ] 

Josh Wymer commented on HBASE-5140:
---

Correct, but for example on a table with one region, getStartEndKeys() returns 
two empty byte[]s. The last region (or only region) of a table returns an empty 
byte[] as the end key, allowing a scan to run to the end of the table. 
Therefore we don't know the upper-bound byte[] to use to determine the long (or 
int, etc.) value we want for the split calculations. So we must either have an 
efficient way to get the last key in this case, or arbitrarily set the long to 
its max value (since nothing could be higher anyway) and use that number for 
the calculations. This obviously won't work for unbounded data types like 
BigDecimal and is a partial solution at best.

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Ming Ma (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181582#comment-13181582
 ] 

Ming Ma commented on HBASE-5140:


Is it the same as HBASE-4063?

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.





[jira] [Commented] (HBASE-4357) Region in transition - in closing state

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181595#comment-13181595
 ] 

Zhihong Yu commented on HBASE-4357:
---

There were no hung tests.
The test failures above are known failures caused by MAPREDUCE-3583.

The latest patch is ready to be checked in.

> Region in transition - in closing state
> ---
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from sea-esxi-0 
> to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181602#comment-13181602
 ] 

Lars Hofhansl commented on HBASE-5088:
--

So I take this as a +1?


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of the methods in this 
> class are synchronized, so adding/deleting elements through those methods is 
> fine.
> However, HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.





[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181604#comment-13181604
 ] 

Zhihong Yu commented on HBASE-5140:
---

MAPREDUCE-1220, referenced in HBASE-4063, has been resolved against hadoop 
0.23, so we cannot use it at the moment.

@Josh:
I believe the single-region scenario is the degenerate case.
Using the max value for long should be fine in that case.
The best practice is to presplit the table when creating it.

> TableInputFormat subclass to allow N number of splits per region during MR 
> jobs
> ---
>
> Key: HBASE-5140
> URL: https://issues.apache.org/jira/browse/HBASE-5140
> Project: HBase
>  Issue Type: New Feature
>  Components: mapreduce
>Reporter: Josh Wymer
>Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I 
> am working on a subclass for the TableInputFormat class that overrides 
> getSplits in order to generate N number of splits per region and/or N number 
> of splits per job. The idea is to convert the startKey and endKey for each 
> region from byte[] to BigDecimal, take the difference, divide by N, convert 
> back to byte[] and generate splits on the resulting values. Assuming your 
> keys are fully distributed this should generate splits at nearly the same 
> number of rows per split. Any suggestions on this issue are welcome.





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181608#comment-13181608
 ] 

Zhihong Yu commented on HBASE-5088:
---

I haven't looked at the final patch yet - can do that later.
So consider my vote +0.

Please run the patch through Hadoop QA.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of the methods in this 
> class are synchronized, so adding/deleting elements through those methods is 
> fine.
> However, HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.





[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Status: Patch Available  (was: Open)

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of the methods in this 
> class are synchronized, so adding/deleting elements through those methods is 
> fine.
> However, HConnectionManager#getCachedLocation uses headMap to get a view of 
> SoftValueSortedMap#internalMap. Once we operate on this view map (e.g. 
> add/delete) from another thread, a concurrency issue may occur.





[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181618#comment-13181618
 ] 

Zhihong Yu commented on HBASE-5136:
---

Can SplitLogManager maintain a mapping from the hash of logDirs to 
MonitoredTask? If the hash of logDirs isn't found in this mapping, we create a 
new MonitoredTask.

@Prakash:
What do you think ?
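A sketch of the suggested mapping. MonitoredTask here is a simple stand-in for the real class in org.apache.hadoop.hbase.monitoring, and in practice one might key on the logDirs list itself rather than its hash, to avoid collisions:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Main {
    // Stand-in for org.apache.hadoop.hbase.monitoring.MonitoredTask.
    static class MonitoredTask {
        final String description;
        MonitoredTask(String description) { this.description = description; }
    }

    // Hash of logDirs -> the MonitoredTask created for it.
    static final ConcurrentMap<Integer, MonitoredTask> statuses =
        new ConcurrentHashMap<>();

    static MonitoredTask statusFor(List<String> logDirs) {
        // On retry, reuse the existing status instead of creating a duplicate,
        // so the master UI shows a single log-splitting entry per batch.
        return statuses.computeIfAbsent(logDirs.hashCode(),
            h -> new MonitoredTask("Doing distributed log split in " + logDirs));
    }

    public static void main(String[] args) {
        MonitoredTask first = statusFor(List.of("hdfs://rs1,60020,1/logs"));
        MonitoredTask retry = statusFor(List.of("hdfs://rs1,60020,1/logs"));
        System.out.println(first == retry); // same instance reused on retry
    }
}
```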

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List<Path> logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> User may get confused by multiple distributed log splitting entries for the 
> same region server on master UI





[jira] [Updated] (HBASE-5139) Compute (weighted) median using AggregateProtocol

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5139:
--

Description: 
Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. This 
task finds out the median value among the values of cf:cq1 (See 
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)

This can be done in two passes.
The first pass utilizes AggregateProtocol where the following tuple is returned 
from each region:
(start-rowkey, partial-sum-of-values, partial-sum-of-weights)
The start-rowkey is used to sort the tuples. This way we can determine which 
region (call it R) contains the (weighted) median. partial-sum-of-weights can be 
0 if the unweighted median is sought.

The second pass involves scanning the table, beginning with the start row of 
region R, and computing the partial (weighted) sum until the threshold of S/2 is 
crossed, where S is the total sum of weights. The (weighted) median is returned.

However, this approach wouldn't work if there is mutation in the underlying 
table between pass one and pass two.

In that case, sequential scanning seems to be the solution which is slower than 
the above approach.

  was:
Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. This 
task finds out the median value among the values of cf:cq1 (See 
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)

This can be done in two passes.
The first pass utilizes AggregateProtocol where the following tuple is returned 
from each region:
(start-rowkey, partial-sum-of-values, partial-sum-of-weights)
The start-rowkey is used to sort the tuples. This way we can determine which 
region (called R) contains the (weighted) median. partial-sum-of-weights can be 
0 if unweighted median is sought

The second pass involves scanning the region R and computing partial (weighted) 
sum until the threshold of S/2 is crossed. The (weighted) median is returned.

However, this approach wouldn't work if there is mutation in the underlying 
table between pass one and pass two.

In that case, sequential scanning seems to be the solution which is slower than 
the above approach.


> Compute (weighted) median using AggregateProtocol
> -
>
> Key: HBASE-5139
> URL: https://issues.apache.org/jira/browse/HBASE-5139
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Yu
>
> Suppose cf:cq1 stores numeric values and optionally cf:cq2 stores weights. 
> This task finds out the median value among the values of cf:cq1 (See 
> http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/R.basic/html/weighted.median.html)
> This can be done in two passes.
> The first pass utilizes AggregateProtocol where the following tuple is 
> returned from each region:
> (start-rowkey, partial-sum-of-values, partial-sum-of-weights)
> The start-rowkey is used to sort the tuples. This way we can determine which 
> region (call it R) contains the (weighted) median. partial-sum-of-weights can 
> be 0 if the unweighted median is sought.
> The second pass involves scanning the table, beginning with the start row of 
> region R, and computing the partial (weighted) sum until the threshold of S/2 
> is crossed, where S is the total sum of weights. The (weighted) median is 
> returned.
> However, this approach wouldn't work if there is mutation in the underlying 
> table between pass one and pass two.
> In that case, sequential scanning seems to be the solution which is slower 
> than the above approach.
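The pass-two stopping rule described above (accumulate weights until the S/2 threshold is crossed) can be sketched as follows; this is an illustrative snippet with assumed method and parameter names, not code from any attached patch:

```java
import java.util.Arrays;

// Illustrative sketch of the pass-two stopping rule: walk values in sorted
// order, accumulate weights, and stop at the first value whose running
// weight sum reaches half the total weight S/2. With all weights equal to
// 1 this degenerates to the ordinary (lower) median.
public class WeightedMedian {
    static double weightedMedian(double[] sortedValues, double[] weights) {
        double total = Arrays.stream(weights).sum();
        double half = total / 2.0;
        double running = 0;
        for (int i = 0; i < sortedValues.length; i++) {
            running += weights[i];
            if (running >= half) {
                return sortedValues[i];   // threshold S/2 crossed here
            }
        }
        throw new IllegalArgumentException("weights must sum to > 0");
    }
}
```

In the two-pass scheme, `total` would come from pass one (the summed per-region tuples) and the loop would be the pass-two scan over region R.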

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5137) MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181624#comment-13181624
 ] 

Zhihong Yu commented on HBASE-5137:
---

@Ram:
Good point. The current logic assumes that if the file system check passes, 
retrying the log split may succeed.

I think the correct logic should also abort in the catch block below:
{code}
} catch (InterruptedException e) {
  LOG.warn("Interrupted, aborting since cannot return w/o splitting at startup");
  Thread.currentThread().interrupt();
  retrySplitting = false;
  Runtime.getRuntime().halt(1);
}
{code}

> MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws 
> IOException
> 
>
> Key: HBASE-5137
> URL: https://issues.apache.org/jira/browse/HBASE-5137
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> I am not sure if this bug was already raised in JIRA.
> In our test cluster we had a scenario where the RS had gone down and 
> ServerShutDownHandler started with splitLog.
> But as HDFS was down, the waitOnSafeMode check threw an IOException.
> {code}
> try {
> // If FS is in safe mode, just wait till out of it.
> FSUtils.waitOnSafeMode(conf,
>   conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));  
> splitter.splitLog();
>   } catch (OrphanHLogAfterSplitException e) {
> {code}
> We catch the exception
> {code}
> } catch (IOException e) {
>   checkFileSystem();
>   LOG.error("Failed splitting " + logDir.toString(), e);
> }
> {code}
> So the HLog split itself did not happen. We encountered a case where about 4 
> regions that had recently been split on the crashed RS were lost.
> Can we abort the Master in such scenarios? Please suggest.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181632#comment-13181632
 ] 

Hadoop QA commented on HBASE-5088:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509707/5088-final.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -151 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 79 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/690//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/690//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/690//console

This message is automatically generated.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.
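To illustrate the hazard described above: TreeMap's headMap returns a live view of the backing map, not a copy, so mutations through the view (or through the backing map) are visible through both and need a common lock. A minimal standalone demo, with made-up row keys:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Demonstrates that TreeMap#headMap is a live view of the backing map:
// removing an entry through the view removes it from the backing map too.
// With no common lock, a thread mutating the view races against threads
// reading the backing map, which is the concurrency issue in question.
public class HeadMapView {
    static boolean backingStillHasRowA() {
        SortedMap<String, String> regions = new TreeMap<>();
        regions.put("rowA", "server1");
        regions.put("rowB", "server2");
        regions.put("rowC", "server3");

        // Live view of all entries strictly before "rowC".
        SortedMap<String, String> view = regions.headMap("rowC");

        // Mutating the view mutates the backing map as well.
        view.remove("rowA");
        return regions.containsKey("rowA");   // false: the view is live
    }
}
```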

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181637#comment-13181637
 ] 

Lars Hofhansl commented on HBASE-5088:
--

In a quick local test (in pseudo-distributed mode) I was able to reproduce the 
slowdown with ConcurrentSkipListMap that Jieshan reported. In my case the 
slowdown was even worse: runtime went up from ~12.5s to ~17.5s (everything is 
faster in local mode, so a constant slowdown has a larger proportional 
impact).

With the last patch (sync Object) I have not seen any slowdown in the same 
test. So I think this is good to go as soon as HadoopQA confirms.


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181647#comment-13181647
 ] 

Zhihong Yu commented on HBASE-5088:
---

{code}
+  private SoftValueSortedMap(SortedMap> original, Object 
sync) {
{code}
The new ctor is private, meaning sync would always come from the original map. 
I wonder whether introducing the sync field is necessary.
Can this:
{code}
+synchronized(sync) {
{code}
be replaced by the following ?
{code}
+synchronized(this.internalMap) {
{code}

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181656#comment-13181656
 ] 

Lars Hofhansl commented on HBASE-5088:
--

Look at the {head|tail|sub}Map methods; that's where it gets interesting, and 
it is exactly why there is an extra object to synchronize on.
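The design point can be sketched as follows, with stand-in class and field names (this is not the actual patch): every view returned by headMap/tailMap/subMap keeps synchronizing on the root map's lock object, because all of them share one backing TreeMap.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the shared-lock design (stand-in names, not HBase's actual
// SoftValueSortedMap): the root map and every submap view synchronize on
// one shared object, so a write through the root can never interleave
// with a read through a headMap view of the same backing TreeMap.
public class SyncSortedMap<K, V> {
    private final SortedMap<K, V> delegate;
    private final Object sync;   // one lock shared by root and all views

    public SyncSortedMap() {
        this(new TreeMap<K, V>(), new Object());
    }

    // Private: only used to hand the parent's lock object to a view.
    private SyncSortedMap(SortedMap<K, V> delegate, Object sync) {
        this.delegate = delegate;
        this.sync = sync;
    }

    public V put(K key, V value) {
        synchronized (sync) { return delegate.put(key, value); }
    }

    public V get(Object key) {
        synchronized (sync) { return delegate.get(key); }
    }

    public int size() {
        synchronized (sync) { return delegate.size(); }
    }

    // The view keeps the parent's sync object. Synchronizing on the view's
    // own internal map instead would use a different monitor than the root
    // map uses, so root writes and view reads could interleave.
    public SyncSortedMap<K, V> headMap(K toKey) {
        synchronized (sync) {
            return new SyncSortedMap<K, V>(delegate.headMap(toKey), sync);
        }
    }
}
```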

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181657#comment-13181657
 ] 

Lars Hofhansl commented on HBASE-5088:
--

But I see how this is easy to miss; maybe some comments on that constructor 
would be in order.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181663#comment-13181663
 ] 

Zhihong Yu commented on HBASE-5088:
---

That makes sense.
Some lines are too wide:
{code}
+  return new SoftValueSortedMap(this.internalMap.subMap(fromKey, 
toKey), sync);
{code}

{code}
-SoftValueSortedMap tableLocs =
-  this.cachedRegionLocations.get(key);
+Map tableLocs = this.cachedRegionLocations
+.get(key);
{code}
Did you get the above through auto-formatting ? I think keeping 
this.cachedRegionLocations.get(key) on one line is better.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4357) Region stayed in transition - in closing state

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-4357:
--

Hadoop Flags: Reviewed
 Summary: Region stayed in transition - in closing state  (was: Region 
in transition - in closing state)

> Region stayed in transition - in closing state
> --
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from 
> sea-esxi-0 to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4357) Region stayed in transition - in closing state

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181669#comment-13181669
 ] 

Zhihong Yu commented on HBASE-4357:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch, Ming. You correctly generated the patch for 0.92, but 
Hadoop QA runs the test suite on TRUNK.

Thanks for the review Ram.

> Region stayed in transition - in closing state
> --
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing, 
> 1. On a given machine, kill "RS process id". Then kill "HMaster process id".
> 2. Start RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver.". Start HMaster via "bin/hbase-daemon.sh --config ./conf start 
> master".
> One region of a table stayed in closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to the .META. table, the region has been reassigned from 
> sea-esxi-0 to sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Jean-Daniel Cryans (Created) (JIRA)
Memory leak in MonitoredRPCHandlerImpl
--

 Key: HBASE-5141
 URL: https://issues.apache.org/jira/browse/HBASE-5141
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.92.0, 0.94.0


I got a pretty reliable way of OOME'ing my region servers. Using a big payload 
(64MB in my case), a default heap and a default number of handlers, it doesn't 
take long before all the MonitoredRPCHandlerImpl instances hold on to a 64MB 
reference, and once a compaction kicks in it kills everything.

The issue is that even after the RPC call is done, the packet still lives in 
MonitoredRPCHandlerImpl.

Will attach a screen shot of jprofiler's analysis in a moment.

This is a blocker for 0.92.0; anyone using a high number of handlers and 
biggish values will kill themselves.
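A minimal sketch of the leak pattern and the obvious remedy (the field and method names below are assumptions, not the real MonitoredRPCHandlerImpl API): drop the payload reference as soon as the call completes, so an idle handler thread does not pin the buffer between calls.

```java
// Sketch of the leak and fix idea (assumed names, not HBase's real
// MonitoredRPCHandlerImpl): the handler records the RPC payload for
// monitoring and must null it out when the call finishes; otherwise each
// idle handler thread pins its last payload (64MB here) until the next call.
public class RpcHandlerStatus {
    private Object packet;   // last call's payload, held for monitoring

    void setRPCPacket(Object param) {
        packet = param;
    }

    // Called when the RPC completes; dropping the reference lets the GC
    // reclaim the payload instead of keeping it alive per handler thread.
    void markComplete() {
        packet = null;
    }

    boolean holdsPayload() {
        return packet != null;
    }
}
```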

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5141:
--

Attachment: Screen Shot 2012-01-06 at 3.03.09 PM.png

This screen shot shows how the MonitoredRPCHandlerImpl instances are all using 
6% of the heap because they are holding on to the packets.

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: Screen Shot 2012-01-06 at 3.03.09 PM.png
>
>
> I got a pretty reliable way of OOME'ing my region servers. Using a big 
> payload (64MB in my case), a default heap and a default number of handlers, 
> it doesn't take long before all the MonitoredRPCHandlerImpl instances hold 
> on to a 64MB reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screen shot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0; anyone using a high number of handlers and 
> biggish values will kill themselves.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu reassigned HBASE-5136:
-

Assignee: Zhihong Yu

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> User may get confused by multiple distributed log splitting entries for the 
> same region server on master UI
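One way to get a single MonitoredTask per retried split, sketched with stub types (these are stand-ins, not HBase's real TaskMonitor API, and not necessarily the approach of the attached patch): cache the status object keyed by the log dir list and reuse it across retries.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch with stub types (not HBase's real classes): cache one status
// object per log-dir list so a retry reuses the existing MonitoredTask
// instead of creating a duplicate entry on the master UI.
public class SplitStatusCache {
    static class MonitoredTask {   // stand-in for HBase's interface
        final String description;
        MonitoredTask(String description) { this.description = description; }
    }

    private final Map<String, MonitoredTask> tasks = new HashMap<>();

    // Returns the same instance for repeated calls with the same log dirs,
    // mirroring the single-entry behavior the issue asks for.
    MonitoredTask statusFor(String logDirs) {
        return tasks.computeIfAbsent(logDirs,
            d -> new MonitoredTask("Doing distributed log split in " + d));
    }
}
```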

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Attachment: 5088-final2.txt

How about this one:

* Undid the auto-format that put .get on a separate line.
* Broke up a line that was too long.
* Added javadoc to the new constructor.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Jieshan Bean
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5136:
--

Attachment: 5136.txt

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> User may get confused by multiple distributed log splitting entries for the 
> same region server on master UI

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181686#comment-13181686
 ] 

Zhihong Yu commented on HBASE-5136:
---

Patch v1 is based on the assumption that the list of dead region servers for 
distributed log splitting stays the same across retries.

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> User may get confused by multiple distributed log splitting entries for the 
> same region server on master UI

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl reassigned HBASE-5088:


Assignee: Lars Hofhansl  (was: Jieshan Bean)

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
> synchronized, so using them to add/delete elements directly is fine.
> But in HConnectionManager#getCachedLocation, headMap is used to get a view 
> of SoftValueSortedMap#internalMap. Once we operate 
> on this view map (add/delete) from other threads, a concurrency issue may 
> occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5136:
--

Attachment: (was: 5136.txt)

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> User may get confused by multiple distributed log splitting entries for the 
> same region server on master UI

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5136:
--

Attachment: 5136.txt

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> Users may be confused by multiple distributed log splitting entries for the 
> same region server on the master UI.





[jira] [Updated] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-06 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5136:
--

Status: Patch Available  (was: Open)

> Redundant MonitoredTask instances in case of distributed log splitting retry
> 
>
> Key: HBASE-5136
> URL: https://issues.apache.org/jira/browse/HBASE-5136
> Project: HBase
>  Issue Type: Task
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
> Attachments: 5136.txt
>
>
> In case of log splitting retry, the following code would be executed multiple 
> times:
> {code}
>   public long splitLogDistributed(final List logDirs) throws 
> IOException {
> MonitoredTask status = TaskMonitor.get().createStatus(
>   "Doing distributed log split in " + logDirs);
> {code}
> leading to multiple MonitoredTask instances.
> Users may be confused by multiple distributed log splitting entries for the 
> same region server on the master UI.





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181693#comment-13181693
 ] 

Zhihong Yu commented on HBASE-5088:
---

final2? That's funny :-)
This looks good.
+1

Minor: one line is still too long:
{code}
+Map tableLocations = 
getTableLocations(tableName);
{code}

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of its methods are 
> synchronized, so adding and deleting elements through those methods is safe.
> But HConnectionManager#getCachedLocation uses headMap to obtain a view of 
> SoftValueSortedMap#internalMap. Once another thread operates on that view 
> (e.g. add/delete), a concurrency issue can occur.
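The view-escape bug described above can be sketched in plain java.util terms. This is an analogy, not HBase's actual SoftValueSortedMap: the class name and methods below are invented to show why a headMap() view returned from a synchronized method is unsafe, and what copying under the lock (the direction the patches take) buys.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal analogy of the bug: every method is synchronized, but headMap()
// hands the caller a live, UNsynchronized view of the backing TreeMap, so
// concurrent writers can corrupt an in-progress read of that view.
public class HeadMapRaceSketch {
    private final TreeMap<Integer, String> internalMap = new TreeMap<>();

    public synchronized void put(Integer k, String v) { internalMap.put(k, v); }

    // Bug pattern: the returned view shares state with internalMap.
    public synchronized SortedMap<Integer, String> headMap(Integer toKey) {
        return internalMap.headMap(toKey);
    }

    // Fix pattern: copy under the lock so the caller never touches the
    // live backing map.
    public synchronized SortedMap<Integer, String> headMapCopy(Integer toKey) {
        return new TreeMap<>(internalMap.headMap(toKey));
    }

    public static void main(String[] args) {
        HeadMapRaceSketch m = new HeadMapRaceSketch();
        for (int i = 0; i < 10; i++) m.put(i, "v" + i);
        SortedMap<Integer, String> copy = m.headMapCopy(5);
        m.put(100, "later");              // later mutation cannot affect the copy
        System.out.println(copy.size());  // 5
    }
}
```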





[jira] [Commented] (HBASE-4357) Region stayed in transition - in closing state

2012-01-06 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181692#comment-13181692
 ] 

Hudson commented on HBASE-4357:
---

Integrated in HBase-TRUNK #2615 (See 
[https://builds.apache.org/job/HBase-TRUNK/2615/])
HBASE-4357  Region stayed in transition - in closing state (Ming Ma)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseMetaHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRootHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestCloseRegionHandler.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java


> Region stayed in transition - in closing state
> --
>
> Key: HBASE-4357
> URL: https://issues.apache.org/jira/browse/HBASE-4357
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Ming Ma
>Assignee: Ming Ma
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4357.txt, HBASE-4357-0.92.patch
>
>
> Got the following during testing:
> 1. On a given machine, kill the RS process, then kill the HMaster process.
> 2. Start the RS first via "bin/hbase-daemon.sh --config ./conf start 
> regionserver". Then start the HMaster via "bin/hbase-daemon.sh --config ./conf 
> start master".
> One region of a table stayed in the closing state.
> According to zookeeper,
> 794a6ff17a4de0dd0a19b984ba18eea9 
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  state=CLOSING, ts=Wed Sep 07 17:21:44 PDT 2011 (75701s ago), 
> server=sea-esxi-0,6,1315428682281 
> According to .META. table, the region has been assigned to from sea-esxi-0 to 
> sea-esxi-4.
> miweng_500region,H\xB49X\x10bM\xB1,1315338786464.794a6ff17a4de0dd0a19b984ba18eea9.
>  sea-esxi-4:60030  H\xB49X\x10bM\xB1 I7K\xC6\xA7\xEF\x9D\x90 0 





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181695#comment-13181695
 ] 

Lars Hofhansl commented on HBASE-5088:
--

I thought so too :)
That line went through the formatter (just checked again; it's 84 chars).
I can break it up anyway; will do at commit.


> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of its methods are 
> synchronized, so adding and deleting elements through those methods is safe.
> But HConnectionManager#getCachedLocation uses headMap to obtain a view of 
> SoftValueSortedMap#internalMap. Once another thread operates on that view 
> (e.g. add/delete), a concurrency issue can occur.





[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181697#comment-13181697
 ] 

Todd Lipcon commented on HBASE-5141:


Part of me thinks this is a configuration error -- a conservative admin should 
always assume that the memory usage of IPC buffers = numHandlers*maxPayload, 
since you could always have the threads all concurrently handling large calls. 
So memory should be allocated for this. The fact that the memory is kept around 
until the next call makes it more likely you'd hit this, but it's still a 
potential problem regardless.

I think the proper fix for this problem is to keep an atomic counter for the 
amount of memory used by IPC handlers, and gate the read calls off the wire 
based on a memory budget.
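The budgeting idea in the preceding paragraph can be sketched as follows. This is an illustration of the proposal, not HBase code: the class name, the reserve/release API, and the admit-or-back-off policy are all assumptions for the sketch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the proposed gate: track bytes held by in-flight calls with an
// atomic counter and refuse to read a new call off the wire when accepting
// it would exceed the budget. Names and API are hypothetical.
public class IpcMemoryBudget {
    private final long budgetBytes;
    private final AtomicLong inFlight = new AtomicLong();

    public IpcMemoryBudget(long budgetBytes) { this.budgetBytes = budgetBytes; }

    /** Try to reserve callSize bytes; false means "back off, don't read yet". */
    public boolean tryReserve(long callSize) {
        while (true) {
            long cur = inFlight.get();
            if (cur + callSize > budgetBytes) return false;
            // CAS loop: retry if another handler reserved concurrently.
            if (inFlight.compareAndSet(cur, cur + callSize)) return true;
        }
    }

    /** Release once the call's response has been written. */
    public void release(long callSize) { inFlight.addAndGet(-callSize); }

    public static void main(String[] args) {
        IpcMemoryBudget b = new IpcMemoryBudget(100);
        System.out.println(b.tryReserve(64));  // true
        System.out.println(b.tryReserve(64));  // false: would exceed 100
        b.release(64);
        System.out.println(b.tryReserve(64));  // true again
    }
}
```

As J-D notes later in the thread, a real gate would have to cover the per-handler call queues as well, not just the handlers themselves.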

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: Screen Shot 2012-01-06 at 3.03.09 PM.png
>
>
> I have a pretty reliable way of OOME'ing my region servers. With a big 
> payload (64MB in my case), a default heap, and a default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screenshot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0: anyone using a high number of handlers and 
> biggish values will kill themselves.





[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Status: Open  (was: Patch Available)

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of its methods are 
> synchronized, so adding and deleting elements through those methods is safe.
> But HConnectionManager#getCachedLocation uses headMap to obtain a view of 
> SoftValueSortedMap#internalMap. Once another thread operates on that view 
> (e.g. add/delete), a concurrency issue can occur.





[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181701#comment-13181701
 ] 

Zhihong Yu commented on HBASE-5141:
---

One solution is to release the packet before WritableRpcEngine.call() returns?
{noformat}
status.setRPCPacket(null);
return retVal;
{noformat}
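The packet-release suggestion above can be sketched in isolation. This is an illustrative model, not HBase's real WritableRpcEngine or MonitoredRPCHandlerImpl: the handler keeps a reference to the last RPC payload for status reporting, and nulling it out before the call returns lets the GC reclaim the buffer. All names below mirror the discussion but are stand-ins.

```java
// Hypothetical sketch of the "release the packet" fix: clear the monitor's
// payload reference when the call completes so the buffer does not linger
// until the handler's next call.
public class RpcPacketReleaseSketch {
    static class MonitoredHandler {
        private byte[] rpcPacket;  // lingers between calls if never cleared

        void setRPCPacket(byte[] packet) { this.rpcPacket = packet; }
        boolean holdsPacket() { return rpcPacket != null; }
    }

    static String call(MonitoredHandler status, byte[] packet) {
        status.setRPCPacket(packet);   // kept for in-flight monitoring
        String retVal = "ok";          // ... actual RPC work would happen here
        status.setRPCPacket(null);     // the proposed fix: release before return
        return retVal;
    }

    public static void main(String[] args) {
        MonitoredHandler status = new MonitoredHandler();
        call(status, new byte[8 * 1024 * 1024]);   // big payload
        System.out.println(status.holdsPacket());  // false: nothing lingers
    }
}
```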

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: Screen Shot 2012-01-06 at 3.03.09 PM.png
>
>
> I have a pretty reliable way of OOME'ing my region servers. With a big 
> payload (64MB in my case), a default heap, and a default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screenshot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0: anyone using a high number of handlers and 
> biggish values will kill themselves.





[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181702#comment-13181702
 ] 

Jean-Daniel Cryans commented on HBASE-5141:
---

bq. I think the proper fix for this problem is to keep an atomic counter for 
the amount of memory used by IPC handlers, and gate the read calls off the wire 
based on a memory budget.

There's more to it: it's not just the handlers that hold data, but their queues 
too. By default we add 10 queue slots for each handler. I believe this is 
unrelated to this jira's issue, though.

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: Screen Shot 2012-01-06 at 3.03.09 PM.png
>
>
> I have a pretty reliable way of OOME'ing my region servers. With a big 
> payload (64MB in my case), a default heap, and a default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screenshot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0: anyone using a high number of handlers and 
> biggish values will kill themselves.





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181704#comment-13181704
 ] 

Lars Hofhansl commented on HBASE-5088:
--

I assume this should go into 0.90 as well.

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-syncObj.txt, 
> 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, 
> HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of its methods are 
> synchronized, so adding and deleting elements through those methods is safe.
> But HConnectionManager#getCachedLocation uses headMap to obtain a view of 
> SoftValueSortedMap#internalMap. Once another thread operates on that view 
> (e.g. add/delete), a concurrency issue can occur.





[jira] [Commented] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181705#comment-13181705
 ] 

Todd Lipcon commented on HBASE-5141:


Yep, sorry, the memory budgeting would have to be before the data is read into 
the queue in the IPC server.

I agree that nulling out the rpc packet will fix the issue as reported here. 
Just saying that the issue doesn't seem likely to affect most use cases where 
RPCs are of mixed size and where people have already budgeted their heap to fit 
a bunch of calls.

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: Screen Shot 2012-01-06 at 3.03.09 PM.png
>
>
> I have a pretty reliable way of OOME'ing my region servers. With a big 
> payload (64MB in my case), a default heap, and a default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screenshot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0: anyone using a high number of handlers and 
> biggish values will kill themselves.





[jira] [Updated] (HBASE-5141) Memory leak in MonitoredRPCHandlerImpl

2012-01-06 Thread Jean-Daniel Cryans (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-5141:
--

Attachment: HBASE-5141.patch

Testing this patch; I like it better than passing null, since the monitor takes 
care of itself.

> Memory leak in MonitoredRPCHandlerImpl
> --
>
> Key: HBASE-5141
> URL: https://issues.apache.org/jira/browse/HBASE-5141
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-5141.patch, Screen Shot 2012-01-06 at 3.03.09 
> PM.png
>
>
> I have a pretty reliable way of OOME'ing my region servers. With a big 
> payload (64MB in my case), a default heap, and a default number of handlers, 
> it doesn't take long before every MonitoredRPCHandlerImpl holds on to a 64MB 
> reference, and once a compaction kicks in it kills everything.
> The issue is that even after the RPC call is done, the packet still lives in 
> MonitoredRPCHandlerImpl.
> Will attach a screenshot of jprofiler's analysis in a moment.
> This is a blocker for 0.92.0: anyone using a high number of handlers and 
> biggish values will kill themselves.





[jira] [Updated] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-06 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5088:
-

Attachment: 5088-final3.txt

> A concurrency issue on SoftValueSortedMap
> -
>
> Key: HBASE-5088
> URL: https://issues.apache.org/jira/browse/HBASE-5088
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.4, 0.94.0
>Reporter: Jieshan Bean
>Assignee: Lars Hofhansl
>Priority: Critical
> Fix For: 0.92.0, 0.90.6
>
> Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 
> 5088-syncObj.txt, 5088-useMapInterfaces.txt, 5088.generics.txt, 
> HBase-5088-90.patch, HBase-5088-trunk.patch, 
> HBase5088-90-replaceSoftValueSortedMap.patch, 
> HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
> HBase5088Reproduce.java, PerformanceTestResults.png
>
>
> SoftValueSortedMap is backed by a TreeMap, and all of its methods are 
> synchronized, so adding and deleting elements through those methods is safe.
> But HConnectionManager#getCachedLocation uses headMap to obtain a view of 
> SoftValueSortedMap#internalMap. Once another thread operates on that view 
> (e.g. add/delete), a concurrency issue can occur.




