[jira] [Commented] (HBASE-10570) Allow overrides of Surefire secondPartForkMode and testFailureIgnore

2014-02-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13906021#comment-13906021
 ] 

Nicolas Liochon commented on HBASE-10570:
-

+1

 Allow overrides of Surefire secondPartForkMode and testFailureIgnore
 

 Key: HBASE-10570
 URL: https://issues.apache.org/jira/browse/HBASE-10570
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.96.2, 0.98.1, 0.99.0

 Attachments: 10570.patch


 It can be useful to override the fork and ignore behavior of our Surefire 
 unit testing machinery. 
 For example, this will put only one test into each generated surefire jar and 
 always fork a child JVM to run it:
 {noformat}
 mvn test ... -Dsurefire.firstPartForkMode=always 
 -Dsurefire.secondPartForkMode=always
 {noformat}
 Useful for isolating test failures and finding zombies.
 Sometimes it is also useful to ignore test failures:
 {noformat}
 mvn test ... -Dsurefire.testFailureIgnore=true
 {noformat}
 This will let test suite execution continue even if a test times out. Maven 
 has CLI options for this sort of thing but they don't always work as 
 expected. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10516) Refactor code where Threads.sleep is called within a while/for loop

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903913#comment-13903913
 ] 

Nicolas Liochon commented on HBASE-10516:
-

I'm ok with the new patch. Thanks!
I didn't see the DeleteTableHandler / AssignmentManager / LruBlockCache / 
ZooKeeperWatcher changes yesterday.

DeleteTableHandler
= I think we can just throw an IOE exception here, it will be as if the loop 
timeouted. (see the 
{code}
throw new IOException(Waited hbase.master.wait.on.region ( +
  waitTime + ms) for region to leave region  +
  region.getRegionNameAsString() +  in transitions);
{code}

AssignmentManager
= It's not really managed today: the IE is transformed in a RuntimeException. 
The patch will increase the probability.
I would propose to explicitly throw IE  here from the method here. In any case, 
it's a risky patch, but throwing an exception seems clearer.

LruBlockCache
We don't need to continue to loop here. We can stop looping at the first 
interrupt (and may be we should restore the status?)

ZooKeeperWatcher
We can manage the interrupt as a timeout (so just throwing a Runtime: today we 
throw a NPE (!) on timeout, we can do the same or a little bit cleaner).

Sorry for missing this part yesterday.
I will have a look at the other patshes as well. Basically, I think it's better 
to manage the exception when we do the patch, because anyway, we need to think 
about the impact when we do the default strategy of restoring the status. And 
throwing and exception is often better, and requires less code, so...


 Refactor code where Threads.sleep is called within a while/for loop
 ---

 Key: HBASE-10516
 URL: https://issues.apache.org/jira/browse/HBASE-10516
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10516-trunk_v1.patch, HBASE-10516-trunk_v2.patch


 Threads.sleep implementation:
 {code}
  public static void sleep(long millis) {
 try {
   Thread.sleep(millis);
 } catch (InterruptedException e) {
   e.printStackTrace();
   Thread.currentThread().interrupt();
 }
   }
 {code}
 From above implementation, the current thread's interrupt status is 
 restored/reset when InterruptedException is caught and handled. If this 
 method is called within a while/for loop, if a first InterruptedException is 
 thrown during sleep, it will make the Threads.sleep in next loop immediately 
 throw InterruptedException without expected sleep. This behavior breaks the 
 intention for independent sleep in each loop
 I mentioned above in HBASE-10497 and this jira is created to handle it 
 separately per [~nkeywal]'s suggestion



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Reopened] (HBASE-10539) HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore size after added, but actually the previous size before added is returned instead

2014-02-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon reopened HBASE-10539:
-


 HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore 
 size after added, but actually the previous size before added is returned 
 instead
 --

 Key: HBASE-10539
 URL: https://issues.apache.org/jira/browse/HBASE-10539
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10539-trunk_v1.patch


 HRegion.addAndGetGlobalMemstoreSize(addedSize) is called once some write 
 succeeds and 'addedSize' is the size of the edits newly put to the memstore, 
 the returned value of HRegion.addAndGetGlobalMemstoreSize(addedSize) is then 
 checked against the flush threshold to determine if a flush for the region 
 should be triggered.
 By design the returned value should be the updated memstore size after adding 
 'addedSize', but current implementation uses this.memstoreSize.getAndAdd 
 which returns the previous size before adding, actually 'addAndGet' rather 
 than 'getAndAdd' should be used here.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10539) HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore size after added, but actually the previous size before added is returned instead

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903920#comment-13903920
 ] 

Nicolas Liochon commented on HBASE-10539:
-

Reopening, per comments above :-)

 HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore 
 size after added, but actually the previous size before added is returned 
 instead
 --

 Key: HBASE-10539
 URL: https://issues.apache.org/jira/browse/HBASE-10539
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10539-trunk_v1.patch


 HRegion.addAndGetGlobalMemstoreSize(addedSize) is called once some write 
 succeeds and 'addedSize' is the size of the edits newly put to the memstore, 
 the returned value of HRegion.addAndGetGlobalMemstoreSize(addedSize) is then 
 checked against the flush threshold to determine if a flush for the region 
 should be triggered.
 By design the returned value should be the updated memstore size after adding 
 'addedSize', but current implementation uses this.memstoreSize.getAndAdd 
 which returns the previous size before adding, actually 'addAndGet' rather 
 than 'getAndAdd' should be used here.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10516) Refactor code where Threads.sleep is called within a while/for loop

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903994#comment-13903994
 ] 

Nicolas Liochon commented on HBASE-10516:
-

bq. Should we treat it case-by-case, or per-class, or uniformly for all 
occurrences?
There is a general strategy, then the exceptions :-).
I think we agree on the general strategy, let me formulate it to confirm:
 - never never never shallow IE
 - if possible, stop the current task, if not possible, continue, and manage 
the interruption when the non interruptible task has been done
 - try to throw a clear exception (IE or IOE), if not restore the status

catch and rethrow a RuntimeException instead is generally bad. It means that 
you do not accept to be interrupted. So we should not add such kind of code. 
But in such a code
void f(){
 partOne(); //shallows exception
 partTwo(); // throw Runtime on Exception
}
It's acceptable, imho, to change the part one to make it throws the same 
runtime as the part 2.








 Refactor code where Threads.sleep is called within a while/for loop
 ---

 Key: HBASE-10516
 URL: https://issues.apache.org/jira/browse/HBASE-10516
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10516-trunk_v1.patch, HBASE-10516-trunk_v2.patch


 Threads.sleep implementation:
 {code}
  public static void sleep(long millis) {
 try {
   Thread.sleep(millis);
 } catch (InterruptedException e) {
   e.printStackTrace();
   Thread.currentThread().interrupt();
 }
   }
 {code}
 From above implementation, the current thread's interrupt status is 
 restored/reset when InterruptedException is caught and handled. If this 
 method is called within a while/for loop, if a first InterruptedException is 
 thrown during sleep, it will make the Threads.sleep in next loop immediately 
 throw InterruptedException without expected sleep. This behavior breaks the 
 intention for independent sleep in each loop
 I mentioned above in HBASE-10497 and this jira is created to handle it 
 separately per [~nkeywal]'s suggestion



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10516) Refactor code where Threads.sleep is called within a while/for loop

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904012#comment-13904012
 ] 

Nicolas Liochon commented on HBASE-10516:
-

ok :-)

 Refactor code where Threads.sleep is called within a while/for loop
 ---

 Key: HBASE-10516
 URL: https://issues.apache.org/jira/browse/HBASE-10516
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10516-trunk_v1.patch, HBASE-10516-trunk_v2.patch


 Threads.sleep implementation:
 {code}
  public static void sleep(long millis) {
 try {
   Thread.sleep(millis);
 } catch (InterruptedException e) {
   e.printStackTrace();
   Thread.currentThread().interrupt();
 }
   }
 {code}
 From above implementation, the current thread's interrupt status is 
 restored/reset when InterruptedException is caught and handled. If this 
 method is called within a while/for loop, if a first InterruptedException is 
 thrown during sleep, it will make the Threads.sleep in next loop immediately 
 throw InterruptedException without expected sleep. This behavior breaks the 
 intention for independent sleep in each loop
 I mentioned above in HBASE-10497 and this jira is created to handle it 
 separately per [~nkeywal]'s suggestion



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10539) HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore size after added, but actually the previous size before added is returned instead

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904028#comment-13904028
 ] 

Nicolas Liochon commented on HBASE-10539:
-

Yes :-).
As well, could you set the versions (affects  fix). Fix is especially 
important if you want the fix to be committed in non trunk versions.

 HRegion.addAndGetGlobalMemstoreSize() is expected to return the new memstore 
 size after added, but actually the previous size before added is returned 
 instead
 --

 Key: HBASE-10539
 URL: https://issues.apache.org/jira/browse/HBASE-10539
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10539-trunk_v1.patch


 HRegion.addAndGetGlobalMemstoreSize(addedSize) is called once some write 
 succeeds and 'addedSize' is the size of the edits newly put to the memstore, 
 the returned value of HRegion.addAndGetGlobalMemstoreSize(addedSize) is then 
 checked against the flush threshold to determine if a flush for the region 
 should be triggered.
 By design the returned value should be the updated memstore size after adding 
 'addedSize', but current implementation uses this.memstoreSize.getAndAdd 
 which returns the previous size before adding, actually 'addAndGet' rather 
 than 'getAndAdd' should be used here.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Status: Open  (was: Patch Available)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Attachment: 10525.v2.patch

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-18 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904313#comment-13904313
 ] 

Nicolas Liochon commented on HBASE-10525:
-

v2
comments above taken into account
htace hooks added (is that the right way to do that, [~eclark] ?) 
optimization / race condition fixed in cleanupCalls
simplification in waitForWork

I will push a v3 with the option set to false to get an hadoop qa run with the 
two settings

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Attachment: (was: 10525.v2.patch)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Status: Patch Available  (was: Open)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10548) PerfEval work around wrong runtime dependency version

2014-02-17 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903239#comment-13903239
 ] 

Nicolas Liochon commented on HBASE-10548:
-

We can commit the change, but there is something I don't get, at least for 
trunk (I haven't checked 0.98  0.96): hbase pom seems clean, and does require 
common:math:2.2, whatever the hadoop version. How comes we're finally installed 
with the 2.1 version?

I mean, we don't care for this specific case, but if HBase requires the 2.2 
version, it should not be deployed with an older one. Here the error message is 
clear and it's in the test code, but it could be less clear and in the 
production code...


 PerfEval work around wrong runtime dependency version
 -

 Key: HBASE-10548
 URL: https://issues.apache.org/jira/browse/HBASE-10548
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.2, 0.98.1, 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: HBASE-10548.00.patch


 From my 
 [comment|https://issues.apache.org/jira/browse/HBASE-10511?focusedCommentId=13902238page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13902238]
  on HBASE-10511:
 I have hadoop-1.2.1 installed from tgz, which packages commons-math-2.1. This 
 is *different* from the listed maven dependency, 2.2.
 {noformat}
 $ tar tvf hadoop-1.2.1.tar.gz | grep commons-math
 -rw-rw-r--  0 0  0  832410 Jul 22  2013 
 hadoop-1.2.1/lib/commons-math-2.1.jar
 $ mvn -f pom.xml.hadoop1 dependency:tree | grep commons-math
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 {noformat}
 This is a problem because the 2.1 version of 
 [DescriptiveStatistics|http://commons.apache.org/proper/commons-math/javadocs/api-2.1/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html]
  doesn't have a double[] constructor. Running the MR job, mappers fail:
 {noformat}
 java.lang.NoSuchMethodError: 
 org.apache.commons.math.stat.descriptive.DescriptiveStatistics.init([D)V
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest.testTakedown(PerformanceEvaluation.java:1163)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:984)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1401)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:522)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:474)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10548) PerfEval work around wrong runtime dependency version

2014-02-17 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903292#comment-13903292
 ] 

Nicolas Liochon commented on HBASE-10548:
-

bq. no sure what it means
maven sees  that hadoop asks for 2.1 while hbase asks for 2.2, so it builds 
with 2.2 (the newest version). That's maven doing its job. So we should have 
the 2.2 when we deploy. If we don't have this, it means that we don't use the 
expected versions of our third parties when they are different than the hadoop 
ones.


 PerfEval work around wrong runtime dependency version
 -

 Key: HBASE-10548
 URL: https://issues.apache.org/jira/browse/HBASE-10548
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.2, 0.98.1, 0.99.0
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: HBASE-10548.00.patch


 From my 
 [comment|https://issues.apache.org/jira/browse/HBASE-10511?focusedCommentId=13902238page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13902238]
  on HBASE-10511:
 I have hadoop-1.2.1 installed from tgz, which packages commons-math-2.1. This 
 is *different* from the listed maven dependency, 2.2.
 {noformat}
 $ tar tvf hadoop-1.2.1.tar.gz | grep commons-math
 -rw-rw-r--  0 0  0  832410 Jul 22  2013 
 hadoop-1.2.1/lib/commons-math-2.1.jar
 $ mvn -f pom.xml.hadoop1 dependency:tree | grep commons-math
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile (version managed 
 from 2.1)
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 [INFO] |  +- org.apache.commons:commons-math:jar:2.2:compile
 {noformat}
 This is a problem because the 2.1 version of 
 [DescriptiveStatistics|http://commons.apache.org/proper/commons-math/javadocs/api-2.1/org/apache/commons/math/stat/descriptive/DescriptiveStatistics.html]
  doesn't have a double[] constructor. Running the MR job, mappers fail:
 {noformat}
 java.lang.NoSuchMethodError: 
 org.apache.commons.math.stat.descriptive.DescriptiveStatistics.init([D)V
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest.testTakedown(PerformanceEvaluation.java:1163)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:984)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1401)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:522)
   at 
 org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:474)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10516) Refactor code where Threads.sleep is called within a while/for loop

2014-02-17 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903302#comment-13903302
 ] 

Nicolas Liochon commented on HBASE-10516:
-

sleepBeforeRetry = sure
createDir = why not thowing an InterrupedIOException?
rename = same
deleteDir = same
HRegionServer#waitOnAllRegionsToClose = I suppose it's the best option

For InterrupedIOException, my *feeling* is that we can already receive these 
exceptions today (as we're going network  disk operations, they are likely to 
check the interrupt status already, or may do it in the future).


 Refactor code where Threads.sleep is called within a while/for loop
 ---

 Key: HBASE-10516
 URL: https://issues.apache.org/jira/browse/HBASE-10516
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10516-trunk_v1.patch


 Threads.sleep implementation:
 {code}
  public static void sleep(long millis) {
 try {
   Thread.sleep(millis);
 } catch (InterruptedException e) {
   e.printStackTrace();
   Thread.currentThread().interrupt();
 }
   }
 {code}
 From above implementation, the current thread's interrupt status is 
 restored/reset when InterruptedException is caught and handled. If this 
 method is called within a while/for loop, if a first InterruptedException is 
 thrown during sleep, it will make the Threads.sleep in next loop immediately 
 throw InterruptedException without expected sleep. This behavior breaks the 
 intention for independent sleep in each loop
 I mentioned above in HBASE-10497 and this jira is created to handle it 
 separately per [~nkeywal]'s suggestion



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-17 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903414#comment-13903414
 ] 

Nicolas Liochon commented on HBASE-10355:
-

v2:
 - continue when it can't get a region, or if the number of replicas is below 
expectation
 - compiles with jdk 1.6

 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch, 10355.v2.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-17 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10355:


Attachment: 10355.v2.patch

 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch, 10355.v2.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-14 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901211#comment-13901211
 ] 

Nicolas Liochon commented on HBASE-10511:
-

Thanks, Enis.

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.96.2, 0.98.1, 0.99.0, hbase-10070

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10516) Refactor code where Threads.sleep is called within a while/for loop

2014-02-14 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901213#comment-13901213
 ] 

Nicolas Liochon commented on HBASE-10516:
-

It seems ok to me, but give me some time to review it in details (for today or 
Monday: I'm off this week end).

 Refactor code where Threads.sleep is called within a while/for loop
 ---

 Key: HBASE-10516
 URL: https://issues.apache.org/jira/browse/HBASE-10516
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10516-trunk_v1.patch


 Threads.sleep implementation:
 {code}
  public static void sleep(long millis) {
 try {
   Thread.sleep(millis);
 } catch (InterruptedException e) {
   e.printStackTrace();
   Thread.currentThread().interrupt();
 }
   }
 {code}
 From above implementation, the current thread's interrupt status is 
 restored/reset when InterruptedException is caught and handled. If this 
 method is called within a while/for loop, if a first InterruptedException is 
 thrown during sleep, it will make the Threads.sleep in next loop immediately 
 throw InterruptedException without expected sleep. This behavior breaks the 
 intention for independent sleep in each loop
 I mentioned above in HBASE-10497 and this jira is created to handle it 
 separately per [~nkeywal]'s suggestion



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-14 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13901221#comment-13901221
 ] 

Nicolas Liochon commented on HBASE-10355:
-

bq. the information that he sees would be partial 
Hum. Yes, it's not well managed in the patch as it is. It's a little bit 
complex, because the contract of the getLocation is to retry, so the code 
expects the locations to be good. If we don't do that, we will have two level 
of retries (one in getLocation, one in the retrying callable). I can either 
manage the case explicitly, either getLocation should return a full  good 
location set.

On the same  range, if we change dynamically (without disable / enable) the 
number of replica, the client we would be surprised.

Lastly, as it is today, we stop on the first 'definitive' error: we don't wait 
for all replica to fails, we stop when one of the replica says 'it does not 
work'. It's a trade off, doing differently could make sense as well.

bq. Maybe if the location is null, the client should hit the meta always?
I can do that (force reload).

About the jdk 1.7, it's the completion service that is not available on 1.6.
It's not a big dependency, but makes the code much simpler and safer.

We actually didn't conclude when we discussed 1.7 in this thread, 6 months ago. 
http://search-hadoop.com/m/X9ivp1d7Xoc2subj=Re+Default+JDK+settings+for+Jenkins+projects

 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900140#comment-13900140
 ] 

Nicolas Liochon commented on HBASE-10355:
-

Another dependency btw is that I'm using some jdk 1.7 API.

This patch is awaiting reviews :-)

 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900141#comment-13900141
 ] 

Nicolas Liochon commented on HBASE-10490:
-

bq. redundant given that in setException there is a notify already being done
yes, exactly.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10511:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900146#comment-13900146
 ] 

Nicolas Liochon commented on HBASE-10511:
-

Committed as it is. If someone wants the raw data, we can use JMk/Nic idea of 
logging as trace/debug (but may be this someone will say that he doesn't want 
to use trace/debug because it changes the latency, this kind of of things. 
That's why I prefer to wait here).

Thanks for the review.

[~andrew.purt...@gmail.com] What's the process to put it in the .98 while a RC 
is in progress. Do I have to wait for the end of the RC?

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Attachment: 10490.v5.patch

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Open  (was: Patch Available)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900161#comment-13900161
 ] 

Nicolas Liochon commented on HBASE-10490:
-

v5 is what I will commit.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Patch Available  (was: Open)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Attachment: 10490.v6.patch

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch, 10490.v6.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed, thanks for the reviews, all.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch, 10490.v6.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900361#comment-13900361
 ] 

Nicolas Liochon commented on HBASE-10490:
-

The remaining issues are:
 - the broken rpc timeout
 - the sometimes strange synchronization

But it's already a nice progress.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch, 
 10490.v5.patch, 10490.v6.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-10525:
---

 Summary: Allow the client to use a different thread for writing to 
ease interrupt
 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0


This is an issue in the HBASE-10070 context, but as well more generally if you 
want to interrupt an operation with a limited cost. 

I will attach a doc with a more detailed explanation.

This adds a thread per region server; so it's otional. The first patch 
activates it by default to see how it behaves on a full hadoop-qa run. The 
target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Status: Patch Available  (was: Open)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Assignee: (was: Nicolas Liochon)

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Attachment: 10525.v1.patch

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900418#comment-13900418
 ] 

Nicolas Liochon commented on HBASE-10185:
-

The patch v2 has been incorporated in HBASE-10490 (trunk). We can use this jira 
for the 0.94 version.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Attachment: 10525.v2.patch

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900449#comment-13900449
 ] 

Nicolas Liochon commented on HBASE-10525:
-

TestIPC uses a rpcTimeout of zero. Patch updated.

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Status: Patch Available  (was: Open)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Status: Open  (was: Patch Available)

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10524) Correct wrong handling and add proper handling for swallowed InterruptedException thrown by Thread.sleep in regionserver

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900465#comment-13900465
 ] 

Nicolas Liochon commented on HBASE-10524:
-

Yeah, for HRegionServer I think you're right.

 Correct wrong handling and add proper handling for swallowed 
 InterruptedException thrown by Thread.sleep in regionserver
 

 Key: HBASE-10524
 URL: https://issues.apache.org/jira/browse/HBASE-10524
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10524-trunk_v1.patch


 A sub-task of HBASE-10497
 # correct wrong handling of InterruptedException where 
 Thread.currentThread.interrupt() is called within while loops
 # add proper handling for swallowed InterruptedException



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10524) Correct wrong handling and add proper handling for swallowed InterruptedException thrown by Thread.sleep in regionserver

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900476#comment-13900476
 ] 

Nicolas Liochon commented on HBASE-10524:
-

Agreed as well for SplitLogWorker. I would propose not to log the exception 
however. The existing code was doing it, but it seems wrong. We should log when 
we receive the interruption, not when we restore it.
(note: you should be able to 'submit' a patch, this will trigger a hadoop-qa 
run).

 Correct wrong handling and add proper handling for swallowed 
 InterruptedException thrown by Thread.sleep in regionserver
 

 Key: HBASE-10524
 URL: https://issues.apache.org/jira/browse/HBASE-10524
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
 Attachments: HBASE-10524-trunk_v1.patch


 A sub-task of HBASE-10497
 # correct wrong handling of InterruptedException where 
 Thread.currentThread.interrupt() is called within while loops
 # add proper handling for swallowed InterruptedException



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13900495#comment-13900495
 ] 

Nicolas Liochon commented on HBASE-10525:
-

bq. I dont follow the above exactly?
It's basically a Java issue. Interrupting a process that is doing some i/o is 
complex, and difficult to do in the same way on all platforms. So the JVM guys 
went for the solution that was implementable on all platforms: close the socket 
when there is an interrupt.

bq. What will 'name' be?
The same name as the reader, postfixed with 'writer').

bq. yet another thread in client seems crazy
Yeah. But I think that we're not far from being able to put a single pool for 
all the region servers, instead of 1 thread per region server as today (and 
with this new option on). There is this nasty rpcTimeout, and then we're clean 
enough to do the change I think. It's something I would like to do with a 
direct buffer write. Ir's more or less mandatory to scale on large sized 
cluster imho.

bq. ThreadCallSender should be CallRunner or Call Executor
I don't know. It does not execute much, it just writes on the socket.  It uses 
the output stream of the connection, so it has to be an instance if I want to 
limit the impact on the existing code.

bq. + callsToWrite.remove(cts); Do you have to cancel the future itself too?
setting call.done allows the reading thread to skip this part
removing it from the callsToWrite list allows us to not send the call to the 
remote server at all.
So we do both.
Now if the call was sent to the server, it's not stopped (at least with this 
patch :-), just that we will skip the answer. Canceling on the server is 
interesting, but on the other hand it's another rpc call.

bq. Imlement Closeable since it has a close? Not necessary at all but... 
agreed. Will do.

bq. Can you write up a big class comment on the mechanism you are instituting 
here so it is clear what is going on both for reviewers and for the folks who 
come along afterward trying to make sense of it all
Sure. I'm polishing a doc as well that I will put in this jira.


 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10525) Allow the client to use a different thread for writing to ease interrupt

2014-02-13 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10525:


Attachment: HBaseclient-EventualConsistency.pdf

 Allow the client to use a different thread for writing to ease interrupt
 

 Key: HBASE-10525
 URL: https://issues.apache.org/jira/browse/HBASE-10525
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10525.v1.patch, 10525.v2.patch, 
 HBaseclient-EventualConsistency.pdf


 This is an issue in the HBASE-10070 context, but as well more generally if 
 you want to interrupt an operation with a limited cost. 
 I will attach a doc with a more detailed explanation.
 This adds a thread per region server; so it's otional. The first patch 
 activates it by default to see how it behaves on a full hadoop-qa run. The 
 target is to be unset by default.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899132#comment-13899132
 ] 

Nicolas Liochon commented on HBASE-10490:
-

As well, a rpcTimeout set to 0 might still work, or at least still work as it 
used to. The code around timeout is still difficult. I kept the ThreadLocal, 
but this code scares me, and clearly does not work for all cases. It's an issue 
for me, because when something goes wrong here I'm never sure it does not come 
from a race condition. As well, for example, we can set very low timeouts.  So 
the cleanup on timeout is not in this jira, but will have to be done a day for 
sure.

Anyway, here a a v2 with the typo fixed and the removal of the closeReason.

Thanks for the feedback



 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Open  (was: Patch Available)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Attachment: 10490.v2.patch

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899138#comment-13899138
 ] 

Nicolas Liochon commented on HBASE-10490:
-

btw, The v2 removes ~140 LoC in RpcClient (~10%)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10506) Fail-fast if client connection is lost before the real call be executed in RPC layer

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899145#comment-13899145
 ] 

Nicolas Liochon commented on HBASE-10506:
-

I think it would be better to log at the debug level, with a nice if 
LOG.isDebugEnabled, and to say something like client is disconnected, skipping 
 + call.


 Fail-fast if client connection is lost before the real call be executed in 
 RPC layer
 

 Key: HBASE-10506
 URL: https://issues.apache.org/jira/browse/HBASE-10506
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.3
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17

 Attachments: HBASE-10506-0.94.txt, HBASE-10506-trunk.txt


 In current HBase rpc impletement, there is no any connection double-checking 
 just before the call be invoked, considing there's a gc or other OS 
 scheduling or the call queue is full enough(e.g. the server side is slow/hang 
 due to some issues), and if the client side has a small rpc timeout value, it 
 could be possible when this request be taken from call queue, the client 
 connection is lost in that moment. we'd better has some fail-fast code before 
 the reall call be invoked, it just waste the server side resource.
 Here is a strace trace from our production env:
 2014-02-11,18:16:19,525 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call get([B@3eae6c77, 
 {timeRange:[0,9223372036854775807],totalColumns:1,cacheBlocks:true,families:{X:[T]},maxVersions:1,row:0741031-m8997060}),
  rpc version=1, client version=29, methodsFingerPrint=-241105381 from 
 10.101.10.181:43252: output error
 2014-02-11,18:16:19,526 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 151 on 12600 caught a ClosedChannelException, this means that the 
 server was processing a request but the client went away. The error message 
 was: null
 2014-02-11,18:16:19,797 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer:
 org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call 
 get([B@3f10ffd2, 
 {timeRange:[0,9223372036854775807],totalColumns:1,cacheBlocks:true,families:{X:[T]},maxVersions:1,row:4245978-m7281526}),
  rpc version=1, client version=29, methodsFingerPrint=-241105381 from 
 10.101.10.181:43259 after 0 ms, since caller disconnected
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:450)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3633)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3590)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3615)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4414)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4387)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2075)
 at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:460)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1457)
 2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 Responder, call get([B@3f10ffd2, 
 {timeRange:[0,9223372036854775807],totalColumns:1,cacheBlocks:true,families:{X:[T]},maxVersions:1,row:4245978-m7281526}),
  rpc version=1, client version=29, methodsFingerPrint=-241105381 from 
 10.101.10.181:43259: output error
 2014-02-11,18:16:19,802 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 handler 46 on 12600 caught a ClosedChannelException, this means that the 
 server was processing a request but the client went away. The error message 
 was: null
 With this fix, we can reduce this hit probability at least:) the upstream 
 hadoop has this checking already, see: 
 https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2034-L2036



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Patch Available  (was: Open)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899374#comment-13899374
 ] 

Nicolas Liochon commented on HBASE-10497:
-

bq. The helper method Threads.sleep is implemented as below:
Wow. We should just remove Threads.sleep: it's a sure way to shoot ourselves in 
the foot... (it doesn't have to be done in this patch)

 Add standard handling for swallowed InterruptedException thrown by 
 Thread.sleep under HBase-Client/HBase-Server folders systematically
 --

 Key: HBASE-10497
 URL: https://issues.apache.org/jira/browse/HBASE-10497
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor
 Attachments: HBASE-10497-trunk_v1.patch, HBASE-10497-trunk_v2.patch


 There are many places where InterruptedException thrown by Thread.sleep are 
 swallowed silently (which are neither declared in the caller method's throws 
 clause nor rethrown immediately) under HBase-Client/HBase-Server folders.
 It'd be better to add standard 'log and call currentThread.interrupt' for 
 such cases.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10497:


Status: Patch Available  (was: Open)

 Add standard handling for swallowed InterruptedException thrown by 
 Thread.sleep under HBase-Client/HBase-Server folders systematically
 --

 Key: HBASE-10497
 URL: https://issues.apache.org/jira/browse/HBASE-10497
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor
 Attachments: HBASE-10497-trunk_v1.patch, HBASE-10497-trunk_v2.patch


 There are many places where InterruptedException thrown by Thread.sleep are 
 swallowed silently (which are neither declared in the caller method's throws 
 clause nor rethrown immediately) under HBase-Client/HBase-Server folders.
 It'd be better to add standard 'log and call currentThread.interrupt' for 
 such cases.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899422#comment-13899422
 ] 

Nicolas Liochon commented on HBASE-10497:
-

I've submitted the patch, to see what hadoop-qa says.

The patch is big. It touches some scaring part. I would propose to split it:
 - some parts are in the mini cluster: even if we do something wrong here it's 
not a big deal
 - some parts seems good: RpcClient (I can +1 this) 
 - some parts are more complex: MemStoreFlusher: I'm not sure. It's a runnable, 
restoring the status seems meaningless; SplitLogWorker seems wrong, it's a 
runnable ;Leases. I don't know. Could be wrong again; HRegionServer
 - some parts, imho, should not be managed by restoring the status but by 
throwing an exception or even explaining why we do nothing. That's the case 
here.

{code}
  public TableListModel getTableList() throws IOException {

StringBuilder path = new StringBuilder();
path.append('/');
if (accessToken != null) {
  path.append(accessToken);
  path.append('/');
}

int code = 0;
for (int i = 0; i  maxRetries; i++) {
  // Response response = client.get(path.toString(),
  // Constants.MIMETYPE_XML);
  Response response = client.get(path.toString(),
  Constants.MIMETYPE_PROTOBUF);
  code = response.getCode();
  switch (code) {
  case 200:
TableListModel t = new TableListModel();
return (TableListModel) t.getObjectFromMessage(response.getBody());
  case 404:
throw new IOException(Table list not found);
  case 509:
try {
  Thread.sleep(sleepTime);
} catch (InterruptedException e) {
  thrown new InterruptedIOException();
}
break;
  default:
throw new IOException(get request to  + path.toString()
+  request returned  + code);
  }
}
throw new IOException(get request to  + path.toString()
+  request timed out);
  }
{code}

So at the end, we could have a few sub jiras:
 - the ones with a high level of confidence
 - the less confident ones bug not very risky
 - the ones when we don't restore the status but we properly manage the 
exception
 - a set of small ones per area for the others. They will be easier to review.

 Add standard handling for swallowed InterruptedException thrown by 
 Thread.sleep under HBase-Client/HBase-Server folders systematically
 --

 Key: HBASE-10497
 URL: https://issues.apache.org/jira/browse/HBASE-10497
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor
 Attachments: HBASE-10497-trunk_v1.patch, HBASE-10497-trunk_v2.patch


 There are many places where InterruptedException thrown by Thread.sleep are 
 swallowed silently (which are neither declared in the caller method's throws 
 clause nor rethrown immediately) under HBase-Client/HBase-Server folders.
 It'd be better to add standard 'log and call currentThread.interrupt' for 
 such cases.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899463#comment-13899463
 ] 

Nicolas Liochon commented on HBASE-10490:
-

I've have 3 +1, but it could more on the idea of removing the ping. If the 
hadoop-qa is fine, I plan to commit this tomorrow, please tell me if you want 
to do a deeper review and need more time.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899463#comment-13899463
 ] 

Nicolas Liochon edited comment on HBASE-10490 at 2/12/14 7:24 PM:
--

I have 3 +1, but it could more on the idea of removing the ping. If the 
hadoop-qa is fine, I plan to commit this tomorrow, please tell me if you want 
to do a deeper review and need more time.


was (Author: nkeywal):
I've have 3 +1, but it could more on the idea of removing the ping. If the 
hadoop-qa is fine, I plan to commit this tomorrow, please tell me if you want 
to do a deeper review and need more time.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10354) Add an API for defining consistency per request

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10354:


Attachment: 10354.add.patch

 Add an API for defining consistency per request
 ---

 Key: HBASE-10354
 URL: https://issues.apache.org/jira/browse/HBASE-10354
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.99.0, hbase-10070

 Attachments: 10354.add.patch, hbase-10354_v1.patch, 
 hbase-10354_v2.patch


 We should add an API to be able to define the expected consistency level per 
 operation. This API should also allow to check whether the results coming 
 from a query (get or scan) is stale or not. The defaults should reflect the 
 current semantics. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10354) Add an API for defining consistency per request

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899470#comment-13899470
 ] 

Nicolas Liochon commented on HBASE-10354:
-

An addendum for the serialization of Result.

 Add an API for defining consistency per request
 ---

 Key: HBASE-10354
 URL: https://issues.apache.org/jira/browse/HBASE-10354
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.99.0, hbase-10070

 Attachments: 10354.add.patch, hbase-10354_v1.patch, 
 hbase-10354_v2.patch


 We should add an API to be able to define the expected consistency level per 
 operation. This API should also allow to check whether the results coming 
 from a query (get or scan) is stale or not. The defaults should reflect the 
 current semantics. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-12 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-10511:
---

 Summary: Add latency percentiles on PerformanceEvaluation
 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0


The latency is reported as an array or float. It's easier to deal with 
percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10511:


Status: Patch Available  (was: Open)

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10511:


Attachment: 10511.v1.patch

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899505#comment-13899505
 ] 

Nicolas Liochon commented on HBASE-10511:
-

bq. I will have kept the LOG.info(randomRead latency log (ms):  + 
Arrays.toString(times));
If you have a lot of measure, it makes the output terrible. We could add 
another option, but it seems overkill for something which is not 'core'. 
If someone wants to do it I'm fine, but for myself I would love to keep it as 
simple as possible...

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10511) Add latency percentiles on PerformanceEvaluation

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899681#comment-13899681
 ] 

Nicolas Liochon commented on HBASE-10511:
-

Any issue if I commit it like it is?

 Add latency percentiles on PerformanceEvaluation
 

 Key: HBASE-10511
 URL: https://issues.apache.org/jira/browse/HBASE-10511
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10511.v1.patch


 The latency is reported as an array or float. It's easier to deal with 
 percentiles :-)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Open  (was: Patch Available)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Attachment: 10490.v3.patch

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Patch Available  (was: Open)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899689#comment-13899689
 ] 

Nicolas Liochon commented on HBASE-10490:
-

The issue raised by hadoop-qa was real. the lock order is Connection instance 
- DataOutputStream. Doing it in the other side leads to dead locks.. Fixed. I 
put the calls counter in an atomic counter as well.

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899826#comment-13899826
 ] 

Nicolas Liochon commented on HBASE-10490:
-

The warnings are false positives that can be fixed by with the ad hoc findbugs 
setting. I will do that on commit/

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10355:


Attachment: 10355.v1.patch

 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10355) Failover RPC's from client using region replicas

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899917#comment-13899917
 ] 

Nicolas Liochon commented on HBASE-10355:
-

the patch relies on:
 - basically everything else in the 10070 scope
 - interruption like 10472
 - cleanup like 10490
 - and another patch around interruption in RpcClient that I will submit later 
on.



 Failover RPC's from client using region replicas
 

 Key: HBASE-10355
 URL: https://issues.apache.org/jira/browse/HBASE-10355
 Project: HBase
  Issue Type: Sub-task
  Components: Client
Reporter: Enis Soztutar
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10355.v1.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-12 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899966#comment-13899966
 ] 

Nicolas Liochon commented on HBASE-10490:
-

It's a typo. It's interesting that it works actually. Thanks for catching
this. I will fix this on commit.



 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch, 10490.v2.patch, 10490.v3.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897660#comment-13897660
 ] 

Nicolas Liochon commented on HBASE-10490:
-

Nice catch. The max idle time is actually used on the server as well. But it 
does not really work, because the default are different:
server: maxIdleTime = 2*conf.getInt(ipc.client.connection.maxidletime, 1000);
client: maxIdleTime = conf.getInt(hbase.ipc.client.connection.maxidletime, 
1); //10s

So it means that the server disconnects any client that have not spoken for 2 
seconds; while the client pings every 10 seconds.
not as well that one is prefixed by 'hbase.' while the other is not.  In 2008 
they were sharing the same name then it diverged.

I suppose the code on the server doesn't do much because of this:
{code}
  protected int thresholdIdleConnections; // the number of idle
  // connections after which we
  // will start cleaning up idle
  // connections
{code}

thresholdIdleConnections is defaulted to 4000. Likely it never happens, and if 
it was happening it would not work because of the difference in default values.

I suppose the best way of doing this is: order the connection not used for at 
least x seconds, kills some of the oldest.

But, we can say as well that if we're satisfied with the way the server behaves 
today, we can remove the ping on the client without changing anything in the 
server: the behavior won't change.



 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10497) Add standard handling for swallowed InterruptedException thrown by Thread.sleep under HBase-Client/HBase-Server folders systematically

2014-02-11 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13897708#comment-13897708
 ] 

Nicolas Liochon commented on HBASE-10497:
-

I'm not sure of this one.
{code}
+++ 
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSyncUp.java
  (working copy)
@@ -111,6 +111,7 @@
   }
 } catch (InterruptedException e) {
   System.err.println(didn't wait long enough: + e);
+  Thread.currentThread().interrupt();
   return (-1);
 }
{code}
Because we took care of the interruption already (may be wrongly however), so 
the thread is not interrupted any more: the -1 means: we stop.
It's the same for the one with the split worker likely.

This one is likely wrong:
{code}
@@ -185,6 +185,8 @@
   try {
 Thread.sleep(100);
   } catch (InterruptedException ignored) {
+LOG.warn(Interrupted while sleeping);
+Thread.currentThread().interrupt();
   }
   if (System.currentTimeMillis()  startTime + 3) {
 throw new RuntimeException(Master not active after 30 seconds);
{code}
As it's inside a loop, we're likely to loop again, then the sleep will be 
interrupted immediately as we restored the interruption status, then we will 
log again = we will flood the logs.

I haven't checked the whole patch, but inside a loop you can't simply restore 
the status, you need to take a decision (stop the loop) or store the 
interruption to restore the status later. When we can, it's better to take care 
of the interruption explicitly by stopping our process and/or rethrowing an 
exception to the caller. In the case above, may be be we should throw a runtime 
exception, as in Master not active after 30 seconds?

See as well ExceptionUtils (it's new).

 Add standard handling for swallowed InterruptedException thrown by 
 Thread.sleep under HBase-Client/HBase-Server folders systematically
 --

 Key: HBASE-10497
 URL: https://issues.apache.org/jira/browse/HBASE-10497
 Project: HBase
  Issue Type: Improvement
  Components: Client, regionserver
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Minor
 Attachments: HBASE-10497-trunk_v1.patch


 There are many places where InterruptedException thrown by Thread.sleep are 
 swallowed silently (which are neither declared in the caller method's throws 
 clause nor rethrown immediately) under HBase-Client/HBase-Server folders.
 It'd be better to add standard 'log and call currentThread.interrupt' for 
 such cases.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-10 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Attachment: 10490.v1.patch

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10490) Simplify RpcClient code

2014-02-10 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10490:


Status: Patch Available  (was: Open)

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10490) Simplify RpcClient code

2014-02-10 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-10490:
---

 Summary: Simplify RpcClient code
 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0


The code is complex. Here is a set of proposed changes, for trunk:
1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. I 
expect that we always have a rpcTimeout. So we can remove the code.
2) remove the sendPing: instead, just close the connection if it's not used for 
a while, instead of trying to ping the server.
3) remove maxIddle time: to avoid the confusion if someone has overwritten the 
conf.
4) remove shouldCloseConnection: it was more or less synchronized with 
closeException. Having a single variable instead of two avoids the synchro
5) remove lastActivity: instead of trying to have an exact timeout, just kill 
the connection after some time. lastActivity could be set to wrong values if 
the server was slow to answer.
6) hopefully, a better management of the exception; we don't use the close 
exception of someone else as an input for another one.  Same goes for 
interruption.

I may have something wrong in the code. I will review it myself again. Feedback 
welcome, especially on the ping removal: I hope I got all the use cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10490) Simplify RpcClient code

2014-02-10 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13896802#comment-13896802
 ] 

Nicolas Liochon commented on HBASE-10490:
-

bq. Spelling minIddleTimeBeforeClose
Sure.

bq. i.e. keeping around the exception
We're clean now: we now use it only in logs and as initCause in the exceptions 
we throw. So we can as well fully remove it (basically, between the boolean and 
the exception, I kept the exception, but keeping only the boolean should be 
fine as well). 

 Simplify RpcClient code
 ---

 Key: HBASE-10490
 URL: https://issues.apache.org/jira/browse/HBASE-10490
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10490.v1.patch


 The code is complex. Here is a set of proposed changes, for trunk:
 1) remove PingInputStream. if rpcTimeout  0 it just rethrows the exception. 
 I expect that we always have a rpcTimeout. So we can remove the code.
 2) remove the sendPing: instead, just close the connection if it's not used 
 for a while, instead of trying to ping the server.
 3) remove maxIddle time: to avoid the confusion if someone has overwritten 
 the conf.
 4) remove shouldCloseConnection: it was more or less synchronized with 
 closeException. Having a single variable instead of two avoids the synchro
 5) remove lastActivity: instead of trying to have an exact timeout, just kill 
 the connection after some time. lastActivity could be set to wrong values if 
 the server was slow to answer.
 6) hopefully, a better management of the exception; we don't use the close 
 exception of someone else as an input for another one.  Same goes for 
 interruption.
 I may have something wrong in the code. I will review it myself again. 
 Feedback welcome, especially on the ping removal: I hope I got all the use 
 cases. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894331#comment-13894331
 ] 

Nicolas Liochon commented on HBASE-10472:
-

bq. I guess we need this for checking the meta location from zk right? 
Yes. 
bq. Do we need that to be interruptible as well?
If the API is synchronous, it needs to be interruptible...

bq. I guess we might need it if we ever want to go meta replicas. 
Actually, we need it already, for example because a split task can be 
interrupted. Some HBase code relies on being interruptible already. With 
replicas, or without replica, you may want to rely on this to free the handlers 
(for example, if we get stuck on a region, we may use all the handlers of the 
region server today: if the client cancels the query interrupting the call does 
help).

bq. We are not setting the interrupted flag for the thread in 
ReplicationPeersZKImpl, ZKLeaderManager
If we throw an exception, it means that we took care of the interruption, so we 
don't need to reset the interrupt flag (and should not). If we abort the 
server, we can consider that we took care of the interruption. Keeping the 
thread interrupted could mean interrupting the abort, which may not be what was 
expected.

bq.  Can we get away with some of the changes, if we keep the original 
ZKUtil.getData() and have ZKUtil.getDataInterruptible() ? 
Not really, imho: we need the code to be clean when there are interruptions. 
ZooKeeper API is clean. Our wrapping is not clean today.

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-07 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Status: Open  (was: Patch Available)

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-07 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Status: Patch Available  (was: Open)

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-07 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Attachment: 10185.v2.patch

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894391#comment-13894391
 ] 

Nicolas Liochon commented on HBASE-10185:
-

v2 cleans up the way we manage the 'calls' list. synchronization should be as 
today everywhere.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895011#comment-13895011
 ] 

Nicolas Liochon commented on HBASE-10472:
-

bq. we can make ZkUtil methods interruptible one by one in the long term. 
Yep.

Thanks for the review, I've committed to the .99.

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-07 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10472:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-07 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10472:


Fix Version/s: (was: 0.98.1)

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895140#comment-13895140
 ] 

Nicolas Liochon commented on HBASE-10185:
-

There is actually an issue with the v2 (unrelated to the precommit issue): in 
waitForWork, we have some special conditions on the calls.empty(), and we can 
now trigger them. I will fix this.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch, 10185.v2.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893195#comment-13893195
 ] 

Nicolas Liochon commented on HBASE-10185:
-

Yes, if it's an issue in trunk, we should solve it at least there. 
In addCall, we reusing the exception accross calls, so other calls may receive 
'DoNotRetry'. We can't take a decision based on the exception that caused the 
connection to be closed, as it's shared between the calls.
Connection closed should come from connection issue, i.e. we don't expect a  
DoNotRetryIOException, as a connection error will be retried all the time.
May be the issue is that an error (or some errors?) in Writable.write should 
not close the connection... Or that Writable.write should just not send a 
DoNotRetryIOException: it's an application level exception, difficult to manage 
in the network connectivity layer... I suppose that the 0.96+ cannot have this, 
as the serialization is done with protobuf.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12
Reporter: Samarth

 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10472:


Status: Open  (was: Patch Available)

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10472:


Attachment: 10472.v2.patch

 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10472:


Status: Patch Available  (was: Open)

v2 fixes the findbugs warnings.


 Manage the interruption in ZKUtil#getData
 -

 Key: HBASE-10472
 URL: https://issues.apache.org/jira/browse/HBASE-10472
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.98.1, 0.99.0

 Attachments: 10472.v1.patch, 10472.v2.patch


 Many, but not all, methods in ZKUtil partly hides the interruption: they 
 return null or something like that. Many times, this will result in a NPE, or 
 something undefined. 
 This jira is limited to the getData to keep things small enough (it's used in 
 hbase-client code).
 The code is supposed to behave at least 'as well as before', or better 
 (hopefully).
 It could be included in a later .98 release (98.1) and in .99



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10477) Regression from HBASE-10337

2014-02-06 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HBASE-10477:
---

 Summary: Regression from HBASE-10337
 Key: HBASE-10477
 URL: https://issues.apache.org/jira/browse/HBASE-10477
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.98.0, 0.99.0


a piece of code that should not have make it...
ping  [~andrew.purt...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10477) Regression from HBASE-10337

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10477:


Attachment: 10477.v1.patch

 Regression from HBASE-10337
 ---

 Key: HBASE-10477
 URL: https://issues.apache.org/jira/browse/HBASE-10477
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10477.v1.patch


 a piece of code that should not have make it...
 ping  [~andrew.purt...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10477) Regression from HBASE-10337

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10477:


Status: Patch Available  (was: Open)

 Regression from HBASE-10337
 ---

 Key: HBASE-10477
 URL: https://issues.apache.org/jira/browse/HBASE-10477
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10477.v1.patch


 a piece of code that should not have make it...
 ping  [~andrew.purt...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893409#comment-13893409
 ] 

Nicolas Liochon commented on HBASE-10185:
-

looking more at the code, it seems that writeRequest could send an exception if 
it fails during the write. Then this exception could be thrown by the caller, 
w/o risking to send an exception created by somebody else.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12
Reporter: Samarth

 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893452#comment-13893452
 ] 

Nicolas Liochon commented on HBASE-10185:
-

I'm not far from jira hijacking here ;-)
I'm reopening to submit the patch on trunk. I expect the code to be close on 
.94 actually.
We've got something a little bit similar on trunk, because of the codec: we can 
imagine a specific codec that cannot handle some kind of messages. Then it will 
need to send a doNotRetry, and we should not need to close the connection, as 
the tcp connection itself is fine.

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Fix Version/s: 0.99.0
Affects Version/s: 0.99.0
   Status: Patch Available  (was: Reopened)

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10185:


Attachment: 10185.v1.patch

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Reopened] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon reopened HBASE-10185:
-


 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
 Fix For: 0.99.0

 Attachments: 10185.v1.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10477) Regression from HBASE-10337

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10477:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Regression from HBASE-10337
 ---

 Key: HBASE-10477
 URL: https://issues.apache.org/jira/browse/HBASE-10477
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10477.v1.patch


 a piece of code that should not have make it...
 ping  [~andrew.purt...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10477) Regression from HBASE-10337

2014-02-06 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13893560#comment-13893560
 ] 

Nicolas Liochon commented on HBASE-10477:
-

Committed, hopefully on time for the current .98 RC. Thanks for the quick 
review, Nick

 Regression from HBASE-10337
 ---

 Key: HBASE-10477
 URL: https://issues.apache.org/jira/browse/HBASE-10477
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 10477.v1.patch


 a piece of code that should not have make it...
 ping  [~andrew.purt...@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown

2014-02-06 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon reassigned HBASE-10185:
---

Assignee: Nicolas Liochon

 HBaseClient retries even though a DoNotRetryException was thrown
 

 Key: HBASE-10185
 URL: https://issues.apache.org/jira/browse/HBASE-10185
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 0.94.12, 0.99.0
Reporter: Samarth
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10185.v1.patch


 Throwing a DoNotRetryIOException inside  Writable.write(Dataoutput) method 
 doesn't prevent HBase from retrying. Debugging the code locally, I figured 
 that the bug lies in the way HBaseClient simply throws an IOException when it 
 sees that a connection has been closed unexpectedly.  
 Method:
 public Writable call(Writable param, InetSocketAddress addr,
Class? extends VersionedProtocol protocol,
User ticket, int rpcTimeout)
 Excerpt of code where the bug is present:
 while (!call.done) {
 if (connection.shouldCloseConnection.get()) {
   throw new IOException(Unexpected closed connection);
 }
 Throwing this IOException causes the ServerCallable.translateException(t) to 
 be a no-op resulting in HBase retrying. 
 From my limited view and understanding of the code, one way I could think of 
 handling this is by looking at the closeConnection member variable of a 
 connection to determine what kind of exception should be thrown. 
 Specifically, when a connection is closed, the current code does this: 
 protected synchronized void markClosed(IOException e) {
   if (shouldCloseConnection.compareAndSet(false, true)) {
 closeException = e;
 notifyAll();
   }
 }
 Within HBaseClient's call method, the code could possibly be modified to:
 while (!call.done) {
 if (connection.shouldCloseConnection.get() ) {
  if(connection.closeException instanceof   
 DoNotRetryIOException) {
 throw closeException;
 }
   throw new IOException(Unexpected closed connection);
 }



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


<    4   5   6   7   8   9   10   11   12   13   >