[ 
https://issues.apache.org/jira/browse/HBASE-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682445#comment-14682445
 ] 

Andrew Purtell edited comment on HBASE-14209 at 8/12/15 12:48 AM:
------------------------------------------------------------------

So, the problem here is a missed ZooKeeper watch notification. I modified 
TestShell to set the default log level to DEBUG and can confirm we get all the 
way through to where the visibility label cache processes the client request 
and writes data to zookeeper, but then the expected cache update via watcher 
trigger doesn't happen. This seems specific to 0.98 because I've not seen this 
test failure with branch-1.0 or higher.

I'm attaching a patch for VisibilityLabelsCache that works around the problem 
by updating local state before writing it to zookeeper. It's a hack, but using 
ZK as notification bus for this is itself a hack slated to be replaced by proc 
v2. The patch has three hunks. The first two fix where we are not protecting 
'ordinalVsLabels' with the RWlock and should be applied everywhere:

{code}
@@ -206,27 +206,37 @@ public class VisibilityLabelsCache implements 
VisibilityLabelOrdinalProvider {
   }
 
   public List<String> getUserAuths(String user) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   public List<String> getGroupAuths(String[] groups) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   /**
{code}

The third hunk is the test fix hack for 0.98 and doesn't necessarily need to be 
applied to later code lines. The effect is the local state may be updated twice 
- on one cluster node only, where the client call is processed - once when 
writeToZookeeper() is called, later again when processing a watcher 
notification. This can be expensive, but updates to this data are expected to 
be infrequent, and this is less serious than missing the watcher notification 
and failing to update the cached data.

{code}
@@ -270,7 +280,15 @@ public class VisibilityLabelsCache implements 
VisibilityLabelOrdinalProvider {
     }
   }
 
-  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) {
+  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) throws 
IOException {
+    // Update local state, then send it to zookeeper
+    if (labelsOrUserAuths) {
+      // True for labels
+      this.refreshLabelsCache(data);
+    } else {
+      // False for user auths
+      this.refreshUserAuthsCache(data);
+    }
     this.zkVisibilityWatcher.writeToZookeeper(data, labelsOrUserAuths);
   }
 }
{code}

What do you think [~anoop.hbase], [~ram_krish] ?



was (Author: apurtell):
So, the problem here is a missed ZooKeeper watch notification. I modified 
TestShell to set the default log level to DEBUG and can confirm we get all the 
way through to where the visibility label cache processes the client request 
and writes data to zookeeper, but then the expected cache update via watcher 
trigger doesn't happen. This seems specific to 0.98 because I've not seen this 
test failure with branch-1.0 or higher.

I'm attaching a patch for VisibilityLabelsCache that works around the problem 
by updating local state before writing it to zookeeper. It's a hack, but using 
ZK as notification bus for this is itself a hack slated to be replaced by proc 
v2. The patch has three hunks. The first two fix where we are not protecting 
'ordinalVsLabels' with the RWlock and should be applied everywhere:

{code}
@@ -206,27 +206,37 @@ public class VisibilityLabelsCache implements 
VisibilityLabelOrdinalProvider {
   }
 
   public List<String> getUserAuths(String user) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getUserAuthsAsOrdinals(user);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   public List<String> getGroupAuths(String[] groups) {
-    List<String> auths = EMPTY_LIST;
-    Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
-    if (!authOrdinals.equals(EMPTY_SET)) {
-      auths = new ArrayList<String>(authOrdinals.size());
-      for (Integer authOrdinal : authOrdinals) {
-        auths.add(ordinalVsLabels.get(authOrdinal));
+    this.lock.readLock().lock();
+    try {
+      List<String> auths = EMPTY_LIST;
+      Set<Integer> authOrdinals = getGroupAuthsAsOrdinals(groups);
+      if (!authOrdinals.equals(EMPTY_SET)) {
+        auths = new ArrayList<String>(authOrdinals.size());
+        for (Integer authOrdinal : authOrdinals) {
+          auths.add(ordinalVsLabels.get(authOrdinal));
+        }
       }
+      return auths;
+    } finally {
+      this.lock.readLock().unlock();
     }
-    return auths;
   }
 
   /**
{code}

The third hunk is the test fix hack for 0.98 and doesn't necessarily need to be 
applied to later code lines. The effect is the local state may be updated 
twice, once when writeToZookeeper() is called, later again when processing a 
watcher notification. This can be expensive, but updates to this data are 
expected to be infrequent, and this is less serious than missing the watcher 
notification and failing to update the cached data.

{code}
@@ -270,7 +280,15 @@ public class VisibilityLabelsCache implements 
VisibilityLabelOrdinalProvider {
     }
   }
 
-  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) {
+  public void writeToZookeeper(byte[] data, boolean labelsOrUserAuths) throws 
IOException {
+    // Update local state, then send it to zookeeper
+    if (labelsOrUserAuths) {
+      // True for labels
+      this.refreshLabelsCache(data);
+    } else {
+      // False for user auths
+      this.refreshUserAuthsCache(data);
+    }
     this.zkVisibilityWatcher.writeToZookeeper(data, labelsOrUserAuths);
   }
 }
{code}

What do you think [~anoop.hbase], [~ram_krish] ?


> [0.98] TestShell visibility tests failing
> -----------------------------------------
>
>                 Key: HBASE-14209
>                 URL: https://issues.apache.org/jira/browse/HBASE-14209
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.14
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.98.14
>
>         Attachments: HBASE-14209-0.98.patch
>
>
> This is after HBASE-14105  but we've seen this earlier where adding ruby 
> units to the shell tests can cause the visibility tests to fail inexplicably. 
> We can't just avoid adding ruby units to TestShell in 0.98 so figure out the 
> root cause and fix or disable these. 
> {noformat}
>   1) Error:
> test_The_get/put_methods_should_work_for_data_written_with_Visibility(Hbase::VisibilityLabelsAdminMethodsTest):
> NativeException: junit.framework.AssertionFailedError: Waiting timed out 
> after [10,000] msec
>     junit/framework/Assert.java:57:in `fail'
>     org/apache/hadoop/hbase/Waiter.java:193:in `waitFor'
>     org/apache/hadoop/hbase/Waiter.java:128:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3514:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3576:in 
> `waitLabelAvailable'
>     ./src/test/ruby/hbase/visibility_labels_admin_test.rb:73:in 
> `test_The_get/put_methods_should_work_for_data_written_with_Visibility'
>     org/jruby/RubyProc.java:270:in `call'
>     org/jruby/RubyKernel.java:2105:in `send'
>     org/jruby/RubyArray.java:1620:in `each'
>     org/jruby/RubyArray.java:1620:in `each'
>   2) Error:
> test_The_set/clear_methods_should_work_with_authorizations(Hbase::VisibilityLabelsAdminMethodsTest):
> NativeException: junit.framework.AssertionFailedError: Waiting timed out 
> after [10,000] msec
>     junit/framework/Assert.java:57:in `fail'
>     org/apache/hadoop/hbase/Waiter.java:193:in `waitFor'
>     org/apache/hadoop/hbase/Waiter.java:128:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3514:in `waitFor'
>     org/apache/hadoop/hbase/HBaseTestingUtility.java:3576:in 
> `waitLabelAvailable'
>     ./src/test/ruby/hbase/visibility_labels_admin_test.rb:52:in 
> `test_The_set/clear_methods_should_work_with_authorizations'
>     org/jruby/RubyProc.java:270:in `call'
>     org/jruby/RubyKernel.java:2105:in `send'
>     org/jruby/RubyArray.java:1620:in `each'
>     org/jruby/RubyArray.java:1620:in `each'
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to