[jira] [Commented] (ACCUMULO-3967) bulk import loses records when loading pre-split table

2015-08-24 Thread Edward Seidl (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709323#comment-14709323
 ] 

Edward Seidl commented on ACCUMULO-3967:


Thanks Josh!  Can't wait to try out the fixed version.  

 bulk import loses records when loading pre-split table
 --

 Key: ACCUMULO-3967
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3967
 Project: Accumulo
  Issue Type: Bug
  Components: client, tserver
Affects Versions: 1.4.5, 1.5.3, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.7.0
 Environment: generic hadoop 2.6.0, zookeeper 3.4.6 on redhat 6.7
 7 node cluster
Reporter: Edward Seidl
Assignee: Josh Elser
Priority: Blocker
 Fix For: 1.6.4, 1.7.1, 1.8.0, 1.5.4

  Time Spent: 1h 10m
  Remaining Estimate: 0h

 I just noticed that some records I'm loading via importDirectory go missing.  
 After a lot of digging around trying to reproduce the problem, I discovered 
 that it occurs most frequently when loading a table that I have just recently 
 added splits to.  In the tserver logs I'll see messages like 
 20 16:25:36,805 [client.BulkImporter] INFO : Could not assign 1 map files to tablet 1xw;18;17 because : Not Serving Tablet .  Will retry ...
 or
 20 16:25:44,826 [tserver.TabletServer] INFO : files [hdfs://:54310/accumulo/tables/1xw/b-00jnmxe/I00jnmxq.rf] not imported to 1xw;03;02: tablet 1xw;03;02 is closed
 These appear after messages about unloading tablets. It seems that tablets 
 are being redistributed at the same time as the bulk import is occurring.
 Steps to reproduce:
 1) I run a mapreduce job that produces random data in rfiles
 2) copy the rfiles to an import directory
 3) create table or deleterows -f
 4) addsplits
 5) importdirectory
 I have also performed the above completely within the mapreduce job, with 
 similar results.  The difference with the mapreduce job is that the time 
 between adding splits and the import directory is minutes rather than seconds.
 My current test creates 100 records; after importdirectory returns, 
 a count of rows will be anywhere from ~80 to 100.
 With my original workflow, I found that re-importing the same set of rfiles 
 three times would eventually get all rows loaded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3970) Generating multiple views of a value at scan time

2015-08-24 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709418#comment-14709418
 ] 

Billie Rinaldi commented on ACCUMULO-3970:
--

I guess the problem is that this represents a shift in control of data 
visibility from the data producer to, say, the table admin.  There's nothing to 
enforce that the value is transformed, so someone could set up an iterator that 
just outputs the key/value pair with a different visibility:
{noformat}
(pt_id, demographic, pt_dob, SHD_DOB) -> 1925-08-22
{noformat}
Perhaps designing an iterator that only performed masking or truncation would 
address this issue.  We'd still want the data producer to be able to control 
the amount of transformation required, and possibly specify a list of 
visibilities that can view the masked values.

 Generating multiple views of a value at scan time
 -

 Key: ACCUMULO-3970
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
 Project: Accumulo
  Issue Type: New Feature
Reporter: Russ Weeks
Priority: Minor
 Fix For: 1.8.0


 It would be useful to have the ability to generate different representations 
 of a key-value pair at scan time, based on the scan authorizations.
 For example, consider [HIPAA safe harbour 
 de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates].
  One of the rules for de-identifying a patient's date of birth is that if a 
 patient is 89 years old or younger, you can disclose his exact year of birth. 
 If a patient is 90 years old or over, you pretend that he's 90 years old.
 You can imagine implementing this as a key/value mapping in accumulo like,
 {{(pt_id, demographic, pt_dob, PII_DOB) -> 1925-08-22}}
 {{(pt_id, demographic, pt_dob, SHD_DOB) -> 1925}}
 Where the value corresponding to visibility SHD_DOB is produced at scan-time, 
 depending on the patient's current age.
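 As a rough sketch of the date rule described above (plain Java, not an Accumulo iterator or any existing Accumulo API): the class name DobMasker, the method maskDob and the choice to report "current year minus 90" for patients aged 90 or over are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical helper illustrating the date-of-birth rule paraphrased above.
final class DobMasker {
  // Oldest age at which the exact birth year is disclosed.
  static final int MAX_DISCLOSED_AGE = 89;

  // Returns the year to expose under the masked visibility, given the stored
  // date of birth and the date on which the scan runs.
  static String maskDob(LocalDate dob, LocalDate today) {
    int age = Period.between(dob, today).getYears();
    if (age <= MAX_DISCLOSED_AGE) {
      return Integer.toString(dob.getYear());
    }
    // Assumption: "pretend that he's 90" is rendered as the birth year of
    // someone who turns 90 in the current year.
    return Integer.toString(today.getYear() - (MAX_DISCLOSED_AGE + 1));
  }

  public static void main(String[] args) {
    System.out.println(maskDob(LocalDate.of(1925, 8, 22), LocalDate.now()));
  }
}
```

 Because the result depends on the scan date, the same stored value can yield a different masked value over time, which is why it would have to be computed at scan time rather than at ingest.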
 Another example would be the ability to produce a salted hash of a unique 
 identifier like a social security number or medical record number, where the 
 salt (or the hash algorithm, or the work factor...) could be specified 
 dynamically without having to re-code all the values in the system.
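 A minimal sketch of the salted-hash idea, using only the JDK's MessageDigest: SaltedHasher and saltedHash are hypothetical names, and a single salted digest is shown only because it is short. A real work factor would call for a key-derivation function rather than one digest pass.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes an identifier with a caller-supplied salt and algorithm.
final class SaltedHasher {
  // Returns the lowercase hex digest of salt || identifier.
  static String saltedHash(String algorithm, byte[] salt, String identifier) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance(algorithm);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
    }
    digest.update(salt);
    byte[] hashed = digest.digest(identifier.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder(hashed.length * 2);
    for (byte b : hashed) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) {
    byte[] salt = "example-salt".getBytes(StandardCharsets.UTF_8);
    System.out.println(saltedHash("SHA-256", salt, "123-45-6789"));
  }
}
```

 Since the salt and algorithm are parameters, changing either one changes every derived value without rewriting the stored identifiers.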
 More broadly speaking, this feature would give organizations more flexibility 
 to change how they deidentify, transform or anonymize data to suit different 
 access levels.
 Of course, to do this you'd need to have a pluggable component that can 
 process key/value pairs before visibilities are evaluated. I can see why this 
 might give a lot of people the heebie-jeebies but I'd like to gather as much 
 feedback as possible. Looking forward to hearing your thoughts!





[jira] [Reopened] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reopened ACCUMULO-3958:
--

Nope, I'm wrong, not a dupe.

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Priority: Minor

 Had an odd situation where the monitor would throw an NPE constantly when I 
 tried to view any page.
 It was throwing an NPE trying to access the cached instance name. It appears 
 that {{HdfsZooInstance.getInstance().getInstanceName()}} may return null, but 
 the monitor code is not written to account for that.
 The monitor should check for null values before setting the instance name 
 into {{cachedInstanceName}}. When the instance name is null, it should 
 reschedule the timertask to try to fetch a (non-null) instance name.
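 The behaviour proposed above might look roughly like the following standalone sketch. It is not the Monitor code or the eventual patch: InstanceNameCache, tryFetch, fetchWithRetry and the "(Unavailable)" placeholder are hypothetical, and a Supplier stands in for the HdfsZooInstance lookup.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the monitor's cached instance name handling.
final class InstanceNameCache {
  private final Supplier<String> lookup; // stands in for the real instance-name lookup
  private final AtomicReference<String> cachedInstanceName = new AtomicReference<>();

  InstanceNameCache(Supplier<String> lookup) {
    this.lookup = lookup;
  }

  // Performs one lookup and caches the result only when it is non-null.
  // Returns true once a name has been cached.
  boolean tryFetch() {
    String name = lookup.get();
    if (name != null) {
      cachedInstanceName.set(name);
    }
    return cachedInstanceName.get() != null;
  }

  // Reschedules itself on the timer until a non-null name has been cached.
  void fetchWithRetry(Timer timer, long retryDelayMillis) {
    if (tryFetch()) {
      return;
    }
    timer.schedule(new TimerTask() {
      @Override
      public void run() {
        fetchWithRetry(timer, retryDelayMillis);
      }
    }, retryDelayMillis);
  }

  // Never returns null, so page rendering cannot hit an NPE on the name.
  String displayName() {
    String name = cachedInstanceName.get();
    return name == null ? "(Unavailable)" : name;
  }

  public static void main(String[] args) {
    InstanceNameCache cache = new InstanceNameCache(() -> null);
    System.out.println(cache.tryFetch() + " " + cache.displayName());
  }
}
```

 Readers of the cached value go through displayName(), so a failed lookup yields a placeholder instead of a null dereference while the retry is pending.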





[jira] [Resolved] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-3958.
--
Resolution: Duplicate
  Assignee: (was: Josh Elser)

ACCUMULO-3837

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Priority: Minor
 Fix For: 1.6.4, 1.7.1, 1.8.0




[jira] [Updated] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-3958:
-
Fix Version/s: (was: 1.6.4)
   (was: 1.7.1)
   (was: 1.8.0)

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Priority: Minor



[jira] [Updated] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated ACCUMULO-3958:
-
Fix Version/s: 1.8.0
   1.7.1
   1.6.4

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Priority: Minor
 Fix For: 1.6.4, 1.7.1, 1.8.0




[jira] [Resolved] (ACCUMULO-3969) NPE in monitor

2015-08-24 Thread Eric Newton (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Newton resolved ACCUMULO-3969.
---
Resolution: Fixed

 NPE in monitor
 --

 Key: ACCUMULO-3969
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3969
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Affects Versions: 1.6.3
Reporter: Eric Newton
 Fix For: 1.8.0


 From the mailing list:
 {quote}
   We have Accumulo 1.6.3 set up on the HortonWorks install with a basic 
 cluster of one master and one tablet server. Initially everything seemed good 
 but now I'm getting a NullPointerException when I go to the monitor web page. 
 The logs also keep adding a message to that effect as well. I searched and 
 couldn't find any posts with the same problem. I went back through all of the 
 config files and made sure they conform to the configuration instructions. 
 I'm hoping someone can point me to the right place to look for 
 troubleshooting. 
 http://machine-name:50095 responds with a page that has the following:
 {noformat}
 java.lang.NullPointerException
   at org.apache.accumulo.monitor.Monitor.fetchData(Monitor.java:252)
   at org.apache.accumulo.monitor.servlets.BasicServlet.doGet(BasicServlet.java:59)
   at org.apache.accumulo.monitor.servlets.DefaultServlet.doGet(DefaultServlet.java:101)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
   at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
   at org.eclipse.jetty.server.Server.handle(Server.java:370)
   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
   at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
   at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
   at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}
 The logs are recording the following error:
 {noformat}
 2015-08-21 16:37:21,152 [monitor.Monitor] WARN :
 java.lang.NullPointerException
 at org.apache.accumulo.monitor.Monitor.fetchData(Monitor.java:252)
 at org.apache.accumulo.monitor.Monitor$2.run(Monitor.java:508)
 at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
 at java.lang.Thread.run(Thread.java:745)
 {noformat}
 {quote}





[jira] [Updated] (ACCUMULO-3969) NPE in monitor

2015-08-24 Thread Eric Newton (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Newton updated ACCUMULO-3969:
--
Fix Version/s: (was: 1.8.0)

 NPE in monitor
 --

 Key: ACCUMULO-3969
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3969
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Affects Versions: 1.6.3
Reporter: Eric Newton



[jira] [Resolved] (ACCUMULO-3969) NPE in monitor

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-3969.
--
Resolution: Duplicate

 NPE in monitor
 --

 Key: ACCUMULO-3969
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3969
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Affects Versions: 1.6.3
Reporter: Eric Newton



[jira] [Assigned] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser reassigned ACCUMULO-3958:


Assignee: Josh Elser

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Minor
 Fix For: 1.6.4, 1.7.1, 1.8.0




[jira] [Resolved] (ACCUMULO-3958) Monitor.fetchData should check for null instance name

2015-08-24 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-3958.
--
Resolution: Fixed

 Monitor.fetchData should check for null instance name
 -

 Key: ACCUMULO-3958
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3958
 Project: Accumulo
  Issue Type: Bug
  Components: monitor
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Minor
 Fix For: 1.6.4, 1.7.1, 1.8.0

  Time Spent: 0.5h
  Remaining Estimate: 0h



[jira] [Commented] (ACCUMULO-3971) Better doc for tracing

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709731#comment-14709731
 ] 

ASF GitHub Bot commented on ACCUMULO-3971:
--

Github user joshelser commented on the pull request:

https://github.com/apache/accumulo/pull/44#issuecomment-134317691
  
This is all really great! +1 from me, would be good to hear from 
@billierinaldi  and/or @ericnewton if they have some time.

The unrelated formatting change will be squashed by me -- I noticed I 
goofed that up over the weekend.


 Better doc for tracing
 --

 Key: ACCUMULO-3971
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3971
 Project: Accumulo
  Issue Type: Improvement
  Components: docs, trace
Affects Versions: 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: documentation
 Fix For: 1.7.1


 Developers who want to go beyond Accumulo's default tracing are too much on 
 their own to figure out how to use Accumulo's tracing features.  I had to 
 dive deep into source code to figure out how to understand the trace table 
 format and add a ZipkinSpanReceiver.  I'd like to share what I found so that 
 other developers do not have to do the same.





[jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709737#comment-14709737
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
--

Github user joshelser commented on the pull request:

https://github.com/apache/accumulo/pull/45#issuecomment-134319018
  
+1 from me. Will merge today.


 Confusing wording on BatchScanner javadoc
 -

 Key: ACCUMULO-3959
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
 Project: Accumulo
  Issue Type: Improvement
  Components: docs
Affects Versions: 1.6.3, 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: docuentation
 Fix For: 1.6.4, 1.7.1


 The following sentence in the [BatchScanner 
 Javadoc|https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html]
  has confused my colleagues into using Scanners and wondering why performance 
 doesn't scale.
 bq. If you want to lookup a few ranges and expect those ranges to contain a 
 lot of data, then use the Scanner instead.
 Also regarding this next sentence, from what I see of the BatchScanner it 
 will break up large Range objects that span multiple extents (tablets) into 
 multiple ranges, possibly one for each tablet.
 bq. Use this when looking up lots of ranges and you expect each range to 
 contain a small amount of data.
 If the client is okay with unsorted order and it is okay with using multiple 
 threads, then isn't it always a better decision to use a BatchScanner than 
 regular Scanner?  In the worst case, one Range over a single row, the 
 BatchScanner will perform the same as a regular Scanner, ya?
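 The range splitting described above can be pictured with a small standalone sketch. This is not the BatchScanner implementation: RangeSplitter, splitByTablet and the string form of the sub-ranges are hypothetical, and it assumes each tablet covers the rows in (previous split point, split point].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration of cutting one row range at tablet boundaries.
final class RangeSplitter {
  // Splits the inclusive row range [startRow, endRow] at every split point that
  // falls inside it, returning one sub-range description per tablet touched.
  static List<String> splitByTablet(String startRow, String endRow, NavigableSet<String> splitPoints) {
    List<String> pieces = new ArrayList<>();
    String lower = startRow;
    boolean lowerInclusive = true;
    // Split points in [startRow, endRow) are tablet end rows inside the range.
    for (String split : splitPoints.subSet(startRow, true, endRow, false)) {
      pieces.add((lowerInclusive ? "[" : "(") + lower + ", " + split + "]");
      lower = split;
      lowerInclusive = false; // the next tablet starts just after this split point
    }
    pieces.add((lowerInclusive ? "[" : "(") + lower + ", " + endRow + "]");
    return pieces;
  }

  public static void main(String[] args) {
    NavigableSet<String> splits = new TreeSet<>(Arrays.asList("g", "n", "t"));
    System.out.println(splitByTablet("a", "p", splits));
  }
}
```

 A range that stays within one tablet comes back as a single piece, which matches the worst case raised in the question: there is nothing to parallelize.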





[jira] [Commented] (ACCUMULO-3971) Better doc for tracing

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709793#comment-14709793
 ] 

ASF GitHub Bot commented on ACCUMULO-3971:
--

Github user billierinaldi commented on the pull request:

https://github.com/apache/accumulo/pull/44#issuecomment-134325576
  
This looks great.  Thanks, Dylan!  I have two suggestions.  One is about 
the sentence, "Span entries record full span information, whereas index and 
start time entries provide indexing into span information useful for quickly 
finding spans by type or start time."  Actually, both the span and start time 
entries have the full span information, while only the index entries provide 
indexing.

The other suggestion is to mention other configuration properties that 
would need to be set for the zipkin span receiver in accumulo-site.xml and in 
the client configuration, such as:
```
<property>
  <name>trace.span.receiver.zipkin.collector-hostname</name>
  <value>localhost</value>
</property>
<property>
  <name>trace.span.receiver.zipkin.collector-port</name>
  <value>9410</value>
</property>
```


 Better doc for tracing
 --

 Key: ACCUMULO-3971
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3971
 Project: Accumulo
  Issue Type: Improvement
  Components: docs, trace
Affects Versions: 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: documentation
 Fix For: 1.7.1




[jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709800#comment-14709800
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
--

Github user keith-turner commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/45#discussion_r37784571
  
--- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
@@ -16,19 +16,20 @@
  */
 package org.apache.accumulo.core.client;
 
+import org.apache.accumulo.core.data.Range;
+
 import java.util.Collection;
 import java.util.concurrent.TimeUnit;
 
-import org.apache.accumulo.core.data.Range;
-
 /**
  * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
+ * BatchScanners are also appropriate for large, single ranges,
+ * as a BatchScanner will break those ranges up into separate RPCs
+ * provided the range spans more than one tablet
+ * and there are sufficiently many scan threads available.
  *
- * Use this when looking up lots of ranges and you expect each range to contain a small amount of data. Also only use this when you do not care about the
- * returned data being in sorted order.
- *
- * If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead. Also, the Scanner will return data in
- * sorted order, this will not.
+ * Only use this when you do not care about returned data being in sorted order.
--- End diff --

This was already broken before your patch, but I think javadoc needs `<p>` 
markup for paragraphs.   Not sure it will render as intended w/o it.

Did you format these changes?


 Confusing wording on BatchScanner javadoc
 -

 Key: ACCUMULO-3959
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
 Project: Accumulo
  Issue Type: Improvement
  Components: docs
Affects Versions: 1.6.3, 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: docuentation
 Fix For: 1.6.4, 1.7.1


 The following sentence in the [BatchScanner 
 Javadoc|https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html]
  has confused my colleagues into using Scanners and wondering why performance 
 doesn't scale.
 bq. If you want to lookup a few ranges and expect those ranges to contain a 
 lot of data, then use the Scanner instead.
 Also regarding this next sentence, from what I see of the BatchScanner it 
 will break up large Range objects that span multiple extents (tablets) into 
 multiple ranges, possibly one for each tablet.
 bq. Use this when looking up lots of ranges and you expect each range to 
 contain a small amount of data.
 If the client is okay with unsorted order and it is okay with using multiple 
 threads, then isn't it always a better decision to use a BatchScanner than 
 regular Scanner?  In the worst case, one Range over a single row, the 
 BatchScanner will perform the same as a regular Scanner, ya?
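
The splitting behaviour described above can be sketched with a small stand-alone model. This is not Accumulo code: `RangeSplitSketch`, `splitByTablet` and the string rendering of each piece are hypothetical names made up for the illustration, and the model assumes each split point is the inclusive end row of a tablet. It only shows how one large row range falls apart into one piece per tablet it touches, and how a single-row range stays in one piece.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class RangeSplitSketch {

    /**
     * Splits the closed row range [start, end] at tablet boundaries.
     * Each split point is treated as the inclusive end row of a tablet,
     * so a tablet covers (previousSplit, split]. Assumes start <= end.
     */
    static List<String> splitByTablet(String start, String end, List<String> splits) {
        List<String> pieces = new ArrayList<>();
        String lower = start;
        boolean lowerInclusive = true;
        for (String split : new TreeSet<>(splits)) { // sorted, de-duplicated
            if (split.compareTo(lower) < 0) {
                continue; // this tablet ends before the remaining range starts
            }
            if (split.compareTo(end) >= 0) {
                break; // the rest of the range fits inside this tablet
            }
            pieces.add(describe(lower, lowerInclusive, split));
            lower = split;
            lowerInclusive = false; // the next piece starts just after the split row
        }
        pieces.add(describe(lower, lowerInclusive, end));
        return pieces;
    }

    private static String describe(String lower, boolean lowerInclusive, String upper) {
        return (lowerInclusive ? "[" : "(") + lower + ", " + upper + "]";
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("g", "n", "t");
        System.out.println(splitByTablet("c", "p", splits)); // range touching three tablets
        System.out.println(splitByTablet("c", "c", splits)); // single row, single tablet
    }
}
```

In this model the range from "c" to "p" over splits g, n, t yields three pieces, each of which could be fetched independently, while the single-row range yields exactly one piece, which matches the "worst case" mentioned in the description.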



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709810#comment-14709810
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
--

Github user dhutchis commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/45#discussion_r37785744
  
--- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
@@ -16,19 +16,20 @@
  */
 package org.apache.accumulo.core.client;
 
+import org.apache.accumulo.core.data.Range;
+
 import java.util.Collection;
 import java.util.concurrent.TimeUnit;
 
-import org.apache.accumulo.core.data.Range;
-
 /**
  * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
+ * BatchScanners are also appropriate for large, single ranges,
+ * as a BatchScanner will break those ranges up into separate RPCs
+ * provided the range spans more than one tablet
+ * and there are sufficiently many scan threads available.
  *
- * Use this when looking up lots of ranges and you expect each range to contain a small amount of data. Also only use this when you do not care about the
- * returned data being in sorted order.
- *
- * If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead. Also, the Scanner will return data in
- * sorted order, this will not.
+ * Only use this when you do not care about returned data being in sorted order.
--- End diff --

Correct, I see that the `<p>` tag is necessary from the online javadoc at

http://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html

Will fix tonight when I return to my laptop.  I don't think my editor
(IntelliJ with the Eclipse code formatter plugin) adds the HTML tags
automatically.

On Mon, Aug 24, 2015 at 2:25 PM, Keith Turner notificati...@github.com
wrote:

 In core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java
 https://github.com/apache/accumulo/pull/45#discussion_r37784571:

*
  - * Use this when looking up lots of ranges and you expect each range 
to contain a small amount of data. Also only use this when you do not care 
about the
  - * returned data being in sorted order.
  - *
  - * If you want to lookup a few ranges and expect those ranges to 
contain a lot of data, then use the Scanner instead. Also, the Scanner will 
return data in
  - * sorted order, this will not.
  + * Only use this when you do not care about returned data being in 
sorted order.

 This was already broken before your patch, but I think javadoc needs `<p>`
 markup for paragraphs. Not sure it will render as intended w/o it.

 Did you format these changes?

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/accumulo/pull/45/files#r37784571.






[jira] [Commented] (ACCUMULO-3971) Better doc for tracing

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709811#comment-14709811
 ] 

ASF GitHub Bot commented on ACCUMULO-3971:
--

Github user dhutchis commented on the pull request:

https://github.com/apache/accumulo/pull/44#issuecomment-134330086
  
Thanks for spotting that sentence Billie, will fix tonight.  It's good to
throw in the rest of the Zipkin configuration as well (I was using the
default localhost and port).

On Mon, Aug 24, 2015 at 2:22 PM, billierinaldi notificati...@github.com
wrote:

 This looks great. Thanks, Dylan! I have two suggestions. One is about the
 sentence, "Span entries record full span information, whereas index and
 start time entries provide indexing into span information useful for
 quickly finding spans by type or start time." Actually both the span and
 start time entries have the full span information, while only the index
 entries provide indexing.

 The other suggestion is to mention other configuration properties that
 would need to be set for the zipkin span receiver in accumulo-site.xml and
 in the client configuration, such as:

 <property>
   <name>trace.span.receiver.zipkin.collector-hostname</name>
   <value>localhost</value>
 </property>
 <property>
   <name>trace.span.receiver.zipkin.collector-port</name>
   <value>9410</value>
 </property>

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/accumulo/pull/44#issuecomment-134325576.




 Better doc for tracing
 --

 Key: ACCUMULO-3971
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3971
 Project: Accumulo
  Issue Type: Improvement
  Components: docs, trace
Affects Versions: 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: documentation
 Fix For: 1.7.1


 Developers who want to go beyond Accumulo's default tracing are too much on 
 their own to figure out how to use Accumulo's tracing features.  I had to 
 dive deep into source code to figure out how to understand the trace table 
 format and add a ZipkinSpanReceiver.  I'd like to share what I found so that 
 other developers do not have to do the same.





Accumulo-1.6 - Build # 889 - Still Failing

2015-08-24 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.6 (build #889)

Status: Still Failing

Check console output at https://builds.apache.org/job/Accumulo-1.6/889/ to view 
the results.

[jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709855#comment-14709855
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
--

Github user keith-turner commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/45#discussion_r37789259
  
--- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
@@ -16,19 +16,20 @@
  */
 package org.apache.accumulo.core.client;
 
+import org.apache.accumulo.core.data.Range;
+
 import java.util.Collection;
 import java.util.concurrent.TimeUnit;
 
-import org.apache.accumulo.core.data.Range;
-
 /**
  * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
+ * BatchScanners are also appropriate for large, single ranges,
+ * as a BatchScanner will break those ranges up into separate RPCs
+ * provided the range spans more than one tablet
+ * and there are sufficiently many scan threads available.
--- End diff --

Maybe instead of suggesting how to use the batch scanner, we could focus 
more on describing possible behavior?

  * May parallelize reading of data.  Multiple input ranges may be read in 
    parallel, or sub-ranges of individual input ranges may be read in parallel.
  * May return data in unsorted order.
  * May batch multiple ranges into a single RPC to a tserver.

Could still mention possible use cases like:

  * Looking up lots of small ranges
  * Parallelizing (spell check in web browser does not like that word) a 
    computation over an entire table using a large range w/ iterators.  This 
    case may not return lots of data, although lots of data may be read by the 
    iterators.
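
The first two behaviours in the list above (parallel reads, unsorted results) can be illustrated with a stand-alone model. This is a sketch under stated assumptions, not the BatchScanner implementation: `ParallelScanSketch` and `scanAll` are hypothetical names, each inner list stands in for the rows of one tablet, and a plain thread pool stands in for the scan threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelScanSketch {

    /** Reads every "tablet" on its own task and returns rows in arrival order. */
    static List<String> scanAll(List<List<String>> tablets, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Queue<String> results = new ConcurrentLinkedQueue<>();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<String> tablet : tablets) {
                tasks.add(() -> {
                    results.addAll(tablet); // rows from different tablets may interleave
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every task has completed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) {
        List<List<String>> tablets = Arrays.asList(
            Arrays.asList("a1", "a2"),
            Arrays.asList("m1", "m2"),
            Arrays.asList("x1"));
        List<String> rows = scanAll(tablets, 3);
        System.out.println("arrival order: " + rows); // order is not guaranteed
        Collections.sort(rows);                       // a client that needs order must sort
        System.out.println("sorted by client: " + rows);
    }
}
```

The arrival order can differ from run to run, which is why the javadoc wording about sorted order matters: the only guarantee in this model is that every row shows up once.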





[jira] [Commented] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709859#comment-14709859
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
--

Github user keith-turner commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/45#discussion_r37789345
  
--- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
@@ -16,19 +16,20 @@
  */
 package org.apache.accumulo.core.client;
 
+import org.apache.accumulo.core.data.Range;
+
 import java.util.Collection;
 import java.util.concurrent.TimeUnit;
 
-import org.apache.accumulo.core.data.Range;
-
 /**
  * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
+ * BatchScanners are also appropriate for large, single ranges,
+ * as a BatchScanner will break those ranges up into separate RPCs
+ * provided the range spans more than one tablet
+ * and there are sufficiently many scan threads available.
  *
- * Use this when looking up lots of ranges and you expect each range to contain a small amount of data. Also only use this when you do not care about the
- * returned data being in sorted order.
- *
- * If you want to lookup a few ranges and expect those ranges to contain a lot of data, then use the Scanner instead. Also, the Scanner will return data in
- * sorted order, this will not.
--- End diff --

Not 100% sure, but I think I wrote this javadoc.  My thinking was that the 
Scanner does pipelining and may have better performance for reading data to a 
single client.  However, this may be completely wrong.  I think removing this 
suggestion is a good idea.




[jira] [Updated] (ACCUMULO-3959) Confusing wording on BatchScanner javadoc

2015-08-24 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-3959:

Fix Version/s: 1.8.0

 Confusing wording on BatchScanner javadoc
 -

 Key: ACCUMULO-3959
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
 Project: Accumulo
  Issue Type: Improvement
  Components: docs
Affects Versions: 1.6.3, 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: docuentation
 Fix For: 1.6.4, 1.7.1, 1.8.0




[jira] [Commented] (ACCUMULO-3970) Generating multiple views of a value at scan time

2015-08-24 Thread Russ Weeks (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709888#comment-14709888
 ] 

Russ Weeks commented on ACCUMULO-3970:
--

Thanks for your comment, Billie. I see what you mean about not wanting to rely 
on the table admin to manage data visibilities. I guess my use case could be 
summarized as, "As a user of Accumulo, I want to be able to put sensitive data 
into the system, and have different users see different views of that data in 
accordance with organizational policy." Flipping it around, you're not relying 
on the data producer to get de-identification right. In my mind it's a 
valuable feature, but I get that I may be in a minority among Accumulo's users 
in wanting it.


 Generating multiple views of a value at scan time
 -

 Key: ACCUMULO-3970
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
 Project: Accumulo
  Issue Type: New Feature
Reporter: Russ Weeks
Priority: Minor
 Fix For: 1.8.0


 It would be useful to have the ability to generate different representations 
 of a key-value pair at scan time, based on the scan authorizations.
 For example, consider [HIPAA safe harbour 
 de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates].
  One of the rules for de-identifying a patient's date of birth is that if a 
 patient is 89 years old or younger, you can disclose his exact year of birth. 
 If a patient is 90 years old or over, you pretend that he's 90 years old.
 You can imagine implementing this as a key/value mapping in accumulo like,
 {{(pt_id, demographic, pt_dob, PII_DOB) -> 1925-08-22}}
 {{(pt_id, demographic, pt_dob, SHD_DOB) -> 1925}}
 Where the value corresponding to visibility SHD_DOB is produced at scan-time, 
 depending on the patient's current age.
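
The date-of-birth rule as worded above can be sketched as a pure function of the stored value and the scan date. This follows the description in this issue only and is not legal guidance; `SafeHarborDobSketch`, `yearOfBirthView` and `AGE_CAP` are hypothetical names, and nothing here is an existing Accumulo API.

```java
import java.time.LocalDate;
import java.time.Period;

class SafeHarborDobSketch {

    /** Age at which the exact year of birth is no longer disclosed. */
    static final int AGE_CAP = 90;

    /**
     * Returns the year of birth to show at scan time: the real year for
     * patients younger than AGE_CAP, otherwise the year that makes the
     * patient appear exactly AGE_CAP years old on the scan date.
     */
    static String yearOfBirthView(LocalDate dateOfBirth, LocalDate today) {
        int age = Period.between(dateOfBirth, today).getYears();
        if (age < AGE_CAP) {
            return String.valueOf(dateOfBirth.getYear());
        }
        return String.valueOf(today.getYear() - AGE_CAP);
    }

    public static void main(String[] args) {
        LocalDate scanDate = LocalDate.of(2015, 8, 24);
        System.out.println(yearOfBirthView(LocalDate.of(1925, 8, 22), scanDate));
        System.out.println(yearOfBirthView(LocalDate.of(1950, 3, 1), scanDate));
    }
}
```

Because the result depends on the scan date, the derived value cannot simply be written once at ingest, which is the motivation for producing it at scan time.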
 Another example would be the ability to produce a salted hash of a unique 
 identifier like a social security number or medical record number, where the 
 salt (or the hash algorithm, or the work factor...) could be specified 
 dynamically without having to re-code all the values in the system.
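
A minimal sketch of the salted-hash idea, using only the JDK's `MessageDigest`. The class and method names (`SaltedHashSketch`, `saltedHash`) are hypothetical, and passing the salt and algorithm as plain arguments is an assumption standing in for whatever dynamic configuration the feature would provide. A work factor is not modelled here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class SaltedHashSketch {

    /** Returns the hex digest of salt + identifier under the given algorithm. */
    static String saltedHash(String identifier, String salt, String algorithm) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            digest.update(salt.getBytes(StandardCharsets.UTF_8));
            byte[] hashed = digest.digest(identifier.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hashed) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest algorithm: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        // Same identifier, different salts: the stored value never changes,
        // only the view handed to the reader does.
        System.out.println(saltedHash("123-45-6789", "salt-2015", "SHA-256"));
        System.out.println(saltedHash("123-45-6789", "salt-2016", "SHA-256"));
    }
}
```

Changing the salt or the algorithm changes every derived value without rewriting the underlying data, which is the flexibility the paragraph above asks for.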
 More broadly speaking, this feature would give organizations more flexibility 
 to change how they deidentify, transform or anonymize data to suit different 
 access levels.
 Of course, to do this you'd need to have a pluggable component that can 
 process key/value pairs before visibilities are evaluated. I can see why this 
 might give a lot of people the heebie-jeebies but I'd like to gather as much 
 feedback as possible. Looking forward to hearing your thoughts!





[jira] [Updated] (ACCUMULO-3971) Better doc for tracing

2015-08-24 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-3971:

Fix Version/s: 1.8.0

 Better doc for tracing
 --

 Key: ACCUMULO-3971
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3971
 Project: Accumulo
  Issue Type: Improvement
  Components: docs, trace
Affects Versions: 1.7.0
Reporter: Dylan Hutchison
Assignee: Dylan Hutchison
Priority: Minor
  Labels: documentation
 Fix For: 1.7.1, 1.8.0




Accumulo-1.6 - Build # 888 - Failure

2015-08-24 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.6 (build #888)

Status: Failure

Check console output at https://builds.apache.org/job/Accumulo-1.6/888/ to view 
the results.

Accumulo-Integration-Tests - Build # 243 - Aborted! -- 1.7

2015-08-24 Thread elserj
Accumulo-Integration-Tests - Build # 243 - Aborted:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/243/ 
to view the results.

Accumulo-Integration-Tests - Build # 244 - Aborted! -- master

2015-08-24 Thread elserj
Accumulo-Integration-Tests - Build # 244 - Aborted:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/244/ 
to view the results.

[jira] [Created] (ACCUMULO-3972) setshelliter command tries to load class

2015-08-24 Thread Keith Turner (JIRA)
Keith Turner created ACCUMULO-3972:
--

 Summary: setshelliter command tries to load class
 Key: ACCUMULO-3972
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3972
 Project: Accumulo
  Issue Type: Bug
Affects Versions: 1.6.3
Reporter: Keith Turner
 Fix For: 1.6.4, 1.7.1, 1.8.0


The setshelliter command does not check that a class exists in the correct way. 
It should reach out to a tserver and see if it can load the iterator class. 
Instead, the command tries to load the class itself.
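
To illustrate why a check made inside the shell can give the wrong answer: a local lookup only reflects the classpath of the JVM that runs it, while the iterator eventually runs inside a tserver with its own classpath. The sketch below uses hypothetical names (`LocalClassCheckSketch`, `loadsLocally`) and is not the shell's actual code.

```java
class LocalClassCheckSketch {

    /**
     * Reports whether the named class can be found by this JVM's class loader.
     * A "true" or "false" here says nothing about what a remote server can load.
     */
    static boolean loadsLocally(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class.forName(className, false, LocalClassCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadsLocally("java.lang.String"));
        // Hypothetical iterator class that is deployed only on the servers:
        System.out.println(loadsLocally("com.example.ServerOnlyIterator"));
    }
}
```

An iterator jar deployed only on the servers would fail this local check even though the tservers could load it, and the reverse holds for a class present only on the shell's classpath.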






Accumulo-1.6 - Build # 890 - Fixed

2015-08-24 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.6 (build #890)

Status: Fixed

Check console output at https://builds.apache.org/job/Accumulo-1.6/890/ to view 
the results.

[jira] [Commented] (ACCUMULO-3971) Better doc for tracing

2015-08-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710257#comment-14710257
 ] 

ASF GitHub Bot commented on ACCUMULO-3971:
--

Github user dhutchis commented on the pull request:

https://github.com/apache/accumulo/pull/44#issuecomment-134410576
  
Okay, this is set.  On a slightly related note, I see that Zipkin supports 
HBase, Cassandra and Anorm connections as backing stores for traces.  I wonder 
how much work it would take to get Zipkin's interface and/or 
trace-aggregating-statistics to use Accumulo as a backing store.




Accumulo-Master - Build # 1699 - Failure

2015-08-24 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Master (build #1699)

Status: Failure

Check console output at https://builds.apache.org/job/Accumulo-Master/1699/ to 
view the results.

Accumulo-Integration-Tests - Build # 245 - Successful! -- 1.6

2015-08-24 Thread elserj
Accumulo-Integration-Tests - Build # 245 - Successful:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/245/ 
to view the results.