[ 
https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709855#comment-14709855
 ] 

ASF GitHub Bot commented on ACCUMULO-3959:
------------------------------------------

Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/45#discussion_r37789259
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
    @@ -16,19 +16,20 @@
      */
     package org.apache.accumulo.core.client;
     
    +import org.apache.accumulo.core.data.Range;
    +
     import java.util.Collection;
     import java.util.concurrent.TimeUnit;
     
    -import org.apache.accumulo.core.data.Range;
    -
     /**
      * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
    + * BatchScanners are also appropriate for large, single ranges,
    + * as a BatchScanner will break those ranges up into separate RPCs
    + * provided the range spans more than one tablet
    + * and there are sufficiently many scan threads available.
    --- End diff --
    
    Maybe instead of suggesting how to use the batch scanner, we could focus
    more on describing possible behavior?
    
      * May parallelize reading of data. Multiple input ranges may be read in
        parallel, or subranges of individual input ranges may be read in parallel.
      * May return data in unsorted order.
      * May batch multiple ranges into a single RPC to a tserver.
    
    Could still mention possible use cases, such as:
    
      * Looking up lots of small ranges.
      * Parallelizing (spell check in web browser does not like that word) a
        computation over an entire table using a large range with iterators. This
        case may not return lots of data, although lots of data may be read by the
        iterators.
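
To make the "subranges read in parallel" and "unsorted order" behavior concrete, here is a rough toy model in plain Java. It is not Accumulo code: `BatchScanSketch`, `splitAtTabletBoundaries` and `scanInParallel` are hypothetical names, tablets are modeled as nothing more than a sorted set of split points, and the real client may divide and schedule work differently. It only sketches the idea that one large range spanning several tablets can be cut at tablet boundaries and read by several threads, with results collected in whatever order the reads finish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model only: none of these names are Accumulo APIs.
class BatchScanSketch {

  // Cuts the half-open row range [start, end) at every split point that falls
  // strictly inside it, giving one subrange per (modeled) tablet it touches.
  static List<String[]> splitAtTabletBoundaries(String start, String end,
      TreeSet<String> splitPoints) {
    List<String[]> subRanges = new ArrayList<>();
    String lower = start;
    for (String split : splitPoints.subSet(start, false, end, false)) {
      subRanges.add(new String[] {lower, split});
      lower = split;
    }
    subRanges.add(new String[] {lower, end});
    return subRanges;
  }

  // Reads each subrange as a separate task and gathers results in completion
  // order, so the returned rows are not guaranteed to be sorted.
  static List<String> scanInParallel(List<String[]> subRanges, TreeSet<String> rows,
      int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      CompletionService<List<String>> done = new ExecutorCompletionService<>(pool);
      for (String[] r : subRanges) {
        // rows is only read, never modified, while the tasks run.
        done.submit(() -> new ArrayList<>(rows.subSet(r[0], true, r[1], false)));
      }
      List<String> results = new ArrayList<>();
      for (int i = 0; i < subRanges.size(); i++) {
        results.addAll(done.take().get());
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("toy scan failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    TreeSet<String> splits = new TreeSet<>(Arrays.asList("g", "p"));
    TreeSet<String> rows = new TreeSet<>(Arrays.asList("a", "c", "h", "k", "q", "z"));
    List<String[]> subRanges = splitAtTabletBoundaries("a", "zz", splits);
    List<String> results = scanInParallel(subRanges, rows, 3);
    // The set of rows is fixed, but their order can vary from run to run.
    System.out.println(subRanges.size() + " subranges, rows: " + results);
  }
}
```

In this model a range that stays inside one tablet produces a single subrange, so it gains nothing from extra threads, which lines up with the worst case described in the issue.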



> Confusing wording on BatchScanner javadoc
> -----------------------------------------
>
>                 Key: ACCUMULO-3959
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Dylan Hutchison
>            Assignee: Dylan Hutchison
>            Priority: Minor
>              Labels: docuentation
>             Fix For: 1.6.4, 1.7.1
>
>
> The following sentence in the [BatchScanner 
> Javadoc|https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html]
>  has confused my colleagues into using Scanners and wondering why performance 
> doesn't scale.
> bq. If you want to lookup a few ranges and expect those ranges to contain a 
> lot of data, then use the Scanner instead.
> Also regarding this next sentence, from what I see of the BatchScanner it 
> will break up "large Range objects" that span multiple extents (tablets) into 
> multiple ranges, possibly one for each tablet.
> bq. Use this when looking up lots of ranges and you expect each range to 
> contain a small amount of data.
> If the client is okay with unsorted order and it is okay with using multiple 
> threads, then isn't it always a better decision to use a BatchScanner than 
> regular Scanner?  In the worst case, one Range over a single row, the 
> BatchScanner will perform the same as a regular Scanner, ya?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
