[ https://issues.apache.org/jira/browse/ACCUMULO-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709855#comment-14709855 ]
ASF GitHub Bot commented on ACCUMULO-3959:
------------------------------------------

Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/45#discussion_r37789259

    --- Diff: core/src/main/java/org/apache/accumulo/core/client/BatchScanner.java ---
    @@ -16,19 +16,20 @@
      */
     package org.apache.accumulo.core.client;

    +import org.apache.accumulo.core.data.Range;
    +
     import java.util.Collection;
     import java.util.concurrent.TimeUnit;

    -import org.apache.accumulo.core.data.Range;
    -
     /**
      * Implementations of BatchScanner support efficient lookups of many ranges in accumulo.
    + * BatchScanners are also appropriate for large, single ranges,
    + * as a BatchScanner will break those ranges up into separate RPCs
    + * provided the range spans more than one tablet
    + * and there are sufficiently many scan threads available.
    --- End diff --

    Maybe instead of suggesting how to use the batch scanner, we could focus more on describing possible behavior?

     * May parallelize reading of data. Multiple input ranges may be read in parallel or sub ranges of individual input ranges may be read in parallel.
     * May return data in unsorted order.
     * May batch multiple ranges into a single RPC to a tserver.

    Could still mention possible use cases like:

     * Looking up lots of small ranges
     * Parallelizing (spell check in web browser does not like that word) a computation over an entire table using a large range w/ iterators. This case may not return lots of data, although lots of data may be read by the iterators.
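To make the behaviors listed in the comment above concrete, here is a toy model in plain Java. It is not Accumulo code and does not use the Accumulo API: the class BatchScanSketch, the SubRange type, the integer row keys, and the split points are all hypothetical stand-ins. It only sketches the idea that one large range can be cut at tablet boundaries, the pieces read on a thread pool, and the results returned in whatever order the pieces finish.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchScanSketch {

    // Hypothetical stand-in for a tablet-aligned sub-range covering rows [start, end).
    static final class SubRange {
        final int start;
        final int end;

        SubRange(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Cut [start, end) at every split point that falls strictly inside it.
    // Assumes start < end. A range inside a single "tablet" comes back as one piece.
    static List<SubRange> splitAtTablets(int start, int end, TreeSet<Integer> splitPoints) {
        List<SubRange> pieces = new ArrayList<>();
        int cursor = start;
        for (int split : splitPoints.subSet(start, false, end, false)) {
            pieces.add(new SubRange(cursor, split));
            cursor = split;
        }
        pieces.add(new SubRange(cursor, end));
        return pieces;
    }

    // Read every piece on a pool of numThreads workers and collect the rows in
    // completion order, so the combined result is not guaranteed to be sorted.
    static List<Integer> scan(List<SubRange> pieces, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            ExecutorCompletionService<List<Integer>> done =
                    new ExecutorCompletionService<>(pool);
            for (SubRange piece : pieces) {
                done.submit(() -> {
                    List<Integer> rows = new ArrayList<>();
                    for (int row = piece.start; row < piece.end; row++) {
                        rows.add(row);
                    }
                    return rows;
                });
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < pieces.size(); i++) {
                results.addAll(done.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Three made-up split points give four "tablets" over rows 0..99.
        TreeSet<Integer> splits = new TreeSet<>(Arrays.asList(25, 50, 75));
        List<SubRange> pieces = splitAtTablets(0, 100, splits);
        List<Integer> rows = scan(pieces, 4);
        System.out.println(pieces.size() + " sub-ranges, " + rows.size() + " rows read");
    }
}
```

With a single worker thread, or a range that falls inside one tablet, the model degenerates to a sequential read of one piece, which is the worst case raised in the issue description.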
> Confusing wording on BatchScanner javadoc
> -----------------------------------------
>
>                 Key: ACCUMULO-3959
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3959
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: docs
>    Affects Versions: 1.6.3, 1.7.0
>            Reporter: Dylan Hutchison
>            Assignee: Dylan Hutchison
>            Priority: Minor
>              Labels: docuentation
>             Fix For: 1.6.4, 1.7.1
>
>
> The following sentence in the [BatchScanner
> Javadoc|https://accumulo.apache.org/1.7/apidocs/org/apache/accumulo/core/client/BatchScanner.html]
> has confused my colleagues into using Scanners and wondering why performance
> doesn't scale.
> bq. If you want to lookup a few ranges and expect those ranges to contain a
> lot of data, then use the Scanner instead.
> Also regarding this next sentence, from what I see of the BatchScanner it
> will break up "large Range objects" that span multiple extents (tablets) into
> multiple ranges, possibly one for each tablet.
> bq. Use this when looking up lots of ranges and you expect each range to
> contain a small amount of data.
> If the client is okay with unsorted order and it is okay with using multiple
> threads, then isn't it always a better decision to use a BatchScanner than
> regular Scanner? In the worst case, one Range over a single row, the
> BatchScanner will perform the same as a regular Scanner, ya?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)