[jira] [Commented] (PHOENIX-3744) Support snapshot scanners for MR-based queries

ASF GitHub Bot (JIRA) Wed, 24 May 2017 10:20:26 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023267#comment-16023267 ]


ASF GitHub Bot commented on PHOENIX-3744:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/239#discussion_r118312938
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelScanGrouper.java ---
    @@ -17,46 +17,79 @@
      */
     package org.apache.phoenix.iterate;
     
    +import java.sql.SQLException;
     import java.util.List;
     
    +import com.google.common.base.Preconditions;
    +import org.apache.hadoop.hbase.HRegionLocation;
     import org.apache.hadoop.hbase.client.Scan;
     import org.apache.phoenix.compile.QueryPlan;
    +import org.apache.phoenix.compile.StatementContext;
     import org.apache.phoenix.schema.PTable;
     import org.apache.phoenix.schema.PTable.IndexType;
     import org.apache.phoenix.schema.SaltingUtil;
    +import org.apache.phoenix.schema.TableRef;
     import org.apache.phoenix.util.ScanUtil;
     
     /**
      * Default implementation that creates a scan group if a plan is row key ordered (which requires a merge sort),
    - * or if a scan crosses a region boundary and the table is salted or a local index.   
    + * or if a scan crosses a region boundary and the table is salted or a local index.
      */
     public class DefaultParallelScanGrouper implements ParallelScanGrouper {
    -   
    -   private static final DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();
     
    -    public static DefaultParallelScanGrouper getInstance() {
    -        return INSTANCE;
    -    }
    -    
    -    private DefaultParallelScanGrouper() {}
    -
    -   @Override
    -   public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans, byte[] startKey, boolean crossedRegionBoundary) {
    -           PTable table = plan.getTableRef().getTable();
    -           boolean startNewScanGroup = false;
    -        if (!plan.isRowKeyOrdered()) {
    -            startNewScanGroup = true;
    -        } else if (crossedRegionBoundary) {
    -            if (table.getIndexType() == IndexType.LOCAL) {
    -                startNewScanGroup = true;
    -            } else if (table.getBucketNum() != null) {
    -                startNewScanGroup = scans.isEmpty() ||
    -                        ScanUtil.crossesPrefixBoundary(startKey,
    -                                ScanUtil.getPrefix(scans.get(scans.size()-1).getStartRow(), SaltingUtil.NUM_SALTING_BYTES), 
    -                                SaltingUtil.NUM_SALTING_BYTES);
    -            }
    -        }
    -        return startNewScanGroup;
    +  private static DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();
    --- End diff --
    
    I don't think that DefaultParallelScanGrouper can be a singleton with the state of context and tableName inside of it.
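
    To illustrate the concern, here is a minimal, self-contained sketch. It does not use Phoenix classes: QueryState, StatefulGrouper, StatelessGrouper and PerQueryGrouper are hypothetical names made up for this example, and describe() merely stands in for whatever the grouper would do with its state. It shows how a shared singleton that keeps per-query state in fields lets one query overwrite another's state, and two possible ways around that (pass the state on each call, or create one instance per query).

```java
// Demo entry point, declared first so a single-file source launch picks it up.
class SingletonStateDemo {
    public static void main(String[] args) {
        QueryState queryA = new QueryState("TABLE_A");
        QueryState queryB = new QueryState("TABLE_B");

        // Two queries configure the same shared instance; the second call wins.
        StatefulGrouper.INSTANCE.configure(queryA);
        StatefulGrouper.INSTANCE.configure(queryB);
        System.out.println("stateful, query A sees: " + StatefulGrouper.INSTANCE.describe());

        // The alternatives keep each query's state separate.
        System.out.println("stateless, query A sees: " + StatelessGrouper.INSTANCE.describe(queryA));
        System.out.println("per-query, query A sees: " + new PerQueryGrouper(queryA).describe());
    }
}

// Hypothetical stand-in for per-query state (assumption: not a Phoenix type).
final class QueryState {
    final String tableName;

    QueryState(String tableName) {
        this.tableName = tableName;
    }
}

// Problematic shape: a process-wide singleton holding per-query state in a field.
final class StatefulGrouper {
    static final StatefulGrouper INSTANCE = new StatefulGrouper();

    private QueryState state; // overwritten by whichever query configured it last

    private StatefulGrouper() {}

    void configure(QueryState state) {
        this.state = state;
    }

    String describe() {
        return "grouping scans for " + state.tableName;
    }
}

// Option 1: keep the singleton, but make it stateless and pass state per call.
final class StatelessGrouper {
    static final StatelessGrouper INSTANCE = new StatelessGrouper();

    private StatelessGrouper() {}

    String describe(QueryState state) {
        return "grouping scans for " + state.tableName;
    }
}

// Option 2: drop the singleton and create one instance per query.
final class PerQueryGrouper {
    private final QueryState state; // immutable, owned by a single query

    PerQueryGrouper(QueryState state) {
        this.state = state;
    }

    String describe() {
        return "grouping scans for " + state.tableName;
    }
}
```

    In the sketch the overwrite happens on a single thread; with concurrent queries the same sharing would also become a race. Which of the two options fits the pull request depends on how the grouper is obtained elsewhere, which this sketch does not try to model.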


> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>         Attachments: PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
