[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033733#comment-16033733 ]

ASF GitHub Bot commented on PHOENIX-3744:
-----------------------------------------

Github user JamesRTaylor commented on the issue:

    https://github.com/apache/phoenix/pull/239
  
    Patch looks very good, @akshita-malhotra. What's the advantage, @lhofhansl,
    of forcing users to create the snapshot themselves before starting the job?
    Wouldn't it be simpler to create the snapshot during the
    setup/initialization of the MR job? In the non-MR case, when we want to
    support running arbitrary queries over snapshot(s), it seems like we'd want
    Phoenix to create them, no? Otherwise, we'd need to give the user some way
    of associating a snapshot with a table name (which might get cumbersome).
    The alternative is to let Phoenix manage this transparently.


> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>         Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch, 
> PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.
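
To make the list of ideas above concrete, here is a minimal sketch of how those criteria might be checked in order. It is not Phoenix code: the class, enum, method, parameter names, and the 1 GiB threshold are all hypothetical, and treating each idea as an independent trigger is only one possible reading of the description.

```java
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical labels, one per idea listed in the issue description.
enum SnapshotReason {
    SCN_SET, MEMSTORE_EMPTY, BEFORE_MEMSTORE_DATA, TABLE_CONFIG, QUERY_HINT, LARGE_SCAN
}

// Hypothetical policy class; none of these names exist in Phoenix or HBase.
final class SnapshotScanPolicy {
    // Assumed cutoff for the "estimated # of bytes" idea (1 GiB, chosen arbitrarily).
    static final long LARGE_SCAN_BYTES = 1L << 30;

    /**
     * Returns the first criterion that would send the query to a snapshot,
     * or an empty Optional if none applies.
     */
    static Optional<SnapshotReason> reasonToUseSnapshot(
            OptionalLong scn,                      // point-in-time SCN, if one is set
            long queryTimestamp,                   // timestamp the query runs at
            OptionalLong oldestMemstoreTimestamp,  // empty when the memstore is empty
            boolean tableOptIn,                    // config option on the table
            boolean hinted,                        // query hint present
            long estimatedBytes) {                 // optimizer's scan-size estimate
        if (scn.isPresent()) {
            return Optional.of(SnapshotReason.SCN_SET);
        }
        if (!oldestMemstoreTimestamp.isPresent()) {
            return Optional.of(SnapshotReason.MEMSTORE_EMPTY);
        }
        if (queryTimestamp < oldestMemstoreTimestamp.getAsLong()) {
            return Optional.of(SnapshotReason.BEFORE_MEMSTORE_DATA);
        }
        if (tableOptIn) {
            return Optional.of(SnapshotReason.TABLE_CONFIG);
        }
        if (hinted) {
            return Optional.of(SnapshotReason.QUERY_HINT);
        }
        if (estimatedBytes >= LARGE_SCAN_BYTES) {
            return Optional.of(SnapshotReason.LARGE_SCAN);
        }
        return Optional.empty();
    }
}
```

Returning the matching reason rather than a bare boolean keeps the sketch easy to log and test; a real implementation would likely need to combine several of these conditions rather than stop at the first match.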



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
