[ 
https://issues.apache.org/jira/browse/PHOENIX-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095360#comment-15095360
 ] 

James Taylor commented on PHOENIX-2446:
---------------------------------------

I'm not sure of the exact processing that occurs during a flush, but logically 
it writes the data that's in the memstore to disk. How long does the flush take? 
If you sleep for that amount of time instead of doing the flush, do your tests 
pass?
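
To make that experiment concrete, here is a rough sketch of the harness I have 
in mind. It is only an illustration: flush and check are hypothetical callables 
standing in for whatever your test uses to flush the table and to compare row 
counts, not Phoenix or HBase APIs.

```python
import time

def timed(action):
    """Run action() and return how long it took, in seconds."""
    start = time.monotonic()
    action()
    return time.monotonic() - start

def run_with_flush(flush, check):
    """Variant A: do the real flush, remember its duration, then check."""
    elapsed = timed(flush)   # flush is a hypothetical callable
    return elapsed, check()  # check is a hypothetical pass/fail callable

def run_with_sleep(seconds, check):
    """Variant B: skip the flush and only wait for the same duration."""
    time.sleep(seconds)
    return check()
```

If variant B passes whenever variant A does, the flush is probably only acting 
as a delay, which would point at a timing problem rather than at anything the 
flush itself does.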

Have you tried running the UPSERT/SELECT at a lower priority?

I suppose we need to understand the timing of everything to understand why it's 
failing. Is the data in flight when the index population runs? Or does the data 
arrive at the server while the UPSERT/SELECT is running?

Would be good to make a timeline like this:
* CREATE INDEX IDX ON T(x) statement compiled
** Table T is resolved at t0
* CREATE INDEX executed
** HBase metadata created for new IDX table (index on view or local index may 
not create any new metadata)
** Phoenix metadata inserted for new index
** UPSERT SELECT run to populate index using t0 for scan and put timestamp
*** Execute n scans in parallel chunk-by-chunk for each guidepost and submit 
batch mutation for initial index population.
** Mark new index as active (i.e. updating Phoenix metadata through coprocessor 
call)

The above is time ordered, so the UPSERT SELECT is running as of an earlier 
timestamp. When does the other client's batch of mutations have to hit the 
server for the problem to occur?
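
One way to reason about that question is a toy model of the timeline above. 
This is a sketch under assumptions, not Phoenix code: Write, index_rows and 
missing_from_index are made-up names, the scan is assumed to see only cells 
with a timestamp strictly before t0, and a client that has not yet picked up 
the new index metadata is assumed not to write index rows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    row: str
    ts: int            # cell timestamp of the base table mutation
    knows_index: bool  # assumption: did the writing client see the index metadata?

def index_rows(writes, t0):
    """Rows that end up in the index in this toy model."""
    # Initial population: the UPSERT SELECT scans the base table as of t0.
    populated = {w.row for w in writes if w.ts < t0}
    # Incremental maintenance: only clients aware of the index write to it.
    maintained = {w.row for w in writes if w.knows_index}
    return populated | maintained

def missing_from_index(writes, t0):
    """Base table rows that never reach the index."""
    base = {w.row for w in writes}
    return sorted(base - index_rows(writes, t0))

writes = [
    Write("a", 5, False),   # before t0: picked up by the population scan
    Write("b", 12, False),  # after t0, client unaware of the index: missed
    Write("c", 15, True),   # after t0, client aware of the index: maintained
]
print(missing_from_index(writes, t0=10))  # prints ['b']
```

In this model the only rows that go missing are those written at or after t0 
by a client that does not yet know about the index. Whether that window is 
what the failing test actually hits is exactly what the timeline should tell us.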


> Immutable index - Index vs base table row count does not match when index is 
> created during data load
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2446
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2446
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.6.0
>            Reporter: Mujtaba Chohan
>            Assignee: Thomas D'Silva
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2446-wip.patch, PHOENIX-2446.patch
>
>
> I'll add more details later but here's the scenario that consistently 
> produces wrong row count for index table vs base table for immutable async 
> index.
> 1. Start data upsert
> 2. Create async index
> 3. Trigger M/R index build
> 4. Keep data upsert going in background during step 2,3 and a while after M/R 
> index finishes.
> 5. End data upsert. 
> Now count with index enabled vs count with hint to not use index is off by a 
> large factor. Will get a cleaner repro for this issue soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
