[jira] [Commented] (PHOENIX-2742) Add new batchmutate APIs in HRegion without mvcc and region level locks

Enis Soztutar (JIRA) Mon, 07 Mar 2016 17:46:14 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184197#comment-15184197
 ]


Enis Soztutar commented on PHOENIX-2742:
----------------------------------------

bq. I think this is for local index where they will write to a shadow column 
and that is part of the same table and same region but different Cf?
Yes, same region, different column family and also different row for each local 
index definition. 

bq. When it is cells to be written to same region (Another CF), we can include 
new cells into each mutation in the pre hook right? We dont need to do another 
batchMutate() call. Am I missing any thing?
I think so. See my comment from parent jira replicated here: 

bq. Given the above usage of batchMutate(), I’m assuming that means that local 
index updates are not transactional with the data updates. I suppose we can do 
this in a follow up JIRA, but wouldn’t it be possible to use the 
MultiRowMutationService.mutateRows() call that includes both index and data 
updates to get transactionality? We could have our Indexer.preBatchMutate() 
call e.bypass() which is a technique I’ve seen used by Tephra to skip the 
normal processing. Having the index updates be transactional is one of the most 
important advantages of local indexing IMHO, so it'd be a shame not to have 
this.
Agreed, that is what I was proposing above: 
https://issues.apache.org/jira/browse/PHOENIX-1734?focusedCommentId=14986604&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14986604

I was curious to see how would this work with mutateRowsWithLocks(). The 
problem is that the writes to the data table arrives at the regular 
{{HRegion.doMiniBatchMutation}} rather than {{mutateRowsWithLocks()}}. However, 
I was able to write a simple coprocessor test to demonstrate that adding the 
extra local index mutations coming from the {{IndexCodec}} should be possible 
in {{preBatchMutate()}}. 

We can do this: 
{code}
// add all index updates:
    private Put addUnchecked(Cell kv, Put put) {
      byte [] family = CellUtil.cloneFamily(kv);
      NavigableMap<byte[], List<Cell>> familyMap = put.getFamilyCellMap();
      List<Cell> list = put.getFamilyCellMap().get(family);
      if (list == null) {
        list = new ArrayList<Cell>();
      }

      list.add(kv);
      familyMap.put(family, list);
      return put;
    }
{code}

at {{preBatchMutate()}} for every mutation, and it seems to be working: 
{code}
keyvalues={a/info:c1/1447717036828/Put/vlen=1/seqid=0}
keyvalues={a1d/L_info:c1/1447717036828/Put/vlen=0/seqid=0}
keyvalues={a2c/L_info:c1/1447717036828/Put/vlen=0/seqid=0}
keyvalues={a3b/L_info:c1/1447717036828/Put/vlen=0/seqid=0}
keyvalues={a4a/L_info:c1/1447717036828/Put/vlen=0/seqid=0}
keyvalues={b/info:c1/1447717036828/Put/vlen=1/seqid=0}
keyvalues={c/info:c1/1447717036828/Put/vlen=1/seqid=0}
keyvalues={d/info:c1/1447717036828/Put/vlen=1/seqid=0}
{code}

Since this is pre-mutate, the changes should have atomic visibility. 
The only fishy part is that, Mutation should normally only hold the Cell's for 
the same row. Here we are adding Cells for the local index that is for 
different rows. The rowLocks should be fine since locking the original row only 
should be enough for index updates. I did not do the deep inspection at the 
doMiniBatchMutation() to see whether there are assumptions that might break. 

More findings: 
 - In {{doMiniBatchMutation()}}, we call getFamilyCellMap() before 
{{preBatchMutate()}}. However, {{prePut()}}, {{preDelete()}} happens before 
even {{doMiniBatchMutation}} is called. 
{code}
        Map<byte[], List<Cell>> familyMap = mutation.getFamilyCellMap();
{code}
Thus, Indexer should inject at the prePut() level, not {{preBatchMutate()}} 
level for local indexes if we are following this path. 

- It seems that other code paths in doMiniBatchMutate() does not need the row 
key from the Mutation. Only rowLock and {{checkRow()}} needs that. Local index 
updates in theory always happen to be in the current region by definition, thus 
checkRow() is not needed (unless there is a bug). Can we have a case where the 
row key for the index update is the same as a data row key? If that is 
possible, lacking row locks might cause problems? 

bq. Sounds like from your experiment, it wouldn't matter which Mutation in 
miniBatchOp you added the index Mutations too, right? Though if need be, it 
wouldn't be difficult to add them to the data Mutation they're associated with
I think it will be safest to add the index updates to the same row. The 
doMiniBatchMutation() is not atomic across operations in the batch, and every 
operation has a different success / exception tracker. 



> Add new batchmutate APIs in HRegion without mvcc and region level locks
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2742
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2742
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Enis Soztutar
>             Fix For: 4.8.0
>
>
> Currently we cannot write mutations to same table in (pre/post)BatchMutate 
> hooks because of mvcc. It would be better to add new API to Region which 
> allows to write to table without locks and also with out memstore size check. 
> Need to see how sequence id's going to effect when the API used in 
> coprocessor hooks. 
> Just raising here to track it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-2742) Add new batchmutate APIs in HRegion without mvcc and region level locks

Reply via email to