Re: Regarding the Secondary Index write path

2018-10-31 Thread Abhishek Talluri
Thanks for confirming on the WALEdit Vincent.

I did see multiple comments saying
// skip this mutation if we aren't enabling indexing
but I could not find a way to make it actually skip these indexing steps.
Is there a variable that needs to be set at the table level to skip indexing?

IMO, it would be good to check for indexes in the
PhoenixIndexBuilder#isEnabled method rather than taking this entire path
only to find that no mutation needs to be indexed. That way, we can still
load the coprocessor with every table by default and skip these extra ops.
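
A minimal sketch of the kind of early check proposed above. This is not
Phoenix code: IndexGate, its map of table names to index names, and the
table names in the demo are all hypothetical, and the real isEnabled hook may
take different arguments. It only illustrates returning early when a table
has no indexes so the caller can skip the rest of the indexing path.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are Phoenix or HBase classes.
final class IndexGate {
    // Assumed per-table metadata: table name -> names of its indexes.
    private final Map<String, List<String>> indexesByTable;

    IndexGate(Map<String, List<String>> indexesByTable) {
        this.indexesByTable = indexesByTable;
    }

    // Cheap up-front check: returns false when the table has no indexes,
    // so the caller can skip row locking and index-update calculation.
    boolean isEnabled(String tableName) {
        List<String> indexes = indexesByTable.get(tableName);
        return indexes != null && !indexes.isEmpty();
    }
}

final class IndexGateDemo {
    public static void main(String[] args) {
        IndexGate gate = new IndexGate(Map.of(
                "ORDERS", List.of("ORDERS_BY_CUSTOMER"),
                "EVENTS", List.<String>of()));
        System.out.println("ORDERS indexed: " + gate.isEnabled("ORDERS"));
        System.out.println("EVENTS indexed: " + gate.isEnabled("EVENTS"));
    }
}
```

The coprocessor can stay attached to every table under this design; only
the per-batch work is gated.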

On Tue, Oct 30, 2018 at 9:57 PM Vincent Poon 
wrote:

> Looks like you're right, the Indexer is loaded for all base tables.  I got
> confused with the case where the Indexer coproc is not loaded for *index*
> tables.
> I wonder what the overhead is like for having that.  It does seem from the
> code like loading it was intentional, as there are lines like this in
> preBatchMutate:
>
> // skip this mutation if we aren't enabling indexing
>
> However a lot has changed since then - one off the top of my head is we are
> doing write locking of rows within Phoenix itself now.  It seems if a table
> has no indexes, we can skip this locking.
>
> I created PHOENIX-5002 to investigate this.
>
>
> For the steps, the WALEdit in preBatchMutate is the same WALEdit for the
> data table.  The index updates get written alongside the data table updates
> in the WAL.
>
> You can see in preBatchMutate the Indexer is grabbing the WALEdit passed
> down from doMiniBatchMutation:
>
> WALEdit edit = miniBatchOp.getWalEdit(0);
>
>
>
>

Re: Regarding the Secondary Index write path

2018-10-30 Thread Vincent Poon
Looks like you're right, the Indexer is loaded for all base tables.  I got
confused with the case where the Indexer coproc is not loaded for *index*
tables.
I wonder what the overhead is like for having that.  It does seem from the
code like loading it was intentional, as there are lines like this in
preBatchMutate:

// skip this mutation if we aren't enabling indexing

However, a lot has changed since then. One change off the top of my head is
that we now do write locking of rows within Phoenix itself. It seems that if
a table has no indexes, we can skip this locking.

I created PHOENIX-5002 to investigate this.


For the steps, the WALEdit in preBatchMutate is the same WALEdit for the
data table.  The index updates get written alongside the data table updates
in the WAL.

You can see in preBatchMutate the Indexer is grabbing the WALEdit passed
down from doMiniBatchMutation:

WALEdit edit = miniBatchOp.getWalEdit(0);
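
To illustrate the point about a single shared edit, here is a minimal
sketch. SketchWalEdit and SharedEditSketch are hypothetical stand-ins, not
the HBase WALEdit or the Phoenix Indexer, and the order of entries inside
the edit is arbitrary here. It only shows a hook appending index updates to
the edit it is handed instead of creating a separate one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WAL edit: a list of entries logged as one unit.
// This is NOT the HBase WALEdit class.
final class SketchWalEdit {
    private final List<String> entries = new ArrayList<>();

    void add(String entry) {
        entries.add(entry);
    }

    List<String> entries() {
        return List.copyOf(entries);
    }
}

final class SharedEditSketch {
    // Stand-in for a pre-batch hook: it does not create its own edit, it
    // appends index updates to the edit handed down by the caller.
    static void preBatchHook(SketchWalEdit edit, List<String> indexUpdates) {
        for (String update : indexUpdates) {
            edit.add("index:" + update);
        }
    }

    // Stand-in for the batch write: one edit carries both kinds of entries.
    static SketchWalEdit buildEdit(List<String> dataUpdates, List<String> indexUpdates) {
        SketchWalEdit edit = new SketchWalEdit();
        preBatchHook(edit, indexUpdates);
        for (String update : dataUpdates) {
            edit.add("data:" + update);
        }
        return edit;
    }

    public static void main(String[] args) {
        SketchWalEdit edit = buildEdit(List.of("row1"), List.of("idx_row1"));
        edit.entries().forEach(System.out::println);
    }
}
```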




On Tue, Oct 30, 2018 at 6:08 PM Abhishek Talluri
 wrote:

> Thanks Vincent. Just to clarify, if I create a table through Phoenix and
> check the describe on the hbase table, I see that the Indexer co-processor
> is loaded with the table. Is there some code that loads only the required
> co-processors when there is an Observer event?
>
> Coming back to the sequence of steps, in preBatchMutate, I do see that
> after calculating index updates, we write the entries to a single WALEdit
> saying that all the mutations that need to be indexed are durable at this
> point (I assume these are the WALEdits for the index updates but not the
> WALEdits for the data table itself).
> preBatchMutate
> -Calculate index updates
> -Write the index updates to WAL and make them durable (helpful because if
> RS crashes after the write to data table)
> doMiniBatchMutation
> - write to WAL without sync ( I am assuming that these edits are for the
> actual data table itself )
> - acquire MVCC
> - write to memstore
> - sync WAL
> - advance MVCC (write becomes visible)
> postBatchMutate
> - write index update
>
> Correct me if I am wrong in assuming that WALEdit in preBatchMutate is
> different from the wal edits that will be generated for data table.
> Once again thanks for the quick response.
>
>

Re: Regarding the Secondary Index write path

2018-10-30 Thread Abhishek Talluri
Thanks Vincent. Just to clarify, if I create a table through Phoenix and
run describe on the HBase table, I see that the Indexer co-processor
is loaded with the table. Is there some code that loads only the required
co-processors when there is an Observer event?

Coming back to the sequence of steps, in preBatchMutate, I do see that
after calculating index updates, we write the entries to a single WALEdit
saying that all the mutations that need to be indexed are durable at this
point (I assume these are the WALEdits for the index updates but not the
WALEdits for the data table itself).
preBatchMutate
- calculate index updates
- write the index updates to the WAL and make them durable (helpful if the
  RS crashes after the write to the data table)
doMiniBatchMutation
- write to WAL without sync (I am assuming that these edits are for the
  actual data table itself)
- acquire MVCC
- write to memstore
- sync WAL
- advance MVCC (write becomes visible)
postBatchMutate
- write index update

Correct me if I am wrong in assuming that the WALEdit in preBatchMutate is
different from the WAL edits that will be generated for the data table.
Once again, thanks for the quick response.


On Tue, Oct 30, 2018 at 4:16 PM Vincent Poon 
wrote:

> If the table has no indexes, the Indexer coprocessor won't be loaded.
>
> As for your original question, the answer is a little nuanced.  For Phoenix
> 4.14+:
> The index updates are calculated in preBatchMutate, and the index updates
> are written in postBatchMutate.
> So from HRegion#doMiniBatchMutation, you can see the order of things.  In
> short, it's something like:
> - Calculate index updates
> - write to WAL without sync
> - acquire MVCC
> - write to memstore
> - sync WAL
> - advance MVCC (write becomes visible)
> - write index update
>
> Note that when you write to the memstore, you have not advanced the MVCC
> yet.
> The order is pretty much what you suggested.
>


-- 
Thanks,
Abhishek Talluri
Ph:9292405270


Re: Regarding the Secondary Index write path

2018-10-30 Thread Vincent Poon
If the table has no indexes, the Indexer coprocessor won't be loaded.

As for your original question, the answer is a little nuanced.  For Phoenix
4.14+:
The index updates are calculated in preBatchMutate, and the index updates
are written in postBatchMutate.
So from HRegion#doMiniBatchMutation, you can see the order of things.  In
short, it's something like:
- Calculate index updates
- write to WAL without sync
- acquire MVCC
- write to memstore
- sync WAL
- advance MVCC (write becomes visible)
- write index update

Note that when you write to the memstore, you have not advanced the MVCC
yet.
The order is pretty much what you suggested.
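
The ordering above, and the note that a memstore write is not visible until
the MVCC is advanced, can be sketched with a toy single-writer model.
Everything here is hypothetical (WritePathSketch, the write-number and
read-point fields, the step strings); it is not how HRegion implements
MVCC, only an illustration of the sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-writer model of the ordering listed above.
// None of these names are HBase or Phoenix APIs.
final class WritePathSketch {
    private final List<String> steps = new ArrayList<>();
    private final Map<String, Long> memstore = new HashMap<>(); // row -> write number
    private long nextWriteNumber = 1;
    private long readPoint = 0;            // highest write number readers may see
    private boolean visibleBeforeAdvance;  // recorded during write() for inspection

    // A row is visible only once the read point has reached its write number.
    boolean isVisible(String row) {
        Long writeNumber = memstore.get(row);
        return writeNumber != null && writeNumber <= readPoint;
    }

    void write(String row) {
        steps.add("calculate index updates");
        steps.add("write to WAL without sync");
        long writeNumber = nextWriteNumber++;
        steps.add("acquire MVCC");
        memstore.put(row, writeNumber);
        steps.add("write to memstore");
        // The row is in the memstore, but the read point has not moved yet.
        visibleBeforeAdvance = isVisible(row);
        steps.add("sync WAL");
        readPoint = writeNumber;
        steps.add("advance MVCC");
        steps.add("write index update");
    }

    List<String> steps() {
        return List.copyOf(steps);
    }

    boolean wasVisibleBeforeAdvance() {
        return visibleBeforeAdvance;
    }
}

final class WritePathSketchDemo {
    public static void main(String[] args) {
        WritePathSketch path = new WritePathSketch();
        path.write("row1");
        path.steps().forEach(System.out::println);
        System.out.println("visible before advance: " + path.wasVisibleBeforeAdvance());
        System.out.println("visible after advance: " + path.isVisible("row1"));
    }
}
```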

On Tue, Oct 30, 2018 at 12:16 PM Abhishek Talluri
 wrote:

> Thanks Geoffrey for confirming that. Will go through that presentation.
>
> I have a follow-up question though,
> Let’s say if a table does not have any indexes on it, will these
> co-processors still be triggered and try to calculate index updates, since
> these are loaded by default for any table that is created through phoenix
> OR will this write path be entirely skipped since there are no indexes on
> the table? Asking this because I could not find a flag check in the code
> which checks if the indexes are present or not in pre/post Mutate
> operations.
>
> Regards,
> Abhishek
>
>
>


Re: Regarding the Secondary Index write path

2018-10-30 Thread Abhishek Talluri
Thanks Geoffrey for confirming that. Will go through that presentation.

I have a follow-up question, though.
If a table does not have any indexes on it, will these co-processors
still be triggered and try to calculate index updates, since they are
loaded by default for any table that is created through Phoenix? Or will
this write path be skipped entirely since there are no indexes on the
table? I am asking because I could not find a flag check in the code that
checks whether indexes are present in the pre/post mutate operations.

Regards,
Abhishek



On Tue, Oct 30, 2018 at 2:25 PM Geoffrey Jacoby  wrote:

> Abhishek,
>
> You might want to check out Vincent Poon's excellent presentation at this
> year's PhoenixCon about recent changes over the past couple of years to the
> index pipeline.
>
> https://www.youtube.com/watch?v=VBONDM7sD40
>
> One of those changes is the one you observed. Global mutable index writes
> were moved later in the HBase write pipeline to avoid some nasty deadlock
> and starvation cases that could occur when making MemStore writes / MVCC
> advancement wait on cross-server index RPCs to complete.
>
> Geoffrey
>
>


-- 
Thanks,
Abhishek Talluri
Ph:9292405270


Re: Regarding the Secondary Index write path

2018-10-30 Thread Geoffrey Jacoby
Abhishek,

You might want to check out Vincent Poon's excellent presentation at this
year's PhoenixCon about recent changes over the past couple of years to the
index pipeline.

https://www.youtube.com/watch?v=VBONDM7sD40

One of those changes is the one you observed. Global mutable index writes
were moved later in the HBase write pipeline to avoid some nasty deadlock
and starvation cases that could occur when making MemStore writes / MVCC
advancement wait on cross-server index RPCs to complete.

Geoffrey


On Tue, Oct 30, 2018 at 10:31 AM Abhishek Talluri
 wrote:

> Hi All,
>
> I am referring to the presentation that was given at the SF HBase Meetup in
> 2013, attached for your reference. The write path there states that the
> co-processor calculates the index updates and WAL edits
> -> Writes them to the WAL (making them durable)
> -> Then writes the index updates
> -> Then proceeds to the MemStore.
>
> But after looking at the code base (I looked into 4.7 and 4.14), it looks
> like the write to the index tables happens in the postBatchMutate phase,
> which is after the MemStore write finishes. I wanted to check with the
> community to see if the flowchart is outdated. I feel that the series of
> steps should be:
> co-processor calculates the index updates and WAL edits
> -> Writes them to the WAL (making them durable)
> -> Then proceeds to the MemStore
> -> Then writes the index updates
>
> I would appreciate any input on this. I want to clarify this because we see
> a case where the write path could be creating some delay between the write
> to the WAL and the write to the MemStore, causing an out-of-sync issue when
> using the HBase Lily Indexer.
>
> Thanks,
> Abhishek Talluri
>
>