Re: Regarding the Secondary Index write path
Thanks for confirming on the WALEdit, Vincent. I did see there were multiple comments saying "// skip this mutation if we aren't enabling indexing", but there isn't a way to make it skip these indexing steps. Or is there a variable that needs to be set at the table level to make it skip indexing? IMO, it would be good to check for indexes in the PhoenixIndexBuilder#isEnabled method rather than taking this entire path and only then realizing that no mutation needs to be indexed. That way, we can still load the coprocessor with every table by default and still skip these extra operations.

On Tue, Oct 30, 2018 at 9:57 PM Vincent Poon wrote:
> ...
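To make the isEnabled suggestion concrete, here is a minimal Java sketch of gating the index path on whether a table has any indexes at all. Everything in it (TableMetadataLookup, IndexGate, SketchedObserver, and keying the check on a table name rather than on a mutation) is hypothetical and simplified; it is not Phoenix's IndexBuilder API, and a real check would need a cheap, cached way to learn whether a table has indexes.

```java
import java.util.List;

// Hypothetical lookup of the indexes defined on a table (not a Phoenix API).
interface TableMetadataLookup {
    List<String> indexesFor(String tableName);
}

// Hypothetical gate: the index path is enabled only if the table has at least one index.
final class IndexGate {
    private final TableMetadataLookup metadata;

    IndexGate(TableMetadataLookup metadata) {
        this.metadata = metadata;
    }

    boolean isEnabled(String tableName) {
        List<String> indexes = metadata.indexesFor(tableName);
        return indexes != null && !indexes.isEmpty();
    }
}

// Hypothetical observer showing where the early exit would sit.
final class SketchedObserver {
    private final IndexGate gate;
    int indexUpdatesComputed = 0; // how many mutations went through the index path

    SketchedObserver(IndexGate gate) {
        this.gate = gate;
    }

    void preBatchMutate(String tableName, List<String> mutations) {
        if (!gate.isEnabled(tableName)) {
            return; // no indexes: skip computing index updates entirely
        }
        // Stand-in for computing one set of index updates per mutation.
        indexUpdatesComputed += mutations.size();
    }
}
```

With this shape, a batch against a table with no indexes returns before any index work, while a batch against an indexed table goes through the path once per mutation.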
Re: Regarding the Secondary Index write path
Looks like you're right, the Indexer is loaded for all base tables. I got confused with the case where the Indexer coproc is not loaded for *index* tables. I wonder what the overhead of that is. It does seem from the code that loading it was intentional, as there are lines like this in preBatchMutate:

// skip this mutation if we aren't enabling indexing

However, a lot has changed since then. One off the top of my head is that we now do write locking of rows within Phoenix itself. It seems that if a table has no indexes, we can skip this locking.

I created PHOENIX-5002 to investigate this.

As for the steps, the WALEdit in preBatchMutate is the same WALEdit as the one for the data table. The index updates get written alongside the data table updates in the WAL.

You can see in preBatchMutate that the Indexer grabs the WALEdit passed down from doMiniBatchMutation:

WALEdit edit = miniBatchOp.getWalEdit(0);

On Tue, Oct 30, 2018 at 6:08 PM Abhishek Talluri wrote:
> ...
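A simplified Java model of the shared-WALEdit point: the region builds one WAL edit for the batch, the indexer fetches that same object from the mini-batch and appends its index cells to it, so data and index entries reach the WAL together rather than in a separate edit. SketchWalEdit, SketchMiniBatch and SketchIndexer are hypothetical stand-ins, not HBase's WALEdit or MiniBatchOperationInProgress, and the real region code may merge coprocessor-added cells in its own way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WAL edit: an ordered list of cells.
final class SketchWalEdit {
    private final List<String> cells = new ArrayList<>();

    void add(String cell) {
        cells.add(cell);
    }

    List<String> cells() {
        return new ArrayList<>(cells); // defensive copy for readers
    }
}

// Hypothetical stand-in for the mini-batch handed to preBatchMutate.
final class SketchMiniBatch {
    private final SketchWalEdit walEdit;

    SketchMiniBatch(SketchWalEdit walEdit) {
        this.walEdit = walEdit;
    }

    SketchWalEdit getWalEdit() {
        return walEdit; // the region's own edit, not a copy
    }
}

final class SketchIndexer {
    // Appends index cells to the same WAL edit that already holds the data cells.
    void preBatchMutate(SketchMiniBatch batch, List<String> indexUpdates) {
        SketchWalEdit edit = batch.getWalEdit();
        for (String update : indexUpdates) {
            edit.add(update);
        }
    }
}
```

Because the indexer mutates the region's own edit object, whatever the region later appends and syncs to the WAL already contains both the data cells and the index cells.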
Re: Regarding the Secondary Index write path
Thanks, Vincent. Just to clarify: if I create a table through Phoenix and run describe on the HBase table, I see that the Indexer coprocessor is loaded with the table. Is there some code that loads only the required coprocessors when there is an Observer event?

Coming back to the sequence of steps: in preBatchMutate, I see that after calculating the index updates, we write the entries to a single WALEdit, meaning that all the mutations that need to be indexed are durable at this point (I assume these are the WALEdits for the index updates, not the WALEdits for the data table itself).

preBatchMutate
- Calculate index updates
- Write the index updates to the WAL and make them durable (helpful in case the RS crashes after the write to the data table)
doMiniBatchMutation
- Write to WAL without sync (I am assuming these edits are for the actual data table itself)
- Acquire MVCC
- Write to memstore
- Sync WAL
- Advance MVCC (write becomes visible)
postBatchMutate
- Write index update

Correct me if I am wrong in assuming that the WALEdit in preBatchMutate is different from the WAL edits that will be generated for the data table. Once again, thanks for the quick response.

On Tue, Oct 30, 2018 at 4:16 PM Vincent Poon wrote:
> ...

--
Thanks,
Abhishek Talluri
Ph:9292405270
Re: Regarding the Secondary Index write path
If the table has no indexes, the Indexer coprocessor won't be loaded.

As for your original question, the answer is a little nuanced. For Phoenix 4.14+, the index updates are calculated in preBatchMutate, and the index updates are written in postBatchMutate. From HRegion#doMiniBatchMutation you can see the order of things. In short, it's something like:
- Calculate index updates
- Write to WAL without sync
- Acquire MVCC
- Write to memstore
- Sync WAL
- Advance MVCC (write becomes visible)
- Write index update

Note that when you write to the memstore, you have not advanced the MVCC yet. The order is pretty much what you suggested.

On Tue, Oct 30, 2018 at 12:16 PM Abhishek Talluri wrote:
> ...
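The ordering and the MVCC note can be illustrated with a toy, single-threaded Java model that follows the steps in the order listed above. ToyRegion, its integer write numbers and its single read point are assumptions made for illustration; real HBase MVCC tracks many concurrent writers and only advances the read point past completed ones. The step log shows the memstore cell existing but not yet visible until MVCC advances, with the index write last.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not HBase code: WAL, MVCC and memstore are simplified stand-ins.
final class ToyRegion {
    final List<String> log = new ArrayList<>();                 // order of steps taken
    private final List<String> wal = new ArrayList<>();         // appended entries
    private final Map<String, Long> memstore = new HashMap<>(); // row -> write number
    private long readPoint = 0;                                 // highest visible write number
    private long nextWriteNumber = 1;

    // A reader only sees cells whose write number is <= the current read point.
    boolean isVisible(String row) {
        Long writeNumber = memstore.get(row);
        return writeNumber != null && writeNumber <= readPoint;
    }

    void put(String row, Runnable indexWriter) {
        log.add("calculate index updates");           // preBatchMutate
        wal.add(row);
        log.add("append to WAL (no sync)");
        long writeNumber = nextWriteNumber++;
        log.add("acquire MVCC write number");
        memstore.put(row, writeNumber);
        log.add("write to memstore, visible=" + isVisible(row));
        log.add("sync WAL");
        readPoint = writeNumber;                      // single writer, so advance directly
        log.add("advance MVCC, visible=" + isVisible(row));
        indexWriter.run();                            // postBatchMutate
        log.add("write index update");
    }
}
```

Running put on a fresh ToyRegion logs "visible=false" at the memstore step and "visible=true" after the MVCC advance, so any index writer passed in already sees the data row as visible.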
Re: Regarding the Secondary Index write path
Thanks, Geoffrey, for confirming that. I will go through that presentation.

I have a follow-up question, though. Let's say a table does not have any indexes on it: will these coprocessors still be triggered and try to calculate index updates, since they are loaded by default for any table created through Phoenix, or will this write path be skipped entirely since there are no indexes on the table? I'm asking because I could not find a flag check in the code that checks whether indexes are present in the pre/post mutate operations.

Regards,
Abhishek

On Tue, Oct 30, 2018 at 2:25 PM Geoffrey Jacoby wrote:
> ...
Re: Regarding the Secondary Index write path
Abhishek,

You might want to check out Vincent Poon's excellent presentation at this year's PhoenixCon about changes to the index pipeline over the past couple of years.

https://www.youtube.com/watch?v=VBONDM7sD40

One of those changes is the one you observed. Global mutable index writes were moved later in the HBase write pipeline to avoid some nasty deadlock and starvation cases that could occur when MemStore writes / MVCC advancement had to wait on cross-server index RPCs to complete.

Geoffrey

On Tue, Oct 30, 2018 at 10:31 AM Abhishek Talluri wrote:
> Hi All,
>
> I am referring to the presentation given at the SF HBase Meetup in 2013, attached for your reference. The write path it describes is:
> co-processor calculates the index updates and WAL edits
> -> Writes them to the WAL (making them durable)
> -> Then writes the index updates
> -> Then proceeds to the MemStore.
>
> But after looking at the code base (I looked into 4.7 and 4.14), it looks like the write to the index tables happens in the postBatchMutate phase, which is after the MemStore write finishes. I wanted to check with the community to see if the flowchart is outdated. I feel the series of steps should be:
> co-processor calculates the index updates and WAL edits
> -> Writes them to the WAL (making them durable)
> -> Then proceeds to the MemStore.
> -> Then writes the index updates
>
> I'd appreciate any input on this. I want to clarify this because we see a case where the write path could be creating some delay between the write to the WAL and the MemStore, causing an out-of-sync issue when using the HBase Lily Indexer.
>
> Thanks,
> Abhishek Talluri
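Geoffrey's point is about dependencies: if the data write cannot complete until a cross-server index RPC returns, the data path is at the mercy of the other server. Below is a toy Java model of the two orderings, assuming a never-completing CompletableFuture stands in for a stuck index RPC and a boolean stands in for "memstore write plus MVCC advance done". OrderingModel and its methods are hypothetical; the model does not reproduce the deadlock itself or Phoenix's index failure handling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model, not Phoenix code: dataVisible stands in for "memstore write + MVCC advance done".
final class OrderingModel {
    private boolean dataVisible = false;

    boolean isDataVisible() {
        return dataVisible;
    }

    // Older ordering: the data write waits on the remote index RPC first.
    void writeIndexBeforeData(CompletableFuture<Void> indexRpc, long timeoutMs) {
        if (awaitIndexRpc(indexRpc, timeoutMs)) {
            dataVisible = true;
        }
        // Otherwise the data write is stuck behind a server it does not control.
    }

    // 4.14+ ordering: data becomes visible first, then the index RPC is awaited.
    // What happens when that RPC fails is deliberately out of scope here.
    boolean writeIndexAfterData(CompletableFuture<Void> indexRpc, long timeoutMs) {
        dataVisible = true;
        return awaitIndexRpc(indexRpc, timeoutMs);
    }

    // Returns true if the index RPC completed normally within the timeout.
    private static boolean awaitIndexRpc(CompletableFuture<Void> indexRpc, long timeoutMs) {
        try {
            indexRpc.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

With a stuck RPC, the older ordering never makes the data visible, while the later ordering makes it visible and only reports that the index write did not complete.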