Re: Excessive ExecService RPCs with multi-threaded ingest
FYI, you can retrieve an IndexMaintainer from the index PTable given the data PTable, and it will be lazily cached on the index PTable.
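The lazy caching mentioned above can be sketched as simple memoization on the index table object. This is a hypothetical illustration only; the class and field names below are not the real Phoenix PTableImpl internals.

```java
// Hypothetical sketch of lazy caching of an IndexMaintainer on the index
// PTable: the first request builds it from the data table's metadata and
// stores it; subsequent callers get the cached copy for free. Names here
// are illustrative, not actual Phoenix APIs.
public class IndexPTableSketch {
    private String cachedMaintainer; // stands in for the IndexMaintainer
    public int builds = 0;           // how many times we paid the build cost

    // Given the data PTable (represented here by its name), lazily build
    // the maintainer once and cache it for all later callers.
    public synchronized String getIndexMaintainer(String dataTableName) {
        if (cachedMaintainer == null) {
            builds++; // expensive derivation from data-table + index metadata
            cachedMaintainer = "maintainer-for-" + dataTableName;
        }
        return cachedMaintainer;
    }
}
```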
Re: Excessive ExecService RPCs with multi-threaded ingest
Hrm, that sounds like it'd be cleaner to me. Just thinking about this problem again made me shudder at the client-side complexity :)

I'll have to find some time to revisit this one.

Thanks for the suggestion, Ankit!
Re: Excessive ExecService RPCs with multi-threaded ingest
How about not sending the IndexMaintainers from the client, and instead preparing them at the server itself and caching/refreshing them per table, like we do currently for PTable?
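The suggestion above can be sketched as a server-side cache keyed per table and invalidated by table timestamp, mirroring how the server-side PTable cache refreshes. This is a minimal sketch under that assumption; all names below are illustrative, not actual Phoenix classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the server builds IndexMaintainers from table
// metadata itself and caches them per table, rebuilding only when a client
// presents a newer table timestamp. Clients then ship a timestamp instead
// of serialized maintainers. Names are illustrative, not Phoenix APIs.
public class ServerMaintainerCache {
    static final class Entry {
        final long tableTimestamp;
        final String maintainer; // stands in for a built IndexMaintainer
        Entry(long ts, String m) { tableTimestamp = ts; maintainer = m; }
    }

    private final Map<String, Entry> byTable = new ConcurrentHashMap<>();
    public int rebuilds = 0; // times the maintainer was (re)built from metadata

    // Return the cached maintainer, rebuilding only when the server's copy
    // is older than the client's view of the table.
    public synchronized String get(String table, long clientTableTimestamp) {
        Entry e = byTable.get(table);
        if (e == null || e.tableTimestamp < clientTableTimestamp) {
            rebuilds++; // expensive: derive maintainer from table metadata
            e = new Entry(clientTableTimestamp, "maintainer@" + clientTableTimestamp);
            byTable.put(table, e);
        }
        return e.maintainer;
    }
}
```

With this shape, steady-state ingest never rebuilds; only a schema change (which advances the table timestamp) forces a refresh.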
Re: Excessive ExecService RPCs with multi-threaded ingest
If anyone is interested, I did hack on this some more over the weekend.

https://github.com/joshelser/phoenix/tree/reduced-server-cache-rpc

Very much in a state of "well, it compiles". Will try to find some more time to poke at it and measure whether or not it actually makes a positive impact (with serialized IndexMaintainers only being about 20 bytes for one index table, the server-side memory impact certainly isn't that crazy, but the extra RPCs likely add up).

Feedback welcome from the brave :)
Excessive ExecService RPCs with multi-threaded ingest
Hi folks,

I was doing some testing earlier this week and Enis's keen eye caught something rather interesting.

When using YCSB to ingest data into a table with a secondary index using 8 threads and a batch size of 1000 rows, the number of ExecService coprocessor calls actually exceeded the number of Multi calls to write the data (something like 21k ExecService calls to 18k Multi calls).

I dug into this some more and noticed that it's because each thread creates its own ServerCache to store the serialized IndexMetadata before shipping the data table updates. So, when we have 8 threads all writing mutations for the same data and index table, we have ~8x the ServerCache entries being created compared to a single thread.

Looking at the code, I completely understand why they're local to the thread and not shared on the Connection (very tricky), but I'm curious if anyone has noticed this before, or if there are reasons not to try to share these ServerCache(s) across threads. Looking at the data being put into the ServerCache, it appears to be exactly the same for each of the threads sending mutations. I'm thinking that we could do this safely by tracking when we are loading (or have loaded) the data into the ServerCache and doing some reference counting to determine when it's actually safe to delete the ServerCache.

I hope to find/make some time to get a patch up, but thought I'd take a moment to write it up in case anyone has opinions/feedback.

Thanks!

- Josh
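The reference-counting idea above can be sketched in a few lines: writer threads on the same Connection share one ServerCache entry per key, only the first acquire pays the addServerCache RPC, and only the last release deletes the server-side entry. This is a minimal sketch; the class, method, and RPC names are illustrative, not actual Phoenix APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reference-counting a shared ServerCache entry so
// concurrent writer threads reuse one entry instead of each creating their
// own. Names here are illustrative, not actual Phoenix APIs.
public class SharedCacheSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    public int addCacheRpcs = 0;    // simulated ExecService "add cache" calls
    public int removeCacheRpcs = 0; // simulated ExecService "remove cache" calls

    // First thread in ships the serialized IndexMetadata; the rest reuse it.
    public synchronized void acquire(String key) {
        if (refCounts.merge(key, 1, Integer::sum) == 1) {
            addCacheRpcs++; // would send the ServerCache to the region servers
        }
    }

    // Last thread out (refcount hits zero) removes the server-side entry.
    // Every release must be paired with a prior acquire.
    public synchronized void release(String key) {
        if (refCounts.merge(key, -1, Integer::sum) == 0) {
            refCounts.remove(key);
            removeCacheRpcs++; // would delete the ServerCache on the servers
        }
    }

    public static void main(String[] args) {
        SharedCacheSketch cache = new SharedCacheSketch();
        String key = "DATA_TABLE|serialized-index-metadata";
        // 8 writer threads each start a batch (shown sequentially for clarity)...
        for (int i = 0; i < 8; i++) cache.acquire(key);
        // ...and each finishes its batch.
        for (int i = 0; i < 8; i++) cache.release(key);
        // One add and one remove RPC instead of 8 of each.
        System.out.println(cache.addCacheRpcs + " add, " + cache.removeCacheRpcs + " remove");
    }
}
```

A real implementation would also need to handle the window where one thread is still serializing the cache while another wants to use it (e.g. a per-key latch or future), which is the "tracking when we are loading" part above.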