Re: Excessive ExecService RPCs with multi-threaded ingest

2016-11-23 Thread James Taylor
FYI, you can retrieve an IndexMaintainer from the index PTable given the
data PTable and it will be lazily cached on the index PTable.
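Something along these lines should do it (method names are from memory, so
double-check the exact signatures against master; the table names and JDBC
URL below are just placeholders):

import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.phoenix.index.IndexMaintainer;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.util.PhoenixRuntime;

public class IndexMaintainerLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
            // Resolve both tables through the client-side metadata cache.
            PTable dataTable = PhoenixRuntime.getTable(pconn, "MY_SCHEMA.DATA_TABLE");
            PTable indexTable = PhoenixRuntime.getTable(pconn, "MY_SCHEMA.MY_INDEX");
            // Built on the first call and then lazily cached on the index PTable,
            // so repeated lookups don't pay the construction cost again.
            IndexMaintainer maintainer = indexTable.getIndexMaintainer(dataTable, pconn);
            System.out.println("maintainer: " + maintainer);
        }
    }
}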


Re: Excessive ExecService RPCs with multi-threaded ingest

2016-11-23 Thread Josh Elser
Hrm, that sounds like it'd be cleaner to me. Just thinking about this 
problem again made me shudder at the client-side complexity :)


I'll have to find some time to revisit this one.

Thanks for the suggestion, Ankit!


Re: Excessive ExecService RPCs with multi-threaded ingest

2016-11-23 Thread Ankit Singhal
How about not sending the IndexMaintainers from the client, and instead
preparing them at the server itself and caching/refreshing them per
table, like we currently do for PTable?
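Very roughly, I'm picturing a per-region-server cache along these lines
(all of the class and method names below are made up just to show the
shape; keying on the table's metadata timestamp is what would give the
refresh-on-DDL behaviour):

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch only, not real Phoenix code: a region-server-side cache of
// serialized IndexMaintainers keyed by data table name plus the table's
// metadata timestamp. A DDL change bumps the timestamp, so the next write
// misses the cache and rebuilds, much like the server-side PTable cache.
public class IndexMaintainerServerCacheSketch {

    private final ConcurrentMap<String, List<byte[]>> cache = new ConcurrentHashMap<>();

    private static String key(String dataTableName, long tableTimestamp) {
        return dataTableName + "#" + tableTimestamp;
    }

    // Return the cached maintainers, rebuilding at most once per
    // (table, timestamp) even with many ingest threads hitting the same RS.
    public List<byte[]> getOrLoad(String dataTableName, long tableTimestamp,
                                  Supplier<List<byte[]>> rebuildFromCatalog) {
        return cache.computeIfAbsent(key(dataTableName, tableTimestamp),
                k -> rebuildFromCatalog.get());
    }

    // Drop entries built against older timestamps once a newer one is seen.
    public void invalidateOlderThan(String dataTableName, long newestTimestamp) {
        cache.keySet().removeIf(k -> {
            String[] parts = k.split("#");
            return parts[0].equals(dataTableName)
                    && Long.parseLong(parts[1]) < newestTimestamp;
        });
    }
}

Presumably the client would then only need to tell the server which data
table and timestamp it planned against, rather than shipping the
serialized maintainers themselves.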


Re: Excessive ExecService RPCs with multi-threaded ingest

2016-10-23 Thread Josh Elser

If anyone is interested, I did hack on this some more over the weekend.

https://github.com/joshelser/phoenix/tree/reduced-server-cache-rpc

Very much in a state of "well, it compiles". Will try to find some more
time to poke at it and measure whether or not it actually makes a
positive impact (with serialized IndexMaintainers only being about
20 bytes for one index table, the server-side memory impact certainly
isn't that crazy, but the extra RPCs likely add up).

Feedback welcome from the brave :)


Excessive ExecService RPCs with multi-threaded ingest

2016-10-21 Thread Josh Elser

Hi folks,

I was doing some testing earlier this week and Enis's keen eye caught 
something rather interesting.


When using YCSB to ingest data into a table with a secondary index using 
8 threads and batch size of 1000 rows, the number of ExecService 
coprocessor calls actually exceeded the number of Multi calls to write 
the data (something like 21k ExecService calls to 18k Multi calls).


I dug into this some more and noticed that it's because each thread is
creating its own ServerCache to store the serialized IndexMetadata
before shipping the data table updates. So, when we have 8 threads all
writing mutations for the same data and index table, we create roughly
8x as many ServerCache entries as we would with just one thread.


Looking at the code, I completely understand why they're local to the
thread and not shared on the Connection (very tricky), but I'm curious
if anyone has noticed this before or if there are reasons not to try to
share these ServerCache(s) across threads. Looking at the data being put
into the ServerCache, it appears to be exactly the same for each of the
threads sending mutations. I'm thinking that we could do this safely by
tracking when we are loading (or have loaded) the data into the
ServerCache and doing some reference counting to determine when it's
actually safe to delete the ServerCache.
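
Roughly, I'm imagining something like the sketch below on the client
side (all of the names are made up just to show the ref-counting shape;
none of this is actual Phoenix code):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch only: one shared server cache entry per cache id, created once,
// reference-counted by the threads using it, and torn down on the region
// servers only when the last user releases it.
public class SharedServerCacheSketch<C> {

    private static class Entry<C> {
        final C handle;                              // stands in for the ServerCache handle
        final AtomicInteger refCount = new AtomicInteger(1);
        Entry(C handle) { this.handle = handle; }
    }

    private final ConcurrentMap<String, Entry<C>> entries = new ConcurrentHashMap<>();

    // First caller pays the ExecService RPC; later callers just bump the count.
    // (In real code we'd probably want the RPC outside the map lock.)
    public C acquire(String cacheId, Supplier<C> addServerCacheRpc) {
        return entries.compute(cacheId, (id, existing) -> {
            if (existing != null) {
                existing.refCount.incrementAndGet();
                return existing;
            }
            return new Entry<>(addServerCacheRpc.get());
        }).handle;
    }

    // Decrement under the same per-key atomicity as acquire(), so a concurrent
    // acquire can't resurrect an entry we're in the middle of removing; only
    // the last user out issues the removal RPC.
    public void release(String cacheId, Runnable removeServerCacheRpc) {
        boolean[] lastOut = new boolean[1];
        entries.computeIfPresent(cacheId, (id, e) -> {
            if (e.refCount.decrementAndGet() == 0) {
                lastOut[0] = true;
                return null;                         // removes the mapping
            }
            return e;
        });
        if (lastOut[0]) {
            removeServerCacheRpc.run();
        }
    }
}

The "loading vs. loaded" tracking falls out of compute() here: the first
thread for a given cacheId does the load while concurrent callers for the
same id wait on it, and the removal RPC only fires once the last batch
referencing that cache has been flushed.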


I hope to find/make some time to get a patch up, but thought I'd take a 
moment to write it up if anyone has opinions/feedback.


Thanks!

- Josh