[ https://issues.apache.org/jira/browse/KUDU-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-1693: -------------------------------- Description: Currently, the Kudu C++ client buffers incoming operations regardless of their destination tablet server. Accordingly, it's possible to set limit on the _total_ buffer space, not per tablet server. This approach works but there is room for improvement: there are real-world scenarios where per-TS buffering would be more robust. Besides, tablet servers impose limit on the RPC operations size. Grouping write operations on per-tablet-server basis would be beneficial for 'one-out-of-many lagging tablet server' scenario. There, all tablet servers for a table perform well except for one which runs slow due to excessive IO, network issues, failing disk, etc. The problem is that the lagging server hinders the overall performance. This is due to the current approach to the buffer turnaround: a buffer is considered 'flushed' and its space is reclaimed at once when _all_ operations in the buffer are completed. So, if 1000 operations have already been sent but there is 1 operation still in progress, the whole buffer space is 'locked' and cannot be used. Accordingly, introducing per-tablet-server buffer limit would help to address scenarios with concurrent writes into tables with extremely diverse partition factors (like 2 and 100). E.g., consider a case when incoming write operations for tables with diverse partition factors are intermixed in the context of one session. The problem is that setting the total buffer space limit high is beneficial for the writes into the table with many partitions (assuming those writes are evenly distributed across participating tablets), but it may be over the server-side's limit for max transaction size if those writes are targeted for the table with a few partitions. was: Grouping write operations on per-tablet-server basis would be beneficial for 'one-out-of-many lagging tablet server' scenario. There, all tablet servers for a table perform good except for one which runs slow due to some reason (excessive IO, network issues, failing disk, etc.). The problem is that the lagging server hinders buffers turnaround: a buffer is considered 'flushed' and its space is reclaimed at once when _all_ operations in the buffer are completed. So, if 1000 operations are flushed but there is 1 operation still in progress, the whole buffer space is 'locked' and cannot be used. Accordingly, introducing per-tablet-server buffer limit for write operations would help to address scenarios with concurrent writes into tables with very different partition factors (like 2 and 100). E.g., the incoming operations for tables with very different partition factors are intermixed in the context of the same session. The problem is that setting the total buffer space limit high is fine for the writes into the table with many partitions (assuming those writes are evenly distributed across participating tablets), but it may be over the server-side's limit for max transaction size if those writes are targeted for a table with a few partitions. > Flush write operations on per-TS basis and add corresponding limit on the > buffer space > -------------------------------------------------------------------------------------- > > Key: KUDU-1693 > URL: https://issues.apache.org/jira/browse/KUDU-1693 > Project: Kudu > Issue Type: Improvement > Components: client > Affects Versions: 1.0.0 > Reporter: Alexey Serbin > > Currently, the Kudu C++ client buffers incoming operations regardless of > their destination tablet server. Accordingly, it's possible to set limit on > the _total_ buffer space, not per tablet server. This approach works but > there is room for improvement: there are real-world scenarios where per-TS > buffering would be more robust. Besides, tablet servers impose limit on the > RPC operations size. > Grouping write operations on per-tablet-server basis would be beneficial for > 'one-out-of-many lagging tablet server' scenario. There, all tablet servers > for a table perform well except for one which runs slow due to excessive IO, > network issues, failing disk, etc. The problem is that the lagging server > hinders the overall performance. This is due to the current approach to the > buffer turnaround: a buffer is considered 'flushed' and its space is > reclaimed at once when _all_ operations in the buffer are completed. So, if > 1000 operations have already been sent but there is 1 operation still in > progress, the whole buffer space is 'locked' and cannot be used. > Accordingly, introducing per-tablet-server buffer limit would help to address > scenarios with concurrent writes into tables with extremely diverse partition > factors (like 2 and 100). E.g., consider a case when incoming write > operations for tables with diverse partition factors are intermixed in the > context of one session. The problem is that setting the total buffer space > limit high is beneficial for the writes into the table with many partitions > (assuming those writes are evenly distributed across participating tablets), > but it may be over the server-side's limit for max transaction size if those > writes are targeted for the table with a few partitions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)