[ https://issues.apache.org/jira/browse/CASSANDRA-15096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Louay Kamel updated CASSANDRA-15096:
------------------------------------
    Description:
*Problem*
The current implementation of unset_value falls short (see Issues).
We need a new unset_value(s) mechanism that is robust and works well for v4+ protocols.

*Issues*
+1- A client has to encode unset_value for every unset column in the values of a prepared insert query.+
Example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) values(?,?,?,?,?,?);
An execute query has to unset the columns one by one, encoding each unset_value as "int(-2)":
a- bound-values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value) or
b- bound-values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value) etc.
This enlarges the binary buffer of the execute query, which in turn increases bandwidth and latency for both request and response.

+2- The buffer returned for select queries does not differentiate between null and unset_value for a subset of the given rows.+
Example: imagine a dataset in the table where each row of the select response has different unset/null columns. Consider the following query:
SELECT * FROM table where pkey = pkey_value;
with a page_size of 3 rows:
row1 -> pkey_value, ckey_value, col1_value, null/unset_value, null/unset_value, null/unset_value.
row2 -> pkey_value, ckey_value, null/unset_value, null/unset_value, null/unset_value, col4_value.
row3 -> pkey_value, ckey_value, null/unset_value, null/unset_value, col3_value, null/unset_value.

*Proposed solution*
Instead of having only null(-1) and unset_value(-2), extend the unset_value(s) to a range from unset_(-2) to unset_(-2,147,483,648), where
unset_value = unset_(-2)
unset_rest = unset_(-2,147,483,648)
and anything in between is unset_(neg_integer).

+Solution for issue_1:+
a- bound-values = (pkey_value, ckey_value, col1_value, unset_rest)
b- bound-values = (pkey_value, ckey_value, unset_(-4), col4_value)

+Solution for issue_2:+
This works with all select responses, prepared or unprepared.
row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
This lets the decoder move on to the next row.
row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
This lets the decoder skip |-4+1| = 3 columns of the column metadata and resume decoding at col4 for the next cell_value in the row.
row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
This buffer is a mix of row1 and row2.
This solution is not limited to unset_(neg-int); it can also be used for null cells in responses to reduce the bandwidth between CQL and the client.
To stay compatible with all current v4+ CQL drivers, the client should be required to send a flag with the select query request (either in the frame header or somewhere in the CQL statement), and the returned buffer could use the rows flags (e.g. has_unset_values?: boolean) to let the driver know whether such values exist in the page.

*Benefits*
- Lets applications design complex data models with up to 2 billion columns without trade-offs.
- Greatly reduces the number of prepared write statements needed in data models with millions of columns.
- Large savings in bandwidth and CPU cycles.
- Easy to implement on the client side.

*Record of votes*
+1 Louay Kamel

  was:
*Problem*
The current implementation of unset_value falls short (see Issues).
We need a new unset_value(s) mechanism that is robust and works well for v4+ protocols.

*Issues*
+1- A client has to encode unset_value for every unset column in the values of a prepared insert query.+
Example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) values(?,?,?,?,?,?);
An execute query has to unset the columns one by one, encoding each unset_value as "int(-2)":
a- bound-values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value) or
b- bound-values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value) etc.
This enlarges the binary buffer of the execute query, which in turn increases bandwidth and latency for both request and response.

+2- The buffer returned for select queries does not differentiate between null and unset_value for a subset of the given rows.+
Example: imagine a dataset in the table where each row of the select response has different unset/null columns. Consider the following query:
SELECT * FROM table where pkey = pkey_value;
with a page_size of 3 rows:
row1 -> pkey_value, ckey_value, col1_value, null/unset_value, null/unset_value, null/unset_value.
row2 -> pkey_value, ckey_value, null/unset_value, null/unset_value, null/unset_value, col4_value.
row3 -> pkey_value, ckey_value, null/unset_value, null/unset_value, col3_value, null/unset_value.

*Proposed solution*
Instead of having only null(-1) and unset_value(-2), extend the unset_value(s) to a range from unset_(-2) to unset_(-2,147,483,648), where
unset_value = unset_(-2)
unset_rest = unset_(-2,147,483,648)
and anything in between is unset_(neg_integer).

+Solution for issue_1:+
a- bound-values = (pkey_value, ckey_value, col1_value, unset_rest)
b- bound-values = (pkey_value, ckey_value, unset_(-4), col4_value)

+Solution for issue_2:+
This works with all select responses, prepared or unprepared.
row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
This lets the decoder move on to the next row.
row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
This lets the decoder skip |-4+1| = 3 columns of the column metadata and resume decoding at col4 for the next cell_value in the row.
row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
This buffer is a mix of row1 and row2.
This solution is not limited to unset_(neg-int); it can also be used for null cells in responses to reduce the bandwidth between CQL and the client.
To stay compatible with all current v4+ CQL drivers, the client should be required to send a flag with the select query request (either in the frame header or somewhere in the CQL statement), and the returned buffer could use the rows flags (e.g. has_unset_values?: boolean) to let the driver know whether such values exist in the page.

*Benefits*
- Lets applications design complex data models with up to 2 billion columns without trade-offs.
- Greatly reduces the number of prepared write statements needed in data models with millions of columns.
- Large savings in bandwidth and CPU cycles.
- Easy to implement on the client side.

# Record of votes
+1 Louay Kamel


> [RFC CQL v4+] cql_extension: wide range of unset_values.
> --------------------------------------------------------
>
>                 Key: CASSANDRA-15096
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15096
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL/Interpreter, CQL/Semantics
>            Reporter: Louay Kamel
>            Priority: Normal
>
> *Problem*
> The current implementation of unset_value falls short (see Issues).
> We need a new unset_value(s) mechanism that is robust and works well for v4+ protocols.
>
> *Issues*
> +1- A client has to encode unset_value for every unset column in the values of a prepared insert query.+
> Example: INSERT INTO table(pkey,ckey,col1,col2,col3,col4) values(?,?,?,?,?,?);
> An execute query has to unset the columns one by one, encoding each unset_value as "int(-2)":
> a- bound-values = (pkey_value, ckey_value, col1_value, unset_value, unset_value, unset_value) or
> b- bound-values = (pkey_value, ckey_value, unset_value, unset_value, unset_value, col4_value) etc.
> This enlarges the binary buffer of the execute query, which in turn increases bandwidth and latency for both request and response.
>
> +2- The buffer returned for select queries does not differentiate between null and unset_value for a subset of the given rows.+
> Example: imagine a dataset in the table where each row of the select response has different unset/null columns. Consider the following query:
> SELECT * FROM table where pkey = pkey_value;
> with a page_size of 3 rows:
> row1 -> pkey_value, ckey_value, col1_value, null/unset_value, null/unset_value, null/unset_value.
> row2 -> pkey_value, ckey_value, null/unset_value, null/unset_value, null/unset_value, col4_value.
> row3 -> pkey_value, ckey_value, null/unset_value, null/unset_value, col3_value, null/unset_value.
>
> *Proposed solution*
> Instead of having only null(-1) and unset_value(-2), extend the unset_value(s) to a range from unset_(-2) to unset_(-2,147,483,648), where
> unset_value = unset_(-2)
> unset_rest = unset_(-2,147,483,648)
> and anything in between is unset_(neg_integer).
>
> +Solution for issue_1:+
> a- bound-values = (pkey_value, ckey_value, col1_value, unset_rest)
> b- bound-values = (pkey_value, ckey_value, unset_(-4), col4_value)
>
> +Solution for issue_2:+
> This works with all select responses, prepared or unprepared.
> row1 buffer -> pkey_value, ckey_value, col1_value, unset_rest.
> This lets the decoder move on to the next row.
> row2 buffer -> pkey_value, ckey_value, unset_(-4), col4_value.
> This lets the decoder skip |-4+1| = 3 columns of the column metadata and resume decoding at col4 for the next cell_value in the row.
> row3 buffer -> pkey_value, ckey_value, unset_(-3), col3_value, unset_rest.
> This buffer is a mix of row1 and row2.
> This solution is not limited to unset_(neg-int); it can also be used for null cells in responses to reduce the bandwidth between CQL and the client.
> To stay compatible with all current v4+ CQL drivers, the client should be required to send a flag with the select query request (either in the frame header or somewhere in the CQL statement), and the returned buffer could use the rows flags (e.g. has_unset_values?: boolean) to let the driver know whether such values exist in the page.
>
> *Benefits*
> - Lets applications design complex data models with up to 2 billion columns without trade-offs.
> - Greatly reduces the number of prepared write statements needed in data models with millions of columns.
> - Large savings in bandwidth and CPU cycles.
> - Easy to implement on the client side.
>
> *Record of votes*
> +1 Louay Kamel



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
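
The unset_(n) markers proposed in the description amount to run-length encoding of consecutive unset cells. The Python sketch below is an illustration only, not Cassandra or driver code. It assumes that each cell is written as a 4-byte big-endian signed length followed by that many bytes, that a marker unset_(n) stands for |n+1| consecutive unset columns (so unset_(-4) covers three columns, as in the row2 example), and that unset_rest marks every remaining column of the row as unset. The names UNSET, encode_row and decode_row are hypothetical.

```python
import struct

NULL = -1              # existing marker: null cell
UNSET_REST = -2**31    # proposed marker: all remaining columns are unset


class _Unset:
    """Hypothetical sentinel type for an unset cell."""

    def __repr__(self):
        return "UNSET"


UNSET = _Unset()


def encode_row(cells):
    """Encode a row given as a list of bytes, None (null) or UNSET."""
    out = bytearray()
    i = 0
    while i < len(cells):
        cell = cells[i]
        if cell is UNSET:
            # Find the end of the run of consecutive unset cells.
            j = i
            while j < len(cells) and cells[j] is UNSET:
                j += 1
            # A run reaching the end of the row collapses to unset_rest;
            # otherwise a run of k cells is written as unset_(-(k + 1)).
            marker = UNSET_REST if j == len(cells) else -(j - i + 1)
            out += struct.pack(">i", marker)
            i = j
        elif cell is None:
            out += struct.pack(">i", NULL)
            i += 1
        else:
            out += struct.pack(">i", len(cell)) + cell
            i += 1
    return bytes(out)


def decode_row(buf, column_count, offset=0):
    """Decode one row; return (cells, offset of the next row)."""
    cells = []
    while len(cells) < column_count:
        (n,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        remaining = column_count - len(cells)
        if n >= 0:
            cells.append(bytes(buf[offset:offset + n]))
            offset += n
        elif n == NULL:
            cells.append(None)
        elif n == UNSET_REST:
            cells.extend([UNSET] * remaining)
        else:
            run = -n - 1  # unset_(-4) skips 3 columns
            if run > remaining:
                raise ValueError("unset run exceeds remaining columns")
            cells.extend([UNSET] * run)
    return cells, offset


# Usage: the row2 example, with three unset columns before col4.
row2 = [b"pk", b"ck", UNSET, UNSET, UNSET, b"c4"]
encoded = encode_row(row2)
decoded, next_offset = decode_row(encoded, column_count=6)
```

In this sketch the three unset cells of row2 cost one 4-byte marker instead of three, and the decoder needs the column count from the result metadata to know where a row ends.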