[jira] [Commented] (HBASE-28174) DELETE endpoint in REST API does not support deleting binary row keys/columns

James Udiljak (Jira) Tue, 24 Oct 2023 19:08:04 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-28174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779296#comment-17779296
 ]


James Udiljak commented on HBASE-28174:
---------------------------------------

Thanks [~wchevreuil],

I'm not sure I have enough Java knowledge to attempt solution 1), although it 
would be my preferred solution as it would open the door for multiple delete 
operations in a single request.
Solution 2) seems like it should be within the limits of my Java abilities, 
though. An advantage of approach 2) is that the {{RowSpec}} class is shared 
with the GET/PUT/POST operations, so these would also gain the ability to be 
used with binary row/column keys specified on the URL as base64.

I was thinking of placing something like this near the top of 
{{RowSpec.parseRowKeys}}:

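{java}
// Sketch only. queryStringParams is a placeholder for however the query
// string is exposed; this check could be hoisted into a member variable
// in the RowSpec class.
if ("b64".equals(queryStringParams.get("keyEnc"))) {
  // Collect the path segment up to the next '/'.
  StringBuilder sb = new StringBuilder();
  while (i < path.length() && path.charAt(i) != '/') {
    sb.append(path.charAt(i));
    i++;
  }
  // Percent-decode first, then base64-decode to the raw row key bytes.
  // Note: URLDecoder turns '+' into a space, so the URL-safe base64
  // alphabet may be the better choice here.
  String encoded = URLDecoder.decode(sb.toString(), HConstants.UTF8_ENCODING);
  this.row = java.util.Base64.getDecoder().decode(encoded);
  return i;
}

// Otherwise, continue with original parsing code
// ...
{java}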

The function currently supports providing both a start and an end row separated 
by a comma character. This should be easy enough to support in the b64 code 
path: simply allow two b64-encoded row keys separated by a comma, which may be 
sent either literally or percent-encoded. Start/end ranges can also currently 
be specified by providing a single row key prefix ending with a {{*}} 
character. Again, this could potentially be supported in the b64 code path by 
allowing a literal or percent-encoded asterisk at the end of the row key.
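
To make the range and prefix handling concrete, here is a minimal sketch of 
how a single row-key path segment might be parsed. Everything in it is an 
assumption for illustration: {{B64RowRange}} is a hypothetical helper rather 
than existing HBase code, the URL-safe base64 alphabet is assumed (the 
standard alphabet contains {{/}} and {{+}}, which clash with the path separator 
and with {{URLDecoder}}'s handling of {{+}}), and rejecting a {{*}} combined 
with an end row is just one possible choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Base64;

// Hypothetical helper, not part of HBase: parses one row-key path segment
// whose keys are base64-encoded (URL-safe alphabet assumed).
final class B64RowRange {
  final byte[] startRow;
  final byte[] endRow;     // null unless the segment had the form "start,end"
  final boolean isPrefix;  // true when the segment ended with '*'

  private B64RowRange(byte[] startRow, byte[] endRow, boolean isPrefix) {
    this.startRow = startRow;
    this.endRow = endRow;
    this.isPrefix = isPrefix;
  }

  static B64RowRange parse(String segment) {
    String s;
    try {
      // Percent-decode first so that %2C and %2A become ',' and '*'.
      s = URLDecoder.decode(segment, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
    Base64.Decoder b64 = Base64.getUrlDecoder();

    // Neither ',' nor '*' occurs in the base64 alphabet, so both are
    // unambiguous markers once the segment has been percent-decoded.
    boolean isPrefix = s.endsWith("*");
    if (isPrefix) {
      s = s.substring(0, s.length() - 1);
    }

    int comma = s.indexOf(',');
    if (comma < 0) {
      return new B64RowRange(b64.decode(s), null, isPrefix);
    }
    if (isPrefix) {
      throw new IllegalArgumentException("'*' cannot be combined with an end row");
    }
    return new B64RowRange(
        b64.decode(s.substring(0, comma)),
        b64.decode(s.substring(comma + 1)),
        false);
  }
}
```

With this sketch, a segment such as {{_wCA%2CgP8}} would yield the start row 
{{FF 00 80}} and the end row {{80 FF}}, while {{_wCA*}} would yield the prefix 
{{FF 00 80}}. How the end row of a prefix scan is derived is left to the 
existing code path.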

The function {{RowSpec.parseColumns}} could be modified in a similar way to 
handle b64 encoded column families/qualifiers, although the convention set by 
other parts of the REST API is that the first appearance of the literal byte 
0x3A (the ASCII/UTF-8 encoding of {{:}}) is used to separate the column family 
from the offset in a binary column key. We could elect to follow that 
convention here rather than having two base64-encoded values separated by an 
unencoded/percent-encoded colon character, or even support both.
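
As an illustration of the first option, the sketch below base64-decodes the 
whole column key and then splits it at the first 0x3A byte. {{B64Column}} is a 
hypothetical helper, the URL-safe base64 alphabet is an assumption, and 
treating a key without a separator as a bare column family is my own guess at 
sensible behaviour rather than something the REST API defines.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of HBase: decodes a base64 column key and
// splits it at the first ':' byte into family and qualifier ("offset").
final class B64Column {
  final byte[] family;
  final byte[] qualifier; // empty when the key contains no ':' byte

  private B64Column(byte[] family, byte[] qualifier) {
    this.family = family;
    this.qualifier = qualifier;
  }

  static B64Column parse(String b64Column) {
    byte[] raw = Base64.getUrlDecoder().decode(b64Column);

    // Find the first 0x3A byte; later ones belong to the qualifier.
    int sep = -1;
    for (int i = 0; i < raw.length; i++) {
      if (raw[i] == 0x3A) {
        sep = i;
        break;
      }
    }
    if (sep < 0) {
      return new B64Column(raw, new byte[0]);
    }
    return new B64Column(
        Arrays.copyOfRange(raw, 0, sep),
        Arrays.copyOfRange(raw, sep + 1, raw.length));
  }
}
```

For example, {{Y2Y6_zo}} decodes to the bytes {{63 66 3A FF 3A}}, which this 
sketch splits into the family {{cf}} and the qualifier {{FF 3A}}; only the 
first colon acts as the separator.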

Does this sound good to you?

> DELETE endpoint in REST API does not support deleting binary row keys/columns
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-28174
>                 URL: https://issues.apache.org/jira/browse/HBASE-28174
>             Project: HBase
>          Issue Type: Bug
>          Components: REST
>    Affects Versions: 2.4.17
>            Reporter: James Udiljak
>            Priority: Blocker
>         Attachments: delete_base64_1.png
>
>
> h2. Notes
> This is the first time I have raised an issue in the ASF Jira. Please let me 
> know if there's anything I need to adjust on the issue to fit in with your 
> development flow.
> I have marked the priority as "blocker" because this issue blocks me as a 
> user of the HBase REST API from deploying an effective solution for our 
> setup. Please feel free to change this if the Priority field has another 
> meaning to you.
> I have also chosen 2.4.17 as the affected version because this is the version 
> I am running, however looking at the source code on GitHub in the default 
> branch, I think many other versions would be affected.
> h2. Description of Issue
> The DELETE operation in the [HBase REST 
> API|https://hbase.apache.org/1.2/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#operation_delete]
>  requires specifying row keys and column families/offsets in the URI (i.e. as 
> UTF-8 text). This makes it impossible to specify a delete operation via the 
> REST API for a binary row key or column family/offset, as single bytes with a 
> decimal value greater than 127 are not valid in UTF-8.
> Percent-encoding these "high" values does not work around the issue, as the 
> HBase REST API uses Java's {{URLDecoder.decode(percentEncodedString, 
> "UTF-8")}} function, which replaces any percent-encoded byte in the range 
> {{%80}} to {{%FF}} with the [replacement 
> character|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character].
>  Even if this were not the case, the row-key is ultimately [converted to a 
> byte 
> array|https://github.com/apache/hbase/blob/rel/2.4.17/hbase-rest/src/main/java/org/apache/hadoop/hbase/rest/RowSpec.java#L60-L100]
>  using UTF-8 encoding, wherein code points >127 are encoded across multiple 
> bytes, corrupting the user-supplied row key.
> h2. Proposed Solution
> I do not believe it is possible to allow encoding of arbitrary bytes in the 
> URL for the DELETE endpoint without breaking compatibility for any users who 
> may have been unknowingly UTF-8 encoding their binary row keys. Even if it 
> were possible, the syntax would likely be terse.
> Instead, I propose a new version of the DELETE endpoint that would accept row 
> keys and column families/offsets in the request _body_ (using Base64 encoding 
> for the JSON and XML formats, and bare binary for protobuf). This new 
> endpoint would follow the same conventions as the PUT operations, except that 
> cell values would not need to be specified (unless the user is performing a 
> check-and-delete operation).
> As an additional benefit, using the request body could potentially allow for 
> deleting multiple rows in a single request, which would drastically improve 
> the efficiency of my use case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
