[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781152#comment-17781152 ]

Colin McCabe commented on KAFKA-15754:
--------------------------------------

You can run this code yourself if you are curious. Here it is. You will need
bash 4 or later, since the script uses an associative array (my version is
`GNU bash, version 5.2.15(1)-release (aarch64-apple-darwin21.6.0)`).

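{code}
#!/usr/bin/env bash
# Counts how many of 10000 generated cluster IDs start with each character.
# Requires bash 4+ for associative arrays (declare -A).

declare -A IDS_PER_INITIAL_LETTER
for ((i = 0; i < 10000; i++)); do
    # Keep only the first character of the generated ID.
    FIRST_LETTER=$(./kafka-storage.sh random-uuid 2> /dev/null | head -c 1)
    # Skip this run if the tool printed nothing (e.g. it failed).
    [[ -n "$FIRST_LETTER" ]] || continue
    # Quote the key so characters such as "-" are used literally.
    IDS_PER_INITIAL_LETTER["$FIRST_LETTER"]=$(( ${IDS_PER_INITIAL_LETTER["$FIRST_LETTER"]:-0} + 1 ))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER["$k"]}"
done | sort
{code}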
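A rough Java analogue of the script above, for anyone without bash 4. It is only a sketch under stated assumptions: it does not call kafka-storage.sh or Kafka's Uuid class at all. Instead it encodes java.util.UUID.randomUUID() values the way Uuid.toString() is quoted below as doing it (the byte layout, most significant 64 bits first, is an assumption), and it applies no retry filter. It therefore shows the raw distribution of the first character, where each of the 64 URL-safe symbols, "-" included, should lead roughly once in 64 draws. The class name LeadingCharHistogram is hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

public class LeadingCharHistogram {

    // Encodes a UUID with the URL-safe alphabet and no padding, as quoted from
    // Uuid.toString(). Assumption: bytes are msb then lsb, big-endian.
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Counts the first character of `samples` encoded random UUIDs.
    // Unlike Uuid.randomUuid(), no candidate is rejected or regenerated.
    static Map<Character, Integer> histogram(int samples) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            char first = encode(UUID.randomUUID()).charAt(0);
            counts.merge(first, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int samples = 10_000;
        Map<Character, Integer> counts = histogram(samples);
        // TreeMap iterates in character order, like piping the bash output to sort.
        counts.forEach((c, n) -> System.out.println("IDs starting with " + c + " : " + n));
        // The first byte of UUID.randomUUID() is fully random (the version and
        // variant bits live in later bytes), so about samples / 64 should start with "-".
        System.out.printf("expected about %d per character%n", samples / 64);
    }
}
```

Seeing a non-trivial "-" count here says nothing about the real tool; it only confirms that the raw encoding can produce a leading "-", which is what any filter in Uuid.randomUuid() has to catch.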

> The kafka-storage tool can generate UUID starting with "-"
> ----------------------------------------------------------
>
>                 Key: KAFKA-15754
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15754
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.6.0
>            Reporter: Paolo Patierno
>            Assignee: Paolo Patierno
>            Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID
> starting with a dash "-", which then breaks argument parsing in the argparse4j
> library: the leading "-" makes the value look like an option, so --cluster-id
> receives no argument. With such a UUID (e.g. -rmdB0m4T4--Y4thlNXk4Q in my
> case) the tool exits with the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the
> Uuid.randomUuid method, which keeps generating a new UUID until it doesn't
> start with "-". This is the commit addressing it:
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when toString() is called on the Uuid instance, it
> Base64-encodes the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code uses the URL-safe encoder which, looking at
> Java's Base64 class, is the RFC4648_URLSAFE encoder with the following
> alphabet:
>
> {code:java}
> private static final char[] toBase64URL = new char[]{
>     'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
>     'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
>     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
>     'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
>     '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
> };
> {code}
> As you can see, it includes the "-" character.
> So although the current Uuid.randomUuid avoids generating a UUID containing
> a dash, the Base64 encoding step can still return a final UUID string
> starting with a dash.
>
> I was wondering whether there is a good reason for using the Base64 URL-safe
> encoder rather than the plain RFC4648 (non-URL-safe) one, whose standard
> alphabet does not contain "-".
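A minimal Java sketch of the mechanism described in the issue, assuming getBytesFromUuid() writes the most significant 64 bits followed by the least significant 64 bits in big-endian order (the class UuidAlphabetSketch and its helpers are hypothetical, not Kafka code). The first encoded character comes from the top 6 bits of the first byte, so any UUID whose first byte is 0xF8 to 0xFB (top bits 111110, index 62) starts with "-" under the URL-safe alphabet. Encoding the same bytes with the standard alphabet gives "+" at index 62 and "/" at index 63 instead.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class UuidAlphabetSketch {

    // Hypothetical stand-in for Kafka's private getBytesFromUuid(): assumed to
    // write the most significant 64 bits, then the least significant 64 bits,
    // big-endian (ByteBuffer's default byte order).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // URL-safe alphabet, no padding: the call chain quoted from Uuid.toString().
    static String urlSafe(UUID uuid) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(toBytes(uuid));
    }

    // Standard RFC 4648 alphabet, no padding: the alternative the issue asks about.
    static String standard(UUID uuid) {
        return Base64.getEncoder().withoutPadding().encodeToString(toBytes(uuid));
    }

    public static void main(String[] args) {
        // First byte 0xFB = 1111_1011, second byte 0xFF, so the first two
        // 6-bit groups are 111110 (index 62) and 111111 (index 63).
        UUID uuid = new UUID(0xFBFF000000000000L, 0L);
        System.out.println(urlSafe(uuid));  // starts with "-_"
        System.out.println(standard(uuid)); // starts with "+/"
    }
}
```

Both strings are 22 characters (16 bytes, no padding) and differ only where a 6-bit group equals 62 or 63, so switching encoders would remove the leading "-" but would put "+" and "/" into some IDs instead.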



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
