[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-11-01 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781842#comment-17781842
 ] 

Colin McCabe commented on KAFKA-15754:
--

bq. Going to close this again, even if it's a mystery why this call 
Uuid.randomUuid().toString() produced a UUID starting with "-" in our code.

My guess would be that you are depending on an older version of the Kafka 
client libraries where this was possible.

bq. Going to close this again

Ack.

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Assignee: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID 
> starting with a dash "-", which then breaks argument parsing in the argparse4j 
> library. With such a UUID (-rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits 
> with the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the 
> Uuid.randomUuid method, which keeps generating a new UUID until it gets one 
> that does not start with "-". This is the commit addressing it: 
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when toString is called on the Uuid instance, it 
> Base64-encodes the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid()); 
> {code}
> Not sure why, but the code uses a URL-safe encoder which, looking at the 
> Base64 class in Java, is an RFC4648_URLSAFE encoder with the following 
> alphabet:
>  
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E', 
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'}; {code}
> which, as you can see, includes the "-" character.
> So although the current Uuid.randomUuid avoids generating a UUID starting 
> with a dash, the Base64 encoding operation can still return a final UUID 
> that starts with a dash.
>  
> I was wondering if there is any good reason for using a Base64 URL encoder 
> rather than the standard RFC4648 (not URL-safe) one, whose alphabet does 
> not contain "-".
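
To make the claim in the description concrete, here is a minimal sketch that uses only the JDK. It is not Kafka's Uuid class: the class name UrlSafeDashDemo and the encode helper are hypothetical, and packing the two longs with a ByteBuffer is an assumption about what getBytesFromUuid does. It shows that the URL-safe encoder emits "-" as the first character whenever the top six bits of the first byte are 111110 (index 62 in the alphabet quoted above).

```java
import java.nio.ByteBuffer;
import java.util.Base64;

/** Sketch only: mimics the encoding step quoted above; it is not Kafka's Uuid class. */
class UrlSafeDashDemo {

    /** Packs two longs into 16 bytes and encodes them with the URL-safe, unpadded encoder. */
    static String encode(long mostSignificantBits, long leastSignificantBits) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(mostSignificantBits)
                .putLong(leastSignificantBits)
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Top six bits 111110 (decimal 62) select '-' in the URL-safe alphabet.
        System.out.println(encode(0xF800000000000000L, 0L));
        // Top six bits 000000 (decimal 0) select 'A'.
        System.out.println(encode(0L, 0L));
    }
}
```

Whether a string like this ever reaches the caller depends on what the generator does after encoding, which is the point the later comments in this thread settle.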



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-31 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781261#comment-17781261
 ] 

Paolo Patierno commented on KAFKA-15754:


OK, we found what I missed: the while loop generating the UUID already calls 
toString before comparing with "-"
{code:java}
while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) { {code}
So I was technically right that Base64 URL encoding can generate a string 
starting with "-" but the Uuid code was already avoiding it. I missed the 
toString used there :)

Going to close this again.
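
For readers who want to see why checking the encoded form is enough, here is a rough JDK-only sketch of the same idea. It is not the Kafka source: DashFreeIdDemo, encode and randomDashFreeId are hypothetical names, java.util.UUID stands in for Kafka's Uuid, and the RESERVED check from the real loop is left out.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

/** Sketch only: a stand-in for the retry loop quoted above, not Kafka's implementation. */
class DashFreeIdDemo {

    /** Encodes the 16 bytes of a UUID with the URL-safe, unpadded Base64 encoder. */
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Regenerates until the encoded (not the raw) form has no leading dash. */
    static String randomDashFreeId() {
        String encoded = encode(UUID.randomUUID());
        while (encoded.startsWith("-")) {
            encoded = encode(UUID.randomUUID());
        }
        return encoded;
    }

    public static void main(String[] args) {
        System.out.println(randomDashFreeId());
    }
}
```

Because the test is applied to the encoded string, the "-" in the URL-safe alphabet can still appear later in the ID but never in the first position.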



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-31 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781249#comment-17781249
 ] 

Paolo Patierno commented on KAFKA-15754:


{quote}I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, 
in fact, generate uuids starting with {{{}-{}}}.

You can see this via analysis of the code or by just running it as I did
{quote}
[~cmccabe] can you point me to such an analysis please? Maybe I am missing 
something, and I would like evidence to compare against my description, where 
I referred to the RFC4648_URLSAFE alphabet used by the Base64 URL encoder. 
That alphabet contains "-" so, even if it is rare, it can happen. I am not 
sure that running the tool repeatedly proves much, because the problem might 
simply not show up in a given run. I can confirm it happened to me.
{quote}I think this issue can happen if some code does _not_ use 
`Uuid.toString()` and instead uses Java's `UUID.toString()` somehow.
{quote}
[~ijuma], in the project I am working on we use:
{code:java}
Uuid.randomUuid().toString() {code}
Which is actually what the kafka-storage tool does, isn't it? It calls 
toString on Uuid, not UUID, and that toString uses the Base64 URL encoder, 
which could generate a "-" as the first character.



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781155#comment-17781155
 ] 

Colin McCabe commented on KAFKA-15754:
--

{quote}
I was wondering if there is any good reason for using a Base64 URL encoder and 
not just the RFC4648 (not URL safe) which uses the common Base64 alphabet not 
containing the "-".
{quote}

At one point, I did raise the question of why dash was used to serialize Kafka 
Uuids. But by the time I did so we were already using it in a few places so the 
question was not relevant. We're not going to change Uuid serialization now.

I think the general rationale was that dash and underscore were friendlier than 
slash and plus sign. But that's debatable, of course. Slash, at least, is not 
filesystem-safe.
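
As a small illustration of the trade-off, the sketch below encodes the same three bytes with both JDK encoders. The class and method names are hypothetical. The two alphabets differ only at indexes 62 and 63, so switching to the standard alphabet would replace "-" and "_" with "+" and "/".

```java
import java.util.Base64;

/** Sketch only: compares the two RFC 4648 alphabets on identical input bytes. */
class AlphabetDemo {

    /** Standard alphabet: index 62 is '+', index 63 is '/'. */
    static String standard(byte[] bytes) {
        return Base64.getEncoder().withoutPadding().encodeToString(bytes);
    }

    /** URL-safe alphabet: index 62 is '-', index 63 is '_'. */
    static String urlSafe(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Bit pattern 111110 111111 111111 111111, i.e. indexes 62, 63, 63, 63.
        byte[] bytes = {(byte) 0xFB, (byte) 0xFF, (byte) 0xFF};
        System.out.println(standard(bytes)); // +///
        System.out.println(urlSafe(bytes));  // -___
    }
}
```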



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781152#comment-17781152
 ] 

Colin McCabe commented on KAFKA-15754:
--

You can run this code yourself if you are curious. Here it is. You will need 
bash 4 or better. (my version is `GNU bash, version 5.2.15(1)-release 
(aarch64-apple-darwin21.6.0)`)

{code}
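#!/usr/bin/env bash

# Count how many generated IDs start with each character.
# Associative arrays need bash 4 or later.
declare -A IDS_PER_INITIAL_LETTER
# 10,000 iterations, matching the run reported in this thread.
for ((i = 0; i < 10000; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

# Print one line per first character that was seen.
for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done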
{code}



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Colin McCabe (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781151#comment-17781151
 ] 

Colin McCabe commented on KAFKA-15754:
--

I ran `kafka-storage.sh random-uuid` 10,000 times and got the following 
distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}

No IDs were generated with a first character of `-`, as expected. 
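
As a rough sanity check on these counts, the sketch below works out two numbers under one stated assumption: that an unfiltered generator would produce each of the 64 alphabet characters first with equal probability. The class and method names are hypothetical. With "-" excluded, 63 first characters remain, so each should appear about 10000 / 63 times, which is close to the counts listed. If "-" were possible, seeing none in 10,000 runs would have probability (63/64)^10000.

```java
/** Sketch only: back-of-the-envelope numbers for the 10,000-run experiment. */
class DashFreeRunOdds {

    /** Expected count per first character if all possible first characters are equally likely. */
    static double expectedPerCharacter(int samples, int possibleFirstChars) {
        return (double) samples / possibleFirstChars;
    }

    /** Chance of never seeing '-' first if it had the same 1-in-64 odds as any other character. */
    static double chanceOfNoDash(int samples) {
        return Math.pow(63.0 / 64.0, samples);
    }

    public static void main(String[] args) {
        System.out.println(expectedPerCharacter(10000, 63)); // roughly 158.7
        System.out.println(chanceOfNoDash(10000));           // far below 1e-60
    }
}
```

Under that assumption, a dash-free run of this length is very unlikely to be a coincidence, which fits the explanation that the generator filters leading dashes.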



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Ismael Juma (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781069#comment-17781069
 ] 

Ismael Juma commented on KAFKA-15754:
-

[~jolshan] I think this issue can happen if some code does _not_ use 
`Uuid.toString()` and instead uses Java's `UUID.toString()` somehow.



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Justine Olshan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781061#comment-17781061
 ] 

Justine Olshan commented on KAFKA-15754:


Hey [~ppatierno] sorry you encountered this issue. I think we originally chose 
this in case the topic ID was to be included in a URL.

Can you explain how you encountered the issue? It seems like the fix prevents 
random Uuids from having the dash. Unless there is a different way to generate 
the toString.



[jira] [Commented] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"

2023-10-30 Thread Paolo Patierno (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780949#comment-17780949
 ] 

Paolo Patierno commented on KAFKA-15754:


While I could provide a fix by using the common Base64 RFC4648 (not URL safe) 
encoder, I was wondering if there was any reason for not just using it.
