[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781261#comment-17781261 ]

Paolo Patierno edited comment on KAFKA-15754 at 10/31/23 8:14 AM:
--
OK, we found what I didn't see: the while loop generating the UUID already calls toString() before comparing with "-"
{code:java}
while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) {
{code}
So I was technically right that Base64 URL encoding can generate a string starting with "-", but the Uuid code was already avoiding it. I missed the toString() call there :)
Going to close this again, even though it's still a mystery why the call Uuid.randomUuid().toString() produced a UUID starting with "-" in our code.

was (Author: ppatierno):
OK, we found what I didn't see: the while loop generating the UUID already calls toString() before comparing with "-"
{code:java}
while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) {
{code}
So I was technically right that Base64 URL encoding can generate a string starting with "-", but the Uuid code was already avoiding it. I missed the toString() call there :)
Going to close this again.

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Assignee: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID
> starting with a dash "-", which then breaks argument parsing in the argparse4j library.
> With such a UUID (e.g. -rmdB0m4T4--Y4thlNXk4Q in my case), the tool exits with
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the
> Uuid.randomUuid method, which keeps generating a new UUID until it doesn't
> start with "-". This is the commit addressing it:
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when toString() is called on the Uuid instance, it
> Base64-encodes the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid());
> {code}
> Not sure why, but the code uses a URL-safe encoder which, looking at Java's
> Base64 class, is an RFC4648_URLSAFE encoder with the following alphabet:
>
> {code:java}
> private static final char[] toBase64URL = new char[]{
>         'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
>         'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
>         'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
>         'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
>         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'
> };
> {code}
> which, as you can see, includes the "-" character.
> So even though the current Uuid.randomUuid avoids generating a UUID
> starting with a dash, the Base64 encoding operation can still return a final
> UUID starting with the dash.
>
> I was wondering if there is any good reason for using a Base64 URL encoder
> and not just the plain RFC4648 (not URL-safe) encoder, whose common Base64
> alphabet does not contain "-".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
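The alphabet point in the description can be checked with a few lines of plain Java. The sketch below is illustrative only. It calls java.util.Base64 directly rather than Kafka's Uuid class, and the byte values are made up. A first byte of 0xF8 puts 111110 (index 62) into the first six bits. That index maps to '-' in the URL-safe alphabet and to '+' in the standard one, so the raw encoding of such a value starts with a dash unless something filters it out.

```java
import java.util.Base64;

// Illustration only: uses java.util.Base64 directly, not Kafka's Uuid class.
class LeadingDashDemo {

    /** URL-safe alphabet (RFC 4648 section 5), no padding: the encoding quoted in the description. */
    static String urlSafe(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Standard alphabet (RFC 4648 section 4), no padding, for comparison. */
    static String standard(byte[] bytes) {
        return Base64.getEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Hypothetical 16-byte UUID value whose first six bits are 111110, i.e. alphabet index 62. */
    static byte[] sampleBytes() {
        byte[] bytes = new byte[16];   // all zeros except the first byte
        bytes[0] = (byte) 0xF8;        // 1111_1000: first sextet is 111110 = 62
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println(urlSafe(bytes));   // '-' followed by 21 'A' characters
        System.out.println(standard(bytes));  // '+' followed by 21 'A' characters
    }
}
```

Sixteen bytes always encode to 22 characters without padding. Only the character for index 62 (and 63) depends on which alphabet is chosen.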
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781261#comment-17781261 ]

Paolo Patierno edited comment on KAFKA-15754 at 10/31/23 8:14 AM:
--
OK, we found what I didn't see: the while loop generating the UUID already calls toString() before comparing with "-"
{code:java}
while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) {
{code}
So I was technically right that Base64 URL encoding can generate a string starting with "-", but the Uuid code was already avoiding it. I missed the toString() call there :)
Going to close this again, even though it's still a mystery why the call Uuid.randomUuid().toString() produced a UUID starting with "-" in our code.

was (Author: ppatierno):
OK, we found what I didn't see: the while loop generating the UUID already calls toString() before comparing with "-"
{code:java}
while (RESERVED.contains(uuid) || uuid.toString().startsWith("-")) {
{code}
So I was technically right that Base64 URL encoding can generate a string starting with "-", but the Uuid code was already avoiding it. I missed the toString() call there :)
Going to close this again, even though it's still a mystery why the call Uuid.randomUuid().toString() produced a UUID starting with "-" in our code.
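The loop Paolo quotes is a rejection-sampling loop, and its test runs on the encoded string rather than on the raw 128-bit value. Below is a minimal sketch of that idea. It assumes, as quoted in the issue description, that Kafka's Uuid.toString() is the URL-safe, unpadded Base64 of the 16 big-endian bytes. The class name DashFreeIdGenerator and the one-element RESERVED set are hypothetical stand-ins, not Kafka's code.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of "retry until the string form is acceptable"; not Kafka's implementation.
class DashFreeIdGenerator {

    // Stand-in for Kafka's RESERVED set (assumption): here it only holds the all-zero id.
    private static final Set<UUID> RESERVED = Set.of(new UUID(0L, 0L));

    /** 16 big-endian bytes: most significant 64 bits first, then least significant 64 bits. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** URL-safe, unpadded Base64 form, mirroring the encoding quoted in the description. */
    static String encode(UUID uuid) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(toBytes(uuid));
    }

    /** Draws random UUIDs until one is not reserved and its encoded form does not start with "-". */
    static String randomId() {
        UUID uuid = UUID.randomUUID();
        while (RESERVED.contains(uuid) || encode(uuid).startsWith("-")) {
            uuid = UUID.randomUUID();
        }
        return encode(uuid);
    }

    public static void main(String[] args) {
        System.out.println(randomId());
    }
}
```

The check runs on the encoded form, so every string returned by randomId() is 22 characters long and never begins with '-'. On average about 1 draw in 64 is rejected and retried.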
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781249#comment-17781249 ]

Paolo Patierno edited comment on KAFKA-15754 at 10/31/23 7:47 AM:
--
{quote}I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in fact, generate uuids starting with {{-}}. You can see this via analysis of the code or by just running it as I did{quote}
[~cmccabe] can you point me to such an analysis, please? Maybe I am missing something, and I'd like to see evidence against my description. There I referred to the RFC4648_URLSAFE alphabet used by the Base64 URL encoder: that alphabet contains "-", so even if it's rare, it can happen. I'm not sure that running the tool over and over proves much, because the case might simply not show up in those runs. I can confirm it happened to me.
{quote}I think this issue can happen if some code does _not_ use `Uuid.toString()` and instead uses Java's `UUID.toString()` somehow.{quote}
[~ijuma], in the project I am working on we use:
{code:java}
Uuid.randomUuid().toString()
{code}
which is actually what the kafka-storage tool does, right? It calls toString() on Uuid, not UUID, and that toString() uses the Base64 URL encoder, which could generate "-" as the first character.
This is the reference to the RFC: [https://datatracker.ietf.org/doc/html/rfc4648#section-5]
The table there shows that a "-" character can appear in the encoded value.

was (Author: ppatierno):
{quote}I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in fact, generate uuids starting with {{-}}. You can see this via analysis of the code or by just running it as I did{quote}
[~cmccabe] can you point me to such an analysis, please? Maybe I am missing something, and I'd like to see evidence against my description. There I referred to the RFC4648_URLSAFE alphabet used by the Base64 URL encoder: that alphabet contains "-", so even if it's rare, it can happen. I'm not sure that running the tool over and over proves much, because the case might simply not show up in those runs. I can confirm it happened to me.
{quote}I think this issue can happen if some code does _not_ use `Uuid.toString()` and instead uses Java's `UUID.toString()` somehow.{quote}
[~ijuma], in the project I am working on we use:
{code:java}
Uuid.randomUuid().toString()
{code}
which is actually what the kafka-storage tool does, right? It calls toString() on Uuid, not UUID, and that toString() uses the Base64 URL encoder, which could generate "-" as the first character.
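The distinction between Kafka's Uuid.toString() and Java's UUID.toString() in this exchange is easier to see side by side. The hedged sketch below uses only the JDK, and the class name TwoStringForms is hypothetical. It contrasts two string forms of the same 128 bits:

- java.util.UUID.toString(), which always has 36 characters and begins with a hex digit.
- A URL-safe, unpadded Base64 encoding of the same bytes, assumed here to match what the issue description says Kafka's Uuid.toString() produces.

Only the second form can begin with '-'.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical illustration of two string forms of the same 128-bit value.
class TwoStringForms {

    /** java.util.UUID's canonical form: 36 chars, hex groups 8-4-4-4-12 joined by '-'. */
    static String canonical(UUID uuid) {
        return uuid.toString();
    }

    /** URL-safe, unpadded Base64 of the same 16 big-endian bytes: 22 chars. */
    static String base64Url(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Decodes the 22-char form back into a UUID (the JDK decoder does not require padding). */
    static UUID fromBase64Url(String s) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(s));
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("f8000000-0000-4000-8000-000000000000");
        System.out.println(canonical(uuid));  // contains dashes, but starts with 'f'
        System.out.println(base64Url(uuid));  // starts with '-'
    }
}
```

The round trip through fromBase64Url shows that the shorter form loses no information. It is purely a different rendering of the same bits.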
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781152#comment-17781152 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:21 PM:
-
You can run this code yourself if you are curious. Here it is. You will need bash 4 or later (my version is {{GNU bash, version 5.2.15(1)-release (aarch64-apple-darwin21.6.0)}}).
{code}
#!/usr/bin/env bash

# Associative arrays require bash 4 or later.
declare -A IDS_PER_INITIAL_LETTER

# Generate 10,000 ids and count them by first character.
for ((i = 0; i < 10000; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}

was (Author: cmccabe):
You can run this code yourself if you are curious. Here it is. You will need bash 4 or better. (my version is `GNU bash, version 5.2.15(1)-release (aarch64-apple-darwin21.6.0)`)
{code}
#!/usr/bin/env bash

declare -A IDS_PER_INITIAL_LETTER

for ((i = 0; i < 10000; i++)); do
    ./kafka-storage.sh random-uuid > /tmp/out 2> /dev/null
    FIRST_LETTER=$(head -c 1 /tmp/out)
    IDS_PER_INITIAL_LETTER[$FIRST_LETTER]=$((IDS_PER_INITIAL_LETTER[$FIRST_LETTER]+1))
done

for k in "${!IDS_PER_INITIAL_LETTER[@]}"; do
    echo "IDs starting with $k : ${IDS_PER_INITIAL_LETTER[$k]}"
done
{code}
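Colin's script runs kafka-storage.sh, and therefore starts a new JVM, for every sample. The same tally can be sketched in-process in Java, with some assumptions and simplifications:

- The generator is a simplified stand-in with no reserved-id check.
- It assumes the URL-safe, unpadded Base64 encoding quoted in the issue description.
- The class name FirstCharTally is hypothetical.

The sketch also counts the unfiltered case. With a random first byte, about 1 in 64 raw encodings begins with '-', i.e. roughly 156 out of 10,000.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-process version of the bash tally; not Kafka's generator.
class FirstCharTally {

    /** Unfiltered: URL-safe, unpadded Base64 of a random UUID's 16 big-endian bytes. */
    static String rawId() {
        UUID uuid = UUID.randomUUID();
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** Filtered: retries until the encoded string does not start with "-". */
    static String filteredId() {
        String id = rawId();
        while (id.startsWith("-")) {
            id = rawId();
        }
        return id;
    }

    /** Counts how many of n ids start with each character, sorted by character. */
    static Map<Character, Integer> tally(int n, boolean filtered) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            String id = filtered ? filteredId() : rawId();
            counts.merge(id.charAt(0), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        tally(10_000, true).forEach((c, count) ->
                System.out.println("IDs starting with " + c + " : " + count));
        System.out.println("Unfiltered ids starting with '-': "
                + tally(10_000, false).getOrDefault('-', 0));
    }
}
```

Running both variants side by side makes the filter's effect visible: '-' is absent from the filtered tally but appears in the unfiltered one.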
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781153#comment-17781153 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-
I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in fact, generate uuids starting with {{-}}. You can see this via analysis of the code or by just running it as I did

was (Author: cmccabe):
I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in fact, generate uuids starting with '-'
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781153#comment-17781153 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-
I am closing this JIRA because {{kafka-storage.sh random-uuid}} can not, in fact, generate uuids starting with '-'

was (Author: cmccabe):
I am closing this JIRA because `kafka-storage.sh` can not, in fact, generate uuids starting with '-'
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781151#comment-17781151 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:20 PM:
-
I ran {{kafka-storage.sh random-uuid}} 10,000 times and got the following distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}
No IDs were generated with a first character of {{-}}, as expected.

was (Author: cmccabe):
I ran {kafka-storage.sh random-uuid} 10,000 times and got the following distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}
No IDs were generated with a first character of {-}, as expected.
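The arithmetic behind "as expected" can be made explicit. Assume each id is built from a version-4 random UUID. Its version and variant bits sit in later bytes, so its first six bits are uniformly random. Under that assumption, an unfiltered generator would put '-' first about once in every 64 ids, roughly 156 times in 10,000 runs. The chance of seeing no leading dash at all would be (63/64)^10000, on the order of 10^-69.

The short sketch below reproduces that arithmetic. It also confirms that the posted counts cover 63 characters and sum to 10,000. The class name is hypothetical; the counts are copied from the table above.

```java
// Hypothetical sanity check of the posted 10,000-run distribution.
class DashAbsenceCheck {

    // First-character counts from the table above, in the listed order:
    // 0-9, A-Z, '_', a-z (63 characters; '-' never appeared).
    static final int[] COUNTS = {
        166, 174, 135, 172, 155, 154, 152, 172, 170, 166,                 // 0-9
        147, 161, 172, 158, 164, 164, 146, 156, 166, 172, 177, 143, 171,  // A-M
        144, 157, 162, 144, 157, 161, 158, 174, 166, 166, 159, 165, 161,  // N-Z
        159,                                                              // _
        145, 169, 166, 171, 162, 154, 132, 152, 136, 166, 159, 156, 154,  // a-m
        155, 154, 158, 141, 165, 154, 162, 146, 161, 164, 154, 164, 154   // n-z
    };

    /** Total number of runs represented by the table. */
    static int totalRuns() {
        int total = 0;
        for (int c : COUNTS) {
            total += c;
        }
        return total;
    }

    /** Expected leading dashes without filtering: each id has a 1/64 chance (assumption: uniform first sextet). */
    static double expectedDashesUnfiltered(int runs) {
        return runs / 64.0;
    }

    /** Probability that an unfiltered generator yields no leading dash in the given number of runs. */
    static double probabilityOfNoDash(int runs) {
        return Math.pow(63.0 / 64.0, runs);
    }

    public static void main(String[] args) {
        int runs = totalRuns();
        System.out.printf("characters=%d runs=%d%n", COUNTS.length, runs);
        System.out.printf("expected leading dashes without the filter: %.2f%n", expectedDashesUnfiltered(runs));
        System.out.printf("P(no leading dash | no filter): %.1e%n", probabilityOfNoDash(runs));
    }
}
```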
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781153#comment-17781153 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
-
I am closing this JIRA because `kafka-storage.sh` can not, in fact, generate uuids starting with '-'

was (Author: cmccabe):
kafka-storage tool can not, in fact, generate uuids starting with '-'
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781151#comment-17781151 ]

Colin McCabe edited comment on KAFKA-15754 at 10/30/23 11:19 PM:
-
I ran {kafka-storage.sh random-uuid} 10,000 times and got the following distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}
No IDs were generated with a first character of {-}, as expected.

was (Author: cmccabe):
I ran `kafka-storage.sh random-uuid` 10,000 times and got the following distribution of first characters:
{code}
IDs starting with 0 : 166
IDs starting with 1 : 174
IDs starting with 2 : 135
IDs starting with 3 : 172
IDs starting with 4 : 155
IDs starting with 5 : 154
IDs starting with 6 : 152
IDs starting with 7 : 172
IDs starting with 8 : 170
IDs starting with 9 : 166
IDs starting with A : 147
IDs starting with B : 161
IDs starting with C : 172
IDs starting with D : 158
IDs starting with E : 164
IDs starting with F : 164
IDs starting with G : 146
IDs starting with H : 156
IDs starting with I : 166
IDs starting with J : 172
IDs starting with K : 177
IDs starting with L : 143
IDs starting with M : 171
IDs starting with N : 144
IDs starting with O : 157
IDs starting with P : 162
IDs starting with Q : 144
IDs starting with R : 157
IDs starting with S : 161
IDs starting with T : 158
IDs starting with U : 174
IDs starting with V : 166
IDs starting with W : 166
IDs starting with X : 159
IDs starting with Y : 165
IDs starting with Z : 161
IDs starting with _ : 159
IDs starting with a : 145
IDs starting with b : 169
IDs starting with c : 166
IDs starting with d : 171
IDs starting with e : 162
IDs starting with f : 154
IDs starting with g : 132
IDs starting with h : 152
IDs starting with i : 136
IDs starting with j : 166
IDs starting with k : 159
IDs starting with l : 156
IDs starting with m : 154
IDs starting with n : 155
IDs starting with o : 154
IDs starting with p : 158
IDs starting with q : 141
IDs starting with r : 165
IDs starting with s : 154
IDs starting with t : 162
IDs starting with u : 146
IDs starting with v : 161
IDs starting with w : 164
IDs starting with x : 154
IDs starting with y : 164
IDs starting with z : 154
{code}
No IDs were generated with a first character of `-`, as expected.
> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Assignee: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID
> starting with a dash "-", which then breaks how the argparse4j library works.
> With such a UUID (e.g. -rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits with
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the
> Uuid.randomUuid method, which keeps generating a new UUID until it doesn't
> start with "-". This is the commit addressing it
> [https://github.com/apache/kafka/commit/5c1dd493d
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780949#comment-17780949 ]

Paolo Patierno edited comment on KAFKA-15754 at 10/30/23 10:54 AM:
---

While I could provide a fix by using the common Base64 RFC4648 (not URL-safe) encoder, I was wondering whether there was any reason for not just using it. [~mumrah], I think you were the one who applied the fix; can you enlighten me, please? Or maybe [~jolshan], who originally added the UUID class?

was (Author: ppatierno):
While I could provide a fix by using the common Base64 RFC4648 (not URL-safe) encoder, I was wondering whether there was any reason for not just using it. [~mumrah], I think you were the one who applied the fix; can you enlighten me, please?

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID
> starting with a dash "-", which then breaks how the argparse4j library works.
> With such a UUID (e.g. -rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits with
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the
> Uuid.randomUuid method, which keeps generating a new UUID until it doesn't
> start with "-".
> This is the commit addressing it
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when toString is called on the Uuid instance, it
> Base64-encodes the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid());
> {code}
> Not sure why, but the code uses a URL-safe encoder which, looking at
> the Base64 class in Java, is an RFC4648_URLSAFE encoder with
> the following alphabet:
>
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E',
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'};
> {code}
> which, as you can see, includes the "-" character.
> So although the current Uuid.randomUuid avoids generating a UUID
> starting with a dash, the Base64 encoding operation can return a final UUID
> string starting with a dash instead.
>
> I was wondering if there is any good reason for using a Base64 URL encoder
> rather than the plain RFC4648 (not URL-safe) one, whose common Base64 alphabet
> does not contain "-".

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
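To see concretely how the URL-safe encoding can start with "-": the first encoded character is just the top 6 bits of the first byte. The sketch below uses a hypothetical class LeadingDashDemo and a hand-built 16-byte array (not a real Uuid) whose first byte is 0xF8, so its top 6 bits are 62. Index 62 maps to "-" in the URL-safe alphabet quoted above and to "+" in the basic RFC 4648 alphabet.

```java
import java.util.Base64;

class LeadingDashDemo {
    // The first Base64 character encodes the top 6 bits of the first input byte.
    static int firstSextet(byte[] bytes) {
        return (bytes[0] & 0xFF) >>> 2;
    }

    // Same call shape as the Uuid.toString() snippet quoted above.
    static String urlSafe(byte[] bytes) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // The basic (non-URL-safe) RFC 4648 encoder, for comparison.
    static String basic(byte[] bytes) {
        return Base64.getEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] uuidBytes = new byte[16]; // hypothetical UUID bytes, all zero...
        uuidBytes[0] = (byte) 0xF8;      // ...except the first: 0b11111000

        System.out.println(firstSextet(uuidBytes)); // 62
        System.out.println(urlSafe(uuidBytes));     // '-' followed by 21 'A' characters
        System.out.println(basic(uuidBytes));       // '+' followed by 21 'A' characters
    }
}
```

The two alphabets differ only in their last two characters ('-' and '_' versus '+' and '/'), so the basic encoder never emits a leading "-", although it can emit a leading "+" or "/" instead.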
[jira] [Comment Edited] (KAFKA-15754) The kafka-storage tool can generate UUID starting with "-"
[ https://issues.apache.org/jira/browse/KAFKA-15754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780949#comment-17780949 ]

Paolo Patierno edited comment on KAFKA-15754 at 10/30/23 10:39 AM:
---

While I could provide a fix by using the common Base64 RFC4648 (not URL-safe) encoder, I was wondering whether there was any reason for not just using it. [~mumrah], I think you were the one who applied the fix; can you enlighten me, please?

was (Author: ppatierno):
While I could provide a fix by using the common Base64 RFC4648 (not URL-safe) encoder, I was wondering whether there was any reason for not just using it.

> The kafka-storage tool can generate UUID starting with "-"
> --
>
> Key: KAFKA-15754
> URL: https://issues.apache.org/jira/browse/KAFKA-15754
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.6.0
> Reporter: Paolo Patierno
> Priority: Major
>
> Using the kafka-storage.sh tool, it seems that it can still generate a UUID
> starting with a dash "-", which then breaks how the argparse4j library works.
> With such a UUID (e.g. -rmdB0m4T4--Y4thlNXk4Q in my case) the tool exits with
> the following error:
> kafka-storage: error: argument --cluster-id/-t: expected one argument
> That said, it seems that this problem was already addressed in the
> Uuid.randomUuid method, which keeps generating a new UUID until it doesn't
> start with "-".
> This is the commit addressing it
> [https://github.com/apache/kafka/commit/5c1dd493d6f608b566fdad5ab3a896cb13622bce]
> The problem is that when toString is called on the Uuid instance, it
> Base64-encodes the generated UUID this way:
> {code:java}
> Base64.getUrlEncoder().withoutPadding().encodeToString(getBytesFromUuid());
> {code}
> Not sure why, but the code uses a URL-safe encoder which, looking at
> the Base64 class in Java, is an RFC4648_URLSAFE encoder with
> the following alphabet:
>
> {code:java}
> private static final char[] toBase64URL = new char[]{'A', 'B', 'C', 'D', 'E',
> 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
> 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
> 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
> 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-', '_'};
> {code}
> which, as you can see, includes the "-" character.
> So although the current Uuid.randomUuid avoids generating a UUID
> starting with a dash, the Base64 encoding operation can return a final UUID
> string starting with a dash instead.
>
> I was wondering if there is any good reason for using a Base64 URL encoder
> rather than the plain RFC4648 (not URL-safe) one, whose common Base64 alphabet
> does not contain "-".