BsoBird commented on code in PR #5891:
URL: https://github.com/apache/hive/pull/5891#discussion_r2166715290


##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFUUIDV7.java:
##########
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.udf;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+import org.apache.hadoop.io.Text;
+
+import java.util.UUID;
+import java.util.concurrent.ThreadLocalRandom;
+
+/**
+ * UDFUUIDV7.
+ *
+ */
+@Description(name = "uuid_v7",
+        value = "_FUNC_() - Returns a universally unique identifier (UUID_V7) string.",
+        extended = """
+                The value is returned as a canonical UUID 36-character string.
+                Example:
+                  > SELECT _FUNC_();
+                  '0baf1f52-53df-487f-8292-99a03716b688'
+                  > SELECT _FUNC_();
+                  '36718a53-84f5-45d6-8796-4f79983ad49d'""")
+@UDFType(deterministic = false)
+public class UDFUUIDV7 extends UDF {
+  private final Text result = new Text();
+  /**
+   * Returns a universally unique identifier (UUID_V7) string (36 characters).
+   *
+   * @return Text
+   */
+  public Text evaluate() {
+    result.set(randomUUIDV7().toString());
+    return result;
+  }
+
+
+  private UUID randomUUIDV7() {

Review Comment:
   @deniskuzZ With respect, a generation rate of about 5,000 UUIDs per second is too slow. As for the security concern, the worst that can happen to UUIDs built from a pseudo-random generator is that they can be predicted, and it is not clear what harm that would cause. Hive's own UDFs already make extensive use of pseudo-random functions, rand() for example.
   
   In the vast majority of database and data warehouse use cases, uuid_v7 serves only as a unique key for records. In that scenario, a predictable UUID seems to have little practical impact.
   
   Suppose we generate 20 billion uuid_v7 ids across 300 parallel tasks. At 5,000 ids per second per task, that takes about 3.7 hours, whereas the original uuid function (UUIDv4) takes only about 0.009 hours. The gap is simply too large. It is hard to see how a uuid_v7 this slow could be used effectively in a production environment.
   
   If the existing secure version of uuid_v7 could reach a generation rate of around 100,000 ids per second, I would definitely choose the secure version, because at that rate the cost in efficiency is acceptable.
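
   As a rough check of the figures above, here is a small sketch of the arithmetic. It assumes the quoted rates are per parallel task, which is the reading that reproduces the 3.7 hours. The class and method names are made up for illustration and are not part of the PR.

```java
/** Back-of-the-envelope check of the generation times discussed in this comment. */
final class UuidThroughputEstimate {

  /**
   * Hours needed to generate totalIds when parallelTasks tasks each
   * produce idsPerSecondPerTask ids per second (assumed per-task rate).
   */
  static double hoursToGenerate(long totalIds, int parallelTasks, double idsPerSecondPerTask) {
    double seconds = totalIds / (parallelTasks * idsPerSecondPerTask);
    return seconds / 3600.0;
  }

  public static void main(String[] args) {
    long totalIds = 20_000_000_000L; // 20 billion ids, as in the comment
    int tasks = 300;                 // 300 parallel tasks, as in the comment

    // Rate reported for the secure uuid_v7: about 5,000 ids per second.
    System.out.printf("at 5,000/s:   %.2f h%n", hoursToGenerate(totalIds, tasks, 5_000));
    // Rate named in the comment as acceptable: about 100,000 ids per second.
    System.out.printf("at 100,000/s: %.2f h%n", hoursToGenerate(totalIds, tasks, 100_000));
  }
}
```

   Under that assumption the 5,000 per second case comes to roughly 3.7 hours, and the 100,000 per second case to roughly 0.19 hours, or about 11 minutes.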
   
   If we absolutely need a secure UUID function, how about adding parameters to uuid_v7, similar to PostgreSQL, so that callers can control how the UUID is generated?
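
   To make the idea concrete, here is a minimal sketch of what a parameterized generator could look like. The boolean `secure` flag and the class name `UuidV7Sketch` are hypothetical, this is not the implementation in the PR, and it is not meant to mirror PostgreSQL's actual function signature. The bit layout follows the UUIDv7 format: a 48-bit Unix timestamp in milliseconds, a 4-bit version, 12 random bits, a 2-bit variant, and 62 random bits.

```java
import java.security.SecureRandom;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: a UUIDv7 generator with a caller-selected randomness source. */
final class UuidV7Sketch {
  private static final SecureRandom SECURE = new SecureRandom();

  /** Hypothetical entry point: secure = true uses SecureRandom, false uses ThreadLocalRandom. */
  static UUID uuidV7(boolean secure) {
    return uuidV7(System.currentTimeMillis(), secure);
  }

  /** Same as above with an explicit timestamp, which makes the layout easy to test. */
  static UUID uuidV7(long unixMillis, boolean secure) {
    long randA;
    long randB;
    if (secure) {
      randA = SECURE.nextLong();
      randB = SECURE.nextLong();
    } else {
      ThreadLocalRandom random = ThreadLocalRandom.current();
      randA = random.nextLong();
      randB = random.nextLong();
    }
    long msb = (unixMillis & 0xFFFF_FFFF_FFFFL) << 16 // 48-bit timestamp in the top bits
        | 0x7000L                                      // version nibble = 7
        | (randA & 0x0FFFL);                           // 12 random bits (rand_a)
    long lsb = (randB & 0x3FFF_FFFF_FFFF_FFFFL)        // 62 random bits (rand_b)
        | 0x8000_0000_0000_0000L;                      // variant bits = 10
    return new UUID(msb, lsb);
  }
}
```

   With something like this, a query that only needs unique record keys could ask for the fast path, while a caller that cares about unpredictability could ask for the secure one.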



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

