thomasmueller commented on code in PR #1791:
URL: https://github.com/apache/jackrabbit-oak/pull/1791#discussion_r1799587771


##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/index/ElasticIndexHelper.java:
##########
@@ -94,8 +94,9 @@ private static ObjectBuilder<TypeMapping> 
loadMappings(@NotNull TypeMapping.Buil
     }
 
     private static void mapInternalProperties(@NotNull TypeMapping.Builder 
builder) {
-        builder.properties(FieldNames.PATH,
-                        b1 -> b1.keyword(builder3 -> builder3))
+        builder.properties(FieldNames.PATH, b1 -> b1.keyword(builder3 -> 
builder3))

Review Comment:
   ```suggestion
           builder.properties(FieldNames.PATH, 
                           b1 -> b1.keyword(builder3 -> builder3))
                   .properties(ElasticIndexDefinition.PATH_RANDOM_VALUE,
                           b1 -> b1.short_(b2 -> 
b2.docValues(true).index(false)))
   ```



##########
oak-search-elastic/src/main/java/org/apache/jackrabbit/oak/plugins/index/elastic/util/ElasticIndexUtils.java:
##########
@@ -52,6 +53,31 @@ public static String idFromPath(@NotNull String path) {
         return path;
     }
 
+    /**
+     * Generates a consistent short value based on a given string using the 
SHA-256 algorithm.
+     * The generated value  is a 16-bit integer ranging from -32,768 to 32,767.
+     *
+     * @param input the input string to be hashed
+     * @return a short value from the input string
+     * @throws IllegalStateException if the SHA-256 algorithm is not available
+     */
+    public static short getRandomShortFromString(String input) {
+        try {
+            // Get a SHA-256 MessageDigest instance
+            MessageDigest md = MessageDigest.getInstance("SHA-256");

Review Comment:
   So you use a secure hash function, SHA-256, without seed, but then truncate 
to 2 bytes. I assume the reason to use SHA-256 is to ensure it is safe (from 
attackers that could cause a hash collision)? But then truncating to 16 bits 
and not using a seed will remove the safety... It is easy to generate text such 
that 2 bytes of the SHA-256 collide.
   
   For the approximate node counter, I used SipHash, and a seed. SipHash with 
seed is faster than SHA-256, and does protect against some attacker (who could, 
in theory, cause collisions).
   
   I would recommend to use a seed as well, and SipHash... actually CRC32 would 
be enough I guess.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to