[GitHub] [jackrabbit-oak] steffenvan commented on a diff in pull request #1071: OAK-10384: Fix stripping of large indexed ordered properties

via GitHub Thu, 07 Sep 2023 02:44:23 -0700


steffenvan commented on code in PR #1071:
URL: https://github.com/apache/jackrabbit-oak/pull/1071#discussion_r1318360063



##########
oak-lucene/src/main/java/org/apache/jackrabbit/oak/plugins/index/lucene/LuceneDocumentMaker.java:
##########
@@ -315,6 +313,38 @@ protected boolean indexTypeOrderedFields(Document doc, String pname, int tag, Pr
         }
         return fieldAdded;
     }
+    
+    protected static BytesRef checkTruncateLength(String prop, String value, String path, int maxLength) {
+        log.trace("Property {} at path:[{}] has value {}", prop, path, value);
+
+        BytesRef ref = new BytesRef(value);
+        if (ref.length <= maxLength) {
+            return ref;
+        }
+        log.info("Truncating property {} at path:[{}] as length after encoding {} is > {} ",
+            prop, path, ref.length, maxLength);
+        int end = maxLength - 1;
+        // skip over tails of utf-8 multi-byte sequences (up to 3 bytes)
+        while ((ref.bytes[end] & 0b11000000) == 0b10000000) {

Review Comment:
   Could we add examples showing how the tails of multi-byte UTF-8 sequences
   are skipped and removed, and also explain why these steps are necessary?
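
   For illustration, here is a minimal, self-contained sketch of the kind of
   example this could be. It is not the PR's implementation: the class
   `Utf8TruncateSketch` and the method `truncateUtf8` are hypothetical, it
   works on a plain `byte[]` instead of Lucene's `BytesRef`, and it assumes
   `maxLength >= 0`. The reason for the skipping: in UTF-8 every continuation
   ("tail") byte has the bit pattern `10xxxxxx`, so a cut that lands on such a
   byte splits a character and leaves an invalid sequence behind. Moving the
   cut back to the nearest lead byte drops the partial character as a whole.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8TruncateSketch {

    /**
     * Returns at most maxLength bytes of the UTF-8 encoding of value,
     * never cutting through the middle of a multi-byte sequence.
     * Hypothetical helper for illustration; assumes maxLength >= 0.
     */
    static byte[] truncateUtf8(String value, int maxLength) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxLength) {
            return bytes; // already short enough, nothing to cut
        }
        // 'end' is the exclusive cut position. bytes[end] is the first byte
        // that would be dropped. If it is a continuation byte (10xxxxxx),
        // the cut falls inside a character, so move it back until it sits
        // on a lead byte or an ASCII byte.
        int end = maxLength;
        while (end > 0 && (bytes[end] & 0b11000000) == 0b10000000) {
            end--;
        }
        return Arrays.copyOf(bytes, end);
    }

    public static void main(String[] args) {
        // "a" is 1 byte (61), "é" is 2 bytes (C3 A9), "€" is 3 bytes (E2 82 AC).
        String value = "a\u00e9\u20ac";
        for (int max = 1; max <= 6; max++) {
            byte[] cut = truncateUtf8(value, max);
            System.out.println("maxLength=" + max + " keeps " + cut.length
                + " byte(s): " + new String(cut, StandardCharsets.UTF_8));
        }
    }
}
```

   With the 6-byte input above, a limit of 4 or 5 falls inside the 3-byte
   sequence for "€", so the cut moves back to byte 3 and only "aé" is kept.
   A limit of 2 falls inside "é", so only "a" is kept.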



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
