NehanPathan commented on code in PR #1154:
URL: https://github.com/apache/lucenenet/pull/1154#discussion_r2058857674
##########
src/Lucene.Net.Analysis.SmartCn/Hhmm/BigramDictionary.cs:
##########
@@ -286,37 +304,37 @@ public virtual void LoadFromFile(string dctFilePath)
                 int j = 0;
                 while (j < cnt)
                 {
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[0] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// frequency
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[1] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// length
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    // buffer[2] = ByteBuffer.wrap(intBuffer).order(
-                    // ByteOrder.LITTLE_ENDIAN).getInt();// handle
+                    // LUCENENET: Use BinaryReader to decode little endian instead of ByteBuffer, since this is the default in .NET
+                    buffer[0] = reader.ReadInt32(); // frequency
+                    buffer[1] = reader.ReadInt32(); // length
+                    buffer[2] = reader.ReadInt32(); // Skip handle value (unused)
                     length = buffer[1];
-                    if (length > 0)
+                    if (length > 0 && length <= MAX_VALID_LENGTH && dctFile.Position + length <= dctFile.Length)
Review Comment:
---
"Hi, regarding the `maxLength` check:
- The `maxLength` was originally used to restrict the length of words read
from the dictionary file, likely to avoid reading overly large or corrupted
entries. However, this constraint wasn’t necessary in the current
implementation, as there was no upstream requirement for it and our test cases
also pass smoothly.
- As such, we’ve removed the `maxLength` check for now. If it becomes
necessary in the future (for example, to limit word sizes or handle specific
use cases), we can easily reintroduce it.
---
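
For readers following along outside the diff context, below is a minimal standalone sketch of the read pattern the new code converges on. It assumes a little-endian dictionary file laid out as int32 frequency/length/handle triples followed by `length` bytes of word data, and it mirrors the diff's `MAX_VALID_LENGTH` bound; the class and method names are illustrative, not the actual BigramDictionary implementation.

    using System;
    using System.IO;

    // Illustrative sketch only -- not the actual BigramDictionary code.
    internal static class DictionaryReadSketch
    {
        // Hypothetical cap on a single word entry, mirroring the diff's guard.
        private const int MAX_VALID_LENGTH = 1024;

        public static void ReadEntries(string path)
        {
            using FileStream dctFile = new FileStream(path, FileMode.Open, FileAccess.Read);
            // BinaryReader.ReadInt32 is little-endian by definition in .NET,
            // so no ByteBuffer/ByteOrder conversion is needed.
            using BinaryReader reader = new BinaryReader(dctFile);

            while (dctFile.Position + 12 <= dctFile.Length) // 3 * sizeof(int)
            {
                int frequency = reader.ReadInt32(); // frequency
                int length = reader.ReadInt32();    // length of the word bytes
                reader.ReadInt32();                 // handle (unused, read to advance)

                // Guard against corrupted entries: positive, bounded, and fully
                // contained within the remaining bytes of the file.
                if (length > 0 && length <= MAX_VALID_LENGTH &&
                    dctFile.Position + length <= dctFile.Length)
                {
                    byte[] wordBytes = reader.ReadBytes(length);
                    // The production code decodes these bytes (the SmartCn
                    // dictionaries are GB2312-encoded); here we only report them.
                    Console.WriteLine($"freq={frequency}, len={wordBytes.Length}");
                }
                else
                {
                    break; // stop rather than read past a bad length value
                }
            }
        }
    }

The `dctFile.Position + length <= dctFile.Length` guard is the key behavioral change: on a truncated or corrupted file the loop now bails out cleanly instead of attempting to read past the end of the stream.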
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]