NehanPathan commented on code in PR #1154:
URL: https://github.com/apache/lucenenet/pull/1154#discussion_r2058857674
##########
src/Lucene.Net.Analysis.SmartCn/Hhmm/BigramDictionary.cs:
##########
@@ -286,37 +304,37 @@ public virtual void LoadFromFile(string dctFilePath)
                 int j = 0;
                 while (j < cnt)
                 {
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[0] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// frequency
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    buffer[1] = ByteBuffer.Wrap(intBuffer).SetOrder(ByteOrder.LittleEndian)
-                        .GetInt32();// length
-                    dctFile.Read(intBuffer, 0, intBuffer.Length);
-                    // buffer[2] = ByteBuffer.wrap(intBuffer).order(
-                    // ByteOrder.LITTLE_ENDIAN).getInt();// handle
+                    // LUCENENET: Use BinaryReader to decode little endian instead of ByteBuffer, since this is the default in .NET
+                    buffer[0] = reader.ReadInt32(); // frequency
+                    buffer[1] = reader.ReadInt32(); // length
+                    buffer[2] = reader.ReadInt32(); // Skip handle value (unused)
                     length = buffer[1];
-                    if (length > 0)
+                    if (length > 0 && length <= MAX_VALID_LENGTH && dctFile.Position + length <= dctFile.Length)
Review Comment:
---
"Hi, regarding the `maxLength` check:
- The `maxLength` was originally used to restrict the length of words read
from the dictionary file, likely to avoid reading overly large or corrupted
entries. However, this constraint wasn’t necessary in the current
implementation, as there was no upstream requirement for it and our test cases
also pass smoothly.
- As such, we’ve removed the `maxLength` check for now. If it becomes
necessary in the future (for example, to limit word sizes or handle specific
use cases), we can easily reintroduce it.
---
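
For readers following along outside the diff context, below is a minimal standalone sketch of the read pattern the new code converges on. It assumes a little-endian dictionary file laid out as int32 frequency/length/handle triples followed by `length` bytes of word data, and it mirrors the diff's `MAX_VALID_LENGTH` bound; the class and method names are illustrative, not the actual BigramDictionary implementation.

    using System;
    using System.IO;

    // Illustrative sketch only -- not the actual BigramDictionary code.
    internal static class DictionaryReadSketch
    {
        // Hypothetical cap on a single word entry, mirroring the diff's guard.
        private const int MAX_VALID_LENGTH = 1024;

        public static void ReadEntries(string path)
        {
            using FileStream dctFile = new FileStream(path, FileMode.Open, FileAccess.Read);
            // BinaryReader.ReadInt32 is little-endian by definition in .NET,
            // so no ByteBuffer/ByteOrder conversion is needed.
            using BinaryReader reader = new BinaryReader(dctFile);

            while (dctFile.Position + 12 <= dctFile.Length) // 3 * sizeof(int)
            {
                int frequency = reader.ReadInt32(); // frequency
                int length = reader.ReadInt32();    // length of the word bytes
                reader.ReadInt32();                 // handle (unused, read to advance)

                // Guard against corrupted entries: positive, bounded, and fully
                // contained within the remaining bytes of the file.
                if (length > 0 && length <= MAX_VALID_LENGTH &&
                    dctFile.Position + length <= dctFile.Length)
                {
                    byte[] wordBytes = reader.ReadBytes(length);
                    // The production code decodes these bytes (the SmartCn
                    // dictionaries are GB2312-encoded); here we only report them.
                    Console.WriteLine($"freq={frequency}, len={wordBytes.Length}");
                }
                else
                {
                    break; // stop rather than read past a bad length value
                }
            }
        }
    }

The `dctFile.Position + length <= dctFile.Length` guard is the key behavioral change: on a truncated or corrupted file the loop now bails out cleanly instead of attempting to read past the end of the stream.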
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]