NehanPathan opened a new pull request, #1154:
URL: https://github.com/apache/lucenenet/pull/1154
---
### **Objective:**
This pull request (PR) optimizes the SmartCn dictionary loading process and
introduces unit tests to ensure correctness and maintainability.
---
### **Key Changes:**
**1. Dictionary Optimization:**
- Replaced `ByteBuffer` with `BinaryReader.ReadInt32()` for faster data
reading.
- Used `ReadOnlySpan<char>` to avoid intermediate allocations and reduce
memory usage.
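The two techniques above can be illustrated with a minimal sketch. It assumes the dictionary data stores 32-bit integers in little-endian order, which is what `BinaryReader.ReadInt32()` decodes. The helper names `ReadInt32Table` and `SliceEquals` are hypothetical and are not taken from this PR.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads `count` little-endian 32-bit integers from a stream.
static int[] ReadInt32Table(Stream stream, int count)
{
    var values = new int[count];
    // leaveOpen: true so the caller keeps ownership of the stream.
    using var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
    for (int i = 0; i < count; i++)
    {
        // ReadInt32 consumes 4 bytes and decodes them as little-endian.
        values[i] = reader.ReadInt32();
    }
    return values;
}

// Hypothetical helper: compares part of a char buffer with a word
// without allocating a substring or a new array.
static bool SliceEquals(char[] buffer, int start, int length, string word)
{
    ReadOnlySpan<char> slice = buffer.AsSpan(start, length);
    return slice.SequenceEqual(word.AsSpan());
}

// Usage: bytes 2A 00 00 00 decode to 42, and 01 01 00 00 decode to 257.
var bytes = new byte[] { 0x2A, 0x00, 0x00, 0x00, 0x01, 0x01, 0x00, 0x00 };
int[] table = ReadInt32Table(new MemoryStream(bytes), 2);
Console.WriteLine(string.Join(", ", table)); // prints "42, 257"

char[] text = "hello world".ToCharArray();
bool matches = SliceEquals(text, 6, 5, "world");
Console.WriteLine(matches); // prints "True"
```

The span comparison works on a view of the existing buffer, so no temporary string is created per lookup.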
**2. Comprehensive Unit Tests Added:**
- **Test File:** `DictionaryTests.cs`
  - Contains tests for loading dictionaries and verifying dictionary
    operations.
- **BigramDictionary Tests:**
  - `GetInstance()`: ensures correct singleton instantiation.
  - `LoadFromFile()`: verifies successful loading of the dictionary
    from `bigramDict.dct`.
  - `GetFrequency()`: tests frequency retrieval for valid and
    non-existent entries.
- **WordDictionary Tests:**
  - `GetInstance()`: confirms proper instantiation.
  - Further tests can be added if `LoadMainDataFromFile()` becomes
    accessible (it is currently a private method).
**3. Resource Files Added:**
- **Location:** `Lucene.Net.Tests.Analysis.SmartCn.Resources`
  - `bigramDict.dct`
  - `coreDict.dct`
**4. Embedded Resource Loading:**
- Embedded both `.dct` files as resources in the test assembly to eliminate
external dependencies.
- Created a utility in `LuceneTestCase` to extract these resources as
temporary files during tests.
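As an illustration of the extraction step described above, here is a minimal sketch. It is not the utility added to `LuceneTestCase`: the helper name `ExtractToTempFile` is hypothetical, and an in-memory stream stands in for the embedded resource so the example runs on its own.

```csharp
using System;
using System.IO;

// Hypothetical helper (not the utility from this PR): copies a stream into a
// fresh temporary directory, keeping the given file name, and returns the path.
static string ExtractToTempFile(Stream source, string fileName)
{
    if (source is null) throw new ArgumentNullException(nameof(source));

    // A unique directory avoids collisions between parallel test runs.
    string directory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    Directory.CreateDirectory(directory);

    string path = Path.Combine(directory, fileName);
    using (FileStream target = File.Create(path))
    {
        source.CopyTo(target);
    }
    return path;
}

// Usage: in a test assembly the stream would typically come from
// Assembly.GetManifestResourceStream(resourceName), which returns null
// when no resource with that name exists.
byte[] payload = { 1, 2, 3, 4 };
string extracted = ExtractToTempFile(new MemoryStream(payload), "bigramDict.dct");
Console.WriteLine(Path.GetFileName(extracted)); // prints "bigramDict.dct"
```

A real test would also delete the temporary directory during teardown.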
---
### **Testing Details:**
**Test Scenarios:**
- Validated successful loading of both `bigramDict.dct` and `coreDict.dct`
from embedded resources.
- Checked frequency retrieval for valid entries (`hello`, `world`) and
ensured non-existent entries return `0`.
- Verified that the `GetInstance()` method returns a non-null singleton
instance.
**Assertions Included:**
- Frequency correctness for known entries.
- Proper dictionary instantiation.
- No regression in dictionary functionality.
---
### **Why These Changes?**
**Performance Improvements:**
- Faster dictionary loading with reduced memory overhead.
**Increased Test Coverage:**
- Ensures that dictionary operations work correctly and efficiently.
**Simplified Testing Workflow:**
- Embedded resource handling eliminates file path dependencies.
---
### **Future Considerations:**
**Testing `WordDictionary`:**
- Currently limited to `GetInstance()` due to private access of
`LoadMainDataFromFile()`.
- Additional tests can be added when the method's visibility is updated.
**Performance Enhancements:**
- Future work may include further performance optimization of dictionary
lookups and hash collision handling.
---
### **Issue Reference:**
Fixes #1153
---
### **Checklist:**
- [x] Read and followed the [Contributor
Guide](https://github.com/apache/lucenenet/blob/main/CONTRIBUTING.md)
and [Code of
Conduct](https://www.apache.org/foundation/policies/conduct.html).
- [x] Included relevant unit or integration tests.
- [x] Added inline documentation where applicable.
- [x] Created an open issue and linked it to this PR.
---
## **How to Run Tests:**
1. Build the solution using `dotnet build`.
2. Run tests using `dotnet test` to verify that all dictionary operations
work correctly.
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]