Re: [I] Eliminate usage of ByteBuffer in SmartCn [lucenenet]

via GitHub Sun, 30 Mar 2025 06:08:53 -0700


NightOwl888 commented on issue #1153:
URL: https://github.com/apache/lucenenet/issues/1153#issuecomment-2764556685


   @NehanPathan 
   
   Thank you for volunteering to take on this project. I have assigned it to 
you.
   
   > Should the test files be added directly to the 
`Lucene.Net.Analysis.SmartCn` assembly to avoid `InternalsVisibleTo` 
complications?
   
   No. These are *test* files, so they should be added to the 
`Lucene.Net.Tests.Analysis.SmartCn` project.
   
   > If added to the `Lucene.Net.Tests.Analysis.SmartCn` project, we have two 
options:
   > - Make internal classes public (which is not ideal).
   > - Use `[InternalsVisibleTo]` with a public key, but this approach requires 
generating a strong-name key (.snk) to sign the assembly. Since the main 
assembly is strongly named, adding a friend assembly also requires a public 
key, and generating this key is usually a maintainer’s responsibility, not a 
contributor's task.
   
   The build is already [set up to auto-generate `InternalsVisibleTo` 
attributes with the public 
key](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/Directory.Build.targets#L170-L177).
 To use it, add an `ItemGroup` containing an `InternalsVisibleTo` element that 
names the project you wish to give visibility to.
   
   ```xml
   <ItemGroup>
     <InternalsVisibleTo Include="Lucene.Net.Tests.Analysis.SmartCn" />
   </ItemGroup>
   ```
   
   That being said, I am not sure we actually need to access any internal 
APIs, and `InternalsVisibleTo` doesn't affect the visibility of embedded 
resources (they are always public).
   
   I asked ChatGPT how to do this task (it is recommended to use LLMs for 
research because documentation for Lucene is scarce) and here is an approach it 
came up with for creating files with a small amount of test data.
   
   https://chatgpt.com/share/67e93675-c90c-8005-98e7-2f62d1491f6b
   
   > Note that the Kuromoji module has a [tool to generate the dictionary 
files](https://lucenenet.apache.org/docs/4.8.0-beta00017/cli/analysis/kuromoji-build-dictionary.html).
 I am not sure why there isn't a similar tool for the SmartCn module, but 
ChatGPT shows how the file can be generated using a .NET console app.
   >
   > Also note that the example only shows the `coredict.dct` file. You will 
need to ask ChatGPT what the format for the `bigramdict.dct` file is supposed 
to look like and ask it to generate a test file that is compatible with the 
first one.
   
   The test would be slightly different from what is shown in that example.
   
   1. We embed loose files into the test DLL as embedded resources, so nothing 
other than the DLL is required for testing.
   2. The test class should subclass `LuceneTestCase`.
   3. Temp directories or individual temp files can be created using the 
`LuceneTestCase.CreateTempDir()` or `LuceneTestCase.CreateTempFile()` method 
overloads, depending on whether there is one temp file or multiple to group 
together for the test. The test framework takes care of cleaning up the files 
at the end of the test or optionally leaving them on disk for debugging.
   4. Once a temp file or directory is created, the contents of the embedded 
file can be copied to it. The embedded stream can be retrieved relative to a 
class using the `FindAndGetManifestResourceStream()` extension method. For that 
to work, the embedded file should be in the same directory as the test class. 
Then open the output temp file for writing and copy the embedded resource 
stream to the output stream using `Stream.CopyTo()`.
   5. The same temp file or folder path plus filename can then be used for the 
real disk load test.
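   
   As a rough sketch of step 1, the test data files could be marked as 
embedded resources in the `Lucene.Net.Tests.Analysis.SmartCn` project file 
along these lines. The `Hhmm` folder is only an assumption for illustration; 
per step 4, the files need to sit in the same directory as whichever test class 
loads them. It is worth checking the existing project file first, since it may 
already embed test data with a wildcard pattern.
   
   ```xml
   <ItemGroup>
     <!-- Hypothetical paths: keep each file next to the test class that reads it -->
     <EmbeddedResource Include="Hhmm\coredict.dct" />
     <EmbeddedResource Include="Hhmm\bigramdict.dct" />
   </ItemGroup>
   ```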
   
   Let me know if you need any additional assistance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
