NightOwl888 commented on issue #1153: URL: https://github.com/apache/lucenenet/issues/1153#issuecomment-2764556685
@NehanPathan Thank you for volunteering to take on this project. I have assigned it to you.

> Should the test files be added directly to the `Lucene.Net.Analysis.SmartCn` assembly to avoid `InternalsVisibleTo` complications?

No. These are *test* files, so they should be added to the `Lucene.Net.Tests.Analysis.SmartCn` project.

> If added to the `Lucene.Net.Tests.Analysis.SmartCn` project, we have two options:
> - Make internal classes public (which is not ideal).
> - Use `[InternalsVisibleTo]` with a public key, but this approach requires generating a strong-name key (.snk) to sign the assembly. Since the main assembly is strongly named, adding a friend assembly also requires a public key, and generating this key is usually a maintainer’s responsibility, not a contributor's task.

We already have the build [set up to auto-generate `InternalsVisibleTo` attributes with the public key](https://github.com/apache/lucenenet/blob/a0578d6ea2c17c06be925adb15acd3c64d5fc824/Directory.Build.targets#L170-L177). To use it, add an `ItemGroup` to the project whose internals should be exposed (here, `Lucene.Net.Analysis.SmartCn`), containing an `InternalsVisibleTo` element that names the project you wish to give visibility to:

```xml
<ItemGroup>
  <InternalsVisibleTo Include="Lucene.Net.Tests.Analysis.SmartCn" />
</ItemGroup>
```

That being said, I am not sure we actually need to access any internal APIs, and `InternalsVisibleTo` doesn't affect the visibility of embedded resources (they are always public).

I asked ChatGPT how to do this task (using LLMs for research is recommended because documentation for Lucene is scarce), and here is an approach it came up with for creating files with a small amount of test data:

https://chatgpt.com/share/67e93675-c90c-8005-98e7-2f62d1491f6b

> Note that the Kuromoji module has a [tool to generate the dictionary files](https://lucenenet.apache.org/docs/4.8.0-beta00017/cli/analysis/kuromoji-build-dictionary.html).
> I am not sure why there isn't a similar tool for the SmartCn module, but ChatGPT shows how the file can be generated using a .NET console app.
>
> Also note that the example only shows the `coredict.dct` file. You will need to ask ChatGPT what the format of the `bigramdict.dct` file is supposed to look like and ask it to generate a test file that is compatible with the first one.

The test would be slightly different from what is shown:

1. We embed loose files into the test DLL as embedded resources, so nothing other than the DLL is required for testing.
2. The test class should subclass `LuceneTestCase`.
3. A temp directory or individual temp files can be created using the `LuceneTestCase.CreateTempDir()` or `LuceneTestCase.CreateTempFile()` method overloads, depending on whether there is a single temp file or multiple files to group together for the test. The test framework takes care of cleaning up the files at the end of the test, or optionally leaving them on disk for debugging.
4. Once a temp file or directory is created, the contents of the embedded file can be copied to it. The embedded stream can be retrieved relative to a class using the `FindAndGetManifestResourceStream()` extension method; for that to work, the embedded file should be in the same directory as the test class. Then open the output temp file for writing and use `Stream.CopyTo()` to copy the embedded resource stream to the output stream.
5. The same temp file path (or temp directory path plus file name) can then be used for the real disk load test.

Let me know if you need any additional assistance.
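For illustration, here is a minimal C# sketch of how the numbered test steps could fit together. It rests on several assumptions: the class name `TestSmartCnDictionaryFiles` and its namespace are hypothetical; `coredict.dct` and `bigramdict.dct` are assumed to sit in the same folder as the test class and to be included in the test project as embedded resources (for example, through an MSBuild `EmbeddedResource` item); `FindAndGetManifestResourceStream()` is assumed to be the J2N extension method on `System.Type`; and `CreateTempDir(string prefix)` is assumed to be one of the `LuceneTestCase.CreateTempDir()` overloads. The call that actually loads the dictionaries from disk is left as a comment, because this sketch does not assume which SmartCn loading API the test should exercise.

```csharp
using System.IO;
using J2N;             // Assumption: home of the FindAndGetManifestResourceStream() extension on System.Type
using Lucene.Net.Util; // LuceneTestCase
using NUnit.Framework;

// Hypothetical namespace: keep it matching the folder that holds the embedded .dct files.
namespace Lucene.Net.Analysis.Cn.Smart.Hhmm
{
    // Hypothetical test class; coredict.dct and bigramdict.dct are assumed to be
    // embedded resources located in the same folder as this source file.
    public class TestSmartCnDictionaryFiles : LuceneTestCase
    {
        // Copies an embedded resource (looked up relative to this class) into targetDir
        // and returns the full path of the copy on disk.
        private string CopyEmbeddedResourceTo(string resourceName, DirectoryInfo targetDir)
        {
            string targetPath = Path.Combine(targetDir.FullName, resourceName);
            using (Stream input = GetType().FindAndGetManifestResourceStream(resourceName))
            {
                // Fail with a clear message if the file was not embedded where expected.
                Assert.IsNotNull(input, "Embedded resource not found: " + resourceName);
                using (Stream output = new FileStream(targetPath, FileMode.Create, FileAccess.Write))
                {
                    input.CopyTo(output);
                }
            }
            return targetPath;
        }

        [Test]
        public void TestLoadDictionariesFromDisk()
        {
            // One temp directory groups both files; the test framework removes it after the test.
            DirectoryInfo tempDir = CreateTempDir("smartcn");

            string coreDictPath = CopyEmbeddedResourceTo("coredict.dct", tempDir);
            string bigramDictPath = CopyEmbeddedResourceTo("bigramdict.dct", tempDir);

            Assert.IsTrue(File.Exists(coreDictPath));
            Assert.IsTrue(File.Exists(bigramDictPath));

            // Next step (not shown): point the SmartCn dictionary loading code at
            // tempDir.FullName or at the individual file paths, then assert on the loaded data.
        }
    }
}
```

Putting both files in a single temp directory, rather than calling `CreateTempFile()` twice, keeps them together in case the loader expects one directory containing both dictionaries.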
