pchencal commented on code in PR #13054:
URL: https://github.com/apache/lucene/pull/13054#discussion_r2067443555
##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java:
##########
@@ -218,12 +231,26 @@ public void add(CharsRef input, CharsRef output, boolean includeOrig) {
add(input, countWords(input), output, countWords(output), includeOrig);
}
- /** Builds an {@link SynonymMap} and returns it. */
+ /** Builds a {@link SynonymMap} and returns it. */
public SynonymMap build() throws IOException {
+ return build(null);
+ }
+
+ /**
+ * Builds a {@link SynonymMap} and returns it. If directory is non-null, it will write the
+ * compiled SynonymMap to disk and return an off-heap version.
+ */
+ public SynonymMap build(SynonymMapDirectory directory) throws IOException {
Review Comment:
While implementing the new build() method, which accepts a directory parameter for off-heap SynonymMap storage, I ran into a `FileAlreadyExistsException` even though I create a unique directory per run using `System.currentTimeMillis()`.
The current implementation looks something like this:
```java
String synonymPath = ".../lucene/src/build/temp-FST";
Path dirPath = Path.of(synonymPath);
SynonymMap synonymMap;
try {
  // Create directory if it doesn't exist
  if (!Files.exists(dirPath)) {
    Files.createDirectories(dirPath);
  }
  // Create a unique directory for this run
  Path uniqueDirPath = dirPath.resolve("synonyms_" + System.currentTimeMillis());
  Files.createDirectory(uniqueDirPath);
  synonymMap = builder.build(new SynonymMapDirectory(uniqueDirPath));
  ...
```
Error observed:
```
Caused by: java.nio.file.FileAlreadyExistsException: /temp-FST/synonyms_1745894622943
    at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken
    at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1205)
    at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
    at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:735)
    at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1562)
    at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1520)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:391)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:160)
    at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
    at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1205)
    at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
    at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:735)
    at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1562)
    at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1520)
    ... [truncated]
```
The exception occurs when synonym maps are created concurrently and shows up multiple times in the logs. I would appreciate guidance on how to properly handle concurrent directory creation for off-heap storage.
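For what it's worth, `System.currentTimeMillis()` has only millisecond resolution, so two threads that build a map in the same millisecond resolve the same directory name and the second `Files.createDirectory` call throws `FileAlreadyExistsException`. A minimal race-free sketch (the base path and names here are illustrative, not from this PR) would let the JDK pick the unique name via `Files.createTempDirectory`, which retries internally until it finds an unused name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UniqueDirSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical base directory; createDirectories is idempotent,
    // so concurrent callers can all run it safely.
    Path base = Path.of("build", "temp-FST");
    Files.createDirectories(base);

    // createTempDirectory generates a fresh unique name atomically,
    // so concurrent callers never collide the way timestamp-based names can.
    Path uniqueDir = Files.createTempDirectory(base, "synonyms_");
    System.out.println(Files.isDirectory(uniqueDir));
  }
}
```

Alternatively, keeping the timestamp scheme but appending a `UUID.randomUUID()` suffix, or catching `FileAlreadyExistsException` and retrying with a new name, would also avoid the race.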
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]