pchencal commented on code in PR #13054:
URL: https://github.com/apache/lucene/pull/13054#discussion_r2067443555


##########
lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymMap.java:
##########
@@ -218,12 +231,26 @@ public void add(CharsRef input, CharsRef output, boolean includeOrig) {
       add(input, countWords(input), output, countWords(output), includeOrig);
     }
 
-    /** Builds an {@link SynonymMap} and returns it. */
+    /** Builds a {@link SynonymMap} and returns it. */
     public SynonymMap build() throws IOException {
+      return build(null);
+    }
+
+    /**
+     * Builds a {@link SynonymMap} and returns it. If directory is non-null, it will write the
+     * compiled SynonymMap to disk and return an off-heap version.
+     */
+    public SynonymMap build(SynonymMapDirectory directory) throws IOException {

Review Comment:
   When implementing the new build() method which accepts a directory path 
parameter for off-heap SynonymMap storage, I encountered 
`FileAlreadyExistsException` despite implementing a unique directory creation 
mechanism using `System.currentTimeMillis()`.
   
   Current implementation looks like something like this:     
   ```java
   String synonymPath = ".../lucene/src/build/temp-FST";
   Path dirPath = Path.of(synonymPath);
   SynonymMap synonymMap;
   try {
       // Create the base directory if it doesn't exist
       if (!Files.exists(dirPath)) {
           Files.createDirectories(dirPath);
       }
       // Create a unique directory for this run
       Path uniqueDirPath = dirPath.resolve("synonyms_" + System.currentTimeMillis());
       Files.createDirectory(uniqueDirPath);
       synonymMap = builder.build(new SynonymMapDirectory(uniqueDirPath));
       ...
   ```
   
   Error observed:
   ```
   Caused by: java.nio.file.FileAlreadyExistsException: /temp-FST/synonyms_1745894622943
       at org.apache.lucene.analysis.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:51)
       at org.apache.lucene.index.IndexingChain$PerField.invertTokenStream(IndexingChain.java:1205)
       at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1183)
       at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:735)
       at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:609)
       at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
       at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
       at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1562)
       at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1520)
       at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:391)
       at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:160)
       ... [truncated, remaining frames repeat the indexing-chain sequence above]
   ```
   
   The exception occurs during concurrent synonym map creation and appears multiple times in the logs. I would appreciate guidance on how to properly handle concurrent directory creation for off-heap storage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
