Hello, I refactored out the HDFS directory implementation from Solr to use in my own project and was surprised to see how it performed. I'm using both the HDFSDirectory class and the HdfsDirectoryFactory class.
On my local machine, using the cache gave a significant speedup. The index was small enough (12 docs) that each file making up the Lucene index fit into one block inside the cache. When running on a multi-node cluster on AWS, pulling back 1031 docs with the cache was not much faster than without it. According to my log statements, the cache was being hit every time, but the difference between this and my local run was that there were several blocks per file. When setting up the cache I used the default settings as specified in HdfsDirectoryFactory.

Any ideas on how to speed up searches? Should I change the block size? Is there something that Blur does to put a wrapper around the cache?

ON A MULTI NODE CLUSTER
Number of documents in directory[1031]
Try #1 -> Total execution time: 3776
Try #2 -> Total execution time: 2995
Try #3 -> Total execution time: 2683
Try #4 -> Total execution time: 2301
Try #5 -> Total execution time: 2174
Try #6 -> Total execution time: 2253
Try #7 -> Total execution time: 2184
Try #8 -> Total execution time: 2087
Try #9 -> Total execution time: 2157
Try #10 -> Total execution time: 2089

Cached try #1 -> Total execution time: 2065
Cached try #2 -> Total execution time: 2298
Cached try #3 -> Total execution time: 2398
Cached try #4 -> Total execution time: 2421
Cached try #5 -> Total execution time: 2080
Cached try #6 -> Total execution time: 2060
Cached try #7 -> Total execution time: 2285
Cached try #8 -> Total execution time: 2048
Cached try #9 -> Total execution time: 2087
Cached try #10 -> Total execution time: 2106

ON MY LOCAL
Number of documents in directory[12]
Try #1 -> Total execution time: 627
Try #2 -> Total execution time: 620
Try #3 -> Total execution time: 637
Try #4 -> Total execution time: 535
Try #5 -> Total execution time: 486
Try #6 -> Total execution time: 527
Try #7 -> Total execution time: 363
Try #8 -> Total execution time: 430
Try #9 -> Total execution time: 431
Try #10 -> Total execution time: 337

Cached try #1 -> Total execution time: 38
Cached try #2 -> Total execution time: 38
Cached try #3 -> Total execution time: 36
Cached try #4 -> Total execution time: 35
Cached try #5 -> Total execution time: 135
Cached try #6 -> Total execution time: 31
Cached try #7 -> Total execution time: 36
Cached try #8 -> Total execution time: 30
Cached try #9 -> Total execution time: 29
Cached try #10 -> Total execution time: 28

Thanks,
Josh
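
To make the "one block per file" versus "several blocks per file" difference concrete, here is a minimal sketch of the block arithmetic. The 8192-byte block size and both file sizes are assumptions for illustration only, not values taken from my setup, and the class and method names are hypothetical.

```java
public class BlockCount {
    // Assumed cache block size in bytes (illustrative, not from the original post).
    static final long ASSUMED_BLOCK_SIZE = 8192L;

    /** Number of cache blocks needed to cover a file of the given length. */
    static long blocksForFile(long fileLength, long blockSize) {
        if (fileLength <= 0) {
            return 0;
        }
        // Ceiling division: a partially filled last block still counts as a block.
        return (fileLength + blockSize - 1) / blockSize;
    }

    /** Index of the block that contains the given file offset. */
    static long blockIndex(long offset, long blockSize) {
        return offset / blockSize;
    }

    public static void main(String[] args) {
        long smallFile = 4_000L;     // hypothetical size: fits in a single block
        long largeFile = 1_000_000L; // hypothetical size: spans many blocks
        System.out.println("small file blocks: " + blocksForFile(smallFile, ASSUMED_BLOCK_SIZE));
        System.out.println("large file blocks: " + blocksForFile(largeFile, ASSUMED_BLOCK_SIZE));
    }
}
```

With one block per file, a read costs a single cache lookup; with many blocks per file, a sequential read crosses a block boundary every blockSize bytes and pays a lookup each time, which may be part of why the cached numbers on the cluster look so close to the uncached ones.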