Hi,

Some more updates on my progress.
I have multithreaded the indexing in my application using a ThreadPoolExecutor with a pool size of 4, but saw only a very slight, almost negligible increase in performance; it still takes around 20 minutes to index around 30k files. Some more info on what I am doing.

Method where the indexing is done:

    private void indexAllFields(IResource resource) {
        IFile ifile = (IFile) resource;
        File file = resource.getLocation().toFile();
        Document doc = new Document();
        try {
            doc.add(new StringField(FIELD_FILE_PATH, getIndexFilePath(resource), Store.YES));
            doc.add(new StringField(FIELD_FILE_TYPE, ifile.getFileExtension().toLowerCase(), Store.YES));
            //indexContents(file, doc);
            /**
             * Calling updateDocument makes sure that only one indexed document is added
             * per IFile, because this method deletes any existing document with the given
             * Term and adds a new document. This fixes Sonic00039677.
             */
            //iWriter.addDocument(doc);
            iWriter.updateDocument(new Term(FIELD_FILE_PATH, getIndexFilePath(resource)), doc);
            iWriter.commit();
        } catch (FileNotFoundException e) {
        } catch (IOException e) {
        }
    }

Runnable to schedule an indexing job:

    class IndexingJob implements Runnable {
        private IResource resource;

        public IndexingJob(IResource resource) {
            this.resource = resource;
        }

        @Override
        public void run() {
            indexAllFields(resource);
        }
    }

Method to queue the files to be indexed:

    void doJob() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(4, 6, Long.MAX_VALUE, TimeUnit.SECONDS, workQueue);
        for (IResource iResource : files) {
            addToIndexQueue(iResource, executor);
            //updateBasedOnTimeStamp(iResource);
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

Still, even with the multithreaded approach, it is taking very long.

TIA,
Nischal Y

On Mon, Sep 2, 2013 at 8:07 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Stop. Back up. Test. <G>....
>
> The very _first_ thing I'd do is just comment out the bit that
> actually indexes the content. I'm guessing you have some
> loop like:
>
> while (more files) {
>     read the file
>     transform the data
>     create a Lucene document
>     index the document
> }
>
> Just comment out the "index the document" line and see how
> long _that_ takes. 9 times out of 10, the bottleneck is here.
> As a comparison, I can index 3-4K docs/second on my laptop.
> This is using Solr and is the Wikipedia dump, so the docs
> are several K each.
>
> So, if you're going to multi-thread, you'll probably want to
> multi-thread the acquisition of the data and feed that
> through a separate thread that actually does the indexing;
> you don't want multiple IndexWriters active at once.
>
> FWIW,
> Erick
>
>
> On Mon, Sep 2, 2013 at 10:13 AM, nischal reddy
> <nischal.srini...@gmail.com> wrote:
>
> > Hi,
> >
> > I am thinking of making my Lucene indexing multithreaded; can someone
> > throw some light on the best approach for achieving this.
> >
> > I will give a short gist of what I am trying to do; please suggest the
> > best way to tackle this.
> >
> > What am I trying to do?
> >
> > I am building an index for files (around 30000 files), and later will use
> > this index to search the contents of the files. The usual sequential
> > approach works fine but takes a humongous amount of time (around 30
> > minutes; is this the expected time, or am I screwing up things somewhere?).
> >
> > What am I thinking of doing?
> >
> > To improve the performance, I am thinking of making my application
> > multithreaded.
> >
> > Need suggestions :)
> >
> > Please suggest the best ways to do this, and normally how long does
> > Lucene take to index 30k files?
> >
> > Please suggest some links to examples (or best practices for
> > multithreading Lucene) for making my application more robust.
> >
> > TIA,
> > Nischal Y
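[Editor's note] One likely culprit in the code above is that indexAllFields calls iWriter.commit() once per document; in Lucene, commit() is an expensive durability operation, and committing once per batch is the usual pattern. A minimal, runnable sketch of that shape, keeping the same ThreadPoolExecutor structure as doJob() but deferring the commit until after awaitTermination(); the Lucene calls are simulated with a counter so the sketch compiles without Lucene, and all names here are illustrative, not from the original code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: worker threads "index" files in parallel, but the commit happens
// exactly once, after every job has finished, instead of once per document.
public class BatchedIndexSketch {
    private static final AtomicInteger indexed = new AtomicInteger();

    // Stand-in for indexAllFields(resource): adds a document, does NOT commit.
    static void indexOneFile(String path) {
        indexed.incrementAndGet();
    }

    public static int indexAll(List<String> files) throws InterruptedException {
        indexed.set(0);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (String f : files) {
            executor.execute(() -> indexOneFile(f));
        }
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        // Stand-in for a single iWriter.commit() covering the whole batch.
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) files.add("file" + i);
        System.out.println(indexAll(files)); // prints 1000
    }
}
```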
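[Editor's note] Erick's suggestion above (multi-thread the data acquisition, feed a single thread that does the indexing) can be sketched with a BlockingQueue as the hand-off point. The Lucene side is again simulated so the sketch runs standalone, and every name in it is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch: several reader threads do the slow, parallelizable work (read and
// transform files) and enqueue finished "documents"; one consumer thread is
// the only one that would ever touch the IndexWriter.
public class PipelineSketch {
    private static final String POISON = "__END__"; // end-of-stream marker

    public static int run(List<String> files, int readerThreads) throws Exception {
        BlockingQueue<String> docs = new LinkedBlockingQueue<>(100);

        // Single consumer: stand-in for the one thread calling updateDocument().
        ExecutorService indexer = Executors.newSingleThreadExecutor();
        Future<Integer> indexedCount = indexer.submit(() -> {
            int n = 0;
            while (true) {
                String doc = docs.take();
                if (doc.equals(POISON)) return n; // all producers are done
                n++; // stand-in for indexing the document
            }
        });

        // Producers: parallel read + transform, then hand off to the queue.
        ExecutorService readers = Executors.newFixedThreadPool(readerThreads);
        for (String f : files) {
            readers.execute(() -> {
                try {
                    docs.put("doc:" + f); // stand-in for reading/parsing the file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        readers.shutdown();
        readers.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        docs.put(POISON); // tell the consumer there is nothing more to index
        int n = indexedCount.get();
        indexer.shutdown();
        return n;
    }

    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 500; i++) files.add("file" + i);
        System.out.println(run(files, 4)); // prints 500
    }
}
```

The bounded queue (capacity 100 here) also gives back-pressure: if the readers outpace the indexing thread, they block on put() instead of filling memory with parsed documents.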