If I were you, I would write it like this. Not sure this helps; let me know how it works.
public static void createIndex() throws CorruptIndexException,
        LockObtainFailedException, IOException {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    Directory indexDir = FSDirectory.open(new File("/media/work/WIKI/indexes/"));
    boolean recreateIndexIfExists = true;
    IndexWriter indexWriter = new IndexWriter(indexDir, analyzer,
            recreateIndexIfExists, IndexWriter.MaxFieldLength.UNLIMITED);
    File dir = new File(FILES_TO_INDEX_DIRECTORY);
    File[] files = dir.listFiles();
    for (File file : files) {
        Document document = new Document();
        // Store the file name as an un-analyzed field; no per-file Reader
        // is kept open, so memory per document stays small.
        document.add(new Field(FIELD_NAME, file.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        indexWriter.addDocument(document);
    }
    indexWriter.close();
}
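If the heap is still tight, you can also commit every few thousand documents so the buffered additions are flushed to disk as you go (Johnbin suggests the same below). A rough sketch of that loop, assuming the same writer and field constants as above; the 10,000 batch size is just the figure from this thread:

    int count = 0;
    for (File file : files) {
        Document document = new Document();
        document.add(new Field(FIELD_NAME, file.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        indexWriter.addDocument(document);
        // Flush buffered adds to disk periodically so the heap stays bounded.
        if (++count % 10000 == 0) {
            indexWriter.commit();
        }
    }
    indexWriter.close();

You can also cap the indexing buffer directly with indexWriter.setRAMBufferSizeMB(...) before adding documents.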
Good Luck.

Qi Li

On Wed, Oct 20, 2010 at 2:45 PM, Sahin Buyrukbilen <
sahin.buyrukbi...@gmail.com> wrote:

> With the different parameters I still got the same error. My code is very
> simple; indeed, I am only concerned with creating the index, and then I
> will do some private information retrieval experiments on the inverted
> index file, which I created with the information extracted from the index.
> That is why I didn't go over optimization until now. The database size I
> had was very small compared to 4.5 million.
> My code is as follows:
>
> public static void createIndex() throws CorruptIndexException,
>         LockObtainFailedException, IOException {
>     Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
>     Directory indexDir = FSDirectory.open(new
>             File("/media/work/WIKI/indexes/"));
>     boolean recreateIndexIfExists = true;
>     IndexWriter indexWriter = new IndexWriter(indexDir, analyzer,
>             recreateIndexIfExists, IndexWriter.MaxFieldLength.UNLIMITED);
>     indexWriter.setUseCompoundFile(false);
>     File dir = new File(FILES_TO_INDEX_DIRECTORY);
>     File[] files = dir.listFiles();
>     for (File file : files) {
>         Document document = new Document();
>
>         //String path = file.getCanonicalPath();
>         //document.add(new Field(FIELD_PATH, path, Field.Store.YES,
>         //        Field.Index.NOT_ANALYZED));
>
>         Reader reader = new FileReader(file);
>         document.add(new Field(FIELD_CONTENTS, reader));
>
>         indexWriter.addDocument(document);
>     }
>     indexWriter.optimize();
>     indexWriter.close();
> }
>
> On Wed, Oct 20, 2010 at 2:39 PM, Qi Li <aler...@gmail.com> wrote:
>
> > 1. What is the difference when you used different VM parameters?
> > 2. What merge policy and optimization strategy did you use?
> > 3. How did you use the commit or flush?
> >
> > Qi
> >
> > On Wed, Oct 20, 2010 at 2:05 PM, Sahin Buyrukbilen <
> > sahin.buyrukbi...@gmail.com> wrote:
> >
> > > Thank you so much for this info. It looks pretty complicated for me,
> > > but I will try.
> > >
> > > On Wed, Oct 20, 2010 at 1:18 AM, Johnbin Wang <johnbin.w...@gmail.com
> > > > wrote:
> > >
> > > > You can start a fixedThreadPool to index all these files in a
> > > > multithreaded way. Every thread executes an index task which indexes
> > > > a part of all the files. In each index task, after indexing 10,000
> > > > files, you need to execute the indexWriter.commit() method to flush
> > > > all the index add operations to the disk file.
> > > >
> > > > If you need to index all these files into only one index file, you
> > > > need to hold only one IndexWriter instance among all the index
> > > > threads.
> > > >
> > > > Hope it's helpful.
> > > >
> > > > On Wed, Oct 20, 2010 at 1:05 PM, Sahin Buyrukbilen <
> > > > sahin.buyrukbi...@gmail.com> wrote:
> > > >
> > > > > Thank you Johnbin,
> > > > > Do you know which parameter I have to play with?
> > > > >
> > > > > On Wed, Oct 20, 2010 at 12:59 AM, Johnbin Wang <
> > > > > johnbin.w...@gmail.com> wrote:
> > > > >
> > > > > > I think you can write the index file once every 10,000 files or
> > > > > > fewer have been read.
> > > > > >
> > > > > > On Wed, Oct 20, 2010 at 12:11 PM, Sahin Buyrukbilen <
> > > > > > sahin.buyrukbi...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I have to index about 4.5 million txt files. When I run my
> > > > > > > indexing application through Eclipse, I get this error:
> > > > > > > "Exception in thread "main" java.lang.OutOfMemoryError: Java
> > > > > > > heap space"
> > > > > > >
> > > > > > > eclipse -vmargs -Xmx2000m -Xss8192k
> > > > > > >
> > > > > > > eclipse -vmargs -Xms40M -Xmx2G
> > > > > > >
> > > > > > > I tried running Eclipse with the above memory parameters, but
> > > > > > > still had the same error. The architecture of my computer is an
> > > > > > > AMD x2 64-bit 2 GHz processor, Ubuntu 10.04 LTS 64-bit,
> > > > > > > java-6-openjdk.
> > > > > > >
> > > > > > > Anybody have a suggestion?
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Sahin.
> > > > > >
> > > > > > --
> > > > > > cheers,
> > > > > > Johnbin Wang
> > > >
> > > > --
> > > > cheers,
> > > > Johnbin Wang
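P.S. A rough sketch of the fixedThreadPool idea Johnbin describes above, with one IndexWriter shared by every task (IndexWriter is safe to use from multiple threads). The pool size, chunk size, and commit interval are illustrative guesses, not numbers from this thread; it assumes the files array and writer from createIndex above, plus java.util.Arrays and java.util.concurrent.{ExecutorService, Executors, TimeUnit}:

    final IndexWriter writer = indexWriter;      // single writer shared by all tasks
    ExecutorService pool = Executors.newFixedThreadPool(4);
    int chunkSize = 100000;                      // files handled per task
    for (int i = 0; i < files.length; i += chunkSize) {
        final File[] part = Arrays.copyOfRange(files, i,
                Math.min(i + chunkSize, files.length));
        pool.submit(new Runnable() {
            public void run() {
                try {
                    int n = 0;
                    for (File f : part) {
                        Document doc = new Document();
                        doc.add(new Field(FIELD_NAME, f.getName(),
                                Field.Store.YES, Field.Index.NOT_ANALYZED));
                        writer.addDocument(doc);
                        if (++n % 10000 == 0) {
                            writer.commit();     // flush this batch to disk
                        }
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        });
    }
    pool.shutdown();
    try {
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    writer.close();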