I dont know why I am getting this error, but it looks normal to me now. because when I try to list the contents of the folder I cannot get a response from linux shell. Now I have created a folder with 100.000 files and running eclipse with -Xmx2G parameter. it is still indexing for about 15 minutes now, but I am happy it works.
After this I will try Toke's method. Create 100.000 filed folders and try to index them recursively. On Thu, Oct 21, 2010 at 4:57 AM, Toke Eskildsen <t...@statsbiblioteket.dk>wrote: > On Thu, 2010-10-21 at 05:01 +0200, Sahin Buyrukbilen wrote: > > Unfortunately both methods didnt go through. I am getting memory error > even > > at reading the directory contents. > > Then your problem is probably not Lucene related, but the sheer number > of files returned by listFiles. > > A Java File contains the full path name for the file. Let's say that > this is 50 characters, which translates to about (50 * 2 + 45) ~ 150 > bytes for the Java String. Add an int (4 bytes) plus bookkeeping and > we're up to about 200 bytes/File. > > 4.5 million Files thus takes up about 1 GB. Not enough to explain the > OOM, but if the full path name of your files is 150 characters, the list > takes up 2 GB. > > > Now, I am thinking this: What if I split 4.5million files into 100.000 > (or > > less depending on java error) files directories, index each of them > > separately and merge those indexes(if possible). > > You don't need to create separate indexes and merge them. Just split > your 4.5 million files into folders of more manageable sizes and perform > a recursive descend. Something like > > public static void addFolder(IndexWriter writer, File folder) { > File[] files = folder.listFiles(); > for (File file: files) { > if (file.isDirectory()) { > addFolder(writer, file); > } else { > // Create Document from file and add it using the writer > } > } > } > > - Toke > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >