On Thu, 2010-10-21 at 05:01 +0200, Sahin Buyrukbilen wrote:
> Unfortunately both methods didn't go through. I am getting a memory error
> even at reading the directory contents.
Then your problem is probably not Lucene related, but rather the sheer number of File objects returned by listFiles. A Java File holds the full path name of the file. Let's say that this is 50 characters, which translates to about (50 * 2 + 45) ~ 150 bytes for the Java String. Add an int (4 bytes) plus bookkeeping and we're up to roughly 200 bytes per File. 4.5 million Files thus take up about 900 MB. That alone is not quite enough to explain the OOM, but if the full path names of your files are 150 characters, the list takes up around 2 GB.

> Now, I am thinking this: What if I split 4.5million files into 100.000 (or
> less depending on java error) files directories, index each of them
> separately and merge those indexes(if possible).

You don't need to create separate indexes and merge them. Just split your 4.5 million files into folders of more manageable size and perform a recursive descent, so that no single call to listFiles has to materialize millions of File objects at once. Something like:

    public static void addFolder(IndexWriter writer, File folder) {
        File[] files = folder.listFiles();
        if (files == null) {
            return; // null signals an I/O error or that folder is not a directory
        }
        for (File file : files) {
            if (file.isDirectory()) {
                addFolder(writer, file); // descend into the sub-folder
            } else {
                // Create a Document from the file and add it using the writer
            }
        }
    }

- Toke
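If you want to sanity-check the per-File estimate above, a rough measurement along the following lines can be used. This is only a sketch: the synthetic ~50-character path is an assumption, the GC-based accounting is approximate, and the program needs a large heap (e.g. -Xmx2g) to hold the list.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class FileListFootprint {
        public static void main(String[] args) {
            final int COUNT = 4500000; // the 4.5 million files discussed above
            Runtime rt = Runtime.getRuntime();
            System.gc();
            long before = rt.totalMemory() - rt.freeMemory();

            List<File> files = new ArrayList<File>(COUNT);
            for (int i = 0; i < COUNT; i++) {
                // Synthetic ~50-character path, mirroring the estimate above
                files.add(new File("/home/user/corpus/batch42/document-" + i + ".txt"));
            }

            System.gc();
            long after = rt.totalMemory() - rt.freeMemory();
            System.out.println("Approx. bytes per File: " + (after - before) / COUNT);
            System.out.println("Files held: " + files.size()); // keep the list reachable
        }
    }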
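As for the body of the else branch, a minimal sketch against the Lucene 3.x API of the time could look like the following; the field names "path" and "contents" and the use of a plain FileReader are illustrative assumptions, not something prescribed above. Indexing the contents through a Reader means the file is streamed during addDocument rather than read fully into memory.

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Hypothetical helper for the else branch above: turns one file into a
    // Document and hands it to the writer.
    public static void addFile(IndexWriter writer, File file) throws IOException {
        Document doc = new Document();
        // Store the path so hits can be mapped back to files; do not tokenize it.
        doc.add(new Field("path", file.getPath(),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Tokenize the contents for searching; Field(String, Reader) does not store them.
        doc.add(new Field("contents", new FileReader(file)));
        writer.addDocument(doc);
    }

The else branch in addFolder would then simply call addFile(writer, file).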