Thank you all. I tried Peter's suggestion and it really worked: I added -Xmx2G to the VM arguments of the Run/Debug configuration. However, before getting his advice I had already started splitting the files into folders of 100,000 each, and I am now indexing them recursively :) Anyway, I have learned a lot from all of you guys.
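As a quick sanity check that the -Xmx2G setting actually reached the launched JVM (and not only the Eclipse JVM, as Peter pointed out), you can print the maximum heap from inside the application itself. This is a small sketch of my own, not from the thread:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Runtime.maxMemory() reports the heap limit the JVM was started with,
        // i.e. the effective -Xmx value for the process running this code.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB%n", maxBytes / (1024 * 1024));
    }
}
```

If this prints something close to 2048 MB (often slightly less, since some collectors reserve part of the heap), the setting was applied to the right JVM; a much smaller number means -Xmx2G only went to Eclipse itself.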
Thank you.
Sahin.

On Fri, Oct 22, 2010 at 9:22 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> > running eclipse with -Xmx2G parameter.
>
> This only affects the Eclipse JVM, not the JVM launched by Eclipse to run
> your application. Did you add -Xmx2G to the 'VM arguments' of your Debug
> or Run configuration?
>
> Peter
>
> On Thu, Oct 21, 2010 at 3:26 PM, Sahin Buyrukbilen
> <sahin.buyrukbi...@gmail.com> wrote:
>
> > I don't know why I am getting this error, but it looks normal to me now,
> > because when I try to list the contents of the folder I cannot get a
> > response from the Linux shell. Now I have created a folder with 100,000
> > files and am running Eclipse with the -Xmx2G parameter. It has been
> > indexing for about 15 minutes now, but I am happy it works.
> >
> > After this I will try Toke's method: create folders of 100,000 files and
> > try to index them recursively.
> >
> > On Thu, Oct 21, 2010 at 4:57 AM, Toke Eskildsen
> > <t...@statsbiblioteket.dk> wrote:
> >
> > > On Thu, 2010-10-21 at 05:01 +0200, Sahin Buyrukbilen wrote:
> > > > Unfortunately both methods didn't go through. I am getting a memory
> > > > error even at reading the directory contents.
> > >
> > > Then your problem is probably not Lucene related, but the sheer number
> > > of files returned by listFiles.
> > >
> > > A Java File contains the full path name for the file. Let's say that
> > > this is 50 characters, which translates to about (50 * 2 + 45) ~ 150
> > > bytes for the Java String. Add an int (4 bytes) plus bookkeeping and
> > > we're up to about 200 bytes/File.
> > >
> > > 4.5 million Files thus takes up about 1 GB. Not enough to explain the
> > > OOM, but if the full path name of your files is 150 characters, the
> > > list takes up 2 GB.
> > >
> > > > Now, I am thinking this: What if I split the 4.5 million files into
> > > > directories of 100,000 files (or less, depending on the Java error),
> > > > index each of them separately, and merge those indexes (if possible)?
> > >
> > > You don't need to create separate indexes and merge them. Just split
> > > your 4.5 million files into folders of more manageable sizes and
> > > perform a recursive descent. Something like
> > >
> > > public static void addFolder(IndexWriter writer, File folder) {
> > >   File[] files = folder.listFiles();
> > >   for (File file : files) {
> > >     if (file.isDirectory()) {
> > >       addFolder(writer, file);
> > >     } else {
> > >       // Create a Document from the file and add it using the writer
> > >     }
> > >   }
> > > }
> > >
> > > - Toke
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
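One note on Toke's sketch: File.listFiles() returns null when a folder cannot be read (permissions, I/O error), which is easy to hit with directory trees this large, and iterating over the null would throw a NullPointerException. Below is a self-contained variant with that guard added; it only counts files instead of indexing them, with the Lucene call left as a comment since it depends on your IndexWriter setup. The null check and the counting are my additions, not from the thread:

```java
import java.io.File;

public class FolderWalker {

    // Recursively descend into 'folder', visiting every regular file.
    // Returns the number of files seen; in the real indexer each file
    // would be turned into a Document and passed to writer.addDocument().
    public static long addFolder(/* IndexWriter writer, */ File folder) {
        File[] files = folder.listFiles();
        if (files == null) {
            // Not a directory, or it could not be read (permissions, I/O error)
            return 0;
        }
        long count = 0;
        for (File file : files) {
            if (file.isDirectory()) {
                count += addFolder(file);
            } else {
                // Here: build a Document from 'file' and add it via the writer
                count++;
            }
        }
        return count;
    }
}
```

Because each call only holds one folder's File[] array at a time, memory use is bounded by the largest single folder plus the depth of the tree, not by all 4.5 million files at once — which is exactly why the recursive descent avoids the OOM that a flat listFiles over everything produced.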