Hi Team,

Thanks for the responses.

I was able to upload 25,000 folders, each with 15 documents, into a Derby
database. When I then tried to add a new document to one of these folders,
the addition took a very long time. The document I used is a 2.5 MB PDF.
I looked into the issue with a profiler, and it seems PDFBox is taking most
of the time. I had also set the "indexMergerPoolSize" parameter to 50 and
the "extractorPoolSize" parameter to 50.

Can you help me resolve this problem?

Thanks
Ajai G

Stefan Guggisberg wrote:
>
> On Mon, Jul 27, 2009 at 4:36 PM, Ajai<[email protected]> wrote:
>>
>> Actually I am doing it the right way, as you mentioned, with
>> session.save() after each file.
>> But I do have text extractors and indexes turned on.
>>
>> My configuration:
>>
>> For SearchIndex:
>>
>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> </SearchIndex>
>>
>> My index config:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE configuration SYSTEM
>>     "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
>>                xmlns:jcr="http://www.jcp.org/jcr/1.0">
>>   <index-rule nodeType="nt:file">
>>     <property>jcr:content</property>
>>   </index-rule>
>>   <index-rule nodeType="nt:resource">
>>     <property>jcr:data</property>
>>   </index-rule>
>> </configuration>
>>
>> Kindly tell me the optimal way to use them.
>
> as already suggested in my earlier post:
>
> 1. disable search index or text extractors and compare results
> 2. remove checkin() call and compare results
> 3. use embedded derby and compare results
> 4. if you provide GenRandom.java, i'll run the test on my own machine.
>
> cheers
> stefan
>
>>
>> Thanks
>> Ajai G
>>
>> Guo Du wrote:
>>>
>>> On Mon, Jul 27, 2009 at 2:56 PM, Ajai<[email protected]> wrote:
>>>>
>>>> Hi Guo,
>>>>
>>>> Yes, I am adding a document to the repository.
>>>> Are there multiple ways to do a save?
>>>>
>>>> I am doing it the following way:
>>>>
>>>> fileNode = matterNode.addNode(fileName, "nt:file");
>>>> fileNode.addMixin("mix:versionable");
>>>> fileNode.addMixin("mix:referenceable");
>>>> Node resNode = fileNode.addNode("jcr:content", "nt:resource");
>>>> resNode.addMixin("mix:versionable");
>>>> resNode.addMixin("mix:referenceable");
>>>> resNode.setProperty("jcr:mimeType", mimeType);
>>>> resNode.setProperty("jcr:encoding", ENCODING_UTF_8);
>>>> resNode.setProperty("jcr:data", new FileInputStream(file));
>>>> Calendar lastModified = Calendar.getInstance();
>>>> lastModified.setTimeInMillis(file.lastModified());
>>>> resNode.setProperty("jcr:lastModified", lastModified);
>>>> // finally
>>>> session.save();
>>>>
>>>> Please suggest if any changes can be done.
>>>>
>>>
>>> Your code doesn't show details of the loop.
>>>
>>> WRONG
>>> ==============
>>> loop { // 375000 times
>>>     addNode(...)
>>> }
>>> session.save();
>>> ==============
>>>
>>> CORRECT
>>> ==============
>>> loop { // 375000 times
>>>     addNode(...)
>>>     session.save();
>>> }
>>> ==============
>>>
>>> You may also add multiple documents before calling session.save() to
>>> take advantage of batching and process more efficiently. But do not
>>> save only once after adding all 375000 documents.
>>>
>>> --Guo
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24681862.html
>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>
>
>

--
View this message in context:
http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24702639.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
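The batched-save pattern Guo describes (save every N documents rather than
once per document or once at the very end) can be sketched as follows. This
is only an illustration of the loop structure: the `Saver` interface here is
a hypothetical stand-in for a real JCR `Session`, so only the batching logic
is shown, not actual Jackrabbit calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for javax.jcr.Session; real code would call
// session.getRootNode().addNode(...) and session.save() instead.
interface Saver {
    void addNode(String name);
    void save();
}

public class BatchedImport {
    static final int BATCH_SIZE = 100; // tune for your setup

    // Saves after every BATCH_SIZE added nodes, plus one final save
    // for the last partial batch, instead of a single save at the end.
    public static void importAll(Saver session, List<String> documents) {
        int pending = 0;
        for (String doc : documents) {
            session.addNode(doc);
            pending++;
            if (pending == BATCH_SIZE) {
                session.save();
                pending = 0;
            }
        }
        if (pending > 0) {
            session.save(); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        // Recording Saver to show how often save() is actually called.
        final int[] saves = {0};
        final List<String> added = new ArrayList<>();
        Saver recorder = new Saver() {
            public void addNode(String name) { added.add(name); }
            public void save() { saves[0]++; }
        };
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) docs.add("doc" + i);
        importAll(recorder, docs);
        // 250 documents with BATCH_SIZE 100 -> saves at 100, 200, and
        // one final flush of the remaining 50.
        System.out.println(added.size() + " nodes, " + saves[0] + " saves");
    }
}
```

With 250 documents and a batch size of 100, this performs three saves in
total, which is the middle ground Guo suggests between one save per node and
one save for all 375000 nodes.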

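Stefan's first suggestion (disable the search index or text extractors and
compare results) could be tried with a SearchIndex configuration along these
lines. This is only a sketch: the `textFilterClasses` parameter and the
extractor class name are taken from Jackrabbit 1.x and should be verified
against the deployed version.

```xml
<!-- Sketch only (Jackrabbit 1.x parameter names; verify against your
     version): keep indexing enabled but leave PdfTextExtractor out of
     textFilterClasses, so the 2.5 MB PDF binaries are stored without
     being parsed by PDFBox at save time. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- org.apache.jackrabbit.extractor.PdfTextExtractor deliberately omitted -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
</SearchIndex>
```

If save times drop sharply with this configuration, that would confirm the
profiler's finding that PDF text extraction, not node persistence, is the
bottleneck.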