Hi Team,

Thanks for the responses.

I was able to upload 25,000 folders, each with 15 documents, into a Derby
database. When I then tried to add a new document to one of these folders,
the addition took a very long time. The document I used is a 2.5 MB PDF.
I looked into the issue with a profiler, and it seems PDFBox is taking most
of the time. I had also set the "indexMergerPoolSize" parameter to 50 and
the "extractorPoolSize" parameter to 50.

Can you help me resolve this problem?

Thanks
Ajai G

Stefan Guggisberg wrote:
>
> On Mon, Jul 27, 2009 at 4:36 PM, Ajai<[email protected]> wrote:
>>
>> Actually I am doing it the right way, as you mentioned, with
>> session.save() after each file.
>> But I do have text extractors and indexes turned on.
>>
>> My configuration:
>>
>> For SearchIndex:
>>
>> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> </SearchIndex>
>>
>> My index config:
>>
>> <?xml version="1.0"?>
>> <!DOCTYPE configuration SYSTEM
>>     "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
>> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
>>                xmlns:jcr="http://www.jcp.org/jcr/1.0">
>>   <index-rule nodeType="nt:file">
>>     <property>jcr:content</property>
>>   </index-rule>
>>   <index-rule nodeType="nt:resource">
>>     <property>jcr:data</property>
>>   </index-rule>
>> </configuration>
>>
>> Kindly tell me the optimal way to use them.
>
> as already suggested in my earlier post:
>
> 1. disable search index or text extractors and compare results
> 2. remove checkin() call and compare results
> 3. use embedded derby and compare results
> 4. if you provide GenRandom.java, i'll run the test on my own machine.
>
> cheers
> stefan
>
>>
>> Thanks
>> Ajai G
>>
>> Guo Du wrote:
>>>
>>> On Mon, Jul 27, 2009 at 2:56 PM, Ajai<[email protected]> wrote:
>>>>
>>>> Hi Guo,
>>>>
>>>> Yes, I am adding a document to the repository.
>>>> Are there multiple ways to do a save?
>>>>
>>>> I am doing it the following way:
>>>>
>>>> fileNode = matterNode.addNode(fileName, "nt:file");
>>>> fileNode.addMixin("mix:versionable");
>>>> fileNode.addMixin("mix:referenceable");
>>>> Node resNode = fileNode.addNode("jcr:content", "nt:resource");
>>>> resNode.addMixin("mix:versionable");
>>>> resNode.addMixin("mix:referenceable");
>>>> resNode.setProperty("jcr:mimeType", mimeType);
>>>> resNode.setProperty("jcr:encoding", ENCODING_UTF_8);
>>>> resNode.setProperty("jcr:data", new FileInputStream(file));
>>>> Calendar lastModified = Calendar.getInstance();
>>>> lastModified.setTimeInMillis(file.lastModified());
>>>> resNode.setProperty("jcr:lastModified", lastModified);
>>>> // finally
>>>> session.save();
>>>>
>>>> Please suggest if any changes can be done.
>>>>
>>>
>>> Your code doesn't show details of the loop.
>>>
>>> WRONG
>>> ==============
>>> loop { // 375000 times
>>>     addNode(...)
>>> }
>>> session.save();
>>> ==============
>>>
>>> CORRECT
>>> ==============
>>> loop { // 375000 times
>>>     addNode(...)
>>>     session.save();
>>> }
>>> ==============
>>>
>>> You may also add multiple documents before calling session.save() to
>>> take advantage of batching and process more efficiently. But do not
>>> save only once after adding all 375000 documents.
>>>
>>> --Guo
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24681862.html
>> Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
>>
>
>

--
View this message in context:
http://www.nabble.com/Performance-of-Jackrabbit-tp24619853p24702639.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.
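The batched-save pattern Guo describes (save every N documents rather than
once per document or once at the very end) can be sketched as follows. This
is only an illustration of the loop structure: the `Saver` interface here is
a hypothetical stand-in for a real JCR `Session`, so only the batching logic
is shown, not actual Jackrabbit calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for javax.jcr.Session; real code would call
// session.getRootNode().addNode(...) and session.save() instead.
interface Saver {
    void addNode(String name);
    void save();
}

public class BatchedImport {
    static final int BATCH_SIZE = 100; // tune for your setup

    // Saves after every BATCH_SIZE added nodes, plus one final save
    // for the last partial batch, instead of a single save at the end.
    public static void importAll(Saver session, List<String> documents) {
        int pending = 0;
        for (String doc : documents) {
            session.addNode(doc);
            pending++;
            if (pending == BATCH_SIZE) {
                session.save();
                pending = 0;
            }
        }
        if (pending > 0) {
            session.save(); // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        // Recording Saver to show how often save() is actually called.
        final int[] saves = {0};
        final List<String> added = new ArrayList<>();
        Saver recorder = new Saver() {
            public void addNode(String name) { added.add(name); }
            public void save() { saves[0]++; }
        };
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) docs.add("doc" + i);
        importAll(recorder, docs);
        // 250 documents with BATCH_SIZE 100 -> saves at 100, 200, and
        // one final flush of the remaining 50.
        System.out.println(added.size() + " nodes, " + saves[0] + " saves");
    }
}
```

With 250 documents and a batch size of 100, this performs three saves in
total, which is the middle ground Guo suggests between one save per node and
one save for all 375000 nodes.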

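Stefan's first suggestion (disable the search index or text extractors and
compare results) could be tried with a SearchIndex configuration along these
lines. This is only a sketch: the `textFilterClasses` parameter and the
extractor class name are taken from Jackrabbit 1.x and should be verified
against the deployed version.

```xml
<!-- Sketch only (Jackrabbit 1.x parameter names; verify against your
     version): keep indexing enabled but leave PdfTextExtractor out of
     textFilterClasses, so the 2.5 MB PDF binaries are stored without
     being parsed by PDFBox at save time. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- org.apache.jackrabbit.extractor.PdfTextExtractor deliberately omitted -->
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PlainTextExtractor"/>
</SearchIndex>
```

If save times drop sharply with this configuration, that would confirm the
profiler's finding that PDF text extraction, not node persistence, is the
bottleneck.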