Of course, here is the repository config:
//////////////////////////////////////////////////
// START REPOSITORY.XML//
//////////////////////////////////////////////////
<?xml version="1.0"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN"
    "http://jackrabbit.apache.org/dtd/repository-1.2.dtd">
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem>
    <Security appName="Jackrabbit">
        <!--
            access manager:
            class: FQN of class implementing the AccessManager interface
        -->
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
            <!-- <param name="config" value="${rep.home}/access.xml"/> -->
        </AccessManager>
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <!-- anonymous user name ('anonymous' is the default value) -->
            <param name="anonymousId" value="anonymous"/>
            <!--
                default user name to be used instead of the anonymous user
                when no login credentials are provided (unset by default)
            -->
            <!-- <param name="defaultUserId" value="superuser"/> -->
        </LoginModule>
    </Security>
    <!-- location of workspaces root directory and name of default workspace -->
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <!--
        workspace configuration template:
        used to create the initial workspace if there's no workspace yet
    -->
    <Workspace name="${wsp.name}">
        <!--
            virtual file system of the workspace:
            class: FQN of class implementing the FileSystem interface
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <!--
            persistence manager of the workspace:
            class: FQN of class implementing the PersistenceManager interface
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
            <param name="driver" value="com.mysql.jdbc.Driver"/>
            <param name="url" value="jdbc:mysql:///testJack?autoReconnect=true"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
            <param name="externalBLOBs" value="false"/>
            <param name="user" value="root"/>
            <param name="password" value="password"/>
        </PersistenceManager>
        <!--
            search index and the file system it uses:
            class: FQN of class implementing the QueryHandler interface
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
        </SearchIndex>
    </Workspace>
    <!-- configures versioning -->
    <Versioning rootPath="${rep.home}/version">
        <!--
            configures the file system to use for versioning for the
            respective persistence manager
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>
        <!--
            configures the persistence manager to be used for persisting
            version state. Please note that the current versioning
            implementation is based on a 'normal' persistence manager,
            but this could change in future implementations.
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
            <param name="driver" value="com.mysql.jdbc.Driver"/>
            <param name="url" value="jdbc:mysql:///testJackVer?autoReconnect=true"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="version_"/>
            <param name="externalBLOBs" value="false"/>
            <param name="user" value="root"/>
            <param name="password" value="password"/>
        </PersistenceManager>
    </Versioning>
    <!--
        search index for content that is shared repository wide
        (/jcr:system tree, contains mainly versions)
    -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
    </SearchIndex>
</Repository>
///////////////////////////////////////////////
// END REPOSITORY.XML//
//////////////////////////////////////////////
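One detail worth noting about the config: the JDBC URLs omit the host, so the MySQL driver falls back to localhost. Written out in full (the port 3306 is the driver's default, my assumption, not something that appears in the config), the workspace URL would be equivalent to:

```xml
<!-- equivalent to jdbc:mysql:///testJack?autoReconnect=true,
     with the assumed default host and port made explicit -->
<param name="url"
       value="jdbc:mysql://localhost:3306/testJack?autoReconnect=true"/>
```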
And here is the code doing the creation; I give you the two algorithm implementations:
/////////////////////////////////////////////////////////////////
// FIRST ALGORITHM : Single save at the end //
////////////////////////////////////////////////////////////////
Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // whatever, put the number of nodes to create
int created = 0;
for (int i = 0; i < count; i++) {
    Node contractor = contractors.addNode("lgw:contractor");
    initializeContractor(session, contractor);
    created++;
}
session.save(); // single save: all nodes persisted in one batch
////////////////////////////////////////////////
// END FIRST ALGORITHM //
////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////
// SECOND ALGORITHM : Node by node (save after each node) //
/////////////////////////////////////////////////////////////////////
Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // whatever, put the number of nodes to create
int created = 0;
for (int i = 0; i < count; i++) {
    Node contractor = contractors.addNode("lgw:contractor");
    initializeContractor(session, contractor);
    created++;
    session.save(); // node by node: one save per created node
}
/////////////////////////////////////////////////////
// END SECOND ALGORITHM //
////////////////////////////////////////////////////
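For what it is worth, a middle ground between the two algorithms would be to call session.save() every N nodes instead of after each node or only once at the end. A minimal, self-contained sketch of the bookkeeping (SaveBatching, savesFor and the batch sizes are names and numbers I made up for this mail, not part of our code):

```java
// Sketch: how many session.save() calls each strategy issues for a given
// number of nodes. batchSize == 1 is the node-by-node algorithm,
// batchSize == nodes is the single-save algorithm.
public class SaveBatching {

    // One save per full batch, plus one final save for any remainder.
    static int savesFor(int nodes, int batchSize) {
        int fullBatches = nodes / batchSize;
        int remainder = nodes % batchSize;
        return fullBatches + (remainder > 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        // 27 000 nodes, as in the run described in the quoted mail below:
        System.out.println(savesFor(27000, 1));     // node by node -> 27000 saves
        System.out.println(savesFor(27000, 27000)); // single save  -> 1 save
        System.out.println(savesFor(27000, 1000));  // batched      -> 27 saves
    }
}
```

In the loop itself this would just be an `if (created % batchSize == 0) session.save();` inside the for, followed by one final `session.save()` after the loop for the remainder.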
Frédéric Esnault - Ingénieur R&D
-----Original Message-----
From: Thomas Mueller [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 20, 2007 09:51
To: [email protected]
Subject: Re: atomic vs group node creation/storage
Hi,
Could you send the configuration (repository.xml file), and the code
if possible (so I don't have to write it again). Just recently I
thought I saw a similar problem, but I am not sure if it's related.
Thanks,
Thomas
On 6/20/07, Frédéric Esnault <[EMAIL PROTECTED]> wrote:
> Hello there !
>
>
>
> It seems to me that there is a storage problem, when you create a lot of
> nodes, one by one, using this algorithm :
>
> 1. for each node to create
>
> a. create node
> b. fill node properties/child nodes
> c. save session
>
> 2. end for
>
>
>
> The number of rows (and size) of the default_node and default_prop tables
> increases very fast, and in an unacceptable way.
>
> I had a 35-million-row default_node table after inserting 27 000 nodes like
> this in a repository.
>
>
>
> Then I used the other algorithm :
>
> 1. for each node to create
>
> a. create node
> b. fill node properties/child nodes
>
> 2. end for
> 3. save session
>
>
>
> And this gives a much better situation (currently I have a 36 000-node
> content repository, and my tables are correct: 60 000 rows for the node
> table,
>
> 576 000 rows for properties).
>
>
>
> The problem here is that in a production environment, users are going to
> create their nodes one by one, day after day, never by full blocks.
>
> So is there a storage problem?
>
>
>
> Frederic Esnault
>
>