Hi Frédéric,

Now this makes a whole lot more sense to me :-)

The first algorithm creates a number of nodes and properties in transient
space, which is currently kept in memory. The higher the number of nodes,
the higher the memory consumption. The second algorithm just creates a
single node and its properties in the transient space before saving them
away and releasing used memory (or at least making it available for GC).

This is currently a limitation of the transient space implementation.
Stefan might know more of the details. For the time being, you should
probably go with the "node by node save" algorithm.
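
To make the memory argument concrete, here is a rough toy model in plain Java. It is not Jackrabbit code: ToySession, addNode, save and run are made-up names, and the list merely stands in for the transient space. It only shows how the number of unsaved items held in memory depends on how often save is called.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a session that keeps unsaved items in memory until save(). */
class ToySession {
    // Hypothetical model of the transient space: unsaved item names.
    private final List<String> pending = new ArrayList<>();
    private int peakPending = 0; // largest number of unsaved items held at once
    private int persisted = 0;   // number of items flushed so far

    void addNode(String name) {
        pending.add(name);
        peakPending = Math.max(peakPending, pending.size());
    }

    void save() {
        persisted += pending.size();
        pending.clear(); // flushed items are no longer referenced here
    }

    int peakPending() { return peakPending; }

    int persisted() { return persisted; }

    /** Adds count nodes, calling save() after every batchSize additions. */
    static ToySession run(int count, int batchSize) {
        ToySession s = new ToySession();
        for (int i = 0; i < count; i++) {
            s.addNode("contractor" + i);
            if ((i + 1) % batchSize == 0) {
                s.save();
            }
        }
        s.save(); // flush any remainder
        return s;
    }
}
```

With batchSize equal to count (one save at the end), the peak equals the number of nodes. With batchSize 1 (node by node save), the peak stays at 1. A batch size in between would cap the peak at that batch size, at least in this simplified model.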

Hope this helps.

Regards
Felix

PS: In your initial post you seem to have switched algorithm descriptions
which caused some confusion :-)

On 6/20/07, Frédéric Esnault <[EMAIL PROTECTED]> wrote:

Of course, here is the repository config :

//////////////////////////////////////////////////
// START REPOSITORY.XML//
//////////////////////////////////////////////////

<?xml version="1.0"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN"
                            "http://jackrabbit.apache.org/dtd/repository-1.2.dtd">

<Repository>
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                <param name="path" value="${rep.home}/repository"/>
        </FileSystem>

        <Security appName="Jackrabbit">

                <!--
                        access manager:
                        class: FQN of class implementing the AccessManager interface
                -->
                <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
                        <!-- <param name="config" value="${rep.home}/access.xml"/> -->
                </AccessManager>

                <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
                        <!-- anonymous user name ('anonymous' is the default value) -->
                        <param name="anonymousId" value="anonymous"/>
                        <!--
                                default user name to be used instead of the anonymous user
                                when no login credentials are provided (unset by default)
                        -->
                        <!-- <param name="defaultUserId" value="superuser"/> -->
                </LoginModule>

        </Security>

        <!--
                location of workspaces root directory and name of default workspace
        -->
        <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>

        <!--
                workspace configuration template:
                used to create the initial workspace if there's no workspace yet
        -->
        <Workspace name="${wsp.name}">

                <!--
                        virtual file system of the workspace:
                        class: FQN of class implementing the FileSystem interface
                -->
                <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                        <param name="path" value="${wsp.home}"/>
                </FileSystem>

                <!--
                        persistence manager of the workspace:
                        class: FQN of class implementing the PersistenceManager interface
                -->
                <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
                        <param name="driver" value="com.mysql.jdbc.Driver"/>
                        <param name="url" value="jdbc:mysql:///testJack?autoReconnect=true"/>
                        <param name="schema" value="mysql"/>
                        <param name="schemaObjectPrefix" value="${wsp.name}_"/>
                        <param name="externalBLOBs" value="false"/>
                        <param name="user" value="root"/>
                        <param name="password" value="password"/>
                </PersistenceManager>

                <!--
                        Search index and the file system it uses.
                        class: FQN of class implementing the QueryHandler interface
                -->
                <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
                        <param name="path" value="${wsp.home}/index"/>
                </SearchIndex>

        </Workspace>

        <!--
                Configures the versioning
        -->
        <Versioning rootPath="${rep.home}/version">

                <!--
                        Configures the filesystem to use for versioning for the respective
                        persistence manager
                -->
                <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
                        <param name="path" value="${rep.home}/version"/>
                </FileSystem>

                <!--
                        Configures the persistence manager to be used for persisting version state.
                        Please note that the current versioning implementation is based on
                        a 'normal' persistence manager, but this could change in future
                        implementations.
                -->

                <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
                        <param name="driver" value="com.mysql.jdbc.Driver"/>
                        <param name="url" value="jdbc:mysql:///testJackVer?autoReconnect=true"/>
                        <param name="schema" value="mysql"/>
                        <param name="schemaObjectPrefix" value="version_"/>
                        <param name="externalBLOBs" value="false"/>
                        <param name="user" value="root"/>
                        <param name="password" value="password"/>
                </PersistenceManager>
        </Versioning>

        <!--
                Search index for content that is shared repository wide
                (/jcr:system tree, contains mainly versions)
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
                <param name="path" value="${rep.home}/repository/index"/>
        </SearchIndex>

</Repository>

///////////////////////////////////////////////
// END  REPOSITORY.XML//
//////////////////////////////////////////////

And here is the code doing the creation, with both algorithm
implementations:


/////////////////////////////////////////////////////////////////
// FIRST ALGORITHM : All nodes, single save//
////////////////////////////////////////////////////////////////

Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // set to the number of nodes to create
for (int i = 0; i < count; i++) {
        Node contractor = contractors.addNode("lgw:contractor");
        initializeContractor(session, contractor);
        created++;
}
session.save();

////////////////////////////////////////////////
// END FIRST ALGORITHM //
////////////////////////////////////////////////

/////////////////////////////////////////////////////////////////////
// SECOND ALGORITHM : Node by Node//
/////////////////////////////////////////////////////////////////////

Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // set to the number of nodes to create
for (int i = 0; i < count; i++) {
        Node contractor = contractors.addNode("lgw:contractor");
        initializeContractor(session, contractor);
        created++;
        session.save();
}

/////////////////////////////////////////////////////
// END SECOND ALGORITHM //
////////////////////////////////////////////////////



Frédéric Esnault - R&D Engineer


-----Original Message-----
From: Thomas Mueller [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 20, 2007 09:51
To: [email protected]
Subject: Re: atomic vs group node creation/storage

Hi,

Could you send the configuration (repository.xml file), and the code
if possible (so I don't have to write it again)? Just recently I
thought I saw a similar problem, but I am not sure if it's related.

Thanks,
Thomas


On 6/20/07, Frédéric Esnault <[EMAIL PROTECTED]> wrote:
> Hello there !
>
>
>
> It seems to me that there is a storage problem when you create a lot of
> nodes, one by one, using this algorithm:
>
> 1.      for each node to create
>
>         a.      create node
>         b.      fill node properties/child nodes
>         c.      save session
>
> 2.      end for
>
>
>
> The number of rows (and the size) of the default_node and default_prop
> tables increases very fast, and in an unacceptable way.
>
> I ended up with 35 million rows in the default_node table after inserting
> 27 000 nodes into a repository this way.
>
>
>
> Then I used the other algorithm :
>
> 1.      for each node to create
>
>         a.      create node
>         b.      fill node properties/child nodes
>
> 2.      end for
> 3.      save session
>
>
>
> And this gives a much better situation (currently I have a content
> repository of 36 000 nodes, and my table sizes look correct: 60 000 rows
> for the node table, 576 000 rows for properties).
>
>
>
> The problem here is that in a production environment, users are going to
> create their nodes one by one, day after day, never in full blocks.
>
> So is there a storage problem ?
>
>
>
> Frederic Esnault
>
>
