Of course, here is the repository config:
//////////////////////////////////////////////////
// START REPOSITORY.XML//
//////////////////////////////////////////////////
<?xml version="1.0"?>
<!DOCTYPE Repository PUBLIC "-//The Apache Software Foundation//DTD Jackrabbit 1.2//EN"
    "http://jackrabbit.apache.org/dtd/repository-1.2.dtd">
<Repository>
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem>
    <Security appName="Jackrabbit">
        <!--
            access manager:
            class: FQN of class implementing the AccessManager interface
        -->
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
            <!-- <param name="config" value="${rep.home}/access.xml"/> -->
        </AccessManager>
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <!-- anonymous user name ('anonymous' is the default value) -->
            <param name="anonymousId" value="anonymous"/>
            <!--
                default user name to be used instead of the anonymous user
                when no login credentials are provided (unset by default)
            -->
            <!-- <param name="defaultUserId" value="superuser"/> -->
        </LoginModule>
    </Security>
    <!-- location of workspaces root directory and name of default workspace -->
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <!--
        workspace configuration template:
        used to create the initial workspace if there's no workspace yet
    -->
    <Workspace name="${wsp.name}">
        <!--
            virtual file system of the workspace:
            class: FQN of class implementing the FileSystem interface
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <!--
            persistence manager of the workspace:
            class: FQN of class implementing the PersistenceManager interface
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
            <param name="driver" value="com.mysql.jdbc.Driver"/>
            <param name="url" value="jdbc:mysql:///testJack?autoReconnect=true"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
            <param name="externalBLOBs" value="false"/>
            <param name="user" value="root"/>
            <param name="password" value="password"/>
        </PersistenceManager>
        <!--
            search index and the file system it uses:
            class: FQN of class implementing the QueryHandler interface
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${wsp.home}/index"/>
        </SearchIndex>
    </Workspace>
    <!-- configures versioning -->
    <Versioning rootPath="${rep.home}/version">
        <!--
            configures the file system to use for versioning for the
            respective persistence manager
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>
        <!--
            configures the persistence manager to be used for persisting
            version state. Please note that the current versioning
            implementation is based on a 'normal' persistence manager,
            but this could change in future implementations.
        -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.db.SimpleDbPersistenceManager">
            <param name="driver" value="com.mysql.jdbc.Driver"/>
            <param name="url" value="jdbc:mysql:///testJackVer?autoReconnect=true"/>
            <param name="schema" value="mysql"/>
            <param name="schemaObjectPrefix" value="version_"/>
            <param name="externalBLOBs" value="false"/>
            <param name="user" value="root"/>
            <param name="password" value="password"/>
        </PersistenceManager>
    </Versioning>
    <!--
        search index for content that is shared repository wide
        (/jcr:system tree, contains mainly versions)
    -->
    <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
        <param name="path" value="${rep.home}/repository/index"/>
    </SearchIndex>
</Repository>
///////////////////////////////////////////////
// END REPOSITORY.XML//
//////////////////////////////////////////////
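One detail worth noting about the config: the JDBC URLs omit the host, so the MySQL driver falls back to localhost. Written out in full (the port 3306 is the driver's default, my assumption, not something that appears in the config), the workspace URL would be equivalent to:

```xml
<!-- equivalent to jdbc:mysql:///testJack?autoReconnect=true,
     with the assumed default host and port made explicit -->
<param name="url"
       value="jdbc:mysql://localhost:3306/testJack?autoReconnect=true"/>
```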
And here is the code doing the creation; I give you the two algorithm implementations:
/////////////////////////////////////////////////////////////////
// FIRST ALGORITHM : Single save at the end //
////////////////////////////////////////////////////////////////
Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // whatever, put the number of nodes to create
int created = 0;
for (int i = 0; i < count; i++) {
    Node contractor = contractors.addNode("lgw:contractor");
    initializeContractor(session, contractor);
    created++;
}
session.save(); // single save: all nodes persisted in one batch
////////////////////////////////////////////////
// END FIRST ALGORITHM //
////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////
// SECOND ALGORITHM : Node by node (save after each node) //
/////////////////////////////////////////////////////////////////////
Node contractors = (Node) session.getItem("/lgw:root/lgw:contractors");
int count = number_of_nodes; // whatever, put the number of nodes to create
int created = 0;
for (int i = 0; i < count; i++) {
    Node contractor = contractors.addNode("lgw:contractor");
    initializeContractor(session, contractor);
    created++;
    session.save(); // node by node: one save per created node
}
/////////////////////////////////////////////////////
// END SECOND ALGORITHM //
////////////////////////////////////////////////////
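For what it is worth, a middle ground between the two algorithms would be to call session.save() every N nodes instead of after each node or only once at the end. A minimal, self-contained sketch of the bookkeeping (SaveBatching, savesFor and the batch sizes are names and numbers I made up for this mail, not part of our code):

```java
// Sketch: how many session.save() calls each strategy issues for a given
// number of nodes. batchSize == 1 is the node-by-node algorithm,
// batchSize == nodes is the single-save algorithm.
public class SaveBatching {

    // One save per full batch, plus one final save for any remainder.
    static int savesFor(int nodes, int batchSize) {
        int fullBatches = nodes / batchSize;
        int remainder = nodes % batchSize;
        return fullBatches + (remainder > 0 ? 1 : 0);
    }

    public static void main(String[] args) {
        // 27 000 nodes, as in the run described in the quoted mail below:
        System.out.println(savesFor(27000, 1));     // node by node -> 27000 saves
        System.out.println(savesFor(27000, 27000)); // single save  -> 1 save
        System.out.println(savesFor(27000, 1000));  // batched      -> 27 saves
    }
}
```

In the loop itself this would just be an `if (created % batchSize == 0) session.save();` inside the for, followed by one final `session.save()` after the loop for the remainder.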
Frédéric Esnault - Ingénieur R&D
-----Original Message-----
From: Thomas Mueller [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 20, 2007 09:51
To: [email protected]
Subject: Re: atomic vs group node creation/storage
Hi,
Could you send the configuration (repository.xml file), and the code
if possible (so I don't have to write it again). Just recently I
thought I saw a similar problem, but I am not sure if it's related.
Thanks,
Thomas
On 6/20/07, Frédéric Esnault <[EMAIL PROTECTED]> wrote:
> Hello there !
>
>
>
> It seems to me that there is a storage problem, when you create a lot of
> nodes, one by one, using this algorithm :
>
> 1. for each node to create
>
> a. create node
> b. fill node properties/child nodes
> c. save session
>
> 2. end for
>
>
>
> The number of rows (and size) of the default_node and default_prop tables
> increases very fast, and in an unacceptable way.
>
> I had a 35-million-row default_node table after inserting 27 000 nodes like
> this in a repository.
>
>
>
> Then I used the other algorithm :
>
> 1. for each node to create
>
> a. create node
> b. fill node properties/child nodes
>
> 2. end for
> 3. save session
>
>
>
> And this gives a much better situation (currently I have a 36 000-node
> content repository, and my tables are correct: 60 000 rows for the node
> table,
>
> 576 000 rows for properties).
>
>
>
> The problem here is that in a production environment, users are going to
> create their nodes one by one, day after day, never by full blocks.
>
> So is there a storage problem?
>
>
>
> Frederic Esnault
>
>