Hi Jun,

You can control the number of files users create in a folder by using the new
document wizard [1]. With this wizard you decide in which folder the new
document will be created, based on criteria chosen by the user (such as the
date, business unit or file type). If you split the files into subdirectories
for the current year/month/day, you will probably get fewer files in one
folder. If that's not enough, you can add a different top-level split, for
example on document type or business unit.
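A minimal sketch of the date-based split, assuming a hypothetical root folder and an optional business-unit level on top. `target_folder` is a hypothetical helper, not part of Hippo; the real folder rules are configured in the wizard [1] and may look different:

```python
from datetime import date
from typing import Optional


def target_folder(root: str, created: date,
                  business_unit: Optional[str] = None) -> str:
    """Build a year/month/day folder path, optionally under a business-unit folder.

    Hypothetical layout for illustration only; not the wizard's actual config.
    """
    parts = [root.rstrip("/")]
    if business_unit:
        parts.append(business_unit)  # optional top-level split
    parts += [f"{created.year:04d}", f"{created.month:02d}", f"{created.day:02d}"]
    return "/".join(parts)


print(target_folder("/content", date(2008, 11, 13), "sales"))
# /content/sales/2008/11/13
```

Each day then gets its own folder, so a single folder only ever holds one day's documents for one business unit.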

[1] http://www.hippocms.org/display/CMS/New+document+wizard

Regards,

Jasha


-----Original Message-----
From: [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent: Thu 11/13/2008 02:24
To: [email protected]
Subject: RE: [HippoCMS-dev] too many files in one directory & performancetuning
 
Hey guys,
Thank you so much for your discussion on this.

I agree with Bartosz that we probably don't want too many files under
one directory for editing purposes. However, in some cases it will be
difficult to control the number of files users create under a
directory; or, as Reinier mentioned, a process drops files into the
repository that only the frontend looks up.

We've decided to split files into subdirectories, so we didn't pursue
the performance tuning any further.
Here are some of our findings in case anyone is interested:
1. We are running Hippo CMS with an Oracle backend on Red Hat. We started
   to get long wait times (probably ~1 min) in the CMS explorer when
   opening a directory with about 1000 files.
2. It took about 2 hours to upload 30000 small XML files to the
   repository.
3. To clean out a big directory, the easiest way is to filesync it
   against an empty local directory. Deleting files using a WebDAV client
   like Konqueror doesn't seem to clear out the version_content table.

Regards,
Jun


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bart van der Schans
Sent: Tuesday, November 11, 2008 1:46 AM
To: Hippo CMS development public mailinglist
Subject: Re: [HippoCMS-dev] too many files in one directory & performancetuning

On 11-11-2008 10:35, Reinier van den Born wrote:
> Hmmmm,
>
> Bartosz Oudekerk wrote:
>  > [EMAIL PROTECTED] wrote:
>  >> Hi everyone,
>  >>
>  >> I found this article by Jasha about importing a large number of files
>  >> into Hippo CMS.
>  >> http://blogs.hippo.nl/jasha/2008/07/importing_lots_of_data_into_hi.html
>  >> In the last paragraph, Jasha mentioned that Slide doesn't like too many
>  >> files in one directory.
>  >> Does anyone know what this threshold is? What's your experience with
>  >> a large number of files in one directory with Hippo?
>  >
>  > I don't think there's a fixed threshold; it's simply a question of: the
>  > more you put in, the slower it gets.
>  >
>  >> We tried around 10000 files in one dir, which took Hippo Repository
>  >> almost 5 mins for a complete file listing (with the default flatfile
>  >> backend). So here is the second question: what can we do to tune the
>  >> performance? I am going to try it with a db backend, as well as
>  >> increasing the memory. What are the other things that I can try?
>  >
>  > First of all, 1000 files in one directory will be unmanageable for
>  > your editors; try finding one specific file in such a listing. And even
>  > if you could tune it to be faster, what would be acceptable? Twice as
>  > fast? You'll get much more performance gain by simply putting fewer
>  > files in a folder.
>
> Who says editors need to manage those files?
> In one of my Hippo installations, files are put in the repository by an
> external application.
> Nobody but the frontend looks at them.
> It would be very inconvenient to have to spread them over
> subdirectories, to say the least.
> Luckily, for now the number of files is in the hundreds, so I need not
> worry. Yet.
>
> The question is, of course: what is Slide's problem?
> Getting a folder listing should in principle be linear in the number of
> entries.
> If it sorts the entries this would get worse, but sorting 10000
> entries should not be a real problem nowadays.
> Is this an intrinsic problem of WebDAV, or is it the implementation in
> Slide?
>
> Would a workaround using a DASL search help? That gets entries from an
> index, which, I assume, is much faster.
> For a frontend that could easily be implemented.

Using DASL can help. Getting the listing will indeed become too slow to
be usable if you add tens of thousands of files to a folder.
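A DASL query is a WebDAV SEARCH request whose body uses the DAV:basicsearch grammar. The sketch below only builds such a body for the direct children (depth 1) of one folder. The folder href, the selected properties and the helper names (`dav`, `dasl_listing_body`, `send_search`) are hypothetical, it assumes the server accepts a search without a DAV:where clause, and whether Slide answers it from an index depends on your repository configuration:

```python
import http.client
import xml.etree.ElementTree as ET

DAV = "DAV:"
ET.register_namespace("D", DAV)  # serialize DAV: elements with a D: prefix


def dav(name: str) -> str:
    """Qualified ElementTree tag in the DAV: namespace."""
    return "{%s}%s" % (DAV, name)


def dasl_listing_body(folder_href: str,
                      props=("displayname", "getlastmodified")) -> bytes:
    """Build a DAV:basicsearch body selecting `props` of the folder's direct children."""
    req = ET.Element(dav("searchrequest"))
    search = ET.SubElement(req, dav("basicsearch"))
    prop = ET.SubElement(ET.SubElement(search, dav("select")), dav("prop"))
    for name in props:
        ET.SubElement(prop, dav(name))
    scope = ET.SubElement(ET.SubElement(search, dav("from")), dav("scope"))
    ET.SubElement(scope, dav("href")).text = folder_href
    ET.SubElement(scope, dav("depth")).text = "1"  # direct children only
    # No DAV:where element: every resource in the scope is a candidate.
    return ET.tostring(req, encoding="utf-8", xml_declaration=True)


def send_search(host: str, port: int, path: str, body: bytes) -> bytes:
    """Send the body as a WebDAV SEARCH request and return the raw response."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("SEARCH", path, body=body,
                     headers={"Content-Type": "text/xml; charset=utf-8"})
        return conn.getresponse().read()
    finally:
        conn.close()
```

The response is a multistatus document, like a PROPFIND response, so a frontend can parse it the same way.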

But keep in mind that if you use any (WebDAV) application other than your
home-built one with the repository, chances are it will request a
document listing.

Exactly how many documents you can have in one folder also depends on
your backend. For example, filesystem performance on Windows is terrible
if you have a lot of documents in one folder. Database backends don't
have this extra overhead.

Imho you'll just have to run some tests to find out what is still usable
for your application.
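A minimal timing harness for such tests, assuming you wrap your own listing call (for example a PROPFIND with Depth: 1, or a DASL search) in a function that returns the entries. `time_listing` and `fake_list` are hypothetical names; the fake lister only stands in for a real repository call:

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def time_listing(list_folder: Callable[[str], Sequence[str]],
                 folder: str, repeats: int = 5) -> Dict[str, float]:
    """Call list_folder(folder) `repeats` times and report wall-clock timings."""
    durations = []
    entries = 0
    for _ in range(repeats):
        start = time.perf_counter()
        result = list_folder(folder)
        durations.append(time.perf_counter() - start)
        entries = len(result)
    return {
        "entries": entries,
        "median_s": statistics.median(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical stand-in; replace with a real PROPFIND/DASL call.
    def fake_list(folder: str) -> list:
        return [f"{folder}/doc{i}.xml" for i in range(10000)]

    print(time_listing(fake_list, "/content/2008/11/13", repeats=3))
```

Running it against folders of increasing size on your own backend shows where the listing stops being acceptable.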

Regards,
Bart

-- 
Hippo B.V.  -  Amsterdam
Oosteinde 11, 1017 WT, Amsterdam, +31(0)20-5224466

Hippo USA Inc.  -  San Francisco
101 H Street, Suite Q, Petaluma CA, 94952-3329, +1 (707) 773-4646
-----------------------------------------------------------------
http://www.onehippo.com   -  [EMAIL PROTECTED]
-----------------------------------------------------------------
********************************************
Hippocms-dev: Hippo CMS development public mailinglist

Searchable archives can be found at:
MarkMail: http://hippocms-dev.markmail.org
Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
