One of the customers I work with follows the same pattern Scott
outlined--symlinking the top-level directories to separate mount points
(this is for a repo of ~20 million objects). For that client, we actually
provided a custom path ID mapper implementation, but essentially, it's the
same idea.
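
To illustrate the idea (not our client's mapper, and not the Fedora code itself), here is a minimal Python sketch. It assumes the top-level directory is the first hex digit of an MD5 digest of the object ID, and that the 16 directories are assigned four per mount point as in Scott's setup. The names MOUNTS, DIR_TO_MOUNT, top_level_dir, mount_for and distribution, and the sample IDs, are made up for the example.

```python
import hashlib
from collections import Counter

# Assumption: four mount points with four top-level hex directories each.
# Which directory goes to which mount is illustrative only.
MOUNTS = ["data01", "data02", "data03", "data04"]
HEX_DIGITS = "0123456789abcdef"
DIR_TO_MOUNT = {d: MOUNTS[i // 4] for i, d in enumerate(HEX_DIGITS)}


def top_level_dir(blob_id: str) -> str:
    """Return the first hex digit of the MD5 of the id (assumed scheme)."""
    return hashlib.md5(blob_id.encode("utf-8")).hexdigest()[0]


def mount_for(blob_id: str) -> str:
    """Return the mount point that would hold this id's top-level directory."""
    return DIR_TO_MOUNT[top_level_dir(blob_id)]


def distribution(ids):
    """Count how many ids land on each mount point."""
    return Counter(mount_for(i) for i in ids)


if __name__ == "__main__":
    # Hypothetical ids, only to show how the counts spread out.
    sample = [f"info:fedora/demo:{n}" for n in range(100_000)]
    for mount, count in sorted(distribution(sample).items()):
        print(mount, count)
```

Because the digest is effectively uniform, each mount point should receive close to a quarter of the IDs, although the bytes used per mount still depend on object sizes.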

On Apr 19, 2013, at 12:47 AM, Scott Prater <[email protected]> wrote:

> We're using the approach you describe here at UW Madison.  Basically, we 
> have a number of mounted file systems, with a structure that looks like 
> this:
> 
> /fedora/datamp/data01
> /fedora/datamp/data02
> /fedora/datamp/data03...
> 
> Then we have directories with symbolic links into the mounted file 
> systems:
> 
> /fedora/data/objects/a -> /fedora/datamp/data01/objects/a
> /fedora/data/datastreams/a -> /fedora/datamp/data01/datastreams/a
> /fedora/data/objects/b -> /fedora/datamp/data02/objects/b
> /fedora/data/datastreams/b -> /fedora/datamp/data02/datastreams/b
> /fedora/data/objects/c -> /fedora/datamp/data03/objects/c
> /fedora/data/datastreams/c -> /fedora/datamp/data03/datastreams/c
> 
> And in our akubra.xml file, we have the object and datastream roots and 
> hash paths configured:
> 
> <bean name="fsObjectStore" class="org.akubraproject.fs.FSBlobStore" 
> singleton="true">
>     <constructor-arg value="urn:example.org:fsObjectStore"/>
>     <constructor-arg value="/fedora/data/objects"/>
> </bean>
> 
> <bean name="fsObjectStoreMapper" 
> class="org.fcrepo.server.storage.lowlevel.akubra.HashPathIdMapper" 
> singleton="true">
>     <constructor-arg value="#/##/##"/>
> </bean>
> 
> <bean name="fsDatastreamStore" class="org.akubraproject.fs.FSBlobStore" 
> singleton="true">
>     <constructor-arg value="urn:example.org:fsDatastreamStore"/>
>     <constructor-arg value="/fedora/data/datastreams"/>
> </bean>
> 
> <bean name="fsDatastreamStoreMapper" 
> class="org.fcrepo.server.storage.lowlevel.akubra.HashPathIdMapper" 
> singleton="true">
>     <constructor-arg value="#/##/##"/>
> </bean>
> 
> We have four mount points on our production machine, and four top-level 
> directories allocated per mount point.
> 
> One of the beauties of hashed directory and file paths is that all the 
> file systems should fill up evenly: the hash ensures that objects are 
> distributed across all the file systems.
> 
> /fedora/datamp/data01:  68%
> /fedora/datamp/data02:  68%
> /fedora/datamp/data03:  68%
> /fedora/datamp/data04:  69%
> 
> -- Scott
> 
> On 04/18/2013 10:42 AM, Gary Phillips wrote:
>> Hello,
>> 
>> I've spent some time looking at akubra and the HashPathIdMapper to get a
>> feel for how we would distribute our datastreamStore over multiple file
>> systems.  The default configuration (##) creates 256 potential
>> directories.  Changing that to # (or something like #/##) would give us
>> 16 top level sub-directories, which we could work with.  However, there
>> is another issue, in that I don't see how we can easily predict how each
>> of those directories would grow, if I am understanding how the files are
>> distributed across the directories.
>> 
>> I assume one solution might involve sym linking the top level
>> directories over to a few directories that each correspond to a mount point.
>> 
>> Have other people tackled this particular problem (and what solutions
>> did you come up with) or found another way around distributing large
>> amounts of Fedora data over multiple mount points?  Thanks in advance.
>> 
>> 
>> 
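
For the symlink layout in Scott's message above, the setup could be scripted roughly as follows. This is only a sketch under assumptions: the function name link_top_level_dirs and the assignment mapping are hypothetical, the paths follow Scott's example, and it should run before any data is written under the data root.

```python
import os

# The two store roots from the akubra.xml example in the quoted message.
STORES = ("objects", "datastreams")


def link_top_level_dirs(data_root, mount_root, assignment):
    """Symlink <data_root>/<store>/<dir> to <mount_root>/<mount>/<store>/<dir>.

    assignment maps a top-level directory name (e.g. "a") to a mount
    directory name (e.g. "data01"). Returns the list of links created.
    """
    created = []
    for store in STORES:
        os.makedirs(os.path.join(data_root, store), exist_ok=True)
        for name, mount in sorted(assignment.items()):
            target = os.path.join(mount_root, mount, store, name)
            os.makedirs(target, exist_ok=True)
            link = os.path.join(data_root, store, name)
            # Leave anything that already exists at the link path alone.
            if not os.path.lexists(link):
                os.symlink(target, link)
                created.append(link)
    return created


if __name__ == "__main__":
    # Illustrative assignment matching the first three links Scott lists.
    links = link_top_level_dirs(
        "/fedora/data", "/fedora/datamp",
        {"a": "data01", "b": "data02", "c": "data03"},
    )
    print("\n".join(links))
```

Running it a second time creates nothing new, since existing paths are skipped rather than replaced.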


_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users