Hi

On Thu, 2009-10-29 at 15:39 +0100, Lodewijk Bogaards wrote:
> Hi Asger,
>
> On 10/26/09 12:46 PM, "Asger Askov Blekinge" <[email protected]>
> wrote:
>
> > I think I can follow the design you propose, even though I am not really
> > into the database code part of Fedora.
> > To retell it, so you can check my understanding: There is some config in
> > DefaultDOManager.dbspec that determines which part of a Fedora object is
> > cached in the database. You amend that config so that the user can
> > provide a config file, and additional content is cached.
> > That's all there is, right?
>
> Yes. The user can then, in that provided config file, select a part of the
> FOXML by XPath expression and map it onto a column of a table.
>
> > I am not against the idea, but I consider it a stopgap measure.
> > The problem you outline is that actually querying the FOXML files is too
> > slow in the Fedora design. You want a faster way to access the contents,
> > and thus you propose to store it in a database. So far I agree: the
> > Fedora backend is not fast for small queries (as the entire object is
> > parsed for any query), and some indexed frontend is sometimes required.
> > Now, I do not know the performance of the various open source XML
> > databases, but it sounds radically simpler to store/backup the FOXML
> > objects in an XML database than to write complex expressions for mapping
> > selected parts to a relational database.
>
> An XML database would be great indeed. But I disagree that it is radically
> simpler, in the sense that this was only 2 days of work (being pragmatic
> here). I doubt that having an XML database implementation would be 2 days
> of work. I suspect that a proper brainstorm session about how this should
> be handled and how it fits the Fedora architecture would already last
> two days.
> Maybe it would be possible, but then, I have to admit, I am
> not that familiar with XML databases and still doubt that they can really
> outperform an RDBMS, especially when you take only the parts of the FOXML
> that you need and add an index on the parts you want to select on. This
> is, I reckon, hard to outperform by any system other than an RDBMS.

Okay, so XML databases are conceptually simpler, but not simpler to get
working. They would be simpler to work with once the functionality is there
(no config file is needed to pick out the relevant bits, as all bits are in
the database). Probably slower.

>
> > Having such a database, which could be either a cache of the FOXML
> > files or the primary store for the FOXML files, would allow fast queries
> > about properties of the objects or datastreams. This should probably be
> > the design we work towards, but your idea could easily serve as the
> > current way of doing database integration while we have no XML database.
>
> I agree. This might help a few people, as it did us, and it provides an
> incentive to see what can be done about the querying part of Fedora. Maybe
> this shows an XML database would be the way to go for Fedora.
>
> This is not an idea anymore though. It works, it's solid and does not
> interfere with Fedora in any way.

And thanks for that contribution. I feel that it should go into Fedora. It
is useful functionality with no downside I can think of. Could you create
an issue for it in http://fedoracommons.org/jira/secure/BrowseProject.jspa
and attach the files? That is the way to proceed.

Regards

>
> Kind regards,
>
> Lodewijk
>
> > Regards
> >
> >
> > On Fri, 2009-10-16 at 20:32 +0200, Lodewijk Bogaards wrote:
> >> Hi,
> >>
> >> For speed reasons we wanted a database that contains the same information
> >> Fedora contains. I have emailed before (subject: gDatabase) that I found
> >> Fedora already has a feature to do so for the Dublin Core and some other
> >> digital object properties, and that with some work Fedora can be made to
> >> keep the database synchronized for its user-made XML data as well.
> >> Currently I have this working within Fedora.
> >>
> >> I am sending you the source, which was made on top of the Fedora 3.2.1
> >> source release, an example FOXML and a database schema.
> >>
> >> The idea is that DefaultDOManager.dbspec is extended with this line:
> >>
> >> <include href="server/config/custom-db.xml" />
> >>
> >> Then, in that file under the Fedora home directory, you can put your own
> >> database schema, which is an extension of the database schema used in the
> >> dbspec file.
> >>
> >> Columns get their data from value getters. Currently I have implemented
> >> one value getter, which uses an XPath query to get a value. This
> >> value-getting code does not necessarily run for all digital objects: it
> >> is possible to choose a content model and/or datastream ID that must be
> >> present for the tables to be updated by the digital object. Here is an
> >> example of a table with a column:
> >>
> >> <table name="easyFiles" contentModel="info:fedora/fedora-system:easyfile"
> >>        datastreamId="file">
> >>   <column name="filename" type="varchar(256)" notNull="true"
> >>           index="filename" default="-">
> >>     <value delimiterType="row" delimiter=",">
> >>       <valuegetter type="xPath" xPath="//easyfile:filename"
> >>           nsPrefix="easyfile" nsUri="http://easy.dans.knaw.nl/files"
> >>           delimiterType="normal" delimiter="," />
> >>     </value>
> >>   </column>
> >> </table>
> >>
> >> An XPath query may return several values. For that, two kinds of
> >> delimiters may be used: a row delimiter (meaning one row is created for
> >> each value) and a normal delimiter (meaning the values are joined into a
> >> single string, separated by the delimiter). Also, a value element may
> >> contain several valuegetter elements, whose results can be delimited in
> >> the same two ways.
> >> If two columns each return two rows, the corresponding rows are combined
> >> into single table rows.
> >> A default value may also be used for a second valuegetter. This makes it
> >> possible to compose rows in almost any way one wants from Fedora data.
> >>
> >> A pid must always be present, but it does not need to be the primary key
> >> (primaryKey attribute of the table).
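To make the delimiter rules above concrete, here is a minimal Python sketch of how a value getter and the two delimiter types could turn one datastream into table rows. It is not the Fedora implementation (which is Java and evaluates full XPath). It assumes ElementTree's limited XPath subset, models the config elements as plain dicts, treats "normal" as joining values with the delimiter, and combines columns row by row, padding a shorter column with its default. The helper names, the sample datastream and the pid easy:1 are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch, not Fedora code: config elements are modelled as dicts.


def run_xpath(xml_text, xpath, ns_prefix, ns_uri):
    """Return the text of every element matching `xpath`.

    Assumption: ElementTree supports only a subset of XPath, so a leading
    "//" is rewritten to ".//" (all descendants of the root element).
    """
    root = ET.fromstring(xml_text)
    path = "." + xpath if xpath.startswith("//") else xpath
    return [el.text or "" for el in root.findall(path, {ns_prefix: ns_uri})]


def combine(values, delimiter_type, delimiter):
    """'row': keep one entry per value (each later becomes a table row).
    'normal' (assumed semantics): join all values into one delimited string."""
    if delimiter_type == "row":
        return list(values)
    return [delimiter.join(values)] if values else []


def column_values(xml_text, column):
    """Run every valuegetter of a column, then apply the delimiter of the
    enclosing <value> element; fall back to the column default if empty."""
    collected = []
    for g in column["getters"]:
        found = run_xpath(xml_text, g["xPath"], g["nsPrefix"], g["nsUri"])
        collected.extend(combine(found, g["delimiterType"], g["delimiter"]))
    values = combine(collected, column["delimiterType"], column["delimiter"])
    return values or [column["default"]]


def table_applies(table, content_models, datastream_ids):
    """Only update a table for objects with the configured content model
    and/or datastream; a missing attribute means no restriction."""
    cm, ds = table.get("contentModel"), table.get("datastreamId")
    return (cm is None or cm in content_models) and (
        ds is None or ds in datastream_ids)


def build_rows(pid, columns, xml_text):
    """Combine the columns row by row; a shorter column is padded with its
    default. The pid is added to every row."""
    per_column = [column_values(xml_text, c) for c in columns]
    rows = []
    for i in range(max(len(v) for v in per_column)):
        row = {"pid": pid}
        for c, vals in zip(columns, per_column):
            row[c["name"]] = vals[i] if i < len(vals) else c["default"]
        rows.append(row)
    return rows


NS = "http://easy.dans.knaw.nl/files"
DATASTREAM_XML = f"""<file xmlns:easyfile="{NS}">
  <easyfile:filename>a.txt</easyfile:filename>
  <easyfile:filename>b.pdf</easyfile:filename>
</file>"""


def filename_column(getter_delimiter_type):
    """The "filename" column from the quoted config, as a dict."""
    return {
        "name": "filename", "default": "-",
        "delimiterType": "row", "delimiter": ",",
        "getters": [{"xPath": "//easyfile:filename", "nsPrefix": "easyfile",
                     "nsUri": NS, "delimiterType": getter_delimiter_type,
                     "delimiter": ","}],
    }


if __name__ == "__main__":
    table = {"contentModel": "info:fedora/fedora-system:easyfile",
             "datastreamId": "file"}
    if table_applies(table, {"info:fedora/fedora-system:easyfile"}, {"file"}):
        # [{'pid': 'easy:1', 'filename': 'a.txt,b.pdf'}]
        print(build_rows("easy:1", [filename_column("normal")], DATASTREAM_XML))
        # [{'pid': 'easy:1', 'filename': 'a.txt'}, {'pid': 'easy:1', 'filename': 'b.pdf'}]
        print(build_rows("easy:1", [filename_column("row")], DATASTREAM_XML))
```

With delimiterType="normal" on the valuegetter, as in the quoted config, both filenames end up in a single row; switching it to "row" yields one row per filename, and an object without any filename gets the column default "-".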
> >> It is thus up to the user how the data is composed into tables. If the
> >> user makes a mistake, an SQLException is thrown and the digital object is
> >> not ingested/updated. This forms another kind of safety net, one that does
> >> not necessarily work as well when the database is filled from within the
> >> user's application.
> >>
> >> With this simple system it is possible to do almost any kind of database
> >> synchronization based on Fedora data. I have seen many projects based on
> >> Fedora that employ a database alongside Fedora in order to speed up the
> >> querying process. I therefore think this might be useful to many.
> >>
> >> Of course, the search interface that comes with Fedora could also be
> >> extended to make use of this new feature, but since that is not a need
> >> for our project at the moment, I have not taken the time to do so.
> >>
> >> I would be very pleased if this could become part of subsequent Fedora
> >> releases. Hopefully others think so too.
> >>
> >> Kind regards,
> >>
> >> Lodewijk Bogaards
> >>
> >>
> >

_______________________________________________
Fedora-commons-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
