Clay Barnes wrote:

>I have been thinking lately that though we certainly need to do
>cleanup of the various bugs and such relating to the storage layer,
>perhaps now is a good time to review and discuss the plans for the
>semantic layer so that any outstanding concerns can be thoroughly
>discussed and resolved before we get close to starting actual
>work on that portion of Reiser4.  Remember, we have a real chance at
>being the first semantic storage system with a significant user base,
>and that places a terrible pressure for perfection on us (and I use 'us'
>loosely, since I don't have nearly the code skills in C needed to dare
>touch source in non-trivial ways---I hope however that between my CS and
>Linguistics degrees, I'll be able to at least contribute some ideas).
>If we're first out of the gate, but we have some significant flaw in
>design, we're deeply endangered.  People will wait for our correction of
>it (which may be impossible if it's a fundamental or debated problem),
>or for another system that has less critical flaws.
>
>These are my critical concerns.  I know some of these have been addressed
>before, but this keeps anything from being skipped under the assumption
>that it has already been resolved.
>1) Scope
> a) Should the semantic content of files be purely user-defined?
>

Yes.
> b) Should the full extricable content of a file be read into semantic
> space?
>

If the user wants that.  The user should configure the auto-indexer he
has selected to work as he desires and to apply to the files he chooses.
By default, indexing should be delayed (for example, until the repacker
runs at night) so that we only index what will be around for a while.
This is for performance reasons.

> c) If so, should there be a separation of the two forms of content?
> d) How would we address the two in a simple, user-transparent way?
>2) Storage
> a) How do we store the semantic data so it is very rapidly accessible
> and easy to update, especially if we decide to use the full textual
> content of parsable files?
>3) Changes
> a) Should we instantly index changes at full capacity, or should we
> queue files needing re-indexing for a very low resource daemon to
> process?
> b) If we use the latter, how do we avoid disagreement between newly
> changed/created files and the semantic actions regarding them while the
> daemon works?
> c) If we use the former, how do we minimize the impact of this sudden
> spike in resources on the user without risking letting the index and
> data get out of sync?
>4) Portability
> a) Should we provide a way to export semantic data when archiving to
> formats whose standards prevent the use of Reiser4 (such as DVD)?
> b) How do we handle exports from a partial filesystem, if we decide to
> provide export capabilities?
> c) Should we provide the ability to import from competing semantic
> systems?  Export?
>5) Code revisions
> a) With emerging formats, updates to formats and the numerous ways
> file standards change, how do we provide easy addition and updates to
> the filters we use to index files?
> b) Should we provide a simple user-editable means to change/augment
> filters?
> c) Can these both be resolved by placing the actual filters in
> userspace/filesystemspace instead of into the code?
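To make the delayed-indexing default from the reply to 1b (and the
queueing option in 3a) concrete, here is a minimal sketch in Python.
Everything in it is an assumption for illustration, not Reiser4 code:
the class name DeferredIndexQueue, the min_age parameter, and the
indexer callback are all hypothetical, and timestamps are passed in
explicitly so the logic can be followed without a real clock.

```python
from typing import Callable, Dict, List


class DeferredIndexQueue:
    """Hypothetical queue of files waiting out a settling delay before indexing."""

    def __init__(self, indexer: Callable[[str], None], min_age: float) -> None:
        self.indexer = indexer  # the user's chosen auto-indexer (assumed callback)
        self.min_age = min_age  # seconds a file must stay unchanged
        self.pending: Dict[str, float] = {}  # path -> time of last change

    def mark_changed(self, path: str, now: float) -> None:
        # A fresh change restarts the settling delay for this file.
        self.pending[path] = now

    def mark_deleted(self, path: str) -> None:
        # A file that disappears before the batch runs is never indexed.
        self.pending.pop(path, None)

    def run_batch(self, now: float) -> List[str]:
        # Meant to be called from a low-priority job (for example, nightly).
        # Only files unchanged for at least min_age seconds are indexed.
        ready = sorted(
            path for path, changed in self.pending.items()
            if now - changed >= self.min_age
        )
        for path in ready:
            self.indexer(path)
            del self.pending[path]
        return ready
```

In this sketch short-lived files cost nothing, since they are dropped
from the queue before any indexing work is done.  It does not answer
3b: until run_batch has processed a file, queries against the index
would still see stale data for it.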
>
>I hope I haven't overstepped my relevance, and my apologies if I have,
>but I just wanted to raise some concerns while they are easy to
>address---before the code is started.
>
>Further disclaimer: I'm at work, so I may have been a little hasty
>writing this (though technically, I'm *supposed* to be researching
>semantic storage systems for our documents, so I'm not really goofing
>off), so there may be errors from my minimal review/revision.
>
>Thanks,
>Clay