Re: database filesystem duality

carmen Tue, 20 Jul 2010 11:56:42 -0700

it began with camping. Matju had been using Ruby in Gridflow for ages before, 
so he pointed me to the poignant guide, and i noticed the announcement on 
redhanded and tried it out


> store them in some sort of indexy thing, where we could use filesystem locks
> to keep from writing over eachother, and garbage collect / compress every
> now and then. That could work really well, and could be nice pure ruby.
> Mmmm. Is this crazy? Am I a nut for thinking that a simple multiprocess safe
> key/value store would actually be really easy to do? I've played with the
> filesystem as a storage medium a fair bit.. it seems like it should be
> almost trivial! Maybe I should make this right now!


eventually i had wiped out all the varying parts with replacements, but it is 
important to remember Camping provided the scaffolding to get off the ground

went to Ruby 1.9.x because of proper lexical scoping of blocks (mainly) + it's fast

but that broke Markaby.. and there was all this code in there with Builder and 
such and god knows what i was supposed to fix (multiplied by metaprogramming 
tweak-ness)

Ruby's Hash/Array constructors obviated a custom template-language parser or 
meta-methodery hacks (magic?)
 http://element.rubyforge.org/git?p=element.git;a=blob;f=ruby/H.rb


then there were sqlite databases being locked by other processes, mysql servers 
that weren't running or had a wrong password (or were hard-powered-off and 
required myisamchk runs). then redland's SWIG wrappers segfaulting ruby with 
memory errors


back to FS. i guess the "E" class is sort of a "jquery for a filesystem" sitting 
at the convergence of HTTP URIs and filesystem paths

so i want to read today's email (delivered by getmail, with a 1 line procmailrc 
rule to put into dirs by date, and cloud-persisted across phones/netbooks with 
rsync/ceph/nfs)

so GET /mail, it goes to this:

fn '/mail/GET',->e,r{[303,{Location: '/m/'+(Time.now.strftime '%Y/%m/%d')+
  '/*?'+(r ? r['QUERY_STRING'] : '')}]}

which constructs today's path, and redirects:

 GET /m/2010/07/20/*?view=threads
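
a minimal standalone sketch of that redirect, for illustration only. mail_redirect is a hypothetical name, the now parameter is added so the date can be pinned, and the Rack-style three-element response is an assumption, not how fn handlers are actually wired up:

```ruby
# Hypothetical stand-in for the '/mail/GET' handler above.
# env is a Rack-style environment Hash (or nil); now is injectable for testing.
def mail_redirect(env, now = Time.now)
  query = env ? env['QUERY_STRING'].to_s : ''
  location = '/m/' + now.strftime('%Y/%m/%d') + '/*?' + query
  [303, { 'Location' => location }, []]
end

status, headers, _body = mail_redirect({ 'QUERY_STRING' => 'view=threads' },
                                       Time.utc(2010, 7, 20))
puts status
puts headers['Location']
```

the query string is passed through untouched, so the view choice survives the redirect.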

there are no 'routes', just a mapping from URI to resourceSet, which includes 
globbing, 'fragments' of documents (after #), and depth-first traversal (for 
pagination of large quantities of stuff, or sorted values)


so that globs all of today's mails, extracts the triples and creates a (Hash) 
model alive for the request. views are specified in the QS,
 so with ?view=threads, you get a basic overview:

http://i574.photobucket.com/albums/ss187/ix9/hyper/2010-03-27-051943_1280x800_scrot.png

triple sources are functions that yield 3 values, and exist for most of the 
common things. 
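
roughly what a triple source and the resulting Hash model could look like. this is a sketch, not the real code: header_triples, model_from and the example.org predicate URIs are all made up for illustration, and the subject => predicate => [objects] layout is an assumption:

```ruby
# Hypothetical triple source: yields (subject, predicate, object) per header.
# The predicate URIs are illustrative only.
def header_triples(uri, headers)
  return enum_for(:header_triples, uri, headers) unless block_given?
  headers.each do |name, value|
    yield uri, 'http://example.org/header#' + name.downcase, value
  end
end

# Fold any triple source into a Hash model: subject => predicate => [objects].
def model_from(triples)
  model = Hash.new { |h, subj| h[subj] = Hash.new { |g, pred| g[pred] = [] } }
  triples.each { |subj, pred, obj| model[subj][pred] << obj }
  model
end

headers = { 'Subject' => 'hi', 'From' => 'a@example.org' }
p model_from(header_triples('/m/2010/07/20/1.msg', headers))
```

since every source yields the same 3-value shape, models from different sources can be merged into one Hash for the request.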


so your message, [email protected]

has an ID and URI, and the filesystem can't just store this as is, unless you 
want 3 million files in a dir. so it uses something git-like:

irb(main):005:0> E('[email protected]').d
=> "/var/E/ee/dc/QUFOTGtUaW10VlYwQzM5a3lQSllKLXZlMXVYSEdSSDFUc0M2eDNROEctSXBCQG1haWwuZ21haWwuY29t"

does its best to use a path similar to the URI, to not nuke everything outright

irb(main):006:0> E('http://camping.org').d
=> "/var/http://camping.org"
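
a rough sketch of that kind of mapping. storage_path is a hypothetical name, and using the first four hex digits of a SHA1 for the two directory levels is a guess (nothing above says how 'ee/dc' is derived), as is the rule for deciding what counts as URI-like:

```ruby
require 'digest'

ROOT = '/var' # assumed storage root, taken from the irb output above

# Hypothetical mapping from an identifier to a storage path.
def storage_path(id, root = ROOT)
  # Assumption: anything that already looks like a URL or path is kept as-is.
  if id.start_with?('/') || id.include?('://')
    return root + '/' + id.sub(%r{\A/+}, '')
  end

  # Otherwise fan out on a hash prefix (git-style) so no single directory
  # fills up, and use URL-safe base64 of the id as the file name.
  hex  = Digest::SHA1.hexdigest(id)
  name = [id].pack('m0').tr('+/', '-_')
  [root, 'E', hex[0, 2], hex[2, 2], name].join('/')
end

puts storage_path('http://camping.org')
puts storage_path('some-message-id@example.com')
```

two hex-pair levels give 65536 buckets, so 3 million files works out to under 50 per directory on average.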


in addition to these paths, there's a path of metadata _about_ this path

irb(main):007:0> E('http://camping.org').u
=> #<E:0x000000015ebfd8 @uri="/http://camping.org/<>", @graph=nil>

so, in this way, you can create indexed properties:

eg, mail references are ugly index paths like:

/usr/src/index/<>/http:/rdfs.org/sioc/ns#reference/<>/E/e0/43/MTI3OTYyODYzMi4zMjcxLjEwLmNhbWVsQG1pZGdhcmQ=


so when i request a message, i provide a query in the QS:

 fn 'data/thread',->d,_,m{d.walk SIOC+'reference',m}


this walks those index paths and constructs the entire thread

 def walk p,m={},v={}
    m.merge! memoModel
    v[uri]=true
    ((attr p)||[]).concat(((E p).po self)||[]).map{|r|
      r.E.walk p,m,v if !v[r.uri]}
    m
 end
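
a toy version of the same traversal over a plain Hash, in case the one-liner is hard to follow. REFS and thread_ids are hypothetical, and it assumes the index lookup returns the messages that reference the current one, so edges are followed in both directions:

```ruby
# Hypothetical in-memory stand-in for the filesystem index:
# message id => ids it references.
REFS = {
  'c' => ['b'],
  'b' => ['a'],
  'd' => ['a'],
  'a' => []
}

# Collect every message connected to id by a reference, in either direction.
# seen plays the role of v in walk above, so cycles cannot recurse forever.
def thread_ids(id, refs, seen = {})
  seen[id] = true
  outgoing = refs.fetch(id, [])                                 # what this message references
  incoming = refs.select { |_, refd| refd.include?(id) }.keys   # what references it
  (outgoing + incoming).each do |other|
    thread_ids(other, refs, seen) unless seen[other]
  end
  seen.keys
end

p thread_ids('b', REFS).sort
```

starting from any message in the thread reaches the whole thread, which is why requesting a single message is enough.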

..there are functions to go to/from memory models, look up FS indexes, and so 
on, in probably camping-style (i've been told my code is 'obfuscated' anyways)

  some other doc @ http://blog.whats-your.name/public/carmen.html


creating a 265 message thread including finding all the messages and rendering 
a view takes about a second on my laptop, which is fine for my needs. you could 
use the resourceSet X mtimes as a cache key
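
one way that cache key could be computed, as a sketch. cache_key is a hypothetical helper and SHA1 over "path:mtime" lines is just one reasonable choice:

```ruby
require 'digest'
require 'tmpdir'

# Hypothetical helper: derive a cache key from file paths and their mtimes.
# Adding, removing or touching any file in the set changes the key.
def cache_key(paths)
  stamp = paths.sort.map { |path| "#{path}:#{File.mtime(path).to_f}" }.join("\n")
  Digest::SHA1.hexdigest(stamp)
end

Dir.mktmpdir do |dir|
  file = File.join(dir, 'a.msg')
  File.write(file, 'hello')
  before = cache_key(Dir.glob(File.join(dir, '*')))
  File.utime(Time.at(0), Time.at(0), file) # pretend the file was modified
  after = cache_key(Dir.glob(File.join(dir, '*')))
  puts before != after
end
```

the rendered view can then be stored under that key and reused until a mail in the set changes.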

since all data is (convertible to/from) RDF you could go crazy with 4store and 
SPARQL if you needed more insane indexing options

so yeah, let me know what you come up with, i'm interested in checking it out


if a darn OS booted, you have a FS.., 
