Re: [Moblin Dev] Meta data storage/management

Øyvind Kolås Fri, 05 Sep 2008 04:25:01 -0700

On Wed, Sep 03, 2008 at 05:41:45PM -0700, Jimmy Huang wrote:
> I completed a design document draft of Content Manager for Moblin 2.0,
> which is part of the Application and Framework infrastructure.  It is
> based on Meta Tracker (Trackerd) and SQLite.
> 
> Please review the document and provide feedbacks, or comments are
> welcome.  Thanks.

I've been looking into meta data management for moblin2 myself this week,
and I sum up my thoughts and findings in this mail.

Good meta data APIs will be instrumental in creating a good, innovative user
interface. My assertions about the different frameworks are based on my own
experiments in http://pippin.gimp.org/stuff/.

RDF is an extensible abstract technology similar to XML, but it is not XML.
Meta data is also not only about search; it can also store information like
the change in brightness or the crop desired on a photo.

We can use ontologies (vocabularies for describing different types of data like
files, images, videos and applications) specified by Xesam, as well as other
ontologies originating within the semantic web communities. We will probably
also invent meta data of our own, like the brightness and contrast as well as
cropping and sharpening applied to an image in the photo manager.
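
To make the mixing of vocabularies concrete, here is a minimal sketch in Python
of photo meta data held as plain (subject, predicate, object) triples. The
"moblin:" prefix and its property names are made up for illustration, "dc:"
stands for Dublin Core, and the file URI and values are hypothetical.

```python
# Meta data as (subject, predicate, object) triples.
# "moblin:" is a hypothetical in-house vocabulary; the photo URI is an example.
photo = "file:///home/user/photos/beach.jpg"

triples = {
    (photo, "dc:title", "Beach at sunset"),
    (photo, "dc:contributor", "Alice"),
    (photo, "dc:contributor", "Bob"),
    (photo, "moblin:brightness", "0.15"),
    (photo, "moblin:crop", "10,20,640,480"),
}

def objects(subject, predicate):
    """Return all objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

print(objects(photo, "moblin:crop"))
print(objects(photo, "dc:contributor"))
```

Standard and home-grown properties sit side by side on the same subject, so
the photo manager's edits need no separate storage mechanism.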

We need to do more than feed meta data in and query what is there; we also
need DOM-like access to traverse the arcs of the graph when creating
visualizations and user interfaces.
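
What DOM-like access could look like, as a rough sketch: a tiny in-memory
graph that can be walked node by node in both directions. The class, its
method names and the "album:", "photo:", "person:" identifiers are
hypothetical and not taken from any of the libraries reviewed below.

```python
from collections import defaultdict

class Graph:
    """Tiny in-memory triple graph with node-to-node navigation (hypothetical API)."""

    def __init__(self):
        self._out = defaultdict(list)   # subject -> [(predicate, object)]
        self._in = defaultdict(list)    # object  -> [(predicate, subject)]

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))
        self._in[obj].append((predicate, subject))

    def targets(self, subject, predicate):
        """Follow arcs labelled `predicate` out of `subject`."""
        return [o for p, o in self._out.get(subject, []) if p == predicate]

    def sources(self, predicate, obj):
        """Follow arcs labelled `predicate` backwards into `obj`."""
        return [s for p, s in self._in.get(obj, []) if p == predicate]

g = Graph()
g.add("album:holiday", "moblin:contains", "photo:1")
g.add("album:holiday", "moblin:contains", "photo:2")
g.add("photo:1", "dc:creator", "person:alice")
g.add("photo:2", "dc:creator", "person:alice")

# Walk album -> photos -> creators, as a UI building a view might.
creators = {c for photo in g.targets("album:holiday", "moblin:contains")
            for c in g.targets(photo, "dc:creator")}
photos_by_alice = g.sources("dc:creator", "person:alice")
```

The point is that a user interface can start at one node and follow arcs
outwards without composing a query for every step.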

Meta data is not only about search. Full text indexing is a separate issue,
and a full text index should be stored in a separate database. We might be
able to do without such functionality anyway.

Potential RDF storage frameworks
================================

What follows is a review of the candidate libraries I've looked at in greatest
depth (I've gone through most of the C based RDF libraries, actively developed
or not, that I've found with freshmeat and google). The three listed here are
the ones I ended up finding most relevant and studying in most detail.

librdf (redland)
----------------
link: http://librdf.org/
pros: - well tested and documented, uses RDF natively.
      - DOM-like API to navigate the graph.
      - abstraction glue layer.
      - multiple backends; could be extended with a dedicated mobile
        backend like TT.
      - supports multiple, pluggable query languages, which allows full
        reuse of the existing semantic web literature on developing with RDF.
      - works with multiple clients using libsql and some other backends;
        this means no marshalling of data over dbus but direct access from
        all clients.
      - written in C.
cons: - large.
      - verbose API (can be wrapped in macros or hidden behind an
        abstraction layer).
      - the library does not do file locking, at least not on Berkeley DB
        files.
      - doesn't support concurrent access/syncing from multiple clients with
        the Berkeley DB backend (can be added with transactions). A native
        quarked string/hashtable based approach similar to TT could be a
        better long term plan for mobile memory footprint/performance
        optimization.
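
The quarked string/hashtable idea from the last point can be sketched as
follows: each distinct string is interned once and replaced by a small integer
(a quark), and the index is a hashtable over those integers. This is a
minimal illustration in Python under my own assumptions; the class and
function names are hypothetical and are neither librdf nor TT API.

```python
class Quarks:
    """Intern strings as small integers (hypothetical helper)."""

    def __init__(self):
        self._ids = {}       # string -> quark
        self._strings = []   # quark  -> string

    def intern(self, s):
        """Return the quark for `s`, allocating a new one on first use."""
        q = self._ids.get(s)
        if q is None:
            q = len(self._strings)
            self._ids[s] = q
            self._strings.append(s)
        return q

    def peek(self, s):
        """Return the quark for `s` without allocating, or None if unknown."""
        return self._ids.get(s)

    def string(self, q):
        return self._strings[q]

quarks = Quarks()
index = {}   # hashtable: (subject quark, predicate quark) -> set of object quarks

def add(s, p, o):
    key = (quarks.intern(s), quarks.intern(p))
    index.setdefault(key, set()).add(quarks.intern(o))

def lookup(s, p):
    key = (quarks.peek(s), quarks.peek(p))
    return sorted(quarks.string(q) for q in index.get(key, ()))

add("photo:1", "dc:contributor", "Alice")
add("photo:1", "dc:contributor", "Bob")
add("photo:2", "dc:contributor", "Alice")
```

Repeated predicates and values are stored once however many statements use
them, which is where the footprint saving would come from.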

TT (from stuff)
---------------
link: http://pippin.gimp.org/tmp/tt.h.txt
      http://pippin.gimp.org/tmp/tt.c.txt
pros: - developed in-house.
      - we have plans for how to make it efficiently shared between
        processes, using mmap and tiny per-process indexes for efficient
        queries.
      - small DOM-like API, not as extensive as librdf's though.
      - fast since it works with an in-memory index; at a later stage the
        actual strings could be swapped out into an mmapped string storage
        shared between client processes.
cons: - experimental, with a minimal developer base.
      - no developer community.
      - few features, needs development.
      - will not work correctly for RDF when there are multiple objects with
        the same relation (e.g. multiple dc:contributor relations).
      - very simplistic query model.
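
Why multiple objects with the same relation matter can be shown with a small
Python sketch. It assumes a store with a single value slot per
(subject, predicate) pair, which is only one way such a limitation can arise
and is not TT's actual code; the "doc:1" identifier is made up.

```python
# Assumption: a store keyed by (subject, predicate) with a single value slot.
single = {}

def set_single(s, p, o):
    single[(s, p)] = o      # a second value for the same key overwrites the first

# RDF-style store: every (subject, predicate, object) statement is kept.
multi = set()

def add_multi(s, p, o):
    multi.add((s, p, o))

for name in ("Alice", "Bob"):
    set_single("doc:1", "dc:contributor", name)
    add_multi("doc:1", "dc:contributor", name)

single_result = single[("doc:1", "dc:contributor")]
multi_result = sorted(o for s, p, o in multi
                      if (s, p) == ("doc:1", "dc:contributor"))
```

In the single-slot store only the last contributor survives, while the
statement-based store keeps both, which is what RDF requires.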

Tracker
-------
link: http://www.gnome.org/projects/tracker/
pros: - used by others, improved by nokia.
      - responds in real time to filesystem changes.
      - has many extractors.
      - could potentially have its data store replaced with librdf, which
        could allow clients direct access to the nicer APIs there without
        going over dbus.
cons: - does more than what we need, and doesn't deal with RDF directly.
      - has its own high level abstractions for types.
      - lacks a high level DOM API.

A plan
======

- Create a separate, double bookkeeping librdf based database (using sqlite)
  that can be manually populated using a command line spidering tool.
- Allow application developers to store custom data and use the various front
  ends to librdf (query languages, higher level abstraction APIs etc.)
- Use tracker to monitor the file system and track additions, deletions and
  changes to files on disk.
- Update the double bookkeeping librdf database periodically, or upon changes
  reported by tracker, by patching tracker.
- Make tracker use librdf as its backend, thus getting rid of the double
  bookkeeping.
- Create a custom footprint optimized backend for librdf (similar in spirit to
  TT?) for memory constrained devices if necessary.
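
The first steps of the plan could be prototyped roughly like this. The sketch
uses Python's sqlite3 module directly as a stand-in for a librdf model on an
sqlite backend; the table layout, the "fs:" predicates and the function names
are assumptions made for illustration only.

```python
import os
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the double bookkeeping triple store."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS triples ("
               "subject TEXT, predicate TEXT, object TEXT, "
               "UNIQUE (subject, predicate, object))")
    return db

def update_file(db, path):
    """Replace the file-derived statements for one path.

    Can be called by the spider or by a change notification hook.
    """
    subject = "file://" + os.path.abspath(path)
    st = os.stat(path)
    for predicate, value in (("fs:size", str(st.st_size)),
                             ("fs:mtime", str(int(st.st_mtime)))):
        db.execute("DELETE FROM triples WHERE subject=? AND predicate=?",
                   (subject, predicate))
        db.execute("INSERT INTO triples VALUES (?, ?, ?)",
                   (subject, predicate, value))

def remove_file(db, path):
    """Drop every statement about a file that has been deleted."""
    subject = "file://" + os.path.abspath(path)
    db.execute("DELETE FROM triples WHERE subject=?", (subject,))

def spider(db, root):
    """Walk `root` and record size and modification time for each file."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            update_file(db, os.path.join(dirpath, name))
    db.commit()
```

Running spider() a second time only refreshes the existing statements, and
update_file()/remove_file() are the entry points a patched tracker could call
when it reports a change.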

This development plan makes it possible to parallelize development and avoid
having some branches of development depend on the others.

/Øyvind K.

--
▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△▼△
'Truth comes out of error more readily than confusion.' -- Francis Bacon
