Re: [Moblin Dev] Meta data storage/management

Jimmy Huang Mon, 08 Sep 2008 12:18:23 -0700

On Fri, 2008-09-05 at 12:23 +0100, Øyvind Kolås wrote:
> On Wed, Sep 03, 2008 at 05:41:45PM -0700, Jimmy Huang wrote:
> > I completed a design document draft of Content Manager for Moblin 2.0,
> > which is part of the Application and Framework infrastructure.  It is
> > based on Meta Tracker (Trackerd) and SQLite.
> > 
> > Please review the document; feedback and comments are
> > welcome.  Thanks.
> 
> I've been looking into meta data management for moblin2 myself this week
> and sum up thoughts related to my findings in this mail.
> 
> Good meta data APIs will be instrumental in being able to create a good
> innovative user interface. I am basing my assertions on different frameworks
> on my own experiments in http://pippin.gimp.org/stuff/.
> 
> RDF is an extendable abstract technology similar to XML, but it is not XML.
> Meta data is also not only about search; it can also store information
> like the change in brightness or the crop desired on a photo.


I agree. Applications need a way not only to query the extracted
metadata, but also to store metadata that is application-specific.
Tracker doesn't provide much for this: it does let the user attach a tag
string to a file, but you can't really attach labels to the information
dynamically.
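
To make "labels attached dynamically" concrete, here is a minimal sketch in
Python (chosen only for brevity). The MetadataStore class, its method names,
the file URI and the "photo:" labels are all hypothetical; this is not
Tracker's or librdf's API, just an illustration of storing extracted and
application-specific metadata side by side.

```python
from collections import defaultdict


class MetadataStore:
    """Toy in-memory store of (subject, label, value) statements."""

    def __init__(self):
        # subject URI -> {label -> value}; labels are free-form strings,
        # so an application can introduce new ones at any time.
        self._data = defaultdict(dict)

    def set(self, subject, label, value):
        self._data[subject][label] = value

    def get(self, subject, label, default=None):
        return self._data.get(subject, {}).get(label, default)

    def labels(self, subject):
        return sorted(self._data.get(subject, {}))


store = MetadataStore()
photo = "file:///home/user/photos/beach.jpg"  # hypothetical file URI

# Extracted metadata and application-specific edits live side by side.
store.set(photo, "dc:title", "Beach")
store.set(photo, "photo:brightness", 0.15)          # hypothetical label
store.set(photo, "photo:crop", (10, 20, 640, 480))  # x, y, width, height
```

A photo manager could then read back store.get(photo, "photo:crop") without
the store having known about that label in advance.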

> 
> We can use ontologies (a vocabulary for describing different types of data
> like files, images, videos and applications) specified by Xesam as well as
> use other ontologies originating within the semantic web communities. We will
> probably also invent meta data of our own, like the brightness and contrast as
> well as cropping and sharpening applied to an image in the photo manager.
> 
> We need to do more than feed meta data in as well as query what is there, we
> also need a DOM-like access to traverse the arcs of the graph when creating
> visualizations and user interfaces.
> 
> Meta data is not only about search, full text indexing is a separate
> issue and should be stored in a separate database. We might be able to do
> without such functionality anyways.

The project is not intended to focus on text indexing at all.  I
view what Tracker provides as useful bonus information, from which we may
also derive some key usages for the Moblin framework.

> 
> 
> Potential RDF storage frameworks
> ================================
> 
> What follows is a review of the candidate libraries I've looked at in greatest
> depth (I've gone through most C based RDF libraries, actively developed or
> not, that I've found with freshmeat and google). These are the three I have
> ended up finding most relevant and studying in deepest detail.
> 
> librdf (redland)
> ----------------
> link: http://librdf.org/
> pros: - well tested and documented, uses RDF natively.
>   - DOM like API to navigate the graph
>   - abstraction glue layer
>     - multiple backends, could be extended with a mobile dedicated
>       backend like TT.
>     - supports multiple and pluggable query languages, allows full reuse
>       of existing literature applicable to development using RDF from the
>       semantic web domain.
>   - works with multiple clients using libsql and some other backends, this
>     means no marshalling of data over dbus but direct access from all clients.
>   - written in C
> cons: - large
>   - verbose API (can be wrapped in macros, or an abstraction layer created.)
>   - does not do file locking, at least on Berkeley DB files.
>   - doesn't support concurrent access/syncing by multiple clients with the
>     Berkeley DB backend (can be added with transactions). But a native
>     quarked string/hashtable based approach similar to TT could be a better
>     long term plan for mobile memory footprint/performance optimization.
> 

I haven't worked with RDF libraries, and I am curious about the memory
footprint and scalability of the API for large datasets.

> TT (from stuff)
> ---------------
> link: http://pippin.gimp.org/tmp/tt.h.txt
>       http://pippin.gimp.org/tmp/tt.c.txt
> pros: - developed in-house
>       - we have plans for how to make it efficiently shared between processes
>         using mmap and per processes tiny indexes for efficient queries.
>       - small DOM like API, not as extensive as librdf though.
>       - fast since it works with an in memory index, at a later stage the
>         actual strings could be swapped out in a shared mmaped string
>         storage between client processes.
> cons: - experimental, with a minimal developer base
>       - No developer community.
>       - few features, needs development.
>       - will not work correctly for RDF when there are multiple objects with
>         the same relation (e.g. multiple dc:contributor relations).
>       - very simplistic query model.
> 

I like the simplicity of the API in TT.  I think we should provide a
similar set of APIs.
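
As a rough illustration of what such a small API could look like while
still handling several objects for the same relation (the dc:contributor
case listed under the cons), here is a sketch in Python. The Graph class
and its methods are hypothetical and are not taken from tt.h; the node
names are made up.

```python
from collections import defaultdict


class Graph:
    """Toy triple graph: each (subject, predicate) maps to a list of objects."""

    def __init__(self):
        self._out = defaultdict(lambda: defaultdict(list))

    def add(self, subject, predicate, obj):
        objects = self._out[subject][predicate]
        if obj not in objects:  # keep insertion order, skip duplicates
            objects.append(obj)

    def objects(self, subject, predicate):
        """All objects for one relation, e.g. every dc:contributor."""
        if subject not in self._out:
            return []
        return list(self._out[subject].get(predicate, []))

    def arcs(self, subject):
        """Yield (predicate, object) pairs leaving a node, for traversal."""
        for predicate, objects in self._out.get(subject, {}).items():
            for obj in objects:
                yield predicate, obj


g = Graph()
g.add("doc:report", "dc:contributor", "person:alice")
g.add("doc:report", "dc:contributor", "person:bob")
g.add("person:alice", "foaf:name", "Alice")
```

Here g.objects("doc:report", "dc:contributor") returns both contributors,
and g.arcs() gives the DOM-like step from a node to its neighbours that a
user interface would follow.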

> Tracker:
> --------
> link: http://www.gnome.org/projects/tracker/
> pros: - used by others, improved by nokia
>       - responds in real time to filesystem changes.
>       - has many extractors.
>       - could potentially have its data store replaced with librdf, which
>         could allow clients direct access to the nicer APIs there without
>         going over dbus.
> cons: - does more than what we need, and doesn't deal with RDF directly,
>       - own high level abstractions for types.
>       - lack of high level DOM API.
> 
> A plan
> ======
> 
>  - Create a separate double bookkeeping librdf based database (using sqlite)
>    that can be manually populated using a commandline spidering tool.
>  - Allow application developers to store custom data and use various front
>    ends to librdf (query languages, higher level of abstraction apis etc.)
>  - Use tracker for monitoring the file system and tracking additions,
>    deletions and changes to files on disk.
>  - Update double-bookkeeping librdf database periodically or upon changes from
>    tracker by patching tracker.
>  - Make tracker use librdf as its backend, thus getting rid of double
>    bookkeeping.
>  - Create a custom footprint optimized backend for librdf (similar in spirit
>    to TT?) for memory constrained devices if necessary.
> 
> This development plan makes it possible to parallelize development and avoid
> having some branches of development depend on the others.

It makes sense to do parallel development because Tracker is actively
developed and constantly optimized.  I think what needs to drive the
metadata storage design is how applications intend to use it.  Will
applications primarily use it to store metadata, or will they mostly be
querying?  Right now, it seems the only or primary usage of the content
manager is photo-album-like applications storing data.  What about other
applications, like a Contact Manager storing contact information?  We need
more input from the community and from Moblin application developers, with
ideas and feedback.

Scalability and performance are also a primary concern.  How does an
RDF-based backend perform when a query has to parse the RDF of
large datasets on a small device like a MID?  The current usage model
is that an application filters through tens of thousands of media items
and easily creates a tag cloud based on the tags provided.  It would be nice
to see some performance analysis of parsing/querying RDF using librdf.
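
As a starting point for that kind of measurement, here is a rough sketch
of the tag cloud query over a triple table in SQLite, using Python's
standard sqlite3 module. It does not exercise librdf at all; the table
layout, the "user:tag" predicate and the synthetic data are assumptions
made up for illustration, and timings on a desktop say little about a MID.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.execute("CREATE INDEX idx_pred_obj ON triples (predicate, object)")

# Synthetic data: 50,000 media items, each with 1 to 3 distinct tags.
rng = random.Random(0)
tags = ["tag%02d" % i for i in range(40)]
rows = []
for n in range(50000):
    subject = "file:///media/item%05d" % n  # hypothetical URIs
    for tag in rng.sample(tags, rng.randint(1, 3)):
        rows.append((subject, "user:tag", tag))
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)


def tag_cloud(connection):
    """Return (tag, count) pairs, most frequent first."""
    return connection.execute(
        "SELECT object, COUNT(*) FROM triples "
        "WHERE predicate = ? GROUP BY object "
        "ORDER BY COUNT(*) DESC, object",
        ("user:tag",),
    ).fetchall()


start = time.perf_counter()
cloud = tag_cloud(conn)
elapsed = time.perf_counter() - start
print("%d tags in %.1f ms" % (len(cloud), elapsed * 1000))
```

Running the same query shape against a librdf storage backend on target
hardware would be the comparison that actually answers the question.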

> 
> /Øyvind K.
> 

_______________________________________________
dev mailing list
[email protected]
https://www.moblin.org/mailman/listinfo/dev
