First, thanks for the advice!
Now let's see...

On Thu, 2014-01-02 at 13:33 +0000, Sam Thursfield wrote:
> Hi
> 
> On Mon, Dec 23, 2013 at 12:05 PM, fr33domlover <[email protected]> 
> wrote:
> > Hello,
> >
> > This question is quite general but I'd like to know how things in GNOME were
> > designed, in addition to any general advice you have.
> >
> > Assume a GUI application uses a central data backend, e.g. Tracker.
> > Currently Tracker is a single central storage service, i.e. one daemon
> > accepting connection and queries from apps.
> >
> > Now assume I want to have more than one database: for example a separate
> > database for some project I work on, a separate database for documenting
> > file system hierarchy, separate database for desktop items, etc. The common
> > approach, at least in SQL, is to have a single entry point, an SQL server,
> > which handles all the databases stored on the computer. All clients connect
> > to the same server.
> 
> Tracker stores all data in a single per-user database because there is
> no simple way to aggregate queries across multiple databases. The goal
> of Tracker is to provide a single query interface over all of that
> user's data, so this is unlikely to change overnight.

There are "federated queries", but Tracker is somewhat limited because
it is implemented using an SQL database. That's okay. The reason
seperate databases are important is:

1. It lets you keep each database as a separate file and move it
anywhere you like. It is somewhat similar to how running servers in
virtual machines makes them easy to separate, clone and back up.
Tracker is made just for desktop data, so in this specific case it's not
a big deal, I guess.

2. Some uses require special software. Again, not something Tracker does
at the moment, but here's an example: assume I want to create an RDF
query interface wrapping the file system, i.e. the folder hierarchy can
be queried with sophisticated SPARQL queries. Keeping the file system
mirrored in an SQL database would probably be very slow, so an efficient
backend is needed: a thin wrapper above the existing file system APIs
(see the sketch below).

So the question is whether a single server should handle all of these
separate databases. Tracker is definitely an important backend I'm
interested in.

> 
> If you want to use Tracker for stuff other than storing per-user
> desktop metadata, it's not impossible to get it to write to a
> different file. The 'tracker-sandbox' program inside 'utils/' in the
> Tracker source tree shows how to do this -- basically you need to
> start a separate D-Bus session with XDG_CACHE_HOME and XDG_DATA_HOME
> pointing to the alternate location.  That D-Bus session will have its
> own tracker-store process running which will read and write to the
> alternative location.

That's great for trying things in a sandbox!

> 
> > Here's another possible approach: If the number of databases is small, it
> > may be reasonable to launch a separate server for each database, and have
> > them communicate with each other through IPC. There would probably need to
> > be another service to route network traffic to the right server, but aside
> > from that, each database would have its own server (daemon) handling access
> > to it.
> >
> > Would the second approach be better at anything? Is it something people do,
> > something reasonable? Or the approach of a single server per computer is
> > clearly better?
> >
> > (Would the second approach be more safe or scalable or resilient etc.)
> 
> Storage and processing of complex non-desktop stuff is outside the
> normal use-case of Tracker and GNOME, so there's not really any point
> prescribing one approach over the other without knowing the specific
> requirements. We're happy to advise on customising Tracker for
> different use cases though if it seems the appropriate solution.

I think Tracker can't solve all my problems: it's built on SQL and uses
specific ontologies, which makes it unsuitable as a general-purpose
semantic datastore. But it can definitely serve as a backend for the
desktop data it is designed to store.

I'm working on a semantic desktop project with some similarity to the
ideas behind Haystack, a semantic desktop project developed at MIT. The
idea is to have all the information inside semantic databases which
function as nodes and communicate in a peer-to-peer manner. On the
local machine, the requirements are fast local queries and managing
several databases with federated queries; e.g. imagine an SQL server
running a query that joins tables from several of the databases it
manages (see the sketch below).
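
For the single-server case, SPARQL even has a ready-made notion for
this: named graphs. A minimal sketch, with hypothetical graph IRIs and
a hypothetical ex: ontology, where each named graph plays the role of
one database managed by the same server:

    PREFIX ex: <http://example.org/meta#>  # hypothetical ontology

    SELECT ?doc ?note
    WHERE {
        # Two datasets held by one server, addressed as named graphs
        GRAPH <urn:db:desktop>   { ?doc  a ex:Document . }
        GRAPH <urn:db:myproject> { ?note ex:annotates ?doc . }
    }

That's the SPARQL analogue of an SQL query joining tables from two
databases on the same server.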

> 
> The second approach you describe (one server per database) is much
> simpler to actually implement using Tracker. Work on making it easier
> to run concurrent Tracker sessions would be welcome because we already
> do this in the 'functional-tests' test suite, but the code there is
> quite old and fragile.

Yes, it's probably easier to just run several independent processes. But
if they need to cooperate on federated queries, the overhead of IPC may
become significant, and RAM usage grows with the number of databases. A
single server can run federated queries faster, and it can manage the
different databases with separate threads or with a single message
queue (the reactor pattern), which can make the whole system much more
scalable.

Tracker is not made for these use cases, but to be honest I needed some
general advice about databases, and I knew you guys have experience with
them.

Haven't had time yet to create a distributed libre replacement for Stack
Overflow, you see... anyway, Tracker is my first-priority backend.

> 
> Sam

fr33domlover

