First, thanks for the advice! Now let's see...

On Thu, 2014-01-02 at 13:33 +0000, Sam Thursfield wrote:
> Hi
>
> On Mon, Dec 23, 2013 at 12:05 PM, fr33domlover <[email protected]>
> wrote:
> > Hello,
> >
> > This question is quite general but I'd like to know how things in
> > GNOME were designed, in addition to any general advice you have.
> >
> > Assume a GUI application uses a central data backend, e.g. Tracker.
> > Currently Tracker is a single central storage service, i.e. one
> > daemon accepting connections and queries from apps.
> >
> > Now assume I want to have more than one database: for example, a
> > separate database for some project I work on, a separate database
> > documenting the file system hierarchy, a separate database for
> > desktop items, etc. The common approach, at least in SQL, is to
> > have a single entry point, an SQL server, which handles all the
> > databases stored on the computer. All clients connect to the same
> > server.
>
> Tracker stores all data in a single per-user database because there
> is no simple way to aggregate queries across multiple databases. The
> goal of Tracker is to provide a single query interface over all of
> that user's data, so this is unlikely to change overnight.

There are "federated queries", but Tracker is somewhat limited here
because it is implemented using an SQL database (I sketch an example
below). That's okay. The reason separate databases are important is:

1. It allows you to keep them as separate files and move them anywhere
   you like. It is somewhat similar to how running servers in virtual
   machines makes it easy to separate, clone and back them up. Tracker
   is made just for desktop data, so in this specific case it's not a
   big deal, I guess.

2. Some uses require special software. Again, not something Tracker
   does at the moment, but in general, here's an example: assume I
   want to create an RDF query interface wrapping the file system,
   i.e. the folder hierarchy can be queried using sophisticated SPARQL
   queries. Running an SQL database for the file system would probably
   make it very slow, so an efficient backend is needed: a thin
   wrapper above the existing file system APIs.

So the question is whether a single server should handle the separate
databases. Tracker is definitely an important backend I'm interested
in.
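To make "federated queries" concrete: in standard SPARQL 1.1, a single
query can pull data from several endpoints at once via the SERVICE
keyword. A minimal sketch - the endpoint URL is made up for
illustration, and as far as I know Tracker's store does not implement
SERVICE at all:

    PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>

    # Match documents in the local store, then ask a second,
    # hypothetical endpoint for elements related to each of them.
    SELECT ?title ?related WHERE {
        ?doc nie:title ?title .
        SERVICE <http://localhost:9999/projects/sparql> {
            ?related nie:relatedTo ?doc .
        }
    }

Each database stays an independent node, yet one query spans both.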
> If you want to use Tracker for stuff other than storing per-user
> desktop metadata, it's not impossible to get it to write to a
> different file. The 'tracker-sandbox' program inside 'utils/' in the
> Tracker source tree shows how to do this -- basically you need to
> start a separate D-Bus session with XDG_CACHE_HOME and XDG_DATA_HOME
> pointing to the alternate location. That D-Bus session will have its
> own tracker-store process running which will read and write to the
> alternative location.

That's great for trying things in a sandbox!
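For the archives, here is roughly what that recipe looks like in a
shell, if I understood it correctly (the paths are just examples, and
the complete logic lives in the 'tracker-sandbox' script itself):

    # Point the XDG dirs at an alternate location before starting
    # anything.
    export XDG_CACHE_HOME=/tmp/tracker-test/cache
    export XDG_DATA_HOME=/tmp/tracker-test/data
    mkdir -p "$XDG_CACHE_HOME" "$XDG_DATA_HOME"

    # Start a private D-Bus session; the tracker-store spawned inside
    # it will read and write under the directories above.
    eval "$(dbus-launch --sh-syntax)"

    # Clients run from this shell inherit the private bus, so the
    # usual tools work against the sandboxed store, e.g.:
    tracker-sparql -q 'SELECT ?e WHERE { ?e a nie:InformationElement }'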
> > Here's another possible approach: if the number of databases is
> > small, it may be reasonable to launch a separate server for each
> > database, and have them communicate with each other through IPC.
> > There would probably need to be another service to route network
> > traffic to the right server, but aside from that, each database
> > would have its own server (daemon) handling access to it.
> >
> > Would the second approach be better at anything? Is it something
> > people do, something reasonable? Or is the approach of a single
> > server per computer clearly better?
> >
> > (Would the second approach be more safe or scalable or resilient,
> > etc.?)
>
> Storage and processing of complex non-desktop stuff is outside the
> normal use-case of Tracker and GNOME, so there's not really any point
> prescribing one approach over the other without knowing the specific
> requirements. We're happy to advise on customising Tracker for
> different use cases though if it seems the appropriate solution.

I think Tracker can't solve all my problems, because it's based on SQL
and uses specific ontologies, which makes it unsuitable as a
general-purpose semantic datastore; but it can definitely serve as a
backend for the desktop data it is designed to store.

I'm working on a semantic desktop project with some similarity to the
ideas behind Haystack, a semantic desktop project developed at MIT.
The idea is to have all the information inside semantic databases
which function as nodes and communicate in a peer-to-peer manner.
Inside the local machine, the requirements are fast local queries and
management of several databases with federated queries; e.g. imagine
an SQL server running a query that involves several of the databases
it manages.

> The second approach you describe (one server per database) is much
> simpler to actually implement using Tracker. Work on making it
> easier to run concurrent Tracker sessions would be welcome because
> we already do this in the 'functional-tests' test suite, but the
> code there is quite old and fragile.

Yes, it's probably easier to just run several independent processes.
But if they need to communicate for federated queries, the overhead of
IPC may become significant, and RAM usage in general rises as the
number of databases rises. A single server can run federated queries
faster and manage different databases using separate threads, or use a
single message queue (the reactor pattern), which can make the whole
system much more scalable.

Tracker is not made for these use cases, but to be honest I needed
some general advice about databases, and I knew you guys have
experience with them. Haven't had time yet to create a distributed
libre replacement for Stack Overflow, you see... Anyway, Tracker is my
first-priority backend.

>
> Sam

fr33domlover

_______________________________________________
desktop-devel-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/desktop-devel-list
