On Thursday 12 Dec 2013 21:23:51 Aaron J. Seigo wrote:
> On Thursday, December 12, 2013 20:10:27 Vishesh Handa wrote:
> > On Thursday 12 Dec 2013 19:40:11 Ivan Čukić wrote:
> > > > If we all decide to store stuff in sqlite, then it doesn't matter
> > > > if they are separate database files or the same one.
> > >
> > > I might be missing a few things here, but asking questions is the
> > > road to enlightenment :)
> > >
> > > - There is no way to query across different stores, which was the
> > > main appeal of nepomuk? (I concluded this from the last mail)
> >
> > There isn't one. Not right now. I'm open to ideas on how to do
> > something like that if it is required. I'm slightly skeptical if it
> > actually is required.
>
> for activities it’s pretty much a requirement: we have an activity and we
> want to know all resources (files, contacts, bookmarks, applications,
> windows ..) associated with it. so for activities we’ll either end up
> querying each store separately or Baloo will need to provide a way to query
> multiple stores.
>
> for the Plasma Active shell as it currently is, single-store querying might
> be workable as we tend to keep most of the different resources separated in
> the UI (though that’s one thing i want to change in future releases, so you
> can group a set of bookmarks with a given file, e.g.)

I'm slightly confused. Please correct me if I haven't understood the problem
correctly.

You have an activity and you have a number of different resources related to
that activity. The resource can be a file/contact/application/bookmark/anything.
In order to store this, you could just store a mapping between activity id and
resource id, almost identical to what we have for tags.

If this were stored in SQL, fetching everything related to an activity would
be:

    select * from activityRelation where activityId = 'identifier';

Then, when displaying each of these resources, you would need to query the
individual stores they are in. For contacts and emails, this would be Akonadi.
For files, there is a FileFetchJob, etc.

>
> it would be a big problem if the tags are per-store as well; we need
> cross-store tags (though from glancing at the API tonight it looks like
> that is already there?)
>

Yes. But I'm having second thoughts about this.

> this may be a question of API, of course. with different stores, collation
> will need to happen somewhere. should it happen on the client side or the
> server side is, i suppose, the big question.
>
> i would suggest server side for a simple reason: if multiple stores all
> share the same physical storage system, it would be really nice to be able
> to optimize queries to hit that storage system as little as possible.
> example:
>
> Stores: S0, S1, S2
>
> S0 -> xapian
> S1 -> xapian
> S2 -> mysql
>
> when fetching items from S0 and S1 that match tag T0, it would be very nice
> if the backends could cooperate to merge their queries into one so that one
> xapian query is done rather than 2 with post-query collation of the
> results.
>
> for obvious reasons this can only be done in the server where the stores can
> cooperate.
>
> a concrete use case:
>
> S0 = files
> S1 = bookmarks
> S2 = applications
>
> application = Plasma Active shell
>
> if adding stores is easy enough, i expect we’ll end up with stores for
> things like geolocation, so this could balloon further.
>
> > > - When querying, how do I get the properties of the results?
> >
> > You don't. You just get the identifier and some text. You can do a
> > subsequent fetch job to get additional data.
>
> more roundtrips doesn’t sound great for performance. if a result set has
> 1000 returned items and you then want to get properties on them (e.g. for
> listing and sorting) then one needs to either send all 1000 UIDs back for
> further processing or in a worst case scenario 1000 individual requests.
>
> this will be an issue for several things in Plasma Active, such as the file
> manager. unlike Dolphin which just shows metadata for a given file, the
> Active Files app relies on Nepomuk rather than the filesystem for these
> things and allows filtering by ratings, tags, etc.
>
> > > - We talked about asynchronous querying. Is it going to happen?
> >
> > There is a QueryRunnable class which can be used to run queries in another
> > thread. Most backends do not seem to allow asynchronous queries, so there
> > wasn't a way to run queries asynchronously by default.
>
> those backends could be run in a thread? iow, put the async/threading as a
> first class feature that the backends must implement. even if it means
> having a thread for execution in the background and queueing requests.
>
> making every user handle the threading sounds like we’ll have lots of code
> that doesn’t ;)
>

Perhaps. There is always a tradeoff between keeping backend implementations
simple and having complex library code.

> > > From my POV, it would be much nicer if you forced a single db (as an
> > > actual store, not as a cache like nepomuk is for akonadi) on the people,
> > > with the option to have a few things runtime defined.
> > > It would ease the development and would allow more fun queries which
> > > would be optimized unlike the manual client-side joining of different
> > > query results.
> >
> > But what if one doesn't use SQL for storing data? IMO Xapian is much
> > better suited than sqlite's FTS support (or mysql).
>
> hopefully there would be a query object and people would not be hand coding
> queries in strings that are passed to be parsed. which would make the “what
> is the query language” thing moot; the sparql queries in C++ is one thing i
> never really got comfortable with in nepomuk.
>

+1

> > When planning Baloo, I've mostly taken a look at PIM, Dolphin, KRunner
> > (and Milou), PMC, and KPeople. Perhaps something was missed?
>
> usage in activities and Plasma Active are key use cases from my POV.

If you want, we can discuss this over a hangout or IRC where there is a
smaller delay between responses. What time would be suitable for everyone?

--
Vishesh Handa
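
For reference, the activity-to-resource mapping proposed in the reply above could be sketched roughly as follows. This is only an illustration in Python with sqlite, not Baloo code. The table name activityRelation and the activityId column come from the query in the mail. The resourceId and storeName columns, the helper functions, and the resource identifiers are all assumptions made up for the example. The per-store fetchers are placeholders standing in for things like a FileFetchJob or an Akonadi fetch.

```python
import sqlite3
from collections import defaultdict


def create_schema(conn):
    # Hypothetical schema: one row per (activity, resource) link.
    # storeName is an assumed extra column so the client knows
    # which store to ask for the details of each resource.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activityRelation ("
        " activityId TEXT NOT NULL,"
        " resourceId TEXT NOT NULL,"
        " storeName TEXT NOT NULL,"
        " PRIMARY KEY (activityId, resourceId))"
    )


def link(conn, activity_id, resource_id, store_name):
    # Associate a resource with an activity; duplicates are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO activityRelation VALUES (?, ?, ?)",
        (activity_id, resource_id, store_name),
    )


def resources_for_activity(conn, activity_id):
    # Single query against the mapping table, grouped by store.
    rows = conn.execute(
        "SELECT resourceId, storeName FROM activityRelation"
        " WHERE activityId = ? ORDER BY resourceId",
        (activity_id,),
    )
    by_store = defaultdict(list)
    for resource_id, store_name in rows:
        by_store[store_name].append(resource_id)
    return dict(by_store)


def fetch_details(by_store, fetchers):
    # One batched call per store instead of one call per resource.
    details = {}
    for store_name, ids in by_store.items():
        details.update(fetchers[store_name](ids))
    return details


# Example data; identifiers are invented for illustration.
conn = sqlite3.connect(":memory:")
create_schema(conn)
link(conn, "work", "file:1", "files")
link(conn, "work", "file:2", "files")
link(conn, "work", "bookmark:7", "bookmarks")
link(conn, "home", "file:3", "files")

grouped = resources_for_activity(conn, "work")

# Placeholder fetchers: each takes a list of ids and returns
# a dict mapping id to whatever properties that store provides.
fetchers = {
    "files": lambda ids: {i: {"store": "files"} for i in ids},
    "bookmarks": lambda ids: {i: {"store": "bookmarks"} for i in ids},
}
details = fetch_details(grouped, fetchers)
```

Grouping the ids by store before fetching keeps the follow-up work at one request per store rather than one per resource, which may soften the roundtrip concern raised in the quoted text. It does not address server-side merging of queries across stores.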