Re: [MeeGo-dev] migration (back) to EDS - contacts and calendar

2011-04-01 Thread Philip Van Hoof
On Fri, 2011-04-01 at 11:57 +0200, Mathias Hasselmann wrote:
 On Tuesday, 29.03.2011, at 16:35 +0200, Patrick Ohly wrote:

[CUT]

 This will become very useful when implementing communication history.
 For Maemo5 we used a detached, custom sqlite database for storing
 communication history. Bringing pieces together, keeping them in sync,
 avoiding problems from concurrent access, getting reasonable response
 times and memory consumption were a permanent problem with that
 solution.
 
 By using tracker as data store, we get the data aggregation for free,
 and we have to solve most concurrency and performance problems in just
 one single place.
 
 Similar things apply for presence information. It might feel strange to
 put such information into a data store, but Tracker permits marking RDF
 properties as volatile. So presence information should not hit the disk
 if desired. Reminds me I have to push the Tracker guys harder to finally
 repair transient property support.

We have transient property support. The reason why it's not using memory
but instead is using the normal meta.db is because with direct-access
you have multiple processes connecting to the same .db file.

When you do this, you can't use SQLite's TEMPORARY and still share the
data between different processes. TEMPORARY in SQLite is per process.
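
A minimal sketch of that limitation, using Python's standard sqlite3 module rather than Tracker's own code. The file name meta.db, the presence table and the function name are made up for illustration. Strictly speaking a TEMPORARY table is private to one database connection, so a second connection (in the same or another process) cannot see it:

```python
import os
import sqlite3
import tempfile


def temp_table_visibility():
    """Return (visible_to_creator, visible_to_other_connection)."""
    path = os.path.join(tempfile.mkdtemp(), "meta.db")  # hypothetical file name
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    try:
        # The TEMPORARY table lives in the writer connection only.
        writer.execute("CREATE TEMPORARY TABLE presence (contact TEXT, status TEXT)")
        writer.execute("INSERT INTO presence VALUES ('alice', 'online')")
        writer.commit()
        own = writer.execute("SELECT count(*) FROM presence").fetchall()[0][0] == 1
        try:
            reader.execute("SELECT count(*) FROM presence").fetchall()
            other = True
        except sqlite3.OperationalError:
            # "no such table": the second connection never sees the temp table.
            other = False
        return own, other
    finally:
        writer.close()
        reader.close()
```

The creating connection can query the table, while the other one fails, which is why TEMPORARY cannot carry values that several direct-access clients need to read.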

We tried using a transient.db in /dev/shm and then ATTACH-ing that .db
file in SQLite for each process that connects, but this was several
times slower for a yet unknown reason. We still have this code somewhere
in a branch, but because it was slower we decided not to use it.
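
The ATTACH approach can be sketched like this with Python's sqlite3 module. This is an illustration only, not the code from that branch: it uses a temporary directory instead of /dev/shm, the schema name and table are invented, and it says nothing about the slowdown we measured.

```python
import os
import sqlite3
import tempfile


def attach_shared_transient():
    """Two connections attach the same side database and share its rows."""
    base = tempfile.mkdtemp()  # stand-in for /dev/shm in this sketch
    meta = os.path.join(base, "meta.db")
    transient = os.path.join(base, "transient.db")

    connections = []
    for _ in range(2):
        conn = sqlite3.connect(meta)
        # Every connecting client attaches the same transient database file.
        conn.execute("ATTACH DATABASE ? AS transient", (transient,))
        connections.append(conn)
    writer, reader = connections

    try:
        writer.execute("CREATE TABLE transient.presence (contact TEXT, status TEXT)")
        writer.execute("INSERT INTO transient.presence VALUES ('alice', 'online')")
        writer.commit()
        # Unlike a TEMPORARY table, the attached file is visible to the reader.
        return reader.execute(
            "SELECT contact, status FROM transient.presence"
        ).fetchall()
    finally:
        writer.close()
        reader.close()
```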

We do delete the last-session values of transient properties at startup
of tracker-store and we don't journal values of transient properties.

Given that meta.db is in .cache (and is considered cache), nothing is
written in the persistent data (in .local) when writing transient props.

Backup also doesn't back up values of transient properties, nor will it
consequently restore them.

We don't have many properties set as tracker:transient, though. Only a
few IM related ones. We don't believe that I/O introduced by these is
going to be problematic in any way.

  Disadvantages of Tracker:
 
  data protection missing in both EDS and Tracker, but less obvious how
  to implement it in Tracker

 The solution proposed for EDS is protecting its database files. Same
 solution can be applied to tracker. Since you can deal without data
 aggregation at this moment, you should not lose anything if we create a
 separate Tracker database for contacts and apply same file level
 protection as for EDS.

It's by the way very easy to start from libtracker-data and implement
your own tracker-store using its own SQLite database and etc. This
should be doable in a few lines of code, and in Vala too (like
tracker-store itself is).

You can let that one use its own D-Bus name or even let it use its own
special IPC mechanism. And protect all this as much as you want.

The effort to do this is by no means larger than the effort for adding
some sort of protection to EDS, the results are the same.

In Harmattan the database of Tracker is already protected using UNIX
file permissions and Aegis (if processes aren't in the group
metadata-users, which only Aegis can grant to them, then they can't
access meta.db - then they must go over D-Bus to do queries).

Note that with Aegis, D-Bus is also protectable.

 If you look at the future, Tracker can provide significantly better data
 protection than EDS. Tracker supports RDF graphs per resource property
 (EDS speak: vCard attribute). Code can be written to restrict access for
 each single graph. Virtual SQLite tables should do the job. If this
 should be implemented, you could aggregate information from different
 data sources into one single contact, and still could make sure that
 for instance only the Addressbook itself can see information retrieved
 from e.g. Facebook. Considering privacy concerns this is a very
 significant feature, which cannot be reasonably implemented with EDS -
 AFAIK.

Not trivial to implement, this idea, but mostly correct. Yes.

  slow write performance in QtContacts-Tracker (?)

 Please give concrete test data. As Philip Van Hoof demonstrated, the
 write performance of EDS and Tracker seems to be comparable, which is
 not surprising given that I/O is the limiting factor. With Tracker's
 recent addition of sparql-update (INSERT OR REPLACE statement) we have
 seen performance improvements of up to 25%. This would set
 QtContactsTracker ahead of EDS.

My tests are indeed showing that Tracker is beating EDS in every field:
storage speed, a lot more flexible query capability, and query speed.

Tracker scales better than EDS in every sense. These are verifiable
and reproducible measurements, and the code of the tests for Tracker is
available in Tracker's repository (the EDS tests are available in the
mailing list archives).

On a FS that has a slow fsync, EDS

Re: [MeeGo-dev] migration (back) to EDS - contacts and calendar

2011-03-29 Thread Philip Van Hoof
On Tue, 2011-03-29 at 21:08 +0200, Patrick Ohly wrote:
 On Tue, 2011-03-29 at 19:41 +0100, Michael Hasselmann wrote:
  On Tue, 2011-03-29 at 20:37 +0200, Patrick Ohly wrote:
   On Tue, 2011-03-29 at 18:48 +0100, aleksandar.stojiljko...@nokia.com
   wrote:
It is important to address IM/VoIP contacts case. Now, contactsd is
storing them to tracker
(https://gitorious.org/qtcontacts-tracker/contactsd) where they are
mapped to communication history.
   
   This has never been used by the Tablet UX. The focus there is to use
   libfolks instead. Admittedly libfolks isn't ready yet (neither with nor
   without Tracker).
  
  When did Tablet UX become the focus of MeeGo?
 
 I'm not going to debate what the focus of MeeGo was, is, isn't, or
 should be - in particular not on meego-dev. Everyone, please take that
 discussion to the steering committee and/or to program managers (via
 feature requests in bugs.meego.com).

OK

https://bugs.meego.com/show_bug.cgi?id=15014


Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be

___
MeeGo-dev mailing list
MeeGo-dev@meego.com
http://lists.meego.com/listinfo/meego-dev
http://wiki.meego.com/Mailing_list_guidelines


Re: [MeeGo-dev] Architecture decisions (was Re: migration (back) to EDS)

2011-03-25 Thread Philip Van Hoof
On Fri, 2011-03-25 at 12:19 +0200, Sivan Greenberg wrote:
 This is actually quite good in my view, we have a proven working in
 the wild implementation in official , while all other components are
 still there to experiment with or showcase when they become mature
 enough.

This is certainly a good compromise. But then we just develop a Tracker
miner for E-D-S (which is something we planned to do anyway in upstream,
at least at some point).

Some pointers and info on how to make an E-D-S miner for Tracker:

http://lists.meego.com/pipermail/meego-dev/2011-March/482147.html


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] migration (back) to EDS

2011-03-22 Thread Philip Van Hoof
On Tue, 2011-03-22 at 09:34 +0100, Patrick Ohly wrote:

 On Tue, 2011-03-22 at 08:02 +, zoltan@nokia.com wrote:
  Option 2 from below would be the cheapest, fastest and least pain path but 
  it's duplicating contacts to tracker. Option 3 would be optimal but 
  difficult, 
  option 1 could also work. Choose one, or suggest something else. Please 
  finish 
  the job you started :).
 
 Option 2 seems like a reasonable approach to me, if we can make it so
 that the call history database is self-contained and merely mirrors the
 subset of the contact data in read-only mode that is relevant for a
 quick list of results, with more details fetched on demand from the main
 contact database.

Keeping Tracker's RDF store in sync with EDS's data sounds like a viable
option that would likely be accepted in the Tracker upstream project.

Tracker already has an Evolution plugin that keeps E-mail metadata in
Evolution in sync. You can find the code for that EPlugin here:

http://git.gnome.org/browse/tracker/tree/src/plugins/evolution

Adapting this plugin to also work in the Evolution UI as shipped on
MeeGo should as far as I know not be very difficult (if even any
adaptation is needed).

The EDS one would have to be done by implementing a TrackerMiner under
for example src/miners/eds, the existing RSS one is a good example to
get started:

http://git.gnome.org/browse/tracker/tree/src/miners/rss/tracker-miner-rss.c

And for a TrackerMiner implemented in Vala, look at the Flickr one:

http://git.gnome.org/browse/tracker/tree/src/miners/flickr/tracker-miner-flickr.vala

I can imagine this miner using EBookView to give live updates whenever a
contact changes in E-D-S, to Tracker's RDF store.

In the other direction, by listening for GraphUpdated and/or the
Writeback signal, the miner can also write back to the E-D-S store. For
an example of how this would or could work: the Flickr one also writes
back to the Flickr website whenever metadata changes in Tracker first.

Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] migration (back) to EDS

2011-03-21 Thread Philip Van Hoof
On Mon, 2011-03-21 at 20:54 +0100, Patrick Ohly wrote:
 On Mon, 2011-03-21 at 17:45 +, zoltan@nokia.com wrote:
   From: Patrick Ohly, on 21 March, 2011 16:37
   In a nutshell, the goal is to keep QtContacts, QMF and KCalCore as the
   main APIs used by applications and the higher-level QtMobility APIs. If
   we can achieve that, very little will have to be done in applications
   once we change the core components.
  
  This will break libcommhistory.
 
 I'm aware of that, and I don't claim that there is an answer. There is a
 task open for those who need libcommhistory (or something like it) to
 look into this problem and comment on the current proposals.

Ok, so.. I'll need something like libcommhistory. Check.

What are the current proposals for things like libcommhistory?

(this is a real question. You're the architect: don't sidestep it again)

1. Support things like IM and call history in EDS?
2. Synchronize contact data in EDS to Tracker's RDF store?
3. Solutions that involve shared memory with EDS?
4. A big drop in messaging performance?
5. A big drop in capabilities? Applications can no longer make queries
   like these:
   http://gitorious.net/commhistory/libcommhistory/trees/master/sparql 
6. ...
7. Profit?

If I know what the proposal is, then I can comment on it.

This part is opinion:

I find it rather strange. A working solution that does scale well, that
performs as well as, and soon probably better than, E-D-S, that supports more
use-cases and application domains than even necessary, that is already
integrated and very well tested, that has been or is being optimized to
meet the performance requirements of each and every use-case and that
does support the query flexibility required by applications ..

Is being replaced by:

o. A solution that needs 20 bullet point task items before even the
   solution itself actually works. Most of those task items are
   vague and it's far from clear how they are to be implemented
o. that has less support for all the use-cases by its design (being a
   PIM data -only storage)
o. that can not support all application domains unless rather large
   U-turns are made in the very root of the design
o. that has not been optimized for btrfs, as apparently even something
   as simple as sensible fsync() use is done wrong in E-D-S (see
   Adrien's measurements on btrfs with and without libeatmydata).
   https://mail.gnome.org/archives/tracker-list/2011-March/msg00035.html
o. A solution that makes all the IPC mistakes possible with D-Bus:
   o. Passing data, not messages over D-Bus: congestion on the bus
   o. Passing data marshalled as a DBusMessage instead of FD passing
   o. Not using D-Bus's signaling mechanism where appropriate
o. A solution that has not been tested at all against either the
   use-cases or the APIs

o. Basically, a solution that needs more time to be shaped into
   something workable (workable doesn't mean works well) than any
   release time line for a mobile product has permitted in the last ten
   years or more.

It's strange mathematics.

But, ok ..

Cheers,

Philip



-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-15 Thread Philip Van Hoof
On Tue, 2011-03-15 at 09:33 +0200, Adrien Bustany wrote:
 On Tue, 15 Mar 2011 08:10:18 +0100, Ville M. Vainio wrote:

Hey Adrien, and Hey Ville M. Vainio,

  On Mon, Mar 14, 2011 at 9:20 PM, Patrick Ohly 
  patrick.o...@intel.com wrote:

  I've said before and I say it again here, I consider performance
  comparisons pointless at this time.
 
  Considering that e-d-s has a much more modest feature set than 
  tracker (tracker in general being a much more ambitious project), I would 
  have expected it to trounce tracker in performance, which doesn't seem
  to be the case.

I share the exact same opinion. I'd also like to point out, in defense
of E-D-S, that the Evolution UI was running; this creates load on the
E-D-S store. Tracker's RDF store has a direct-access method for opening
the database by clients due to its use of SQLite's WAL journaling mode.

This means that when a direct-access-enabled Tracker client reads from
the database, it doesn't disturb the tracker-store process directly
(only indirectly due to scheduling, and with process priorities in
Linux even this can be completely avoided).

GraphUpdated can be used in combination with Tracker's direct-access
mode. You get GraphUpdated about data deltas after the transaction
commit, so with SQLite WAL the data is then available everywhere.
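
The reader/writer behaviour described here can be tried with plain SQLite. The following is a sketch with Python's sqlite3 module, not Tracker code; the file and table names are invented. A second connection keeps reading while the writer has an open transaction, and it sees the new row once the writer has committed:

```python
import os
import sqlite3
import tempfile


def wal_reader_sees_commits():
    """Return the row count a reader sees (before commit, after commit)."""
    path = os.path.join(tempfile.mkdtemp(), "meta.db")  # hypothetical file name
    writer = sqlite3.connect(path)
    writer.execute("PRAGMA journal_mode = WAL")
    writer.execute("CREATE TABLE contacts (name TEXT)")
    writer.commit()

    reader = sqlite3.connect(path)
    try:
        # The INSERT opens a write transaction that is not yet committed.
        writer.execute("INSERT INTO contacts VALUES ('alice')")
        # With WAL the reader is not blocked; it reads the last committed state.
        before = reader.execute("SELECT count(*) FROM contacts").fetchall()[0][0]
        writer.commit()
        # After the commit a new read picks up the delta.
        after = reader.execute("SELECT count(*) FROM contacts").fetchall()[0][0]
        return before, after
    finally:
        writer.close()
        reader.close()
```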

On the subject of process priorities (SCHED_IDLE vs. SCHED_FIFO): with
direct-access your queries run in your own process, so you won't have
to worry about the process priority of the tracker-store service.

This of course depends on your process *having* access to meta.db (as I
mentioned in another E-mail, Aegis on Harmattan can prevent you from
having access to meta.db, meaning that you must go over an IPC with
tracker-store to do read queries too). This IPC then uses FD passing
over D-Bus.

  This evidence might prompt to re-evaluate this part of the
  architectural plans. Or at least leave the door open to transitioning
  back to tracker when it's feasible.
 
  If you're interested in the saving performance of both solutions, I
  answered the thread on the Tracker ML (didn't want to cross-spam
  Meego-Dev). If you abstract the fact that EDS has no batching API (and
  therefore seems to issue a fsync after saving each contact) by running
  it over libeatmydata, EDS is approximately twice as fast as
  qtcontacts-tracker (though that area is being optimized currently). I
  haven't done any contact fetching benchmarks.

Note that it looks like you have e_book_add_contacts, commit_contacts
and remove_contacts. Those look like batch APIs to me. However, in the
example that I made for Tracker, I also didn't use Tracker's batch APIs
BatchSparqlUpdate() nor did I use its UpdateArray().

BatchSparqlUpdate() is mostly about prioritization, UpdateArray() is
mostly about reducing D-Bus traffic.

I think it's interesting to explain Tracker's fsync() behaviour of its
RDF store:

Because we believe that fsync() is only necessary to avoid data loss in
case of an unclean shutdown of the system (when the filesystem isn't
unmounted but the power is cut nonetheless), we only do fsync() and only
on our journal file in case an application calls the Resources.Sync()
D-Bus call. We also open our SQLite db in a specific way.

Our SQLite database is always opened using:

PRAGMA synchronous = OFF
PRAGMA count_changes = 0
PRAGMA temp_store = FILE
PRAGMA auto_vacuum = 0
PRAGMA encoding = "UTF-8"
PRAGMA journal_mode = WAL

http://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-manager.c#n214
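
For readers who want to try these settings outside Tracker, here is a small sketch using Python's sqlite3 module. The PRAGMA list is the one above; the helper name open_store and the file path are assumptions made for this example only.

```python
import sqlite3

# Same settings as listed above; encoding is quoted as SQLite expects.
PRAGMAS = [
    "PRAGMA synchronous = OFF",
    "PRAGMA count_changes = 0",
    "PRAGMA temp_store = FILE",
    "PRAGMA auto_vacuum = 0",
    'PRAGMA encoding = "UTF-8"',
    "PRAGMA journal_mode = WAL",
]


def open_store(path):
    """Open an SQLite database and apply the settings one by one."""
    conn = sqlite3.connect(path)
    for pragma in PRAGMAS:
        conn.execute(pragma)
    return conn
```

Querying "PRAGMA synchronous" and "PRAGMA journal_mode" on the returned connection is a quick way to confirm that the settings took effect.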

With synchronous = OFF, SQLite won't do any such fsync() call. This is OK
because we do our own persistent journaling:

http://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-journal.c
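
The idea of journaling every update but only forcing it to disk on request can be sketched as follows. This is a simplified illustration in Python, not a port of tracker-db-journal.c: the class name, the line-based format and the fsync_calls counter are all invented for the example.

```python
import os


class Journal:
    """Append-only journal that reaches the disk only when sync() is called."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self.fsync_calls = 0  # bookkeeping for the example only

    def append(self, statement):
        # Cheap: the entry goes to the process/OS buffers, no fsync() here.
        self._file.write(statement.encode("utf-8") + b"\n")

    def sync(self):
        # Expensive: what a Resources.Sync()-style request would trigger.
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_calls += 1

    def close(self):
        self._file.close()
```

Appending many entries costs no fsync() at all; a single sync() call, for example when a monitor notices an imminent power loss, makes everything written so far durable.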


Doing an fsync() on a high-latency device, as many NAND and flash disks
are, is a very big contributor to e.g. reduced UI responsiveness.

A monitor application that sees that the battery-bay is being opened can
for example call the Resources.Sync() D-Bus call as soon as possible. 

For devices that do a clean emergency shutdown on critical battery, but
where the battery can't be touched unless you void warranty of the
device, this isn't needed (at least theoretically: a crasher bug being
the exception here).

Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



[MeeGo-dev] Slightly better performance comparison between E-D-S and Tracker's RDF store when storing contact data

2011-03-15 Thread Philip Van Hoof
Hi there,

The last performance analysis that I did* on E-D-S vs. Tracker's RDF
store had some problems:

a. Certain people didn't want Evolution's UI to be running while the
   test takes place.

b. Another problem was that the test was not replacing but always
   appending new contacts.

c. The test for Tracker isn't using QtContacts' API.

d. The Tracker RDF store's test was doing the REPLACE and ORIGINAL in
   one run, the two might interfere with each other a bit. The ORIGINAL query
   was also leaving behind traces of nco:Affiliation instances.

So

- To address a. I decided to shut it down and on top of that I decided
  to use an empty addressbook that I fill up with 1000 contacts once.
  Then I start running the tests.

  I did the same thing for Tracker's RDF store. First I make 1000
  contacts and then I start running the tests. No other software is
  using either of the stores while the tests run.

- To address b. I rewrote the test for E-D-S a bit. You can find both
  tests attached to this E-mail.

- I have not yet addressed c. The QtContacts team have posted an E-mail
  to the Tracker mailing list with some of their numbers:
  http://mail.gnome.org/archives/tracker-list/2011-March/msg00035.html
  I expect more numbers coming from them after they have started using
  Tracker's latest INSERT OR REPLACE feature.

- I rewrote the Tracker RDF store's test to address d. (attached too).
  It has no more indentation of the SPARQL to better mimic what
  QtContacts will generate and the query was further optimized a bit.


Errata:

o. I used the branch sparql-update of Tracker

o. I used --disable-tracker-fts on Tracker (with FTS, Tracker's INSERT
   OR REPLACE is not expected to be as much faster as without FTS. FTS
   is disabled on Harmattan). Without disabling FTS, the results in this
   particular test case (due to INSERT OR REPLACE) are going to be
   different!

   Note that I didn't clarify this last time.

o. I used default CFLAGS (-g -O2 afaik) on Tracker

o. Version in Evolution's about box is 2.30.3 (Debian testing) and E-D-S
   from Debian testing too. This E-D-S uses an addressbook.db and
   an addressbook.db.summary file when I look inside the folder.

I ran both tests several times and then I took the best scores for each:

Evolution Data Server:

$ ./eds-test 
EDS 1000 contacts: 12.449457 (0 not replaced)
$ ./eds-test 
EDS 1000 contacts: 12.388330 (0 not replaced)
$ ./eds-test 
EDS 1000 contacts: 12.967997 (0 not replaced)
$ ./eds-test 
EDS 1000 contacts: 12.391114 (0 not replaced)
$ ./eds-test 
EDS 1000 contacts: 12.553720 (0 not replaced)
$

Tracker's RDF store. All contacts are replaced here:

$ ./test-insert-or-replace
REPLACE: 1000 contacts: 12.579931
$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 12.722786
$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 13.674250
$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 13.083288
$ ./test-insert-or-replace 
REPLACE: 1000 contacts: 13.539822
$ 
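
To compare the two series at a glance, the figures can be summarized with a few lines of Python. This is only arithmetic on the numbers printed above (taken to be seconds); the helper name summarize is made up for the example.

```python
# Timings copied from the runs above (1000 contacts each).
EDS_RUNS = [12.449457, 12.388330, 12.967997, 12.391114, 12.553720]
TRACKER_RUNS = [12.579931, 12.722786, 13.674250, 13.083288, 13.539822]


def summarize(runs):
    """Return the best (lowest) and the mean time of a series of runs."""
    return {"best": min(runs), "mean": sum(runs) / len(runs)}


eds = summarize(EDS_RUNS)
tracker = summarize(TRACKER_RUNS)
# Ratio of the best Tracker run to the best EDS run.
best_ratio = tracker["best"] / eds["best"]
```

On these runs the best times are within roughly 1.5% of each other, which is the sense in which the two stores are comparable here.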


*) http://mail.gnome.org/archives/tracker-list/2011-March/msg00033.html


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be
#include <libebook/e-book.h>

/* vCard template; every %d is filled with the contact's index. */
const gchar *vcard = "BEGIN:VCARD\n"
"VERSION:3.0\n"
"CLASS:PUBLIC\n"
"REV:2011-03-14T13:47:25Z\n"
"FN:First %d Last %d\n"
"N:Last %d;First %d;;;\n"
"EMAIL;WORK:rho...@example.com%d\n"
"TEL;TYPE=cell,voice:02141730585%d\n"
"NOTE;ENCODING=QUOTED-PRINTABLE:c1f1b12d-bc75-4d45-9a1f-b1efe934409f\n"
"END:VCARD\n";

static EContact *
new_test_contact (guint i)
{
	EContact *contact;
	gchar *str = g_strdup_printf (vcard, i,i,i,i,i,i);
	contact = e_contact_new_from_vcard (str);
	g_free (str);
	return contact;
}


int main ()
{
	EBook *book;
	GError *error = NULL;
	guint i, y = 1000, not_replaced = 0;
	GTimer *timer;
	GList *all = NULL;
	GPtrArray *ids;

	g_type_init ();

	ids = g_ptr_array_new_with_free_func (g_free);

	timer = g_timer_new ();

	book = e_book_new_from_uri ("file:///tmp/test", &error);

	if (error) {
		g_critical ("%s\n", error->message);
		return 0;
	}

	e_book_open (book, TRUE, &error);

	if (error) {
		g_critical ("%s\n", error->message);
		return 0;
	}

	/* Collect the UIDs of all existing contacts so they can be replaced. */
	if (e_book_get_contacts (book, e_book_query_any_field_contains (""), &all, NULL)) {
		while (all) {
			g_ptr_array_add (ids, g_strdup (e_contact_get_const (all->data, E_CONTACT_UID)));
			g_object_unref (all->data);
			all = all->next;
		}
	}

	g_timer_start (timer);

	for (i = 0; i < y; i++) {
		EContact *contact = NULL;
		gchar *uid, *freeup = NULL;

		if (i < ids->len) {
			uid = g_ptr_array_index (ids, i);
		} else {
			not_replaced++;
			freeup = uid = g_strdup_printf ("%d", i);
		}

		/* For this example to work we wouldn't have to get the
		 * contact. But for synchronization we would need to
		 * get it (else we can't know the old values). */

		if (!freeup && e_book_get_contact (book, uid, &contact, NULL)) {

			e_book_remove_contact (book, uid, &error);

			if (error) {
				g_critical ("%s\n", error->message);
				g_object_unref (contact);
				g_free (freeup);
return 0

Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-14 Thread Philip Van Hoof
On Mon, 2011-03-07 at 08:09 -0800, Arjan van de Ven wrote:

Hi Arjan,

 PIM storage
 ===
 The Address book, Calendar data and Email are currently stored in a 
 tracker database, and accessed (officially) via a QtMobility API set.
 There are a range of issues with this implementation, starting with the 
 complexity of adding privacy controls, the performance and
 scalability as well as the completeness for doing a proper syncml sync.


http://mail.gnome.org/archives/tracker-list/2011-March/msg00033.html

Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-14 Thread Philip Van Hoof

On Mon, 2011-03-14 at 19:03 +, Philip Van Hoof wrote:


Hi,
  
 http://mail.gnome.org/archives/tracker-list/2011-March/msg00033.html

Okay, I bite.

 The comparison is favoring Tracker in a number of ways:
  * Having the Evolution UI running while inserting contacts into
EDS slows down the whole process because all data is also
getting sent back to the UI. This is more complex than just
inserting the data into Tracker, which has nothing reading that
data during that time.

Only seconds after the application finished did the Evo UI become 
unresponsive. The second run took 11s instead of 10s, I can
imagine that this 1s difference is the impact. That's of course
just speculation. Somebody would need to investigate this to be
sure.

However, on favoring:

Note that the E-D-S test appended contacts. It didn't replace
contacts. The Tracker test replaces contacts.

If I append contacts with Tracker, the time to add 1000 contacts
is under 5s instead of between 12s and 15s.

So I gave a 'huge' favor to E-D-S, to be honest. For the E-D-S
test I should look up each contact and delete it before adding
the new one and committing it.

  * You use the low-level Tracker API to create contacts in Tracker.
QtContacts should have been used instead, because that is the
API that is relevant for MeeGo and that corresponds to the one
you used for EDS.

There is no lower level API for E-D-S as far as I know ..

Afaik the qct people will repeat the test. Note, however,
that the query that we used comes from the QtContacts project.

So it won't be much different.

QtContacts doesn't add a lot of overhead on top of the queries
that it produces. Not in performance.

 Of course you have a point about the system as a whole, but
 that still doesn't make the comparison any better.

Can Evolution's UI be running on MeeGo?

 I could speculate whether this comparison was skewed
 intentionally or unintentionally, but let's not go
 there, okay?

I prefer to stick to the numbers, yes.

 I've said before and I say it again here, I consider
 performance comparisons pointless at this time.

Why are they pointless?

 Doing them badly just reduces any good will that might
 be left from the people one is trying to convince.

More tests, including ones with QtContacts, will follow.

Cheers,

Philip

-- 
Best Regards, Patrick Ohly

The content of this message is my personal opinion only and although
I am an employee of Intel, the statements I make here in no way
represent Intel's position on the issue, nor am I authorized to speak
on behalf of Intel on this matter.




Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-10 Thread Philip Van Hoof
On Wed, 2011-03-09 at 23:24 +0100, Mathias Hasselmann wrote:
 On Wednesday, 09.03.2011, at 07:30 -0800, Arjan van de Ven wrote:

 So speaking about qtcontacts-tracker: Can you point me to bug reports,
 or to broken promises?
 
 A quick search for qtcontacts-tracker on bugs.meego.com finds 19 bugs.
 Not a single show stopper judging from summaries. Nokia's internal bug
 tracker lists 17 bugs for qtcontacts-tracker right now. Our test suite
 (about 10k lines of code) reports 139 of 139 passed.
 
 So what actually are the show stoppers?

We are now adding support for REPLACE in Tracker's SPARQL Update engine.
We know that qtcontacts-tracker had to use a lot of DELETE-WHERE-INSERT
constructions and it's likely that the DELETE-WHERE part and old values
check of INSERT is a major contributor to the Update performance for
qtcontacts-tracker's use-cases.
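
To make the difference concrete, here is a sketch that builds both kinds of update for a single-valued property. It only generates query strings in Python; the helper names and the example subject are invented, values are not escaped, and the exact syntax accepted by a given Tracker version should be checked against the article linked below.

```python
def delete_where_insert(subject, prop, value):
    """Old pattern: look up and delete the stored value, then insert the new one."""
    return (
        f"DELETE {{ <{subject}> {prop} ?old }} "
        f"WHERE {{ <{subject}> {prop} ?old }} "
        f"INSERT {{ <{subject}> {prop} '{value}' }}"
    )


def insert_or_replace(subject, prop, value):
    """New pattern: one statement, the store replaces the old value itself."""
    return f"INSERT OR REPLACE {{ <{subject}> {prop} '{value}' }}"


# Hypothetical contact used only for illustration.
old_style = delete_where_insert("urn:contact:1", "nco:fullname", "Ada")
new_style = insert_or_replace("urn:contact:1", "nco:fullname", "Ada")
```

The old pattern has to match the existing value in a WHERE clause before it can be deleted, which is the part we expect to cost the most.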

I think Adrien Bustany has started experimenting with this branch today.
So if Update performance is a show-stopper we will have new numbers this
week.

I made an article explaining how the current syntax works:

http://pvanhoof.be/blog/index.php/2011/03/09/a-replace-extension-for-trackers-sparqls-update

Note that qtcontacts-tracker can also, on top of the REPLACE that we're
adding this week, use UpdateArray (it's not yet doing that, afaik).

UpdateArray can be instrumental in reducing D-Bus traffic and calls (as
you pack multiple queries per call and yet you get per-query errors).

Note that we already use FD passing to avoid most of D-Bus's performance
problems.

 Where is the email telling us qtcontacts-tracker developers that there
 are show stopping issues? At least Patrick knows very well how to
 contact us.

Yes, we did see some E-mails on Tracker's upstream mailing list by
people from Intel. We also received a few patches and thanks to that we
gave priority to our UpdateArray API, then reworked their patch to fit
this.

But never any performance figures, requirements, big problems, etc.

It's very hard to claim that we are unresponsive towards Intel. There's
plenty of mailing list material to back that up.

I also do wonder where these 'discussions' that Arjan talked about took
place. Respected people, like David Neary, have asked the same question
in this meego-dev ML thread:

On Tue, 2011-03-08 at 12:01 +0100, Dave Neary wrote: 
 Hi,
 
 Arjan van de Ven wrote:
  Time has come and gone for this to be a discussion; this is a decision.
 
 Out of interest, when was the time for this to be a discussion? And
 where did that discussion happen?
 
 Thanks,
 Dave.

But I see a pattern here: all our questions are left unanswered or have
so far been answered by emotional responses.

But hey, fine. Been there, done that (Fremantle - Harmattan).

 PS: To get some more numbers I ran few quick searches on bugs.gnome.org:
 
   :evolution-data-server :contacts finds 70 bugs
   :evolution-data-server even finds 354 bugs

Nod. And it's not as if several years ago at Nokia, during Fremantle,
people were 'satisfied' with EDS. I haven't seen huge improvements
happening to EDS since that period.


Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-10 Thread Philip Van Hoof
On Thu, 2011-03-10 at 12:39 +0100, Mathias Hasselmann wrote:
 On Thursday, 10.03.2011, at 12:02 +0100, Philip Van Hoof wrote:
  On Wed, 2011-03-09 at 23:24 +0100, Mathias Hasselmann wrote:
   PS: To get some more numbers I ran few quick searches on bugs.gnome.org:
   
 :evolution-data-server :contacts finds 70 bugs
 :evolution-data-server even finds 354 bugs
  
  Nod. And it's not as if several years ago at Nokia, during Fremantle,
  people were 'satisfied' with EDS. I haven't seen huge improvements
  happening to EDS since that period.
 
 Fremantle's EDS version differs greatly from GNOME EDS. To get a
 realistic idea of the effort needed for turning libebook into something
 that's somewhat production ready, take a look at the long commit
 history of Fremantle EDS: http://maemo.gitorious.org/eds-fremantle/

And still the teams using it were not satisfied in the end :)

At least that's what I remember.

I also wonder what version of EDS is being proposed for MeeGo: the
eds-fremantle or GNOME's EDS? And how many of the changes for the
Fremantle version are now in GNOME's upstream EDS?

 Yes, we should have tried harder to get into upstream EDS development.
 First our version of libebook-dbus was in horrible. Later the delta
 became too large to handle. Too little time. Too many secrets around the
 N900. Naive planning. :-/

Ah, yes. Good old times Mathias. Good old times. We'll recover from that
when we're old. At some conference while complaining about those young
guys, with a good beer. You know. :-)


Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-08 Thread Philip Van Hoof
Interesting thread ... (not).

Congratulations for guiding it with insightful and non-emotional
contributions, Arjan.

Can somebody from Intel answer one of the questions about EDS?

On Mon, 2011-03-07 at 22:40 +0100, Philip Van Hoof wrote:

Question one:

 Can you explain to us how EDS improves privacy control compared to
 Tracker's RDF store? Does EDS have a way to represent different graphs
 (part of Tracker's future plans to do per-resource privacy control)?
 
   How in EDS do you say that data (about) x isn't to be made
   available to queries made by application y? It's the first time
   that I hear that EDS has such privacy-related capabilities.

Question two:

 Can you also go a bit deeper into how EDS's performance compares to
 Tracker's RDF store for comparable data entry operations? Have you guys
 done measurements on this? Can the results of these measurements and the
 method to reproduce be publicized? Are the queries you used available
 somewhere? Where?

Question three:

 What do you mean by scalability? Scalable over multiple application
 domains other than PIM? Or scalable to more data?

Question four:

 And what kind of data? I wonder how EDS is or can be more scalable
 when the underlying database technology (SQLite, apparently) is identical.
 
 In Tracker, each class is stored in a decomposed table. This means that
 contact and E-mail data are each in separate tables. Although decomposition
 typically makes INSERT a bit slower, SELECT is a lot faster.

Question five:

 How will or does EDS make SELECT equally 'scalable' with equally complex
 queries?


 Also note that the Tracker team optimizes both INSERT and SELECT
 use-cases on a use-case per use-case basis, using its domainindexes and
 indexes.

Question seven:

 Have you contacted the Tracker team about the scalability
 problems (if any)?

Cheers,

Philip


-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] [Meego-architecture] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-08 Thread Philip Van Hoof
On Tue, 2011-03-08 at 11:56 +0100, Mathias Hasselmann wrote:
 On Tuesday, 08.03.2011, at 12:41 +0200, Vitaly Repin wrote:
   You seem to be proud of 80 seconds for 500 people. I would absolutely
   not be proud of such an abysmal number.
  
  Completely agree. The contacts sync speed should be increased.
 
 Yes, it's still far from optimal. Still, we have some more rabbits in our
 hat:
 
   * We are just trying to figure out whether it is acceptable for the
     save request to report that it finished before contacts are
     (fully?) indexed.
   * The Tracker guys are investigating some more crazy tricks to
     improve update performance. I doubt many other people have ever
     driven SQLite optimization as far as the Tracker guys are doing.

One of the problems is that the contact-updates also need to first
DELETE possible existing data. This often doubles the time to update.

We plan to add a Tracker-specific extension to SPARQL Update that allows
for UPDATE queries, although with RDF, UPDATE queries aren't going to look
nice (multivalue properties, etc.). If you have syntax suggestions for
UPDATE for SPARQL, let us know.

A problem with DELETE could be that triples are removed one by one. Today
I'm looking into merging deletes together. I'm not sure yet whether it
will help a lot, as rows are deleted by ID (which is the primary key, hence
indexed) and because SQLite deletes rows one by one (internally) anyway.
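
To make the "merge deletes together" idea concrete, here is a minimal
sketch using Python's built-in sqlite3 module. It is only an illustration:
the table name triple and the helper names are made up for this example
and are not Tracker's schema or code. It shows that one DELETE with an IN
list removes the same rows as a series of single-row DELETEs; whether it is
measurably faster is exactly the open question above.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with ids 1..n (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO triple (id, value) VALUES (?, ?)",
        [(i, "v%d" % i) for i in range(1, n + 1)],
    )
    return conn


def delete_one_by_one(conn, ids):
    """Issue one DELETE statement per id."""
    for row_id in ids:
        conn.execute("DELETE FROM triple WHERE id = ?", (row_id,))


def delete_merged(conn, ids):
    """Issue a single DELETE covering all ids via an IN list."""
    ids = list(ids)
    placeholders = ",".join("?" for _ in ids)
    conn.execute("DELETE FROM triple WHERE id IN (%s)" % placeholders, ids)


def remaining_ids(conn):
    """Return the ids still present, in ascending order."""
    return [row[0] for row in conn.execute("SELECT id FROM triple ORDER BY id")]
```

Note that SQLite limits the number of bound parameters per statement, so a
real implementation would have to chunk long id lists.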

When a DELETE is very slow, note that the WHERE part of the DELETE
is more or less a SELECT to get the triples to delete. If that SELECT is
slow due to a full table scan, use of optionals (left joins), etc. (the
usual things for SELECT), then the DELETE will also be slow, of course.
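
One way to check this on plain SQLite is to ask for the query plan of the
DELETE itself. The sketch below (Python's sqlite3 module, with a
hypothetical table triple and index idx_triple_subject that are not
Tracker's schema) returns the plan steps with and without an index on the
column used in the WHERE part.

```python
import sqlite3


def delete_plan(with_index):
    """Return the EXPLAIN QUERY PLAN detail strings for a DELETE ... WHERE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triple (id INTEGER PRIMARY KEY, subject TEXT, object TEXT)"
    )
    if with_index:
        conn.execute("CREATE INDEX idx_triple_subject ON triple (subject)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN DELETE FROM triple WHERE subject = ?", ("urn:x",)
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]
```

Without the index the plan reports a scan of the table; with it the plan
mentions the index. The exact wording of the plan text differs between
SQLite versions.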


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-08 Thread Philip Van Hoof
On Tue, 2011-03-08 at 12:12 +, Martyn Russell wrote:
 On 08/03/11 11:58, igor.sto...@nokia.com wrote:
  Hi,
 
 Hi Igor,
 
  From: ext Martyn Russell [mar...@lanedo.com]
  Sent: 08 March 2011 13:53
 
  As you can imagine, if you want to be able to query, insert and merge
  120k contacts, you really can't have your cake and eat it, so to
  speak, without some sacrifices one way or another.
 
  That's not true.
 
  Nothing prevents us from having two DBs, where the first one is the
  usual one (it could be considered a cache) and the other is much larger
  and used only when the first search fails. Maybe with the explicit
  consent of the user to proceed with such an expensive operation.

[cut]

 Also, we've had more than 1 DB before in Tracker and it was a nightmare. There 
 are problems with joins, security, speed, etc. and we've (as a team) 
 discussed this so much already. We have also done testing with n 
 databases vs 1 database to make sure that our approach makes the most 
 sense with SQLite as it also depends on which database you use.

Some technical chitchat on this:

One of the big problems with 'two or more databases' for one class of
data (in a decomposed schema, where each class gets its own table) is
that each (sub)query done on the tables must become a UNION with the two
tables.

UNION in SQLite internally means roughly executing two queries, so it is
about twice as slow. Mind that SQLite does relatively little magic behind
the scenes. As a DB engine it's usually relatively straightforward.

And in the end, splitting up a table yields (or should yield) similar
results to adding an index on a single table, at least for querying. So
why not just add an index?
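
To show what "each (sub)query must become a UNION" looks like in practice,
here is a small sketch with Python's sqlite3 module. The database alias
archive and the table contact are assumptions made for the example; this is
not how Tracker lays out its data.

```python
import sqlite3


def search_two_databases(name):
    """Look up a name in the same table of two attached databases."""
    conn = sqlite3.connect(":memory:")
    # A second, separate in-memory database attached to the same connection.
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    conn.execute("CREATE TABLE main.contact (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE archive.contact (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO main.contact (name) VALUES ('Alice')")
    conn.executemany(
        "INSERT INTO archive.contact (name) VALUES (?)", [("Alice",), ("Bob",)]
    )
    # Every query over the class has to name both tables and glue the
    # results together; the WHERE clause is repeated for each half.
    rows = conn.execute(
        "SELECT 'main' AS src, name FROM main.contact WHERE name = ? "
        "UNION ALL "
        "SELECT 'archive' AS src, name FROM archive.contact WHERE name = ?",
        (name, name),
    ).fetchall()
    conn.close()
    return rows
```

With a single table, the same lookup is one SELECT, and an index on name
covers the "search the small set first" use case.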

Having a lot of tables, with some of the tables having a lot of rows,
should not slow down queries on the smaller tables in an SQLite .db file. So
for that there is also no real reason to have multiple .db files. If
SQLite does get slowed down (a lot) because of that, then that sounds to
me like a (serious) bug in SQLite.

As for read access without IPC, and the claim that with multiple .db
files you can avoid some locks when multiple processes access the .db
file: you can open SQLite databases in WAL journaling mode and avoid
most such locks altogether (for reading). It's not yet true MVCC,
but for just reading it's quite useful that way.
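
A minimal sketch of that behaviour, using Python's sqlite3 module and a
throwaway file (the file name meta.db and the table contact are only
placeholders here, not Tracker's real database): a second connection reads
while the first one holds an open write transaction.

```python
import os
import sqlite3
import tempfile


def wal_read_during_write():
    """Return (journal mode, rows seen during the write, rows seen after)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "meta.db")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        writer = sqlite3.connect(path, isolation_level=None)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT)")
        writer.execute("INSERT INTO contact (name) VALUES ('Alice')")

        # Open a write transaction and leave it uncommitted.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO contact (name) VALUES ('Bob')")

        # The reader is not blocked and sees the last committed state.
        reader = sqlite3.connect(path, isolation_level=None)
        during = reader.execute("SELECT count(*) FROM contact").fetchone()[0]

        writer.execute("COMMIT")
        after = reader.execute("SELECT count(*) FROM contact").fetchone()[0]

        reader.close()
        writer.close()
        return mode, during, after
```

Two connections in one process stand in for two processes here; the
locking happens at the file level either way.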

For updating this doesn't work, so we have fd-passing over DBus to
tracker-store for updating (for that reason).

Also note that in SQLite a transaction locks each .db that is ATTACHed
to the connection where the transaction started.

ps. I'm not saying that SQLite is the best DBMS on all architectures and
platforms. Especially not when you have a Gig of RAM (just a few more
years and phone-sized devices will have that, you'll see). Over the years we
learned to deal with what it is. It has quite a few problems indeed.

Most of those popular DBMSs do use too much RAM, though. And if I can
use gigabytes of RAM and waste massive amounts of I/O and battery then
no, it's not difficult to make things perform well. Being able to choose
the storage hardware would also help a lot. Etc.

For all the numbers that have (not) been posted here: I wonder about all
those. What RAM usage? What storage hardware? What amount of data? What
kind of data? How flexibly can it be queried afterward? And how fast is
that query after data entry?

But I have the feeling that we won't see *any* numbers whatsoever.


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-08 Thread Philip Van Hoof
On Tue, 2011-03-08 at 08:43 -0800, Patrick Ohly wrote:
 On Tue, 2011-03-08 at 10:31 +, Philip Van Hoof wrote:
  Can somebody from Intel answer one of the questions about EDS?
 
 All you are asking for is performance. To me, that wasn't the main issue
 with Tracker. The key drawbacks of Tracker as storage for contacts are
 these:
  1. only one address book

You can use either a graph or things like a nie:DataSource and the
property nie:dataSource to have multiple address books.

  2. no support for controlling access

As far as I know, this is unfortunately no different in EDS.

Tracker's plan for per-resource access control is to use its (limited)
support for graphs.

Access to metadata would then be granted per graph (not implemented, and
I'm not trying to make it sound as if this is easy).

Combined with 1. that would mean that you could grant access to one
address book to application A, but not to application B, although B has
access to another address book.

So for example:

INSERT { GRAPH <urn:protected-xxx> { <person> a nco:Contact ... } }

Application X would get access to <urn:protected-xxx> in some way. It does
a query for all nco:Contact, and it gets <person>.

Application Y would not get access to this. It does a query for all
nco:Contact, and it doesn't get <person>.

And, as explained further down: Application Y gets no read access to
meta.db, so it must use tracker-store, which would regulate this access.
Nor can Application Y use sqlite3_open on its own, as it has no read access
other than via tracker-store (which uses FD passing over D-Bus as IPC).
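
To illustrate the idea (and only the idea: as said, none of this is
implemented), here is a toy model in Python. The class GraphStore and its
methods are invented for this sketch; it simply keeps a graph name next to
each statement and filters query results by the graphs an application has
been granted.

```python
from collections import defaultdict


class GraphStore:
    """Toy quad store with per-graph read grants (hypothetical design)."""

    def __init__(self):
        self.quads = []                 # (graph, subject, predicate, object)
        self.grants = defaultdict(set)  # application name -> readable graphs

    def insert(self, graph, subject, predicate, obj):
        self.quads.append((graph, subject, predicate, obj))

    def grant(self, application, graph):
        self.grants[application].add(graph)

    def subjects_of_type(self, application, rdf_type):
        """Return subjects of rdf_type that the application may see."""
        readable = self.grants[application]
        return [
            s
            for (g, s, p, o) in self.quads
            if p == "a" and o == rdf_type and g in readable
        ]


store = GraphStore()
store.insert("urn:protected-xxx", "person", "a", "nco:Contact")
store.grant("X", "urn:protected-xxx")
```

Application X asking for all nco:Contact gets person back; application Y,
which has no grant, gets nothing.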

 There are real-world scenarios (think mirroring sensitive enterprise or
 social web data) where an additional address book is needed, because data
 must not be mixed with the normal contacts.

Sure, ok

 You mentioned plans to add access control. Can that be implemented while
 retaining the direct read capability and the performance improvement
 that you mentioned when announcing that feature? Doesn't look possible
 to me.

I have to be careful now what I write because only after Aegis became
public were NDA things lifted a bit on this stuff ... (not our fault)

What I reply is available in gitorious publicly anyway, so hopefully
it's all fine ;-)

Check the files tracker.postinst and tracker.aegis, *.aegis:
http://meego.gitorious.org/tracker/tracker/trees/harmattan/debian

With Aegis we give certified applications read access to meta.db using
the <credential name="GID::metadata-users" />.

The directory $HOME/.cache/tracker is protected so that it is readable only
by processes that were granted this GID through Aegis. The library
libtracker-sparql, which enables SPARQL query capability, detects
whether or not the process has access and selects either D-Bus (with FD
passing) or direct access to meta.db based on the UNIX file permissions.
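
The selection logic amounts to a permission check. The following Python
sketch shows the principle only; choose_backend is a made-up name and this
is not libtracker-sparql's actual code.

```python
import os


def choose_backend(db_path):
    """Return "direct" if this process may read db_path, else "dbus".

    Hypothetical helper: it only mirrors the decision described above,
    based on the UNIX file permissions of the database file.
    """
    return "direct" if os.access(db_path, os.R_OK) else "dbus"
```

A process without the group-id that makes the file readable ends up on the
D-Bus path without having to know about it.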

It is true that once a process has permission to read meta.db, you
can't hide from it the data that is in meta.db. We had no plans to do in-db
encryption as this would just make things horribly slow (decrypting
cells in custom SQLite functions, no thanks).

On Harmattan a plan was to give only certified applications the right to
add the GID::metadata-users credential, meaning that the source code is
known to the vendor, and known not to abuse meta.db.

We investigated whether storing meta.db on an AegisFS was an option. But
because AegisFS uses FUSE, it is notoriously slow on each syscall.

Were meta.db stored on an AegisFS, the data would also be
protected using encryption (to survive a reboot into root mode).

 The other question of course is about the schedule. Access control is
 needed for products *now*, not in some distant future. EDS doesn't have
 access control either, but I consider it much easier to add (already
 based on a daemon concept, much more limited scope).

The same daemon concept applies to Tracker's tracker-store, as access
to meta.db is regulated through Aegis's GID:: credential.


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-08 Thread Philip Van Hoof
On Tue, 2011-03-08 at 09:53 -0700, Clark, Joel wrote:

Hey Clark Joel,

  There are real-world scenarios (think mirroring sensitive enterprise or
  social web data) where an additional address book is needed, because data
  must not be mixed with the normal contacts.
 
 And systems that have multiple user profiles 

Preferably you use one Tracker instance per UNIX user. But graphs can also
be used here if necessary. I think EDS is similar in this regard.

We don't support system-wide Tracker instances; we lack a use-case for
that (and it sounds like a silly idea to us too).

  You mentioned plans to add access control. Can that be implemented while
 retaining the direct read capability and the 
 
 Also essential for systems with multiple user profiles and concurrent 
 multi-user access.

So, as mentioned in my earlier reply to Ohly Patrick, user access to
meta.db is regulated using UNIX file permissions, via the
GID::metadata-users credential (which grants your process group-id
permissions).

When this read access isn't available, libtracker-sparql falls back
to FD passing over D-Bus automatically.

Unfortunately WAL also needs write access to meta.db's directory (for
the journal files that it writes), but said GID:: credential should only
be granted to certified applications (if such security is of importance
to the integrator).


Cheers,

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be



Re: [MeeGo-dev] Some architecture changes (MSSF / Buteo / PIM storage)

2011-03-07 Thread Philip Van Hoof
On Mon, 2011-03-07 at 08:09 -0800, Arjan van de Ven wrote:

Hi Arjan,

 PIM storage
 ===========
 The Address book, Calendar data and Email are currently stored in a 
 tracker database, and accessed (officially) via a QtMobility API set.
 There are a range of issues with this implementation, starting with the 
 complexity of adding privacy controls, the performance and
 scalability as well as the completeness for doing a proper syncml sync.

That's great!

Can you explain to us how EDS improves privacy control compared to
Tracker's RDF store? Does EDS have a way to represent different graphs
(part of Tracker's future plans to do per-resource privacy control)?

How in EDS do you say that data (about) x isn't to be made
available to queries made by application y? It's the first time
that I hear that EDS has such privacy-related capabilities.

Can you also go a bit deeper into how EDS's performance compares to
Tracker's RDF store for comparable data entry operations? Have you guys
done measurements on this? Can the results of these measurements and the
method to reproduce be publicized? Are the queries you used available
somewhere? Where?

What do you mean by scalability? Scalable over multiple application
domains other than PIM? Or scalable to more data? And what kind of data?
I wonder how EDS is or can be more scalable when the underlying database
technology (SQLite, apparently) is identical.

In Tracker, each class is stored in a decomposed table. This means that
contact and E-mail data are each in separate tables. Although decomposition
typically makes INSERT a bit slower, SELECT is a lot faster. How will or
does EDS make SELECT equally 'scalable' with equally complex queries?
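
For readers not familiar with the term, here is a rough sketch of what
"decomposed" means for SELECT, again with Python's sqlite3 module. The
table and column names (triple, contact, email, fullname) are simplified
stand-ins chosen for this example, not Tracker's real schema. The same
question, "give me the names of all contacts", needs a self-join on a
generic one-table layout and a plain single-table SELECT on the decomposed
one.

```python
import sqlite3


def contact_names():
    """Return contact names from a generic and from a decomposed layout."""
    conn = sqlite3.connect(":memory:")
    # Generic layout: every statement is a row in one big table.
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    # Decomposed layout: one table per class, one column per property.
    conn.execute("CREATE TABLE contact (id TEXT PRIMARY KEY, fullname TEXT)")
    conn.execute("CREATE TABLE email (id TEXT PRIMARY KEY, subject TEXT)")

    for uri, name in [("urn:c1", "Alice"), ("urn:c2", "Bob")]:
        conn.execute("INSERT INTO triple VALUES (?, 'rdf:type', 'nco:Contact')", (uri,))
        conn.execute("INSERT INTO triple VALUES (?, 'nco:fullname', ?)", (uri, name))
        conn.execute("INSERT INTO contact VALUES (?, ?)", (uri, name))
    conn.execute("INSERT INTO triple VALUES ('urn:m1', 'rdf:type', 'nmo:Email')")
    conn.execute("INSERT INTO triple VALUES ('urn:m1', 'nmo:messageSubject', 'Hello')")
    conn.execute("INSERT INTO email VALUES ('urn:m1', 'Hello')")

    # Generic layout: join the table with itself and wade through e-mail rows.
    generic = conn.execute(
        "SELECT n.object FROM triple AS t "
        "JOIN triple AS n ON n.subject = t.subject "
        "WHERE t.predicate = 'rdf:type' AND t.object = 'nco:Contact' "
        "AND n.predicate = 'nco:fullname' ORDER BY n.object"
    ).fetchall()
    # Decomposed layout: the e-mail data is never touched.
    decomposed = conn.execute(
        "SELECT fullname FROM contact ORDER BY fullname"
    ).fetchall()
    conn.close()
    return [r[0] for r in generic], [r[0] for r in decomposed]
```

Both queries return the same names; the difference is in how much data
the engine has to look at to get there.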

Also note that the Tracker team optimizes both INSERT and SELECT
use-cases on a use-case per use-case basis, using its domainindexes and
indexes. Have you contacted the Tracker team about the scalability
problems (if any)?

Or is EDS only more scalable during data entry?

 Because of all these items and the available expertise, we have decided 
 to start replacing PIM storage with the Evolution Data Server.

Ah, I see.

 This change will land together with the SyncEvolution change (due to the 
 intimate relationship between the storage and sync of PIM data).
 This change should largely be invisible to applications since 
 applications are supposed to access this data via the appropriate QtMobility
 APIs.

That's not entirely true. Many applications are nowadays accessing said
data through the QSparql API of Harmattan. QtMobility isn't always the
ideal access-layer. Especially not for more complex use-cases.

Are you planning to support or implement a QSparql backend for EDS?

I'd be thrilled to have a query language like or as powerful as SPARQL
in EDS! In what timeframe can we expect this? Are you planning to use
existing SPARQL endpoint technology for this? Which?

 But to avoid setting precedents, the lower level components will 
 be removed from the architecture diagrams effective immediately.
 
 To be clear, this does not mean that tracker is completely removed; 
 tracker is still being used (together with tumbler) for indexing media
 on the device. At this point we are seeing serious issues 
 (performance/stability) with this solution, but the first attempt will 
 be to fix the
 deficiencies rather than a replacement.

At this moment Tracker (via QSparql and the earlier but deprecated
libqttracker) is used on Harmattan within quite a wide variety of
(horizontal) applications.

Is the plan to use Tracker only for indexing? That's a fairly small
use-case of Tracker on Harmattan nowadays. Fremantle's metadata solution
of course had an almost exclusive focus on file metadata (indexing). But
that's not the case on Harmattan. What's the plan for those apps?



Cheers!

Philip

-- 


Philip Van Hoof
freelance software developer
Codeminded BVBA - http://codeminded.be
