> On Oct. 8, 2012, 3:24 p.m., Christian Mollekopf wrote:
> > That's fine from the PIM side, but I'd still be interested where you'd want
> > to avoid the duplicates merging. It seems like a crucial feature to me as
> > soon as we have multiple applications operating on the same data, where we
> > can't know which data is already present in the store and which isn't.
> > So it might make sense to change the semantics so you can disable the
> > duplicates merging and have it on by default, as it seems to me more like a
> > performance optimization for cases where we know that no duplicates are
> > existing.
> > Otherwise we could render the whole database pretty quickly useless by
> > creating a massive amount of duplicates.
> >
> > Or am I just misunderstanding something?
Uhh. No.
The duplicate merging only applies to data that is not already present in
Nepomuk, i.e. duplicates within the SimpleResourceGraph that you provided.

Example:
_:a a nao:Tag ;
    nao:identifier "Tag1" .
_:b a nao:Tag ;
    nao:identifier "Tag1" .
_:c a nfo:FileDataObject ;
    nao:hasTag _:a, _:b .
Case 1: Tag1 does not already exist + flag off. In that case _:c will have
2 tags attached to it, both of which have the same identifier but different
resource URIs.

Case 2: Tag1 does not already exist + flag on. In that case the
SimpleResourceGraph is checked for duplicates, and _:a and _:b are found to
be identical, so they are merged into _:a. _:c then only has 1 tag. This is
purely a pre-processing step. After it, the normal identification process
runs to determine whether a tag with identifier "Tag1" already exists.

Case 3: Tag1 does exist + flag off. Both _:a and _:b will be identified
as <nepomuk:/res/tag1-uri> and _:c will only have 1 tag.
Does this make it clear?
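If not, here is a rough Python sketch of the three cases. It is only a toy
model and not the actual Nepomuk code: the dict-based graph, the function
names, the merge_duplicate_resources parameter and the "new-tag-N" URIs are
all made up for illustration, and the merge is a single pass that ignores
nested duplicates.

```python
import itertools

# Toy graph: blank node -> {property: tuple of values}. Hypothetical layout.
GRAPH = {
    "_:a": {"rdf:type": ("nao:Tag",), "nao:identifier": ("Tag1",)},
    "_:b": {"rdf:type": ("nao:Tag",), "nao:identifier": ("Tag1",)},
    "_:c": {"rdf:type": ("nfo:FileDataObject",), "nao:hasTag": ("_:a", "_:b")},
}


def merge_duplicates(graph):
    """Pre-processing: merge nodes in the graph whose properties are identical."""
    first_seen = {}  # hashable property set -> first node that had it
    remap = {}       # duplicate node -> node it is merged into
    for node, props in graph.items():
        key = frozenset(props.items())
        if key in first_seen:
            remap[node] = first_seen[key]
        else:
            first_seen[key] = node
    merged = {}
    for node, props in graph.items():
        if node in remap:
            continue  # dropped, references are rewritten below
        merged[node] = {
            prop: tuple(dict.fromkeys(remap.get(v, v) for v in values))
            for prop, values in props.items()
        }
    return merged


def store_resources(graph, existing_tags, merge_duplicate_resources=False):
    """Return {file node: list of tag URIs} after the (toy) identification."""
    if merge_duplicate_resources:
        graph = merge_duplicates(graph)
    counter = itertools.count(1)
    uris = {}
    for node, props in graph.items():
        if props.get("rdf:type") == ("nao:Tag",):
            identifier = props["nao:identifier"][0]
            # Identify against what is already stored, else mint a new URI.
            uris[node] = existing_tags.get(identifier) or (
                f"nepomuk:/res/new-tag-{next(counter)}"
            )
    return {
        node: list(dict.fromkeys(uris[t] for t in props["nao:hasTag"]))
        for node, props in graph.items()
        if "nao:hasTag" in props
    }


# Tag1 absent, flag off: two tags with different URIs.
print(len(store_resources(GRAPH, {})["_:c"]))  # 2
# Tag1 absent, flag on: _:b is merged into _:a first, so one tag.
print(len(store_resources(GRAPH, {}, merge_duplicate_resources=True)["_:c"]))  # 1
# Tag1 present, flag off: both nodes identify to the stored tag, so one tag.
print(store_resources(GRAPH, {"Tag1": "nepomuk:/res/tag1-uri"})["_:c"])
```

So the flag only matters for data that is new to the store; anything that
already exists is caught by the normal identification step either way.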
- Vishesh
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106711/#review20081
-----------------------------------------------------------
On Oct. 3, 2012, 12:11 p.m., Vishesh Handa wrote:
>
> (Updated Oct. 3, 2012, 12:11 p.m.)
>
>
> Review request for Nepomuk, Christian Mollekopf and Sebastian Trueg.
>
>
> Description
> -------
>
> StoreResources: Add a flag to force duplicate detection in the graph
>
> By default each SimpleResource in the graph was always hashed (an
> expensive process) and then checked for duplicates with the other
> SimpleResources in the graph.
>
> This feature was only added because the PIM guys were pushing large
> quantities of duplicate data. It doesn't make sense for everyone to pay
> the penalty for one application.
>
> They can enable this feature with the MergeDuplicateResources flag.
>
>
> Diffs
> -----
>
> libnepomukcore/datamanagement/datamanagement.h 2ac60a5
> services/storage/datamanagementmodel.cpp 7c05cfd
> services/storage/test/datamanagementmodeltest.cpp 3d3340c
>
> Diff: http://git.reviewboard.kde.org/r/106711/diff/
>
>
> Testing
> -------
>
> Updated the relevant tests
>
>
> Thanks,
>
> Vishesh Handa
>
>
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk