Re: [Nepomuk] Review Request: StoreResources: Add a flag to force duplicate detection in the graph

Vishesh Handa Mon, 08 Oct 2012 09:46:15 -0700


> On Oct. 8, 2012, 3:24 p.m., Christian Mollekopf wrote:
> > That's fine from the PIM side, but I'd still be interested where you'd want 
> > to avoid the duplicates merging. It seems like a crucial feature to me as 
> > soon as we have multiple applications operating on the same data, where we 
> > can't know which data is already present in the store and which isn't.
> > So it might make sense to change the semantics so you can disable the 
> > duplicates merging and have it on by default, as it seems to me more like a 
> > performance optimization for cases where we know that no duplicates 
> > exist.
> > Otherwise we could render the whole database pretty quickly useless by 
> > creating a massive amount of duplicates.
> > 
> > Or am I just misunderstanding something?


Uhh. No.

The duplicate merging applies only to data that is not already present in 
Nepomuk, that is, duplicates within the SimpleResourceGraph that you provided. 
Example -

_:a a nao:Tag ;
    nao:identifier "Tag1" .

_:b a nao:Tag ;
    nao:identifier "Tag1" .

_:c a nfo:FileDataObject ;
    nao:hasTag _:a, _:b .

Case 1: Tag1 does not already exist + flag off - In that case _:c will 
have 2 tags attached to it, both of which have the same identifier but 
different resource URIs.

Case 2: Tag1 does not already exist + flag on - In that case the 
SimpleResourceGraph is checked for duplicates, and _:a and _:b are found to 
be identical, so they are merged into _:a. _:c then has only 1 tag. This is 
a pre-processing step. After it, the normal identification process runs to 
determine whether a tag with identifier "Tag1" already exists.

Case 3: Tag1 does exist + flag off - Both _:a and _:b will be identified 
as <nepomuk:/res/tag1-uri>, and _:c will have only 1 tag.
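
To illustrate the pre-processing step in isolation, here is a rough sketch in 
Python. It is not the DataManagementModel code; the function name 
merge_duplicates and the dict-of-frozensets graph representation are made up 
for the example. It assumes two blank nodes are duplicates when their 
property/value sets are identical, and it does a single pass only (nodes that 
become identical after a merge are not merged again).

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical stand-in for a SimpleResourceGraph:
# blank-node id -> set of (property, value) pairs.
Graph = Dict[str, FrozenSet[Tuple[str, str]]]


def merge_duplicates(graph: Graph) -> Graph:
    """Merge blank nodes whose property sets are identical (single pass)."""
    canonical: Dict[FrozenSet[Tuple[str, str]], str] = {}
    rename: Dict[str, str] = {}
    for node, props in graph.items():
        # Hashing the property set is the expensive part; the first node
        # seen with a given set becomes the one the others are merged into.
        rename[node] = canonical.setdefault(props, node)

    merged: Graph = {}
    for node, props in graph.items():
        if rename[node] != node:
            continue  # duplicate, dropped in favour of its canonical node
        # Redirect references to merged nodes. Simplification: any value
        # that equals a node id is treated as a reference to that node.
        merged[node] = frozenset((p, rename.get(v, v)) for p, v in props)
    return merged


tag = frozenset({("rdf:type", "nao:Tag"), ("nao:identifier", "Tag1")})
graph: Graph = {
    "_:a": tag,
    "_:b": tag,
    "_:c": frozenset({
        ("rdf:type", "nfo:FileDataObject"),
        ("nao:hasTag", "_:a"),
        ("nao:hasTag", "_:b"),
    }),
}
merged = merge_duplicates(graph)
```

With the example graph from above, _:b is folded into _:a and _:c ends up 
with a single nao:hasTag pointing at _:a. Identification against the data 
already stored in Nepomuk would still happen afterwards, as in Case 2.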

Does this make it clear?


- Vishesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106711/#review20081
-----------------------------------------------------------


On Oct. 3, 2012, 12:11 p.m., Vishesh Handa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/106711/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2012, 12:11 p.m.)
> 
> 
> Review request for Nepomuk, Christian Mollekopf and Sebastian Trueg.
> 
> 
> Description
> -------
> 
>     StoreResources: Add a flag to force duplicate detection in the graph
>     
>     By default each SimpleResource in the graph was always hashed (an
>     expensive process) and then checked for duplicates with the other
>     SimpleResources in the graph.
>     
>     This feature was only added because the PIM guys were pushing large
>     quantities of duplicate data. It doesn't make sense for everyone to pay
>     the penalty for one application.
>     
>     They can enable this feature with the MergeDuplicateResources flag.
> 
> 
> Diffs
> -----
> 
>   libnepomukcore/datamanagement/datamanagement.h 2ac60a5 
>   services/storage/datamanagementmodel.cpp 7c05cfd 
>   services/storage/test/datamanagementmodeltest.cpp 3d3340c 
> 
> Diff: http://git.reviewboard.kde.org/r/106711/diff/
> 
> 
> Testing
> -------
> 
> Updated the relevant tests
> 
> 
> Thanks,
> 
> Vishesh Handa
> 
>

_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
