Virtuoso actually has support for indexing plug-ins. I have not looked into this in depth yet, but it would even be possible to plug in an external index like CLucene and use it from within the query engine.

This, however, is a bit more involved. As a first step, it might be worthwhile to look into the improvements that could be achieved by simply customizing the full-text index in a way that
1. allows querying all relevant fields in one statement
2. still keeps the context information
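To make the two requirements above concrete, here is a minimal in-memory sketch of one inverted index that covers several fields at once while remembering which field each match came from. This is not Virtuoso's indexing API; the class name, its methods and the field names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


class MultiFieldIndex:
    """Hypothetical sketch: a single inverted index over several fields.

    Each posting stores (resource, field), so a single query searches all
    fields but the field that matched (the context) is not lost.
    """

    def __init__(self):
        self._postings = defaultdict(set)  # token -> {(resource, field)}

    def add(self, resource, field, text):
        for token in text.lower().split():
            self._postings[token].add((resource, field))

    def query(self, *terms):
        """Return {resource: {fields that matched}} for resources that
        contain every term in at least one indexed field."""
        hits = None
        per_resource = defaultdict(set)
        for term in terms:
            postings = self._postings.get(term.lower(), set())
            resources = {res for res, _ in postings}
            hits = resources if hits is None else hits & resources
            for res, field in postings:
                per_resource[res].add(field)
        if not hits:
            return {}
        return {res: per_resource[res] for res in hits}


idx = MultiFieldIndex()
idx.add("res1", "title", "Bohemian Rhapsody")
idx.add("res1", "artist", "Queen")
idx.add("res2", "title", "Queen of Hearts")
print(idx.query("queen", "rhapsody"))
```

One query spans both fields, and the result still reports that "res1" matched through both its title and its artist.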

I will, however, have to research this further to know the extent of work required to get to that point. It might be rather simple; it might be harder.

Cheers,
Sebastian

On 05/04/2013 03:40 PM, Christian Mollekopf wrote:
On Saturday 04 May 2013 18.49:05 Vishesh Handa wrote:
Hey guys


I was thinking of moving all the plain text related to a file into the
nie:plainTextContent of the resource. So in the case of music we would have:

<res> nie:plainTextContent "title artist album whateverElse" .

For files, we would append the file name and any other plain text that we
want searched to nie:plainTextContent. That way, a search for any
combination of text will only have to look through the plain text content.
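As a rough illustration of the proposal above, a feeder could concatenate the searchable values into one string and write it as a single nie:plainTextContent triple. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the values are passed in as plain strings, and the triple is produced as a line of Turtle text rather than through any real Nepomuk API.

```python
def build_plain_text(values, file_name=None):
    """Hypothetical helper: join the searchable values (title, artist,
    album, ...) into one string, skipping empty ones, and append the
    file name if there is one."""
    parts = [value for value in values if value]
    if file_name:
        parts.append(file_name)
    return " ".join(parts)


def turtle_literal(text):
    # Escape backslashes first, then double quotes and line breaks, so the
    # text is valid inside a short Turtle string literal.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def plain_text_triple(resource_uri, text):
    # One triple per resource, mirroring the example in the mail above.
    return f"<{resource_uri}> nie:plainTextContent {turtle_literal(text)} ."


text = build_plain_text(["Bohemian Rhapsody", "Queen", "A Night at the Opera"],
                        file_name="bohemian.mp3")
print(plain_text_triple("nepomuk:/res/example", text))
```

The resource URI "nepomuk:/res/example" is a placeholder. Keeping everything in one literal is what lets a single full-text lookup match any combination of title, artist, album and file name.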

Opinions?

Hey Vishesh,

I think that's a good idea. We're also already using it that way to be able to
search through emails with markup in the email feeder, and I see no reason why
we can't extend that to other resource types (after all, the property exists
exactly for this purpose).
So that means that, in the future, all feeders should push all information that
should be matched by full-text searching to nie:plainTextContent, right?

The alternative would of course be to use a separate, dedicated full-text index,
which may have better performance and some more features (tokenizer, stemming,
etc.), but would obviously complicate the setup again (full-text query =>
filter by type in Nepomuk => retrieve Akonadi item). So it's not necessarily the
way to go, but I wanted to bring it to the table anyway, as IMO it does not
conflict with what Nepomuk provides (the semantic analysis) and could give
better results (performance- and feature-wise) than letting Virtuoso do
all the work.
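The three-step chain described above (full-text query, then filter by type in Nepomuk, then retrieve the Akonadi item) can be sketched with in-memory dictionaries standing in for the three systems. This is a hypothetical sketch: the `Stores` container, the type strings and the item payloads are placeholders, and the tokenizer is a naive regex split, not the tokenizer or stemmer a real full-text engine would provide.

```python
import re
from dataclasses import dataclass


@dataclass
class Stores:
    # Hypothetical in-memory stand-ins for the three systems in the chain.
    fulltext: dict  # token -> set of resource URIs (the dedicated index)
    types: dict     # resource URI -> type (what Nepomuk would answer)
    akonadi: dict   # resource URI -> Akonadi item payload


def tokenize(text):
    # Lowercase and split on anything that is not a letter, digit or underscore.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def search(stores, query, wanted_type):
    tokens = tokenize(query)
    if not tokens:
        return []
    # 1. Full-text query: resources that contain every token.
    matches = set.intersection(*(stores.fulltext.get(t, set()) for t in tokens))
    # 2. Filter by type (the step that would go to Nepomuk).
    typed = sorted(r for r in matches if stores.types.get(r) == wanted_type)
    # 3. Retrieve the Akonadi items for the remaining resources.
    return [stores.akonadi[r] for r in typed if r in stores.akonadi]


stores = Stores(
    fulltext={"meeting": {"r1", "r2"}, "notes": {"r1"}},
    types={"r1": "email", "r2": "contact"},
    akonadi={"r1": "item-1", "r2": "item-2"},
)
print(search(stores, "Meeting notes", "email"))
```

Each stage is a separate lookup against a different store, which is exactly the extra setup complexity weighed above against letting Virtuoso answer everything in one query.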


We can easily do this for the 4.11 release, since we already need everyone
to re-index everything because of the migration.

Cool.

Cheers,
Christian
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk