I started writing
http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema and I need help
with two things:

* test cases http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
* if time permits, review the proposal, especially
http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach

Thanks,
Marius


On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
<[email protected]> wrote:
> Hi devs,
>
> This is a very important question so think carefully. Let me explain:
>
> In XWiki (model) we have a few entity types. There are *wikis* which
> have *spaces* which have *documents*. A document can have *objects*
> and *attachments*. A document can also define a *class*.
>
> At the same time we like to say that in XWiki "everything is a
> document" because everything revolves around documents. The document
> is the central notion.
>
> We can query the database (using HQL or XWQL) for any of the
> previously mentioned entities but what should a Solr query return
> (semantically)? In other words:
>
> * are you searching for an object without caring about the document
> that holds the object? Same for an object property.
> * how often are you searching for an attachment without caring about
> the document that holds the attachment?
> * are you searching for a class or for the document that defines that class?
> * are you searching for a wiki without caring about the documents it
> contains? Same for a space.
>
> IMO the result of a Solr query should be, semantically, a list of
> documents. But maybe I'm wrong.
>
> -----------------------
> Technical Details
> -----------------------
>
> Unlike a relational database, a Solr/Lucene index has a single 'table'.
> So normally you index a single entity type. Each row in the index
> represents an entity of that type. As a consequence the result of a
> Solr query is semantically a list of entities of that type. In our
> case the entity type is (naturally) *document*.
>
> If you want to index more entity types (e.g. index attachments and
> objects _separately_, not as part of a document) then, since there is
> only one 'table' in the index, you need to add a 'type' column that
> specifies the type of entity you have on each row (e.g. type=document,
> type=attachment, type=object etc.). The result of a Solr query is now,
> semantically, a list of different entity types, unless you filter by a
> specific type. It smells like a hack to me.
>
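
To make the 'type' column idea concrete, here is a minimal in-memory sketch in Python. It is not Solr code: the rows, the field names (type, docname, title, filename, tags) and the search() helper are all assumptions made for illustration, not the actual XWiki schema.

```python
# Hypothetical rows of a single index that stores several entity types.
# Field names are illustrative only, not the real XWiki Solr schema.
rows = [
    {"type": "document", "docname": "Blog.Post1", "title": "Hello Solr"},
    {"type": "attachment", "docname": "Blog.Post1", "filename": "solr-notes.pdf"},
    {"type": "object", "docname": "Blog.Post1", "class": "XWiki.TagClass", "tags": ["solr"]},
]


def search(rows, text, entity_type=None):
    """Return rows with a field value containing `text`, optionally limited to one type."""
    text = text.lower()
    hits = []
    for row in rows:
        if entity_type is not None and row["type"] != entity_type:
            continue
        # Collect all field values except the type marker itself.
        values = []
        for key, value in row.items():
            if key == "type":
                continue
            values.extend(value if isinstance(value, list) else [value])
        if any(text in str(value).lower() for value in values):
            hits.append(row)
    return hits


# Without a type filter the result mixes documents, attachments and objects.
mixed = search(rows, "solr")
# With a type filter the result is again a list of one entity type.
only_documents = search(rows, "solr", entity_type="document")
```

Every caller that wants a homogeneous result has to remember the type filter, which is the part that smells like a hack.
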
> Let's imagine what happens if we want to search for blog posts that
> have a specific tag. With the first approach this is easy because all
> the (indexed) information is on a single row. With the second approach
> this is considerably more complex because the information is spread over
> multiple rows:
>
> * one row with type=document for the blog post document
> * one row with type=object for the blog post object
> * one row with type=object for the tag object
>
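
For comparison, with the first approach (one row per document) the same search is a plain filter. This is a rough sketch assuming the object data is flattened into hypothetical multi-valued fields named classes and tags on the document row. The class names are only sample values.

```python
# Hypothetical document-only index: one row per document, with information
# about its objects flattened into multi-valued fields (assumed names).
documents = [
    {"docname": "Blog.Post1", "classes": ["Blog.BlogPostClass", "XWiki.TagClass"], "tags": ["solr"]},
    {"docname": "Blog.Post2", "classes": ["Blog.BlogPostClass"], "tags": []},
    {"docname": "Main.WebHome", "classes": ["XWiki.TagClass"], "tags": ["solr"]},
]


def blog_posts_with_tag(documents, tag):
    """Return the names of documents that are blog posts and carry `tag`."""
    return [
        doc["docname"]
        for doc in documents
        if "Blog.BlogPostClass" in doc["classes"] and tag in doc["tags"]
    ]


matching = blog_posts_with_tag(documents, "solr")
```

Both conditions are checked against the same row, so no join is needed.
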
> In a relational database when you have the information spread in
> multiple places (tables) you do joins. Fortunately (you would say)
> Solr supports joins. In this particular case we would have to perform
> 2 joins which means:
>
> index X index X index
>
> where X represents the cartesian product. The document name would be
> the join key. Pretty complex even before trying to write this in Solr
> query syntax.
>
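
Here is a rough sketch of what the two joins amount to with the second approach. The rows and field names (type, docname, class, tags) are assumptions for illustration. The in-memory version evaluates each join as a set lookup on docname rather than as a literal cartesian product. solr_params() shows how the same thing might be written with Solr's {!join from=... to=...} query parser, again against hypothetical field names and without escaping the tag value.

```python
# Hypothetical index with one row per entity; docname is the join key.
rows = [
    {"type": "document", "docname": "Blog.Post1"},
    {"type": "object", "docname": "Blog.Post1", "class": "Blog.BlogPostClass"},
    {"type": "object", "docname": "Blog.Post1", "class": "XWiki.TagClass", "tags": ["solr"]},
    {"type": "document", "docname": "Blog.Post2"},
    {"type": "object", "docname": "Blog.Post2", "class": "Blog.BlogPostClass"},
    {"type": "document", "docname": "Main.WebHome"},
    {"type": "object", "docname": "Main.WebHome", "class": "XWiki.TagClass", "tags": ["solr"]},
]


def docnames(rows, predicate):
    """Collect the join keys (document names) of the rows accepted by `predicate`."""
    return {row["docname"] for row in rows if predicate(row)}


def blog_posts_with_tag(rows, tag):
    # Join 1: documents that hold a blog post object.
    post_docs = docnames(
        rows,
        lambda r: r["type"] == "object" and r.get("class") == "Blog.BlogPostClass",
    )
    # Join 2: documents that hold a tag object with the requested tag.
    tag_docs = docnames(
        rows,
        lambda r: r["type"] == "object"
        and r.get("class") == "XWiki.TagClass"
        and tag in r.get("tags", []),
    )
    # Keep only document rows whose name appears on both sides.
    return [
        r["docname"]
        for r in rows
        if r["type"] == "document"
        and r["docname"] in post_docs
        and r["docname"] in tag_docs
    ]


def solr_params(tag):
    """Build request parameters for the same search (field names are assumed)."""
    return {
        "q": "type:document",
        "fq": [
            '{!join from=docname to=docname}type:object AND class:"Blog.BlogPostClass"',
            f'{{!join from=docname to=docname}}type:object AND class:"XWiki.TagClass" AND tags:"{tag}"',
        ],
    }


matching = blog_posts_with_tag(rows, "solr")
params = solr_params("solr")
```

Compared with the single-row filter, every extra object class that takes part in the search adds one more join on docname.
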
> So basically the question becomes: is it worth indexing more entities
> _separately_ instead of indexing just documents (with info about their
> objects and attachments) considering the complexity that it brings in
> writing Solr queries? Do we search for objects and attachments alone
> as separate entities often enough to justify this complexity? My
> answer is no.
>
> Thanks,
> Marius