Faceted search using RDF-triple like related documents

peos Mon, 10 Feb 2014 14:32:19 -0800

Hi,

We are using ElasticSearch for navigating through our product catalog. We 
have fairly simple documents like:

{
  "_index": "catalog",
  "_type": "product",
  "_id": "476",
  "_score": 1,
  "_source": {
    "id": 476,
    "description": "Product description",
    "a8": "100 mm",
    "a12": "250 g",
    "categories": [
      8,
      4213
    ]
  }
}

where every product has the following attributes:

- id, unique identifier;
- description, a short description;
- a*, custom defined attributes;
- categories, the categories the product is linked to.

So far we have added queries (including autocomplete), filters and facets,
and it all works really well.

Recently we added a new feature that lets users define RDF-triple-like
relations between products using custom predicates, e.g.:

1. <product x> is an alternative for <product y>;
2. <product x> is a dispenser for <product y>;
3. etc.

My question is about the second example where products are dispensers for
other products.

We want the user to be able to find disposables using both the attributes of
the disposable product and the attributes of the linked dispenser product.
Example:

For every printer there are different toners available (e.g. different
capacities, different brands, etc.), and several printers can use the same
toner. When looking for a toner, we want the user to be able to select
attributes of the toners as well as attributes of the printers linked to
those toners. So when the user selects "Brother" in the toner brand facet,
only "Brother" toners are shown. But when the user selects "Brother" in the
printer brand facet, all toners that are suited for "Brother" printers are
shown, regardless of the toner brand.
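
To make the intended behaviour concrete, here is a minimal in-memory sketch
in Python. It does not use ES at all; the sample products and the field names
"kind", "brand" and "dispenser_for" are hypothetical and only stand in for
our custom attributes and predicates.

```python
# Hypothetical sample data: two printers (dispensers) and three toners
# (disposables). "dispenser_for" lists the toner ids a printer can use.
PRODUCTS = {
    1: {"id": 1, "kind": "printer", "brand": "Brother", "dispenser_for": [10, 11]},
    2: {"id": 2, "kind": "printer", "brand": "HP", "dispenser_for": [11, 12]},
    10: {"id": 10, "kind": "toner", "brand": "Brother"},
    11: {"id": 11, "kind": "toner", "brand": "Generic"},
    12: {"id": 12, "kind": "toner", "brand": "HP"},
}


def find_toners(toner_brand=None, printer_brand=None):
    """Return the sorted ids of toners matching the selected facet values."""
    toners = {pid for pid, p in PRODUCTS.items() if p["kind"] == "toner"}

    # Toner brand facet: filter on the toner's own attribute.
    if toner_brand is not None:
        toners = {t for t in toners if PRODUCTS[t]["brand"] == toner_brand}

    # Printer brand facet: keep toners linked to any printer of that brand,
    # regardless of the toner's own brand.
    if printer_brand is not None:
        linked = set()
        for p in PRODUCTS.values():
            if p["kind"] == "printer" and p["brand"] == printer_brand:
                linked.update(p["dispenser_for"])
        toners &= linked

    return sorted(toners)
```

With this data, selecting the toner brand "Brother" yields only toner 10,
while selecting the printer brand "Brother" yields toners 10 and 11.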

So how would this translate into a document design in ES? As both the
dispenser and disposable products are already documents within ES, we could
store only references on each document, keyed by the custom predicate, like:

{
  "_index": "catalog",
  "_type": "product",
  "_id": "476",
  "_score": 1,
  "_source": {
    "id": 476,
    "description": "Product description",
    "a8": "100 mm",
    "a12": "250 g",
    "categories": [
      8,
      4213
    ],
    "<predicate_p>": [
      <product_id_x>,
      <product_id_y>
    ]
  }
}

However, we also want facet result counts that make sense for both dispenser
and disposable attributes, meaning that the counts for both types of
products are based on the resulting disposables, and with this design that
would probably not work. We would first need to filter on the dispensers and
then on the disposables, which gives counts on a different basis for the
dispenser attributes than for the disposable attributes.
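
The mismatch in counts can be illustrated with a small in-memory Python
sketch (again without ES). The printers, the "brand" field and the
"dispenser_for" predicate are hypothetical. Counting the printer brand facet
on the printer documents gives the number of printers per brand, whereas what
we want is the number of distinct toners linked to printers of that brand.

```python
from collections import Counter

# Hypothetical printer documents that only hold references to toner ids.
PRINTERS = [
    {"id": 1, "brand": "Brother", "dispenser_for": [10]},
    {"id": 2, "brand": "Brother", "dispenser_for": [10]},
    {"id": 3, "brand": "HP", "dispenser_for": [11, 12, 13]},
]


def printer_brand_counts_by_printer(printers):
    """Facet counted on the printer documents: printers per brand."""
    return Counter(p["brand"] for p in printers)


def printer_brand_counts_by_toner(printers):
    """Facet counted on the result set we care about: toners per brand."""
    toners_per_brand = {}
    for p in printers:
        toners_per_brand.setdefault(p["brand"], set()).update(p["dispenser_for"])
    return {brand: len(ids) for brand, ids in toners_per_brand.items()}
```

Here the first function reports two "Brother" printers and one "HP" printer,
while the second reports one toner for "Brother" and three for "HP", which is
the count the user should see when browsing toners.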

Another option would be to store the whole related document(s) under the
predicate on every document. This would mean a huge expansion of the index
and a lot of repetition in the data, which would make maintaining the
documents a lot more complex.
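
As a rough sketch of what this second option would involve, the Python below
copies each printer into every toner it is linked to and then counts the
printer brand facet over the toner documents. It runs in memory only; the
sample data and the names "brand", "dispenser_for" and "dispensers" are
hypothetical, and how ES would map such embedded documents is left open.

```python
from collections import Counter

# Hypothetical source documents.
TONERS = [
    {"id": 10, "brand": "Brother"},
    {"id": 11, "brand": "Generic"},
    {"id": 12, "brand": "HP"},
]
PRINTERS = [
    {"id": 1, "brand": "Brother", "dispenser_for": [10, 11]},
    {"id": 2, "brand": "Brother", "dispenser_for": [11]},
    {"id": 3, "brand": "HP", "dispenser_for": [11, 12]},
]


def denormalize(toners, printers):
    """Return toner documents with full copies of their linked printers."""
    by_toner = {}
    for printer in printers:
        # Copy the printer without its reference list.
        embedded = {k: v for k, v in printer.items() if k != "dispenser_for"}
        for toner_id in printer["dispenser_for"]:
            by_toner.setdefault(toner_id, []).append(dict(embedded))
    expanded = []
    for toner in toners:
        doc = dict(toner)
        doc["dispensers"] = by_toner.get(toner["id"], [])
        expanded.append(doc)
    return expanded


def printer_brand_facet(expanded):
    """Count toners per printer brand, each toner at most once per brand."""
    counts = Counter()
    for doc in expanded:
        for brand in {d["brand"] for d in doc["dispensers"]}:
            counts[brand] += 1
    return counts
```

The counts are now based on the toners, but the three printers end up stored
five times in total, and every change to a printer has to be propagated to
all toners that embed it.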

So what would be a best-practice solution for this scenario? Or could it be
that we are looking at the wrong type of storage (a document store) for this
kind of question, and should be using something like a graph database?

Any ideas on this would be very welcome. Thank you in advance!

Cheers,

Peter
