I gather that your Solr documents are the "Title Information" units. Have you considered making each Solr document a "book information" unit instead? Each "book information" document would carry (yes, de-normalized) the same title information as all the other book documents belonging to the same "title information". You can even give each "book information" document a "title information" id that can be used to fetch all the "book information" documents belonging to the same title. (If we just call these 'bibs' and 'holdings', this might be less confusing for us library people.)
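A rough sketch of that de-normalization, in Python rather than anything Solr-specific. The field names (`title_id`, `location`, `format`) and both helper functions are made up for illustration, and the filtering is done in plain Python only to show why a per-holding document cannot match on values taken from two different copies:

```python
def denormalize(title):
    """Flatten one title record into one flat document per holding.

    Every document repeats the shared title fields and carries a
    'title_id' so that all holdings of a title can be fetched together.
    """
    shared = {k: v for k, v in title.items() if k != "holdings"}
    docs = []
    for i, holding in enumerate(title["holdings"]):
        doc = dict(shared)                 # repeated (de-normalized) title fields
        doc["title_id"] = title["id"]      # hypothetical grouping key
        doc["id"] = f'{title["id"]}-{i}'   # each holding gets its own unique id
        doc.update(holding)                # this copy's location, format, ...
        docs.append(doc)
    return docs


def matching_title_ids(docs, **filters):
    """Return ids of titles with at least one holding matching ALL filters."""
    return sorted({
        d["title_id"]
        for d in docs
        if all(d.get(field) == value for field, value in filters.items())
    })


title = {
    "id": "t1",
    "title": "Harry Potter and the Last Crusade",
    "holdings": [
        {"location": "Main", "format": "Book"},
        {"location": "Main", "format": "DVD"},
        {"location": "Branch", "format": "Book"},
    ],
}

docs = denormalize(title)
print(matching_title_ids(docs, location="Branch", format="DVD"))   # []
print(matching_title_ids(docs, location="Branch", format="Book"))  # ['t1']
```

Note that in this sketch a title with no holdings produces no documents at all, so it would need some kind of placeholder document to stay searchable.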
No doubt modelling things this way will bring its own challenges, but I believe it will solve the particular problems you mention. Solr is not an RDBMS, and de-normalizing to the right level, so that your Solr documents represent the proper units of granularity for the kinds of queries you want to do, is usually the trick to getting Solr to do what you want.

The challenge, of course, is when the kinds of queries you want really require multiple different levels of granularity. I haven't found any great general-purpose solutions to this problem; it's sort of the gap between what Solr is good at and what an RDBMS is good at.

________________________________________
From: Bob Sandiford [bob.sandif...@sirsidynix.com]
Sent: Tuesday, November 23, 2010 7:26 PM
To: solr-user@lucene.apache.org
Subject: Special Parent / Child relationship - advice / observations welcome on how to approach this

Hi,

Long post - sorry...

I have a relatively special case of a Parent / Child relationship that I'm trying to model. I'm currently using Solr 1.4.1 and Lucene 2.9.3.

For example, my Parent documents represent "Title Information" (e.g. bibliographic information), and each Parent document can contain 0 or more Children, each child representing a physical copy of the Book information (e.g. think of a library with multiple branches; the child documents each represent a book (or other format) available at a given branch).

What I *think* is special about this setup is that the only information I need Solr / Lucene to make use of is all facet based, AND I don't need facet counts for these child facets.

So, for example, "Harry Potter and the Last Crusade" (J.K. Rowling's upcoming blockbuster novel :)) has the following information.

Location   Format
Main       Book
Main       DVD
Branch     Book
etc        etc.

This is a bit simplified - there are actually 5 fields involved for each child document, and each of the five fields can be used individually (easy!) or in combination (much harder!) to refine a result set.
With the relatively straightforward approach of having Location and Format as ordinary everyday facet fields, I certainly get results (although I ignore the facet counts). So, I search for the book, and, without any facets applied, get back facets like this:

Location
    Main
    Branch
Format
    Book
    DVD

And I can narrow by those, and things work - though logically there's a hole that I'm trying to fill.

For example, suppose the user chooses to narrow by Location:Branch and Format:DVD. I still get a hit back - but I don't want one, because there isn't a child record that has both of those values. (The user is looking for DVDs at the Branch library, but the only DVD is at Main.)

I'm completely controlling both the indexing and searching side code - i.e. I can formulate any type of document content I want to be indexed, and I can parse the results before presenting them to the end users.

One approach I've been thinking of is a brute force method of accomplishing this, using facets and the facet.prefix parameter in the query. So - I could generate 'facets' like this:

Location_facet
    Main
    Branch
Format_facet
    Book
    DVD
Location-Format_facet
    Main-Book
    Main-DVD
    Branch-Book
Format-Location_facet
    Book-Main
    Book-Branch
    DVD-Main

When narrowing by a single facet (e.g. Location:Branch), it would be a usual facet search. Something like:

    &fq=Location_facet:"Branch"

and I would request back facets like this:

    &facet=true&facet.mincount=1&facet.field=Location-Format_facet&facet.prefix=Branch-

and then parse out the values returned in the Location-Format_facet to retrieve what follows the "Branch-" prefix, and those would be the facet values for the 'Format' facet presented to the users (so only 'Book' remains as a value).

So - with only 2 fields, it's pretty straightforward.
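To make the paired-facet idea concrete, here is a minimal Python sketch of the indexing-side concatenation and the result parsing. Nothing in it is Solr API: the two helper names are hypothetical, and the prefix filtering is done in plain Python as a stand-in for what a facet.prefix request would be expected to return.

```python
def paired_facet_values(children, first, second, sep="-"):
    """Build the concatenated values for a '<first>-<second>_facet' field."""
    return sorted({f"{c[first]}{sep}{c[second]}" for c in children})


def remaining_values(paired_values, chosen, sep="-"):
    """Stand-in for facet.prefix: keep values that start with '<chosen><sep>'
    and return what follows the prefix."""
    prefix = chosen + sep
    return sorted({v[len(prefix):] for v in paired_values if v.startswith(prefix)})


children = [
    {"Location": "Main", "Format": "Book"},
    {"Location": "Main", "Format": "DVD"},
    {"Location": "Branch", "Format": "Book"},
]

location_format = paired_facet_values(children, "Location", "Format")
print(location_format)                              # ['Branch-Book', 'Main-Book', 'Main-DVD']

# Narrowed to Location:Branch, which Format values are still valid?
print(remaining_values(location_format, "Branch"))  # ['Book']

# Narrowed to both Location:Branch and Format:DVD: look for the pair itself.
print("Branch-DVD" in location_format)              # False
```

One design caveat with this sketch: if a field value can itself contain the separator, the concatenated values become ambiguous, so a separator that cannot occur in the data would be a safer choice than "-".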
(It could be somewhat simplified from the above down to two facet fields instead of four: keep just the paired facets and drop the singleton facets. When no limiting is taking place, retrieve just one of the paired fields and parse out the pairs for the Location and Format facets; when limiting on one element, use facet.prefix; when limiting on both, again just choose one of the facets and look for the concatenated value...)

However, it gets more complex as I ramp up to 5 fields. (Generally it requires n! individual facet fields, where 'n' is the number of underlying fields. I.e. with two fields, there are two facet type fields needed in the Solr/Lucene index to support this. With three fields, I could do this with 6 facets required. With 5 fields, there would be 5! = 120 required facets. That's getting a bit much... :) )

Hmmm... A little scribbling (ok, a fair bit of scribbling), and I can actually reduce that to 12 facet fields to cover the 5 fields. So, maybe that's not all that bad... Interesting... (I haven't actually coded up anything yet - this is all a paper-napkin level exercise...)

The other thing I've done is perused various archived threads and some upcoming functionality regarding parent / child or hierarchical document strategies. But I haven't found anything that would help me out much - at least not directly. I saw the Jira issue LUCENE-2454<https://issues.apache.org/jira/browse/LUCENE-2454>, "Nested Document Query Support", which looks from the slides overview to be structurally just what I would want - but indicates that there isn't the Query Parser support in place yet... (i.e. how to do a Solr query that can relate child-level queries either within the base query or in the fq clause...)

So - my question (finally :)) is - does this logical problem seem resolvable with an approach other than the brute force outlined above?
I'm willing to dive into the Solr / Lucene code if that's what it will take - I'd just like an indication of what people think would be a good / possible approach before I get into that level... e.g. some way of providing to the Indexer a tuple of each found combination of the 5 values, and then doing something (what?) with searching for the facet queries.

Thanks!

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com<http://www.sirsidynix.com/>