Re: ToChildBlockJoinQuery question
On 1/21/2015 6:59 PM, Gregory Dearing wrote:
> Jim,
>
> I think you hit the nail on the head... that's not what BlockJoinQueries do.
>
> If you're wanting to search for children and join to their parents... then
> use ToParentBlockJoinQuery, with a query that matches the set of children
> and a filter that matches the set of parents.
>
> If you're searching for parents, then joining to their children... then use
> ToChildBlockJoinQuery, with a query that matches the set of parents and a
> filter that matches the set of children.
>
> When you add related documents to the index (via addDocuments), make sure
> that children are added before their parents.
>
> The reason all the above is necessary is that it makes it possible to have
> a nested hierarchy of relationships (i.e. Parents have Children, which have
> Children of their own). You need a query to indicate which part of the
> hierarchy you're starting from, and a filter indicating which part of the
> hierarchy you're joining to.
>
> Also, you will always get an exception if your query and your filter both
> match the same document. A child can't be its own parent.

That's true for the existing implementation, but seems unnecessary from what I can tell. See https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/search/SafariBlockJoinQuery.java for a variant that allows a child to be its own parent.

-Mike

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: ToChildBlockJoinQuery question
Jim,

I think you hit the nail on the head... that's not what BlockJoinQueries do.

If you're wanting to search for children and join to their parents... then use ToParentBlockJoinQuery, with a query that matches the set of children and a filter that matches the set of parents.

If you're searching for parents, then joining to their children... then use ToChildBlockJoinQuery, with a query that matches the set of parents and a filter that matches the set of children.

When you add related documents to the index (via addDocuments), make sure that children are added before their parents.

The reason all the above is necessary is that it makes it possible to have a nested hierarchy of relationships (i.e. Parents have Children, which have Children of their own). You need a query to indicate which part of the hierarchy you're starting from, and a filter indicating which part of the hierarchy you're joining to.

Also, you will always get an exception if your query and your filter both match the same document. A child can't be its own parent.

BlockJoin is a very powerful feature, but what it's really doing is modelling relationships using an index that doesn't know what a relationship is. The relationships are determined by a combination of the order in which you indexed the block and the format of your query. This disconnect can lead to some weird behavior if you're not absolutely sure how it works.

Thanks,
Greg

On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <james.mckin...@cengage.com> wrote:
> Am I understanding how this is supposed to work? What I think I am (and
> should be) doing is providing a query and filter that specifies the parent
> docs and the ToChildBlockJoinQuery should return me all the child docs for
> the resulting parent docs. Is this correct? The reason I think I'm not
> understanding is that I don't see why I need both a filter and a query to
> specify the parent docs when a single query or filter should suffice. Am I
> misunderstanding what parentQuery and parentFilter mean? They both refer to
> parent docs, right?
>
> Jim
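As an illustration of Greg's point that the relationships live only in the indexing order, here is a minimal sketch in plain Java. It is not Lucene code: the class and method names are hypothetical, and java.util.BitSet stands in for whatever per-segment bit set marks the parent documents. It assumes each block was added children first, parent last.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of a block-join index: doc ids are positions, parents are marked in a BitSet. */
class BlockJoinSketch {

    /** Returns the doc ids of the children stored in the block that ends at parentDoc. */
    public static List<Integer> childrenOf(BitSet parentBits, int parentDoc) {
        if (!parentBits.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not marked as a parent");
        }
        // The block starts right after the previous parent (or at doc 0 if there is none).
        int prevParent = parentBits.previousSetBit(parentDoc - 1);
        List<Integer> children = new ArrayList<>();
        for (int doc = prevParent + 1; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    /** Returns the doc id of the parent that closes the block containing childDoc. */
    public static int parentOf(BitSet parentBits, int childDoc) {
        // nextSetBit is inclusive, so a doc that is itself a parent maps to itself.
        return parentBits.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        // Two blocks: docs 0, 1, 2 are children of parent 3; doc 4 is the child of parent 5.
        BitSet parentBits = new BitSet();
        parentBits.set(3);
        parentBits.set(5);
        System.out.println(childrenOf(parentBits, 3)); // [0, 1, 2]
        System.out.println(childrenOf(parentBits, 5)); // [4]
        System.out.println(parentOf(parentBits, 1));   // 3
    }
}
```

Nothing in this model records "who belongs to whom" except position, which is why documents added outside a block, or in the wrong order, end up attached to the wrong parent.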
RE: ToChildBlockJoinQuery question
Hi Greg,

Thanks for responding to my question. I added some extra conditions to the IndexRunnable run method, namely I required AGTY:np in the source query for the parent docs and required that both the creatorDocs and workDocs actually contain documents or else the addDocuments call would never be made:

    public void run() {
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            int count = 0;
            for (String crid : crids) {
                List<Document> docs = new ArrayList<>();
                BooleanQuery abidQuery = new BooleanQuery();
                abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
                TermQuery cridQuery = new TermQuery(new Term("CRID", crid));
                TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
                // Only write a block when both the creator and at least one work exist.
                if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
                    // Children (works) first ...
                    for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                        docs.add(reader.document(workDocs.scoreDocs[i].doc));
                    }
                    // ... then the parent (creator) as the last doc of the block.
                    docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
                    writer.addDocuments(docs);
                    if (++count % 100 == 0) {
                        System.out.println(id + " = " + count);
                        writer.commit();
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

I then modified the runToChildBlockJoinQuery method to first perform a search with the parent query and parent filter. Then, using the id of each parent named person document, I did a query for the named works with that creator id (essentially reversing the query that was done to create the BlockJoin index), and I do indeed get works back for every named person that passes the parent query and filter. However, I still get the IllegalStateException complaining about a non-FixedBitSet doc id set when doing the ToChildBlockJoinQuery. Here is that code:

    private void runToChildBlockJoinQuery(String indexPath) throws IOException {
        FSDirectory dir = FSDirectory.open(new File(indexPath));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);

        TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
        TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
        Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

        // Sanity check: look up the works of every parent that passes the query and filter.
        TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);
        for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
            String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
            BooleanQuery cridQuery = new BooleanQuery();
            for (String id : ids) {
                cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
            }
            TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
            System.out.println(worksDocs.scoreDocs.length);
        }

        ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
        TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE); // ==> IllegalStateException
    }

So I think all the parent docs have child docs and they should have been indexed in the same addDocuments call with the parent being the last doc in the list. Then, on a lark, I just
Re: ToChildBlockJoinQuery question
James,

I haven't actually run your example, but I think the root problem is that your source query ("NT:american") is hitting documents that have no children.

The reason the exception is so weird is that one of your index segments contains zero documents that match your filter. Specifically, there's an index segment containing docs matching "NT:american", but with no documents matching "AGTY:np".

This will cause CachingWrapperFilter, which normally returns a FixedBitSet, to instead return a generic "Empty" DocIdSet, which leads to the exception from ToChildBlockJoinQuery.

The summary is: make sure that your source query only hits documents that were actually added using 'addDocuments()'. Since it looks like you're extracting your block relationships from the existing index, that might mean that you'll need to add some extra metadata to the newly created docs instead of just cloning what already exists.

-Greg

On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <james.mckin...@cengage.com> wrote:
> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works. The parent document has a nationality and the child document does
> not. I want to query the children (named works) limiting by the
> nationality of the parent (named person).
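Greg's diagnosis, that a segment with no filter matches yields a generic empty set rather than a bit set, can be pictured with a toy model. This is only a sketch under loud assumptions: none of it is Lucene's code, the class, the EMPTY sentinel and both methods are hypothetical, and java.util.BitSet stands in for FixedBitSet.

```java
import java.util.BitSet;

/** Toy model of a per-segment filter whose result type depends on whether anything matched. */
class SegmentFilterSketch {

    /** Hypothetical stand-in for a generic "empty" doc id set that is not a bit set. */
    public static final Object EMPTY = new Object();

    /** Toy caching filter: a BitSet when the segment has matches, EMPTY otherwise. */
    public static Object parentsIn(String[] segmentDocTypes, String parentType) {
        BitSet bits = new BitSet(segmentDocTypes.length);
        for (int doc = 0; doc < segmentDocTypes.length; doc++) {
            if (parentType.equals(segmentDocTypes[doc])) {
                bits.set(doc);
            }
        }
        return bits.isEmpty() ? EMPTY : bits;
    }

    /** Toy join step: insists on a bit set, as the exception message in the thread suggests. */
    public static int countParents(Object docIdSet) {
        if (!(docIdSet instanceof BitSet)) {
            throw new IllegalStateException("parent filter must return a bit set");
        }
        return ((BitSet) docIdSet).cardinality();
    }

    public static void main(String[] args) {
        // Segment written as a block: two works followed by their creator ("np").
        String[] blockSegment = {"work", "work", "np"};
        System.out.println(countParents(parentsIn(blockSegment, "np"))); // 1

        // Segment holding only docs that were never part of a block: no "np" doc at all.
        String[] strayDocs = {"work", "other"};
        try {
            countParents(parentsIn(strayDocs, "np"));
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the failure comes from the segment's contents, not from the query itself, which matches Greg's advice to make sure the source query only hits documents that were added as part of a block.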
Re: MultiPhraseQuery:Rewrite to BooleanQuery
I'm asking because QueryBuilder.createFieldQuery returns a MultiPhraseQuery in some cases, and I need to know which of the MultiPhraseQuery's terms actually matched. Explanation doesn't answer that question. It only returns a string built from these terms (see the MultiPhraseQuery.toString() method), like:

    (termA termB) termC

which can mean "termA termC" OR "termB termC".

So my question is: how can I rewrite a MultiPhraseQuery into a BooleanQuery with PhraseQuery clauses, or something else, to get the matched terms? Is it possible at all, and will the two queries be equivalent (scoring, boosting, etc.)?

Thanks.

2015-01-21 17:06 GMT+02:00 Ian Lea:
> Are you asking if your two suggestions
>
> 1) a MultiPhraseQuery or
>
> 2) a BooleanQuery made up of multiple PhraseQuery instances
>
> are equivalent? If so, I'd say that they could be if you build them
> carefully enough. For the specific examples you show I'd say not and
> would wonder if you get correct hits, particularly for your
> MultiPhraseQuery which looks wrong to me, based on my reading of the
> javadoc. But I haven't tried or tested your code - I assume you have.
>
> If you are asking something else, please explain more clearly.
>
> --
> Ian.

--
dennis yermakov
mailto: dem...@gmail.com
Re: MultiPhraseQuery:Rewrite to BooleanQuery
Are you asking if your two suggestions

1) a MultiPhraseQuery or

2) a BooleanQuery made up of multiple PhraseQuery instances

are equivalent? If so, I'd say that they could be if you build them carefully enough. For the specific examples you show I'd say not, and I would wonder if you get correct hits, particularly for your MultiPhraseQuery, which looks wrong to me based on my reading of the javadoc. But I haven't tried or tested your code - I assume you have.

If you are asking something else, please explain more clearly.

--
Ian.

On Wed, Jan 21, 2015 at 2:50 PM, ku3ia wrote:
> ku3ia wrote
>> Hi folks!
>> I have a multiphrase query, for example, from the unit tests:
>>
>> Directory indexStore = newDirectory();
>> RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
>> add("blueberry chocolate pie", writer);
>> add("blueberry chocolate tart", writer);
>> IndexReader r = writer.getReader();
>> writer.close();
>>
>> IndexSearcher searcher = newSearcher(r);
>> MultiPhraseQuery q = new MultiPhraseQuery();
>> q.add(new Term("body", "blueberry"));
>> q.add(new Term("body", "chocolate"));
>> q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
>> assertEquals(2, searcher.search(q, 1).totalHits);
>> r.close();
>> indexStore.close();
>>
>> I need to know which phrase query produced the match. Explanation doesn't
>> return exact information, only that the document is matched by this query.
>> So can I rewrite this query as a Boolean query, like
>>
>> BooleanQuery q = new BooleanQuery();
>>
>> PhraseQuery pq1 = new PhraseQuery();
>> pq1.add(new Term("body", "blueberry"));
>> pq1.add(new Term("body", "chocolate"));
>> pq1.add(new Term("body", "pie"));
>> q.add(pq1, BooleanClause.Occur.SHOULD);
>>
>> PhraseQuery pq2 = new PhraseQuery();
>> pq2.add(new Term("body", "blueberry"));
>> pq2.add(new Term("body", "chocolate"));
>> pq2.add(new Term("body", "tart"));
>> q.add(pq2, BooleanClause.Occur.SHOULD);
>>
>> In this case I'll know exactly which query matched. But the main
>> question is: is this rewrite equivalent?
>> Thanks.
>>
>> --
>> dennis yermakov
>> mailto: demesg@
>
> Any ideas?
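The rewrite discussed in this thread amounts to building one phrase per combination of the alternatives at each position. Below is a minimal sketch of that expansion in plain Java, with terms as strings instead of Lucene Term objects; the class and method names are hypothetical. It assumes consecutive positions with no gaps, and it says nothing about whether the resulting BooleanQuery would score the same as the MultiPhraseQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Expands per-position term alternatives into the individual phrases they stand for. */
class PhraseExpansionSketch {

    /**
     * positions.get(i) holds the alternative terms allowed at position i.
     * Returns every phrase obtained by picking one alternative per position.
     */
    public static List<List<String>> expand(List<List<String>> positions) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<String>()); // start from a single empty phrase
        for (List<String> alternatives : positions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term);
                    next.add(phrase);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // "(termA termB) termC" from the message above.
        List<List<String>> positions = Arrays.asList(
                Arrays.asList("termA", "termB"),
                Arrays.asList("termC"));
        System.out.println(expand(positions)); // [[termA, termC], [termB, termC]]
    }
}
```

Each returned phrase would become one PhraseQuery clause added with Occur.SHOULD. The number of clauses is the product of the alternatives per position, so it can grow quickly for heavily expanded queries.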
ToChildBlockJoinQuery question
Hi,

I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following Mike McCandless' blog post:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I have a set of child documents which are named works and a set of parent documents which are named persons that are the creators of the named works. The parent document has a nationality and the child document does not. I want to query the children (named works) limiting by the nationality of the parent (named person). I've indexed the documents as follows (I'm pulling the docs from an existing index):

    private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
        FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
        FSDirectory destDir = FSDirectory.open(new File(destIndexPath));

        IndexReader reader = DirectoryReader.open(srcDir);

        Version version = Version.LUCENE_48;
        IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));

        Set<String> crids = getCreatorIds(reader);

        String[] crida = crids.toArray(new String[crids.size()]);

        int numThreads = 24;
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);

        int numCrids = crids.size();
        int batchSize = numCrids / numThreads;
        int remainder = numCrids % numThreads;

        System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
        try (IndexWriter writer = new IndexWriter(destDir, conf)) {
            for (int i = 0; i < numThreads; i++) {
                String[] cridRange;
                // Arrays.copyOfRange treats its end index as exclusive, so no "- 1" here;
                // the last thread also takes the remainder.
                if (i == numThreads - 1) {
                    cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize + remainder);
                } else {
                    cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize);
                }
                String id = "" + ((char)('A' + i));
                Runnable indexer = new IndexRunnable(id, reader, writer, new HashSet<>(Arrays.asList(cridRange)));
                executor.execute(indexer);
            }
            executor.shutdown();
            executor.awaitTermination(2, TimeUnit.HOURS);
        } catch (Exception e) {
            executor.shutdownNow();
            throw new RuntimeException(e);
        } finally {
            reader.close();
            srcDir.close();
            destDir.close();
        }

        System.out.println("Done!");
    }

    public static class IndexRunnable implements Runnable {
        private String id;
        private IndexReader reader;
        private IndexWriter writer;
        private Set<String> crids;

        public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
            this.id = id;
            this.reader = reader;
            this.writer = writer;
            this.crids = crids;
        }

        @Override
        public void run() {
            IndexSearcher searcher = new IndexSearcher(reader);

            try {
                int count = 0;
                for (String crid : crids) {
                    List<Document> docs = new ArrayList<>();

                    BooleanQuery abidQuery = new BooleanQuery();
                    abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                    abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);

                    TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

                    TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                    TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

                    for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                        docs.add(reader.document(workDocs.scoreDocs[i].doc));
                    }
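Splitting an id array across worker threads with Arrays.copyOfRange depends on its end index being exclusive: copyOfRange(a, from, to) copies indices from through to - 1. A minimal sketch of that partitioning follows; the class and method names are hypothetical, and the last batch absorbs the remainder as in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Splits an array of ids into contiguous batches that together cover every element once. */
class BatchSplitSketch {

    public static List<String[]> split(String[] ids, int numBatches) {
        int batchSize = ids.length / numBatches;
        List<String[]> batches = new ArrayList<>();
        for (int i = 0; i < numBatches; i++) {
            int from = i * batchSize;
            // End index is exclusive; the last batch runs to the end of the array.
            int to = (i == numBatches - 1) ? ids.length : from + batchSize;
            batches.add(Arrays.copyOfRange(ids, from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c", "d", "e", "f", "g"};
        for (String[] batch : split(ids, 3)) {
            System.out.println(Arrays.toString(batch));
        }
        // prints [a, b], then [c, d], then [e, f, g]
    }
}
```

Subtracting 1 from the end index would silently leave out the last id of every batch, so the blocks for those creators would never be written.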
Re: MultiPhraseQuery:Rewrite to BooleanQuery
ku3ia wrote
> Hi folks!
> I have a multiphrase query, for example, from the unit tests:
>
> Directory indexStore = newDirectory();
> RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
> add("blueberry chocolate pie", writer);
> add("blueberry chocolate tart", writer);
> IndexReader r = writer.getReader();
> writer.close();
>
> IndexSearcher searcher = newSearcher(r);
> MultiPhraseQuery q = new MultiPhraseQuery();
> q.add(new Term("body", "blueberry"));
> q.add(new Term("body", "chocolate"));
> q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
> assertEquals(2, searcher.search(q, 1).totalHits);
> r.close();
> indexStore.close();
>
> I need to know which phrase query produced the match. Explanation doesn't
> return exact information, only that the document is matched by this query.
> So can I rewrite this query as a Boolean query, like
>
> BooleanQuery q = new BooleanQuery();
>
> PhraseQuery pq1 = new PhraseQuery();
> pq1.add(new Term("body", "blueberry"));
> pq1.add(new Term("body", "chocolate"));
> pq1.add(new Term("body", "pie"));
> q.add(pq1, BooleanClause.Occur.SHOULD);
>
> PhraseQuery pq2 = new PhraseQuery();
> pq2.add(new Term("body", "blueberry"));
> pq2.add(new Term("body", "chocolate"));
> pq2.add(new Term("body", "tart"));
> q.add(pq2, BooleanClause.Occur.SHOULD);
>
> In this case I'll know exactly which query matched. But the main
> question is: is this rewrite equivalent?
> Thanks.
>
> --
> dennis yermakov
> mailto: demesg@

Any ideas?

--
View this message in context: http://lucene.472066.n3.nabble.com/MultiPhraseQuery-Rewrite-to-BooleanQuery-tp4178898p4180863.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.