RE: ToChildBlockJoinQuery question

McKinley, James T Wed, 21 Jan 2015 13:39:32 -0800

Hi Greg,

Thanks for responding to my question.  I added two extra conditions to the 
IndexRunnable run method: the source query for the parent docs now requires 
AGTY:np, and addDocuments is only called when both creatorDocs and workDocs 
actually contain documents:


        public void run() {
            IndexSearcher searcher = new IndexSearcher(reader);

            try {
                int count = 0;
                for (String crid : crids) {
                    List<Document> docs = new ArrayList<>();

                    BooleanQuery abidQuery = new BooleanQuery();
                    abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                    abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
                    abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);

                    TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

                    TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                    TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

                    if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
                        for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                            docs.add(reader.document(workDocs.scoreDocs[i].doc));
                        }

                        docs.add(reader.document(creatorDocs.scoreDocs[0].doc));

                        writer.addDocuments(docs);
                        if (++count % 100 == 0) {
                            System.out.println(id + " = " + count);
                            writer.commit();
                        }
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

I then modified the runToChildBlockJoinQuery method to first perform a search 
with the parent query and parent filter.  Then, using the id of each parent 
named person document, I query for the named works with that creator id 
(essentially reversing the query that was used to create the BlockJoin index), 
and I do indeed get works back for every named person that passes the parent 
query and filter.  However, I still get the IllegalStateException complaining 
about a non-FixedBitSet doc id set when running the ToChildBlockJoinQuery.  
Here is that code:

        private void runToChildBlockJoinQuery(String indexPath) throws IOException {
            FSDirectory dir = FSDirectory.open(new File(indexPath));
            IndexReader reader = DirectoryReader.open(dir);
            IndexSearcher searcher = new IndexSearcher(reader);

            TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
            TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
            Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

            TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);

            for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
                String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
                BooleanQuery cridQuery = new BooleanQuery();
                for (String id : ids) {
                    cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
                }
                TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
                System.out.println(worksDocs.scoreDocs.length);
            }

            ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

            TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE);  // ==> IllegalStateException
        }

So I think all the parent docs have child docs and they should have been 
indexed in the same addDocuments call with the parent being the last doc in the 
list.  Then, on a lark, I just made the parentFilterQuery and the parentQuery 
the same and still got the exception.

Am I understanding how this is supposed to work?  What I think I am (and should 
be) doing is providing a query and filter that specify the parent docs, and 
the ToChildBlockJoinQuery should return all the child docs for the resulting 
parent docs.  Is this correct?  The reason I think I'm not understanding is 
that I don't see why I need both a filter and a query to specify the parent 
docs when a single query or filter should suffice.  Am I misunderstanding what 
parentQuery and parentFilter mean?  They both refer to parent docs, right?
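
As a rough illustration of the block layout described above (a toy model only, 
using java.util.BitSet rather than any Lucene class; the names BlockJoinSketch 
and toChildren are made up for this sketch): if each block is indexed 
children-first with the parent last, then one set has to mark every parent so 
that block boundaries are known, while a second set only selects which of those 
parents to expand.  That may be one reason the API takes both a filter and a 
query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of a block-join index: plain doc ids only, no Lucene classes involved. */
final class BlockJoinSketch {

    /**
     * Returns the child doc ids of every parent set in matchingParents.
     * allParents plays the role of the parent filter: it must mark EVERY parent,
     * because the previous parent's position tells us where the current block starts.
     * matchingParents plays the role of the parent query: the subset to expand.
     */
    static List<Integer> toChildren(BitSet allParents, BitSet matchingParents) {
        List<Integer> children = new ArrayList<>();
        for (int parent = matchingParents.nextSetBit(0); parent >= 0;
                parent = matchingParents.nextSetBit(parent + 1)) {
            if (!allParents.get(parent)) {
                throw new IllegalStateException(
                        "doc " + parent + " was selected but is not marked as a parent");
            }
            // previousSetBit returns -1 when there is no earlier parent,
            // so the first block starts at doc 0.
            int firstChild = allParents.previousSetBit(parent - 1) + 1;
            for (int doc = firstChild; doc < parent; doc++) {
                children.add(doc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        // Three blocks, parent last in each: [0, 1, P2] [3, P4] [5, 6, 7, P8]
        BitSet allParents = new BitSet();
        allParents.set(2);
        allParents.set(4);
        allParents.set(8);

        // Pretend the parent query matched parents 2 and 8 only.
        BitSet matchingParents = new BitSet();
        matchingParents.set(2);
        matchingParents.set(8);

        System.out.println(toChildren(allParents, matchingParents));
    }
}
```

With the three blocks above and parents 2 and 8 selected, this prints 
[0, 1, 5, 6, 7]; the child of the unselected parent 4 is skipped.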

I attempted to attach a small tar.gz file (< 1MB) to this message containing 
a 100-parent index (~10,000 docs total) that gives the exception with 
my block join query, but the mailing list rejected my message.  If there's a 
better place to send or upload this index, let me know and I will.  Thanks 
again for any help.

Jim

________________________________________
From: Gregory Dearing [[email protected]]
Sent: Wednesday, January 21, 2015 1:01 PM
To: [email protected]
Subject: Re: ToChildBlockJoinQuery question

James,

I haven't actually run your example, but I think the underlying problem is that
your source query ("NT:american") is hitting documents that have no
children.

The reason the exception is so weird is that one of your index segments
contains zero documents that match your filter.  Specifically, there's an
index segment containing docs matching "NT:american", but with no documents
matching "AGTY:np".

This will cause CachingWrapperFilter, which normally returns a FixedBitSet,
to instead return a generic "Empty" DocIdSet, which in turn leads to the
exception from ToChildBlockJoinQuery.

The summary is, make sure that your source query only hits documents that
were actually added using 'addDocuments()'.  Since it looks like you're
extracting your block relationships from the existing index, that might
mean that you'll need to add some extra metadata to the newly created docs
instead of just cloning what already exists.

-Greg


On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <
[email protected]> wrote:

> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works.  The parent document has a nationality and the child document does
> not.  I want to query the children (named works) limiting by the
> nationality of the parent (named person).  I've indexed the documents as
> follows (I'm pulling the docs from an existing index):
>
>         private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
>                 FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
>                 FSDirectory destDir = FSDirectory.open(new File(destIndexPath));
>
>                 IndexReader reader = DirectoryReader.open(srcDir);
>
>                 Version version = Version.LUCENE_48;
>                 IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));
>
>                 Set<String> crids = getCreatorIds(reader);
>
>                 String[] crida = crids.toArray(new String[crids.size()]);
>
>                 int numThreads = 24;
>                 ExecutorService executor = Executors.newFixedThreadPool(numThreads);
>
>                 int numCrids = crids.size();
>                 int batchSize = numCrids / numThreads;
>                 int remainder = numCrids % numThreads;
>
>                 System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
>                 try (IndexWriter writer = new IndexWriter(destDir, conf)){
>                         for (int i = 0; i < numThreads; i++) {
>                                 String[] cridRange;
>                                 // Arrays.copyOfRange treats its end index as exclusive, so no "- 1" is needed;
>                                 // subtracting 1 would silently drop the last creator id of every batch.
>                                 if (i == numThreads - 1) {
>                                         cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize + remainder);
>                                 } else {
>                                         cridRange = Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize);
>                                 }
>                                 String id = "" + ((char)('A' + i));
>                                 Runnable indexer = new IndexRunnable(id, reader, writer, new HashSet<String>(Arrays.asList(cridRange)));
>                                 executor.execute(indexer);
>                         }
>                         executor.shutdown();
>                         executor.awaitTermination(2, TimeUnit.HOURS);
>                 } catch (Exception e) {
>                         executor.shutdownNow();
>                         throw new RuntimeException(e);
>                 } finally {
>                         reader.close();
>                         srcDir.close();
>                         destDir.close();
>                 }
>
>                 System.out.println("Done!");
>         }
>
>         public static class IndexRunnable implements Runnable {
>                 private String id;
>                 private IndexReader reader;
>                 private IndexWriter writer;
>                 private Set<String> crids;
>
>                 public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
>                         this.id = id;
>                         this.reader = reader;
>                         this.writer = writer;
>                         this.crids = crids;
>                 }
>
>                 @Override
>                 public void run() {
>                         IndexSearcher searcher = new IndexSearcher(reader);
>
>                         try {
>                                 int count = 0;
>                                 for (String crid : crids) {
>                                         List<Document> docs = new ArrayList<>();
>
>                                         BooleanQuery abidQuery = new BooleanQuery();
>                                         abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
>                                         abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
>
>                                         TermQuery cridQuery = new TermQuery(new Term("CRID", crid));
>
>                                         TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
>                                         TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
>
>                                         for (int i = 0; i < workDocs.scoreDocs.length; i++) {
>                                                 docs.add(reader.document(workDocs.scoreDocs[i].doc));
>                                         }
>
>                                         if (creatorDocs.scoreDocs.length > 0) {
>                                                 docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
>                                         }
>
>                                         writer.addDocuments(docs);
>                                         if (++count % 100 == 0) {
>                                                 System.out.println(id + " = " + count);
>                                                 writer.commit();
>                                         }
>                                 }
>                         } catch (IOException e) {
>                                 throw new RuntimeException(e);
>                         }
>                 }
>         }
>
> I then attempt to perform a block join query as follows:
>
>         private void runToChildBlockJoinQuery(String indexPath) throws IOException {
>                 FSDirectory dir = FSDirectory.open(new File(indexPath));
>                 IndexReader reader = DirectoryReader.open(dir);
>                 IndexSearcher searcher = new IndexSearcher(reader);
>
>                 TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
>                 TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
>                 Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
>
>                 ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
>
>                 TopDocs worksDocs = searcher.search(tcbjq, 20);
>
>                 displayWorks(reader, searcher, worksDocs);
>         }
>
> and I get the following exception:
>
> Exception in thread "main" java.lang.IllegalStateException: parentFilter must return FixedBitSet; got org.apache.lucene.util.WAH8DocIdSet@34e671de
>         at org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinWeight.scorer(ToChildBlockJoinQuery.java:148)
>         at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
>         at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
>         at BlockJoinQueryTester.runToChildBlockJoinQuery(BlockJoinQueryTester.java:73)
>         at BlockJoinQueryTester.main(BlockJoinQueryTester.java:40)
>
> I don't understand what I'm doing wrong and what a "FixedBitSet" is and
> why I don't get one out of my filter.  Is FixedBitSet a special kind of
> OpenBitSet and what does "fixed" mean in this context?  Thanks for any help.
>
> Jim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

