Hi Greg,
Thanks for responding to my question. I added some extra conditions to the
IndexRunnable run method: the source query for the parent docs now requires
AGTY:np, and addDocuments is only called when both creatorDocs and workDocs
actually contain documents:
    public void run() {
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            int count = 0;
            for (String crid : crids) {
                List<Document> docs = new ArrayList<>();
                BooleanQuery abidQuery = new BooleanQuery();
                abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
                TermQuery cridQuery = new TermQuery(new Term("CRID", crid));
                TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
                if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
                    for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                        docs.add(reader.document(workDocs.scoreDocs[i].doc));
                    }
                    docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
                    writer.addDocuments(docs);
                    if (++count % 100 == 0) {
                        System.out.println(id + " = " + count);
                        writer.commit();
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
I then modified the runToChildBlockJoinQuery method to first perform a search
with the parent query and parent filter. Then, using the id of each parent
named person document, I query for the named works with that creator id
(essentially reversing the query that was done to create the BlockJoin index),
and I do indeed get works back for every named person that passes the parent
query and filter. However, I still get the IllegalStateException complaining
about a non-FixedBitSet doc id set when running the ToChildBlockJoinQuery.
Here is that code:
    private void runToChildBlockJoinQuery(String indexPath) throws IOException {
        FSDirectory dir = FSDirectory.open(new File(indexPath));
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
        TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
        Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));
        TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);
        for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
            String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
            BooleanQuery cridQuery = new BooleanQuery();
            for (String id : ids) {
                cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
            }
            TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
            System.out.println(worksDocs.scoreDocs.length);
        }
        ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
        TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE);
        // ==> IllegalStateException
    }
So I think all the parent docs have child docs, and each block should have
been indexed in a single addDocuments call with the parent as the last doc in
the list. Then, on a lark, I made the parentFilterQuery and the parentQuery
the same and still got the exception.
Am I understanding how this is supposed to work? What I think I am (and should
be) doing is providing a query and filter that specify the parent docs, and
the ToChildBlockJoinQuery should return all the child docs for the resulting
parent docs. Is this correct? The reason I think I'm not understanding is
that I don't see why I need both a filter and a query to specify the parent
docs when a single query or filter should suffice. Am I misunderstanding what
parentQuery and parentFilter mean? They both refer to parent docs, right?
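To make the question concrete, here is a rough sketch in plain Java (not Lucene code) of why a block join would want two inputs: one set marking every parent doc, so the end of each block is known (the role of parentFilter), and a narrower selection of the parents whose children are wanted (the role of parentQuery). java.util.BitSet stands in for Lucene's FixedBitSet, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BlockJoinSketch {

    /**
     * Returns the child doc ids of one selected parent.
     * allParents marks every parent doc in the index (parentFilter's role);
     * parentDoc is a single hit of the narrower selection (parentQuery's role).
     */
    static List<Integer> childrenOf(BitSet allParents, int parentDoc) {
        if (!allParents.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // A block starts right after the previous parent; previousSetBit
        // returns -1 when there is none, so the first block starts at doc 0.
        int prevParent = allParents.previousSetBit(parentDoc - 1);
        List<Integer> children = new ArrayList<>();
        for (int doc = prevParent + 1; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    public static void main(String[] args) {
        // Docs 0-2 are works whose creator is doc 3; docs 4-5 belong to doc 6.
        BitSet allParents = new BitSet();
        allParents.set(3);
        allParents.set(6);
        System.out.println(childrenOf(allParents, 3)); // prints [0, 1, 2]
        System.out.println(childrenOf(allParents, 6)); // prints [4, 5]
    }
}
```

The backwards lookup needs random access into the set of all parents, which may be why the join insists on a bit set rather than an arbitrary doc id set.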
I attempted to attach a small tar.gz file (< 1MB) to this message containing a
100-parent index (~10,000 docs total) that gives the exception with my block
join query, but the mailing list rejected my message. If there's a better
place to send/upload this index, let me know and I surely will. Thanks again
for any help.
Jim
________________________________________
From: Gregory Dearing [[email protected]]
Sent: Wednesday, January 21, 2015 1:01 PM
To: [email protected]
Subject: Re: ToChildBlockJoinQuery question
James,
I haven't actually run your example, but I think the root problem is that
your source query ("NT:american") is hitting documents that have no
children.
The reason the exception is so weird is that one of your index segments
contains zero documents that match your filter. Specifically, there's an
index segment containing docs matching "NT:american", but with no documents
matching "AGTY:np".
This will cause CachingWrapperFilter, which normally returns a FixedBitSet,
to instead return a generic "Empty" DocIdSet, which leads to the exception
from ToChildBlockJoinQuery.
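As a toy model of that failure (plain Java, inferred only from the exception message and not taken from the Lucene source), the join can be pictured as checking the concrete type of whatever doc id set the filter hands back for a segment. All type and method names below are hypothetical stand-ins.

```java
class ParentFilterCheckSketch {

    /** Stand-in for a per-segment doc id set; not Lucene's type. */
    interface DocIds {}

    /** The random-access representation this model of the join accepts. */
    static final class FixedBits implements DocIds {
        final java.util.BitSet bits = new java.util.BitSet();
    }

    /** Any other representation, for example an empty or compressed set. */
    static final class OtherDocIds implements DocIds {}

    /** Mimics a guard that rejects everything except FixedBits. */
    static FixedBits requireFixedBits(DocIds parents) {
        if (!(parents instanceof FixedBits)) {
            String got = parents == null ? "null" : parents.getClass().getSimpleName();
            throw new IllegalStateException("parentFilter must return FixedBitSet; got " + got);
        }
        return (FixedBits) parents;
    }

    public static void main(String[] args) {
        requireFixedBits(new FixedBits()); // accepted
        try {
            requireFixedBits(new OtherDocIds()); // rejected
        } catch (IllegalStateException e) {
            // prints: parentFilter must return FixedBitSet; got OtherDocIds
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the set's contents never matter; a single segment whose filter result has the wrong type is enough to fail the whole search.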
In summary, make sure that your source query only hits documents that
were actually added using 'addDocuments()'. Since it looks like you're
extracting your block relationships from the existing index, that might
mean that you'll need to add some extra metadata to the newly created docs
instead of just cloning what already exists.
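One way to read that advice, sketched in plain Java with maps standing in for Lucene Documents: only emit a block when there is both a creator and at least one work, put the creator last, and stamp it with a marker field that the cloned source docs do not carry. The field name BLOCK_PARENT and the helper buildBlock are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockBuilderSketch {

    // Hypothetical marker field; the name is an assumption, not from the thread.
    static final String BLOCK_PARENT = "BLOCK_PARENT";

    /**
     * Builds one block: all works first, the creator last and marked as the
     * block parent. Returns an empty list when the block would be incomplete,
     * so the caller can skip the addDocuments call entirely.
     */
    static List<Map<String, String>> buildBlock(List<Map<String, String>> works,
                                                Map<String, String> creator) {
        if (creator == null || works == null || works.isEmpty()) {
            return Collections.emptyList();
        }
        List<Map<String, String>> block = new ArrayList<>(works);
        Map<String, String> parent = new LinkedHashMap<>(creator); // copy, then add marker
        parent.put(BLOCK_PARENT, "true");
        block.add(parent);
        return block;
    }

    public static void main(String[] args) {
        List<Map<String, String>> works = Arrays.asList(
            Collections.singletonMap("CRID", "c1"),
            Collections.singletonMap("CRID", "c1"));
        Map<String, String> creator = Collections.singletonMap("ABID", "c1");

        List<Map<String, String>> block = buildBlock(works, creator);
        System.out.println(block.size());                                  // prints 3
        System.out.println(block.get(block.size() - 1).get(BLOCK_PARENT)); // prints true
    }
}
```

A parent filter on the marker field would then match only docs that were written as the last doc of a block, rather than every doc that happens to have AGTY:np.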
-Greg
On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <
[email protected]> wrote:
> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works. The parent document has a nationality and the child document does
> not. I want to query the children (named works) limiting by the
> nationality of the parent (named person). I've indexed the documents as
> follows (I'm pulling the docs from an existing index):
>
> private void createNamedWorkIndex(String srcIndexPath, String
> destIndexPath) throws IOException {
> FSDirectory srcDir = FSDirectory.open(new
> File(srcIndexPath));
> FSDirectory destDir = FSDirectory.open(new
> File(destIndexPath));
>
> IndexReader reader = DirectoryReader.open(srcDir);
>
> Version version = Version.LUCENE_48;
> IndexWriterConfig conf = new IndexWriterConfig(version,
> new StandardTextAnalyzer(version));
>
> Set<String> crids = getCreatorIds(reader);
>
> String[] crida = crids.toArray(new String[crids.size()]);
>
> int numThreads = 24;
> ExecutorService executor =
> Executors.newFixedThreadPool(numThreads);
>
> int numCrids = crids.size();
> int batchSize = numCrids / numThreads;
> int remainder = numCrids % numThreads;
>
> System.out.println("Inserting work/creator blocks using "
> + numThreads + " threads...");
> try (IndexWriter writer = new IndexWriter(destDir, conf)){
> for (int i = 0; i < numThreads; i++) {
> String[] cridRange;
> if (i == numThreads - 1) {
> // copyOfRange's end index is exclusive, so no "- 1" is needed
> cridRange =
> Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize + remainder);
> } else {
> cridRange =
> Arrays.copyOfRange(crida, i*batchSize, (i+1)*batchSize);
> }
> String id = "" + ((char)('A' + i));
> Runnable indexer = new IndexRunnable(id ,
> reader, writer, new HashSet<String>(Arrays.asList(cridRange)));
> executor.execute(indexer);
> }
> executor.shutdown();
> executor.awaitTermination(2, TimeUnit.HOURS);
> } catch (Exception e) {
> executor.shutdownNow();
> throw new RuntimeException(e);
> } finally {
> reader.close();
> srcDir.close();
> destDir.close();
> }
>
> System.out.println("Done!");
> }
>
> public static class IndexRunnable implements Runnable {
> private String id;
> private IndexReader reader;
> private IndexWriter writer;
> private Set<String> crids;
>
> public IndexRunnable(String id, IndexReader reader,
> IndexWriter writer, Set<String> crids) {
> this.id = id;
> this.reader = reader;
> this.writer = writer;
> this.crids = crids;
> }
>
> @Override
> public void run() {
> IndexSearcher searcher = new IndexSearcher(reader);
>
> try {
> int count = 0;
> for (String crid : crids) {
> List<Document> docs = new
> ArrayList<>();
>
> BooleanQuery abidQuery = new
> BooleanQuery();
> abidQuery.add(new TermQuery(new
> Term("ABID", crid)), Occur.MUST);
> abidQuery.add(new TermQuery(new
> Term("AGPR", "true")), Occur.MUST);
>
> TermQuery cridQuery = new
> TermQuery(new Term("CRID", crid));
>
> TopDocs creatorDocs =
> searcher.search(abidQuery, Integer.MAX_VALUE);
> TopDocs workDocs =
> searcher.search(cridQuery, Integer.MAX_VALUE);
>
> for (int i = 0; i <
> workDocs.scoreDocs.length; i++) {
>
> docs.add(reader.document(workDocs.scoreDocs[i].doc));
> }
>
> if (creatorDocs.scoreDocs.length >
> 0) {
>
> docs.add(reader.document(creatorDocs.scoreDocs[0].doc));
> }
>
> writer.addDocuments(docs);
> if (++count % 100 == 0) {
> System.out.println(id + "
> = " + count);
> writer.commit();
> }
> }
> } catch (IOException e) {
> throw new RuntimeException(e);
> }
> }
> }
>
> I then attempt to perform a block join query as follows:
>
> private void runToChildBlockJoinQuery(String indexPath) throws
> IOException {
> FSDirectory dir = FSDirectory.open(new File(indexPath));
> IndexReader reader = DirectoryReader.open(dir);
> IndexSearcher searcher = new IndexSearcher(reader);
>
> TermQuery parentQuery = new TermQuery(new Term("NT",
> "american"));
> TermQuery parentFilterQuery = new TermQuery(new
> Term("AGTY", "np"));
> Filter parentFilter = new CachingWrapperFilter(new
> QueryWrapperFilter(parentFilterQuery));
>
> ToChildBlockJoinQuery tcbjq = new
> ToChildBlockJoinQuery(parentQuery, parentFilter, true);
>
> TopDocs worksDocs = searcher.search(tcbjq, 20);
>
> displayWorks(reader, searcher, worksDocs);
> }
>
> and I get the following exception:
>
> Exception in thread "main" java.lang.IllegalStateException: parentFilter
> must return FixedBitSet; got org.apache.lucene.util.WAH8DocIdSet@34e671de
> at
> org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinWeight.scorer(ToChildBlockJoinQuery.java:148)
> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
> at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
> at
> BlockJoinQueryTester.runToChildBlockJoinQuery(BlockJoinQueryTester.java:73)
> at BlockJoinQueryTester.main(BlockJoinQueryTester.java:40)
>
> I don't understand what I'm doing wrong and what a "FixedBitSet" is and
> why I don't get one out of my filter. Is FixedBitSet a special kind of
> OpenBitSet and what does "fixed" mean in this context? Thanks for any help.
>
> Jim
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>