Re: ToChildBlockJoinQuery question

2015-01-21 Thread Michael Sokolov

On 1/21/2015 6:59 PM, Gregory Dearing wrote:

> Jim,
>
> I think you hit the nail on the head... that's not what BlockJoinQueries do.
>
> If you're wanting to search for children and join to their parents... then
> use ToParentBlockJoinQuery, with a query that matches the set of children
> and a filter that matches the set of parents.
>
> If you're searching for parents, then joining to their children... then use
> ToChildBlockJoinQuery, with a query that matches the set of parents and a
> filter that matches the set of children.
>
> When you add related documents to the index (via addDocuments), make sure
> that children are added before their parents.
>
> The reason all the above is necessary is that it makes it possible to have
> a nested hierarchy of relationships (i.e., Parents have Children, which have
> Children of their own).  You need a query to indicate which part of the
> hierarchy you're starting from, and a filter indicating which part of the
> hierarchy you're joining to.
>
> Also, you will always get an exception if your query and your filter both
> match the same document.  A child can't be its own parent.

That's true for the existing implementation, but it seems unnecessary from
what I can tell.  See
https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/search/SafariBlockJoinQuery.java
for a variant that allows a child to be its own parent.


-Mike

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: ToChildBlockJoinQuery question

2015-01-21 Thread Gregory Dearing
Jim,

I think you hit the nail on the head... that's not what BlockJoinQueries do.

If you're wanting to search for children and join to their parents... then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that matches the set of parents.

If you're searching for parents, then joining to their children... then use
ToChildBlockJoinQuery, with a query that matches the set of parents and a
filter that matches the set of children.

When you add related documents to the index (via addDocuments), make sure
that children are added before their parents.

The reason all the above is necessary is that it makes it possible to have
a nested hierarchy of relationships (i.e., Parents have Children, which have
Children of their own).  You need a query to indicate which part of the
hierarchy you're starting from, and a filter indicating which part of the
hierarchy you're joining to.

Also, you will always get an exception if your query and your filter both
match the same document.  A child can't be its own parent.

BlockJoin is a very powerful feature, but what it's really doing is
modelling relationships using an index that doesn't know what a
relationship is.  The relationships are determined by a combination of the
order that you indexed the block, and the format of your query.  This
disconnect can lead to some weird behavior if you're not absolutely sure how
it works.
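
To make the point about indexing order concrete, here is a minimal sketch in
plain Java with no Lucene on the classpath. It assumes a simplified model in
which documents are just integer positions and a java.util.BitSet marks the
positions that hold parents. The class and method names (BlockJoinModel,
parentOf, childrenOf) are made up for illustration, and this is not a copy of
how Lucene's join classes are written.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Simplified model of a block-join index: documents are positions 0..n-1 and a
// BitSet marks which positions hold parent documents. Each block is a run of
// children followed by their parent (the last document of the block).
// Illustration only, not Lucene's implementation.
class BlockJoinModel {

    // Join a child up to its parent: the first parent bit after the child.
    static int parentOf(BitSet parents, int childDoc) {
        if (parents.get(childDoc)) {
            throw new IllegalStateException("doc " + childDoc + " is a parent, not a child");
        }
        return parents.nextSetBit(childDoc + 1);
    }

    // Join a parent down to its children: every position after the previous
    // parent and before this parent.
    static List<Integer> childrenOf(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalStateException("doc " + parentDoc + " is not a parent");
        }
        // previousSetBit(-1) returns -1, so a parent at position 0 has no children.
        int firstChild = parents.previousSetBit(parentDoc - 1) + 1;
        List<Integer> children = new ArrayList<>();
        for (int doc = firstChild; doc < parentDoc; doc++) {
            children.add(doc);
        }
        return children;
    }

    public static void main(String[] args) {
        // Two blocks: docs 0, 1, 2 are children of parent 3; doc 4 is the child of parent 5.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(5);
        System.out.println(parentOf(parents, 1));
        System.out.println(childrenOf(parents, 3));
        System.out.println(childrenOf(parents, 5));
    }
}
```

Nothing in the model stores a parent/child link. The relationship falls out of
document order plus the parent bits, which is why a block indexed in the wrong
order, or a parent bit set in the wrong place, silently changes who belongs to
whom.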

Thanks,
Greg





On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <
james.mckin...@cengage.com> wrote:

>
> Am I understanding how this is supposed to work?  What I think I am (and
> should be) doing is providing a query and filter that specifies the parent
> docs and the ToChildBlockJoinQuery should return me all the child docs for
> the resulting parent docs.  Is this correct?  The reason I think I'm not
> understanding is that I don't see why I need both a filter and a query to
> specify the parent docs when a single query or filter should suffice.  Am I
> misunderstanding what parentQuery and parentFilter mean, they both refer to
> parent docs right?
>
> Jim
>


RE: ToChildBlockJoinQuery question

2015-01-21 Thread McKinley, James T
Hi Greg,

Thanks for responding to my question.  I added some extra conditions to the 
IndexRunnable run method, namely I required AGTY:np in the source query for the 
parent docs and required that both the creatorDocs and workDocs actually 
contain documents or else the addDocuments call would never be made:

public void run() {
    IndexSearcher searcher = new IndexSearcher(reader);

    try {
        int count = 0;
        for (String crid : crids) {
            List<Document> docs = new ArrayList<>();

            BooleanQuery abidQuery = new BooleanQuery();
            abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
            abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
            abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);

            TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

            TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
            TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

            if ((creatorDocs.scoreDocs.length > 0) && (workDocs.scoreDocs.length > 0)) {
                // Children (works) go into the block first...
                for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                    docs.add(reader.document(workDocs.scoreDocs[i].doc));
                }

                // ...and the parent (creator) is the last document of the block.
                docs.add(reader.document(creatorDocs.scoreDocs[0].doc));

                writer.addDocuments(docs);
                if (++count % 100 == 0) {
                    System.out.println(id + " = " + count);
                    writer.commit();
                }
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

I then modified the runToChildBlockJoinQuery method to first perform a search 
with the parent query and parent filter. Then using the id of each parent named 
person document I did a query for the named works with that creator id 
(essentially reversing the query that was done to create the BlockJoin index) 
and I do indeed get works back for every named person that passes the parent 
query and filter.  However I still get the IllegalStateException complaining 
about a non-FixedBitSet doc id set when doing the ToChildBlockJoinQuery. Here 
is that code:

private void runToChildBlockJoinQuery(String indexPath) throws IOException {
    FSDirectory dir = FSDirectory.open(new File(indexPath));
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
    TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
    Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

    TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);

    for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
        String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
        BooleanQuery cridQuery = new BooleanQuery();
        for (String id : ids) {
            cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
        }
        TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
        System.out.println(worksDocs.scoreDocs.length);
    }

    ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

    TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE);  // ==> IllegalStateException
}

So I think all the parent docs have child docs and they should have been 
indexed in the same addDocuments call with the parent being the last doc in the 
list.  Then, on a lark, I just

Re: ToChildBlockJoinQuery question

2015-01-21 Thread Gregory Dearing
James,

I haven't actually run your example, but I think the root problem is that
your source query ("NT:american") is hitting documents that have no
children.

The reason the exception is so weird is that one of your index segments
contains zero documents that match your filter.  Specifically, there's an
index segment containing docs matching "NT:american", but with no documents
matching "AGTY:np".

This will cause CachingWrapperFilter, which normally returns a FixedBitSet,
to instead return a generic "Empty" DocIdSet.  Which leads to the exception
from ToChildBlockJoinQuery.
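
The mechanism described above can be sketched in plain Java without Lucene.
Everything here is a hypothetical stand-in: DocSetSketch, BitDocSetSketch,
EmptyDocSetSketch and SegmentFilterSketch are made-up names that only mirror
the shape of the problem (a caching filter that returns a shared "empty"
object for a segment with no matches, and a join that accepts only the bit-set
representation). It is a model of the failure, not the library's code.

```java
import java.util.BitSet;

// Hypothetical stand-ins for "a per-segment set of matching docs".
// None of these types are Lucene classes.
interface DocSetSketch {}

final class BitDocSetSketch implements DocSetSketch {
    final BitSet bits;
    BitDocSetSketch(BitSet bits) { this.bits = bits; }
}

final class EmptyDocSetSketch implements DocSetSketch {
    static final EmptyDocSetSketch INSTANCE = new EmptyDocSetSketch();
    private EmptyDocSetSketch() {}
}

class SegmentFilterSketch {
    // A caching filter that takes a shortcut for segments with no matches:
    // it hands back a shared "empty" object instead of an all-zero bit set.
    static DocSetSketch cachedParentsFor(BitSet matchesInSegment) {
        return matchesInSegment.isEmpty()
                ? EmptyDocSetSketch.INSTANCE
                : new BitDocSetSketch(matchesInSegment);
    }

    // A join that needs random access to the parent bits and therefore
    // rejects any representation other than the bit set.
    static BitSet requireBits(DocSetSketch parents) {
        if (!(parents instanceof BitDocSetSketch)) {
            throw new IllegalStateException(
                    "parent filter must return a bit set; got " + parents.getClass().getSimpleName());
        }
        return ((BitDocSetSketch) parents).bits;
    }

    public static void main(String[] args) {
        BitSet segmentWithParents = new BitSet();
        segmentWithParents.set(3);
        System.out.println(requireBits(cachedParentsFor(segmentWithParents)));

        BitSet segmentWithoutParents = new BitSet(); // e.g. no AGTY:np docs in this segment
        try {
            requireBits(cachedParentsFor(segmentWithoutParents));
        } catch (IllegalStateException e) {
            System.out.println("join failed: " + e.getMessage());
        }
    }
}
```

In this model the exception depends on how documents happen to be distributed
across segments rather than on the query itself, which would explain why the
same code can work on one index and fail on another.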

The summary is, make sure that your source query only hits documents that
were actually added using 'addDocuments()'.  Since it looks like you're
extracting your block relationships from the existing index, that might
mean that you'll need to add some extra metadata to the newly created docs
instead of just cloning what already exists.

-Greg


On Wed, Jan 21, 2015 at 10:00 AM, McKinley, James T <
james.mckin...@cengage.com> wrote:

> Hi,
>
> I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following
> Mike McCandless' blog post:
>
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>
> I have a set of child documents which are named works and a set of parent
> documents which are named persons that are the creators of the named
> works.  The parent document has a nationality and the child document does
> not.  I want to query the children (named works) limiting by the
> nationality of the parent (named person).  I've indexed the documents as
> follows (I'm pulling the docs from an existing index):
>
> private void createNamedWorkIndex(String srcIndexPath, String
> destIndexPath) throws IOException {
> FSDirectory srcDir = FSDirectory.open(new
> File(srcIndexPath));
> FSDirectory destDir = FSDirectory.open(new
> File(destIndexPath));
>
> IndexReader reader = DirectoryReader.open(srcDir);
>
> Version version = Version.LUCENE_48;
> IndexWriterConfig conf = new IndexWriterConfig(version,
> new StandardTextAnalyzer(version));
>
> Set crids = getCreatorIds(reader);
>
> String[] crida = crids.toArray(new String[crids.size()]);
>
> int numThreads = 24;
> ExecutorService executor =
> Executors.newFixedThreadPool(numThreads);
>
> int numCrids = crids.size();
> int batchSize = numCrids / numThreads;
> int remainder = numCrids % numThreads;
>
> System.out.println("Inserting work/creator blocks using "
> + numThreads + " threads...");
> try (IndexWriter writer = new IndexWriter(destDir, conf)){
> for (int i = 0; i < numThreads; i++) {
> String[] cridRange;
> if (i == numThreads - 1) {
> cridRange =
> Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1) + remainder);
> } else {
> cridRange =
> Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1));
> }
> String id = "" + ((char)('A' + i));
> Runnable indexer = new IndexRunnable(id ,
> reader, writer, new HashSet(Arrays.asList(cridRange)));
> executor.execute(indexer);
> }
> executor.shutdown();
> executor.awaitTermination(2, TimeUnit.HOURS);
> } catch (Exception e) {
> executor.shutdownNow();
> throw new RuntimeException(e);
> } finally {
> reader.close();
> srcDir.close();
> destDir.close();
> }
>
> System.out.println("Done!");
> }
>
> public static class IndexRunnable implements Runnable {
> private String id;
> private IndexReader reader;
> private IndexWriter writer;
> private Set crids;
>
> public IndexRunnable(String id, IndexReader reader,
> IndexWriter writer, Set crids) {
> this.id = id;
> this.reader = reader;
> this.writer = writer;
> this.crids = crids;
> }
>
> @Override
> public void run() {
> IndexSearcher searcher = new IndexSearcher(reader);
>
> try {
> int count = 0;
> for (String crid : crids) {
> List docs = new
> ArrayList<>();
>
>

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread dennis yermakov
I'm asking this because QueryBuilder.createFieldQuery returns a
MultiPhraseQuery in some cases. I need to know which terms of the
MultiPhraseQuery produced the match. Explanation doesn't answer this
question. It only returns a string based on these terms (see the
MultiPhraseQuery.toString() method), like:

(termA termB) termC

which can mean "termA termC" OR "termB termC".

So my question is: how can I rewrite a MultiPhraseQuery as a BooleanQuery with
PhraseQuery clauses, or something else, to get the matched terms? Is it
possible at all, and will these queries be equivalent (scoring, boosting, etc.)?
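
As a rough illustration of what such a rewrite has to enumerate, here is a
minimal sketch in plain Java with no Lucene on the classpath. It assumes the
per-position term alternatives are already available as lists of strings, and
the names PhraseExpansion and expand are made up for the example. Each list it
produces would correspond to one PhraseQuery clause. The sketch ignores
position gaps and slop, and it says nothing about whether the scores would
match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Expands per-position term alternatives into every concrete phrase.
// Each resulting phrase would correspond to one PhraseQuery clause.
class PhraseExpansion {

    static List<List<String>> expand(List<List<String>> positions) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<>()); // start from a single empty phrase
        for (List<String> alternatives : positions) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> phrase = new ArrayList<>(prefix);
                    phrase.add(term);
                    extended.add(phrase);
                }
            }
            phrases = extended;
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<List<String>> positions = Arrays.asList(
                Arrays.asList("blueberry"),
                Arrays.asList("chocolate"),
                Arrays.asList("pie", "tart"));
        for (List<String> phrase : expand(positions)) {
            System.out.println(String.join(" ", phrase));
        }
    }
}
```

The number of phrases is the product of the number of alternatives at each
position, so a query with many alternatives at several positions expands into
a large number of clauses.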

Thanks.

2015-01-21 17:06 GMT+02:00 Ian Lea :

> Are you asking if your two suggestions
>
> 1) a MultiPhraseQuery or
>
> 2) a BooleanQuery made up of multiple PhraseQuery instances
>
> are equivalent?  If so, I'd say that they could be if you build them
> carefully enough.  For the specific examples you show I'd say not and
> would wonder if you get correct hits, particularly for your
> MultiPhraseQuery which looks wrong to me, based on my reading of the
> javadoc.  But I haven't tried or tested your code - I assume you have.
>
>
> If you are asking something else, please explain more clearly.
>
> --
> Ian.
>
>
> On Wed, Jan 21, 2015 at 2:50 PM, ku3ia  wrote:
> > ku3ia wrote
> >> Hi folks!
> >> I have a multiphrase query, for example, from units:
> >>
> >> Directory indexStore = newDirectory();
> >> RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
> >> add("blueberry chocolate pie", writer);
> >> add("blueberry chocolate tart", writer);
> >> IndexReader r = writer.getReader();
> >> writer.close();
> >>
> >> IndexSearcher searcher = newSearcher(r);
> >> MultiPhraseQuery q = new MultiPhraseQuery();
> >> q.add(new Term("body", "blueberry"));
> >> q.add(new Term("body", "chocolate"));
> >> q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
> >> assertEquals(2, searcher.search(q, 1).totalHits);
> >> r.close();
> >> indexStore.close();
> >>
> >> I need to know which phrase produced the match. Explanation doesn't
> >> return that information, only that the document matched this query. So
> >> can I rewrite this query as a BooleanQuery, like this?
> >>
> >> BooleanQuery q = new BooleanQuery();
> >>
> >> PhraseQuery pq1 = new PhraseQuery();
> >> pq1.add(new Term("body", "blueberry"));
> >> pq1.add(new Term("body", "chocolate"));
> >> pq1.add(new Term("body", "pie"));
> >> q.add(pq1, BooleanClause.Occur.SHOULD);
> >>
> >> PhraseQuery pq2 = new PhraseQuery();
> >> pq2.add(new Term("body", "blueberry"));
> >> pq2.add(new Term("body", "chocolate"));
> >> pq2.add(new Term("body", "tart"));
> >> q.add(pq2, BooleanClause.Occur.SHOULD);
> >>
> >> In this case I'll know exactly which query matched. But my main
> >> question is: is this rewrite equivalent?
> >> Thanks.
> >>
> >> --
> >> dennis yermakov
> >> mailto:
> >
> >> demesg@
> >
> > Any ideas?
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/MultiPhraseQuery-Rewrite-to-BooleanQuery-tp4178898p4180863.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
>
>
>


-- 
dennis yermakov
mailto: dem...@gmail.com


Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread Ian Lea
Are you asking if your two suggestions

1) a MultiPhraseQuery or

2) a BooleanQuery made up of multiple PhraseQuery instances

are equivalent?  If so, I'd say that they could be if you build them
carefully enough.  For the specific examples you show I'd say not and
would wonder if you get correct hits, particularly for your
MultiPhraseQuery which looks wrong to me, based on my reading of the
javadoc.  But I haven't tried or tested your code - I assume you have.


If you are asking something else, please explain more clearly.

--
Ian.


On Wed, Jan 21, 2015 at 2:50 PM, ku3ia  wrote:
> ku3ia wrote
>> Hi folks!
>> I have a multiphrase query, for example, from units:
>>
>> Directory indexStore = newDirectory();
>> RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
>> add("blueberry chocolate pie", writer);
>> add("blueberry chocolate tart", writer);
>> IndexReader r = writer.getReader();
>> writer.close();
>>
>> IndexSearcher searcher = newSearcher(r);
>> MultiPhraseQuery q = new MultiPhraseQuery();
>> q.add(new Term("body", "blueberry"));
>> q.add(new Term("body", "chocolate"));
>> q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
>> assertEquals(2, searcher.search(q, 1).totalHits);
>> r.close();
>> indexStore.close();
>>
>> I need to know which phrase produced the match. Explanation doesn't
>> return that information, only that the document matched this query. So
>> can I rewrite this query as a BooleanQuery, like this?
>>
>> BooleanQuery q = new BooleanQuery();
>>
>> PhraseQuery pq1 = new PhraseQuery();
>> pq1.add(new Term("body", "blueberry"));
>> pq1.add(new Term("body", "chocolate"));
>> pq1.add(new Term("body", "pie"));
>> q.add(pq1, BooleanClause.Occur.SHOULD);
>>
>> PhraseQuery pq2 = new PhraseQuery();
>> pq2.add(new Term("body", "blueberry"));
>> pq2.add(new Term("body", "chocolate"));
>> pq2.add(new Term("body", "tart"));
>> q.add(pq2, BooleanClause.Occur.SHOULD);
>>
>> In this case I'll know exactly which query matched. But my main
>> question is: is this rewrite equivalent?
>> Thanks.
>>
>> --
>> dennis yermakov
>> mailto:
>
>> demesg@
>
> Any ideas?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/MultiPhraseQuery-Rewrite-to-BooleanQuery-tp4178898p4180863.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>




ToChildBlockJoinQuery question

2015-01-21 Thread McKinley, James T
Hi,

I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following Mike 
McCandless' blog post:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I have a set of child documents which are named works and a set of parent 
documents which are named persons that are the creators of the named works.  
The parent document has a nationality and the child document does not.  I want 
to query the children (named works) limiting by the nationality of the parent 
(named person).  I've indexed the documents as follows (I'm pulling the docs 
from an existing index):

private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
    FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
    FSDirectory destDir = FSDirectory.open(new File(destIndexPath));

    IndexReader reader = DirectoryReader.open(srcDir);

    Version version = Version.LUCENE_48;
    IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));

    Set<String> crids = getCreatorIds(reader);

    String[] crida = crids.toArray(new String[crids.size()]);

    int numThreads = 24;
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);

    int numCrids = crids.size();
    int batchSize = numCrids / numThreads;
    int remainder = numCrids % numThreads;

    System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
    try (IndexWriter writer = new IndexWriter(destDir, conf)) {
        for (int i = 0; i < numThreads; i++) {
            String[] cridRange;
            if (i == numThreads - 1) {
                cridRange = Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1) + remainder);
            } else {
                cridRange = Arrays.copyOfRange(crida, i*batchSize, ((i+1)*batchSize - 1));
            }
            String id = "" + ((char)('A' + i));
            Runnable indexer = new IndexRunnable(id, reader, writer, new HashSet<String>(Arrays.asList(cridRange)));
            executor.execute(indexer);
        }
        executor.shutdown();
        executor.awaitTermination(2, TimeUnit.HOURS);
    } catch (Exception e) {
        executor.shutdownNow();
        throw new RuntimeException(e);
    } finally {
        reader.close();
        srcDir.close();
        destDir.close();
    }

    System.out.println("Done!");
}
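
One detail in the batching code above may deserve a second look, separately
from the block join question: Arrays.copyOfRange(original, from, to) treats
`to` as exclusive, so an upper bound of ((i+1)*batchSize - 1) leaves the last
id of each slice unassigned. A minimal sketch of the partitioning step on its
own, using a made-up helper name (partition) and a tiny array in place of the
real creator ids:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchPartition {

    // Splits ids into numBatches contiguous slices; the last slice absorbs
    // whatever is left over when the length is not evenly divisible.
    static List<String[]> partition(String[] ids, int numBatches) {
        int batchSize = ids.length / numBatches;
        List<String[]> batches = new ArrayList<>();
        for (int i = 0; i < numBatches; i++) {
            int from = i * batchSize;
            // copyOfRange's upper bound is exclusive, so no "- 1" is needed.
            int to = (i == numBatches - 1) ? ids.length : (i + 1) * batchSize;
            batches.add(Arrays.copyOfRange(ids, from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c", "d", "e", "f", "g"};
        int covered = 0;
        for (String[] batch : partition(ids, 3)) {
            System.out.println(Arrays.toString(batch));
            covered += batch.length;
        }
        System.out.println(covered + " of " + ids.length + " ids assigned");
    }
}
```

Summing the slice lengths and comparing against the input length, as main does
here, is a cheap way to confirm that no creator id is dropped before the
indexing threads start.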

public static class IndexRunnable implements Runnable {
    private String id;
    private IndexReader reader;
    private IndexWriter writer;
    private Set<String> crids;

    public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
        this.id = id;
        this.reader = reader;
        this.writer = writer;
        this.crids = crids;
    }

    @Override
    public void run() {
        IndexSearcher searcher = new IndexSearcher(reader);

        try {
            int count = 0;
            for (String crid : crids) {
                List<Document> docs = new ArrayList<>();

                BooleanQuery abidQuery = new BooleanQuery();
                abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);

                TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

                TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

                for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                    docs.add(reader.document(workDocs.scoreDocs[i].doc));
                }

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread ku3ia
ku3ia wrote
> Hi folks!
> I have a multiphrase query, for example, from units:
> 
> Directory indexStore = newDirectory();
> RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
> add("blueberry chocolate pie", writer);
> add("blueberry chocolate tart", writer);
> IndexReader r = writer.getReader();
> writer.close();
> 
> IndexSearcher searcher = newSearcher(r);
> MultiPhraseQuery q = new MultiPhraseQuery();
> q.add(new Term("body", "blueberry"));
> q.add(new Term("body", "chocolate"));
> q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
> assertEquals(2, searcher.search(q, 1).totalHits);
> r.close();
> indexStore.close();
> 
> I need to know which phrase produced the match. Explanation doesn't
> return that information, only that the document matched this query. So
> can I rewrite this query as a BooleanQuery, like this?
> 
> BooleanQuery q = new BooleanQuery();
> 
> PhraseQuery pq1 = new PhraseQuery();
> pq1.add(new Term("body", "blueberry"));
> pq1.add(new Term("body", "chocolate"));
> pq1.add(new Term("body", "pie"));
> q.add(pq1, BooleanClause.Occur.SHOULD);
> 
> PhraseQuery pq2 = new PhraseQuery();
> pq2.add(new Term("body", "blueberry"));
> pq2.add(new Term("body", "chocolate"));
> pq2.add(new Term("body", "tart"));
> q.add(pq2, BooleanClause.Occur.SHOULD);
> 
> In this case I'll know exactly which query matched. But my main
> question is: is this rewrite equivalent?
> Thanks.
> 
> -- 
> dennis yermakov
> mailto: 

> demesg@

Any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiPhraseQuery-Rewrite-to-BooleanQuery-tp4178898p4180863.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
