Re: ToChildBlockJoinQuery question

2015-01-21 Thread Gregory Dearing
James,

I haven't actually run your example, but I think the source of the problem is that
your source query (NT:american) is hitting documents that have no
children.

The reason the exception is so weird is that one of your index segments
contains zero documents that match your filter.  Specifically, there's an
index segment containing docs matching NT:american, but with no documents
matching AGTY:np.

This causes CachingWrapperFilter, which normally returns a FixedBitSet,
to return a generic empty DocIdSet instead, which in turn leads to the exception
from ToChildBlockJoinQuery.
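The failure mode above can be illustrated with a toy model in plain Java (no Lucene dependency). DocSet, Bits, EMPTY, cacheFilter and parentsForSegment are hypothetical stand-ins for DocIdSet, FixedBitSet, an empty DocIdSet and the join's per-segment setup, and the order of the checks is an assumption. This is a sketch of the idea, not Lucene's source.

```java
import java.util.BitSet;

// Toy model only: every type and method here is a hypothetical stand-in,
// not Lucene code.
class EmptySegmentModel {

    interface DocSet {}

    // Concrete bitset result: the only type the join accepts
    static final class Bits implements DocSet {
        final BitSet bits;
        Bits(BitSet bits) { this.bits = bits; }
    }

    // Shared sentinel handed back when a segment has no matching docs
    static final DocSet EMPTY = new DocSet() {
        @Override public String toString() { return "EMPTY"; }
    };

    // Caches one segment's filter result, substituting EMPTY when nothing matches
    static DocSet cacheFilter(boolean[] filterMatches) {
        BitSet b = new BitSet(filterMatches.length);
        for (int doc = 0; doc < filterMatches.length; doc++) {
            if (filterMatches[doc]) b.set(doc);
        }
        return b.isEmpty() ? EMPTY : new Bits(b);
    }

    // Assumed per-segment setup: skip segments where the query has no hits,
    // otherwise require the cached filter result to be a concrete bitset.
    static BitSet parentsForSegment(boolean[] queryMatches, boolean[] filterMatches) {
        boolean anyQueryHit = false;
        for (boolean m : queryMatches) anyQueryHit |= m;
        if (!anyQueryHit) return null;                  // nothing to join in this segment

        DocSet parents = cacheFilter(filterMatches);
        if (!(parents instanceof Bits)) {
            throw new IllegalStateException("filter must return a bitset; got " + parents);
        }
        return ((Bits) parents).bits;
    }

    public static void main(String[] args) {
        // Segment 1: the query hits doc 1, which the filter also marks as a parent
        System.out.println(parentsForSegment(new boolean[] {false, true}, new boolean[] {false, true}));
        // Segment 2: the query hits doc 0 but the filter matches nothing
        try {
            parentsForSegment(new boolean[] {true, false}, new boolean[] {false, false});
        } catch (IllegalStateException e) {
            System.out.println("segment 2: " + e.getMessage());
        }
    }
}
```

In this model, a segment with NT:american hits but no AGTY:np hits is exactly the case that throws, even though every other segment works.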

In summary, make sure that your source query only hits documents that
were actually added using 'addDocuments()'.  Since it looks like you're
extracting your block relationships from the existing index, you might
need to add some extra metadata to the newly created docs
instead of just cloning what already exists.

-Greg



Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread dennis yermakov
I'm asking this because QueryBuilder.createFieldQuery in some cases
returns a MultiPhraseQuery. I need to know which of the MultiPhraseQuery's
terms produced the match. The Explanation doesn't answer this
question. It only returns a string based on these terms (see the
MultiPhraseQuery.toString() method), like:

(termA termB) termC
which can mean "termA termC" OR "termB termC"

So my question is: how can I rewrite a MultiPhraseQuery as a BooleanQuery with
PhraseQuery clauses (or something else) to get the matched terms? Is it possible
at all, and will these queries be equivalent (scoring, boosting, etc.)?

Thanks.





-- 
dennis yermakov
mailto: dem...@gmail.com


Re: ToChildBlockJoinQuery question

2015-01-21 Thread Gregory Dearing
Jim,

I think you hit the nail on the head... that's not what BlockJoinQueries do.

If you're wanting to search for children and join to their parents... then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that matches the set of parents.

If you're searching for parents, then joining to their children... then use
ToChildBlockJoinQuery, with a query that matches the set of parents and a
filter that matches the set of children.

When you add related documents to the index (via addDocuments), make sure that
children are added before their parents.

The reason all the above is necessary is that it makes it possible to have
a nested hierarchy of relationships (i.e., parents have children, which have
children of their own).  You need a query to indicate which part of the
hierarchy you're starting from, and a filter indicating which part of the
hierarchy you're joining to.

Also, you will always get an exception if your query and your filter both
match the same document.  A child can't be its own parent.
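A minimal sketch, in plain Java with java.util.BitSet standing in for Lucene's per-segment parent bitset, of how document order alone decides which docs belong together. Note that in the Lucene 4.x constructors used in this thread, both ToParentBlockJoinQuery(childQuery, parentsFilter, scoreMode) and ToChildBlockJoinQuery(parentQuery, parentsFilter, doScores) take a filter that marks the parent docs, so the sketch uses one parent bitset for both directions. The class name, method names and validation messages are hypothetical, not Lucene's.

```java
import java.util.BitSet;

// Sketch of block-join semantics over doc ids. Assumes each block was added
// with addDocuments() as [child, ..., child, parent] and that `parents`
// marks every parent doc in the segment.
class BlockJoinSketch {

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // Parents -> children: every doc between the previous parent and this one.
    static BitSet toChildren(BitSet parentHits, BitSet parents) {
        BitSet children = new BitSet();
        for (int p = parentHits.nextSetBit(0); p >= 0; p = parentHits.nextSetBit(p + 1)) {
            if (!parents.get(p)) {
                throw new IllegalStateException("query hit " + p + " is not a parent doc");
            }
            int firstChild = parents.previousSetBit(p - 1) + 1;  // 0 when no earlier parent
            children.set(firstChild, p);                          // range excludes the parent
        }
        return children;
    }

    // Children -> parents: each child joins to the next parent doc after it.
    static BitSet toParents(BitSet childHits, BitSet parents) {
        BitSet result = new BitSet();
        for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
            if (parents.get(c)) {
                throw new IllegalStateException("query hit " + c + " is a parent; a child can't be its own parent");
            }
            int parent = parents.nextSetBit(c + 1);
            if (parent < 0) {
                throw new IllegalStateException("doc " + c + " has no parent after it");
            }
            result.set(parent);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two blocks: children 0,1 with parent 2; children 3,4,5 with parent 6
        BitSet parents = bits(2, 6);
        System.out.println(toChildren(bits(6), parents));    // prints {3, 4, 5}
        System.out.println(toParents(bits(1, 4), parents));  // prints {2, 6}
    }
}
```

Under these assumptions, if a parent were added before its children, toChildren would attribute those children to the next parent in the segment instead.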

BlockJoin is a very powerful feature, but what it's really doing is
modelling relationships using an index that doesn't know what a
relationship is.  The relationships are determined by a combination of the
order in which you indexed the block and the format of your query.  This
disconnect can lead to some weird behavior if you're not absolutely sure how
it works.

Thanks,
Greg





On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T 
james.mckin...@cengage.com wrote:


 Am I understanding how this is supposed to work?  What I think I am (and
 should be) doing is providing a query and a filter that specify the parent
 docs, and the ToChildBlockJoinQuery should then return all the child docs for
 the resulting parent docs.  Is this correct?  The reason I think I'm not
 understanding is that I don't see why I need both a filter and a query to
 specify the parent docs when a single query or filter should suffice.  Am I
 misunderstanding what parentQuery and parentFilter mean?  They both refer to
 parent docs, right?

 Jim



Re: ToChildBlockJoinQuery question

2015-01-21 Thread Michael Sokolov

On 1/21/2015 6:59 PM, Gregory Dearing wrote:

 Also, you will always get an exception if your query and your filter both
 match the same document.  A child can't be its own parent.

That's true for the existing implementation, but seems unnecessary from
what I can tell.  See
https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/search/SafariBlockJoinQuery.java
for a variant that allows a child to be its own parent.


-Mike




RE: ToChildBlockJoinQuery question

2015-01-21 Thread McKinley, James T
Hi Greg,

Thanks for responding to my question.  I added some extra conditions to the
IndexRunnable run method: I required AGTY:np in the source query for the
parent docs, and I required that both creatorDocs and workDocs actually
contain documents, so that addDocuments is only called when both are non-empty:

public void run() {
    IndexSearcher searcher = new IndexSearcher(reader);

    try {
        int count = 0;
        for (String crid : crids) {
            List<Document> docs = new ArrayList<>();

            // Parent (named person) query, now also requiring AGTY:np
            BooleanQuery abidQuery = new BooleanQuery();
            abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
            abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);
            abidQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);

            // Child (named work) query
            TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

            TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
            TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

            // Only index a block when there is a parent and at least one child
            if (creatorDocs.scoreDocs.length > 0 && workDocs.scoreDocs.length > 0) {
                // Children first...
                for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                    docs.add(reader.document(workDocs.scoreDocs[i].doc));
                }

                // ...then the parent as the last doc of the block
                docs.add(reader.document(creatorDocs.scoreDocs[0].doc));

                writer.addDocuments(docs);
                if (++count % 100 == 0) {
                    System.out.println(id + " = " + count);
                    writer.commit();
                }
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

I then modified the runToChildBlockJoinQuery method to first perform a search
with the parent query and parent filter.  Then, using the ID of each parent named
person document, I ran a query for the named works with that creator ID
(essentially reversing the query that was used to create the BlockJoin index),
and I do indeed get works back for every named person that passes the parent
query and filter.  However, I still get the IllegalStateException complaining
about a non-FixedBitSet doc ID set when running the ToChildBlockJoinQuery.  Here
is that code:

private void runToChildBlockJoinQuery(String indexPath) throws IOException {
    FSDirectory dir = FSDirectory.open(new File(indexPath));
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
    TermQuery parentQuery = new TermQuery(new Term("NT", "american"));
    Filter parentFilter = new CachingWrapperFilter(new QueryWrapperFilter(parentFilterQuery));

    // Sanity check: look up the works of every parent that passes the query and filter
    TopDocs creatorDocs = searcher.search(parentQuery, parentFilter, Integer.MAX_VALUE);

    for (ScoreDoc scoreDoc : creatorDocs.scoreDocs) {
        String[] ids = reader.document(scoreDoc.doc).getValues("ABID");
        BooleanQuery cridQuery = new BooleanQuery();
        for (String id : ids) {
            cridQuery.add(new TermQuery(new Term("CRID", id)), Occur.SHOULD);
        }
        TopDocs worksDocs = searcher.search(cridQuery, Integer.MAX_VALUE);
        System.out.println(worksDocs.scoreDocs.length);
    }

    ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

    TopDocs worksDocs = searcher.search(tcbjq, Integer.MAX_VALUE);  // <== IllegalStateException
}

So I think all the parent docs have child docs and they should have been 
indexed in the same addDocuments call with the parent being the last doc in the 
list.  Then, on a lark, I just made the 

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread ku3ia
ku3ia wrote
 Hi folks!
 I have a multi-phrase query, for example, from the unit tests:
 
 Directory indexStore = newDirectory();
 RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore);
 add("blueberry chocolate pie", writer);
 add("blueberry chocolate tart", writer);
 IndexReader r = writer.getReader();
 writer.close();
 
 IndexSearcher searcher = newSearcher(r);
 MultiPhraseQuery q = new MultiPhraseQuery();
 q.add(new Term("body", "blueberry"));
 q.add(new Term("body", "chocolate"));
 q.add(new Term[] {new Term("body", "pie"), new Term("body", "tart")});
 assertEquals(2, searcher.search(q, 1).totalHits);
 r.close();
 indexStore.close();
 
 I need to know which phrase query matched. The Explanation doesn't
 return exact information, only that the document matched this query. So
 can I rewrite this query as a BooleanQuery, like this:
 
 BooleanQuery q = new BooleanQuery();
 
 PhraseQuery pq1 = new PhraseQuery();
 pq1.add(new Term("body", "blueberry"));
 pq1.add(new Term("body", "chocolate"));
 pq1.add(new Term("body", "pie"));
 q.add(pq1, BooleanClause.Occur.SHOULD);
 
 PhraseQuery pq2 = new PhraseQuery();
 pq2.add(new Term("body", "blueberry"));
 pq2.add(new Term("body", "chocolate"));
 pq2.add(new Term("body", "tart"));
 q.add(pq2, BooleanClause.Occur.SHOULD);
 
 In this case I'll know exactly which query matched. But the main
 question is: is this rewrite equivalent/correct?
 Thanks.
 
 -- 
 dennis yermakov
 mailto: 

 demesg@

Any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/MultiPhraseQuery-Rewrite-to-BooleanQuery-tp4178898p4180863.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



ToChildBlockJoinQuery question

2015-01-21 Thread McKinley, James T
Hi,

I'm attempting to use ToChildBlockJoinQuery in Lucene 4.8.1 by following Mike 
McCandless' blog post:

http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I have a set of child documents which are named works and a set of parent 
documents which are named persons that are the creators of the named works.  
The parent document has a nationality and the child document does not.  I want 
to query the children (named works) limiting by the nationality of the parent 
(named person).  I've indexed the documents as follows (I'm pulling the docs 
from an existing index):

private void createNamedWorkIndex(String srcIndexPath, String destIndexPath) throws IOException {
    FSDirectory srcDir = FSDirectory.open(new File(srcIndexPath));
    FSDirectory destDir = FSDirectory.open(new File(destIndexPath));

    IndexReader reader = DirectoryReader.open(srcDir);

    Version version = Version.LUCENE_48;
    IndexWriterConfig conf = new IndexWriterConfig(version, new StandardTextAnalyzer(version));

    Set<String> crids = getCreatorIds(reader);

    String[] crida = crids.toArray(new String[crids.size()]);

    int numThreads = 24;
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);

    int numCrids = crids.size();
    int batchSize = numCrids / numThreads;
    int remainder = numCrids % numThreads;

    System.out.println("Inserting work/creator blocks using " + numThreads + " threads...");
    try (IndexWriter writer = new IndexWriter(destDir, conf)) {
        for (int i = 0; i < numThreads; i++) {
            // copyOfRange's upper bound is exclusive; the last batch takes the remainder
            String[] cridRange;
            if (i == numThreads - 1) {
                cridRange = Arrays.copyOfRange(crida, i * batchSize, (i + 1) * batchSize + remainder);
            } else {
                cridRange = Arrays.copyOfRange(crida, i * batchSize, (i + 1) * batchSize);
            }
            String id = "" + ((char) ('A' + i));
            Runnable indexer = new IndexRunnable(id, reader, writer,
                    new HashSet<String>(Arrays.asList(cridRange)));
            executor.execute(indexer);
        }
        executor.shutdown();
        executor.awaitTermination(2, TimeUnit.HOURS);
    } catch (Exception e) {
        executor.shutdownNow();
        throw new RuntimeException(e);
    } finally {
        reader.close();
        srcDir.close();
        destDir.close();
    }

    System.out.println("Done!");
}

public static class IndexRunnable implements Runnable {
    private String id;
    private IndexReader reader;
    private IndexWriter writer;
    private Set<String> crids;

    public IndexRunnable(String id, IndexReader reader, IndexWriter writer, Set<String> crids) {
        this.id = id;
        this.reader = reader;
        this.writer = writer;
        this.crids = crids;
    }

    @Override
    public void run() {
        IndexSearcher searcher = new IndexSearcher(reader);

        try {
            int count = 0;
            for (String crid : crids) {
                List<Document> docs = new ArrayList<>();

                // Parent (named person) query
                BooleanQuery abidQuery = new BooleanQuery();
                abidQuery.add(new TermQuery(new Term("ABID", crid)), Occur.MUST);
                abidQuery.add(new TermQuery(new Term("AGPR", "true")), Occur.MUST);

                // Child (named work) query
                TermQuery cridQuery = new TermQuery(new Term("CRID", crid));

                TopDocs creatorDocs = searcher.search(abidQuery, Integer.MAX_VALUE);
                TopDocs workDocs = searcher.search(cridQuery, Integer.MAX_VALUE);

                // Children go first in the block
                for (int i = 0; i < workDocs.scoreDocs.length; i++) {
                    docs.add(reader.document(workDocs.scoreDocs[i].doc));
                }
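Because Arrays.copyOfRange(original, from, to) excludes index `to`, an end index of `(i+1)*batchSize - 1` leaves out the last creator ID of every batch, so those blocks are never indexed. A small, self-contained sketch of the same split with exclusive bounds (BatchSplit and split are hypothetical names, with strings standing in for the creator IDs):

```java
import java.util.Arrays;

// Sketch: split an array into numThreads contiguous batches. copyOfRange's
// "to" argument is exclusive, and the last batch absorbs the remainder.
class BatchSplit {

    static String[][] split(String[] items, int numThreads) {
        int batchSize = items.length / numThreads;
        String[][] batches = new String[numThreads][];
        for (int i = 0; i < numThreads; i++) {
            int from = i * batchSize;
            int to = (i == numThreads - 1) ? items.length : (i + 1) * batchSize;  // exclusive
            batches[i] = Arrays.copyOfRange(items, from, to);
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c", "d", "e", "f", "g"};
        for (String[] batch : split(ids, 3)) {
            System.out.println(Arrays.toString(batch));
        }
        // prints [a, b], then [c, d], then [e, f, g]
    }
}
```

Using items.length for the last batch's end is the same value as `(i+1)*batchSize + remainder`, so every ID lands in exactly one batch.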

Re: MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-21 Thread Ian Lea
Are you asking if your two suggestions

1) a MultiPhraseQuery or

2) a BooleanQuery made up of multiple PhraseQuery instances

are equivalent?  If so, I'd say that they could be if you build them
carefully enough.  For the specific examples you show I'd say not and
would wonder if you get correct hits, particularly for your
MultiPhraseQuery which looks wrong to me, based on my reading of the
javadoc.  But I haven't tried or tested your code - I assume you have.
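Building the BooleanQuery "carefully enough" to match the same documents means one SHOULD PhraseQuery clause per combination of alternative terms, i.e. the cartesian product of the MultiPhraseQuery's positions. A minimal sketch of that expansion, with plain strings standing in for Lucene Terms (PhraseExpansion and expand are hypothetical names); each inner list would become one PhraseQuery built with add(new Term("body", ...)):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: expand multi-phrase positions (each a list of alternative terms)
// into every concrete phrase. Strings stand in for Lucene Terms.
class PhraseExpansion {

    static List<List<String>> expand(List<List<String>> positions) {
        List<List<String>> phrases = new ArrayList<>();
        phrases.add(new ArrayList<String>());            // one empty phrase to extend
        for (List<String> alternatives : positions) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : phrases) {
                for (String term : alternatives) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(term);
                    next.add(extended);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<List<String>> positions = Arrays.asList(
                Arrays.asList("blueberry"),
                Arrays.asList("chocolate"),
                Arrays.asList("pie", "tart"));
        for (List<String> phrase : expand(positions)) {
            System.out.println(String.join(" ", phrase));  // one PhraseQuery clause each
        }
    }
}
```

The clause count is the product of the number of alternatives at each position, so it can grow quickly. Matching equivalence also assumes the same field, positions and slop. Scores are not guaranteed to be identical, since the BooleanQuery scores each PhraseQuery clause separately and combines the results.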


If you are asking something else, please explain more clearly.

--
Ian.

