Re: SOLR-792 (hierarchical faceting) issue when only 1 document should be present in the pivot

2010-11-24 Thread Nicolas Peeters
Hi Solr-Users,

I realized that I can get the behaviour that I expect if I put
facet.pivot.mincount to 0. However, I'm still puzzled why this needs to be 0
and not 1. There's one occurrence of this document, isn't there?
With this value to 1, the print out of the pivot looks like this (where you
clearly see (1) for "Value_that_cant_be_matched"):

PIVOT: level1_loc_s,level2_loc_s,level3_loc_s
level1_loc_s=Greater London (8)
  level2_loc_s=London (5)
    level3_loc_s=Mayfair (3)
    level3_loc_s=Hammersmith (2)
  level2_loc_s=Greenwich (3)
    level3_loc_s=Greenwich Centre (2)
    level3_loc_s=Value_that_cant_be_matched (1)
level1_loc_s=Groot Amsterdam (5)
  level2_loc_s=Amsterdam (3)
    level3_loc_s=Jordaan (2)
    level3_loc_s=Centrum (1)
  level2_loc_s=Amstelveen (2)
    level3_loc_s=Centrum (2)
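
To make the question concrete, here is a minimal plain-Java sketch (no Solr involved) that counts the same hierarchy and filters it in two ways. It is only a guess at what might be going on: one hypothesis that fits both printouts in this thread is that the count is compared strictly (count > mincount) rather than inclusively (count >= mincount). The class PivotSketch, its methods, and the "inclusive" switch are all hypothetical names made up for illustration, and the document counts are taken from the printed pivot rather than from the full test data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PivotSketch {

    // Field names of the three pivot levels, as used in the test.
    static final String[] FIELDS = {"level1_loc_s", "level2_loc_s", "level3_loc_s"};

    /** One node of the pivot tree: a document count plus child values. */
    static final class Node {
        int count;
        final Map<String, Node> children = new LinkedHashMap<>();
    }

    /** Counts documents along each level1/level2/level3 path. */
    static Node buildTree(List<String[]> docs) {
        Node root = new Node();
        for (String[] path : docs) {
            Node node = root;
            node.count++;
            for (String value : path) {
                node = node.children.computeIfAbsent(value, k -> new Node());
                node.count++;
            }
        }
        return root;
    }

    /**
     * Renders the tree, keeping only values that pass the mincount filter.
     * inclusive=true keeps count >= minCount; inclusive=false keeps
     * count > minCount (the ASSUMED strict variant, not confirmed Solr code).
     */
    static List<String> render(Node node, int depth, int minCount, boolean inclusive) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            int count = e.getValue().count;
            boolean keep = inclusive ? count >= minCount : count > minCount;
            if (!keep) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(FIELDS[depth]).append('=').append(e.getKey())
              .append(" (").append(count).append(')');
            lines.add(sb.toString());
            lines.addAll(render(e.getValue(), depth + 1, minCount, inclusive));
        }
        return lines;
    }

    /** Adds the same location path several times, once per document. */
    static void add(List<String[]> docs, int copies, String... path) {
        for (int i = 0; i < copies; i++) {
            docs.add(path);
        }
    }

    /** Documents reconstructed from the counts in the printed pivot. */
    static List<String[]> sampleDocs() {
        List<String[]> docs = new ArrayList<>();
        add(docs, 3, "Greater London", "London", "Mayfair");
        add(docs, 2, "Greater London", "London", "Hammersmith");
        add(docs, 2, "Greater London", "Greenwich", "Greenwich Centre");
        add(docs, 1, "Greater London", "Greenwich", "Value_that_cant_be_matched");
        add(docs, 2, "Groot Amsterdam", "Amsterdam", "Jordaan");
        add(docs, 1, "Groot Amsterdam", "Amsterdam", "Centrum");
        add(docs, 2, "Groot Amsterdam", "Amstelveen", "Centrum");
        return docs;
    }

    public static void main(String[] args) {
        Node tree = buildTree(sampleDocs());
        System.out.println("Inclusive filter, mincount=1:");
        render(tree, 0, 1, true).forEach(System.out::println);
        System.out.println("Strict filter, mincount=1:");
        render(tree, 0, 1, false).forEach(System.out::println);
    }
}
```

With the strict variant and a mincount of 1, the two values that occur only once (Value_that_cant_be_matched under Greenwich and Centrum under Amsterdam) drop out, and with a mincount of 0 they come back. Whether Solr actually filters this way is not something this sketch can confirm.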

Any expert advice on why this is the case is more than welcome!

Best regards,

Nicolas

On Wed, Nov 24, 2010 at 2:27 PM, Nicolas Peeters wrote:

> Hi Solr Community,
>
> I've been experimenting with Solr 4.0 (trunk) in order to test the SOLR-792
> feature. I have written a test that shows what I'm trying to ask. Basically,
> I'm creating a hierarchy of the area/city/neighbourhood. The problem that I
> see is that for documents that have only 1 item in a particular hierarchy
> (e.g. Greater London/Greenwich/Centre, which I've called
> "Value_that_cant_be_matched" in this example), these are not found by
> the pivot facet. If I add a second one, then it works. I'm puzzled why this
> is the case.
>
> This is the result of the System.out that prints out the pivot facet fields
> hierarchy (see line 86):
>
> PIVOT: level1_loc_s,level2_loc_s,level3_loc_s
> level1_loc_s=Greater London (8)
>   level2_loc_s=London (5)
>     level3_loc_s=Mayfair (3)
>     level3_loc_s=Hammersmith (2)
>   level2_loc_s=Greenwich (3)
>     level3_loc_s=Greenwich Centre (2)
>     //--> why isn't there a "level3_loc_s=Value_that_cant_be_matched (1)" here?
> level1_loc_s=Groot Amsterdam (5)
>   level2_loc_s=Amsterdam (3)
>     level3_loc_s=Jordaan (2)
>   level2_loc_s=Amstelveen (2)
>     level3_loc_s=Centrum (2)
>
>
> How can I make sure that Solr would find in the tree the single document
> when I facet on this "location" hierarchy?
>
> Thank you very much for your help.
>
> Nicolas
>
> import java.io.IOException;
> import java.net.MalformedURLException;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
> import org.apache.solr.client.solrj.response.PivotField;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.common.SolrInputDocument;
> import org.apache.solr.common.util.NamedList;
> import org.junit.Assert;
> import org.junit.Before;
> import org.junit.Test;
>
> /**
>  * This is a test for hierarchical faceting based on SOLR-792 (I basically
>  * just checked out the trunk of Solr 4.0).
>  *
>  * Unit test that shows the particular behaviour that I'm experiencing.
>  * I would have expected that the doc (see line 95) with
>  * "Value_that_cant_be_matched" as level3_loc_s would appear in the pivot.
>  * It seems that you actually need at least 2!
>  *
>  * @author npeeters
>  */
> public class HierarchicalPivotTest {
>
>     CommonsHttpSolrServer server;
>
>     @Before
>     public void setup() throws MalformedURLException {
>         // the instance can be reused
>         this.server = new CommonsHttpSolrServer("http://localhost:8983/solr");
>         this.server.setSoTimeout(500); // socket read timeout
>         this.server.setConnectionTimeout(100);
>         this.server.setDefaultMaxConnectionsPerHost(100);
>         this.server.setMaxTotalConnections(100);
>         this.server.setFollowRedirects(false); // defaults to false
>         // allowCompression defaults to false.
>     }
>
>     protected List createHierarchicalOrgData() {
>         int id = 1;
>         List docs = new ArrayList();
>         docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
>                 "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amsterdam",
>                 "level3_loc_s", "Centrum"));
>         docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
>                 "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amsterdam",
>                 "level3_loc_s", &qu

SOLR-792 (hierarchical faceting) issue when only 1 document should be present in the pivot

2010-11-24 Thread Nicolas Peeters
Hi Solr Community,

I've been experimenting with Solr 4.0 (trunk) in order to test the SOLR-792
feature. I have written a test that shows what I'm trying to ask. Basically,
I'm creating a hierarchy of the area/city/neighbourhood. The problem that I
see is that for documents that have only 1 item in a particular hierarchy
(e.g. Greater London/Greenwich/Centre, which I've called
"Value_that_cant_be_matched" in this example), these are not found by
the pivot facet. If I add a second one, then it works. I'm puzzled why this
is the case.

This is the result of the System.out that prints out the pivot facet fields
hierarchy (see line 86):

PIVOT: level1_loc_s,level2_loc_s,level3_loc_s
level1_loc_s=Greater London (8)
  level2_loc_s=London (5)
    level3_loc_s=Mayfair (3)
    level3_loc_s=Hammersmith (2)
  level2_loc_s=Greenwich (3)
    level3_loc_s=Greenwich Centre (2)
    //--> why isn't there a "level3_loc_s=Value_that_cant_be_matched (1)" here?
level1_loc_s=Groot Amsterdam (5)
  level2_loc_s=Amsterdam (3)
    level3_loc_s=Jordaan (2)
  level2_loc_s=Amstelveen (2)
    level3_loc_s=Centrum (2)


How can I make sure that Solr would find in the tree the single document
when I facet on this "location" hierarchy?

Thank you very much for your help.

Nicolas

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.PivotField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

/**
 * This is a test for hierarchical faceting based on SOLR-792 (I basically
 * just checked out the trunk of Solr 4.0).
 *
 * Unit test that shows the particular behaviour that I'm experiencing.
 * I would have expected that the doc (see line 95) with
 * "Value_that_cant_be_matched" as level3_loc_s would appear in the pivot.
 * It seems that you actually need at least 2!
 *
 * @author npeeters
 */
public class HierarchicalPivotTest {

    CommonsHttpSolrServer server;

    @Before
    public void setup() throws MalformedURLException {
        // the instance can be reused
        this.server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        this.server.setSoTimeout(500); // socket read timeout
        this.server.setConnectionTimeout(100);
        this.server.setDefaultMaxConnectionsPerHost(100);
        this.server.setMaxTotalConnections(100);
        this.server.setFollowRedirects(false); // defaults to false
        // allowCompression defaults to false.
    }

    protected List createHierarchicalOrgData() {
        int id = 1;
        List docs = new ArrayList();
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amsterdam",
                "level3_loc_s", "Centrum"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amsterdam",
                "level3_loc_s", "Jordaan"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amsterdam",
                "level3_loc_s", "Jordaan"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amstelveen",
                "level3_loc_s", "Centrum"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Groot Amsterdam", "level2_loc_s", "Amstelveen",
                "level3_loc_s", "Centrum"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "London",
                "level3_loc_s", "Hammersmith"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "London",
                "level3_loc_s", "Hammersmith"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "London",
                "level3_loc_s", "Mayfair"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "London",
                "level3_loc_s", "Mayfair"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "London",
                "level3_loc_s", "Mayfair"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "Greenwich",
                "level3_loc_s", "Value_that_cant_be_matched"));
        docs.add(makeTestDoc("id", id++, "name", "Organization " + id,
                "level1_loc_s", "Greater London", "level2_loc_s", "Greenwich",
                "level3_loc_s", "Greenwich Centre"));
        docs.add(makeTe