[ 
https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lakshmi Venkataswamy updated SOLR-4824:
---------------------------------------

    Description: 
While upgrading from Solr 3.6 to 4.2/4.3 and comparing results on fuzzy queries,
I found that once a certain number of documents had been ingested, the fuzzy
query returned drastically fewer results.  We ingest approximately 18,000
documents per day; after approximately 40 days of documents have been ingested,
adding the next day of documents causes the same fuzzy search to return far
fewer results.

The query :  
http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort

produces the following result before the threshold is crossed

<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2349</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str>
<str name="facet.sort"/><str name="q">cc:worde~1</str>
<str name="facet.field">date</str></lst></lst>
<result name="response" numFound="362803" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">2866</int>
<int name="2013-01-01">11372</int>
<int name="2013-01-02">11514</int>
<int name="2013-01-03">12015</int>
<int name="2013-01-04">11746</int>
<int name="2013-01-05">10853</int>
<int name="2013-01-06">11053</int>
<int name="2013-01-07">11815</int>
<int name="2013-01-08">11427</int>
<int name="2013-01-09">11475</int>
<int name="2013-01-10">11461</int>
<int name="2013-01-11">12058</int>
<int name="2013-01-12">11335</int>
<int name="2013-01-13">12039</int>
<int name="2013-01-14">12064</int>
<int name="2013-01-15">12234</int>
<int name="2013-01-16">12545</int>
<int name="2013-01-17">11766</int>
<int name="2013-01-18">12197</int>
<int name="2013-01-19">11414</int>
<int name="2013-01-20">11633</int>
<int name="2013-01-21">12863</int>
<int name="2013-01-22">12378</int>
<int name="2013-01-23">11947</int>
<int name="2013-01-24">11822</int>
<int name="2013-01-25">11882</int>
<int name="2013-01-26">10474</int>
<int name="2013-01-27">11051</int>
<int name="2013-01-28">11776</int>
<int name="2013-01-29">11957</int>
<int name="2013-01-30">11260</int>
<int name="2013-01-31">8511</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

Once the threshold of roughly 40 days of ingested documents is crossed, the
results drop as shown below for the same query

<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str>
<str name="facet.sort"/><str name="q">cc:worde~1</str>
<str name="facet.field">date</str></lst></lst>
<result name="response" numFound="1338" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">0</int>
<int name="2013-01-01">41</int>
<int name="2013-01-02">21</int>
<int name="2013-01-03">24</int>
<int name="2013-01-04">19</int>
<int name="2013-01-05">9</int>
<int name="2013-01-06">11</int>
<int name="2013-01-07">17</int>
<int name="2013-01-08">14</int>
<int name="2013-01-09">24</int>
<int name="2013-01-10">43</int>
<int name="2013-01-11">14</int>
<int name="2013-01-12">52</int>
<int name="2013-01-13">57</int>
<int name="2013-01-14">25</int>
<int name="2013-01-15">17</int>
<int name="2013-01-16">34</int>
<int name="2013-01-17">11</int>
<int name="2013-01-18">16</int>
<int name="2013-01-19">121</int>
<int name="2013-01-20">33</int>
<int name="2013-01-21">26</int>
<int name="2013-01-22">59</int>
<int name="2013-01-23">27</int>
<int name="2013-01-24">10</int>
<int name="2013-01-25">9</int>
<int name="2013-01-26">6</int>
<int name="2013-01-27">16</int>
<int name="2013-01-28">11</int>
<int name="2013-01-29">15</int>
<int name="2013-01-30">21</int>
<int name="2013-01-31">109</int>
<int name="2013-02-01">11</int>
<int name="2013-02-02">7</int>
<int name="2013-02-03">10</int>
<int name="2013-02-04">8</int>
<int name="2013-02-05">13</int>
<int name="2013-02-06">75</int>
<int name="2013-02-07">77</int>
<int name="2013-02-08">31</int>
<int name="2013-02-09">35</int>
<int name="2013-02-10">22</int>
<int name="2013-02-11">18</int>
<int name="2013-02-12">11</int>
<int name="2013-02-13">68</int>
<int name="2013-02-14">40</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

I have also tested this with different months of data and have seen the same
issue at around the same number of documents.
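
For anyone who wants to compare the two responses mechanically, here is a
minimal sketch in Python (standard library only). It assumes the responses are
saved as XML strings in the format shown above; the helper names summarize and
dropped are hypothetical and not part of Solr, and the two sample strings are
trimmed excerpts of the responses above (two facet rows each), not full output.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpts of the two responses in this report (two facet rows each).
BEFORE = """<response>
<result name="response" numFound="362803" start="0"></result>
<lst name="facet_counts"><lst name="facet_fields"><lst name="date">
<int name="2012-12-31">2866</int>
<int name="2013-01-01">11372</int>
</lst></lst></lst></response>"""

AFTER = """<response>
<result name="response" numFound="1338" start="0"></result>
<lst name="facet_counts"><lst name="facet_fields"><lst name="date">
<int name="2012-12-31">0</int>
<int name="2013-01-01">41</int>
</lst></lst></lst></response>"""


def summarize(response_xml, field="date"):
    """Return (numFound, {facet value: count}) for one Solr XML response."""
    root = ET.fromstring(response_xml)
    num_found = int(root.find("result[@name='response']").get("numFound"))
    path = (
        "lst[@name='facet_counts']/lst[@name='facet_fields']"
        "/lst[@name='%s']/int" % field
    )
    facets = {item.get("name"): int(item.text) for item in root.findall(path)}
    return num_found, facets


def dropped(before_xml, after_xml, field="date"):
    """List (facet value, before, after) for values whose count decreased."""
    _, before = summarize(before_xml, field)
    _, after = summarize(after_xml, field)
    return [
        (key, before[key], after[key])
        for key in sorted(before)
        if key in after and after[key] < before[key]
    ]


if __name__ == "__main__":
    print("numFound before:", summarize(BEFORE)[0])
    print("numFound after: ", summarize(AFTER)[0])
    for key, old, new in dropped(BEFORE, AFTER):
        print(key, old, "->", new)
```

Running the same comparison after each day's ingestion should make it easier to
pin down the document count at which the drop first appears (by the figures
above, somewhere around 18,000 x 40 = 720,000 documents).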

  was:
In upgrading from SOLR 3.6 to 4.2/4.3 I and comparing results on fuzzy queries, 
I found that after a certain number of documents were ingested the fuzzy query 
has drastically lower number of results.  We have approximately 18,000 
documents per day and after ingesting approximately 40 days of documents, the 
next incremental day of documents results in a lower number of results of a 
fuzzy search.

The query :  
http://10.100.1.48:8080/solr/coreTV3/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort

produces the following result before the threshold is crossed

<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2349</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str>
<str name="facet.sort"/><str name="q">cc:worde~1</str>
<str name="facet.field">date</str></lst></lst>
<result name="response" numFound="362803" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">2866</int>
<int name="2013-01-01">11372</int>
<int name="2013-01-02">11514</int>
<int name="2013-01-03">12015</int>
<int name="2013-01-04">11746</int>
<int name="2013-01-05">10853</int>
<int name="2013-01-06">11053</int>
<int name="2013-01-07">11815</int>
<int name="2013-01-08">11427</int>
<int name="2013-01-09">11475</int>
<int name="2013-01-10">11461</int>
<int name="2013-01-11">12058</int>
<int name="2013-01-12">11335</int>
<int name="2013-01-13">12039</int>
<int name="2013-01-14">12064</int>
<int name="2013-01-15">12234</int>
<int name="2013-01-16">12545</int>
<int name="2013-01-17">11766</int>
<int name="2013-01-18">12197</int>
<int name="2013-01-19">11414</int>
<int name="2013-01-20">11633</int>
<int name="2013-01-21">12863</int>
<int name="2013-01-22">12378</int>
<int name="2013-01-23">11947</int>
<int name="2013-01-24">11822</int>
<int name="2013-01-25">11882</int>
<int name="2013-01-26">10474</int>
<int name="2013-01-27">11051</int>
<int name="2013-01-28">11776</int>
<int name="2013-01-29">11957</int>
<int name="2013-01-30">11260</int>
<int name="2013-01-31">8511</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

Once the 40 days of documents ingested threshold is crossed the results drop as 
show below for the same query

<response><lst name="responseHeader">
<int name="status">0</int><int name="QTime">2</int>
<lst name="params"><str name="facet">on</str><str name="fl">date</str>
<str name="facet.sort"/><str name="q">cc:worde~1</str>
<str name="facet.field">date</str></lst></lst>
<result name="response" numFound="1338" start="0"></result>
<lst name="facet_counts"><lst name="facet_queries"/>
<lst name="facet_fields"><lst name="date">
<int name="2012-12-31">0</int>
<int name="2013-01-01">41</int>
<int name="2013-01-02">21</int>
<int name="2013-01-03">24</int>
<int name="2013-01-04">19</int>
<int name="2013-01-05">9</int>
<int name="2013-01-06">11</int>
<int name="2013-01-07">17</int>
<int name="2013-01-08">14</int>
<int name="2013-01-09">24</int>
<int name="2013-01-10">43</int>
<int name="2013-01-11">14</int>
<int name="2013-01-12">52</int>
<int name="2013-01-13">57</int>
<int name="2013-01-14">25</int>
<int name="2013-01-15">17</int>
<int name="2013-01-16">34</int>
<int name="2013-01-17">11</int>
<int name="2013-01-18">16</int>
<int name="2013-01-19">121</int>
<int name="2013-01-20">33</int>
<int name="2013-01-21">26</int>
<int name="2013-01-22">59</int>
<int name="2013-01-23">27</int>
<int name="2013-01-24">10</int>
<int name="2013-01-25">9</int>
<int name="2013-01-26">6</int>
<int name="2013-01-27">16</int>
<int name="2013-01-28">11</int>
<int name="2013-01-29">15</int>
<int name="2013-01-30">21</int>
<int name="2013-01-31">109</int>
<int name="2013-02-01">11</int>
<int name="2013-02-02">7</int>
<int name="2013-02-03">10</int>
<int name="2013-02-04">8</int>
<int name="2013-02-05">13</int>
<int name="2013-02-06">75</int>
<int name="2013-02-07">77</int>
<int name="2013-02-08">31</int>
<int name="2013-02-09">35</int>
<int name="2013-02-10">22</int>
<int name="2013-02-11">18</int>
<int name="2013-02-12">11</int>
<int name="2013-02-13">68</int>
<int name="2013-02-14">40</int>
</lst></lst><lst name="facet_dates"/><lst name="facet_ranges"/></lst></response>

I have also tested this with different months of data and have seen the same 
issue  around the number of documents.

    
> Fuzzy / Faceting results are changed after ingestion of documents past a 
> certain number 
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-4824
>                 URL: https://issues.apache.org/jira/browse/SOLR-4824
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.2, 4.3
>         Environment: Ubuntu 12.04 LTS 12.04.2 
> jre1.7.0_17
> jboss-as-7.1.1.Final
>            Reporter: Lakshmi Venkataswamy

