[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2015-01-07 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267497#comment-14267497
 ] 

Elran Dvir commented on SOLR-1782:
--

Hi Patanachai,

I am using your patch for stats.facet for multivalue fields above Solr 4.8.
It works perfectly for most cases.
I found a case in which it doesn't work. When the field we facet on is a 
numeric field but is not multivalue.
The code fails on:
if (topLevelSortedValues == null) {
topLevelSortedValues = FieldCache.DEFAULT.getTermsIndex(topLevelReader, 
name);
and this the exception I get:
(SolrException.java:120) - null:java.lang.IllegalStateException: Type 
mismatch:time was indexed as NUMERIC
at 
org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1161)
at 
org.apache.lucene.search.FieldCacheImpl.getTermsIndex(FieldCacheImpl.java:1145)
at 
org.apache.solr.handler.component.FieldFacetStats.facetTermNum(FieldFacetStats.java:152)
at 
org.apache.solr.request.UnInvertedField.getStats(UnInvertedField.java:587)
at 
org.apache.solr.handler.component.SimpleStats.getStatsFields(StatsComponent.java:514)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:64)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1953)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:960)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1021)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:804)

So which field cache should I be using for numeric field?

Thanks.
 



> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: SOLR-1782.2.patch, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2013-04-10.patch,

[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2013-11-19 Thread Steven Bower (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13826558#comment-13826558
 ] 

Steven Bower commented on SOLR-1782:


As much of this functionality is replaced by the Analytics component I don't 
intent to work much more on this patch. 

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: SOLR-1782.2.patch, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2013-04-10.patch, SOLR-1782.patch, SOLR-1782.patch, 
> SOLR-1782.patch, SOLR-1782.test.patch, index.rar
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2013-11-07 Thread Nelson Gonzalez Gonzalez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816993#comment-13816993
 ] 

Nelson Gonzalez Gonzalez commented on SOLR-1782:


Hi,

Is there any patch to apply on solr version 4.4 or 4.5 to fix this issue?

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: SOLR-1782.2.patch, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2013-04-10.patch, SOLR-1782.patch, SOLR-1782.patch, 
> SOLR-1782.patch, SOLR-1782.test.patch, index.rar
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2013-05-09 Thread Elran Dvir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653039#comment-13653039
 ] 

Elran Dvir commented on SOLR-1782:
--

Hi,

I tried to apply the latest patch to 4.2.1 and it doesn't seem to be compatible.
Do you plan a patch compatible to this version?

Thanks.

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: index.rar, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2013-04-10.patch, SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.patch, SOLR-1782.patch, SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2013-01-29 Thread David Christianson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565220#comment-13565220
 ] 

David Christianson commented on SOLR-1782:
--

Thanks for looking at this! I'll look at the tests in the unified patch.

Two questions, then my understanding of what's going on in the patch.
  
 1. Pure ignorance of the codebase prevents me from understanding perf issues 
with these objects - just that UnInvertedField got used for statistics 
collection with multivalued facets - how is it possible to get the term values 
without the DocTermsIndex?
 2. Should DocTermsIndex be used at all?

My understanding:

I can see there is definitely a code dup issue (or at least a lot of branching 
logic) not only in the patch but in the current code.

 1. the original StatsComponent used a DocTermsIndex to look up facet values to 
be accumulated using DTI.getOrd().
 2. In the patch UnInvertedField is used in FieldFacetStats to deal with 
multivalued facet fields, which mainly included modifying UnInvertedField to 
provide the getTermNumbers method that listed term ordinals. As an aside, I 
don't understand whether this is reasonable to do or not, why there wouldn't 
already be a way to get at all the values of a field.
 3. Within FieldFacetStats the if (null == uif) is added. The code on both 
branches is more or less duplicate, only the multivalued branch (null != uif) 
iterates over multiple ordinals using the UnInverted Field whereas the single 
valued branch only reads one ordingal. But both branches use DocTermsIndex. 
 4. UIF.getStats is used in getStatsInfo in the current code base. It looks to 
me it is specifically used when the value being computed is multivalued and in 
the patch gets modified to handle the case of a multivalued facet field 
accumulating stats on a multivalued value field.
 5. There is another facet accumulation method in FieldFacetStats used in the 
current code base when accumulating stats over multivalued value fields. This 
winds up getting the if (null == uif) treatment as well. So code duplication is 
multiplied
 6. Even though there is this code dup in the current code base wrt 
UIF.getStats it still seems like FieldFacetStats is the core - maybe there's a 
way to fix it all?



> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: index.rar, SOLR-1782.2013-01-07.patch, 
> SOLR-1782.2.patch, SOLR-1782.patch, SOLR-1782.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2012-12-16 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533572#comment-13533572
 ] 

Mark Miller commented on SOLR-1782:
---

Assigned to hossman in case he has forgotten this issue. He can unassign 
himself if he doesnt have time.

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
>Assignee: Hoss Man
> Attachments: index.rar, SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.patch, SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2011-04-13 Thread David Christianson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019600#comment-13019600
 ] 

David Christianson commented on SOLR-1782:
--

Is a fix anticipated? Already in the next version which we should just 
download? Other issues?

We are currently running 1.4.1 (955469) and are seeing this problem for some 
applications. The patch given does not apply out of the box to 1.4.1 (955469) 
without a few tweaks but so far appears to work.

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch, index.rar
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2011-01-04 Thread Johannes Goll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977359#action_12977359
 ] 

Johannes Goll commented on SOLR-1782:
-

I have applied patch SOLR-1782.2.patch to the latest trunk version (revision 
1055053). I have loaded test data and compared the stats results with 1.4.1 
release version results. The new stat facet counts and sum match the expected 
counts for multi-valued fields. 

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar, SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2011-01-03 Thread Johannes Goll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976902#action_12976902
 ] 

Johannes Goll commented on SOLR-1782:
-

Wojtek and Hoss Man - are you planning to release the changes as part of 1.4.2  
or 1.5 ?
 

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar, SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2010-11-29 Thread Anarkii (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964706#action_12964706
 ] 

Anarkii commented on SOLR-1782:
---

Is there any update on this issue? Or if Wojtek's fix would be merged into the 
trunk? I'm trying to do a stats.facet on a field which contains multiple 
tokens, and have the same issue resulting out of getStringIndex. Say, I'm doing 
a stats.facet on a field "tags", which can contain multiple tokens...only one 
of the tags is being considered.

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar, SOLR-1782.2.patch, SOLR-1782.patch, 
> SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2010-06-11 Thread Wojtek Piaseczny (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877865#action_12877865
 ] 

Wojtek Piaseczny commented on SOLR-1782:


I'd like to contribute to solving this issue, but I'm not sure if I'm going 
down the right path. Here are the possible solutions I see:

1. Use UninvertedField for multi-valued facets in the StatsComponent. This 
would require a new method in UninvertedField: something like getValues(int 
docID). The problem with this is the big terms collection in UninvertedField... 
getting all values for a single document via big terms is expensive (have to 
iterate entire collection). 
2. Get facet values for the result set in the StatsComponent, then iterate 
through each value and get a new document set for each value, then iterate 
through each document in this new set and calculate stats. Sounds expensive.

Are there better options? 

> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar, SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2010-05-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866012#action_12866012
 ] 

Hoss Man commented on SOLR-1782:


Crux of the problem is in FieldFacetStats's dependency on using StringIndex ... 
at least two different code paths use this in the same (broken) way: 
Stats.Component.getFieldCacheStats and UnInvertedField.getStats


> stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued 
> fields
> -
>
> Key: SOLR-1782
> URL: https://issues.apache.org/jira/browse/SOLR-1782
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.4
> Environment: reproduced on Win2k3 using 1.5.0-dev solr ($Id: 
> CHANGES.txt 906924 2010-02-05 12:43:11Z noble $)
>Reporter: Gerald DeConto
> Attachments: index.rar, SOLR-1782.test.patch
>
>
> the StatsComponent assumes any field specified in the stats.facet param can 
> be faceted using FieldCache.DEFAULT.getStringIndex.  This can cause problems 
> with a variety of field types, but in the case of multivalued fields it can 
> either cause erroneous false stats when the number of distinct values is 
> small, or it can cause ArrayIndexOutOfBoundsException when the number of 
> distinct values is greater then the number of documents.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org