Re: update Lucene

2009-05-30 Thread Otis Gospodnetic

Clearly I meant "...along with *Lucene* jars" :)

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Otis Gospodnetic 
> To: solr-dev@lucene.apache.org
> Sent: Wednesday, May 27, 2009 11:59:18 PM
> Subject: Re: update Lucene
> 
> 
> I wonder if it would be useful to commit Lucene's CHANGES.txt into Solr along 
> with Solr jars.  It would then be very easy to tell what changed in Lucene 
> since 
> the version Solr has and the current version of Lucene (or some newer
> released version, if we happened to be behind).
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
> > From: Yonik Seeley 
> > To: solr-dev@lucene.apache.org
> > Sent: Wednesday, May 27, 2009 4:58:39 PM
> > Subject: update Lucene
> > 
> > I think we should upgrade Lucene again since the index file format has 
> changed:
> > https://issues.apache.org/jira/browse/LUCENE-1654
> > 
> > This also contains a fix for unifying the FieldCache and
> > ExtendedFieldCache instances.
> > 
> > $ svn diff -r r776177 CHANGES.txt
> > Index: CHANGES.txt
> > ===================================================================
> > --- CHANGES.txt (revision 776177)
> > +++ CHANGES.txt (working copy)
> > @@ -27,7 +27,11 @@
> >  implement Searchable or extend Searcher, you should change your
> >  code to implement this method.  If you already extend
> >  IndexSearcher, no further changes are needed to use Collector.
> > -(Shai Erera via Mike McCandless)
> > +
> > +Finally, the values Float.NaN, Float.NEGATIVE_INFINITY and
> > +Float.POSITIVE_INFINITY are not valid scores.  Lucene uses these
> > +values internally in certain places, so if you have hits with such
> > +scores it will cause problems. (Shai Erera via Mike McCandless)
> > 
> > Changes in runtime behavior
> > 
> > @@ -107,10 +111,10 @@
> > that's visited.  All core collectors now use this API.  (Mark
> > Miller, Mike McCandless)
> > 
> > -8. LUCENE-1546: Add IndexReader.flush(String commitUserData), allowing
> > -   you to record an opaque commitUserData into the commit written by
> > -   IndexReader.  This matches IndexWriter's commit methods.  (Jason
> > -   Rutherglen via Mike McCandless)
> > +8. LUCENE-1546: Add IndexReader.flush(Map commitUserData), allowing
> > +   you to record an opaque commitUserData (maps String -> String) into
> > +   the commit written by IndexReader.  This matches IndexWriter's
> > +   commit methods.  (Jason Rutherglen via Mike McCandless)
> > 
> > 9. LUCENE-652: Added org.apache.lucene.document.CompressionTools, to
> > enable compressing & decompressing binary content, external to
> > @@ -135,6 +139,9 @@
> >  not make sense for all subclasses of MultiTermQuery. Check individual
> >  subclasses to see if they support #getTerm().  (Mark Miller)
> > 
> > +14. LUCENE-1636: Make TokenFilter.input final so it's set only
> > +    once. (Wouter Heijke, Uwe Schindler via Mike McCandless).
> > +
> > Bug fixes
> > 
> > 1. LUCENE-1415: MultiPhraseQuery has incorrect hashCode() and equals()
> > @@ -176,6 +183,9 @@
> > sort) by doc Id in a consistent manner (i.e., if Sort.FIELD_DOC
> > was used vs.
> > when it wasn't). (Shai Erera via Michael McCandless)
> > 
> > +10. LUCENE-1647: Fix case where IndexReader.undeleteAll would cause
> > +the segment's deletion count to be incorrect. (Mike McCandless)
> > +
> >   New features
> > 
> >   1. LUCENE-1411: Added expert API to open an IndexWriter on a prior
> > @@ -186,10 +196,11 @@
> >  when building transactional support on top of Lucene.  (Mike
> >  McCandless)
> > 
> > - 2. LUCENE-1382: Add an optional arbitrary String "commitUserData" to
> > -IndexWriter.commit(), which is stored in the segments file and is
> > -then retrievable via IndexReader.getCommitUserData instance and
> > -static methods.  (Shalin Shekhar Mangar via Mike McCandless)
> > + 2. LUCENE-1382: Add an optional arbitrary Map (String -> String)
> > +"commitUserData" to IndexWriter.commit(), which is stored in the
> > +segments file and is then retrievable via
> > +IndexReader.getCommitUserData instance and static methods.
> > +(Shalin Shekhar Mangar via Mike McCandless)
> > 
> >   3. LUCENE-1406: Added Arabic analyzer.  (Robert Muir via Grant Ingersoll)
> > 
> > @@ -311,6 +322,10 @@
> > 25. LUCENE-1634: Add calibrateSizeByDeletes to LogMergePolicy, to take
> >  deletions into account when considering merges.  (Yasuhiro Matsuda
> >  via Mike McCandless)
> > +
> > +26. LUCENE-1550: Added new n-gram based String distance measure for
> > +    spell checking.  See the Javadocs for NGramDistance.java for a
> > +    reference paper on why this is helpful (Tom Morton via Grant Ingersoll)
> > +
> > 
> > Optimizations
> > 
> > 
> > -Yonik
> > http://www.lucidimagination.com



Re: Streaming Docs, Terms, TermVectors

2009-05-30 Thread Yonik Seeley
On a single server, Solr already does streaming of returned
documents... the stored fields of selected docs are retrieved one at a
time as they are written to the socket.  The servlet container already
handles sending out chunked encoding for large responses too.

-Yonik
http://www.lucidimagination.com
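Consuming such a streamed response on the client side follows the same pattern: read and process hits as they arrive instead of buffering the whole body. A minimal sketch (the class name and fake XML payload are illustrative; a real client would read from the HTTP connection's InputStream):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch of incremental consumption of a large Solr XML response. In real use
// the InputStream would be the body of an HTTP request to /solr/select; here a
// byte array stands in so the example is self-contained.
class StreamingClientSketch {

    // Reads the stream line by line, handling each <doc> as it appears instead
    // of loading the whole response into memory.
    static int countDocs(InputStream in) throws IOException {
        int docs = 0;
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        for (String line; (line = reader.readLine()) != null; ) {
            if (line.contains("<doc>")) {
                docs++;  // a real client would hand the hit off here
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        String fake = "<response><result numFound=\"3\">\n<doc>\n<doc>\n<doc>\n</result></response>";
        System.out.println(countDocs(new ByteArrayInputStream(fake.getBytes(StandardCharsets.UTF_8))));
    }
}
```

This only helps if the client library exposes the raw stream rather than parsing the full document before returning it.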

On Sat, May 30, 2009 at 12:45 PM, Grant Ingersoll  wrote:
> Anyone have any thoughts on what is involved with streaming lots of results
> out of Solr?
>
> For instance, if I wanted to get something like 1M docs out of Solr (or
> more) via *:* query, how can I tractably do this?  Likewise, if I wanted to
> return all the terms in the index or all the Term Vectors.
>
> Obviously, it is impossible to load all of these things into memory and then
> create a response, so I was wondering if anyone had any ideas on how to
> stream them.
>
> Thanks,
> Grant
>


Re: Streaming Docs, Terms, TermVectors

2009-05-30 Thread Walter Underwood
Don't stream, request chunks of 10 or 100 at a time. It works fine and
you don't have to write or test any new code. In addition, it works
well with HTTP caches, so if two clients want to get the same data,
the second can get it from the cache.

We do that at Netflix. Each front-end box does a series of queries
to get all the movie titles, then loads them into a local index for
autocomplete.

wunder
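The chunked approach above needs nothing beyond Solr's standard start/rows paging parameters; a minimal sketch of building the per-page URLs (the host, query, and page size are placeholders):

```java
// Sketch of paging through a full result set with start/rows, per the
// suggestion above. Host, query, and page size are placeholder values.
class SolrPager {

    // Builds the select URL for one page of results.
    static String pageUrl(String base, String query, int start, int rows) {
        return base + "/select?q=" + query + "&start=" + start + "&rows=" + rows;
    }

    public static void main(String[] args) {
        int rows = 100;
        // In real use, loop until a page returns fewer than 'rows' documents.
        for (int start = 0; start < 300; start += rows) {
            System.out.println(pageUrl("http://localhost:8983/solr", "*:*", start, rows));
        }
    }
}
```

Because each page is an ordinary GET, intermediate HTTP caches can serve repeated requests, as noted above.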

On 5/30/09 11:01 AM, "Kaktu Chakarabati"  wrote:

> For a streaming-like solution, it is in fact possible to have a working
> buffer in-memory that emits chunks on an http connection which is kept alive
> by the server until the full response has been sent.
> This is quite similar, for example, to how video streaming protocols that
> operate on top of HTTP work ( cf. a more general discussion on
> http://ajaxpatterns.org/HTTP_Streaming#In_A_Blink ).
> Another (non-mutually exclusive) possibility is to introduce a novel binary
> format for the transmission of such data ( i.e. a new wt=<..> type ) over
> http (or any other comm. protocol) so that data can be more effectively
> compressed and made to better fit into memory.
> One such format which has been widely circulating and already has many open
> source projects implementing it is Adobe's AMF (
> http://osflash.org/documentation/amf ). It is however a proprietary format,
> so I'm not sure whether it can be incorporated under Apache Foundation terms.
> 
> -Chak
> 
> 
> On Sat, May 30, 2009 at 9:58 AM, Dietrich Featherston
> wrote:
> 
>> I was actually curious about the same thing.  Perhaps an endpoint reference
>> could be passed in the request where the documents can be sent
>> asynchronously, such as a jms topic.
>> 
>> solr/query?q=*:*&epr=/my/topic&eprtype=jms
>> 
>> Then we would need to consider how to break up the response, how to cancel
>> a running query, etc.
>> 
>> Is this along the lines of what you're looking for?  I would be interested
>> in looking at how the request/response contract changes and what types of
>> endpoint references would be supported.
>> 
>> Thanks,
>> D
>> 
>> On May 30, 2009, at 12:45 PM, Grant Ingersoll  wrote:
>> 
>>  Anyone have any thoughts on what is involved with streaming lots of
>>> results out of Solr?
>>> 
>>> For instance, if I wanted to get something like 1M docs out of Solr (or
>>> more) via *:* query, how can I tractably do this?  Likewise, if I wanted to
>>> return all the terms in the index or all the Term Vectors.
>>> 
>>> Obviously, it is impossible to load all of these things into memory and
>>> then create a response, so I was wondering if anyone had any ideas on how to
>>> stream them.
>>> 
>>> Thanks,
>>> Grant
>>> 
>> 



Re: Streaming Docs, Terms, TermVectors

2009-05-30 Thread Kaktu Chakarabati
For a streaming-like solution, it is in fact possible to have a working
buffer in-memory that emits chunks on an http connection which is kept alive
by the server until the full response has been sent.
This is quite similar, for example, to how video streaming protocols that
operate on top of HTTP work ( cf. a more general discussion on
http://ajaxpatterns.org/HTTP_Streaming#In_A_Blink ).
Another (non-mutually exclusive) possibility is to introduce a novel binary
format for the transmission of such data ( i.e. a new wt=<..> type ) over
http (or any other comm. protocol) so that data can be more effectively
compressed and made to better fit into memory.
One such format which has been widely circulating and already has many open
source projects implementing it is Adobe's AMF (
http://osflash.org/documentation/amf ). It is however a proprietary format,
so I'm not sure whether it can be incorporated under Apache Foundation terms.

-Chak
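The working-buffer idea in the first paragraph amounts to writing and flushing the response chunk by chunk so the servlet container can emit HTTP chunked encoding as data is produced. A minimal sketch (a StringWriter stands in for the real servlet response writer; names are illustrative):

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Sketch of chunk-at-a-time emission: write each piece and flush immediately
// rather than buffering the entire response first.
class ChunkedEmitter {

    static void emit(Writer out, Iterable<String> chunks) throws IOException {
        for (String chunk : chunks) {
            out.write(chunk);
            out.flush();  // with a real ServletResponse, this lets the container send a chunk
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        emit(out, java.util.Arrays.asList("<doc/>", "<doc/>", "<doc/>"));
        System.out.println(out.toString());
    }
}
```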


On Sat, May 30, 2009 at 9:58 AM, Dietrich Featherston 
wrote:

> I was actually curious about the same thing.  Perhaps an endpoint reference
> could be passed in the request where the documents can be sent
> asynchronously, such as a jms topic.
>
> solr/query?q=*:*&epr=/my/topic&eprtype=jms
>
> Then we would need to consider how to break up the response, how to cancel
> a running query, etc.
>
> Is this along the lines of what you're looking for?  I would be interested
> in looking at how the request/response contract changes and what types of
> endpoint references would be supported.
>
> Thanks,
> D
>
>
> On May 30, 2009, at 12:45 PM, Grant Ingersoll  wrote:
>
>  Anyone have any thoughts on what is involved with streaming lots of
>> results out of Solr?
>>
>> For instance, if I wanted to get something like 1M docs out of Solr (or
>> more) via *:* query, how can I tractably do this?  Likewise, if I wanted to
>> return all the terms in the index or all the Term Vectors.
>>
>> Obviously, it is impossible to load all of these things into memory and
>> then create a response, so I was wondering if anyone had any ideas on how to
>> stream them.
>>
>> Thanks,
>> Grant
>>
>


[jira] Commented: (SOLR-236) Field collapsing

2009-05-30 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714750#action_12714750
 ] 

Martijn van Groningen commented on SOLR-236:


I'm looking forward to hearing about your experiences with this patch, 
particularly in production. 

I think that in order to make collapsing work on multi-shard systems, the 
process method of the CollapseComponent needs to be modified.
CollapseComponent already subclasses QueryComponent (which already supports 
querying on multi-shard systems), so it should not be that difficult.
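The merge step such a distributed CollapseComponent would need might look roughly like the following pure-Java sketch (illustrative names only; none of Solr's actual shard-handling APIs are shown):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of combining per-shard collapse results (field value -> number of
// collapsed hits) into one global map. Purely illustrative.
class CollapseMerge {

    static Map<String, Integer> merge(List<Map<String, Integer>> perShard) {
        Map<String, Integer> merged = new HashMap<String, Integer>();
        for (Map<String, Integer> shard : perShard) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                Integer prev = merged.get(e.getKey());
                merged.put(e.getKey(), (prev == null ? 0 : prev) + e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<String, Integer>();
        shard1.put("example.com", 3);
        Map<String, Integer> shard2 = new HashMap<String, Integer>();
        shard2.put("example.com", 2);
        System.out.println(merge(java.util.Arrays.asList(shard1, shard2)));
    }
}
```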

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Emmanuel Keller
> Fix For: 1.5
>
> Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
> collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-solr-236-2.patch, 
> field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, 
> field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, 
> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
> field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, 
> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site are collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse.field" to choose the field used to group results
> "collapse.type" normal (default value) or adjacent
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
> Two patches:
> - "field_collapsing.patch" for current development version
> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
> P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
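The three collapse parameters described in the issue text are ordinary request parameters; a sketch of assembling such a request URL (host and values are placeholders):

```java
// Sketch of building a field-collapsing request with the three SolrParams
// described in SOLR-236. Host and parameter values are placeholders.
class CollapseQuery {

    static String url(String base, String field, String type, int max) {
        return base + "/select?q=*:*&collapse.field=" + field
                + "&collapse.type=" + type + "&collapse.max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(url("http://localhost:8983/solr", "site", "normal", 1));
    }
}
```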



Re: Streaming Docs, Terms, TermVectors

2009-05-30 Thread Dietrich Featherston
I was actually curious about the same thing.  Perhaps an endpoint  
reference could be passed in the request where the documents can be  
sent asynchronously, such as a jms topic.


solr/query?q=*:*&epr=/my/topic&eprtype=jms

Then we would need to consider how to break up the response, how to  
cancel a running query, etc.


Is this along the lines of what you're looking for?  I would be  
interested in looking at how the request/response contract changes and  
what types of endpoint references would be supported.


Thanks,
D





On May 30, 2009, at 12:45 PM, Grant Ingersoll   
wrote:


Anyone have any thoughts on what is involved with streaming lots of  
results out of Solr?


For instance, if I wanted to get something like 1M docs out of Solr  
(or more) via *:* query, how can I tractably do this?  Likewise, if  
I wanted to return all the terms in the index or all the Term Vectors.


Obviously, it is impossible to load all of these things into memory  
and then create a response, so I was wondering if anyone had any  
ideas on how to stream them.


Thanks,
Grant


Streaming Docs, Terms, TermVectors

2009-05-30 Thread Grant Ingersoll
Anyone have any thoughts on what is involved with streaming lots of  
results out of Solr?


For instance, if I wanted to get something like 1M docs out of Solr  
(or more) via *:* query, how can I tractably do this?  Likewise, if I  
wanted to return all the terms in the index or all the Term Vectors.


Obviously, it is impossible to load all of these things into memory  
and then create a response, so I was wondering if anyone had any ideas  
on how to stream them.


Thanks,
Grant


[jira] Commented: (SOLR-236) Field collapsing

2009-05-30 Thread Oleg Gnatovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714742#action_12714742
 ] 

Oleg Gnatovskiy commented on SOLR-236:
--

Hey guys, are there any plans to make field collapsing work on multi-shard 
systems?




[jira] Commented: (SOLR-236) Field collapsing

2009-05-30 Thread Thomas Traeger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714738#action_12714738
 ] 

Thomas Traeger commented on SOLR-236:
-

The problem is solved, thanks. I will use your patch for my current project, 
which is planned for go-live in 5 weeks. If I find any more issues I will 
report them here.




Re: [jira] Updated: (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit

2009-05-30 Thread Ryan McKinley
Seems ok now...


On Fri, May 29, 2009 at 7:47 PM, Mike Klaas  wrote:
> I'd like to take a look at this but JIRA seems to be down. Is anyone else
> experiencing this?
>
> -Mike
>
>
> On Wed, May 13, 2009 at 7:41 AM, Jayson Minard (JIRA) wrote:
>
>>
>>     [
>> https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>
>> Jayson Minard updated SOLR-1155:
>> 
>>
>>     Attachment: Solr-1155.patch
>>
>> Resolved the TODO for commitWithin, and updated AutoCommitTrackerTest to
>> validate the fix.
>>
>> > Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
>> > -
>> >
>> >                 Key: SOLR-1155
>> >                 URL: https://issues.apache.org/jira/browse/SOLR-1155
>> >             Project: Solr
>> >          Issue Type: Improvement
>> >          Components: search
>> >    Affects Versions: 1.3
>> >            Reporter: Jayson Minard
>> >         Attachments: Solr-1155.patch, Solr-1155.patch
>> >
>> >
>> > Currently DirectUpdateHandler2 will block adds during a commit, and it
>> seems to be possible with recent changes to Lucene to allow them to run
>> concurrently.
>> > See:
>> http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>
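The concurrency goal of the issue — letting adds proceed while a commit is underway — can be illustrated with a read/write lock, where adds share the read lock and the commit takes the exclusive lock only briefly. This is an illustrative sketch, not Solr's actual DirectUpdateHandler2 code; the counters stand in for real Lucene writer calls:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the locking idea: many add() calls run concurrently under the
// read lock; commit() excludes them only while snapshotting pending state,
// instead of blocking adds for the entire duration of a commit.
class ConcurrentUpdateSketch {
    private final ReentrantReadWriteLock commitLock = new ReentrantReadWriteLock();
    private final AtomicLong pendingAdds = new AtomicLong();
    private final AtomicLong committed = new AtomicLong();

    // Adds share the read lock, so they never block each other.
    void add(String doc) {
        commitLock.readLock().lock();
        try {
            pendingAdds.incrementAndGet();  // stand-in for writer.addDocument(doc)
        } finally {
            commitLock.readLock().unlock();
        }
    }

    // Commit holds the write lock only while flipping state, not for the
    // whole flush.
    void commit() {
        commitLock.writeLock().lock();
        try {
            committed.addAndGet(pendingAdds.getAndSet(0));
        } finally {
            commitLock.writeLock().unlock();
        }
    }

    long committedCount() { return committed.get(); }

    public static void main(String[] args) {
        ConcurrentUpdateSketch handler = new ConcurrentUpdateSketch();
        handler.add("doc1");
        handler.add("doc2");
        handler.commit();
        System.out.println(handler.committedCount());
    }
}
```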


[jira] Updated: (SOLR-236) Field collapsing

2009-05-30 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-236:
---

Attachment: field-collapse-solr-236-2.patch

Thanks for the feedback. I fixed the problem you described and added a new 
patch containing the fix.
The problem occurred when sorting was done on one or more normal fields and 
on score.






[jira] Commented: (SOLR-236) Field collapsing

2009-05-30 Thread Thomas Traeger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714676#action_12714676
 ] 

Thomas Traeger commented on SOLR-236:
-

I ran some tests with your patch and trunk (rev. 779497). It looks good so far, 
but I see occasional null pointer exceptions when using the sort parameter:

[http://localhost:8983/solr/select?q=*:*&collapse.field=manu&sort=score%20desc,alphaNameSort%20asc]

java.lang.NullPointerException
        at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:421)
        at org.apache.solr.search.CollapseFilter$DocumentComparator.compare(CollapseFilter.java:649)
        at org.apache.solr.search.CollapseFilter$DocumentPriorityQueue.lessThan(CollapseFilter.java:596)
        at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:153)
        at org.apache.solr.search.CollapseFilter.normalCollapse(CollapseFilter.java:321)
        at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:211)
        at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:67)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1328)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

These queries work as expected:
http://localhost:8983/solr/select?q=*:*&collapse.field=manu&sort=score%20desc
http://localhost:8983/solr/select?q=*:*&sort=score%20desc,alphaNameSort%20asc

