Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Hi Marc,

I'm not sure I follow you completely, but the example you gave is
not complete; a few tags are missing from it. Let's assume the
following response, which the latest patches produce:


cat


hard
1

 
008
aaa aaa
ccc
 


...



The result list contains collapse groups. The names of the child
elements are the collapse head ids. Everything that falls under a
collapse head belongs to that collapse group, so adding the head
document's id to the field value is unnecessary. In the above example,
the document with id 009 is the head of the document with id 008, and
009 is the document that should be displayed in the search result.
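A minimal sketch of that lookup on the client side, assuming the collapse_counts groups have already been parsed into a map from head id to the ids of the documents collapsed under it. The class and method names are hypothetical and not part of the patch. Inverting the map answers "which head does 008 belong to?":

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper; not part of the SOLR-236 patch.
final class CollapseHeads {

    // Inverts "head id -> collapsed doc ids" into "collapsed doc id -> head id".
    static Map<String, String> headByCollapsedId(Map<String, List<String>> groups) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            for (String collapsedId : group.getValue()) {
                result.put(collapsedId, group.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the example above: document 008 was collapsed under head 009.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("009", List.of("008"));
        System.out.println(headByCollapsedId(groups).get("008")); // prints 009
    }
}
```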

From what you have said, it seems that you configured the patch properly.

Martijn

2009/12/7 Marc Sturlese :
>
> Hey there, I have been testing the latest patch, and I think either I am
> missing something or the way the collapsed documents are shown with
> adjacent collapsing can sometimes be confusing.
> I am using the patch with CollapseComponent replacing QueryComponent (not
> using both at the same time):
>   class="org.apache.solr.handler.component.CollapseComponent">
> Here is what I noticed. Imagine you get these results in the search:
> doc1:
>   id:001
>   collapseField:ccc
> doc2:
>   id:002
>   collapseField:aaa
> doc3:
>   id:003
>   collapseField:ccc
> doc4:
>   id:004
>   collapseField:bbb
>
> And in the collapse_counts you get:
> 1
> ccc
> 
> 
> 008
> aaa aaa
> ccc
> 
> 
>
> Now, how can I know which document is the head of doc 008? It could be
> either 001 or 003. Wouldn't it make sense to connect the uniqueField with
> the collapsed documents in some way?
>
> Adding something to collapse_counts like:
> 1
> ccc
> 003
>
> I have currently hacked FieldValueCountCollapseCollectorFactory to return:
> ccc#003
> but this response looks dirty...
>
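If the hacked "value#headId" format is kept, every client has to split it again. A small sketch of that parsing, assuming the layout shown above; splitting on the last '#' keeps field values that themselves contain '#' intact, under the further assumption that the unique key never contains '#'. The class name is hypothetical:

```java
// Hypothetical parser for the "fieldValue#headId" strings produced by the
// hacked FieldValueCountCollapseCollectorFactory described above.
final class CompositeKey {

    final String fieldValue;
    final String headId;

    private CompositeKey(String fieldValue, String headId) {
        this.fieldValue = fieldValue;
        this.headId = headId;
    }

    // Splits on the LAST '#', so "c#c#003" gives value "c#c" and head "003".
    // Assumes the unique key itself never contains '#'.
    static CompositeKey parse(String raw) {
        int sep = raw.lastIndexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("no '#' separator in: " + raw);
        }
        return new CompositeKey(raw.substring(0, sep), raw.substring(sep + 1));
    }

    public static void main(String[] args) {
        CompositeKey key = parse("ccc#003");
        System.out.println(key.fieldValue + " -> " + key.headId); // prints ccc -> 003
    }
}
```

The extra assumptions this parsing needs are one argument for carrying the head id in its own element instead of inside the field value.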
> As I said, maybe I am misunderstanding something and this can be known in
> some way. In that case, can someone tell me how?
> Thanks in advance
>
>
>
>
>
>
> JIRA j...@apache.org wrote:
>>
>>
>>     [
>> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783484#action_12783484
>> ]
>>
>> Martijn van Groningen edited comment on SOLR-236 at 11/29/09 9:56 PM:
>> --
>>
>> I have attached a new patch that has the following changes:
>> # Added caching for the field collapse functionality. Check the [solr
>> wiki|http://wiki.apache.org/solr/FieldCollapsing] for how to configure
>> field-collapsing with caching.
>> # Removed the collapse.max parameter (collapse.threshold must be used
>> instead). It had been deprecated for a long time.
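The patch's own cache is configured as described on the wiki page linked above. Purely as a generic illustration of the idea, and not the patch's implementation, a bounded LRU cache keyed by the query plus the collapse parameters could look like this in plain Java. The class name and the key format are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache sketch; NOT the cache implementation used by SOLR-236.
final class CollapseResultCache<V> extends LinkedHashMap<String, V> {

    private final int maxEntries;

    CollapseResultCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxEntries;
    }

    // Hypothetical key: the query plus the collapse parameters that change the result.
    static String key(String q, String field, String type, int threshold) {
        return q + "|" + field + "|" + type + "|" + threshold;
    }

    public static void main(String[] args) {
        CollapseResultCache<String> cache = new CollapseResultCache<>(2);
        cache.put(key("aaa", "col", "adjacent", 1), "result A");
        cache.put(key("bbb", "col", "adjacent", 1), "result B");
        cache.get(key("aaa", "col", "adjacent", 1));             // touch A
        cache.put(key("ccc", "col", "adjacent", 1), "result C"); // evicts B
        System.out.println(cache.keySet()); // prints [aaa|col|adjacent|1, ccc|col|adjacent|1]
    }
}
```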
>>
>>> Field collapsing
>>> 
>>>
>>>                 Key: SOLR-236
>>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>>             Project: Solr
>>>          Issue Type: New Feature
>>>          Components: search
>>>    Affects Versions: 1.3
>>>            Reporter: Emmanuel Keller
>>>             Fix For: 1.5
>>>
>>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch,
>>> collapsing-patch-to-1.3.0-ivan.patch,
>>> collapsing-patch-to-1.3.0-ivan_2.patch,
>>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch,
>>> field-collapse-4-with-solrj.patch, field-collapse-5.patch,
>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch,
>>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch,
>>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch,
>>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff,
>>> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff,
>>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch,
>>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch,
>>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>>
>>>
>>> This patch includes a new feature called "Field collapsing".
>>> "Used in order to collapse a group of results with similar value for a
>>> given field to a single entry in the result set. Site collapsing is a
>>> special case of this, where all results for a given web site is collapse

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Yes, it should look similar to that. What is the exact request you send to Solr?
Also, to check whether the patch works correctly, can you run: ant clean test
There are a number of tests that cover the field collapse functionality.
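A rough sketch of how a client might read the head ids from such a response with the JDK's DOM parser. It assumes Solr's usual XML encoding of named lists (lst, str and int elements carrying a name attribute) and a results list whose child lst elements are named after the head ids, as described earlier in this thread. The sample XML, including the "results", "fieldValue" and "collapseCount" names, is a hypothetical reconstruction, not the patch's exact output:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class CollapseCountsReader {

    // Hypothetical sample: element layout assumed, inner key names invented.
    static final String SAMPLE =
          "<lst name=\"collapse_counts\">"
        + "<str name=\"field\">cat</str>"
        + "<lst name=\"results\">"
        + "<lst name=\"009\"><str name=\"fieldValue\">hard</str>"
        + "<int name=\"collapseCount\">1</int></lst>"
        + "</lst></lst>";

    // Returns the name attributes of the direct child lst elements of the
    // lst named "results"; an unnamed group comes back as "".
    static List<String> headIds(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse response", e);
        }
        List<String> ids = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"results".equals(lst.getAttribute("name"))) {
                continue;
            }
            NodeList children = lst.getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child instanceof Element && "lst".equals(child.getNodeName())) {
                    ids.add(((Element) child).getAttribute("name"));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(headIds(SAMPLE)); // prints [009]
    }
}
```

Because getAttribute returns an empty string for a missing attribute, a group lst without a name attribute (the symptom Marc reports below) shows up as "" here rather than as a head id.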

Martijn


2009/12/7 Marc Sturlese :
>
>>
>>   cat
>>    
>>        
>>            hard
>>           1
>>            
>>                 
>>                    008
>>                    aaa aaa
>>                    ccc
>>                 
>>            
>>        
>>        ...
>>    
>>
> I see; it looks like I am applying the patch wrongly somehow.
> This is the complete collapse_counts response I am getting:
> 
>  col
>  
>    
>      1
>      1
>      1
>      bbb
>      ccc
>      xxx
>      
>        
>          2
>          aaa aaa
>          bbb
>        
>      
>      
>        
>          8
>          aaa aaa aaa sd
>          ccc
>       
>      
>      
>        
>          12
>          aaa aaa aaa v
>          xxx
>        
>      
>    
>  
> 
>
> As you can see, I am getting an lst tag with no name. As I understood what
> you told me, I should be getting as many lst tags as collapsed groups, and
> the name attribute of each lst should be the unique field value. So, if the
> patch were applied correctly, the response should look like this:
>
> 
>  col
>  
>    1
>      bbb
>      
>        
>          2
>          aaa aaa
>          bbb
>        
>      
>    
>    
>      1
>      ccc
>      
>        
>          8
>          aaa aaa aaa sd
>          ccc
>       
>      
>    
>    
>      1
>      xxx
>      
>        
>          12
>          aaa aaa aaa v
>          xxx
>        
>      
>    
>  
> 
>
> Is this what the response looks like when you use the patch?
> Thanks in advance

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
The last two parameters are not necessary, since they both default to
true. Could you run the field collapse tests successfully?
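For illustration, here is the same request without the two defaulted parameters (collapse.info.doc and collapse.info.count), assembled with the JDK's URLEncoder. The base URL and parameter values are taken from Marc's request below; the helper class itself is a hypothetical sketch:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for building a field-collapsing select URL.
final class CollapseRequest {

    // Builds "base?k1=v1&k2=v2..." with URL-encoded keys and values, in insertion order.
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "aaa");
        params.put("version", "2.2");
        params.put("start", "0");
        params.put("rows", "20");
        params.put("indent", "on");
        params.put("collapse.field", "col");
        params.put("collapse.includeCollapsedDocs.fl", "*");
        params.put("collapse.type", "adjacent");
        // collapse.info.doc and collapse.info.count are omitted: both default to true.
        System.out.println(build("http://localhost:8983/solr/select/", params));
    }
}
```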

2009/12/7 Marc Sturlese :
>
> The request I am sending is:
> http://localhost:8983/solr/select/?q=aaa&version=2.2&start=0&rows=20&indent=on&collapse.field=col&collapse.includeCollapsedDocs.fl=*&collapse.type=adjacent&collapse.info.doc=true&collapse.info.count=true
>
> I search for 'aaa' in the content field. All the documents in the result
> contain that string in the content field.

Re: [jira] Issue Comment Edited: (SOLR-236) Field collapsing

2009-12-07 Thread Martijn v Groningen
Yes, I can reproduce the same situation here. I will update the patch
asap and add it to Jira.

Martijn

2009/12/7 Marc Sturlese :
>
> Hey! Got it working!
> The problem was that my uniqueField is indexed as a long, which is not
> supported by the patch.
> The value is obtained in the getCollapseGroupResult function in
> AbstractCollapseCollector.java as:
>
> String schemaId = searcher.doc(docId).get(uniqueIdFieldname);
>
> To support long, int, slong, sint, float, sfloat, ...
> it should be obtained by doing something like:
>
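> // Needs: java.io.IOException, org.apache.lucene.document.Fieldable,
> // org.apache.solr.schema.FieldType
> // Look up the schema type of the unique key, so that its stored value can be
> // turned back into a readable string whatever the type (long, int, slong, ...).
> FieldType idFieldType =
>     searcher.getSchema().getFieldType(uniqueIdFieldname);
> String schemaId = "";
> Fieldable idField = null;
> try {
>   // same Lucene docId as in the original line above
>   idField = searcher.doc(docId).getFieldable(uniqueIdFieldname);
> } catch (IOException ex) {
>   // deal with the exception (log it or rethrow it)
> }
> if (idField != null) {
>   // convert the stored value into its human-readable form
>   schemaId = idFieldType.storedToReadable(idField);
> }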

Re: [jira] Commented: (SOLR-236) Field collapsing

2009-12-23 Thread Martijn v Groningen
Yes, I used his patch.

2009/12/23 Noble Paul (JIRA) :
>
>    [ 
> https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794050#action_12794050
>  ]
>
> Noble Paul commented on SOLR-236:
> -
>
> Isn't the patch built on the one given by Shalin? The configuration looks
> different...
>
>> Field collapsing
>> 
>>
>>                 Key: SOLR-236
>>                 URL: https://issues.apache.org/jira/browse/SOLR-236
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.3
>>            Reporter: Emmanuel Keller
>>            Assignee: Shalin Shekhar Mangar
>>             Fix For: 1.5
>>
>>         Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
>> collapsing-patch-to-1.3.0-ivan.patch, 
>> collapsing-patch-to-1.3.0-ivan_2.patch, 
>> collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
>> field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
>> field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
>> field-collapse-5.patch, field-collapse-5.patch, 
>> field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
>> field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
>> field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
>> field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
>> quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
>> SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
>> SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
>> solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch
>>
>>
>> This patch includes a new feature called "Field collapsing".
>> "Used in order to collapse a group of results with similar value for a given 
>> field to a single entry in the result set. Site collapsing is a special case 
>> of this, where all results for a given web site is collapsed into one or two 
>> entries in the result set, typically with an associated "more documents from 
>> this site" link. See also Duplicate detection."
>> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
>> The implementation adds 3 new query parameters (SolrParams):
>> "collapse.field" to choose the field used to group results
>> "collapse.type" normal (default value) or adjacent
>> "collapse.max" to select how many continuous results are allowed before 
>> collapsing
>> TODO (in progress):
>> - More documentation (on source code)
>> - Test cases
>> Two patches:
>> - "field_collapsing.patch" for current development version
>> - "field_collapsing_1.1.0.patch" for Solr-1.1.0
>> P.S.: Feedback and misspelling correction are welcome ;-)
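A simplified sketch of the "normal" collapse type with a threshold, as I read the parameters above (collapse.max, later replaced by collapse.threshold): in rank order, the first threshold documents for each field value stay in the result and later ones are collapsed. This is an assumption-laden illustration with hypothetical names, not the patch's code. Using Marc's four documents, 003 is collapsed because 001 already has the value ccc, whereas adjacent collapsing would keep it, since 002 sits between them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "normal" collapsing with a threshold; not the patch's code.
final class NormalCollapse {

    // Keeps, in rank order, at most `threshold` documents per collapse value and
    // drops the rest. ids and values are parallel lists: values.get(i) is the
    // collapse field value of ids.get(i).
    static List<String> collapse(List<String> ids, List<String> values, int threshold) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            // merge returns the updated count for this value
            int count = seen.merge(values.get(i), 1, Integer::sum);
            if (count <= threshold) {
                kept.add(ids.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("001", "002", "003", "004");
        List<String> values = List.of("ccc", "aaa", "ccc", "bbb");
        System.out.println(collapse(ids, values, 1)); // prints [001, 002, 004]
    }
}
```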
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>



-- 
Kind regards,

Martijn van Groningen