[jira] Commented: (SOLR-236) Field collapsing

2007-05-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495334
 ] 

Ryan McKinley commented on SOLR-236:


This looks good.  Someone with better Lucene chops should look at the 
IndexSearcher getDocListAndSet part...

A few comments/questions about the interface:

If you apply all the example docs and hit:
http://localhost:8983/solr/select/?q=*:*&collapse=true

you get a 500.  We should use params.required().get( "collapse.field" ) to give 
a nicer error:
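
As an illustration of the "required parameter" idea, here is a minimal sketch in plain Java. The class RequiredParamsSketch and its getRequired method are hypothetical stand-ins written for this example; they are not Solr's SolrParams API, and how the failure maps to an HTTP status is left out. It only shows the intent: fail early with a message that names the missing parameter, rather than letting a null value surface later as a generic server error.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a request-parameter wrapper; not Solr's SolrParams.
final class RequiredParamsSketch {
    private final Map<String, String> params;

    RequiredParamsSketch(Map<String, String> params) {
        this.params = params;
    }

    /** Returns the value, or fails with a message that names the missing parameter. */
    String getRequired(String name) {
        String value = params.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required parameter: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Mirrors the failing request: collapse=true without collapse.field.
        Map<String, String> query = new HashMap<>();
        query.put("q", "*:*");
        query.put("collapse", "true");
        try {
            new RequiredParamsSketch(query).getRequired("collapse.field");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```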

With:
http://localhost:8983/solr/select/?q=*:*&collapse=true&collapse.field=manu&collapse.max=1

the collapse info at the bottom says:


 3
 5
 9


what does that mean?  How would you use it?  How does it relate to the

> Field collapsing
> 
>
> Key: SOLR-236
> URL: https://issues.apache.org/jira/browse/SOLR-236
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.2
> Reporter: Emmanuel Keller
> Attachments: collapse_field.patch, collapse_field.patch
>
>
> This patch includes a new feature called "Field collapsing".
> "Used in order to collapse a group of results with similar value for a given 
> field to a single entry in the result set. Site collapsing is a special case 
> of this, where all results for a given web site is collapsed into one or two 
> entries in the result set, typically with an associated "more documents from 
> this site" link. See also Duplicate detection."
> http://www.fastsearch.com/glossary.aspx?m=48&amid=299
> The implementation adds 3 new query parameters (SolrParams):
> "collapse" set to true to enable collapsing.
> "collapse.field" to choose the field used to group results
> "collapse.max" to select how many continuous results are allowed before 
> collapsing
> TODO (in progress):
> - More documentation (on source code)
> - Test cases
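
One way to read the "collapse.max" description above is as a limit on runs of adjacent results that share the same field value. The sketch below illustrates that reading only; it is not taken from the attached patch. The class and method names are hypothetical, and results are represented simply by their collapse-field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of adjacent collapsing; not the patch's implementation.
final class AdjacentCollapseSketch {
    /** Keeps at most max consecutive entries that share the same field value. */
    static List<String> collapse(List<String> fieldValues, int max) {
        List<String> kept = new ArrayList<>();
        int run = 0; // length of the current run of equal values
        for (int i = 0; i < fieldValues.size(); i++) {
            String value = fieldValues.get(i);
            boolean sameAsPrevious = i > 0 && Objects.equals(value, fieldValues.get(i - 1));
            run = sameAsPrevious ? run + 1 : 1;
            if (run <= max) {
                kept.add(value); // still within the allowed run length
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> manu = Arrays.asList("dell", "dell", "dell", "apple", "dell");
        System.out.println(collapse(manu, 1)); // prints [dell, apple, dell]
    }
}
```

Note that under this reading a value can reappear later in the list once a different value has interrupted the run.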

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-237) Field collapsing

2007-05-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495332
 ] 

Ryan McKinley commented on SOLR-237:


Looks like *I* missed something...  yes, SOLR-236 applies to trunk fine.  I 
didn't notice it because I was looking at this issue.

Since any further development/integration should happen on SOLR-236, I think we 
should close this issue and mark it as a duplicate.  

I'll put my substantive comments on SOLR-236...

> Field collapsing
> 
>
> Key: SOLR-237
> URL: https://issues.apache.org/jira/browse/SOLR-237
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.1.0
> Reporter: Emmanuel Keller
> Attachments: field_collapsing-1.1.patch, field_collapsing_1.1.0.patch



Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Chris Hostetter

: > Incidentally, PatternTokenizerFactory seems to have the annoying limitation
: > of assuming there is a token prior to each match -- even if the match
: > explicitly matches on the start of the string (so it creates a 0 width
: > token) ... that seems like a bug right?

: how would you change it?  I don't know regex well enough to see the
: limitation.  My only criterion was that the output is the same as if you
: send it to string.split( pattern );

Ahhh yes, I see ... if you are trying to mimic String.split (or
Pattern.split) then the current behavior is correct.  My thinking was that
if you were trying to use this to tokenize on whitespace (or something
like that) and your input was "  aaa bbb   ccc  " ... this would create 4
tokens: a zero-width token, followed by tokens for aaa, bbb, and ccc ...
but that first token seemed like a mistake to me (or if it's not a
mistake, then it seemed like there should also be a zero-width token
at the end after the last space too) ... but that's the way string
splitting works too.
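
The splitting behavior described here can be checked with a small standalone Java program. This is only a sketch of plain Pattern.split on the example input; the class name SplitDemo and the whitespace pattern "\\s+" are choices made for this example, and the code does not involve PatternTokenizerFactory itself.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitDemo {
    // Splits on runs of whitespace, the same way String.split(regex) would.
    static String[] splitOnWhitespace(String input) {
        return Pattern.compile("\\s+").split(input);
    }

    public static void main(String[] args) {
        String[] tokens = splitOnWhitespace("  aaa bbb   ccc  ");
        // A leading empty string is kept because the input starts with a match;
        // trailing empty strings are dropped by the default split limit.
        System.out.println(tokens.length + " " + Arrays.toString(tokens));
        // prints: 4 [, aaa, bbb, ccc]
    }
}
```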


-Hoss



[jira] Commented: (SOLR-237) Field collapsing

2007-05-12 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495292
 ] 

Emmanuel Keller commented on SOLR-237:
--

I provided two patches.

The first was made against the current trunk (see SOLR-236), and this one 
(SOLR-237) is for release 1.1.

Is that correct?  Or did I miss something?







Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Ryan McKinley

Chris Hostetter wrote:

> : After 1/2 hour of regex hacking... I think I'll stick with a two step
> : process: split then trim ;)
>
> But regex hacking is FUN!!
>
> I'm 99% certain this does what you want...

yup!  that does it.  thanks

> ..if it doesn't send me an example string that it fails on and tell me
> what the desired output is.
>
> Incidentally, PatternTokenizerFactory seems to have the annoying limitation
> of assuming there is a token prior to each match -- even if the match
> explicitly matches on the start of the string (so it creates a 0 width
> token) ... that seems like a bug right?

how would you change it?  I don't know regex well enough to see the 
limitation.  My only criterion was that the output is the same as if you 
send it to string.split( pattern );

thanks again
ryan


[jira] Commented: (SOLR-237) Field collapsing

2007-05-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495285
 ] 

Ryan McKinley commented on SOLR-237:


can you make a patch off:
http://svn.apache.org/repos/asf/lucene/solr/trunk/

thanks




[jira] Updated: (SOLR-237) Field collapsing

2007-05-12 Thread Emmanuel Keller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emmanuel Keller updated SOLR-237:
-

Attachment: field_collapsing-1.1.patch

Patch from http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.1




[jira] Commented: (SOLR-237) Field collapsing

2007-05-12 Thread Emmanuel Keller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495244
 ] 

Emmanuel Keller commented on SOLR-237:
--

Ryan,

I used the following svn path:
http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.1.0
Last changed revision: 489774

Should I use this one?
http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.1
Last changed revision: 488066

Thanks for your reply
Emmanuel.




Re: [jira] Commented: (SOLR-234) TrimFilter should update the start and end offsets

2007-05-12 Thread Chris Hostetter

: After 1/2 hour of regex hacking... I think I'll stick with a two step
: process: split then trim ;)

But regex hacking is FUN!!

I'm 99% certain this does what you want...



[jira] Commented: (SOLR-237) Field collapsing

2007-05-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495234
 ] 

Ryan McKinley commented on SOLR-237:


Thanks for posting this Emmanuel!

I'm having trouble applying the patch...  I get an error that says something 
like "this patch seems outdated!"

Did you build it with a recent svn checkout?

thanks
ryan
