Re: IDF in Distributed Search

2008-04-11 Thread Walter Underwood
Global IDF does not require another request/response.
It is nearly free if you return the right info.

Return the total number of docs and the df in the original
response. Sum the doc counts and dfs, recompute the idf,
and re-rank.
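
A minimal sketch of that merge step, assuming each shard's response carries
its document count and the df of every query term. The names GlobalIdf and
ShardStats are hypothetical and not part of Solr; the idf formula is the one
from Lucene's default similarity, 1 + ln(numDocs / (df + 1)).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalIdf {
    /** Hypothetical per-shard statistics returned with the normal results. */
    public static final class ShardStats {
        final long numDocs;
        final Map<String, Long> docFreqs;

        public ShardStats(long numDocs, Map<String, Long> docFreqs) {
            this.numDocs = numDocs;
            this.docFreqs = docFreqs;
        }
    }

    /** Sum of the document counts over all shards. */
    public static long totalDocs(List<ShardStats> shards) {
        long total = 0;
        for (ShardStats s : shards) {
            total += s.numDocs;
        }
        return total;
    }

    /** Sum of the per-term document frequencies over all shards. */
    public static Map<String, Long> totalDocFreqs(List<ShardStats> shards) {
        Map<String, Long> totals = new HashMap<>();
        for (ShardStats s : shards) {
            for (Map.Entry<String, Long> e : s.docFreqs.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    /** idf as computed by Lucene's default similarity. */
    public static double idf(long df, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (df + 1));
    }

    /** Global idf per term, recomputed from the summed statistics. */
    public static Map<String, Double> globalIdf(List<ShardStats> shards) {
        long numDocs = totalDocs(shards);
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Long> e : totalDocFreqs(shards).entrySet()) {
            result.put(e.getKey(), idf(e.getValue(), numDocs));
        }
        return result;
    }
}
```

The sketch stops at the global idf; how the merged scores are recomputed from
it depends on what else each shard returns for its hits.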

See this post for an efficient way to do it:

  
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html

This works best if you treat the results from each server as
a queue and refill just that queue when it is exhausted. All the
good results might be from one server.
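
A rough sketch of the queue-per-server merge, assuming scores are already
comparable across servers (for example after the idf has been recomputed).
The Shard interface, its fetch(start, rows) method, and the in-memory
ListShard stand-in are hypothetical, not Solr APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class QueueMerge {
    /** A scored hit. */
    public static final class Hit {
        public final String id;
        public final double score;

        public Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Hypothetical shard client: up to `rows` hits from `start`, best first. */
    public interface Shard {
        List<Hit> fetch(int start, int rows);
    }

    /** In-memory stand-in for a remote shard; counts the requests it receives. */
    public static final class ListShard implements Shard {
        private final List<Hit> hits;
        public int requests = 0;

        public ListShard(List<Hit> hits) {
            this.hits = hits;
        }

        @Override
        public List<Hit> fetch(int start, int rows) {
            requests++;
            int from = Math.min(start, hits.size());
            int to = Math.min(start + rows, hits.size());
            return new ArrayList<>(hits.subList(from, to));
        }
    }

    /** Merges the top n hits, refilling only the queue that has run dry. */
    public static List<Hit> topN(List<? extends Shard> shards, int n, int pageSize) {
        int k = shards.size();
        List<Deque<Hit>> queues = new ArrayList<>();
        int[] fetched = new int[k];
        boolean[] exhausted = new boolean[k];
        for (int i = 0; i < k; i++) {
            queues.add(new ArrayDeque<>());
        }

        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n) {
            int best = -1;
            for (int i = 0; i < k; i++) {
                Deque<Hit> q = queues.get(i);
                if (q.isEmpty() && !exhausted[i]) {
                    // Refill just this shard's queue with its next page.
                    List<Hit> page = shards.get(i).fetch(fetched[i], pageSize);
                    fetched[i] += page.size();
                    if (page.size() < pageSize) {
                        exhausted[i] = true;
                    }
                    q.addAll(page);
                }
                if (!q.isEmpty()
                        && (best < 0 || q.peek().score > queues.get(best).peek().score)) {
                    best = i;
                }
            }
            if (best < 0) {
                break; // every shard is exhausted
            }
            merged.add(queues.get(best).poll());
        }
        return merged;
    }
}
```

If all of the top hits live on one server, only that server sees a second
request; the others are asked once.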

wunder

On 4/11/08 8:50 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 11, 2008 at 11:39 PM, Otis Gospodnetic
> <[EMAIL PROTECTED]> wrote:
>>  So, I'd like to see what it would take to add distributed IDF info to Solr's
>> distributed search.
>>  Here are some questions to get the discussion going:
>>  - Is anyone already working on it?
>>  - Does anyone plan on working on it in the very near future?
>>  - Does anyone already have thoughts how and where dist. idf could be plugged
>> in?
>>  - There is a mention of dist idf and performance cost up there - any idea
>> how costly dist idf would be?
> 
> It's relatively easy to implement, but the performance cost is not
> negligible since it adds another search "phase" (another
> request-response).  It should be optional of course (globalidf=true),
> so there is no reason not to add this feature.
> 
> I also left room for this stage (ResponseBuilder.STAGE_PARSE_QUERY),
> which is ordered before query execution.
> 
> -Yonik



Re: IDF in Distributed Search

2008-04-11 Thread Yonik Seeley
On Fri, Apr 11, 2008 at 11:39 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
>  So, I'd like to see what it would take to add distributed IDF info to Solr's 
> distributed search.
>  Here are some questions to get the discussion going:
>  - Is anyone already working on it?
>  - Does anyone plan on working on it in the very near future?
>  - Does anyone already have thoughts how and where dist. idf could be plugged 
> in?
>  - There is a mention of dist idf and performance cost up there - any idea
> how costly dist idf would be?

It's relatively easy to implement, but the performance cost is not
negligible since it adds another search "phase" (another
request-response).  It should be optional of course (globalidf=true),
so there is no reason not to add this feature.

I also left room for this stage (ResponseBuilder.STAGE_PARSE_QUERY),
which is ordered before query execution.

-Yonik


IDF in Distributed Search

2008-04-11 Thread Otis Gospodnetic
Hi,

With a well-mixed distributed set of indices, not having distributed/global IDF
won't hurt much.
But what if one has a not-so-well-mixed set of shards?  One might want to
apply rules when assigning documents to shards in order to group certain types
of documents into only a subset of all shards instead of having them spread
across all shards.  Such careful sharding might allow the searcher to be
smarter about which shards to search based on the query, the client running the
query, etc.

Thus, I've run through comments on SOLR-303 to see what has been said about 
distributed IDF.
Here is what I extracted:

"## I'm not quite sure about GlobalCollectionStat. Is its purpose just to 
normalize weights from the shards?"
  
"It's to make a distributed search score the same as it would if everything was 
in a single index.
 idf (inverse document frequency) is part of the scoring, so that component 
essentially does a distributed idf."

"...distributed idf... this has a performance cost, and should matter little in 
a well mixed index."


So, I'd like to see what it would take to add distributed IDF info to Solr's 
distributed search.
Here are some questions to get the discussion going:
- Is anyone already working on it?
- Does anyone plan on working on it in the very near future?
- Does anyone already have thoughts how and where dist. idf could be plugged in?
- There is a mention of dist idf and performance cost up there - any idea how 
costly dist idf would be?

Thanks,
Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





[jira] Updated: (SOLR-486) Support binary formats for QueryresponseWriter

2008-04-11 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-486:
--

Attachment: SOLR-486.patch

Revised patch that switches distributed search to use the binary format.
Currently fails the distributed search tests though.

> Support binary formats for QueryresponseWriter
> --
>
> Key: SOLR-486
> URL: https://issues.apache.org/jira/browse/SOLR-486
> Project: Solr
> Issue Type: Improvement
> Components: clients - java, search
> Reporter: Noble Paul
> Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
> SOLR-486.patch, SOLR-486.patch
>
>
> QueryResponse writer only allows text data to be written.
> So it is not possible to implement a binary protocol. Create another
> interface which has a method
> write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-516) Add hl.alternateFieldLen parameter, to set max length for hl.alternateField

2008-04-11 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas reassigned SOLR-516:
---

Assignee: Mike Klaas

> Add hl.alternateFieldLen parameter, to set max length for hl.alternateField
> ---
>
> Key: SOLR-516
> URL: https://issues.apache.org/jira/browse/SOLR-516
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Reporter: Koji Sekiguchi
> Assignee: Mike Klaas
> Priority: Trivial
> Attachments: SOLR-516-solr-ruby.patch, SOLR-516.patch, SOLR-516.patch
>
>
> USE CASE:
> You have a document composed of a (short) title field and a (long) body field,
> and you want the body to be highlighted.
> To avoid an empty highlighted body field, you can use the
> hl.alternateField parameter.
> Although you would like to set f.body.hl.alternateField=body, you may instead
> set f.body.hl.alternateField=title, because response time is awful when the
> body values are big. But the title field gives users less information than
> the body field.
> In this case, you can use f.body.hl.alternateFieldLen=100 to limit the body
> length to 100 characters.




[jira] Resolved: (SOLR-516) Add hl.alternateFieldLen parameter, to set max length for hl.alternateField

2008-04-11 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-516.
-

Resolution: Fixed

> Add hl.alternateFieldLen parameter, to set max length for hl.alternateField
> ---
>
> Key: SOLR-516
> URL: https://issues.apache.org/jira/browse/SOLR-516
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Reporter: Koji Sekiguchi
> Assignee: Mike Klaas
> Priority: Trivial
> Attachments: SOLR-516-solr-ruby.patch, SOLR-516.patch, SOLR-516.patch
>
>
> USE CASE:
> You have a document composed of a (short) title field and a (long) body field,
> and you want the body to be highlighted.
> To avoid an empty highlighted body field, you can use the
> hl.alternateField parameter.
> Although you would like to set f.body.hl.alternateField=body, you may instead
> set f.body.hl.alternateField=title, because response time is awful when the
> body values are big. But the title field gives users less information than
> the body field.
> In this case, you can use f.body.hl.alternateFieldLen=100 to limit the body
> length to 100 characters.




[jira] Updated: (SOLR-330) Use new Lucene Token APIs (reuse and char[] buff)

2008-04-11 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-330:
--

Attachment: token_filter.patch

Attaching token_filter.patch, a minor update to the synonym filter and
WordDelimiterFilter (WDF) to prevent extra token creation.

> Use new Lucene Token APIs (reuse and char[] buff)
> -
>
> Key: SOLR-330
> URL: https://issues.apache.org/jira/browse/SOLR-330
> Project: Solr
> Issue Type: Improvement
> Reporter: Yonik Seeley
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: SOLR-330.patch, SOLR-330.patch, token_filter.patch
>
>
> Lucene is getting new Token APIs for better performance.
> - token reuse
> - char[] offset + len instead of String
> Requires a new version of Lucene.




Re: Automatic binding of results to Beans (for solrj)

2008-04-11 Thread Ryan McKinley

honestly have not looked at it in ages ;)

make a patch and i'll check it over.  I imagine it is pretty good...

ryan


On Apr 10, 2008, at 12:07 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

hi Ryan ,
I can raise an issue and provide a patch. Is the proposed API fine or
you wish it to be altered?
--Noble

On Thu, Apr 10, 2008 at 3:03 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
yes, in an early version of solrj, I had an annotation -> SolrDocument
implementation.  It also had a hibernate connection inspired by compass
(http://www.compass-project.org/) -- it got tossed in an effort to simplify
what got committed.

check: http://solrstuff.org/svn/solrj-hibernate/ for an OLD version that
won't compile with the current version, but may be a good place to look


ryan






On Apr 9, 2008, at 2:49 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:



We can use annotations to bind SolrDocument to Java beans directly.
This can make the usage a bit simpler.

The QueryResponse class in solrj can have an extra method as follows:

public <T> List<T> getResultBeans(Class<T> klass)

and the bean can have annotations as follows:

class MyBean {
  @Field("id") // name is optional
  String id;

  @Field("category")
  List categories;
}
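
To make the idea concrete, here is a minimal sketch of how such a binding
could work with reflection. Everything in it is an assumption for
illustration: the nested @Field annotation, the BeanBinder class, the use of
a plain Map in place of SolrDocument, the public bean fields, and the
List<String> type for categories. It is not the solrj implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BeanBinder {
    /** Hypothetical annotation; an empty value means "use the Java field name". */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Field {
        String value() default "";
    }

    /** Example bean, modelled on the proposal above. */
    public static class MyBean {
        @Field("id")
        public String id;

        @Field("category")
        public List<String> categories;
    }

    /** Copies annotated fields from one document (a plain Map here) into a new bean. */
    public static <T> T bind(Map<String, Object> doc, Class<T> klass) {
        try {
            T bean = klass.getDeclaredConstructor().newInstance();
            for (java.lang.reflect.Field f : klass.getDeclaredFields()) {
                Field ann = f.getAnnotation(Field.class);
                if (ann == null) {
                    continue; // not a bound field
                }
                String name = ann.value().isEmpty() ? f.getName() : ann.value();
                Object value = doc.get(name);
                if (value != null) {
                    f.setAccessible(true);
                    f.set(bean, value);
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot bind " + klass.getName(), e);
        }
    }

    /** Binds every document in the result list to a bean of the given class. */
    public static <T> List<T> getResultBeans(List<Map<String, Object>> docs, Class<T> klass) {
        List<T> beans = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            beans.add(bind(doc, klass));
        }
        return beans;
    }
}
```

The sketch does no type conversion, so a stored value must already match the
type of the bean field it is assigned to.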
--
--Noble Paul








--
--Noble Paul




Re: [jira] Commented: (SOLR-516) Add hl.alternateFieldLen parameter, to set max length for hl.alternateField

2008-04-11 Thread Koji Sekiguchi

I opened SOLR-537 for solr-ruby as Hoss suggested.

Thank you,

Koji

Chris Hostetter wrote:

: I have zero familiarity with the ruby side of Solr, so I will leave the issue
: open for the ruby client patch to be reviewed and applied.

since client work and server work are parallel but distinct, I would 
suggest cloning the issue so the ruby work can be tracked separately.


it makes the issue statuses and CHANGES.txt more reflective of reality.




-Hoss


  




[jira] Updated: (SOLR-537) Use hl.maxAlternateFieldLength parameter from solr-ruby

2008-04-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-537:


Attachment: SOLR-537.patch

a patch to use hl.maxAlternateFieldLength parameter from solr-ruby

> Use hl.maxAlternateFieldLength parameter from solr-ruby
> ---
>
> Key: SOLR-537
> URL: https://issues.apache.org/jira/browse/SOLR-537
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Affects Versions: 1.3
> Reporter: Koji Sekiguchi
> Priority: Trivial
> Attachments: SOLR-537.patch
>
>





[jira] Created: (SOLR-537) Use hl.maxAlternateFieldLength parameter from solr-ruby

2008-04-11 Thread Koji Sekiguchi (JIRA)
Use hl.maxAlternateFieldLength parameter from solr-ruby
---

 Key: SOLR-537
 URL: https://issues.apache.org/jira/browse/SOLR-537
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: SOLR-537.patch


