[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-07-07 Thread Brian Whitman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Whitman updated SOLR-303:
---

Attachment: shards.start_rows.patch

Attaching a patch that adds optional shards.start and shards.rows parameters. 
If set, they override distributed search's intelligence on setting start and 
rows per shard. If you set shards.start=10 and shards.rows=10, each shard will 
be queried with start=10 and rows=10 and you'll get back up to N*10 results, 
where N is the number of shards (set rows on the main query to get it all).
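
To illustrate the override described above, here is a minimal sketch of how the per-shard start and rows values could be resolved. The default shown (start=0, rows=start+rows on each shard) is an assumption about how distributed search sizes shard requests, and the class and method names are hypothetical, not taken from the patch.

```java
// Hypothetical helper; not part of shards.start_rows.patch.
final class ShardWindowSketch {
    final int start;
    final int rows;

    ShardWindowSketch(int start, int rows) {
        this.start = start;
        this.rows = rows;
    }

    // shardsStart / shardsRows are null when the optional parameters are absent.
    static ShardWindowSketch resolve(int start, int rows, Integer shardsStart, Integer shardsRows) {
        // Assumed default: every shard returns its top (start + rows) docs from
        // offset 0, so the coordinator can merge and then apply the real offset.
        int s = (shardsStart != null) ? shardsStart : 0;
        int r = (shardsRows != null) ? shardsRows : start + rows;
        return new ShardWindowSketch(s, r);
    }
}
```

With start=10 and rows=10 on the main query and no overrides, this sketch asks each shard for start=0, rows=20. With shards.start=10 and shards.rows=10 set, each shard is asked for exactly that window.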

[Not a java developer, my patch works but may violate good taste/style]

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Fix For: 1.3

 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch, shards.start_rows.patch, shards_qt.patch, 
 solr-dist-faceting-non-ascii-all.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-05-21 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-303:
--

Fix Version/s: 1.3

marking as intended for 1.3 ... i'm not overly familiar with the state of this 
issue, but i do know that large chunks of functionality have already been 
committed, so i want to make sure that before 1.3 is released someone 
consciously decides between:
   * DONE ...resolving this issue
   * NOT DONE BUT OK ... leaving the issue unresolved and removing the 1.3 
designation
   * NOT DONE AND NOT OK ... rolling back any/all committed code that is 
considered detrimental for the 1.3 release.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Fix For: 1.3

 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-05-20 Thread Lars Kotthoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Kotthoff updated SOLR-303:
---

Attachment: solr-dist-faceting-non-ascii-all.patch

I've had a couple of issues with the current version. First, the facet queries 
which are sent to the other shards are posted in the URL, but aren't URL 
encoded, i.e. during the refine stage anything non-ascii results in facet 
counts for new values (i.e. the garbled version) coming back and causing NPEs 
when trying to update the counts.
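
The encoding problem above can be sketched as follows. This is only an illustration using java.net.URLEncoder from the JDK, not the code in the attached patch; the class and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; shows form-encoding a parameter before it goes into a shard URL.
final class ShardParamEncodingSketch {
    // Encodes one name=value pair as application/x-www-form-urlencoded UTF-8.
    static String encodeParam(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Without this step a non-ASCII facet value reaches the shard as different bytes than the coordinator sent, which would explain the "new" garbled values coming back during refinement.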

Furthermore, facet.limit=negative value isn't working as expected, i.e. 
instead of all facets it returns none. Also facet.sort is not automatically 
enabled for negative values.

I've attached solr-dist-faceting-non-ascii-all.patch which fixes the above 
issues. Somebody who understands what everything is supposed to do should have 
a look over it though :)
For example I've found two linked hash maps in FacetInfo, topFacets and 
listFacets, which seem to serve the same purpose. Therefore I replaced them by 
a single hash map. It seems to work just fine this way.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch, shards_qt.patch, solr-dist-faceting-non-ascii-all.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-03-18 Thread Jayson Minard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayson Minard updated SOLR-303:
---

Attachment: distributed_facet_count_bugfix.patch

Attached patch to fix issue with distributed search.  If you specified a 
facet.field that was valid for the schema but not contained in a shard, an 
unintentional exception (array index out of bounds) would be thrown instead of 
returning the facet as empty.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_facet_count_bugfix.patch, 
 distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-03-18 Thread Jayson Minard (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jayson Minard updated SOLR-303:
---

Attachment: distributed_add_tests_for_intended_behavior.patch

A few more tests to show intended behavior when facets differ between shards 
which is likely in the wild (missing from all but valid in schema, missing from 
some, and invalid field not in schema).  The last test  is just to ensure error 
behavior matches non-distributed searches.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_add_tests_for_intended_behavior.patch, 
 distributed_facet_count_bugfix.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-25 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

New patch:
  - test framework using multiple embedded jetty servers that adds documents to 
multiple servers, and also to a control server, then executes both distributed 
and non-distributed queries and compares the results.
  - fixed merging for non-string uniqueKeyFields
  - fixed issue when id field was not selected by client
  - break facet count ties by label
  - added rudimentary duplicate detection in case one accidentally adds the 
same doc to different shards
  - add code to handle index changes between query phases (docs may no longer 
exist)
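
As an illustration of the "break facet count ties by label" item, here is a small sketch of the ordering. The class name and data shapes are assumptions for the example and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; orders merged facet counts deterministically.
final class FacetTieBreakSketch {
    // Returns facet labels ordered by count (highest first), ties broken by label.
    static List<String> order(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            labels.add(e.getKey());
        }
        return labels;
    }
}
```

A deterministic tie-break matters for the test framework described above: without it, the distributed and control results could differ only in the order of equal counts.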

Given that most of this is new functionality, I think things are in good enough 
shape to commit now (making it much easier for others to generate patches 
against it).

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

updated patch:
- refactored some distributed search code to make things easier (added 
modifyRequest, etc)
- added merging of debugging info (including timing info, via 
generic recursive merging)
- merge explain info, drops internal id from explain key for easier merging
- Many small changes: don't return scores if they aren't requested (even if 
needed for shard requests to merge), return maxScore
  if scores are requested, enable escaping for shards parameter.
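
One way to picture the "generic recursive merging" of timing info is the sketch below. It assumes the debug section is a tree of nested maps with numeric leaves that are summed; that structure, and the names used, are assumptions for illustration only.

```java
import java.util.Map;

// Hypothetical helper; merges one shard's debug tree into an accumulated tree.
final class DebugMergeSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> into, Map<String, Object> from) {
        for (Map.Entry<String, Object> e : from.entrySet()) {
            Object existing = into.get(e.getKey());
            Object incoming = e.getValue();
            if (existing instanceof Map && incoming instanceof Map) {
                // Nested section: recurse.
                merge((Map<String, Object>) existing, (Map<String, Object>) incoming);
            } else if (existing instanceof Number && incoming instanceof Number) {
                // Numeric leaf (e.g. elapsed time): add the values.
                double sum = ((Number) existing).doubleValue() + ((Number) incoming).doubleValue();
                into.put(e.getKey(), sum);
            } else if (existing == null) {
                // Key seen for the first time: take the incoming value as-is.
                into.put(e.getKey(), incoming);
            }
            // Any other combination keeps the existing value (arbitrary choice here).
        }
        return into;
    }
}
```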

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed_pjaol.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-17 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

New patch attached... last one had an unfinished change that prevented 
compilation (using the generic SolrResponse instead of SolrQueryResponse).

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, 
 distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-02-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

Updated patch:
- facet refinement requests piggyback on the requests to retrieve stored fields 
where possible.
- fixed bug when requesting scores... don't include scores even if requested if 
they are not in the given DocList
- fixed HTTP error codes for query parse errors
- added double/long support in sorting since we've upgraded to lucene 2.3, and 
changed aggregate numFound to handle long
- escape/unescape comma separated ids string using backslash escaping (used 
to specify docs from each shard to retrieve)
- other misc cleanups
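
The backslash escaping of the comma separated ids string could look roughly like the sketch below. It is a standalone illustration with made-up names, not the helper used in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; joins and splits ids so that commas inside an id survive.
final class IdListEscapingSketch {
    // Joins ids with ',' and prefixes any ',' or '\' inside an id with '\'.
    static String join(List<String> ids) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(',');
            for (char c : ids.get(i).toCharArray()) {
                if (c == ',' || c == '\\') sb.append('\\');
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Splits on unescaped ',' and removes the escaping backslashes.
    static List<String> split(String joined) {
        List<String> ids = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            if (c == '\\' && i + 1 < joined.length()) {
                current.append(joined.charAt(++i)); // take the escaped char literally
            } else if (c == ',') {
                ids.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        ids.add(current.toString());
        return ids;
    }
}
```

Note that an empty list does not round-trip in this sketch (it comes back as a single empty id), so a caller would need to handle that case separately.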

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-29 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

This update adds parallel requests.
  - a singleton communications thread pool (executor) is added... currently 
static, but it should be *per core* and have a way of shutting down.
 - a singleton HttpClient for use by all SolrServer instances, currently 
static, probably fine to remain so (unless there needs to be core specific 
config?)
 - an exception causes everything to be aborted
 - all requests in a phase are sent out in parallel
 - a completion service is used for grabbing completed requests, so the first 
requests back can start being processed.
 - while receiving responses, if any new requests are put on the outgoing 
queue, they are immediately sent out before waiting for any further responses.
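
The parallel phase described above can be sketched with the standard java.util.concurrent classes. The shard call is a stand-in (no HTTP is done), the names are hypothetical, and unlike the patch this sketch creates a pool per call rather than using a shared static executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of sending all requests in a phase in parallel.
final class ParallelShardSketch {
    // Stand-in for an HTTP request to one shard.
    static String queryShard(String shard) {
        return "response from " + shard;
    }

    static List<String> queryAll(List<String> shards) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            CompletionService<String> completion = new ExecutorCompletionService<>(pool);
            for (String shard : shards) {
                completion.submit(() -> queryShard(shard)); // all requests go out at once
            }
            List<String> responses = new ArrayList<>();
            for (int i = 0; i < shards.size(); i++) {
                // take() hands back whichever request finished first, so early
                // responses can be processed while others are still in flight.
                responses.add(completion.take().get());
            }
            return responses;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // One failed request aborts the whole phase.
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```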


 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, distributed.patch, 
 distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-15 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated SOLR-303:
-

Attachment: distributed_pjaol.patch

Hey Yonik
Needed to make a couple of updates to ShardDoc as the nested outer classes were 
preventing me from using the patch.
Also included SOLR-457 with this patch, with a multi-threaded implementation of 
solrj to query the shards.

P

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, 
 distributed_pjaol.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-10 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

New patch attached... this one implements count tiebreaking by index order (to 
match the non-distributed faceting).

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, distributed.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-09 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

OK, this version patches cleanly and includes some distributed faceting code.
- facet.query and facet.field sorted by count is mostly handled
- breaking ties by natural (index) sort order is not yet implemented
- date faceting and unsorted (index order) facet.field is not implemented

Assuming the user asks for the top 10 terms of a field:
1) The first facet queries piggyback on the queries to get the top ids and sort 
field values.
2) counts are merged, and new refinement requests are sent out for those 
terms in the top 10 where a count was not received from some shards.  Also, for 
terms below the top 10, we calculate the maximum it could have based on shards 
we have not heard from, and if that boosts it into the top 10, we include that 
term for refinement.
3) refinement responses are used to adjust the counts, and we are done.

Note that it is theoretically possible to miss terms.  A term could be just 
below the threshold of each shard (and thus not returned by any shard), but the 
total count could boost it in the top.  This could be rectified by retrieving 
*all* terms above a specified count, but it could be expensive.  The counts 
that are currently returned are exact.
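
Step 2 above can be sketched as follows. This is a simplified illustration, not the patch code: the names are hypothetical, and missingMax[s] is assumed to be an upper bound on the count of any term that shard s did not return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of choosing which terms need a refinement request.
final class FacetRefinementSketch {
    static Set<String> termsToRefine(List<Map<String, Long>> shardCounts, long[] missingMax, int limit) {
        // Merge the counts received so far.
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> counts : shardCounts) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        // Order terms by merged count, highest first (ties by term).
        List<String> terms = new ArrayList<>(merged.keySet());
        terms.sort((a, b) -> {
            int c = Long.compare(merged.get(b), merged.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        // Count of the last term currently inside the top `limit`.
        long threshold = terms.size() >= limit ? merged.get(terms.get(limit - 1)) : 0L;

        Set<String> refine = new TreeSet<>();
        for (int i = 0; i < terms.size(); i++) {
            String term = terms.get(i);
            long maxPossible = merged.get(term);
            boolean missing = false;
            for (int s = 0; s < shardCounts.size(); s++) {
                if (!shardCounts.get(s).containsKey(term)) {
                    missing = true;
                    maxPossible += missingMax[s];
                }
            }
            // Refine top terms with an unknown shard count, plus lower terms
            // that could still reach the top once the unknown counts are added.
            if (missing && (i < limit || maxPossible >= threshold)) {
                refine.add(term);
            }
        }
        return refine;
    }
}
```

As the note above says, a term that no shard returned never enters the merged map, so this approach cannot find it.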



 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-09 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

New patch attached...

I just discovered that refinement queries weren't working because filter.query 
doesn't accept the new query syntax I was using to avoid having to escape field 
values: !field f=myfieldvalue
(this should probably be committed separately, but it's in this patch for now).

I put in code to over-request facet.field limit, but then commented it out for 
now since it too easily covers up bugs because it often prevents any refinement 
query logic from being exercised.

Also corrected the code that always used the last element as the max possible 
missing count.  If we requested 10 terms and only got 6, then we know that the 
max possible missing count is zero.
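
The corrected bound can be written down in a few lines. The sketch below uses a hypothetical method name and assumes the shard's counts arrive sorted highest first.

```java
import java.util.List;

// Hypothetical helper; upper bound on the count of any term a shard did not return.
final class MissingCountBoundSketch {
    static long maxMissingCount(List<Long> countsDescending, int requested) {
        // Fewer terms than requested means the shard has no further terms,
        // so anything it did not return has a count of zero there.
        if (countsDescending.isEmpty() || countsDescending.size() < requested) {
            return 0L;
        }
        // Otherwise an unreturned term can be at most as frequent as the last one returned.
        return countsDescending.get(countsDescending.size() - 1);
    }
}
```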

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed.patch, distributed.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-08 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated SOLR-303:
-

Attachment: distributed_trunk.patch

This might help: merged the distributed and federated patches with trunk last 
night, fixed the rejects. Appears to work.
The only things not included are the distributed searcher unit tests from the 
previous patch. Only the deltas were in the patch, so I had no way to rebuild 
them.

Hope this helps
P

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed_trunk.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-08 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated SOLR-303:
---

Comment: was deleted

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 distributed_trunk.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




Re: [jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-08 Thread Sean Timm
User error.  I thought I had a clean sandbox, but I didn't.  So, the 
only issues I have with the patch are the 2 *Test* files previously 
reported, and the o.a.s.handler.SearchHandler

patching file src/java/org/apache/solr/handler/SearchHandler.java
Reversed (or previously applied) patch detected!  Assume -R? [n] y

-Sean

Sean Timm (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated SOLR-303:
---

Comment: was deleted

  

Distributed Search over HTTP


Key: SOLR-303
URL: https://issues.apache.org/jira/browse/SOLR-303
Project: Solr
 Issue Type: New Feature
 Components: search
   Reporter: Sharad Agarwal
   Assignee: Yonik Seeley
Attachments: distributed.patch, distributed.patch, distributed.patch, 
distributed_trunk.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
fedsearch.stu.patch, fedsearch.stu.patch


Searching over multiple shards and aggregating results.
Motivated by http://wiki.apache.org/solr/DistributedSearch



  


[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-08 Thread patrick o'leary (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

patrick o'leary updated SOLR-303:
-

Attachment: (was: distributed_trunk.patch)

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2008-01-03 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

Small update, mostly to sorting
- This changes sorting to get values from the Sort comparators (thus supporting 
custom sorts)
- uses external values that can be supported by XML, also nicer for debugging
-  returns sort field values in an array per-field {price=[10,20,30,40,50]}
- merging should be faster... lookup of sort values is by index number instead 
of searching
  for the field name.
- merging short-circuits comparisons for docs in the same shard
- sorting null values now works and respects sortMissingFirst/Last, etc
- if a shard request, don't pre-fetch docs for highlighter
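
Two of the merging points above (the same-shard short-circuit and null handling) can be illustrated with a comparator. This is a single-field, ascending-only sketch with hypothetical names; the patch itself takes values from the Sort comparators.

```java
import java.util.Comparator;

// Hypothetical sketch of comparing docs from different shards during the merge.
final class SortMergeSketch {
    static final class Doc {
        final int shard;         // which shard returned the doc
        final int indexInShard;  // position in that shard's already sorted response
        final Double sortValue;  // null when the doc has no value for the sort field

        Doc(int shard, int indexInShard, Double sortValue) {
            this.shard = shard;
            this.indexInShard = indexInShard;
            this.sortValue = sortValue;
        }
    }

    static Comparator<Doc> ascending(boolean missingLast) {
        return (a, b) -> {
            // Short-circuit: docs from the same shard are already in order.
            if (a.shard == b.shard) {
                return Integer.compare(a.indexInShard, b.indexInShard);
            }
            boolean aNull = a.sortValue == null;
            boolean bNull = b.sortValue == null;
            if (aNull && bNull) return 0;
            if (aNull) return missingLast ? 1 : -1;
            if (bNull) return missingLast ? -1 : 1;
            return Double.compare(a.sortValue, b.sortValue);
        };
    }
}
```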

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, distributed.patch, distributed.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-22 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Attachment: distributed.patch

OK, here is a *draft* that mostly works for searches and highlighting.

There are stages in the request:
{code}
  public static int STAGE_START         = 0;
  public static int STAGE_PARSE_QUERY   = 1000;
  public static int STAGE_EXECUTE_QUERY = 2000;
  public static int STAGE_GET_FIELDS    = 3000;
  public static int STAGE_DONE          = Integer.MAX_VALUE;
{code}

When a component wants to send a request, it adds it to the outgoing queue.
Other components can inspect and modify these shard requests.
All components get a callback when the shard response is received.

All shard requests have purposes (to aid in both correlation and 
inspection/modification by other components).
This is what a ShardRequest looks like:
{code}
public class ShardRequest {
  public final static String[] ALL_SHARDS = null;

  public final static int PURPOSE_PRIVATE         = 0x01;
  public final static int PURPOSE_GET_TERM_DFS    = 0x02;
  public final static int PURPOSE_GET_TOP_IDS     = 0x04;
  public final static int PURPOSE_REFINE_TOP_IDS  = 0x08;
  public final static int PURPOSE_GET_FACETS      = 0x10;
  public final static int PURPOSE_REFINE_FACETS   = 0x20;
  public final static int PURPOSE_GET_FIELDS      = 0x40;
  public final static int PURPOSE_GET_HIGHLIGHTS  = 0x80;

  public int purpose;  // the purpose of this request

  public String[] shards;  // the shards this request should be sent to
                           // TODO: how to request a specific shard address?

  public ModifiableSolrParams params;

  public List<ShardResponse> responses = new ArrayList<ShardResponse>();
}
{code}
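
Since the PURPOSE_* values are distinct powers of two, one request can presumably carry several purposes at once. The sketch below shows how such flags combine and how a component might test for its own; the helper class is hypothetical and only the constant values come from the listing above.

```java
// Hypothetical helper; constant values copied from the ShardRequest listing.
final class PurposeFlagsSketch {
    static final int PURPOSE_GET_TOP_IDS    = 0x04;
    static final int PURPOSE_GET_FACETS     = 0x10;
    static final int PURPOSE_GET_HIGHLIGHTS = 0x80;

    // True when the given flag is set in the request's purpose bitmask.
    static boolean hasPurpose(int purpose, int flag) {
        return (purpose & flag) != 0;
    }
}
```

For example, a request with purpose PURPOSE_GET_TOP_IDS | PURPOSE_GET_FACETS would be recognised by both the query and facet components, while the highlighting component would ignore it.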


Components are responsible for themselves... the highlighting component is 
responsible for turning itself on/off at the appropriate time... the query 
component has no knowledge of the highlight component.  This will make it so 
that custom components can be developed that can work in a distributed 
environment w/o explicit support for that component baked into the other 
components.



 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: distributed.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-12 Thread Sabyasachi Dalal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sabyasachi Dalal updated SOLR-303:
--

Comment: was deleted

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-12 Thread Sabyasachi Dalal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sabyasachi Dalal updated SOLR-303:
--

Attachment: fedsearch.patch

I have fixed the patch and updated it to trunk version 600419. It is integrated 
with the re-opened SOLR-281 patch.
I have added the configuration for the three distributed-search components to 
solrconfig.xml, under the /search request handler. So distributed search works 
with /search requests only.

A couple of issues:
1. The dist search components need a reference to the SearchHandler. So for 
now, I have hard-coded the /search pattern in the FedSearchComponent.
2. We need a clean way to load common init params for the dist search 
components, such as timeout, thread pool size and search handler pattern.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-12 Thread Sabyasachi Dalal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sabyasachi Dalal updated SOLR-303:
--

Attachment: (was: fedsearch.patch)

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-12 Thread Sabyasachi Dalal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sabyasachi Dalal updated SOLR-303:
--

Attachment: fedsearch.patch

I made a mistake and uploaded the wrong patch file. Now uploading the correct 
file.

I have fixed the patch and updated it to trunk version 600419. It is integrated 
with the re-opened SOLR-281 patch.
I have added the configuration for the three distributed-search components to 
solrconfig.xml, under the /search request handler. So distributed search works 
with /search requests only.

A couple of issues:
1. The dist search components need a reference to the SearchHandler. So for 
now, I have hard-coded the /search pattern in the FedSearchComponent.
2. We need a clean way to load common init params for the dist search 
components, such as timeout, thread pool size and search handler pattern.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.stu.patch, 
 fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-12-12 Thread Sabyasachi Dalal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sabyasachi Dalal updated SOLR-303:
--

Attachment: fedsearch.patch

Removed the commented-out line from SolrCore.loadSearchComponents and a couple 
of debug statements.

 Distributed Search over HTTP
 

 Key: SOLR-303
 URL: https://issues.apache.org/jira/browse/SOLR-303
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Sharad Agarwal
Assignee: Yonik Seeley
 Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.patch, fedsearch.patch, fedsearch.patch, fedsearch.patch, 
 fedsearch.stu.patch, fedsearch.stu.patch


 Searching over multiple shards and aggregating results.
 Motivated by http://wiki.apache.org/solr/DistributedSearch




[jira] Updated: (SOLR-303) Distributed Search over HTTP

2007-11-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-303:
--

Description: 
Searching over multiple shards and aggregating results.
Motivated by http://wiki.apache.org/solr/DistributedSearch


  was:
Motivated by http://wiki.apache.org/solr/FederatedSearch
The requirement of index view consistency across multiple requests is relaxed 
in this implementation.

This covers the query side of federated search. The update side is not yet done.

Tries to achieve:

- Client applications are totally agnostic to federated search. The federated 
search and the merging of results happen entirely behind the scenes in Solr, in 
the request handler. The response format remains the same after merging.
The response from each individual shard is deserialized into a 
SolrQueryResponse object. The collection of SolrQueryResponse objects is merged 
to produce a single SolrQueryResponse object. This allows the response writers 
to be used as they are, or with minimal change.

- Efficient query processing, with highlighting and fields generated only for 
merged documents. The query is executed in 2 phases. The first phase gets the 
doc unique keys along with the sort criteria. The second phase brings in all 
requested fields and highlighting information. This saves a lot of CPU when 
there are many shards and highlighting info is requested.
It should be easy to customize the query execution. For example, the user can 
specify that the query be executed in just 1 phase. (For some queries, when 
highlighting info is not required and the number of requested fields is small, 
this can be more efficient.)

- Ability to easily override the default federated capability through 
appropriate plugins and request parameters. As federated search is performed by 
the RequestHandler itself, multiple request handlers can easily be 
pre-configured with different federated search settings in solrconfig.xml.

- Global weight calculation is done by querying the terms' doc frequencies from 
all shards.

- Federated search works over HTTP transport, so an individual shard's VIP can 
be queried. Load balancing and failover are taken care of by the VIP as usual.

- Sub-searcher response parsing is a plugin interface. Different implementations 
could be written based on JSON, XML SAX, etc. The current one is based on XML 
DOM.
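
The global weight calculation in the list above comes down to adding up 
per-shard statistics. A rough sketch of that summation, assuming each shard 
reports its numDocs as a long and its docFreqs as a term-to-count map; the class 
GlobalStats and both method names are hypothetical and do not come from the 
patch:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GlobalStats {
  // Hypothetical helper: add up the document counts reported by each shard.
  static long sumNumDocs(long[] perShardNumDocs) {
    long total = 0;
    for (long n : perShardNumDocs) {
      total += n;
    }
    return total;
  }

  // Hypothetical helper: sum per-shard document frequencies term by term.
  // A term missing from a shard's map simply contributes nothing.
  static Map<String, Long> sumDocFreqs(List<Map<String, Long>> perShard) {
    Map<String, Long> global = new HashMap<>();
    for (Map<String, Long> shard : perShard) {
      for (Map.Entry<String, Long> e : shard.entrySet()) {
        global.merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    return global;
  }

  public static void main(String[] args) {
    Map<String, Long> shardA = Map.of("solr", 12L, "lucene", 3L);
    Map<String, Long> shardB = Map.of("solr", 5L);
    System.out.println(sumDocFreqs(List.of(shardA, shardB)).get("solr")); // 17
  }
}
```

The summed values are what would be passed along as request parameters so that 
every shard scores against the same global statistics.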


HOW:
---
A new RequestHandler called MultiSearchRequestHandler does the federated search 
on multiple sub-searchers (referred to as shards going forward). It extends 
RequestHandlerBase. The handleRequestBody method in RequestHandlerBase has been 
divided into query-building and execute methods. This has been done in order to 
calculate global numDocs and docFreqs, and to execute the query efficiently on 
multiple shards.
All search request handlers are expected to extend the 
MultiSearchRequestHandler class in order to enable federated capability for the 
handler. StandardRequestHandler and DisMaxRequestHandler have been changed to 
extend this class.

The federated search kicks in if shards is present in the request parameters. 
Otherwise the search is performed as usual on the local index. E.g. 
shards=local,host1:port1,host2:port2 will search the local index and 2 
remote indexes. The search responses from all 3 shards are merged and served 
back to the client.
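
As an illustration of the shards parameter, here is a small sketch of how its 
value could be split into individual shard addresses. The class ShardsParam and 
its methods are hypothetical, and treating a blank value the same as an absent 
one is an assumption, not something the patch is known to do:

```java
import java.util.ArrayList;
import java.util.List;

class ShardsParam {
  // Hypothetical helper: split the comma-separated value of "shards".
  // Returns an empty list when the parameter is absent (null) or blank.
  static List<String> parse(String shards) {
    List<String> result = new ArrayList<>();
    if (shards == null) {
      return result;
    }
    for (String s : shards.split(",")) {
      String trimmed = s.trim();
      if (!trimmed.isEmpty()) {
        result.add(trimmed);
      }
    }
    return result;
  }

  // Federated search kicks in only when at least one shard is named.
  static boolean isFederated(String shards) {
    return !parse(shards).isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(parse("local,host1:port1,host2:port2"));
    // [local, host1:port1, host2:port2]
    System.out.println(isFederated(null)); // false
  }
}
```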

The search request processing on the set of shards is performed as follows:

STEP 1: The query is built and its terms are extracted. Global numDocs and 
docFreqs are calculated by querying all the shards and adding up the numDocs 
and docFreqs from each shard.

STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs 
are passed as request parameters. Not all document fields are requested; only 
the document uniqFields and sort fields are. MoreLikeThis and Highlighting 
information are NOT requested.

STEP 3: Responses from FirstQueryPhase are merged based on the sort, start and 
rows params. The merged doc uniqFields and sort fields are collected. Other 
information such as facet and debug info is also merged.

STEP 4: (SecondQueryPhase) The merged doc uniqFields and sort fields are grouped 
by shard. All shards in the grouping are queried for the merged doc uniqFields 
(from FirstQueryPhase), highlighting and moreLikeThis info.

STEP 5: Responses from all shards from SecondQueryPhase are merged.

STEP 6: Document fields, highlighting and moreLikeThis info from 
SecondQueryPhase are merged into the FirstQueryPhase response.
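
STEP 3 and STEP 4 above can be sketched roughly as follows, for the default 
sort by score only. The class MergeSketch, the Hit type and both method names 
are hypothetical, and the sketch assumes every shard returned at least its top 
start+rows hits in the first phase:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeSketch {
  // Hypothetical first-phase hit: owning shard, doc unique key, score.
  static final class Hit {
    final String shard;
    final String id;
    final float score;

    Hit(String shard, String id, float score) {
      this.shard = shard;
      this.id = id;
      this.score = score;
    }
  }

  // STEP 3: pool all first-phase hits, sort by descending score,
  // then keep the window [start, start + rows).
  static List<Hit> merge(List<List<Hit>> perShard, int start, int rows) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> hits : perShard) {
      all.addAll(hits);
    }
    all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    int from = Math.min(start, all.size());
    int to = Math.min(from + rows, all.size());
    return new ArrayList<>(all.subList(from, to));
  }

  // STEP 4: group the merged doc keys by shard, so each shard is asked
  // only for the documents it owns in the second phase.
  static Map<String, List<String>> groupByShard(List<Hit> merged) {
    Map<String, List<String>> byShard = new LinkedHashMap<>();
    for (Hit h : merged) {
      byShard.computeIfAbsent(h.shard, k -> new ArrayList<>()).add(h.id);
    }
    return byShard;
  }

  public static void main(String[] args) {
    List<Hit> a = List.of(new Hit("A", "a1", 0.9f), new Hit("A", "a2", 0.4f));
    List<Hit> b = List.of(new Hit("B", "b1", 0.7f), new Hit("B", "b2", 0.6f));
    System.out.println(groupByShard(merge(List.of(a, b), 0, 3)));
    // {A=[a1], B=[b1, b2]}
  }
}
```

A shard with no documents in the merged window does not appear in the grouping, 
so it is not queried at all in the second phase.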




TODO:
- Support sort fields other than the default score
- Support ResponseDocs in writers other than XMLWriter
- HTTP connection timeouts

OPEN ISSUES:
- Merging of facets by top n terms of field f

Scope for performance optimization:
- Search shards in parallel threads
- HTTP connection Keep-Alive?
- Cache global numDocs and docFreqs
- Cache Query objects in handlers?

I would appreciate feedback on my approach. I understand that there may be a 
lot of things I have overlooked.