[jira] Updated: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-127:
--

Attachment: HTTPCaching.patch

Added Etag support.

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch
>
>
> An offhand comment I saw recently reminded me of something that really bugged
> me about the search solution I used *before* Solr: it didn't play nicely
> with HTTP caches that might be sitting in front of it.
> At the moment, Solr doesn't put particularly useful info in the HTTP
> response headers to aid in caching (e.g., Last-Modified), responds to all HEAD
> requests with a 400, and doesn't do anything special with If-Modified-Since.
> At the very least, we can set a Last-Modified based on when the current
> IndexReader was opened (if not the date on the IndexReader) and use the same
> info to determine how to respond to If-Modified-Since requests.
> (For the record, I think the reason this hasn't occurred to me in the 2+ years
> I've been using Solr is that, with the internal caching, I've yet to need
> to put a proxy cache in front of Solr.)
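A minimal sketch of the Last-Modified / If-Modified-Since handling proposed above, assuming Solr records the instant the current IndexReader was opened (rather than trusting index file timestamps). `IndexCacheHeaders`, `lastModified()` and `statusFor()` are hypothetical names, not Solr API. HTTP dates only have one-second resolution, so the open time is truncated to whole seconds before any comparison.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: derives caching headers from the moment the current
// IndexReader was opened (recorded by Solr, not read from the index files).
final class IndexCacheHeaders {
    private final Instant readerOpenedAt;

    IndexCacheHeaders(Instant readerOpenedAt) {
        // HTTP-dates carry whole seconds only, so drop sub-second precision.
        this.readerOpenedAt = readerOpenedAt.truncatedTo(ChronoUnit.SECONDS);
    }

    /** Value for the Last-Modified response header, e.g. "Fri, 14 Sep 2007 12:00:00 GMT". */
    String lastModified() {
        return DateTimeFormatter.RFC_1123_DATE_TIME
                .format(ZonedDateTime.ofInstant(readerOpenedAt, ZoneOffset.UTC));
    }

    /**
     * Returns 304 if the client's copy is at least as new as the open reader,
     * otherwise 200. A missing or unparseable If-Modified-Since is ignored.
     */
    int statusFor(String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;
        }
        try {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            return readerOpenedAt.isAfter(since) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200;
        }
    }
}
```

A request whose If-Modified-Since equals the Last-Modified value gets a 304, which is what a cache revalidating its copy expects; once a new reader is opened (and a new helper built from its open time), the same request gets a 200 again.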

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-127:
--

Attachment: HTTPCaching.patch

After reading the W3C docs I have seen that we can calculate the Etags in a 
much simpler way.

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch




Solr nightly build failure

2007-09-14 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build

checkJunitPresence:

compile-common:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/common
[javac] Compiling 26 source files to /tmp/apache-solr-nightly/build/common
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/core
[javac] Compiling 216 source files to /tmp/apache-solr-nightly/build/core
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
[javac] Compiling 21 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
[javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/embedded/JettySolrRunner.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 60 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 38.254 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 28.398 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 21.365 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.621 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.234 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 10.743 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.492 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.111 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.173 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.361 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.242 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.918 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.346 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.884 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 5.11 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 23.121 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.062 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.081 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.81 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.063 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping

[jira] Updated: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-127:
--

Attachment: HTTPCaching.patch

Be even more standards-compliant: the If-Match and If-None-Match headers can
appear multiple times.
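Since If-None-Match may arrive as several header lines, each holding a comma-separated list of entity tags or `*`, every value has to be checked. A minimal sketch, assuming the servlet layer passes in all values (for example gathered from `HttpServletRequest.getHeaders("If-None-Match")`); `ConditionalGet` is a hypothetical name, and splitting on commas assumes simple opaque tags without embedded commas.

```java
import java.util.List;

final class ConditionalGet {
    /** Weak comparison: W/"x" and "x" are treated as the same tag. */
    private static String opaque(String tag) {
        String t = tag.trim();
        return t.startsWith("W/") ? t.substring(2) : t;
    }

    /**
     * True if any If-None-Match value matches currentEtag (or is "*"), meaning
     * the client's copy is still current and a GET/HEAD can be answered with 304.
     */
    static boolean noneMatchHits(List<String> headerValues, String currentEtag) {
        for (String header : headerValues) {
            // Each header line may itself carry a comma-separated list of tags.
            for (String candidate : header.split(",")) {
                String c = candidate.trim();
                if (c.equals("*") || (!c.isEmpty() && opaque(c).equals(opaque(currentEtag)))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

If-None-Match uses weak comparison, hence the `W/` stripping. If-Match could reuse the same loop, but it calls for strong comparison, so a weak tag should never match there.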

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch




[jira] Work started: (SOLR-342) Add support for Lucene's new setRAMBufferSizeMB() method in IndexWriter

2007-09-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-342 started by Grant Ingersoll.

> Add support for Lucene's new setRAMBufferSizeMB() method in IndexWriter
> ---
>
> Key: SOLR-342
> URL: https://issues.apache.org/jira/browse/SOLR-342
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> LUCENE-843 adds support for new indexing capabilities using the
> setRAMBufferSizeMB() method, which should significantly speed up indexing for
> many applications. To take advantage of this, we will need the trunk version of
> Lucene (or must wait for the next official release of Lucene).
> A side effect of this is that Lucene's new, faster StandardTokenizer will also
> be incorporated.
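The idea behind `setRAMBufferSizeMB()` is to flush buffered documents once their estimated memory footprint crosses a budget, rather than after a fixed document count (as with `setMaxBufferedDocs()`). The sketch below only models that policy: `RamBudgetBuffer`, its caller-supplied byte estimates and the flush callback are hypothetical and say nothing about how Lucene's IndexWriter actually measures RAM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a flush-by-RAM policy (not Lucene's implementation).
final class RamBudgetBuffer {
    private final long budgetBytes;
    private final Consumer<List<String>> flusher;
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    RamBudgetBuffer(double ramBufferSizeMB, Consumer<List<String>> flusher) {
        this.budgetBytes = (long) (ramBufferSizeMB * 1024 * 1024);
        this.flusher = flusher;
    }

    /** Buffers a document; flushes as soon as the estimated size reaches the budget. */
    void add(String doc, long estimatedBytes) {
        pending.add(doc);
        pendingBytes += estimatedBytes;
        if (pendingBytes >= budgetBytes) {
            flush();
        }
    }

    /** Hands the buffered documents to the flusher and resets the estimate. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

With a document-count trigger, a batch of very large documents can exhaust the heap while a batch of tiny ones flushes needlessly often; a byte budget keeps each flush roughly the same size in memory.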




[jira] Assigned: (SOLR-342) Add support for Lucene's new setRAMBufferSizeMB() method in IndexWriter

2007-09-14 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-342:


Assignee: Grant Ingersoll





[jira] Commented: (SOLR-303) Federated Search over HTTP

2007-09-14 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527570
 ] 

Stu Hood commented on SOLR-303:
---

Thanks Sharad, the last patch applied cleanly as you said.

I've run into some errors that should be quick fixes for your next revision:

* I had to modify the code so that it does not assume shard names end in
'/solr', so that I could specify an instance name, like
'blah.com:8080/instance_name'.
* The parameters for your subqueries are not (always?) getting escaped. My
document ids contain some colons (':'), so it's throwing a NullPointerException
during the SecondQueryPhase, and then again in SolrCore.execute.
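A sketch of the first fix, assuming shard specs look like `host:port` or `host:port/path`: append the conventional `/solr` context path only when the spec carries no path of its own. `ShardUrls.selectUrl` and the `/select` suffix are illustrative assumptions, not the patch's code.

```java
final class ShardUrls {
    // Hypothetical default, used only when a shard spec names no context path.
    private static final String DEFAULT_CONTEXT = "/solr";

    /** "host:8983" -> "http://host:8983/solr/select"; "blah.com:8080/instance_name" keeps its path. */
    static String selectUrl(String shardSpec) {
        String spec = shardSpec.trim();
        // Tolerate trailing slashes such as "host:8080/instance_name/".
        while (spec.endsWith("/")) {
            spec = spec.substring(0, spec.length() - 1);
        }
        String base = spec.indexOf('/') >= 0 ? spec : spec + DEFAULT_CONTEXT;
        return "http://" + base + "/select";
    }
}
```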


Thanks a lot for your work!

> Federated Search over HTTP
> --
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sharad Agarwal
>Priority: Minor
> Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch
>
>
> Motivated by http://wiki.apache.org/solr/FederatedSearch
> The "Index view consistency between multiple requests" requirement is relaxed in
> this implementation.
> This covers the query side of federated search; updates are not yet done.
> Goals:
> 
> - Client applications are completely agnostic to federated search. The
> federated search and merging of results happen entirely behind the scenes, in
> the Solr request handler. The response format remains the same after the
> results are merged.
> The response from each shard is deserialized into a SolrQueryResponse
> object. The collection of SolrQueryResponse objects is merged to produce a
> single SolrQueryResponse object. This allows the response writers to be used
> as-is, or with minimal changes.
> - Efficient query processing, with highlighting and fields generated
> only for the merged documents. The query is executed in 2 phases. The first
> phase gets the doc unique keys with the sort criteria. The second phase fetches
> all requested fields and highlighting information. This saves a lot of CPU when
> there are many shards and highlighting info is requested.
> It should be easy to customize the query execution. For example, a user can
> choose to execute the query in a single phase. (For some queries, when
> highlighting info is not required and the number of requested fields is small,
> this can be more efficient.)
> - Ability to easily override the default federated capability with appropriate
> plugins and request parameters. Because federated search is performed by the
> RequestHandler itself, multiple request handlers can easily be pre-configured
> with different federated search settings in solrconfig.xml.
> - Global weight calculation is done by querying the terms' doc frequencies
> from all shards.
> - Federated search works over HTTP transport, so each shard's VIP can be
> queried. Load balancing and failover are handled by the VIP as usual.
> - Sub-searcher response parsing is a plugin interface. Different
> implementations could be written based on JSON, XML SAX, etc. The current one
> is based on XML DOM.
> HOW:
> ---
> A new RequestHandler called MultiSearchRequestHandler does the federated
> search on multiple sub-searchers (referred to as "shards" from here on). It
> extends RequestHandlerBase. The handleRequestBody method in
> RequestHandlerBase has been divided into query-building and execute methods.
> This was done to calculate global numDocs and docFreqs and to execute the
> query efficiently on multiple shards.
> All the "search" request handlers are expected to extend the
> MultiSearchRequestHandler class in order to enable federated capability for
> the handler. StandardRequestHandler and DisMaxRequestHandler have been
> changed to extend this class.
>  
> Federated search kicks in if "shards" is present in the request
> parameters. Otherwise, search is performed as usual on the local index. E.g.,
> shards=local,host1:port1,host2:port2 will search the local index and 2
> remote indexes. The search responses from all 3 shards are merged and returned
> to the client.
> The search request processing on the set of shards is performed as follows:
> STEP 1: The query is built and terms are extracted. Global numDocs and docFreqs
> are calculated by requesting them from all the shards and adding up the numDocs
> and docFreqs from each shard.
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs
> are passed as request parameters. Full documents are NOT requested; only the
> document uniqFields and sort fields are. MoreLikeThis and Highlighting
> information are NOT requested.
> STEP 3: Responses from FirstQueryPhase are merged based on the "sort", "start"
> and "rows" params. The merged docs' uniqFields and sort fields are collected.
> Other information
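STEP 1 boils down to summing per-shard statistics before any shard scores documents. A minimal sketch, assuming each shard reports its numDocs and a docFreq per query term; `GlobalStats` and `ShardStats` are hypothetical names. The idf shown is the formula used by Lucene's DefaultSimilarity, `1 + ln(numDocs / (docFreq + 1))`, applied to the global counts so every shard weights terms identically.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GlobalStats {
    /** Statistics one shard reports for the query's terms (hypothetical shape). */
    static final class ShardStats {
        final long numDocs;
        final Map<String, Long> docFreqs;

        ShardStats(long numDocs, Map<String, Long> docFreqs) {
            this.numDocs = numDocs;
            this.docFreqs = docFreqs;
        }
    }

    final long numDocs;
    final Map<String, Long> docFreqs = new HashMap<>();

    GlobalStats(List<ShardStats> shards) {
        long total = 0;
        for (ShardStats s : shards) {
            total += s.numDocs;
            // Add this shard's docFreq for each term to the running global total.
            s.docFreqs.forEach((term, df) -> docFreqs.merge(term, df, Long::sum));
        }
        this.numDocs = total;
    }

    /** DefaultSimilarity-style idf, but computed over the global counts. */
    double idf(String term) {
        long df = docFreqs.getOrDefault(term, 0L);
        return Math.log(numDocs / (double) (df + 1)) + 1.0;
    }
}
```

The resulting global values are what STEP 2 passes to each shard as request parameters, so a rare term on one small shard is not over-weighted relative to the same term elsewhere.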

[jira] Commented: (SOLR-303) Federated Search over HTTP

2007-09-14 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527637
 ] 

Stu Hood commented on SOLR-303:
---

For the second issue above, I did the following:

* Added 'static String escape(string, field, schema)' to QueryParsing, which
uses SolrQueryParser's escape method. I run this on all key values as they are
being iterated at the beginning of
'SecondQPhaseComponent.createSecondPhaseParams'.
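A self-contained sketch of the same idea: escape each unique key before it goes into the second-phase query, so ids such as `urn:doc:1` are not split on `:`. The character set mirrors the special characters of Lucene's query syntax; real code should prefer Lucene's own `QueryParser.escape` (inherited by SolrQueryParser). `SecondPhaseQuery.idQuery` and the `OR`-joined query form are assumptions about how the second-phase query is built.

```java
import java.util.List;
import java.util.StringJoiner;

final class SecondPhaseQuery {
    // Characters with special meaning in Lucene query syntax.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&";

    /** Backslash-escapes every query-syntax special character in a term. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds e.g. id:(a\:1 OR b) from the merged unique keys that belong to one shard. */
    static String idQuery(String keyField, List<String> keys) {
        StringJoiner joined = new StringJoiner(" OR ", keyField + ":(", ")");
        for (String key : keys) {
            joined.add(escape(key));
        }
        return joined.toString();
    }
}
```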

> Federated Search over HTTP
> --
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sharad Agarwal
>Priority: Minor
> Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch
>
>
> Motivated by http://wiki.apache.org/solr/FederatedSearch
> The "Index view consistency between multiple requests" requirement is relaxed in
> this implementation.
> This covers the query side of federated search; updates are not yet done.
> Goals:
> 
> - Client applications are completely agnostic to federated search. The
> federated search and merging of results happen entirely behind the scenes, in
> the Solr request handler. The response format remains the same after the
> results are merged.
> The response from each shard is deserialized into a SolrQueryResponse
> object. The collection of SolrQueryResponse objects is merged to produce a
> single SolrQueryResponse object. This allows the response writers to be used
> as-is, or with minimal changes.
> - Efficient query processing, with highlighting and fields generated
> only for the merged documents. The query is executed in 2 phases. The first
> phase gets the doc unique keys with the sort criteria. The second phase fetches
> all requested fields and highlighting information. This saves a lot of CPU when
> there are many shards and highlighting info is requested.
> It should be easy to customize the query execution. For example, a user can
> choose to execute the query in a single phase. (For some queries, when
> highlighting info is not required and the number of requested fields is small,
> this can be more efficient.)
> - Ability to easily override the default federated capability with appropriate
> plugins and request parameters. Because federated search is performed by the
> RequestHandler itself, multiple request handlers can easily be pre-configured
> with different federated search settings in solrconfig.xml.
> - Global weight calculation is done by querying the terms' doc frequencies
> from all shards.
> - Federated search works over HTTP transport, so each shard's VIP can be
> queried. Load balancing and failover are handled by the VIP as usual.
> - Sub-searcher response parsing is a plugin interface. Different
> implementations could be written based on JSON, XML SAX, etc. The current one
> is based on XML DOM.
> HOW:
> ---
> A new RequestHandler called MultiSearchRequestHandler does the federated
> search on multiple sub-searchers (referred to as "shards" from here on). It
> extends RequestHandlerBase. The handleRequestBody method in
> RequestHandlerBase has been divided into query-building and execute methods.
> This was done to calculate global numDocs and docFreqs and to execute the
> query efficiently on multiple shards.
> All the "search" request handlers are expected to extend the
> MultiSearchRequestHandler class in order to enable federated capability for
> the handler. StandardRequestHandler and DisMaxRequestHandler have been
> changed to extend this class.
>  
> Federated search kicks in if "shards" is present in the request
> parameters. Otherwise, search is performed as usual on the local index. E.g.,
> shards=local,host1:port1,host2:port2 will search the local index and 2
> remote indexes. The search responses from all 3 shards are merged and returned
> to the client.
> The search request processing on the set of shards is performed as follows:
> STEP 1: The query is built and terms are extracted. Global numDocs and docFreqs
> are calculated by requesting them from all the shards and adding up the numDocs
> and docFreqs from each shard.
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs
> are passed as request parameters. Full documents are NOT requested; only the
> document uniqFields and sort fields are. MoreLikeThis and Highlighting
> information are NOT requested.
> STEP 3: Responses from FirstQueryPhase are merged based on the "sort", "start"
> and "rows" params. The merged docs' uniqFields and sort fields are collected.
> Other information, such as facet and debug info, is also merged.
> STEP 4: (SecondQueryPhase) The merged docs' uniqFields and sort fields are
> grouped by shard. All shards in the grouping are queried for the merged doc
> uniqFields (from FirstQueryPhase), highlighting and m
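STEPs 3 and 4 can be sketched as merging the per-shard first-phase hits by sort value, applying start/rows, and then grouping the surviving unique keys by the shard they came from. This assumes a single descending numeric sort value per hit; `ShardMerge` and `Hit` are hypothetical, and a real merger would use a priority queue over the already-sorted shard lists instead of concatenating and re-sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardMerge {
    /** One first-phase hit: unique key, sort value, and originating shard. */
    static final class Hit {
        final String key;
        final double sortValue;
        final String shard;

        Hit(String key, double sortValue, String shard) {
            this.key = key;
            this.sortValue = sortValue;
            this.shard = shard;
        }
    }

    /** STEP 3: merge all shards' hits by descending sort value, then apply start/rows. */
    static List<Hit> merge(List<List<Hit>> perShard, int start, int rows) {
        List<Hit> all = new ArrayList<>();
        perShard.forEach(all::addAll);
        // List.sort is stable, so ties keep the order in which shards were listed.
        all.sort(Comparator.comparingDouble((Hit h) -> h.sortValue).reversed());
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    /** STEP 4: group merged keys by shard, so each shard is asked only for its own docs. */
    static Map<String, List<String>> keysByShard(List<Hit> merged) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Hit h : merged) {
            grouped.computeIfAbsent(h.shard, s -> new ArrayList<>()).add(h.key);
        }
        return grouped;
    }
}
```

Only shards that contributed to the requested page appear in the grouping, which is where the CPU saving for highlighting comes from.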

[jira] Commented: (SOLR-303) Federated Search over HTTP

2007-09-14 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527647
 ] 

Stu Hood commented on SOLR-303:
---

I'm also seeing the following issue, but I haven't had time to investigate:

{quote}
WARNING: Exception while querying shard 
crc10:8080/solr_postfix09092000-09112000 :java.lang.ClassCastException: 
com.sun.org.apache.xerces.internal.dom.DeferredTextImpl cannot be cast to 
org.w3c.dom.Element
{quote}
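A likely cause, though this is a guess rather than a diagnosis of the patch: code that walks `getChildNodes()` and casts every child to `Element`, while the DOM parser also returns the whitespace between tags as text nodes (`DeferredTextImpl` is the JDK-bundled Xerces text-node class). A defensive pattern is to check the node type before casting, as in this sketch; `DomChildren.elements` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DomChildren {
    /** Returns only the element children of a node, skipping text, comment and other nodes. */
    static List<Element> elements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between tags arrives as a Text node; casting it would throw.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }
}
```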


[jira] Created: (SOLR-356) pluggable functions (value sources)

2007-09-14 Thread Yonik Seeley (JIRA)
pluggable functions (value sources)
---

 Key: SOLR-356
 URL: https://issues.apache.org/jira/browse/SOLR-356
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley


Allow configuration of new value sources to be created by the function query
parser.
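One way to make functions pluggable is for the parser to look each function name up in a registry that plugins can extend, instead of a hard-coded list. The sketch below is illustrative only: `FunctionRegistry`, the `DocValue` interface (a per-document `double`, read here from a map of field values) and the simple `name(arg,...)` syntax are assumptions, not Solr's function query classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical function parser backed by a registry; not thread-safe (parse state in fields).
final class FunctionRegistry {
    /** Simplified per-document value source: computes a double from a doc's field values. */
    interface DocValue {
        double value(Map<String, Double> doc);
    }

    private final Map<String, Function<List<DocValue>, DocValue>> functions = new HashMap<>();
    private String src;
    private int pos;

    /** Plugins call this to make a new function name available to the parser. */
    void register(String name, Function<List<DocValue>, DocValue> factory) {
        functions.put(name, factory);
    }

    /** Parses e.g. "sum(product(popularity,2),1)": numbers, field names and nested calls. */
    DocValue parse(String expression) {
        src = expression.replace(" ", "");
        pos = 0;
        DocValue result = parseValue();
        if (pos != src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + pos);
        }
        return result;
    }

    private DocValue parseValue() {
        int start = pos;
        while (pos < src.length() && "(),".indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        String token = src.substring(start, pos);
        if (token.isEmpty()) {
            throw new IllegalArgumentException("Expected a value at position " + start);
        }
        if (pos < src.length() && src.charAt(pos) == '(') {
            Function<List<DocValue>, DocValue> factory = functions.get(token);
            if (factory == null) {
                throw new IllegalArgumentException("Unknown function: " + token);
            }
            pos++; // consume '('
            List<DocValue> operands = new ArrayList<>();
            while (pos < src.length() && src.charAt(pos) != ')') {
                operands.add(parseValue());
                if (pos < src.length() && src.charAt(pos) == ',') {
                    pos++; // consume ',' between arguments
                }
            }
            if (pos >= src.length()) {
                throw new IllegalArgumentException("Missing ')' for " + token);
            }
            pos++; // consume ')'
            return factory.apply(operands);
        }
        try {
            double constant = Double.parseDouble(token);
            return doc -> constant;
        } catch (NumberFormatException e) {
            return doc -> doc.getOrDefault(token, 0.0); // treat the token as a field name
        }
    }
}
```

With this shape, adding a function is a single `register("name", factory)` call, for instance from a plugin listed in solrconfig.xml, and the parser itself never changes.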




[jira] Commented: (SOLR-349) new functions for FunctionQuery

2007-09-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527649
 ] 

Yonik Seeley commented on SOLR-349:
---

Being able to add new functions to the parser is probably useful to people... I 
opened SOLR-356 for this.

> new functions for FunctionQuery
> ---
>
> Key: SOLR-349
> URL: https://issues.apache.org/jira/browse/SOLR-349
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Attachments: FunctionQuery.patch, FunctionQuery.patch, 
> linear_combination.patch
>
>
> Users should be able to boost a query by a function of other fields.
> Some background: 
> http://www.nabble.com/boosting-a-query-by-a-function-of-other-fields-tf4387856.html#a12510092
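Boosting a query by a function of other fields means multiplying each matching document's relevance score by a value computed from that document's field values. A minimal sketch; `FunctionBoost`, `Scored` and the example boost `1 + ln(1 + popularity)` are hypothetical choices for illustration, not the functions in the attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

final class FunctionBoost {
    /** A matching document: id, query relevance score and numeric field values. */
    static final class Scored {
        final String id;
        final double score;
        final Map<String, Double> fields;

        Scored(String id, double score, Map<String, Double> fields) {
            this.id = id;
            this.score = score;
            this.fields = fields;
        }
    }

    /** Orders hits by score * boost(fields), highest first, and returns their ids. */
    static List<String> rerank(List<Scored> hits, ToDoubleFunction<Map<String, Double>> boost) {
        List<Scored> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Scored s) -> s.score * boost.applyAsDouble(s.fields)).reversed());
        List<String> ids = new ArrayList<>();
        for (Scored s : sorted) {
            ids.add(s.id);
        }
        return ids;
    }
}
```

Multiplying (rather than adding) keeps the boost proportional to relevance, so a popular but barely matching document cannot outrank everything regardless of its score.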




[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527664
 ] 

Hoss Man commented on SOLR-127:
---

1) It's not a good idea to assume the indexVersion can be used as a timestamp
... Lucene does not guarantee that. To be safe, we should record the timestamp
at which we opened the index. (Using the lastModified on files in the Directory
is a bad idea as well ... someone could swap out an index with a backup and get
"older" files that represent a "newer" index from Solr's perspective.)

2) Isn't the header named "ETag" (not "Etag")?

3) I'm not an expert on all this newfangled HTTP/1.1 stuff ... but is an ETag
based on the URI and the indexVersion/timestamp really that useful? Wouldn't
the Last-Modified header in that case be just as useful? I thought the value
add of an ETag was that even if the content has been modified, if that
modification results in no real changes, old cached values can still be useful.
With Solr specifically in mind, the index may have changed, but if the results
of a query are identical to the results before the change, those can have the
same ETag, right? Wouldn't a hash of the URI and the SolrQueryResponse make
more sense in that regard?
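Point 3 suggests deriving the ETag from the request URI plus the response content, so an index change that leaves a query's results untouched keeps the same tag. A minimal sketch, assuming the serialized response body is available as bytes before it is sent; `ContentEtag` is a hypothetical name and SHA-1 is an arbitrary choice of digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ContentEtag {
    /** Returns a quoted strong ETag (40 hex digits) for this URI and response body. */
    static String of(String requestUri, byte[] responseBody) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(requestUri.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator so "ab" + "c" and "a" + "bc" hash differently
            md.update(responseBody);
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : md.digest()) {
                tag.append(String.format("%02x", b));
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off: the full response must be generated before the tag is known, so a 304 saves bandwidth but not the server-side work of running the query, whereas a Last-Modified check can short-circuit before searching.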

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch




[jira] Updated: (SOLR-319) changes SynonymFilterFactoryto "Analyze" synonyms file

2007-09-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-319:
--

Summary: changes SynonymFilterFactoryto "Analyze" synonyms file  (was: 
changes SynonymFilterFactory for N-gram tokenizer)

I've revised the summary line of this bug because it was a little confusing to 
me ... the issue isn't really specific to n-gram based tokenizers, as you point 
out this is a general issue that currently when constructing the synonyms file 
you have to be very aware of the analysis chain of your fieldtype -- ie: if 
LowercaseFilterFactory comes before SynonymFilterFactory, then all synonyms 
must be lowercased in your file.

The notion of specifying a TokenizerFactory as a property of the 
SynonymFilterFactory, telling it how to parse the synonyms.txt file, is pretty 
clever, and would solve the CJKTokenizer problem you describe, but I don't 
think it goes far enough. Consider the lowercase example: it would be good if 
you could have a synonyms file that contained proper names, and have it do the 
right thing when used in lowercased fields as well as exact-case fields.

To extend the tokenizer idea: what if you could specify the name of a 
fieldtype, and the entire Analyzer for that fieldtype would be used to parse 
the individual synonym records? This should simplify the patch a bit (since 
you don't have to worry about initializing any factories; the schema will take 
care of it for you) and make it a lot more powerful.
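The fieldtype idea can be sketched by treating the analyzer as a function from text to tokens and running every term of each synonym record through it before building the lookup table. This is a minimal Java sketch under that assumption, not the patch's API: `Function<String, List<String>>` stands in for a fieldtype's Analyzer, the class and method names are hypothetical, and only the `a, b => c` rule form is handled:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: parse "lhs1, lhs2 => rhs" records, analyzing each term
// with the same analyzer the field applies to documents and queries.
final class AnalyzedSynonyms {
    private AnalyzedSynonyms() {}

    static Map<List<String>, List<String>> parse(List<String> records,
                                                 Function<String, List<String>> analyzer) {
        Map<List<String>, List<String>> rules = new LinkedHashMap<>();
        for (String record : records) {
            String[] sides = record.split("=>", 2);
            if (sides.length != 2) {
                continue; // equivalence lists without "=>" are left out of this sketch
            }
            List<String> target = analyzer.apply(sides[1].trim());
            for (String source : sides[0].split(",")) {
                rules.put(analyzer.apply(source.trim()), target);
            }
        }
        return rules;
    }

    // Example analyzer for a lowercased field: whitespace split, then lowercase.
    static List<String> lowercaseWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Parsing `New York, NYC => New York City` with `AnalyzedSynonyms::lowercaseWhitespace` yields keys `[new, york]` and `[nyc]`; passing an exact-case analyzer instead would keep the capitals, so one file serves both kinds of field.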

> changes SynonymFilterFactoryto "Analyze" synonyms file
> --
>
> Key: SOLR-319
> URL: https://issues.apache.org/jira/browse/SOLR-319
> Project: Solr
>  Issue Type: Improvement
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-319.patch
>
>
> WHAT:
> Currently, SynonymFilterFactory works very well with N-gram tokenizers 
> (CJKTokenizer, for example), but we have to be careful about how rules are 
> written in synonyms.txt.
> For example, if I use CJKTokenizer (which works as a bigram tokenizer for CJK 
> characters) and want C1C2C3 to map to C4C5C6,
> I have to write the rule as follows:
> C1C2 C2C3 => C4C5 C5C6
> But I want to write it as "C1C2C3=>C4C5C6". This patch allows that. It is also 
> helpful for sharing synonyms.txt.
> HOW:
> A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
> If the attribute is specified, SynonymFilterFactory uses the TokenizerFactory 
> to create a Tokenizer, then uses that Tokenizer to get tokens from the rules 
> in the synonyms.txt file.
> sample-1: CJKTokenizer
> <fieldType ... positionIncrementGap="100">
>   ...
>     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt"
>             ignoreCase="true" expand="true"
>             tokenFactory="solr.CJKTokenizerFactory"/>
>   ...
> </fieldType>
> sample-2: NGramTokenizer
> <fieldType ... positionIncrementGap="100">
>   ...
>     ... maxGramSize="2"/>
>   ...
>     ... maxGramSize="2"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt"
>             ignoreCase="true" expand="true"
>             tokenFactory="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>   ...
> </fieldType>
> Backward compatibility:
> Yes. If you omit the tokenFactory attribute from the <filter class="solr.SynonymFilterFactory"/> tag, it works as usual.
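The C1C2C3 example in the description follows from how a bigram tokenizer splits text: a run of three CJK characters becomes two overlapping bigrams, so an unanalyzed rule "C1C2C3" never matches the indexed tokens. A minimal Java sketch of overlapping bigrams over code points; it approximates what a bigram tokenizer emits for one unbroken CJK run and does not reproduce CJKTokenizer's handling of mixed scripts (the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: overlapping bigrams over Unicode code points for one run of text.
final class Bigrams {
    private Bigrams() {}

    static List<String> of(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        if (cps.length == 1) {
            grams.add(run); // this sketch keeps a lone character as a single token
            return grams;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }
}
```

Running each side of `C1C2C3=>C4C5C6` through `Bigrams.of` gives `C1C2 C2C3` and `C4C5 C5C6`, the hand-written form of the rule that the tokenFactory attribute is meant to make unnecessary.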




[jira] Commented: (SOLR-319) changes SynonymFilterFactoryto "Analyze" synonyms file

2007-09-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527682
 ] 

Koji Sekiguchi commented on SOLR-319:
-

Absolutely. I'll try to change my patch to implement the fieldtype idea. Thank 
you.





[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527694
 ] 

Walter Underwood commented on SOLR-127:
---

Last-Modified does require monotonic time, but ETags are version stamps without 
any ordering. The indexVersion should be fine for an ETag.
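Because an ETag only has to change when the content changes, an opaque version stamp is enough and no ordering is needed. A minimal Java sketch, assuming the index version is available as a `long`; the class and method names are hypothetical. If-None-Match uses weak comparison, so a `W/` prefix on the client's tag is ignored:

```java
import java.util.Arrays;

// Hypothetical helper: a strong ETag derived from the index version, plus
// If-None-Match evaluation for GET/HEAD (a match means 304 Not Modified).
final class IndexEtag {
    private IndexEtag() {}

    static String etagFor(long indexVersion) {
        // Hex keeps the tag short; any stable encoding of the version works.
        return "\"" + Long.toHexString(indexVersion) + "\"";
    }

    static boolean matches(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false;
        }
        // Splitting on commas is safe here only because generated tags never contain one.
        return Arrays.stream(ifNoneMatch.split(","))
                .map(String::trim)
                .map(tag -> tag.startsWith("W/") ? tag.substring(2) : tag) // weak comparison
                .anyMatch(tag -> tag.equals("*") || tag.equals(currentEtag));
    }
}
```

Equality is the only test applied, which is exactly why an unordered version number suffices: any change to indexVersion produces a different tag and forces a full response.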

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch
>
>
