[jira] Commented: (SOLR-215) Multiple Solr Cores - remove static singleton

2007-10-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534437
 ] 

Ryan McKinley commented on SOLR-215:


Ok, I looked... fixing this is not hard.

Deprecation support was already baked into IndexSchema:

TokenFilterFactory tfac = (TokenFilterFactory) solrConfig.newInstance(className);
if (tfac instanceof SolrConfig.Initializable)
  ((SolrConfig.Initializable) tfac).init(solrConfig, DOMUtil.toMapExcept(attrs, "class"));
else
  tfac.init(DOMUtil.toMapExcept(attrs, "class"));

the problem is that BaseTokenizerFactory and BaseTokenFilterFactory both 
implement SolrConfig.Initializable, so IndexSchema assumes they are using 
the new interface.  If someone extends one of these Base classes and only 
overrides the old init( args ), it is never called.

the fix is simply to call init( args ) from within init( config, args ) -- I'll 
remove the warning message, since the deprecated init will now be called by default.
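The fix can be sketched as a minimal, self-contained model of the pattern (these are stand-in classes for illustration, not the actual Solr code; `SolrConfigStub` and `LegacyFilterFactory` are made-up names):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for SolrConfig; only here to make the sketch self-contained.
class SolrConfigStub {}

abstract class BaseTokenFilterFactory {
    protected Map<String, String> args;

    // Old (deprecated) entry point that user subclasses may still override.
    @Deprecated
    public void init(Map<String, String> args) {
        this.args = args;
    }

    // New entry point: delegate to the old init so that subclasses
    // overriding only init(args) still get called.
    public void init(SolrConfigStub config, Map<String, String> args) {
        init(args);
    }
}

// A user factory written against the old API: overrides only init(args).
class LegacyFilterFactory extends BaseTokenFilterFactory {
    boolean oldInitCalled = false;

    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        oldInitCalled = true;
    }
}

public class InitDelegationDemo {
    public static void main(String[] argv) {
        LegacyFilterFactory f = new LegacyFilterFactory();
        Map<String, String> args = new HashMap<>();
        args.put("luceneMatchVersion", "x");
        // The schema loader calls the new form; the old override still runs.
        f.init(new SolrConfigStub(), args);
        System.out.println(f.oldInitCalled);   // true
        System.out.println(f.args.size());     // 1
    }
}
```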

ryan

> Multiple Solr Cores - remove static singleton
> -
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Fix For: 1.3
>
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-trunk-533775.patch, solr-trunk-538091.patch, 
> solr-trunk-542847-1.patch, solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore, which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr, which also brings 
> multiple-index capability.
> The patch file to grab is solr-215.patch.zip (see MISC section below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accommodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases, however, where having multiple indexes or multiple 
> cores within Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) where you 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web-applications, you can have one web-application that hosts more than one
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton by a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) allows them to be easily 
> managed.
> You declare one servlet filter mapping per core you want to expose in the 
> web.xml; this allows easy access to each core through a different URL.
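A web.xml sketch of that per-core filter mapping (the filter class and init-param name here are guesses for illustration; the actual names ship with the patch):

```xml
<!-- one filter + mapping per exposed core; names below are illustrative -->
<filter>
  <filter-name>SolrCore0</filter-name>
  <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
  <init-param>
    <!-- hypothetical parameter telling the filter which named core it serves -->
    <param-name>solr-core-name</param-name>
    <param-value>core0</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>SolrCore0</filter-name>
  <url-pattern>/core0/*</url-pattern>
</filter-mapping>
```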
> USAGE (example web deployment, patch installed):
> Step0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
> monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core0 index; 2 
> documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core1 index; no 
> documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to the index of core0 and the mon*.xml to the index of core1;
> running queries from the admin interface, you can verify the indexes have 
> different content. 
> USAGE (Java code):
> // create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> // create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> // create a core from the two others
> SolrCore core = new SolrCore("core0", "/path/to/

[jira] Assigned: (SOLR-378) Increase connectionTimeout so SolrExampleTest doesn't timeout when trying to connect to SolrServer instance

2007-10-12 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-378:
--

Assignee: Ryan McKinley

> Increase connectionTimeout so SolrExampleTest doesn't timeout when trying to 
> connect to SolrServer instance
> ---
>
> Key: SOLR-378
> URL: https://issues.apache.org/jira/browse/SOLR-378
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
> Environment: Nightly build box
>Reporter: Yousef Ourabi
>Assignee: Ryan McKinley
>Priority: Trivial
> Fix For: 1.3
>
> Attachments: build.patch
>
>
> The SolrExampleJettyTest is timing out, the default timeout on the SolrServer 
> instance is set to 5. The attached patch adds a system property to the junit 
> task (build.xml line 395) that increases this to 30, and also tweaks line 72 
> of SolrExampleJettyTest to read the system property.
> http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/ws/trunk/build/test-results/TEST-org.apache.solr.client.solrj.embedded.SolrExampleJettyTest.xml
> Thanks,
> Yousef Ourabi
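For reference, the build.xml side of such a change presumably looks something like the following (the property name and junit-task attributes here are guesses from the description; the real ones are in the attached build.patch):

```xml
<!-- hypothetical sketch of the junit task change; names are illustrative -->
<junit fork="true" printsummary="on">
  <!-- expose a configurable timeout to the test JVM -->
  <sysproperty key="solr.test.connectionTimeout" value="30"/>
</junit>
```

and the test would read it back with something like Integer.parseInt(System.getProperty("solr.test.connectionTimeout", "5")).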

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Deprecations and SolrConfig patch

2007-10-12 Thread Ryan McKinley


Check my last comment on SOLR-215



which should not be an indication that the code in question will not 
function correctly.  Also, there isn't a peep of warning in CHANGES.txt




I was waiting for SOLR-360 to add the CHANGES.txt entries for multi-core related 
things.  You are right, the deprecations and alternatives 
need to go in ASAP.



The problem seems to stem from leaving in the old method.  Since we are 
breaking backward compatibility, it would be better to break hard and 
prevent Solr from compiling, or to actually provide backward compatibility.


grumpy,
-Mike



I think there is a way to maintain backward compatibility, but it is not 
totally straightforward (so I can't do it right now)


Sorry this caused you head-banging, and thanks for (unwittingly) helping 
to iron out SOLR-215.


ryan





[jira] Commented: (SOLR-215) Multiple Solr Cores - remove static singleton

2007-10-12 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534433
 ] 

Ryan McKinley commented on SOLR-215:


Mike Klaas points out a BIG BAD problem with this patch:
http://www.nabble.com/Deprecations-and-SolrConfig-patch-tf4611038.html

The token filter interface keeps:
@Deprecated
public void init(Map<String, String> args) {
  log.warning("calling the deprecated form of init; should be calling init(SolrConfig solrConfig, Map<String,String> args)");
  this.args = args;
}

but this is never called, so it only tricks us into thinking it is backwards 
compatible. 

Options:
1. Break the API -- at least no one would get fooled into thinking it works

2. Add some hacky bits to IndexSchema readTokenFilterFactory that first call 
the deprecated init, then call the 'real' one -- and make some clear statements 
somewhere about how this works and how it will go away.

I don't have time to look at this for another week or so, but it is very 
important.  Henri, if you have some time, it would be great if you could take a 
look at some options.

ryan



Re: CommonsHttpSolrServer and multithread

2007-10-12 Thread Ryan McKinley



I could build one CommonsHttpSolrServer for each query, or I could build
just one, put it in a singleton and reuse it.



either way.  Solrj uses MultiThreadedHttpConnectionManager.



The SolrJ I'm using is dated 2007-09-24 (downloaded from Hudson); the httpclient
lib used is 3.1 (the very one that came with that SolrJ)



Try with a nightly after Oct 5 and see what happens:
http://svn.apache.org/viewvc?view=rev&revision=582349

If that does not do it, we need to fix something.

ryan


[jira] Resolved: (SOLR-376) Allow backup summary field for highlighting

2007-10-12 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-376.
-

Resolution: Fixed

committed for 1.3

> Allow backup summary field for highlighting
> ---
>
> Key: SOLR-376
> URL: https://issues.apache.org/jira/browse/SOLR-376
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Mike Klaas
>Assignee: Mike Klaas
>Priority: Minor
> Fix For: 1.3
>
> Attachments: alternate_field_highlight.patch
>
>
> Allow the specification of an alternate field to use as the highlight field 
> if no snippets are generated.
> e.g.
> f.contents.hl.alternateField=backupSummary




[jira] Updated: (SOLR-375) SpellCheckerRequestHandler improvements to handle multiWords and identify if a word is spelled correctly

2007-10-12 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas updated SOLR-375:


Fix Version/s: (was: 1.2)
   1.3

> SpellCheckerRequestHandler improvements to handle multiWords and identify if 
> a word is spelled correctly
> 
>
> Key: SOLR-375
> URL: https://issues.apache.org/jira/browse/SOLR-375
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.2
> Environment: Tested using: Windows XP, Apache TomCat v5.5.23, Java 
> JDK 1.5.0_12, Solr v1.2
>Reporter: Scott Tabar
> Fix For: 1.3
>
> Attachments: JIRA_SOLR-375.diff
>
>
> The current implementation of SpellCheckerRequestHandler has some limitations:
> 1. It does not identify if a word is spelled correctly (a match in its index) 
>   a. If a word is spelled correctly, the correct spelling is not included in 
> the suggestion list, so the suggestions cannot be used to deduce if the word 
> is correct
>   b. If the word does not exist in the index and there are no suggestions, 
> the suggestion list is empty
> 2. No support for multiple words
> I have made some changes to this class that address these limitations:
> 1. The key-value pair exists=true/false has been added to provide a clear 
> indication of whether the word is in the index or not
> 2. The key-value pair words=_words_to_be_checked_ identifies the original 
> word(s) that were checked and what the suggestion list is for.  This 
> becomes more important for the support of multiple words.
> 3. If a parameter on the query string exists with the value 
> multiWords=true, then support for multiple words is enabled.
>   a. Multiple words are defined by the value of q and are separated by either 
> a space or +
>   b. Each word has its own entry in a NamedList object so as to group all 
> result attributes back to that word: words=, exists=, and suggestions=
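Based on that description, a multiWords response might look roughly like the following (the nesting and element names are a best guess from the keys described above, not actual output):

```xml
<lst name="result">
  <lst name="word0">
    <str name="words">word0</str>
    <str name="exists">true</str>
    <arr name="suggestions"/>
  </lst>
  <lst name="wrod1">
    <str name="words">wrod1</str>
    <str name="exists">false</str>
    <arr name="suggestions">
      <str>word1</str>
    </arr>
  </lst>
</lst>
```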
>  
> My intended goal is that these changes should not affect existing 
> implementations of the spell checker within Solr.
> The format of the multiWords support should be easy to support and use 
> within Prototype if the output type is JSON.
> I have made the changes.  I still need to do some basic testing to ensure all 
> is working as it is intended, then I will commit to SVN (within 24 hours?).  
> When I commit, I will also add more JavaDocs to the class, and also try to 
> attach more comments to this JIRA.
>  




Build failed in Hudson: Solr-Nightly #228

2007-10-12 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/228/changes

Changes:

[klaas] SOLR-376: hl.alternateField

--
[...truncated 999 lines...]
A client/ruby/solr-ruby/test/unit/spellchecker_request_test.rb
AU client/ruby/solr-ruby/test/unit/data_mapper_test.rb
AU client/ruby/solr-ruby/test/unit/util_test.rb
A client/ruby/solr-ruby/test/functional
A client/ruby/solr-ruby/test/functional/test_solr_server.rb
A client/ruby/solr-ruby/test/functional/server_test.rb
A client/ruby/solr-ruby/test/conf
AU client/ruby/solr-ruby/test/conf/schema.xml
A client/ruby/solr-ruby/test/conf/protwords.txt
A client/ruby/solr-ruby/test/conf/stopwords.txt
AU client/ruby/solr-ruby/test/conf/solrconfig.xml
A client/ruby/solr-ruby/test/conf/scripts.conf
A client/ruby/solr-ruby/test/conf/admin-extra.html
A client/ruby/solr-ruby/test/conf/synonyms.txt
A client/ruby/solr-ruby/LICENSE.txt
A client/ruby/solr-ruby/Rakefile
A client/ruby/solr-ruby/script
AU client/ruby/solr-ruby/script/setup.rb
AU client/ruby/solr-ruby/script/solrshell
A client/ruby/solr-ruby/lib
A client/ruby/solr-ruby/lib/solr
AU client/ruby/solr-ruby/lib/solr/util.rb
A client/ruby/solr-ruby/lib/solr/document.rb
A client/ruby/solr-ruby/lib/solr/exception.rb
AU client/ruby/solr-ruby/lib/solr/indexer.rb
AU client/ruby/solr-ruby/lib/solr/response.rb
AU client/ruby/solr-ruby/lib/solr/connection.rb
A client/ruby/solr-ruby/lib/solr/importer
AU client/ruby/solr-ruby/lib/solr/importer/delimited_file_source.rb
AU client/ruby/solr-ruby/lib/solr/importer/solr_source.rb
AU client/ruby/solr-ruby/lib/solr/importer/array_mapper.rb
AU client/ruby/solr-ruby/lib/solr/importer/mapper.rb
AU client/ruby/solr-ruby/lib/solr/importer/xpath_mapper.rb
A client/ruby/solr-ruby/lib/solr/importer/hpricot_mapper.rb
A client/ruby/solr-ruby/lib/solr/xml.rb
AU client/ruby/solr-ruby/lib/solr/importer.rb
A client/ruby/solr-ruby/lib/solr/field.rb
AU client/ruby/solr-ruby/lib/solr/solrtasks.rb
A client/ruby/solr-ruby/lib/solr/request
A client/ruby/solr-ruby/lib/solr/request/ping.rb
A client/ruby/solr-ruby/lib/solr/request/spellcheck.rb
A client/ruby/solr-ruby/lib/solr/request/select.rb
AU client/ruby/solr-ruby/lib/solr/request/optimize.rb
AU client/ruby/solr-ruby/lib/solr/request/standard.rb
A client/ruby/solr-ruby/lib/solr/request/delete.rb
AU client/ruby/solr-ruby/lib/solr/request/index_info.rb
A client/ruby/solr-ruby/lib/solr/request/update.rb
A client/ruby/solr-ruby/lib/solr/request/dismax.rb
AU client/ruby/solr-ruby/lib/solr/request/modify_document.rb
A client/ruby/solr-ruby/lib/solr/request/add_document.rb
A client/ruby/solr-ruby/lib/solr/request/commit.rb
A client/ruby/solr-ruby/lib/solr/request/base.rb
AU client/ruby/solr-ruby/lib/solr/request.rb
A client/ruby/solr-ruby/lib/solr/response
A client/ruby/solr-ruby/lib/solr/response/ping.rb
A client/ruby/solr-ruby/lib/solr/response/spellcheck.rb
AU client/ruby/solr-ruby/lib/solr/response/select.rb
AU client/ruby/solr-ruby/lib/solr/response/optimize.rb
A client/ruby/solr-ruby/lib/solr/response/standard.rb
A client/ruby/solr-ruby/lib/solr/response/xml.rb
A client/ruby/solr-ruby/lib/solr/response/ruby.rb
A client/ruby/solr-ruby/lib/solr/response/delete.rb
AU client/ruby/solr-ruby/lib/solr/response/index_info.rb
A client/ruby/solr-ruby/lib/solr/response/dismax.rb
AU client/ruby/solr-ruby/lib/solr/response/modify_document.rb
A client/ruby/solr-ruby/lib/solr/response/add_document.rb
A client/ruby/solr-ruby/lib/solr/response/commit.rb
A client/ruby/solr-ruby/lib/solr/response/base.rb
AU client/ruby/solr-ruby/lib/solr.rb
A client/ruby/solr-ruby/CHANGES.yml
A client/ruby/solr-ruby/README
A client/ruby/solr-ruby/examples
A client/ruby/solr-ruby/examples/marc
AU client/ruby/solr-ruby/examples/marc/marc_importer.rb
A client/ruby/solr-ruby/examples/delicious_library
A client/ruby/solr-ruby/examples/delicious_library/sample_export.txt
AU client/ruby/solr-ruby/examples/delicious_library/dl_importer.rb
A client/ruby/solr-ruby/examples/tang
AU client/ruby/solr-ruby/examples/tang/tang_importer.rb
 U
At revision 584174
[trunk] $ /export/home/hudson/tools/ant/apache-ant-1.6.5/bin/ant 
-Dversion=$BUILD_ID -Dtest.junit.output.format=xml nightly
Buildfile: build.xml

init-forrest-entities:
[mkdir] Created dir: 
http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/ws/trunk/build 

compile-common:
[mkdir] Created d

[jira] Updated: (SOLR-370) Add a STX stream transform update handler

2007-10-12 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-370:
--

Attachment: StxUpdateRequestHandler.patch

* Added a unit test
* Thread handling is more robust: if the consumer thread dies, the producing thread 
is interrupted as well now

> Add a STX stream transform update handler
> -
>
> Key: SOLR-370
> URL: https://issues.apache.org/jira/browse/SOLR-370
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.3
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: example-stx.zip, joost-20070718-bin.zip, 
> StxUpdateRequestHandler.patch, StxUpdateRequestHandler.patch, 
> StxUpdateRequestHandler.patch
>
>
> Here is a patch that adds an STX stream transform update handler. This allows 
> feeding custom XML formats to Solr. It is based on the STX transformation 
> engine Joost (http://joost.sourceforge.net/).




[jira] Commented: (SOLR-303) Federated Search over HTTP

2007-10-12 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534268
 ] 

Sharad Agarwal commented on SOLR-303:
-

>>But: I'm still having the problem where multi-valued fields only get one 
>>value returned. During AuxiliaryQPhaseComponent.merge(SolrQueryResponse rsp, 
>>SolrQueryResponse auxPhaseRes), you check whether the field already exists 
>>before adding it, but multi-value fields can exist multiple times.

yeah, maybe I have missed those scenarios. If you have the fix, please feel free 
to update the patch.

>>Also, I'm considering disabling the AuxiliaryQPhase and just letting the 
>>MainQPhase fetch the document fields. All of my documents are small ( < 1k on 
>>average with 10ish fields), so I think making another call across the network 
>>to fetch the remaining fields is probably a waste for our indexes. What do 
>>you think?
Having AuxiliaryQPhase saves primarily on the following counts:
1) fetching doc fields 
2) generating snippets 
3) more-like-this queries etc.
-> for only the merged docs.

From my experience, generating snippets is very CPU intensive, and if the number 
of shards is large there would be a lot of CPU wastage (if snippets are generated 
in MainQPhase) => CPU wastage proportional to (n-1)/n => n being the number of shards
So, having extra network calls saves on CPU; hence there is a trade-off 
between the two.

> Federated Search over HTTP
> --
>
> Key: SOLR-303
> URL: https://issues.apache.org/jira/browse/SOLR-303
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Sharad Agarwal
>Priority: Minor
> Attachments: fedsearch.patch, fedsearch.patch, fedsearch.patch, 
> fedsearch.patch, fedsearch.stu.patch, fedsearch.stu.patch
>
>
> Motivated by http://wiki.apache.org/solr/FederatedSearch
> "Index view consistency between multiple requests" requirement is relaxed in 
> this implementation.
> Does the federated search query side. Update not yet done.
> Tries to achieve:-
> 
> - The client applications are totally agnostic to federated search. The 
> federated search and merging of results happen totally behind the scenes in 
> Solr, in the request handler. The response format remains the same after 
> merging of results.
> The response from each individual shard is deserialized into a SolrQueryResponse 
> object. The collection of SolrQueryResponse objects is merged to produce a 
> single SolrQueryResponse object. This enables using the Response writers as 
> they are, or with minimal change.
> - Efficient query processing, with highlighting and fields being generated 
> only for merged documents. The query is executed in 2 phases. The first phase 
> gets the doc unique keys with sort criteria. The second phase brings all 
> requested fields and highlighting information. This saves a lot of CPU when 
> there are a good number of shards and highlighting info is requested.
> It should be easy to customize the query execution. For example: a user can 
> specify executing the query in just 1 phase. (For some queries, when 
> highlighting info is not required and the number of fields requested is small, 
> this can be more efficient.)
> - Ability to easily overwrite the default Federated capability by appropriate 
> plugins and request parameters. As federated search is performed by the 
> RequestHandler itself, multiple request handlers can easily be pre-configured 
> with different federated search settings in solrconfig.xml
> - Global weight calculation is done by querying the terms' doc frequencies 
> from all shards.
> - Federated search works on Http transport. So individual shard's VIP can be 
> queried. Load-balancing and Fail-over taken care by VIP as usual.
> - Sub-searcher response parsing is a plugin interface. Different 
> implementations could be written based on JSON, XML SAX etc. The current one 
> is based on XML DOM.
> HOW:
> ---
> A new RequestHandler called MultiSearchRequestHandler does the federated 
> search on multiple sub-searchers (referred to as "shards" going forward). It 
> extends RequestHandlerBase. The handleRequestBody method in 
> RequestHandlerBase has been divided into query-building and execute methods. 
> This has been done to calculate global numDocs and docFreqs, and to execute 
> the query efficiently on multiple shards.
> All the "search" request handlers are expected to extend the 
> MultiSearchRequestHandler class in order to enable federated capability for 
> the handler. StandardRequestHandler and DisMaxRequestHandler have been 
> changed to extend this class.
>  
> The federated search kicks in if "shards" is present in the request 
> parameters. Otherwise search is performed as usual on the local index. e.g. 
> shards=local,host1:port1,host2:port2 will search on the local index and 2 
> remote indexes. The search response from all 3 shards

CommonsHttpSolrServer and multithread

2007-10-12 Thread Walter Ferrara
What is the best approach to use solrj CommonsHttpSolrServer for
execution of queries like in
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
server.query(query);
?
I could build one CommonsHttpSolrServer for each query, or I could build
just one, put it in a singleton and reuse it.

The point is, I get exceptions both ways.

When I do standard query tests with a single thread, everything works
fine, although I feel far safer recreating CommonsHttpSolrServer for
each query.
When I do a stress test with 6 or more concurrent threads, I get
exceptions. (Not every query fails, just a very few of them.)

If I recreate one server per query, I may end up with a BindException on
Windows (which, AFAIK, could be a Windows-related problem; take a look
for example at:
http://www.mailinglistarchive.com/[EMAIL PROTECTED]/msg00575.html
), and with a java.net.SocketException: Too many open files on Linux,
which maybe indicates the same issue (no free local ports?).

But if I reuse the same server object, maybe some piece of code inside
the XML parser is not thread-safe (but should CommonsHttpSolrServer be
thread-safe?), and I end up with exceptions like:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[-1,-1]
Message: Element type "int" must be followed by either attribute
specifications, ">" or "/>".
at
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown
Source)
at
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:172)
at
org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:196)
at
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:84)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:239)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:80)
[...]

or

com.sun.org.apache.xerces.internal.xni.XNIException: Scanner State 7 not
Recognized
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver.next(Unknown
Source)
at
com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown
Source)
at
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown
Source)
at
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.setInputSource(Unknown
Source)
at
com.sun.xml.internal.stream.XMLInputFactoryImpl.getXMLStreamReaderImpl(Unknown
Source)
at
com.sun.xml.internal.stream.XMLInputFactoryImpl.createXMLStreamReader(Unknown
Source)
at
org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:67)
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:239)
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:80)
at
org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:99)
[...]
(does it have something in common with SOLR-360?)
I get no such exceptions if I recreate a CommonsHttpSolrServer object
every query, only the BindException.

My environment for testing is Windows (2003), while Solr resides in a
Jetty on a remote machine - but Solr seems not responsible for it - it looks
to me more like an httpclient-related thing for the BindException. I've tried
acting on max-connections-per-host and similar parameters (via
httpclient defaults), but that does not seem to resolve it.

Does httpclient work like a connection pool singleton - i.e., even if each new
CommonsHttpSolrServer creates a new MultiThreadedHttpConnectionManager, does it
just reuse the same connection pool - is that right?

The SolrJ I'm using is dated 2007-09-24 (downloaded from Hudson); the httpclient
lib used is 3.1 (the very one that came with that SolrJ)

Walter
--



Solr nightly build failure

2007-10-12 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build

compile-common:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/common
[javac] Compiling 27 source files to /tmp/apache-solr-nightly/build/common
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/core
[javac] Compiling 226 source files to /tmp/apache-solr-nightly/build/core
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
[javac] Compiling 22 source files to 
/tmp/apache-solr-nightly/build/client/solrj
[javac] Note: 
/tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java
 uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
[javac] Compiling 2 source files to 
/tmp/apache-solr-nightly/build/client/solrj

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 66 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 23.615 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.543 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 5.618 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 3.206 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.16 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.748 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.837 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.556 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.439 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.439 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.927 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.473 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.818 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.029 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.162 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.442 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 4.665 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.061 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.073 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.071 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.058 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.067 sec
[junit] Running org.apache.solr.core.RequestHandlersTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.649 sec
[junit] Running org.apache.solr.core.S