[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)
[ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608668#action_12608668 ]

Noble Paul commented on SOLR-561:
---------------------------------

bq. First we have an active master, some standby masters and search slaves

This looks like a good approach. In the current design I must allow users to specify multiple 'masterUrl' values. That should take care of one or more standby masters: a slave can automatically fall back to another master if one fails.

bq. On active master, there is a index snapshots manager. Whenever there's an update, it takes a snapshot. On window, it uses copy (I should try fsutil) and on linux it uses hard link. The snapshot manager also clean up old snapshots. From time to time, I still got index corruption when commit update. When that happen, snapshot manager allows us to rollback to previous good snapshot.

How can I know whether the index got corrupted? If I can detect it, the best way to implement that would be to add a command to ReplicationHandler to roll back to latest .

bq. On active master, there is a replication server component which listens at a specific port

Plain socket communication is more work than relying on the simple HTTP protocol. The little extra efficiency you may achieve may not justify that (HTTP is not too slow either). In this case the servlet container provides you with sockets, threads and so on. Take a look at the current patch to see how efficiently it is done.

bq. client creates a tmp directory and hard link everything from its local index directory, then for each file in the file list, if it does not exist locally, get new file from server; if it is newer than local one, ask server for update like rsync; if local files do not exist in file list, delete them. in the case of compound file is used for index, the file update will update only diff blocks.

The current implementation is more or less like what you have done.
For a compound file I am not sure a diff-based sync would be more efficient, because it is hard to find the matching blocks in the file. I rely on checksums of the whole file. If there is an efficient mechanism to identify identical blocks, please share the code and I can incorporate it.

The hard-link approach may not be necessary now, as I have changed SolrCore so that it no longer hardcodes the index folder.

> Solr replication by Solr (for windows also)
> -------------------------------------------
>
>                 Key: SOLR-561
>                 URL: https://issues.apache.org/jira/browse/SOLR-561
>             Project: Solr
>          Issue Type: New Feature
>          Components: replication
>    Affects Versions: 1.3
>         Environment: All
>            Reporter: Noble Paul
>         Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>
>
> The current replication strategy in solr involves shell scripts. The
> following are the drawbacks with the approach
> * It does not work with windows
> * Replication works as a separate piece not integrated with solr.
> * Cannot control replication from solr admin/JMX
> * Each operation requires manual telnet to the host
> Doing the replication in java has the following advantages
> * Platform independence
> * Manual steps can be completely eliminated. Everything can be driven from
> solrconfig.xml.
> ** Adding the url of the master in the slaves should be good enough to enable
> replication. Other things like frequency of
> snapshoot/snappull can also be configured. All other information can be
> automatically obtained.
> * Start/stop can be triggered from solr/admin or JMX
> * Can get the status/progress while replication is going on. It can also
> abort an ongoing replication
> * No need to have a login into the machine
> This issue can track the implementation of solr replication in java

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
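The fallback across several master URLs mentioned in the comment above can be illustrated with a minimal sketch. This is not the SOLR-561 patch: it is written in Python purely for illustration, and the function name, the `fetch` callback, and the example URLs are all assumptions.

```python
from typing import Callable, Iterable, Optional, Tuple


def fetch_from_first_available(
    master_urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Try each master URL in order; return (url, payload) from the first that answers."""
    last_error: Optional[Exception] = None
    for url in master_urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # connection refused, timeout, DNS failure, ...
            last_error = exc
    raise ConnectionError("no master reachable") from last_error


def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for an HTTP request; the first master is "down".
    if "master1" in url:
        raise ConnectionRefusedError(url)
    return b"index-version-42"


# Hypothetical URLs: an active master followed by one standby master.
url, payload = fetch_from_first_available(
    ["http://master1:8983/solr/replication", "http://master2:8983/solr/replication"],
    fake_fetch,
)
```

Trying the URLs strictly in configuration order keeps the behaviour predictable; a real implementation would also have to decide when to switch back to the preferred master, which this sketch leaves out.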
[jira] Updated: (SOLR-609) SpellCheckComponent doesn't read default options from solrconfig.xml
[ https://issues.apache.org/jira/browse/SOLR-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-609:
---------------------------------------

    Attachment: SOLR-609.patch

Constructs a default SolrParams in the init method, which is used to get the default values specified in solrconfig.xml for the onlyMorePopular, count, collate and extendedResults parameters.

> SpellCheckComponent doesn't read default options from solrconfig.xml
> --------------------------------------------------------------------
>
>                 Key: SOLR-609
>                 URL: https://issues.apache.org/jira/browse/SOLR-609
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.3
>         Environment: confirmed on FreeBSD7-stable , nightly 1.3 build
> (2008-06-25) , jdk1.6.
> I am using the spellchecker called as last-components from my dismax handler.
>            Reporter: Norberto Meijome
>            Priority: Minor
>         Attachments: SOLR-609.patch
>
>
> solrconfig.xml contains :
> [...]
> class="org.apache.solr.handler.component.SpellCheckComponent">
>
>
> false
>
> true
>
> 1
>
> true
>
> [... all default options after this]
> confirmed options .count , collate , extendedResults set in solrconfig.xml
> take no effect on the query . They work as intended if added to the URL.
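The behaviour the patch aims for (defaults from solrconfig.xml apply unless the request overrides them) can be modelled with a small sketch. This is a Python illustration only, not the SolrParams API; the function name is hypothetical, and the default values below are made up because the original configuration did not survive in the message.

```python
from collections import ChainMap

# Hypothetical defaults, as a component might read them once at init time.
# The values are illustrative only; the real ones were lost from the report.
CONFIG_DEFAULTS = {
    "spellcheck.onlyMorePopular": "false",
    "spellcheck.extendedResults": "true",
    "spellcheck.count": "1",
    "spellcheck.collate": "true",
}


def effective_params(request_params, defaults=CONFIG_DEFAULTS):
    """Return a view in which request parameters shadow the configured defaults."""
    return ChainMap(request_params, defaults)


# The request overrides only spellcheck.count; everything else falls back.
params = effective_params({"q": "redbull air show", "spellcheck.count": "5"})
```

The bug described in the issue corresponds to looking only at `request_params`, so configured values such as `spellcheck.collate` never take effect unless they are repeated in the URL.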
[jira] Resolved: (SOLR-607) Commit only request handler for read only slaves
[ https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-607.
---------------------------

    Resolution: Duplicate

This issue is very different from SOLR-527 ... it's so incredibly different that it's actually exactly the same.

> Commit only request handler for read only slaves
> ------------------------------------------------
>
>                 Key: SOLR-607
>                 URL: https://issues.apache.org/jira/browse/SOLR-607
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Hoss Man
>
> Replication currently requires that the snapinstaller script be able to use
> curl to hit a URL (/update) to stream a {{{}} command to.
> To help make it easier to "secure" read only Solr slave instances, we should
> add a "CommitOnlyRequestHandler" which would ignore all content streams and
> could be used on slaves in place of XmlUpdateRequestHandler just for
> triggering a commit to open a new Searcher.
[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)
[ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608660#action_12608660 ]

Yajun Liu commented on SOLR-561:
--------------------------------

I'm using Solr to build a search service for my company. From an operations, and perhaps a performance, point of view, we need to replicate the index in Java. At a very high level, my design is similar to what Noble mentioned here. It works like this:

1) First we have an active master, some standby masters and search slaves. The active master handles crawling data and updating the index; standby masters are redundant copies of the active master. If the active master goes away, one of the standbys becomes active. Standby masters replicate the index from the active master to act as backups; search slaves only replicate the index from the active master.

2) On the active master there is an index snapshot manager. Whenever there is an update, it takes a snapshot. On Windows it uses copy (I should try fsutil) and on Linux it uses hard links. The snapshot manager also cleans up old snapshots. From time to time I still get index corruption when committing an update. When that happens, the snapshot manager allows us to roll back to the previous good snapshot.

3) On the active master there is a replication server component which listens on a specific port. (The reason I did not use the HTTP port is that I do not use Solr as it is. I embed Solr in our application server, so going through HTTP would not be very efficient for us.) Each standby and slave has a replication client component. The protocol between the replication client and server is as follows:

a) The client pings a directory server for the location of the active master.

b) The client connects to the active master on the specific port.

c) Handshake: right now it only checks the version and authentication. In the future it will negotiate security, compression, etc.

d) The client sends the SNAPSHOT_OPEN command followed by the index name. The master can manage multiple indexes. The server sends index_not_found if the index does not exist, or ok followed by the name of the latest snapshot.

e) If the index is found, the client compares the snapshot's timestamp with that of its local snapshot. The timestamp is derived from the snapshot name, because part of the name is an encoded timestamp. If the local snapshot is newer, the client tells the server to close the snapshot; otherwise it asks the server for the list of files in the snapshot. If all is well, the server sends the ok op followed by a file list including filename, timestamp, etc.

f) The client creates a tmp directory and hard links everything from its local index directory into it. Then, for each file in the file list: if it does not exist locally, the client gets the new file from the server; if the server's copy is newer than the local one, the client asks the server for an rsync-like update. Local files that do not appear in the file list are deleted. When a compound file is used for the index, the file update transfers only the blocks that differ.

g) If everything goes well, the client tells the server to close the snapshot, renames the tmp directory to its proper place, creates a solr-core using the new index, warms up any caches if necessary, routes new requests to this solr-core, closes the old solr-core and removes the old index directory.

Right now a client replicates the index from the active master every 3 minutes, for a slowly changing datasource. It works fine because creating a new solr-core and warming up the caches takes less than 3 minutes. We plan to use it for a fast-changing datasource, where creating a new solr-core and dumping all the caches is not feasible. Any suggestions?
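Step f) of the protocol described above (deciding which files to fetch, update or delete) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Yajun's Java package: the `SyncPlan` type, the `plan_sync` function and the file names and timestamps are all hypothetical, and the rsync-like block transfer itself is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyncPlan:
    fetch: List[str] = field(default_factory=list)   # missing locally
    update: List[str] = field(default_factory=list)  # server copy is newer
    delete: List[str] = field(default_factory=list)  # no longer in the snapshot


def plan_sync(local: Dict[str, int], remote: Dict[str, int]) -> SyncPlan:
    """Compare {filename: timestamp} maps for the local index and the server snapshot."""
    plan = SyncPlan()
    for name, remote_ts in sorted(remote.items()):
        if name not in local:
            plan.fetch.append(name)
        elif remote_ts > local[name]:
            plan.update.append(name)
    # Anything the snapshot no longer lists is stale and can be removed.
    plan.delete = sorted(name for name in local if name not in remote)
    return plan


# Hypothetical file lists: the snapshot adds a segment and rewrites one file.
local = {"segments_2": 100, "_0.cfs": 90, "_1.cfs": 95}
remote = {"segments_3": 130, "_0.cfs": 90, "_1.cfs": 120, "_2.cfs": 125}
plan = plan_sync(local, remote)
```

Computing the whole plan before touching the tmp directory makes it easy to log or abort a replication run, which matches the status and abort goals listed in the issue description.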
[jira] Commented: (SOLR-607) Commit only request handler for read only slaves
[ https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608647#action_12608647 ] Sean Timm commented on SOLR-607: How is this different from SOLR-527?
[jira] Commented: (SOLR-607) Commit only request handler for read only slaves
[ https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608646#action_12608646 ]

Ryan McKinley commented on SOLR-607:
------------------------------------

Perhaps this is not the place to discuss this, but...

It takes a long time to get used to the idea that the word "commit" means something like "open the index with everything in it" -- I suppose once we are used to it, we forget how strange it is to "commit" in order to resync the index.

CommitOnlyRequestHandler does not _sound_ "secure" to newbies -- but I'm not sure what a better name would be.
[jira] Created: (SOLR-609) SpellCheckComponent doesn't read default options from solrconfig.xml
SpellCheckComponent doesn't read default options from solrconfig.xml
--------------------------------------------------------------------

                 Key: SOLR-609
                 URL: https://issues.apache.org/jira/browse/SOLR-609
             Project: Solr
          Issue Type: Bug
          Components: spellchecker
    Affects Versions: 1.3
         Environment: confirmed on FreeBSD7-stable , nightly 1.3 build (2008-06-25) , jdk1.6.
I am using the spellchecker called as last-components from my dismax handler.
            Reporter: Norberto Meijome
            Priority: Minor

solrconfig.xml contains :
[...]
false
true
1
true
[... all default options after this]

confirmed options .count , collate , extendedResults set in solrconfig.xml take no effect on the query . They work as intended if added to the URL.
[jira] Updated: (SOLR-607) Commit only request handler for read only slaves
[ https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-607: -- Component/s: update Summary: Commit only request handler for read only slaves (was: Commit online request handler for read only slaves) fixing summary
[jira] Created: (SOLR-608) scripts using curl should support authentication params
scripts using curl should support authentication params
-------------------------------------------------------

                 Key: SOLR-608
                 URL: https://issues.apache.org/jira/browse/SOLR-608
             Project: Solr
          Issue Type: Improvement
          Components: replication
            Reporter: Hoss Man

All scripts that utilize "curl" should be enhanced such that user authentication based params can be specified and used by curl. This would make it possible for people to "secure" their Solr servers using Servlet Container authentication features, but still interact with those Solr instances using the scripts out of the box.

The most straightforward approach would probably be to add a new "curl_args" option in scripts.conf that could contain any legal curl command line options and would be prepended to the args for all usages of curl in the Solr scripts.
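The proposed "curl_args" option can be sketched as below. This is a Python illustration of the idea only (the real scripts are shell scripts); the function name, the URL, the credentials and the per-call arguments are hypothetical, while `--user` and `--anyauth` are ordinary curl options.

```python
import shlex


def build_curl_command(url, script_args, curl_args=""):
    """Prepend user-configured curl options to the arguments a script already passes."""
    # shlex.split honours shell-style quoting, so options with spaces survive.
    return ["curl", *shlex.split(curl_args), *script_args, url]


# Hypothetical values: curl_args as it might be set in scripts.conf, followed by
# the arguments a script could use to post a commit.
cmd = build_curl_command(
    "http://localhost:8983/solr/update",
    ["-s", "-H", "Content-type:text/xml; charset=utf-8", "-d", "<commit/>"],
    curl_args="--user solr:secret --anyauth",
)
```

Building the command as a list rather than a single string avoids a second round of shell quoting when it is handed to something like `subprocess.run(cmd)`.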
[jira] Created: (SOLR-607) Commit online request handler for read only slaves
Commit online request handler for read only slaves -- Key: SOLR-607 URL: https://issues.apache.org/jira/browse/SOLR-607 Project: Solr Issue Type: New Feature Reporter: Hoss Man Replication currently requires that the snapinstaller script be able to use curl to hit a URL (/update) to stream a {{{}} command to. To help make it easier to "secure" read only Solr slave instances, we should add a "CommitOnlyRequestHandler" which would ignore all content streams and could be used on slaves in place of XmlUpdateRequestHandler just for triggering a commit to open a new Searcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
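The idea of a handler that ignores every content stream and only triggers a commit can be illustrated with a toy model. This is a Python sketch, not Solr's request handler API; `FakeCore`, `CommitOnlyHandler` and their methods are hypothetical names.

```python
class FakeCore:
    """Hypothetical stand-in for a read-only slave's index."""

    def __init__(self):
        self.commits = 0
        self.docs_added = 0

    def add(self, doc):
        self.docs_added += 1

    def commit(self):
        # In Solr terms, this is where a new Searcher would be opened.
        self.commits += 1


class CommitOnlyHandler:
    """Triggers a commit and deliberately never reads the request body."""

    def __init__(self, core):
        self.core = core

    def handle(self, content_streams=()):
        # content_streams is accepted but ignored, so a client cannot use
        # this endpoint to add or delete documents.
        self.core.commit()
        return {"status": 0}


core = FakeCore()
handler = CommitOnlyHandler(core)
response = handler.handle(content_streams=["<add><doc>...</doc></add>"])
```

Because the handler never calls `add`, exposing it on a slave in place of a full update handler leaves the index contents under the control of replication alone.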
[jira] Commented: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608614#action_12608614 ]

Geoffrey Young commented on SOLR-606:
-------------------------------------

I'm not in charge of any of the environments, so it might take me some time to apply the patch. hopefully I'll be able to report back tomorrow.

if it matters, my spelling field is defined as so:

my spellcheck component configuration was straight from the docs, save changing the queryAnalyzerFieldType to match the above.

> spellcheck.colate doesn't handle multiple tokens properly
> ---------------------------------------------------------
>
>                 Key: SOLR-606
>                 URL: https://issues.apache.org/jira/browse/SOLR-606
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 1.3
>         Environment: tomcat
>            Reporter: Geoffrey Young
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
>
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors
> when handed a query with multiple tokens.
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>    "q":"redbull air show"}},
>  "spellcheck":{
>   "suggestions":[
>    "redbull",[
>     "suggestion",["redbelly"]],
>    "show",[
>     "suggestion",["shot"]],
>    "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between
> tokens, left over 'w' from input string)
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>    "q":"redbull air show",
>    "spellcheck.q":"redbull air show"}},
>  "spellcheck":{
>   "suggestions":[
>    "redbull air show",[
>     "suggestion",["redbull singers"]],
>    "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a
> space, but the collation is way off.
> --Geoff
[jira] Commented: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608605#action_12608605 ] Grant Ingersoll commented on SOLR-606: -- Also, can you post your spell check configuration?
[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-606: - Priority: Minor (was: Major)
[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-606: - Attachment: SOLR-606.patch Can you try this patch and post the results? It doesn't fix the problem, but I'm having a hard time reproducing it and it adds some more output to the spellcheck.extendedResults=true option. Thus, you will need to add extendedResults to your flags.
[jira] Work started: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on SOLR-606 started by Grant Ingersoll.
Re: How to contribute to Solr
On Thu, Jun 26, 2008 at 5:51 PM, Yajun Liu <[EMAIL PROTECTED]> wrote:
> I'm using Solr 1.2 to build a search service at my company. I have
> made some improvements and bug fixes. The largest change is a Java
> package I wrote that replicates the index. The package contains a Java
> implementation of rsync with some optimizations for index
> replication.

Hi Yajun,

For contributing to Solr, please see http://wiki.apache.org/solr/HowToContribute

Also note that there has been work done on having Solr do index replication itself: https://issues.apache.org/jira/browse/SOLR-561

You probably want to look at that and perhaps start a discussion of similarities or differences with your approach.

-Yonik
How to contribute to Solr
Hi,

I'm using Solr 1.2 to build a search service at my company. I have made some improvements and bug fixes. The largest change is a Java package I wrote that replicates the index. The package contains a Java implementation of rsync with some optimizations for index replication.

Please let me know whether I could check it into your source tree.

Thanks.

--Yajun
[jira] Resolved: (SOLR-603) Support Partial Optimizes
[ https://issues.apache.org/jira/browse/SOLR-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-603. -- Resolution: Fixed Committed revision 672031. > Support Partial Optimizes > - > > Key: SOLR-603 > URL: https://issues.apache.org/jira/browse/SOLR-603 > Project: Solr > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: SOLR-603.patch, SOLR-603.patch > > > It would be useful if Solr supported Lucene's capability to do partial > optimizes. The associated method on the IndexWriter is > [http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/index/IndexWriter.html#optimize(int,%20boolean)] > and the variations there-in. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
[ https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned SOLR-606: Assignee: Grant Ingersoll
[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608487#action_12608487 ]

geoff edited comment on SOLR-572 at 6/26/08 12:26 PM:
------------------------------------------------------

I'm seeing random weirdness in the collation results. the same query shift-refreshed sometimes yields (in json)

{noformat}
{
 "responseHeader":{
  "params":{
   "spellcheck":"true",
   "q":"redbull air show",
   "qf":"search-en",
   "spellcheck.collate":"true",
   "qt":"dismax",
   "wt":"json",
   "rows":"0"}},
 "response":{"numFound":0,"start":0,"docs":[]
 },
 "spellcheck":{
  "suggestions":[
   "redbull",[
    "numFound",1,
    "startOffset",0,
    "endOffset",7,
    "suggestion",["redbelly"]],
   "show",[
    "numFound",1,
    "startOffset",12,
    "endOffset",16,
    "suggestion",["shot"]],
   "collation","redbelly airshotw"]}}
{noformat}

note the "collation" spacing and extraneous 'w'. a refresh toggles between that and what you might expect :

{noformat}
"collation","redbelly air shot"]
{noformat}

UPDATE: opened new issue as SOLR-606

--Geoff

> Spell Checker as a Search Component
> -----------------------------------
>
>                 Key: SOLR-572
>                 URL: https://issues.apache.org/jira/browse/SOLR-572
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>    Affects Versions: 1.3
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch,
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> http://wiki.apache.org/solr/SpellCheckComponent
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the
> following features:
> * Allow creating a spell index on a given field and make it possible to have
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly
spellcheck.colate doesn't handle multiple tokens properly - Key: SOLR-606 URL: https://issues.apache.org/jira/browse/SOLR-606 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 1.3 Environment: tomcat Reporter: Geoffrey Young originally posted as part of SOLR-572: https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487 the new spellcheck.collate feature seems to exhibit some strange behaviors when handed a query with multiple tokens. {noformat} { "responseHeader":{ "params":{ "q":"redbull air show"}}, "spellcheck":{ "suggestions":[ "redbull",[ "suggestion",["redbelly"]], "show",[ "suggestion",["shot"]], "collation","redbelly airshotw"]}} {noformat} in this case, note the fields are incorrectly concatenated (no space between tokens, left over 'w' from input string) {noformat} { "responseHeader":{ "params":{ "q":"redbull air show", "spellcheck.q":"redbull air show"}}, "spellcheck":{ "suggestions":[ "redbull air show",[ "suggestion",["redbull singers"]], "collation","redbull singersredbull air show"]}} {noformat} this is slightly different - the suggestions are still concatenated without a space, but the collation is way off. --Geoff -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-605) Programatically register SolrEventListeners
[ https://issues.apache.org/jira/browse/SOLR-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-605: --- Attachment: SOLR-605-RegisterEventListeners.patch this adds {code:java} UpdateHandler.java: void registerCommitCallback( SolrEventListener listener ) void registerOptimizeCallback( SolrEventListener listener ) SolrCore.java: void registerFirstSearcherListener( SolrEventListener listener ) void registerNewSearcherListener( SolrEventListener listener ) {code} > Programatically register SolrEventListeners > --- > > Key: SOLR-605 > URL: https://issues.apache.org/jira/browse/SOLR-605 > Project: Solr > Issue Type: New Feature >Reporter: Ryan McKinley >Assignee: Ryan McKinley >Priority: Trivial > Fix For: 1.3 > > Attachments: SOLR-605-RegisterEventListeners.patch > > > Currently all eventListeners need to be registered via solrconfig.xml -- it > would be nice to programatically register classes for these events too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
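The registration pattern described in this patch note can be sketched in isolation. This is only an illustration of the idea, not the patch itself: the method name registerCommitCallback comes from the message above, while CommitListener, UpdateHandlerStandIn and commit() are hypothetical stand-ins so that the sketch compiles without Solr on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class ListenerRegistrationSketch {

    /** Hypothetical stand-in for SolrEventListener; only one callback is modelled. */
    interface CommitListener {
        void postCommit();
    }

    /** Hypothetical stand-in for an update handler that keeps its callbacks in a list. */
    static final class UpdateHandlerStandIn {
        private final List<CommitListener> commitCallbacks = new ArrayList<>();

        /** Registers a listener from code instead of from solrconfig.xml. */
        void registerCommitCallback(CommitListener listener) {
            commitCallbacks.add(listener);
        }

        /** Simulates a commit: every registered listener is notified in order. */
        void commit() {
            for (CommitListener listener : commitCallbacks) {
                listener.postCommit();
            }
        }
    }

    public static void main(String[] args) {
        UpdateHandlerStandIn handler = new UpdateHandlerStandIn();
        handler.registerCommitCallback(() -> System.out.println("commit seen"));
        handler.commit();
    }
}
```

Listeners declared in solrconfig.xml and listeners added this way would presumably end up in the same list, so the code that fires the callbacks would not need to know where a listener came from.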
[jira] Created: (SOLR-605) Programatically register SolrEventListeners
Programatically register SolrEventListeners --- Key: SOLR-605 URL: https://issues.apache.org/jira/browse/SOLR-605 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Assignee: Ryan McKinley Priority: Trivial Fix For: 1.3 Currently all eventListeners need to be registered via solrconfig.xml -- it would be nice to programatically register classes for these events too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608527#action_12608527 ] Sean Timm commented on SOLR-572: For what it is worth, here is the code that I used client side before the collation feature was available. I haven't looked at how it is done in this patch. It has some nice features such as delimiting the spelling correction, e.g., with HTML bold tags, and preserving the user's initial case on each word.
{code}
StringBuilder buff = new StringBuilder();
StringBuilder rawBuff = new StringBuilder();
int last = 0;
String userStr = null;
// for each suggestion
for( Suggestion s : suggestions ) {
  // add part before the misspelling
  userStr = userQuery.substring( last, s.startOffset );
  buff.append( userStr );
  rawBuff.append( userStr );
  String suggestion = s.suggestion;
  if( _spellCheckPreserveUserCase ) {
    // mirror the case pattern of the word the user typed
    userStr = userQuery.substring( s.startOffset, s.endOffset );
    char[] userCh = userStr.toCharArray();
    boolean initialUpper = Character.isUpperCase( userCh[0] );
    boolean allUpper = true;
    for( char c : userCh ) {
      if( Character.isLowerCase( c ) ) {
        allUpper = false;
        break;
      }
    }
    if( allUpper ) {
      suggestion = suggestion.toUpperCase();
    } else if( initialUpper ) {
      userCh = suggestion.toCharArray();
      userCh[0] = Character.toUpperCase( userCh[0] );
      suggestion = new String( userCh );
    }
  }
  // highlighted version for display, raw version for the link
  buff.append( _spellCheckStartHighlight ).append( suggestion )
      .append( _spellCheckEndHighlight );
  rawBuff.append( suggestion );
  last = s.endOffset;
}
// add part after all misspellings
userStr = userQuery.substring( last );
buff.append( userStr );
rawBuff.append( userStr );
if( log().isDebugEnabled() ) {
  log().debug( "Did you mean: " + buff );
  log().debug( "Did you mean link: " + rawBuff );
}
{code}
> Spell Checker as a Search Component > --- > > Key: SOLR-572 > URL: https://issues.apache.org/jira/browse/SOLR-572 > Project: Solr > Issue Type: New Feature > Components: spellchecker >Affects Versions: 1.3 
>Reporter: Shalin Shekhar Mangar >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 1.3 > > Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch > > > http://wiki.apache.org/solr/SpellCheckComponent > Expose the Lucene contrib SpellChecker as a Search Component. Provide the > following features: > * Allow creating a spell index on a given field and make it possible to have > multiple spell indices -- one for each field > * Give suggestions on a per-field basis > * Given a multi-word query, give only one consistent suggestion > * Process the query with the same analyzer specified for the source field and > process each token separately > * Allow the user to specify minimum length for a token (optional) > Consistency criteria for a multi-word query can consist of the following: > * Preserve the correct words in the original query as it is > * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-423) SolrRequestHandler close notification
[ https://issues.apache.org/jira/browse/SOLR-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-423. -- Resolution: Fixed Committed revision 671960. > SolrRequestHandler close notification > - > > Key: SOLR-423 > URL: https://issues.apache.org/jira/browse/SOLR-423 > Project: Solr > Issue Type: Improvement >Affects Versions: 1.3 >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 1.3 > > Attachments: SOLR-423-CloseHook.patch, SOLR-423.patch, SOLR-423.patch > > > It may be beneficial for implementations of SolrRequestHandler to be notified > that the SolrCore is closing so that they can release any resources that they > may have open. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-585) ResponseBuilder.getQParser() is always null b/c it never gets set
[ https://issues.apache.org/jira/browse/SOLR-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-585. -- Resolution: Fixed
> ResponseBuilder.getQParser() is always null b/c it never gets set
> -
>
> Key: SOLR-585
> URL: https://issues.apache.org/jira/browse/SOLR-585
> Project: Solr
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
>
> The ResponseBuilder never gets its QParser set.
> I believe the fix is:
> {code}Index: src/java/org/apache/solr/handler/component/QueryComponent.java
> ===
> --- src/java/org/apache/solr/handler/component/QueryComponent.java (revision 660920)
> +++ src/java/org/apache/solr/handler/component/QueryComponent.java (working copy)
> @@ -80,7 +80,7 @@
>    QParser parser = QParser.getParser(rb.getQueryString(), defType, req);
>    rb.setQuery( parser.getQuery() );
>    rb.setSortSpec( parser.getSort(true) );
> -
> +  rb.setQparser(parser);
>    String[] fqs = req.getParams().getParams(org.apache.solr.common.params.CommonParams.FQ);
>    if (fqs!=null && fqs.length!=0) {
>      List filters = rb.getFilters();
> {code}
> but will test it first!
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608519#action_12608519 ] Grant Ingersoll commented on SOLR-572: -- Can you open a new issue to track this? Looks like a string replace issue on the offsets. We probably should do the collation a bit differently to make sure the words fit right. We'll probably have to right pad or something like that. > Spell Checker as a Search Component > --- > > Key: SOLR-572 > URL: https://issues.apache.org/jira/browse/SOLR-572 > Project: Solr > Issue Type: New Feature > Components: spellchecker >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 1.3 > > Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch > > > http://wiki.apache.org/solr/SpellCheckComponent > Expose the Lucene contrib SpellChecker as a Search Component. Provide the > following features: > * Allow creating a spell index on a given field and make it possible to have > multiple spell indices -- one for each field > * Give suggestions on a per-field basis > * Given a multi-word query, give only one consistent suggestion > * Process the query with the same analyzer specified for the source field and > process each token separately > * Allow the user to specify minimum length for a token (optional) > Consistency criteria for a multi-word query can consist of the following: > * Preserve the correct words in the original query as it is > * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
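Grant's diagnosis (a string replace issue on the offsets) can be illustrated with a small sketch. It is not taken from the SOLR-572 patch; the Fix class and both method names are hypothetical. With the offsets reported for "redbull air show", applying the replacements front to back on one buffer without adjusting later offsets yields exactly the "redbelly airshotw" string from the report, because the first replacement lengthens the text by one character. Applying them from the highest start offset down is one simple way to keep the remaining offsets valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CollationSketch {

    /** Hypothetical holder for one suggestion: offsets into the original query. */
    static final class Fix {
        final int start;
        final int end;
        final String replacement;

        Fix(int start, int end, String replacement) {
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Replaces in list order; later offsets go stale once the length changes. */
    static String naive(String query, List<Fix> fixes) {
        StringBuilder sb = new StringBuilder(query);
        for (Fix f : fixes) {
            sb.replace(f.start, f.end, f.replacement);
        }
        return sb.toString();
    }

    /** Replaces from the last offset to the first, so earlier offsets stay valid. */
    static String collate(String query, List<Fix> fixes) {
        List<Fix> sorted = new ArrayList<>(fixes);
        sorted.sort(Comparator.comparingInt((Fix f) -> f.start).reversed());
        StringBuilder sb = new StringBuilder(query);
        for (Fix f : sorted) {
            sb.replace(f.start, f.end, f.replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Fix> fixes = Arrays.asList(new Fix(0, 7, "redbelly"), new Fix(12, 16, "shot"));
        System.out.println(naive("redbull air show", fixes));   // redbelly airshotw
        System.out.println(collate("redbull air show", fixes)); // redbelly air shot
    }
}
```

The sketch assumes the suggestion spans do not overlap. It says nothing about the second case in SOLR-606 (the one using spellcheck.q), where the collation goes wrong in a different way.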
[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606719#action_12606719 ] gsingers edited comment on SOLR-572 at 6/26/08 10:39 AM: Because of the stupid way it gets initialized as a NamedListInitializerWhateverWhatever. I'm open to alternate suggestions on how to do it and take advantage of the resource loader, etc. Every time I go to do initialization stuff in Solr these days I pine for Spring, since we are basically re-inventing it, albeit not as nicely. -Grant was (Author: gsingers): Because of the stupid way it gets initialized as a NamedListInitializerWhateverWhatever. I'm open to alternate suggestions on how to do it and take advantage of the resource loader, etc. Every time I go to do initialization stuff in Solr these days I pine for Spring, since we are basically re-inventing it, albeit not as nicely. -Grant -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ > Spell Checker as a Search Component > --- > > Key: SOLR-572 > URL: https://issues.apache.org/jira/browse/SOLR-572 > Project: Solr > Issue Type: New Feature > Components: spellchecker >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 1.3 > > Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch > > > http://wiki.apache.org/solr/SpellCheckComponent > Expose the Lucene contrib SpellChecker as a Search Component. 
Provide the > following features: > * Allow creating a spell index on a given field and make it possible to have > multiple spell indices -- one for each field > * Give suggestions on a per-field basis > * Given a multi-word query, give only one consistent suggestion > * Process the query with the same analyzer specified for the source field and > process each token separately > * Allow the user to specify minimum length for a token (optional) > Consistency criteria for a multi-word query can consist of the following: > * Preserve the correct words in the original query as it is > * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608487#action_12608487 ] Geoffrey Young commented on SOLR-572: - I'm seeing random weirdness in the collation results. the same query shift-refreshed sometimes yields (in json) {noformat} { "responseHeader":{ "params":{ "spellcheck":"true", "q":"redbull air show", "qf":"search-en", "spellcheck.collate":"true", "qt":"dismax", "wt":"json", "rows":"0"}}, "response":{"numFound":0,"start":0,"docs":[] }, "spellcheck":{ "suggestions":[ "redbull",[ "numFound",1, "startOffset",0, "endOffset",7, "suggestion",["redbelly"]], "show",[ "numFound",1, "startOffset",12, "endOffset",16, "suggestion",["shot"]], "collation","redbelly airshotw"]}} {noformat} note the "collation" spacing and extraneous 'w'. a refresh toggles between that and what you might expect : {noformat} "collation","redbelly air shot"] {noformat} --Geoff > Spell Checker as a Search Component > --- > > Key: SOLR-572 > URL: https://issues.apache.org/jira/browse/SOLR-572 > Project: Solr > Issue Type: New Feature > Components: spellchecker >Affects Versions: 1.3 >Reporter: Shalin Shekhar Mangar >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 1.3 > > Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, > SOLR-572.patch, SOLR-572.patch, SOLR-572.patch > > > http://wiki.apache.org/solr/SpellCheckComponent > Expose the Lucene contrib SpellChecker as a Search Component. 
Provide the > following features: > * Allow creating a spell index on a given field and make it possible to have > multiple spell indices -- one for each field > * Give suggestions on a per-field basis > * Given a multi-word query, give only one consistent suggestion > * Process the query with the same analyzer specified for the source field and > process each token separately > * Allow the user to specify minimum length for a token (optional) > Consistency criteria for a multi-word query can consist of the following: > * Preserve the correct words in the original query as it is > * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608447#action_12608447 ] Shalin Shekhar Mangar commented on SOLR-469: bq. Patch applies cleanly, tests pass, although I notice several @ignore in there. The @ignore are present in TestJdbcDataSource (for lack of mysql to test with) and in TestScriptTransformer (script tests can only be run with Java 6 which has a JS ScriptEngine present by default). We can rewrite the test with Derby if needed. bq. Also, I notice several interfaces that have a number of methods on them. Have you thought about abstract base classes instead? Apart from the ones Noble pointed out, there's Evaluator which users can use to extend the power of VariableResolver. The EvaluatorBag provides some generally useful implementations. Probably the context can be passed to Evaluator as well. Apart from that, I'm not sure if/how they would change in the future. An AbstractDataSource can be added -- maybe we can templatize the query as well in addition to the return type. bq. What relation does the Context have to the HttpDataSource? The Context is independent of a data source. It's just extra information which is passed along if someone needs to use. Most of the implementation do not actually use it. bq. What if I wanted to slurp from a table on the fly? If you mean passing an SQL query on the fly as a request parameter then no, it is not supported. We haven't seen a use-case for it yet -- since schema and indexing are well defined in advance and there is no harm in putting the query in the configuration. However, if someone really wants to do something like that, he/she can pass a full data-config as a request parameter (debug mode) which can be executed. The interactive mode uses this approach. 
An alternate approach can be to extend SqlEntityProcessor and override the getQuery method to use Context#getRequestParameters: if the sql param is present, use that as the query instead of the SQL in the configuration. bq. Interactive mode has a bit of a chicken and the egg problem when it comes to JDBC, right, in that the Driver needs to be present in Solr/lib right? Yes, to play interactively while using a JdbcDataSource, one would need to have the driver jar present in the class-path beforehand. The interactive mode itself is independent of this, however -- HttpDataSource does not have this limitation (see the slashdot example on the wiki). bq. In the JDBCDataSource, not sure I follow the connection stuff. Can you explain a bit? The connection is acquired once and used throughout the import process. It is closed if not used for 10 seconds. The idea behind the time-out was to avoid the connection getting closed by the server due to inactivity. Apart from that scenario, there's very little chance of a connection error happening -- and even if it did, we may not have a way to deal with it. > Data Import RequestHandler > -- > > Key: SOLR-469 > URL: https://issues.apache.org/jira/browse/SOLR-469 > Project: Solr > Issue Type: New Feature > Components: update >Affects Versions: 1.3 >Reporter: Noble Paul >Assignee: Grant Ingersoll > Fix For: 1.3 > > Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch > > > We need a RequestHandler Which can import data from a DB or other dataSources > into the Solr index .Think of it as an advanced form of SqlUpload Plugin > (SOLR-103). > The way it works is as follows. 
> * Provide a configuration file (xml) to the Handler which takes in the > necessary SQL queries and mappings to a solr schema > - It also takes in a properties file for the data source > configuraution > * Given the configuration it can also generate the solr schema.xml > * It is registered as a RequestHandler which can take two commands > do-full-import, do-delta-import > - do-full-import - dumps all the data from the Database into the > index (based on the SQL query in configuration) > - do-delta-import - dumps all the data that has changed since last > import. (We assume a modified-timestamp column in tables) > * It provides a admin page > - where we can schedule it to be run automatically at regular > intervals > - It shows the status of the Handler (idle, full-import, > delta-import) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608429#action_12608429 ] Noble Paul commented on SOLR-469: - bq.I'd suggest,that instead of relying on MySQL in TestJdbcDataSource, we instead use and embedded Derby or some sort of JDBC mock. I suggest Derby mainly b/c it's already ASF and I don't want to bother looking up licenses for HSQL or any of the others that might work. We must remove TestJdbcDataSource if we cannot integrate Derby into the dev dependencies. bq.Also, I notice several interfaces that have a number of methods on them. Have you thought about abstract base classes instead? Yes and no. A lot of the interfaces, such as Context and VariableResolver, are never implemented by users. They are kept as interfaces to keep the API simple. The interfaces people need to implement are * EntityProcessor: We expect users to extend EntityProcessorBase * Transformer: The most commonly implemented interface. I am ambivalent about this one; I do not know if it will change * DataSource: This may be made an abstract class bq.What relation does the Context have to the HttpDataSource? A DataSource is always created for an entity. The Context is the easiest way to get info about the entity. The current DataSources do not use that info, but because we have it readily available we just pass it along. bq.What if I wanted to slurp from a table on the fly? CachedSqlEntityProcessor already does that. It slurps the table and caches the info. bq.Interactive mode has a bit of a chicken and the egg problem when it comes to JDBC, right, in that the Driver needs to be present in Solr/lib right? Not sure if I got the question. Interactive dev mode does not need the drivers. bq.In the JDBCDataSource, not sure I follow the connection stuff. Can you explain a bit? We create connections using DriverManager.getConnection(). No pooling, because the same connection is used throughout the indexing. 
one conn is created per entity. So no pooling implemented. A PooledJdbcDataSource impl? > Data Import RequestHandler > -- > > Key: SOLR-469 > URL: https://issues.apache.org/jira/browse/SOLR-469 > Project: Solr > Issue Type: New Feature > Components: update >Affects Versions: 1.3 >Reporter: Noble Paul >Assignee: Grant Ingersoll > Fix For: 1.3 > > Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch > > > We need a RequestHandler Which can import data from a DB or other dataSources > into the Solr index .Think of it as an advanced form of SqlUpload Plugin > (SOLR-103). > The way it works is as follows. > * Provide a configuration file (xml) to the Handler which takes in the > necessary SQL queries and mappings to a solr schema > - It also takes in a properties file for the data source > configuraution > * Given the configuration it can also generate the solr schema.xml > * It is registered as a RequestHandler which can take two commands > do-full-import, do-delta-import > - do-full-import - dumps all the data from the Database into the > index (based on the SQL query in configuration) > - do-delta-import - dumps all the data that has changed since last > import. (We assume a modified-timestamp column in tables) > * It provides a admin page > - where we can schedule it to be run automatically at regular > intervals > - It shows the status of the Handler (idle, full-import, > delta-import) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: TestDistributedSearch
On Thu, Jun 26, 2008 at 7:19 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > From time to time, I get: > when running ant clean test > > I am assuming it is a timing issue. Is there a different way we could > create the servers? Hmmm, that is the first thing that is sent to the servers, so it probably is that a server hasn't come all the way up yet. Perhaps for now the simplest thing would be to sleep a couple of seconds? -Yonik
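As an alternative to a fixed sleep, the test could poll until each server actually accepts connections. The following is a minimal sketch of that idea, not code from TestDistributedSearch; the class name PortWait, the method waitForPort, and the 250 ms / 100 ms intervals are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortWait {

    /**
     * Tries to open a TCP connection to host:port until it succeeds or the
     * timeout elapses. Returns true as soon as one connection attempt succeeds.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 250);
                return true; // something is listening
            } catch (IOException notReadyYet) {
                try {
                    Thread.sleep(100); // back off briefly before retrying
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }

    public static void main(String[] args) {
        // Port 8983 is only an example value.
        System.out.println(waitForPort("127.0.0.1", 8983, 2000));
    }
}
```

A successful connect only shows that the port is open. It does not prove the webapp behind it has finished initializing, so a real test might still need to issue a cheap request and retry on failure.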
[jira] Commented: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608421#action_12608421 ] Grant Ingersoll commented on SOLR-469: -- Patch applies cleanly, tests pass, although I notice several @ignore in there. Docs look good in my preliminary perusing. I've only started looking at things, and have a lot of reading to catch up on, so these first comments, please take with a grain of salt, as the English saying goes... I'd suggest,that instead of relying on MySQL in TestJdbcDataSource, we instead use and embedded Derby or some sort of JDBC mock. I suggest Derby mainly b/c it's already ASF and I don't want to bother looking up licenses for HSQL or any of the others that might work. Also, I notice several interfaces that have a number of methods on them. Have you thought about abstract base classes instead? I know, there is a whole big debate over it, and people will argue that if you get the interface exactly correct, you should use interfaces. Nice in theory, but Lucene/Solr experience suggests that rarely happens. Of course, I think the correct way is to actually do both, as one can easily decorate an abstract base class with more interfaces as needed. Just food for thought, b/c what's going to quickly happen after release is someone is going to need a new method on the DataSource or something and then we are going to be stuck doing all kinds of workarounds due to back compatibility reasons. The alternative is to clearly mark all Interfaces as being experimental at this point and clearly note that we expect them to change. We may even want to consider both! The other point, though, is contrib packages need not be held to the same standard as core when it comes to back compat. What relation does the Context have to the HttpDataSource? 
In other words, the DataSource init method takes a Context, meaning the HttpDataSource needs a Context, yet in my first glance at the Context, it seems to be DB related. What if I wanted to slurp from a table on the fly? That is, I want to send in a select statement in my request and I let the columns line up where they may Field wise (i.e. via dynamic fields or I rely on something like select id, colA as fieldA, colB as fieldB from MyTable; ) Is that possible? Interactive mode has a bit of a chicken and the egg problem when it comes to JDBC, right, in that the Driver needs to be present in Solr/lib right? So, one can currently only interactively configure a JDBC DataSource if the driver is already in lib and loaded by the ClassLoader. If you haven't already, it might actually be useful to show what drivers are present by using the DriverManager. In the JDBCDataSource, not sure I follow the connection stuff. Can you explain a bit? Also, what if I wanted to plug in my own Connection Pooling library, as I may already have one setup for other things (if using Solr embedded)? > Data Import RequestHandler > -- > > Key: SOLR-469 > URL: https://issues.apache.org/jira/browse/SOLR-469 > Project: Solr > Issue Type: New Feature > Components: update >Affects Versions: 1.3 >Reporter: Noble Paul >Assignee: Grant Ingersoll > Fix For: 1.3 > > Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, > SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, > SOLR-469.patch, SOLR-469.patch > > > We need a RequestHandler Which can import data from a DB or other dataSources > into the Solr index .Think of it as an advanced form of SqlUpload Plugin > (SOLR-103). > The way it works is as follows. 
> * Provide a configuration file (xml) to the Handler which takes in the > necessary SQL queries and mappings to a solr schema > - It also takes in a properties file for the data source > configuraution > * Given the configuration it can also generate the solr schema.xml > * It is registered as a RequestHandler which can take two commands > do-full-import, do-delta-import > - do-full-import - dumps all the data from the Database into the > index (based on the SQL query in configuration) > - do-delta-import - dumps all the data that has changed since last > import. (We assume a modified-timestamp column in tables) > * It provides a admin page > - where we can schedule it to be run automatically at regular > intervals > - It shows the status of the Handler (idle, full-import, > delta-import) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
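Grant's suggestion of showing which drivers are present via the DriverManager could look roughly like the sketch below. It uses only the standard java.sql API; the class name LoadedDrivers and the output format are made up for illustration, and nothing here is part of the SOLR-469 patch.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LoadedDrivers {

    /** Returns one "class-name major.minor" entry per registered JDBC driver. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        // getDrivers() returns an Enumeration of the drivers visible to the caller.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName() + " "
                    + driver.getMajorVersion() + "." + driver.getMinorVersion());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : names()) {
            System.out.println(name);
        }
    }
}
```

DriverManager only reports drivers that have been registered and are visible to the calling class loader, so a jar that sits in the lib directory but was never loaded would not appear in this list.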
TestDistributedSearch
From time to time, I get:

org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:359)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:155)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:220)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:114)
    at org.apache.solr.TestDistributedSearch.del(TestDistributedSearch.java:166)
    at org.apache.solr.TestDistributedSearch.doTest(TestDistributedSearch.java:432)
    at org.apache.solr.TestDistributedSearch.testDistribSearch(TestDistributedSearch.java:427)
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
    at java.net.Socket.connect(Socket.java:520)
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:303)

when running ant clean test

I am assuming it is a timing issue. Is there a different way we could create the servers?

-Grant