[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)

2008-06-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608668#action_12608668
 ] 

Noble Paul commented on SOLR-561:
-

bq. First we have an active master, some standby masters and search slaves

This looks like a good approach. In the current design I must allow users to 
specify multiple 'masterUrl' values. This should take care of one or more 
standby masters. A slave can automatically fall back to another master if one 
fails. 
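
The fallback idea above could look roughly like the following sketch. The helper name `pick_master`, the `is_alive` probe and the example URLs are illustrative assumptions; none of them come from the SOLR-561 patch.

```python
from typing import Callable, Iterable, Optional


def pick_master(master_urls: Iterable[str],
                is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first configured master that answers the probe, else None.

    Hypothetical helper: URLs are tried in configuration order, so the active
    master should be listed first and the standbys after it.
    """
    for url in master_urls:
        try:
            if is_alive(url):
                return url
        except OSError:
            # Treat a network error as "this master is down" and try the next.
            continue
    return None


# Example with a fake probe standing in for a real HTTP request.
urls = ["http://master1:8983/solr", "http://standby1:8983/solr"]
down = {"http://master1:8983/solr"}
chosen = pick_master(urls, lambda u: u not in down)
```

Trying the URLs in order keeps the configuration simple: whichever master is listed first is preferred whenever it is reachable.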

bq. On active master, there is an index snapshot manager. Whenever there's an 
update, it takes a snapshot. On Windows, it uses copy (I should try fsutil) and 
on Linux it uses hard links. The snapshot manager also cleans up old snapshots. 
From time to time, I still get index corruption when committing an update. When 
that happens, the snapshot manager allows us to roll back to the previous good 
snapshot.

How can I know whether the index got corrupted? If that can be detected, the 
best way to implement this would be to add a command to ReplicationHandler to 
roll back to the latest good snapshot.
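
As a rough illustration of the rollback idea, here is a minimal sketch of a snapshot history with a rollback operation. The class `SnapshotManager`, its `keep` parameter and the snapshot names are hypothetical; this is not the ReplicationHandler command itself, and detecting corruption is left out entirely.

```python
from typing import List, Optional


class SnapshotManager:
    """Keeps an ordered history of snapshot names, oldest first (hypothetical)."""

    def __init__(self, keep: int = 3) -> None:
        self.keep = keep  # number of snapshots to retain, assumed >= 1
        self.snapshots: List[str] = []

    def take(self, name: str) -> None:
        """Record a new snapshot and drop the oldest ones beyond `keep`."""
        self.snapshots.append(name)
        del self.snapshots[:-self.keep]

    def rollback(self) -> Optional[str]:
        """Discard the newest snapshot and return the one to restore, if any."""
        if len(self.snapshots) < 2:
            return None  # nothing older to fall back to
        self.snapshots.pop()
        return self.snapshots[-1]
```

Keeping at least two snapshots is what makes a rollback possible at all, so the clean-up policy and the rollback command have to agree on how much history is retained.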

bq. On active master, there is a replication server component which listens at a 
specific port 

Plain socket communication is more work than relying on the simple HTTP 
protocol. The little extra efficiency you may gain may not justify that (HTTP 
is not too slow either). In this case the servlet container provides you with 
sockets, threads, etc. Take a look at the current patch to see how efficiently 
it is done. 


bq. client creates a tmp directory and hard links everything from its local index 
directory, then for each file in the file list, if it does not exist locally, 
get new file from server; if it is newer than local one, ask server for update 
like rsync; if local files do not exist in file list, delete them. In the case 
where a compound file is used for the index, the file update will update only 
the diff blocks.

The current implementation is more or less like what you have done. For a 
compound file I am not sure a diff-based sync can be more efficient, because 
it is hard to find the similar blocks in the file. I rely on checksums of the 
whole file. If there is an efficient mechanism to identify identical blocks, 
please share the code and I can incorporate it.
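
A whole-file checksum comparison of the kind mentioned above might be sketched as follows. The function names and the choice of MD5 are assumptions made for illustration; the thread does not say which checksum the patch uses.

```python
import hashlib
from typing import Dict, List


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Checksum of a whole file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()  # assumption: any stable checksum would do here
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_fetch(master: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Names whose checksum is missing locally or differs from the master's."""
    return sorted(name for name, checksum in master.items()
                  if local.get(name) != checksum)
```

With this approach a changed compound file is always transferred in full, which is the trade-off against a block-level diff that the comment describes.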

The hard-link approach may not be necessary now, as I made SolrCore not 
hardcode the index folder. 








> Solr replication by Solr (for windows also)
> ---
>
> Key: SOLR-561
> URL: https://issues.apache.org/jira/browse/SOLR-561
> Project: Solr
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 1.3
> Environment: All
>Reporter: Noble Paul
> Attachments: deletion_policy.patch, SOLR-561.patch, SOLR-561.patch
>
>
> The current replication strategy in solr involves shell scripts . The 
> following are the drawbacks with the approach
> *  It does not work with windows
> * Replication works as a separate piece not integrated with solr.
> * Cannot control replication from solr admin/JMX
> * Each operation requires manual telnet to the host
> Doing the replication in java has the following advantages
> * Platform independence
> * Manual steps can be completely eliminated. Everything can be driven from 
> solrconfig.xml .
> ** Adding the url of the master in the slaves should be good enough to enable 
> replication. Other things like frequency of
> snapshoot/snappull can also be configured . All other information can be 
> automatically obtained.
> * Start/stop can be triggered from solr/admin or JMX
> * Can get the status/progress while replication is going on. It can also 
> abort an ongoing replication
> * No need to have a login into the machine 
> This issue can track the implementation of solr replication in java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-609) SpellCheckComponent doesn't read default options from solrconfig.xml

2008-06-26 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-609:
---

Attachment: SOLR-609.patch

Constructs a default SolrParams in the init method, which is used to get the 
default values specified in solrconfig.xml for the onlyMorePopular, count, 
collate and extendedResults parameters.
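
The lookup order this fix implies (request parameter first, configured default second) can be sketched as below. `DefaultedParams` is a hypothetical stand-in written for this illustration, not Solr's SolrParams API.

```python
from typing import Mapping, Optional


class DefaultedParams:
    """Looks a parameter up in the request first, then in configured defaults."""

    def __init__(self, request: Mapping[str, str],
                 defaults: Mapping[str, str]) -> None:
        self.request = request
        self.defaults = defaults

    def get(self, name: str, fallback: Optional[str] = None) -> Optional[str]:
        if name in self.request:
            return self.request[name]  # a value on the URL always wins
        return self.defaults.get(name, fallback)


# Defaults as they might be read from solrconfig.xml (values are examples).
defaults = {"spellcheck.count": "1", "spellcheck.collate": "true"}
params = DefaultedParams({"spellcheck.count": "5"}, defaults)
```

Here `params.get("spellcheck.count")` comes from the request, while `params.get("spellcheck.collate")` falls through to the configured default.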

> SpellCheckComponent doesn't read default options from solrconfig.xml
> 
>
> Key: SOLR-609
> URL: https://issues.apache.org/jira/browse/SOLR-609
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 1.3
> Environment: confirmed on FreeBSD7-stable , nightly 1.3 build 
> (2008-06-25) , jdk1.6.
> I am using the spellchecker called as last-components from my dismax handler.
>Reporter: Norberto Meijome
>Priority: Minor
> Attachments: SOLR-609.patch
>
>
> solrconfig.xml contains :
> [...]
>class="org.apache.solr.handler.component.SpellCheckComponent">
>   
>   
>   false
>   
>   true
>   
>   1
>   
>   true
>   
> [... all default options after this]
> confirmed options .count , collate , extendedResults set in solrconfig.xml 
> take no effect on the query . They work as intended if added to the URL.




[jira] Resolved: (SOLR-607) Commit only request handler for read only slaves

2008-06-26 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-607.
---

Resolution: Duplicate

This issue is very different from SOLR-527 ... it's so incredibly different 
that it's actually exactly the same.

> Commit only request handler for read only slaves
> 
>
> Key: SOLR-607
> URL: https://issues.apache.org/jira/browse/SOLR-607
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Hoss Man
>
> Replication currently requires that the snapinstaller script be able to use 
> curl to hit a URL (/update) to stream a {{<commit/>}} command to.
> To help make it easier to "secure" read only Solr slave instances, we should 
> add a "CommitOnlyRequestHandler" which would ignore all content streams and 
> could be used on slaves in place of XmlUpdateRequestHandler just for 
> triggering a commit to open a new Searcher.




[jira] Commented: (SOLR-561) Solr replication by Solr (for windows also)

2008-06-26 Thread Yajun Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608660#action_12608660
 ] 

Yajun Liu commented on SOLR-561:


I'm using Solr to build a search service for my company. From an operational, 
and maybe a performance, point of view, we need to replicate the index in Java.

At a very high level, my design is similar to what Noble mentioned here. It 
works like this:

1) First we have an active master, some standby masters and search slaves. The 
active master handles crawling data and updating the index; standby masters are 
redundant copies of the active master. If the active master goes away, one of 
the standbys becomes active. Standby masters replicate the index from the 
active master to act as backups; search slaves only replicate the index from 
the active master.

2) On the active master, there is an index snapshot manager. Whenever there's 
an update, it takes a snapshot. On Windows, it uses copy (I should try fsutil) 
and on Linux it uses hard links. The snapshot manager also cleans up old 
snapshots. From time to time, I still get index corruption when committing an 
update. When that happens, the snapshot manager allows us to roll back to the 
previous good snapshot.

3) On the active master, there is a replication server component which listens 
on a specific port. (The reason I did not use the HTTP port is that I do not 
use Solr as it is. I embed Solr in our application server, so going through 
HTTP would not be very efficient for us.) Each standby and slave has a 
replication client component. The following is the protocol between the 
replication client and server:
  a) The client pings a directory server for the location of the active master.
  b) It connects to the active master on the specific port.
  c) Handshake: right now it just checks version and authentication. In the 
future, it will negotiate security, compression, etc.
  d) The client sends the SNAPSHOT_OPEN command followed by the index name. The 
master could manage multiple indexes. The server sends index_not_found if the 
index does not exist, or ok followed by the name of the latest snapshot.
  e) If the index is found, the client compares the snapshot's timestamp with 
that of its local snapshot. The timestamp is derived from the snapshot name, 
because part of the name is an encoded timestamp. If the local snapshot is 
newer, the client tells the server to close the snapshot; otherwise, it asks 
the server for a list of files in the snapshot. If all is well, the server 
sends the ok op, followed by a file list including filename, timestamp, etc.
  f) The client creates a tmp directory and hard links everything from its 
local index directory. Then, for each file in the file list: if it does not 
exist locally, get the new file from the server; if it is newer than the local 
one, ask the server for an rsync-like update. Local files that do not appear in 
the file list are deleted. When a compound file is used for the index, the file 
update transfers only the blocks that differ.
  g) If everything goes well, the client tells the server to close the 
snapshot, renames the tmp directory to its proper place, creates a solr-core 
using the new index, warms up any caches if necessary, routes new requests to 
this solr-core, closes the old solr-core, and removes the old index directory.
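
The per-file decisions in step f) can be summarised in a small sketch. The names `SyncPlan` and `plan_sync`, and the use of plain `{filename: timestamp}` maps, are assumptions made for illustration; the actual wire format of the file list is not given in this comment.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    fetch: List[str]    # files missing locally: download in full
    update: List[str]   # files newer on the server: rsync-style update
    delete: List[str]   # local files absent from the snapshot: remove


def plan_sync(server: Dict[str, float], local: Dict[str, float]) -> SyncPlan:
    """Compare {filename: timestamp} maps for the snapshot and the local index."""
    fetch = sorted(name for name in server if name not in local)
    update = sorted(name for name in server
                    if name in local and server[name] > local[name])
    delete = sorted(name for name in local if name not in server)
    return SyncPlan(fetch, update, delete)
```

Computing the whole plan before touching the tmp directory keeps the comparison separate from the transfer, so the same plan could be logged or reported as replication progress.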

Right now a client replicates the index from the active master every 3 minutes 
for a slowly changing datasource. It works fine because creating a new 
solr-core and warming up the cache takes less than 3 minutes. We plan to use it 
for a fast-changing datasource, where creating a new solr-core and dumping all 
the caches on every update is not feasible. Any suggestions? 


[jira] Commented: (SOLR-607) Commit only request handler for read only slaves

2008-06-26 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608647#action_12608647
 ] 

Sean Timm commented on SOLR-607:


How is this different from SOLR-527?




[jira] Commented: (SOLR-607) Commit only request handler for read only slaves

2008-06-26 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608646#action_12608646
 ] 

Ryan McKinley commented on SOLR-607:


Perhaps this is not the place to discuss this, but...

It takes a long time to get used to the idea that the word "commit" means 
something like "open the index with everything in it". I suppose that once you 
are used to it, you forget how strange it is to "commit" in order to resync the 
index.

CommitOnlyRequestHandler does not _sound_ "secure" to newbies -- but I'm not 
sure what a better name would be.




[jira] Created: (SOLR-609) SpellCheckComponent doesn't read default options from solrconfig.xml

2008-06-26 Thread Norberto Meijome (JIRA)
SpellCheckComponent doesn't read default options from solrconfig.xml


 Key: SOLR-609
 URL: https://issues.apache.org/jira/browse/SOLR-609
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.3
 Environment: confirmed on FreeBSD7-stable , nightly 1.3 build 
(2008-06-25) , jdk1.6.

I am using the spellchecker called as last-components from my dismax handler.
Reporter: Norberto Meijome
Priority: Minor


solrconfig.xml contains :

[...]



false

true

1

true

[... all default options after this]

Confirmed: the options .count, collate and extendedResults set in 
solrconfig.xml have no effect on the query. They work as intended if added to 
the URL.




[jira] Updated: (SOLR-607) Commit only request handler for read only slaves

2008-06-26 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-607:
--

Component/s: update
Summary: Commit only request handler for read only slaves  (was: Commit 
online request handler for read only slaves)

fixing summary




[jira] Created: (SOLR-608) scripts using curl should support authentication params

2008-06-26 Thread Hoss Man (JIRA)
scripts using curl should support authentication params
---

 Key: SOLR-608
 URL: https://issues.apache.org/jira/browse/SOLR-608
 Project: Solr
  Issue Type: Improvement
  Components: replication
Reporter: Hoss Man


All scripts that utilize "curl" should be enhanced such that user 
authentication based params can be specified and used by curl.  This would make 
it possible for people to "secure" their Solr servers using Servlet Container 
authentication features, but still interact with those Solr instances using the 
scripts out of the box.

The most straightforward approach would probably be to add a new "curl_args" 
option in scripts.conf that could contain any legal curl command line 
options and would be prepended to the args for all usages of curl in the Solr 
scripts.
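
To make the proposal concrete, here is a minimal sketch of how a "curl_args" string could be prepended to a curl invocation. The function `build_curl_command` and the example URL are hypothetical; the real scripts are shell scripts, and this only illustrates the argument ordering.

```python
import shlex
from typing import List


def build_curl_command(url: str, data: str, curl_args: str = "") -> List[str]:
    """Build an argv list for curl with the user's extra options placed first."""
    extra = shlex.split(curl_args)  # honours quoting, e.g. --user "me:my pass"
    return ["curl", *extra, url, "--data-binary", data,
            "-H", "Content-Type: text/xml; charset=utf-8"]


command = build_curl_command("http://localhost:8983/solr/update",
                             "<commit/>", '--user "solr:my pass"')
```

Splitting the option string with shell-style quoting rules means a password containing spaces survives as a single argument.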
 




[jira] Created: (SOLR-607) Commit online request handler for read only slaves

2008-06-26 Thread Hoss Man (JIRA)
Commit online request handler for read only slaves
--

 Key: SOLR-607
 URL: https://issues.apache.org/jira/browse/SOLR-607
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man


Replication currently requires that the snapinstaller script be able to use 
curl to hit a URL (/update) to stream a {{<commit/>}} command to.

To help make it easier to "secure" read only Solr slave instances, we should 
add a "CommitOnlyRequestHandler" which would ignore all content streams and 
could be used on slaves in place of XmlUpdateRequestHandler just for triggering 
a commit to open a new Searcher.
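
The behaviour proposed here can be sketched in a few lines. `CommitOnlyHandler`, its `handle` method and the response keys are hypothetical and written only for illustration; they are not Solr's request handler API.

```python
class CommitOnlyHandler:
    """Triggers a commit and never reads the request body (hypothetical)."""

    def __init__(self, commit) -> None:
        self._commit = commit  # callable that opens a new searcher

    def handle(self, content_streams=()) -> dict:
        # Deliberately ignore every content stream: a read-only slave should
        # not accept adds or deletes, whatever the client sends.
        ignored = len(list(content_streams))
        self._commit()
        return {"status": 0, "ignoredStreams": ignored}


calls = []
handler = CommitOnlyHandler(lambda: calls.append("commit"))
result = handler.handle(["<add><doc>...</doc></add>"])
```

Because the handler counts the streams without parsing them, a posted `<add>` has no effect on the index; only the commit runs.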





[jira] Commented: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Geoffrey Young (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608614#action_12608614
 ] 

Geoffrey Young commented on SOLR-606:
-

I'm not in charge of any of the environments, so it might take me some time to 
apply the patch.  hopefully I'll be able to report back tomorrow.

if it matters, my spelling field is defined as so:


  


  
  


  


my spellcheck component configuration was straight from the docs, save changing 
the queryAnalyzerFieldType to match the above.

> spellcheck.colate doesn't handle multiple tokens properly
> -
>
> Key: SOLR-606
> URL: https://issues.apache.org/jira/browse/SOLR-606
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 1.3
> Environment: tomcat
>Reporter: Geoffrey Young
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-606.patch
>
>
> originally posted as part of SOLR-572:
>   
> https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487
> the new spellcheck.collate feature seems to exhibit some strange behaviors 
> when handed a query with multiple tokens.
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>   "q":"redbull air show"}},
>   "spellcheck":{
>"suggestions":[
>   "redbull",[
>"suggestion",["redbelly"]],
>   "show",[
>"suggestion",["shot"]],
>   "collation","redbelly airshotw"]}}
> {noformat}
> in this case, note the fields are incorrectly concatenated (no space between 
> tokens, left over 'w' from input string)
> {noformat}
> {
>  "responseHeader":{
>   "params":{
>   "q":"redbull air show",
>   "spellcheck.q":"redbull air show"}},
>  "spellcheck":{
>   "suggestions":[
>   "redbull air show",[
>"suggestion",["redbull singers"]],
>   "collation","redbull singersredbull air show"]}}
> {noformat}
> this is slightly different - the suggestions are still concatenated without a 
> space, but the collation is way off.
> --Geoff




[jira] Commented: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608605#action_12608605
 ] 

Grant Ingersoll commented on SOLR-606:
--

Also, can you post your spell check configuration?




[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-606:
-

Priority: Minor  (was: Major)




[jira] Updated: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-606:
-

Attachment: SOLR-606.patch

Can you try this patch and post the results?  It doesn't fix the problem, but 
I'm having a hard time reproducing it and it adds some more output to the 
spellcheck.extendedResults=true option.

Thus, you will need to add extendedResults to your flags.




[jira] Work started: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on SOLR-606 started by Grant Ingersoll.




Re: How to contribute to Solr

2008-06-26 Thread Yonik Seeley
On Thu, Jun 26, 2008 at 5:51 PM, Yajun Liu <[EMAIL PROTECTED]> wrote:
> I'm using solr 1.2 to build a search service at my company. I have
> made some improvements and bug fixing. The largest changes is I wrote
> a java package to allow you to replicate index. The package has a java
> implementation of rsync with some optimization for the index
> replication.

Hi Yajun,
For contributing to Solr, please see
http://wiki.apache.org/solr/HowToContribute

Also note that there has been work done on having Solr do index
replication itself:
https://issues.apache.org/jira/browse/SOLR-561
You probably want to look at that and perhaps start a discussion of
similarities or differences with your approach.

-Yonik


How to contribute to Solr

2008-06-26 Thread Yajun Liu
Hi,

I'm using Solr 1.2 to build a search service at my company. I have
made some improvements and bug fixes. The largest change is that I wrote
a Java package that replicates the index. The package has a Java
implementation of rsync with some optimizations for index
replication.

Please let me know whether I could check it into your source tree.

Thanks.

--Yajun


[jira] Resolved: (SOLR-603) Support Partial Optimizes

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-603.
--

Resolution: Fixed

Committed revision 672031.

> Support Partial Optimizes
> -
>
> Key: SOLR-603
> URL: https://issues.apache.org/jira/browse/SOLR-603
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: SOLR-603.patch, SOLR-603.patch
>
>
> It would be useful if Solr supported Lucene's capability to do partial 
> optimizes.  The associated method on the IndexWriter is 
> [http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/index/IndexWriter.html#optimize(int,%20boolean)]
>  and the variations there-in.




[jira] Assigned: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-606:


Assignee: Grant Ingersoll




[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Geoffrey Young (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608487#action_12608487
 ] 

geoff edited comment on SOLR-572 at 6/26/08 12:26 PM:
---

I'm seeing random weirdness in the collation results.  the same query 
shift-refreshed sometimes yields (in json)


{noformat}
{
  "responseHeader":{
    "params":{
      "spellcheck":"true",
      "q":"redbull air show",
      "qf":"search-en",
      "spellcheck.collate":"true",
      "qt":"dismax",
      "wt":"json",
      "rows":"0"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "redbull",[
        "numFound",1,
        "startOffset",0,
        "endOffset",7,
        "suggestion",["redbelly"]],
      "show",[
        "numFound",1,
        "startOffset",12,
        "endOffset",16,
        "suggestion",["shot"]],
      "collation","redbelly airshotw"]}}
{noformat}

note the "collation" spacing and extraneous 'w'.  a refresh toggles between 
that and what you might expect :

{noformat}
"collation","redbelly air shot"]
{noformat}

UPDATE: opened new issue as SOLR-606

--Geoff

> Spell Checker as a Search Component
> ---
>
> Key: SOLR-572
> URL: https://issues.apache.org/jira/browse/SOLR-572
> Project: Solr
>  Issue Type: New Feature
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, 
> SOLR-572.patch, SOLR-572.patch, SOLR-572.patch
>
>
> http://wiki.apache.org/solr/SpellCheckComponent
> Expose the Lucene contrib SpellChecker as a Search Component. Provide the 
> following features:
> * Allow creating a spell index on a given field and make it possible to have 
> multiple spell indices -- one for each field
> * Give suggestions on a per-field basis
> * Given a multi-word query, give only one consistent suggestion
> * Process the query with the same analyzer specified for the source field and 
> process each token separately
> * Allow the user to specify minimum length for a token (optional)
> Consistency criteria for a multi-word query can consist of the following:
> * Preserve the correct words in the original query as it is
> * Never give duplicate words in a suggestion




[jira] Created: (SOLR-606) spellcheck.colate doesn't handle multiple tokens properly

2008-06-26 Thread Geoffrey Young (JIRA)
spellcheck.colate doesn't handle multiple tokens properly
-

 Key: SOLR-606
 URL: https://issues.apache.org/jira/browse/SOLR-606
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 1.3
 Environment: tomcat
Reporter: Geoffrey Young


originally posted as part of SOLR-572:

  
https://issues.apache.org/jira/browse/SOLR-572?focusedCommentId=12608487#action_12608487

the new spellcheck.collate feature seems to exhibit some strange behaviors when 
handed a query with multiple tokens.

{noformat}
{
  "responseHeader":{
    "params":{
      "q":"redbull air show"}},
  "spellcheck":{
    "suggestions":[
      "redbull",[
        "suggestion",["redbelly"]],
      "show",[
        "suggestion",["shot"]],
      "collation","redbelly airshotw"]}}
{noformat}

in this case, note the fields are incorrectly concatenated (no space between 
tokens, left over 'w' from input string)

{noformat}
{
  "responseHeader":{
    "params":{
      "q":"redbull air show",
      "spellcheck.q":"redbull air show"}},
  "spellcheck":{
    "suggestions":[
      "redbull air show",[
        "suggestion",["redbull singers"]],
      "collation","redbull singersredbull air show"]}}
{noformat}

this is slightly different - the suggestions are still concatenated without a 
space, but the collation is way off.

--Geoff




[jira] Updated: (SOLR-605) Programatically register SolrEventListeners

2008-06-26 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-605:
---

Attachment: SOLR-605-RegisterEventListeners.patch

this adds 
{code:java}
UpdateHandler.java:
  void registerCommitCallback( SolrEventListener listener )
  void registerOptimizeCallback( SolrEventListener listener )

SolrCore.java:
  void registerFirstSearcherListener( SolrEventListener listener )
  void registerNewSearcherListener( SolrEventListener listener )

{code}
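
To illustrate how a caller might use hooks of this shape, here is a minimal, self-contained sketch. It does not use the real Solr classes: CommitListener and UpdateHandlerSketch are hypothetical stand-ins, reduced to a single postCommit() callback, and only the method name registerCommitCallback is taken from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for SolrEventListener, reduced to one callback. */
interface CommitListener {
    void postCommit();
}

/** Hypothetical stand-in for an update handler that accepts listeners at runtime. */
final class UpdateHandlerSketch {
    private final List<CommitListener> commitCallbacks = new ArrayList<>();

    /** Same name as the method listed in the patch summary; the body is an assumption. */
    void registerCommitCallback(CommitListener listener) {
        commitCallbacks.add(listener);
    }

    /** Simulates a commit by notifying every registered listener in registration order. */
    void commit() {
        for (CommitListener listener : commitCallbacks) {
            listener.postCommit();
        }
    }
}

final class RegisterListenerDemo {
    public static void main(String[] args) {
        UpdateHandlerSketch handler = new UpdateHandlerSketch();
        int[] commits = {0};
        // Registered from code, with no solrconfig.xml entry involved.
        handler.registerCommitCallback(() -> commits[0]++);
        handler.commit();
        handler.commit();
        System.out.println("commit callbacks fired: " + commits[0]);
    }
}
```

The point of the sketch is only that a listener can be attached after construction, which is what the solrconfig.xml-only route does not allow.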

> Programatically register SolrEventListeners
> ---
>
> Key: SOLR-605
> URL: https://issues.apache.org/jira/browse/SOLR-605
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
>Priority: Trivial
> Fix For: 1.3
>
> Attachments: SOLR-605-RegisterEventListeners.patch
>
>
> Currently all eventListeners need to be registered via solrconfig.xml -- it 
> would be nice to programmatically register classes for these events too.




[jira] Created: (SOLR-605) Programatically register SolrEventListeners

2008-06-26 Thread Ryan McKinley (JIRA)
Programatically register SolrEventListeners
---

 Key: SOLR-605
 URL: https://issues.apache.org/jira/browse/SOLR-605
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Trivial
 Fix For: 1.3


Currently all eventListeners need to be registered via solrconfig.xml -- it 
would be nice to programmatically register classes for these events too.




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608527#action_12608527
 ] 

Sean Timm commented on SOLR-572:


For what it is worth, here is the code that I used client side before the 
collation feature was available.  I haven't looked at how it is done in this 
patch.  It has some nice features such as delimiting the spelling correction, 
e.g., with HTML bold tags, and preserving the user's initial case on each word.

{code}
// buff collects the highlighted suggestion, rawBuff the plain one
StringBuilder buff = new StringBuilder();
StringBuilder rawBuff = new StringBuilder();
int last = 0;
String userStr = null;
// for each suggestion
for( Suggestion s : suggestions ) {
    // add part before the misspelling
    userStr = userQuery.substring( last, s.startOffset );
    buff.append( userStr );
    rawBuff.append( userStr );
    String suggestion = s.suggestion;
    if( _spellCheckPreserveUserCase ) {
        // copy the case pattern of the user's word onto the suggestion
        userStr = userQuery.substring( s.startOffset, s.endOffset );
        char[] userCh = userStr.toCharArray();
        boolean initialUpper = Character.isUpperCase( userCh[0] );
        boolean allUpper = true;
        for( char c : userCh ) {
            if( Character.isLowerCase( c ) ) {
                allUpper = false;
                break;
            }
        }
        if( allUpper ) {
            suggestion = suggestion.toUpperCase();
        }
        else if( initialUpper ) {
            userCh = suggestion.toCharArray();
            userCh[0] = Character.toUpperCase( userCh[0] );
            suggestion = new String( userCh );
        }
    }
    buff.append( _spellCheckStartHighlight ).append( suggestion )
        .append( _spellCheckEndHighlight );
    rawBuff.append( suggestion );
    last = s.endOffset;
}
// add part after all misspellings
userStr = userQuery.substring( last );
buff.append( userStr );
rawBuff.append( userStr );
if( log().isDebugEnabled() ) {
    log().debug( "Did you mean: " + buff );
    log().debug( "Did you mean link: " + rawBuff );
}
{code}





[jira] Resolved: (SOLR-423) SolrRequestHandler close notification

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-423.
--

Resolution: Fixed

Committed revision 671960.

> SolrRequestHandler close notification
> -
>
> Key: SOLR-423
> URL: https://issues.apache.org/jira/browse/SOLR-423
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-423-CloseHook.patch, SOLR-423.patch, SOLR-423.patch
>
>
> It may be beneficial for implementations of SolrRequestHandler to be notified 
> that the SolrCore is closing so that they can release any resources that they 
> may have open.




[jira] Resolved: (SOLR-585) ResponseBuilder.getQParser() is always null b/c it never gets set

2008-06-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-585.
--

Resolution: Fixed

> ResponseBuilder.getQParser() is always null b/c it never gets set
> -
>
> Key: SOLR-585
> URL: https://issues.apache.org/jira/browse/SOLR-585
> Project: Solr
>  Issue Type: Bug
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
>
> The ResponseBuilder never gets its QParser set.
> I believe the fix is:
> {code}
> Index: src/java/org/apache/solr/handler/component/QueryComponent.java
> ===
> --- src/java/org/apache/solr/handler/component/QueryComponent.java  (revision 660920)
> +++ src/java/org/apache/solr/handler/component/QueryComponent.java  (working copy)
> @@ -80,7 +80,7 @@
>        QParser parser = QParser.getParser(rb.getQueryString(), defType, req);
>        rb.setQuery( parser.getQuery() );
>        rb.setSortSpec( parser.getSort(true) );
> -
> +      rb.setQparser(parser);
>        String[] fqs = req.getParams().getParams(org.apache.solr.common.params.CommonParams.FQ);
>        if (fqs!=null && fqs.length!=0) {
>          List filters = rb.getFilters();
> {code}
> but will test it first!




[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608519#action_12608519
 ] 

Grant Ingersoll commented on SOLR-572:
--

Can you open a new issue to track this?  Looks like a string replace issue on 
the offsets.  We probably should do the collation a bit differently to make 
sure the words fit right.  We'll probably have to right pad or something like 
that.
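
To make the offset problem concrete, here is a small sketch that is consistent with the reported output. It is a guess at the failure mode, not the code in the patch: Correction and CollationSketch are hypothetical names, and the offsets are the ones from the JSON in the report. Replacing left to right without adjusting later offsets yields exactly "redbelly airshotw", because "redbelly" is one character longer than "redbull". Applying the corrections from the highest offset down is one simple way to keep the remaining offsets valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one suggestion with its offsets; not a Solr class. */
final class Correction {
    final int startOffset;
    final int endOffset;
    final String replacement;

    Correction(int startOffset, int endOffset, String replacement) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.replacement = replacement;
    }
}

final class CollationSketch {
    /** Applies corrections in the given order, ignoring that earlier edits shift later offsets. */
    static String collateNaive(String query, List<Correction> corrections) {
        StringBuilder sb = new StringBuilder(query);
        for (Correction c : corrections) {
            sb.replace(c.startOffset, c.endOffset, c.replacement);
        }
        return sb.toString();
    }

    /** Applies corrections from the highest start offset down, so untouched offsets stay valid. */
    static String collate(String query, List<Correction> corrections) {
        List<Correction> sorted = new ArrayList<>(corrections);
        sorted.sort(Comparator.comparingInt((Correction c) -> c.startOffset).reversed());
        StringBuilder sb = new StringBuilder(query);
        for (Correction c : sorted) {
            sb.replace(c.startOffset, c.endOffset, c.replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Correction> corrections = Arrays.asList(
                new Correction(0, 7, "redbelly"),
                new Correction(12, 16, "shot"));
        System.out.println(collateNaive("redbull air show", corrections)); // redbelly airshotw
        System.out.println(collate("redbull air show", corrections));      // redbelly air shot
    }
}
```

If the suggestions happened to be iterated in a different order from one request to the next, the naive version would also alternate between the two outputs, which would match the toggling Geoff describes. That is only a possible explanation.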





[jira] Issue Comment Edited: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606719#action_12606719
 ] 

gsingers edited comment on SOLR-572 at 6/26/08 10:39 AM:


Because of the stupid way it gets initialized as a  
NamedListInitializerWhateverWhatever.  I'm open to alternate  
suggestions on how to do it and take advantage of the resource loader,  
etc.

Every time I go to do initialization stuff in Solr these days I pine  
for Spring, since we are basically re-inventing it, albeit not as  
nicely.

-Grant








[jira] Commented: (SOLR-572) Spell Checker as a Search Component

2008-06-26 Thread Geoffrey Young (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608487#action_12608487
 ] 

Geoffrey Young commented on SOLR-572:
-

I'm seeing random weirdness in the collation results.  the same query 
shift-refreshed sometimes yields (in json)


{noformat}
{
  "responseHeader":{
    "params":{
      "spellcheck":"true",
      "q":"redbull air show",
      "qf":"search-en",
      "spellcheck.collate":"true",
      "qt":"dismax",
      "wt":"json",
      "rows":"0"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "redbull",[
        "numFound",1,
        "startOffset",0,
        "endOffset",7,
        "suggestion",["redbelly"]],
      "show",[
        "numFound",1,
        "startOffset",12,
        "endOffset",16,
        "suggestion",["shot"]],
      "collation","redbelly airshotw"]}}
{noformat}

note the "collation" spacing and extraneous 'w'.  a refresh toggles between 
that and what you might expect :

{noformat}
"collation","redbelly air shot"]
{noformat}

--Geoff





[jira] Commented: (SOLR-469) Data Import RequestHandler

2008-06-26 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608447#action_12608447
 ] 

Shalin Shekhar Mangar commented on SOLR-469:


bq. Patch applies cleanly, tests pass, although I notice several @ignore in 
there.
The @ignore annotations are in TestJdbcDataSource (for lack of a MySQL instance 
to test with) and in TestScriptTransformer (script tests can only be run with 
Java 6, which has a JS ScriptEngine present by default). We can rewrite the test 
with Derby if needed.

bq. Also, I notice several interfaces that have a number of methods on them. 
Have you thought about abstract base classes instead?
Apart from the ones Noble pointed out, there's Evaluator, which users can use to 
extend the power of VariableResolver. The EvaluatorBag provides some generally 
useful implementations. The Context can probably be passed to Evaluator as 
well. Apart from that, I'm not sure if/how they would change in the future. An 
AbstractDataSource can be added -- maybe we can templatize the query as well, in 
addition to the return type.

bq. What relation does the Context have to the HttpDataSource? 
The Context is independent of a data source. It's just extra information which 
is passed along in case someone needs to use it. Most of the implementations do 
not actually use it.

bq. What if I wanted to slurp from a table on the fly?
If you mean passing an SQL query on the fly as a request parameter then no, it 
is not supported. We haven't seen a use-case for it yet -- since schema and 
indexing are well defined in advance and there is no harm in putting the query 
in the configuration. However, if someone really wants to do something like 
that, he/she can pass a full data-config as a request parameter (debug mode) 
which can be executed. The interactive mode uses this approach. An alternate 
approach can be to extend SqlEntityProcessor and override the getQuery method 
to use Context#getRequestParameters and, if an sql param is present, use that 
as the query instead of the SQL in the configuration.

bq. Interactive mode has a bit of a chicken and the egg problem when it comes 
to JDBC, right, in that the Driver needs to be present in Solr/lib right?
Yes, to play interactively while using a JdbcDataSource, one would need to have 
the driver jar present in the class-path beforehand. The interactive mode is 
however independent -- HttpDataSource does not have this limitation (slashdot 
example on the wiki).

bq. In the JDBCDataSource, not sure I follow the connection stuff. Can you 
explain a bit? 
The connection is acquired once and used throughout the import process. It is 
closed if not used for 10 seconds. The idea behind the time-out was to avoid 
the connection getting closed by the server due to inactivity. Apart from 
that scenario, there's very little probability of a connection error happening -- 
and even if it did, we may not have a way to deal with it.
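
A rough sketch of the idle time-out idea, under stated assumptions: IdleReconnectingHolder is a hypothetical name, the resource is any AutoCloseable rather than a real java.sql.Connection, and reopening on the next use after the time-out is my reading of the behaviour, not something taken from the patch. The clock is injected so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder that replaces a resource once it has sat idle for too long. */
final class IdleReconnectingHolder<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private final long maxIdleMillis;
    private final LongSupplier clock;
    private T current;
    private long lastUsed;

    IdleReconnectingHolder(Supplier<T> factory, long maxIdleMillis, LongSupplier clock) {
        this.factory = factory;
        this.maxIdleMillis = maxIdleMillis;
        this.clock = clock;
    }

    /** Returns the held resource, closing and reopening it first if it was idle too long. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (current != null && now - lastUsed > maxIdleMillis) {
            try {
                current.close();
            } catch (Exception e) {
                // The resource is assumed stale already; nothing useful to do here.
            }
            current = null;
        }
        if (current == null) {
            current = factory.get();
        }
        lastUsed = now;
        return current;
    }
}
```

With JDBC, the factory would wrap a call such as DriverManager.getConnection(url) and rethrow its SQLException as an unchecked exception, and the clock would be System::currentTimeMillis with a 10000 ms limit.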



> Data Import RequestHandler
> --
>
> Key: SOLR-469
> URL: https://issues.apache.org/jira/browse/SOLR-469
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Grant Ingersoll
> Fix For: 1.3
>
> Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources 
> into the Solr index. Think of it as an advanced form of the SqlUpload Plugin 
> (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>   - It also takes in a properties file for the data source 
> configuration
> * Given the configuration it can also generate the solr schema.xml
> * It is registered as a RequestHandler which can take two commands 
> do-full-import, do-delta-import
>   - do-full-import - dumps all the data from the Database into the 
> index (based on the SQL query in configuration)
>   - do-delta-import - dumps all the data that has changed since last 
> import. (We assume a modified-timestamp column in tables)
> * It provides an admin page
>   - where we can schedule it to be run automatically at regular 
> intervals
>   - It shows the status of the Handler (idle, full-import, 
> delta-import)




[jira] Commented: (SOLR-469) Data Import RequestHandler

2008-06-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608429#action_12608429
 ] 

Noble Paul commented on SOLR-469:
-

bq. I'd suggest that, instead of relying on MySQL in TestJdbcDataSource, we 
use an embedded Derby or some sort of JDBC mock. I suggest Derby 
mainly b/c it's already ASF and I don't want to bother looking up licenses for 
HSQL or any of the others that might work.

We must remove TestJdbcDataSource if we cannot integrate Derby into the dev 
dependencies. 

bq. Also, I notice several interfaces that have a number of methods on them. 
Have you thought about abstract base classes instead?

Yes and no. A lot of the interfaces, like Context and VariableResolver, are 
never implemented by users. They are kept as interfaces to keep the APIs simple.
The interfaces people need to implement are:
* EntityProcessor: We expect users to extend EntityProcessorBase.
* Transformer: The most commonly implemented interface. I am ambivalent 
regarding this one; I do not know if it will change.
* DataSource: This may be made an abstract class.

bq.What relation does the Context have to the HttpDataSource? 

A DataSource is always created for an entity. The Context is the easiest way to 
get info about the entity. The current DataSources do not use that info, but 
because we have it readily available we just pass it over.

bq.What if I wanted to slurp from a table on the fly?

CachedSqlEntityProcessor already does that. It slurps the table and caches the 
info.

bq.Interactive mode has a bit of a chicken and the egg problem when it comes to 
JDBC, right, in that the Driver needs to be present in Solr/lib right?

Not sure if I got the question. Interactive dev mode does not need the drivers.

bq.In the JDBCDataSource, not sure I follow the connection stuff. Can you 
explain a bit? 
We create connections using DriverManager.getConnection(). The same connection 
is used throughout the indexing, and one connection is created per entity, so 
no pooling is implemented.

A PooledJdbcDataSource impl?








Re: TestDistributedSearch

2008-06-26 Thread Yonik Seeley
On Thu, Jun 26, 2008 at 7:19 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> From time to time, I get:
>   when running ant clean test
>
> I am assuming it is a timing issue.  Is there a different way we could
> create the servers?

Hmmm, that is the first thing that is sent to the servers, so it
probably is that a server hasn't come all the way up yet.  Perhaps for
now the simplest thing would be to sleep a couple of seconds?
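
As a possible alternative to a fixed sleep, a test could poll until the port accepts connections. The sketch below is an assumption on my part, not something in the test today: PortWait and waitForPort are hypothetical names, and an open TCP port only shows that the servlet container is listening, not that the Solr core has finished loading.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortWait {
    /** Polls until host:port accepts a TCP connection, or gives up after timeoutMillis. */
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 250);
                return true; // something is listening
            } catch (IOException notUpYet) {
                try {
                    Thread.sleep(100); // brief pause before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```

A test would call waitForPort for each server before sending the first request and fail fast if it returns false.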

-Yonik


[jira] Commented: (SOLR-469) Data Import RequestHandler

2008-06-26 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608421#action_12608421
 ] 

Grant Ingersoll commented on SOLR-469:
--

Patch applies cleanly, tests pass, although I notice several @ignore in there.  
 Docs look good in my preliminary perusing.  I've only started looking at 
things, and have a lot of reading to catch up on, so please take these first 
comments with a grain of salt, as the English saying goes... 

I'd suggest that, instead of relying on MySQL in TestJdbcDataSource, we 
use an embedded Derby or some sort of JDBC mock.  I suggest Derby mainly b/c 
it's already ASF and I don't want to bother looking up licenses for HSQL or any 
of the others that might work.  

Also, I notice several interfaces that have a number of methods on them.  Have 
you thought about abstract base classes instead?  I know, there is a whole big 
debate over it, and people will argue that if you get the interface exactly 
correct, you should use interfaces.  Nice in theory, but Lucene/Solr experience 
suggests that rarely happens.  Of course, I think the correct way is to 
actually do both, as one can easily decorate an abstract base class with more 
interfaces as needed.  Just food for thought, b/c what's going to quickly 
happen after release is someone is going to need a new method on the DataSource 
or something and then we are going to be stuck doing all kinds of workarounds 
due to back compatibility reasons.  The alternative is to clearly mark all 
Interfaces as being experimental at this point and clearly note that we expect 
them to change.  We may even want to consider both!  The other point, though, 
is contrib packages need not be held to the same standard as core when it comes 
to back compat.
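
A small sketch of the back-compat argument, using hypothetical names (RowSourceBase and EchoRowSource are not classes from the patch). A method added to an abstract base class in a later release can ship with a default body, so a subclass written against the first release still compiles and runs unchanged. Adding the same method to a plain interface would break every existing implementor.

```java
/** Hypothetical base class standing in for something like a DataSource. */
abstract class RowSourceBase {
    /** Part of the first release; every subclass must implement it. */
    abstract String fetch(String query);

    /** Added in a later release; existing subclasses inherit this no-op default. */
    void close() {
        // nothing to release by default
    }
}

/** Written against the first release; it knows nothing about close(). */
final class EchoRowSource extends RowSourceBase {
    @Override
    String fetch(String query) {
        return "rows for: " + query;
    }
}

final class BaseClassDemo {
    public static void main(String[] args) {
        RowSourceBase source = new EchoRowSource();
        System.out.println(source.fetch("select 1"));
        source.close(); // resolves to the inherited default
    }
}
```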

What relation does the Context have to the HttpDataSource?  In other words, the 
DataSource init method takes a Context, meaning the HttpDataSource needs a 
Context, yet in my first glance at the Context, it seems to be DB related.

What if I wanted to slurp from a table on the fly?  That is, I want to send in 
a select statement in my request and let the columns line up where they may 
field-wise (i.e. via dynamic fields, or by relying on something like 
select id, colA as fieldA, colB as fieldB from MyTable; ).
Is that possible?  
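
A rough sketch of what I mean by letting the columns line up, using only plain 
JDBC.  ColumnLabelMapper and rowToFields are hypothetical names, not something 
in the patch.  The idea is that ResultSetMetaData.getColumnLabel returns the 
"as" alias when one is given, so the alias can serve directly as the field 
name.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the SOLR-469 patch.
class ColumnLabelMapper {
    // Turns the current row of the ResultSet into fieldName -> value pairs.
    // The caller is expected to have positioned the cursor with rs.next().
    static Map<String, Object> rowToFields(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        Map<String, Object> fields = new LinkedHashMap<>();
        // JDBC column indexes start at 1.
        for (int i = 1; i <= md.getColumnCount(); i++) {
            // getColumnLabel gives the SQL "as" alias if the query has one.
            fields.put(md.getColumnLabel(i), rs.getObject(i));
        }
        return fields;
    }
}
```

Whether the resulting names match real or dynamic fields in the schema would 
still have to be checked somewhere; this sketch does not attempt that.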

Interactive mode has a bit of a chicken-and-egg problem when it comes to JDBC, 
right, in that the Driver needs to be present in Solr/lib?  So, one can 
currently only interactively configure a JDBC DataSource if the driver is 
already in lib and loaded by the ClassLoader.   If you haven't already, it 
might actually be useful to show what drivers are present by using the 
DriverManager.
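
Something along these lines is what I have in mind for listing the drivers.  
It is only a sketch: LoadedJdbcDrivers is a made-up class name, and how the 
result would be surfaced in the interactive page is left open.  Note that 
DriverManager.getDrivers() only returns drivers the caller's class loader can 
see, so the list may be empty even when a driver jar exists elsewhere.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of the SOLR-469 patch.
class LoadedJdbcDrivers {
    // Returns one "className major.minor" entry per registered JDBC driver
    // that is visible to the calling class loader.
    static List<String> driverClassNames() {
        List<String> names = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            names.add(d.getClass().getName() + " "
                    + d.getMajorVersion() + "." + d.getMinorVersion());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : driverClassNames()) {
            System.out.println(name);
        }
    }
}
```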

In the JDBCDataSource, I'm not sure I follow the connection stuff.  Can you 
explain a bit?  Also, what if I wanted to plug in my own Connection Pooling 
library, as I may already have one set up for other things (if using Solr 
embedded)?




> Data Import RequestHandler
> --
>
> Key: SOLR-469
> URL: https://issues.apache.org/jira/browse/SOLR-469
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Grant Ingersoll
> Fix For: 1.3
>
> Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other dataSources 
> into the Solr index. Think of it as an advanced form of the SqlUpload Plugin 
> (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>   - It also takes in a properties file for the data source 
> configuration
> * Given the configuration it can also generate the solr schema.xml
> * It is registered as a RequestHandler which can take two commands 
> do-full-import, do-delta-import
>   -  do-full-import - dumps all the data from the Database into the 
> index (based on the SQL query in configuration)
>   - do-delta-import - dumps all the data that has changed since last 
> import. (We assume a modified-timestamp column in tables)
> * It provides an admin page
>   - where we can schedule it to be run automatically at regular 
> intervals
>   - It shows the status of the Handler (idle, full-import, 
> delta-import)




TestDistributedSearch

2008-06-26 Thread Grant Ingersoll

From time to time, I get:
 type="org.apache.solr.client.solrj.SolrServerException">org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:359)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:155)
    at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:220)
    at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:114)
    at org.apache.solr.TestDistributedSearch.del(TestDistributedSearch.java:166)
    at org.apache.solr.TestDistributedSearch.doTest(TestDistributedSearch.java:432)
    at org.apache.solr.TestDistributedSearch.testDistribSearch(TestDistributedSearch.java:427)

Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
    at java.net.Socket.connect(Socket.java:520)
    at org.apache.commons.httpclient.protocol.ReflectionSocketFactory.createSocket(ReflectionSocketFactory.java:140)
    at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:125)
    at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:303)



when running ant clean test

I am assuming it is a timing issue.  Is there a different way we could  
create the servers?
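
If it really is a startup race (an assumption, I have not confirmed it), one 
option would be for the test to block until each server's port accepts 
connections before sending the first request.  A minimal sketch of that idea 
follows; PortWaiter and waitForPort are made-up names, not existing test 
utilities, and the 250 ms and 50 ms values are arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical test helper; not part of the current test code.
class PortWaiter {
    // Polls host:port until a TCP connect succeeds or the timeout elapses.
    // Returns true if the port accepted a connection, false otherwise.
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try (Socket s = new Socket()) {
                // Short per-attempt connect timeout so one attempt cannot hang.
                s.connect(new InetSocketAddress(host, port), 250);
                return true;
            } catch (IOException e) {
                // Not listening yet (e.g. "Connection refused"); retry below.
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException ie) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call waitForPort for each server right after starting it and 
fail fast with a clear message if it returns false.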


-Grant