[jira] Updated: (SOLR-256) Stats via JMX

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-256:
--

Attachment: jmx.patch

Sharad: my concern with your most recent patch is that if a servlet container 
uses its own config to drive programmatic JMX Server creation (jetty plus 
appears to do this based on the example jetty-jmx.xml config file, but i haven't 
actually confirmed this), then Solr won't detect it because it's looking 
explicitly for the system properties.

based on the javadocs your findMBeanServer(null) idea seems right on the money 
... i'm attaching a tweak to your patch that uses this approach, and it seems 
to work great, what do you think?

(was there a reason you decided to look for the properties explicitly instead 
of trying this approach?)

Any JMX experts want to chime in whether we should be doing something 
differently? 
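
A minimal sketch of the findMBeanServer(null) approach discussed above, using only the standard javax.management API. The class and method names are hypothetical and are not taken from jmx.patch; the point is just that passing null returns every MBeanServer already created in the JVM, whether it came from system properties or from a container's own config.

```java
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;

// Hypothetical helper, not the code in jmx.patch.
class FindMBeanServerSketch {

    /**
     * Returns an MBeanServer that already exists in this JVM, or null if
     * none has been created. A null agent id matches every registered
     * server, regardless of how it was created.
     */
    static MBeanServer findExistingServer() {
        List<MBeanServer> servers = MBeanServerFactory.findMBeanServer(null);
        return servers.isEmpty() ? null : servers.get(0);
    }

    public static void main(String[] args) {
        // Stand-in for a servlet container creating its own server.
        MBeanServerFactory.createMBeanServer();
        MBeanServer found = findExistingServer();
        System.out.println("found an existing server: " + (found != null));
    }
}
```

If no server is found, a caller could simply skip JMX registration rather than create a server nobody asked for; that choice is an assumption here, not something the thread settles.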

> Stats via JMX
> -
>
> Key: SOLR-256
> URL: https://issues.apache.org/jira/browse/SOLR-256
> Project: Solr
>  Issue Type: New Feature
>  Components: search, update
>Reporter: Sharad Agarwal
>Priority: Minor
> Attachments: jmx.patch, jmx.patch, jmx.patch, jmx.patch, jmx.patch
>
>
> This patch adds JMX capability to get statistics from all the SolrInfoMBeans.
> The implementation is done in such a way as to minimize code changes. 
> In SolrInfoRegistry, I have overridden Map's put and remove methods to 
> register and unregister SolrInfoMBeans in the MBeanServer. 
> Later on, I am planning to use register and unregister methods in 
> SolrInfoRegistry and remove the getRegistry() method (hiding the map instance 
> from other classes).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-102) Ideas for better highlighting

2007-07-19 Thread Mike Klaas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Klaas resolved SOLR-102.
-

Resolution: Fixed

committed in r557872

> Ideas for better highlighting
> -
>
> Key: SOLR-102
> URL: https://issues.apache.org/jira/browse/SOLR-102
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Mike Klaas
>Assignee: Mike Klaas
>Priority: Minor
> Attachments: regexfrag.patch, RegexFragmenter.java
>
>
> A collection of rough enhancements to the default highlighter. Mostly to be 
> used as ideas for future development.
> RegexFragmenter -> Define a regular expression to indicate "points of 
> interest" in the target text (e.g., beginning/end of sentences).  Fragmenter 
> will attempt to start/end fragments at these locations.




[jira] Resolved: (SOLR-305) add support to analysis tool for working with type names instead of just field names

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-305.
---

Resolution: Fixed
  Assignee: Hoss Man

Committed revision 557870.


> add support to analysis tool for working with type names instead of just field 
> names
> ---
>
> Key: SOLR-305
> URL: https://issues.apache.org/jira/browse/SOLR-305
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Hoss Man
>Priority: Trivial
> Attachments: analysistool.bytype.diff
>
>
> quick little patch to analysis.jsp so people can choose between specifying a 
> field name or a fieldtype name ... may save time when you want to try out a 
> bunch of different analyzer options because you only have to create the field 
> types - not fields that use them.




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-19 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514099
 ] 

Mike Klaas commented on SOLR-139:
-

It is my fault that the DUH2 locking is so hairy to begin with, so I should at 
least review changes to it ;)

With your last change, the locking looks sound.  However, I noticed a few 
things:

This comment is now inaccurate:
+// need to start off with the write lock because we can't aquire
+// the write lock if we need to.

Should openSearcher() call closeSearcher() instead of doing it manually?  It 
looks like searcherHasChanges is not being reset to false.



> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Attachments: getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




Re: dismax catenated token search

2007-07-19 Thread Chris Hostetter

: > Yes, pf should be replaced by a word proximity query that doesn't
: > require all words to match :)

: 2) dismax parameter that throws word catenations into the MaxDisjunction:
:"a b c" would also search for ab and bc.

that doesn't address the inverse problem: when "pain killer" is indexed
but the user searches for "painkiller"

I believe both problems can be solved by using the NgramTokenizer on a
field in the qf ... but i have not tested this.  (i'm not entirely certain
what the NgramTokenizer does with whitespace, so it might actually need
to be a KeywordTokenizer followed by a Filter that strips out interword
whitespace, followed by an NgramTokenFilter ... or something like that.)


-Hoss
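
A rough plain-Java sketch of why the n-gram idea could cover both directions, assuming whitespace is stripped before the n-grams are produced. It does not use the Lucene or Solr analysis classes; the class and method names are made up for illustration, and the gram size of 3 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: stands in for "strip whitespace, then n-gram".
class NgramSketch {

    /** Removes all whitespace, like a filter that strips interword spaces. */
    static String stripWhitespace(String text) {
        return text.replaceAll("\\s+", "");
    }

    /** Returns every character n-gram of length n, in order of position. */
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> split = ngrams(stripWhitespace("pain killer"), 3);
        List<String> joined = ngrams(stripWhitespace("painkiller"), 3);
        // Both forms reduce to the same gram sequence once spaces are gone.
        System.out.println(split.equals(joined));
    }
}
```

Because the two spellings produce identical grams, it would not matter which form was indexed and which was typed, at the cost of a larger index and looser matching.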



[jira] Updated: (SOLR-258) Date based Facets

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-258:
--

Attachment: date_facets.patch

fixed the NOW issue by refactoring the toExternal(toInternal()) logic into 
a new DateField.parseMath(Date,String) method ... a DateMathParser is still 
used internally to deal with the math parsing aspects, but i wanted to leave 
the assumptions about the date format in the DateField class itself.

comments/critique about this approach welcome.

> Date based Facets
> -
>
> Key: SOLR-258
> URL: https://issues.apache.org/jira/browse/SOLR-258
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
> date_facets.patch, date_facets.patch, date_facets.patch, date_facets.patch
>
>
> 1) Allow clients to express concepts like...
> * "give me facet counts per day for every day this month."
> * "give me facet counts per hour for every hour of today."
> * "give me facet counts per hour for every hour of a specific day."
> * "give me facet counts per hour for every hour of a specific day and 
> give me facet counts for the 
>number of matches before that day, or after that day." 
> 2) Return all data in a way that makes it easy to use to build filter queries 
> on those date ranges.




Re: dismax catenated token search

2007-07-19 Thread Yonik Seeley

On 7/19/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:
>
> > Does anyone have a good idea how to go about searching for
> > concatenated tokens?
> >
> > Say that the index has "painkiller" and the user types in
> > "pain killer" (without the quotes).
> >
> > If one were using the standard request handler, the easiest would be
> > to have the client handle it by sending in both variants:
> > pain OR killer OR painkiller
> >  or a variant like
> > "pain killer" OR painkiller
> >
> > But is there any answer when using dismax?
> > Requiring the client to send in pain killer painkiller seems like it
> > may decrease relevance too much if you currently use "pf" (phrase
> > fields) since the phrase "pain killer painkiller" isn't going to match
> > anything.
> >
> > Thoughts?
>
> Yes, pf should be replaced by a word proximity query that doesn't
> require all words to match :)


Some other quick ideas:
1) client issues two separate queries... "pain killer" and
"painkiller" and merges
  results.
2) dismax parameter that throws word catenations into the MaxDisjunction:
  "a b c" would also search for ab and bc.

-Yonik
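
Idea 2 above could be sketched roughly as follows. This is plain Java, not dismax code: the class and method names are hypothetical, and how the extra terms would be added to the query is left open, as it is in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: generates the catenated variants of a query string.
class CatenationSketch {

    /** Joins each pair of adjacent words, so "a b c" yields [ab, bc]. */
    static List<String> adjacentCatenations(String query) {
        String[] words = query.trim().split("\\s+");
        List<String> joined = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            joined.add(words[i] + words[i + 1]);
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(adjacentCatenations("a b c"));
        System.out.println(adjacentCatenations("pain killer"));
    }
}
```

Only adjacent pairs are joined here, matching the "ab and bc" example; joining longer runs such as "abc" would be a separate design choice.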


Re: dismax catenated token search

2007-07-19 Thread Mike Klaas


On 19-Jul-07, at 2:49 PM, Yonik Seeley wrote:

> Does anyone have a good idea how to go about searching for
> concatenated tokens?
>
> Say that the index has "painkiller" and the user types in
> "pain killer" (without the quotes).
>
> If one were using the standard request handler, the easiest would be
> to have the client handle it by sending in both variants:
> pain OR killer OR painkiller
>  or a variant like
> "pain killer" OR painkiller
>
> But is there any answer when using dismax?
> Requiring the client to send in pain killer painkiller seems like it
> may decrease relevance too much if you currently use "pf" (phrase
> fields) since the phrase "pain killer painkiller" isn't going to match
> anything.
>
> Thoughts?


Yes, pf should be replaced by a word proximity query that doesn't  
require all words to match :)


-Mike


[jira] Updated: (SOLR-258) Date based Facets

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-258:
--

Attachment: date_facets.patch

checkpoint...

* renamed pre/post/inner to before/after/between
* added a new facet.date.hardend param (with test additions)

...still need to tackle the "NOW" inconsistency issue.
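
The facet.date.hardend param can be illustrated with a small sketch. It assumes hardend=true cuts the last range off at the requested end, while hardend=false lets the last range run a full gap past it. The class name, the Duration-based gap (a simplification of a date math string), and the sample dates are illustrative only and are not taken from the patch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustration only: computes [low, high) boundaries for date facet ranges.
class DateFacetRangeSketch {

    /** Each element is a two-item array: {rangeStart, rangeEnd}. */
    static List<Instant[]> ranges(Instant start, Instant end, Duration gap, boolean hardend) {
        if (gap.isZero() || gap.isNegative()) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<Instant[]> result = new ArrayList<>();
        Instant low = start;
        while (low.isBefore(end)) {
            Instant high = low.plus(gap);
            if (hardend && high.isAfter(end)) {
                high = end; // truncate the final range at the requested end
            }
            result.add(new Instant[] {low, high});
            low = high;
        }
        return result;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2007-07-19T00:00:00Z");
        Instant end = Instant.parse("2007-07-19T10:00:00Z");
        for (Instant[] r : ranges(start, end, Duration.ofHours(4), true)) {
            System.out.println(r[0] + " TO " + r[1]);
        }
    }
}
```

With a 4 hour gap over a 10 hour window, both settings give three ranges; they differ only in where the last one ends.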

> Date based Facets
> -
>
> Key: SOLR-258
> URL: https://issues.apache.org/jira/browse/SOLR-258
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: date_facets.patch, date_facets.patch, date_facets.patch, 
> date_facets.patch, date_facets.patch, date_facets.patch
>
>
> 1) Allow clients to express concepts like...
> * "give me facet counts per day for every day this month."
> * "give me facet counts per hour for every hour of today."
> * "give me facet counts per hour for every hour of a specific day."
> * "give me facet counts per hour for every hour of a specific day and 
> give me facet counts for the 
>number of matches before that day, or after that day." 
> 2) Return all data in a way that makes it easy to use to build filter queries 
> on those date ranges.




dismax catenated token search

2007-07-19 Thread Yonik Seeley

Does anyone have a good idea how to go about searching for concatenated tokens?

Say that the index has "painkiller" and the user types in
"pain killer" (without the quotes).

If one were using the standard request handler, the easiest would be
to have the client handle it by sending in both variants:
pain OR killer OR painkiller
 or a variant like
"pain killer" OR painkiller

But is there any answer when using dismax?
Requiring the client to send in pain killer painkiller seems like it
may decrease relevance too much if you currently use "pf" (phrase
fields) since the phrase "pain killer painkiller" isn't going to match
anything.

Thoughts?

-Yonik


[jira] Commented: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Paul Sundling (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514003
 ] 

Paul Sundling commented on SOLR-312:


Maven already has a javadoc target included automatically.

> create solrj javadoc in build.xml
> -
>
> Key: SOLR-312
> URL: https://issues.apache.org/jira/browse/SOLR-312
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
> Environment: a new task in build.xml named javadoc-solrj that does 
> pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
> heavily based on the example from the solr core javadoc target.
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: create-solrj-javadoc.patch
>
>





[jira] Resolved: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-312.


Resolution: Fixed

added in rev 557774

thanks Will

> create solrj javadoc in build.xml
> -
>
> Key: SOLR-312
> URL: https://issues.apache.org/jira/browse/SOLR-312
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
> Environment: a new task in build.xml named javadoc-solrj that does 
> pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
> heavily based on the example from the solr core javadoc target.
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: create-solrj-javadoc.patch
>
>





[jira] Resolved: (SOLR-311) wrong quoting in tutorial - fails on windows

2007-07-19 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-311.
---

Resolution: Fixed
  Assignee: Hoss Man


thanks for pointing this out.

FYI: the html and pdf versions are generated from the xml by forrest.

Committed revision 557739.
website sync should be ~30 minutes

> wrong quoting in tutorial - fails on windows
> 
>
> Key: SOLR-311
> URL: https://issues.apache.org/jira/browse/SOLR-311
> Project: Solr
>  Issue Type: Bug
>  Components: documentation
> Environment: Windows XP and likely other windows variants
>Reporter: Paul Sundling
>Assignee: Hoss Man
>Priority: Trivial
>
> java -Ddata=args -Dcommit=no -jar post.jar 'SP2514N'
> and
> java -Ddata=args -jar post.jar 'name:DDR'
> should have their single quotes replaced with double quotes.  Otherwise, it 
> results in the following error on windows command line:
> (sample DOS window FAILS)
> C:\downloads\temp\apache-solr-1.2.0\example\exampledocs>java -Ddata=args -jar 
> post.jar 'name:DDR'
> < was unexpected at this time.
> (sample DOS window WORKS)
> C:\downloads\temp\apache-solr-1.2.0\example\exampledocs>java -Ddata=args -jar 
> post.jar "name:DDR"
> SimplePostTool: version 1.2
> SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, 
> other encodings are not currently supported
> SimplePostTool: POSTing args to http://localhost:8983/solr/update..
> SimplePostTool: COMMITting Solr index changes..
> As demonstrated, double quotes work on windows.  I also tested double 
> quotes in cygwin, and they should presumably work on linux/UNIX as well.
> I started to do a patch, but I see there are three locations where updates 
> might need to be made and I wasn't sure how PDF files were generated, so 
> here's the list of affected source files:
> site/tutorial.html
> site/tutorial.pdf
> src/site/src/documentation/content/xdocs/tutorial.xml




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513972
 ] 

Hoss Man commented on SOLR-308:
---

I understood your data entry/delete reindexing strategy, but i hadn't 
considered the use case of doing a query, and then issuing a followup query to 
get more details about specific items.

As yonik points out, exposing the internal lucene docid would be a bad idea 
since it may change every time an IndexReader is opened ... even if the doc you 
are interested in is still in the index (ie: hasn't been deleted) other 
deletions may have changed its internal id.

i have no objection to adding a FieldType that can generate a UUID on demand for 
use cases like this, but having it ignore the input seems a little sketchy to 
me.  it seems like a better approach would be to have a UUIDFieldType with a 
toInternal() method that tests its input for some marker token (like "NEW" or 
"*") and if it sees that token, generates a new UUID, otherwise it uses the 
literal value.  then you can configure the id field with a defaultValue of 
"NEW" in the schema and any doc without an id will get a unique one, but if 
someone tries to update an existing doc whose id they already know, it will 
still work as well.
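
A minimal sketch of the marker-token idea, assuming "NEW" as the marker. This is a standalone class, not an actual Solr FieldType; the class name and the static toInternal() signature are hypothetical stand-ins for what a UUIDFieldType might do.

```java
import java.util.UUID;

// Illustration only: not a real Solr FieldType subclass.
class UuidMarkerSketch {

    /** Marker value that asks for a freshly generated id (assumed here). */
    static final String MARKER = "NEW";

    /** Generates a UUID for the marker, otherwise keeps the literal value. */
    static String toInternal(String value) {
        if (MARKER.equals(value)) {
            return UUID.randomUUID().toString();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toInternal("NEW"));     // a random UUID each run
        System.out.println(toInternal("doc-42"));  // the literal id is kept
    }
}
```

Keeping literal values untouched is what lets an update to a known id still replace the existing doc.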

> Add a field that generates an unique id when you have none in your data to 
> index
> 
>
> Key: SOLR-308
> URL: https://issues.apache.org/jira/browse/SOLR-308
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique 
> id in your data you want to index.




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513960
 ] 

Ryan McKinley commented on SOLR-308:


The easiest option is to add a UUID when you index the data.  

Another option would be to make this FieldType a plugin and put it in the 'lib' 
directory.

> Add a field that generates an unique id when you have none in your data to 
> index
> 
>
> Key: SOLR-308
> URL: https://issues.apache.org/jira/browse/SOLR-308
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique 
> id in your data you want to index.




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513959
 ] 

Yonik Seeley commented on SOLR-308:
---

Lucene docids are transient (they change when the index changes) - they should 
not be used across different instances of an IndexReader

> Add a field that generates an unique id when you have none in your data to 
> index
> 
>
> Key: SOLR-308
> URL: https://issues.apache.org/jira/browse/SOLR-308
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique 
> id in your data you want to index.




[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513947
 ] 

Thomas Peuss commented on SOLR-308:
---

That would be a good replacement for my problem. From the Lucene docs I see 
that the document id is 32 bits (int). I don't know if the docid "wraps around" 
when this address space is exhausted (I assume not). Or is the docid field 
recomputed on "optimize"?

I will try to add the functionality to see the document id in the response. So 
we can close this issue for now.

> Add a field that generates an unique id when you have none in your data to 
> index
> 
>
> Key: SOLR-308
> URL: https://issues.apache.org/jira/browse/SOLR-308
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique 
> id in your data you want to index.




[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513946
 ] 

Ryan McKinley commented on SOLR-215:


 
> My suggestion is that this be added in phase 2, after Henri's initial changes 
> are committed.
> Does this sound reasonable?
> 

Yes - perhaps getting this checked in without touching handlers or the web-app 
side is a good idea.  It is a little weird since the multi-core aspect would 
only be usable programmatically, but that will make it possible to easily bat 
around a 'core manager' and http design.

The one big question is what to do with the TokenizerFactory API.  

Yonik, how do you suggest upgrading an interface?  The only clean way I can 
think of is to extend the TokenizerFactory interface with a 
'MultiCoreTokenizerFactory' that adds an additional argument.  I don't like it, 
but I don't know the API compatibility rules well enough to know if it is 
required or if it is ok to change the API.



Will - as is, this patch does not let you dynamically change the core.  They 
are statically defined in web.xml.  This will change.

> Multiple Solr Cores
> ---
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
> solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr which also brings 
> multiple indexes capability.
> The patch file to grab is solr-215.patch.zip (see MISC section below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accommodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases, however, where having multiple indexes or multiple 
> cores through Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) and 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web applications, you can have one web application that hosts more than one 
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton by a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) makes them easy to 
> manage.
> You declare one servlet filter mapping per core you want to expose in the 
> web.xml; this allows easy access to each core through a different url. 
> USAGE (example web deployment, patch installed):
> Step0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
> monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core0 index; 2 
> documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core1 index; no 
> documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to index of core0 and the mon*.xml to the index of core1;
> running queries from the admin interface, you can verify indexes have 
> different content. 
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the 2 other.
> SolrCore cor

[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Will Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513912
 ] 

Will Johnson commented on SOLR-215:
---

did anything ever get baked into the patch for handling the core name as a cgi 
param instead of as a url path element?  the email thread we had going didn't 
seem to come to any hard conclusions but i'd like to lobby for it as a part of 
the spec.  i read through the patch but i couldn't quite follow things enough 
to tell.

> Multiple Solr Cores
> ---
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
> solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr which also brings 
> multiple indexes capability.
> The patch file to grab is solr-215.patch.zip (see MISC section below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accommodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases, however, where having multiple indexes or multiple 
> cores through Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) and 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web applications, you can have one web application that hosts more than one 
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton by a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) makes them easy to 
> manage.
> You declare one servlet filter mapping per core you want to expose in the 
> web.xml; this allows easy access to each core through a different url. 
> USAGE (example web deployment, patch installed):
> Step0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
> monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core0 index; 2 
> documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core1 index; no 
> documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to index of core0 and the mon*.xml to the index of core1;
> running queries from the admin interface, you can verify indexes have 
> different content. 
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the 2 other.
> SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
> //Accessing a core:
> SolrCore core = SolrCore.getCore("core0"); 
> PATCH MODIFICATIONS DETAILS (per package):
> org.apache.solr.core:
> The heaviest modifications are in SolrCore & SolrConfig.
> SolrCore is the most obvious modification; instead of a singleton, there is a 
> static map of cores keyed by names and assorted methods. To retain some 
> compatibility, the 'null' named core replaces the singleton for the relevant 
> methods, for instance SolrCore.getCore(). One small constraint on the core 
> name is they can't contain '/' or '\' avoiding 

[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513896
 ] 

Otis Gospodnetic commented on SOLR-215:
---

I didn't even realize this patch would still require cores to be declared 
a priori in static files such as web.xml. 

I think this new multi-core functionality should come with the "core manager" 
handler, as we said here:
https://issues.apache.org/jira/browse/SOLR-215#action_12506920
https://issues.apache.org/jira/browse/SOLR-215#action_12507189

So, something like:
/admin/coremanager?cmd=add&name=foo&schema=foo-schema.xml&config=foo-solrconfig.xml
(this assumes that foo-schema.xml and foo-solrconfig.xml already exist in conf/ 
dir)
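
As a rough illustration of the request shape suggested above, here is a minimal Java sketch that assembles such a URL. The /admin/coremanager handler and its cmd, name, schema and config parameters are only the proposal from this comment, not an existing Solr API; the class and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the "add core" request proposed in this thread.
final class CoreManagerUrl {

    /** URL-encodes one query parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    /** Assumes the schema and config files already exist in the conf/ dir. */
    static String addCore(String name, String schemaFile, String configFile) {
        return "/admin/coremanager?cmd=add"
            + "&name=" + enc(name)
            + "&schema=" + enc(schemaFile)
            + "&config=" + enc(configFile);
    }

    public static void main(String[] args) {
        System.out.println(addCore("foo", "foo-schema.xml", "foo-solrconfig.xml"));
    }
}
```

Encoding each value keeps the request well-formed even if a core or file name contains characters such as spaces or '&'.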

One could also POST this and *include* the *content* of the 2 .xml files.  In 
that case the core manager would be the one writing their content to disk in 
conf/ dir prior to starting the given core.

My suggestion is that this be added in phase 2, after Henri's initial changes 
are committed.
Does this sound reasonable?


> Multiple Solr Cores
> ---
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
> solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr which also brings 
> multiple indexes capability.
> The patch file to grab is solr-215.patch.zip (see MISC section below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accommodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases, however, where having multiple indexes or multiple 
> cores through Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) when you 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web applications, you can have one web application that hosts more than one 
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton with a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) makes them easy to 
> manage.
> You declare one servlet filter mapping per core you want to expose in the 
> web.xml; this allows easy access to each core through a different URL. 
> USAGE (example web deployment, patch installed):
> Step0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml 
> monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml
> Step1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core0 index; 2 
> documents
> Step2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on core1 index; no 
> documents
> Step3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml to index of core0 and the mon*.xml to the index of core1;
> running queries from the admin interface, you can verify indexes have 
> different content. 
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the config and the schema
> SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
> //Accessin

[jira] Updated: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Johnson updated SOLR-312:
--

Attachment: create-solrj-javadoc.patch

simple patch to add new task

> create solrj javadoc in build.xml
> -
>
> Key: SOLR-312
> URL: https://issues.apache.org/jira/browse/SOLR-312
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
> Environment: a new task in build.xml named javadoc-solrj that does 
> pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
> heavily based on the example from the solr core javadoc target.
>Reporter: Will Johnson
>Priority: Minor
> Fix For: 1.3
>
> Attachments: create-solrj-javadoc.patch
>
>





[jira] Created: (SOLR-312) create solrj javadoc in build.xml

2007-07-19 Thread Will Johnson (JIRA)
create solrj javadoc in build.xml
-

 Key: SOLR-312
 URL: https://issues.apache.org/jira/browse/SOLR-312
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.3
 Environment: a new task in build.xml named javadoc-solrj that does 
pretty much what you'd expect.  creates a new folder build/docs/api-solrj.  
heavily based on the example from the solr core javadoc target.
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3







[jira] Commented: (SOLR-308) Add a field that generates an unique id when you have none in your data to index

2007-07-19 Thread Pieter Berkel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513891
 ] 

Pieter Berkel commented on SOLR-308:


From the use case you have provided, it sounds like the unique id will 
change every time you delete and re-insert the document.  If this is the case, 
then perhaps it might be more efficient to use the Lucene document id as your 
unique id value rather than a separate field?  However, as far as I'm aware, 
there currently isn't any way to access the Lucene doc id from Solr (except 
perhaps the Luke request handler)?


> Add a field that generates an unique id when you have none in your data to 
> index
> 
>
> Key: SOLR-308
> URL: https://issues.apache.org/jira/browse/SOLR-308
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: GeneratedId.patch
>
>
> This patch adds a field that generates an unique id when you have no unique 
> id in your data you want to index.




[jira] Resolved: (SOLR-307) NGramFilterFactory and EdgeNGramFilterFactory

2007-07-19 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-307.
---

Resolution: Fixed

Thanks Thomas, this is committed.


> NGramFilterFactory and EdgeNGramFilterFactory
> -
>
> Key: SOLR-307
> URL: https://issues.apache.org/jira/browse/SOLR-307
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: SolrNGramFilters.patch
>
>
> Here is a patch that adds an NGramFilterFactory and EdgeNGramFilterFactory to 
> Solr.




Hudson build is back to normal: Solr-Nightly #147

2007-07-19 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/147/changes




[jira] Commented: (SOLR-215) Multiple Solr Cores

2007-07-19 Thread Henri Biestro (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513829
 ] 

Henri Biestro commented on SOLR-215:


On Otis's comments:
1 & 2- static initializers for lock-related values: you are correct, the code 
was most likely lost in some merge; my bad.
3- SolrInfoRegistry deprecated: you are correct, the functionality is replaced 
by SolrCore.getSolrCore().getInfoRegistry().
4- classLoader not assigned: not sure why it happens but this fixes it...
5- checkName is not subtle: I had the idea of "normalizing" the core name 
(URL-style normalization, for instance) but did not pursue it since it might 
make the replication scripts more complex to modify (i.e. the normalization 
code would need to be duplicated in the scripts). And since the Solaris 
scripts were not completely functional (my dev machine being Solaris), I've 
postponed the task... (I was also "dreaming" about being able to derive from 
SolrCore to benefit from the static map, implement a naming policy that would 
encompass the config & schema name generation, etc...). Anyhow, this can 
indeed be simplified with a regexp match.
6- finalize(): no, I believe finalizing one core should just ensure that this 
core is shut down. This is only for completeness though, since I can't see how 
a core could be gc-ed & finalized before it actually gets shut down & removed 
from the map of cores.
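
To make points 5 and 6 concrete, here is a small Java sketch of a regexp-based name check next to a static map of named cores. It is not the patch's code: Core is a stand-in for SolrCore, the allowed character set is an assumption (the patch description only says names can't contain '/' or '\'), and the 'null'-named compatibility core is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Illustrative sketch only; class, method and field names are hypothetical.
final class CoreRegistrySketch {

    // Assumption: letters, digits, '_', '.' and '-' only, which also excludes '/' and '\'.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_.-]+");

    /** Stand-in for SolrCore; it only tracks whether it has been shut down. */
    static final class Core {
        final String name;
        private volatile boolean closed;

        Core(String name) { this.name = name; }

        boolean isClosed() { return closed; }
    }

    // The static map of cores, keyed by name.
    private static final Map<String, Core> CORES = new ConcurrentHashMap<>();

    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    static Core register(String name) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException("invalid core name: " + name);
        }
        Core core = new Core(name);
        if (CORES.putIfAbsent(name, core) != null) {
            throw new IllegalStateException("core already registered: " + name);
        }
        return core;
    }

    static Core get(String name) {
        return name == null ? null : CORES.get(name);
    }

    /** Removes the core from the map first, then marks it closed. */
    static void shutdown(String name) {
        Core core = get(name);
        if (core != null && CORES.remove(name, core)) {
            core.closed = true;
        }
    }

    public static void main(String[] args) {
        register("core0");
        System.out.println(isValidName("core/0"));
        shutdown("core0");
        System.out.println(get("core0"));
    }
}
```

Because the map holds a strong reference until shutdown removes the entry, a registered core cannot be garbage-collected (and so cannot be finalized) while it is still in the map, which matches the reasoning in point 6.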

On Ryan's comments:
1- factory/init interface compatibility break: I'll look into other ways if 
this is a blocker (ctor, setter or wrap/delegate...). 
2- RequestHandlers know core: SolrUpdateServlet is deprecated but is still 
there; I was just trying to ensure correct/compatible behavior. I agree 
SolrInit is more clutter than necessity but can be dropped easily if there is 
no need to support the SolrUpdateServlet.
3- I do agree that there must be an easier & more functional way to declare and 
access a core than the current one. I'll try the route you describe.
4- Having core "descriptors" (config/schema) as explicit files in a 
$solrhome/cores directory; might use some naming convention to derive the core 
name from them (related to uploading/dynamic creation of cores).
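
For point 4, one possible naming convention could be sketched as below. The <name>-schema.xml / <name>-solrconfig.xml pattern is borrowed from Otis's foo-schema.xml example and is purely an assumption; nothing here reflects an agreed layout for $solrhome/cores, and the class name is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: derives a core name from a descriptor file name.
final class CoreDescriptorNames {

    // Assumed convention: "<coreName>-schema.xml" or "<coreName>-solrconfig.xml".
    private static final Pattern DESCRIPTOR =
        Pattern.compile("(.+)-(schema|solrconfig)\\.xml");

    /** Returns the core name, or empty if the file does not follow the convention. */
    static Optional<String> coreNameOf(String fileName) {
        Matcher m = DESCRIPTOR.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(coreNameOf("foo-schema.xml"));
        System.out.println(coreNameOf("foo-solrconfig.xml"));
        System.out.println(coreNameOf("schema.xml"));
    }
}
```

A scan of the cores directory could group files by the derived name and only create a core once both its schema and config descriptors are present.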

I'm mostly "off the grid" today but I'll try my best on Friday.


> Multiple Solr Cores
> ---
>
> Key: SOLR-215
> URL: https://issues.apache.org/jira/browse/SOLR-215
> Project: Solr
>  Issue Type: Improvement
>Reporter: Henri Biestro
>Priority: Minor
> Attachments: solr-215.patch, solr-215.patch, solr-215.patch, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, 
> solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, 
> solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore which handles one Lucene index.
> This patch is intended to allow multiple cores in Solr which also brings 
> multiple indexes capability.
> The patch file to grab is solr-215.patch.zip (see MISC section below).
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is 
> most likely to accommodate your indexing needs, using a filter to segregate 
> documents if needed. If you really need multiple indexes, deploy multiple web 
> applications.
> There are some use cases, however, where having multiple indexes or multiple 
> cores through Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying 
> multiple web applications.
> Seamless schema update where you can create a new core and switch to it 
> without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) when you 
> functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different 
> languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with 
> respect to their schema, their lifetime/update frequencies or even collection 
> sizes.
> HOW:
> The best analogy is to consider that instead of deploying multiple 
> web applications, you can have one web application that hosts more than one 
> Solr core. The patch does not change any of the core logic (nor the core 
> code); each core is configured & behaves exactly as the one core in 1.2; the 
> various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton with a collection of 
> cores; all the code modifications are driven by the removal of the different 
> singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) makes them easy to 
> manage.
> You declare one servlet filter mapping per core you want to expose in the 
>