[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom index readers
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499284 ]

Hoss Man commented on SOLR-243:
-------------------------------

some comments after reading the patch (haven't run it yet) ...

1) in the future, please use 2 spaces for indenting (and no tabs)

2) the existing public constructors for SolrIndexSearcher are now private, which is an API change that has to be carefully considered ... on top of that, they don't even use the new IndexReader factory at all (they could get it by asking the SolrCore - it's a singleton).

3) instead of adding a new indexReaderFactory element directly to solrconfig.xml, it would probably make sense to get it when parsing the mainIndex/indexDefaults blocks.

4) "StandardIndexReaderFactory" might be a better class name than "DefaultIndexReaderFactory"

5) I don't think we really need IndexReaderFactory.DEFAULT (static final instances in interfaces never make sense to me; they are not part of the "interface") ... just let SolrCore have a hardcoded instance to use if it can't find one in the config.

6) people should be able to specify configuration options for their factories ... either using an init(NamedList) method (like RequestHandlers) or using an init(Map) method (like Caches, TokenFilters, etc.)

7) catching "Exception" when calling newInstance() is too broad. explicitly catch only the exceptions that are expected and warrant falling back to the default factory; otherwise you might silently ignore a really serious problem ... although frankly, if someone configures an explicit IndexReaderFactory and it can't be instantiated, that should probably be SEVERE, not just WARNING.
> Create a hook to allow custom code to create custom index readers
> ------------------------------------------------------------------
>
>                 Key: SOLR-243
>                 URL: https://issues.apache.org/jira/browse/SOLR-243
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>         Environment: Solr core
>            Reporter: John Wang
>             Fix For: 1.3
>
>         Attachments: indexReaderFactory.patch
>
>
> I have a customized IndexReader and I want to write a Solr plugin to use my
> derived IndexReader implementation. Currently IndexReader instantiation is
> hard-coded to be:
>
>     IndexReader.open(path)
>
> It would be really useful if this were done through a pluggable factory that
> can be configured, e.g. IndexReaderFactory:
>
>     interface IndexReaderFactory {
>       IndexReader newReader(String name, String path);
>     }
>
> The default implementation would just return IndexReader.open(path).
> The newSearcher and getSearcher methods in the SolrCore class can then call
> the current factory implementation to get the IndexReader instance and build
> the SolrIndexSearcher by passing in the reader.
>
> It would be really nice to add this improvement soon (this seems to be a
> trivial addition) as our project really depends on it.
>
> Thanks
>
> -John

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
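For what it's worth, Hoss's points 5-7 above can be sketched in a few lines. This is illustrative only: the names here (FactoryLoader, the init(Map) signature) are assumptions, not Solr's actual API, and Object stands in for Lucene's IndexReader so the sketch compiles without Lucene on the classpath.

```java
import java.util.Map;

// Hypothetical factory interface with a config hook (point 6),
// in the style of Caches/TokenFilters.
interface IndexReaderFactory {
  void init(Map<String, String> args);          // configuration options
  Object newReader(String name, String path);   // would return a Lucene IndexReader
}

class StandardIndexReaderFactory implements IndexReaderFactory {
  public void init(Map<String, String> args) {}
  public Object newReader(String name, String path) {
    return null; // the real version would return IndexReader.open(path)
  }
}

class FactoryLoader {
  static IndexReaderFactory load(String configuredClassName) {
    if (configuredClassName == null) {
      // point 5: a hardcoded default instance, not IndexReaderFactory.DEFAULT
      return new StandardIndexReaderFactory();
    }
    try {
      return (IndexReaderFactory) Class.forName(configuredClassName)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      // point 7: catch only the expected reflection failures, and treat an
      // explicitly configured factory that can't be instantiated as a hard
      // error rather than silently falling back with a WARNING.
      throw new RuntimeException("could not instantiate " + configuredClassName, e);
    }
  }
}
```

A factory that is absent from the config falls back to the standard one; a factory that is configured but unloadable fails loudly instead of being ignored.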
[jira] Updated: (SOLR-243) Create a hook to allow custom code to create custom index readers
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Wang updated SOLR-243:
---------------------------

    Attachment: indexReaderFactory.patch

I have attached a patch for this issue.

> Create a hook to allow custom code to create custom index readers
> ------------------------------------------------------------------
>
>                 Key: SOLR-243
>                 URL: https://issues.apache.org/jira/browse/SOLR-243
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.3
>         Environment: Solr core
>            Reporter: John Wang
>             Fix For: 1.3
>
>         Attachments: indexReaderFactory.patch
>
>
> I have a customized IndexReader and I want to write a Solr plugin to use my
> derived IndexReader implementation. Currently IndexReader instantiation is
> hard-coded to be:
>
>     IndexReader.open(path)
>
> It would be really useful if this were done through a pluggable factory that
> can be configured, e.g. IndexReaderFactory:
>
>     interface IndexReaderFactory {
>       IndexReader newReader(String name, String path);
>     }
>
> The default implementation would just return IndexReader.open(path).
> The newSearcher and getSearcher methods in the SolrCore class can then call
> the current factory implementation to get the IndexReader instance and build
> the SolrIndexSearcher by passing in the reader.
>
> It would be really nice to add this improvement soon (this seems to be a
> trivial addition) as our project really depends on it.
>
> Thanks
>
> -John

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499273 ]

Mike Klaas commented on SOLR-216:
---------------------------------

Thanks for your contribution! Some comments:

style:

- list comprehensions used solely to perform looped execution are harder to parse and slower than explicitly writing a for loop
- shadowing builtins is generally a bad idea
- SolrConnection is an old-style class, but Response is new-style

functionality:

- why are 'status'/'QTime' returned as floats?
- all NamedLists appearing in the output are converted to dicts -- this loses information (in particular, it will be unnecessarily hard for the user to use highlighting/debug data). Using the python/json response format would prevent this.
- not returning highlight/debug data in the standard response format (and yet providing said parameters in the query() method) seems odd. Am I missing something? Oh, they are set as dynamic attributes of Response, I see. Definitely needs documentation.
- passing fields='' to query() will return all fields, when the desired return is likely no fields
- it might be better to settle on an api that permits doc/field boosts. How about using a tuple as the field name in the field dict?

      conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])

  doc boosts could be handled by optionally providing the fielddict as a (fielddict, boost) tuple.
- for 2.5+, a cool addition might be:

      if sys.version_info >= (2, 5):
          import contextlib

          def batched(solrconn):
              solrconn.begin_batch()
              yield solrconn
              solrconn.end_batch()
          batched = contextlib.contextmanager(batched)

  Use as:

      with batched(solrconn):
          solrconn.add(...)
          solrconn.add(...)
          solrconn.add(...)
> Improvements to solr.py
> -----------------------
>
>                 Key: SOLR-216
>                 URL: https://issues.apache.org/jira/browse/SOLR-216
>             Project: Solr
>          Issue Type: Improvement
>          Components: clients - python
>    Affects Versions: 1.2
>            Reporter: Jason Cater
>            Priority: Trivial
>         Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level
> functions.
>
> * Requires python 2.3+
> * Supports SSL (https://) schema
> * Conforms (mostly) to PEP 8 -- the Python Style Guide
> * Provides a high-level results object with implicit data type conversion
> * Supports batching of update commands

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: solrconfig.xml defaults
On 25-May-07, at 2:09 PM, Yonik Seeley wrote:

> On 5/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>> HashDocSet maxSize: perhaps consider increasing this, or making this by
>> default a parameter which is tuned automatically (.5% of maxDocs, for
>> instance)
>
> I think when HashDocSet is large enough, it can be slower than OpenBitSet
> for taking intersections, even when it still saves memory. So it depends
> on what one is optimizing for. I picked 3000 long ago since it seemed the
> fastest for faceting with one particular data set (between 500K and 1M
> docs), but that was before OpenBitSet.

Wasn't HashDocSet significantly optimized for intersection recently?

> It also caps the max table size at 4096 entries (16K RAM) (power of two
> hash table with a load factor of .75). Does it make sense to go up to 8K
> entries? Do you have any data on different sizes?

Unfortunately, I don't. I'm using 20K right now for indices ranging in size from 3-8M docs, but that was based on advice on the wiki, and the memory savings seemed worth it (each bit filter is pushing 500Kb to 1Mb at that scale). I might have time to run some experiments before 1.2 is released. If not, 3000 seems like a well-founded default.

>> Most people will start with the example solrconfig.xml, I suspect, and
>> getting the performance-related settings right at the start will help the
>> perception of Solr's performance. I'd be tempted to increase the default
>> filterCache size too, but that can have quite high memory requirements.
>
> Yeah, many people won't think to increase the VM heap size. Perhaps
> that's better as a documentation fix.

I just added a note to SolrPerformanceFactors. Most of the information is already on the wiki.

> What about commenting out most of the default parameters in the dismax
> handler config, so it becomes more standard & usable (w/o editing its
> config) after someone customizes their schema?

Makes sense, but I agree with Hoss that it is nice for the user to be able to easily use the example OOB.

-Mike
Re: solrconfig.xml defaults
: What about commenting out most of the default parameters in the dismax
: handler config, so it becomes more standard & usable (w/o editing its
: config) after someone customizes their schema?

i'm torn on this ... those defaults make sense for the example schema/data -- which is the main point of the whole example/solr/conf. but i appreciate that people can be confused by errors from dismax when they change their schema (see pingQuery).

perhaps the best solution is to remove the qf/pf/bf defaults for "dismax" and add them to "partitioned"

-Hoss
Re: solrconfig.xml defaults
On 5/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> Since auditing solrconfig.xml defaults is on the list of things for 1.2,
> I thought I'd get the ball rolling:

Thanks, that was one of the things I was looking into now (hitting all the new URLs and seeing what they looked like too).

> Lazy field loading: seems like it would benefit more people to be enabled
> explicitly. I've been using it successfully and some substantial gains
> have been reported on the lucene list. The downsides don't really seem
> significant.

Sounds fine.

> HashDocSet maxSize: perhaps consider increasing this, or making this by
> default a parameter which is tuned automatically (.5% of maxDocs, for
> instance)

I think when HashDocSet is large enough, it can be slower than OpenBitSet for taking intersections, even when it still saves memory. So it depends on what one is optimizing for. I picked 3000 long ago since it seemed the fastest for faceting with one particular data set (between 500K and 1M docs), but that was before OpenBitSet. It also caps the max table size at 4096 entries (16K RAM) (power of two hash table with a load factor of .75). Does it make sense to go up to 8K entries? Do you have any data on different sizes?

> Most people will start with the example solrconfig.xml, I suspect, and
> getting the performance-related settings right at the start will help the
> perception of Solr's performance. I'd be tempted to increase the default
> filterCache size too, but that can have quite high memory requirements.

Yeah, many people won't think to increase the VM heap size. Perhaps that's better as a documentation fix.

What about commenting out most of the default parameters in the dismax handler config, so it becomes more standard & usable (w/o editing its config) after someone customizes their schema?

-Yonik
solrconfig.xml defaults
Since auditing solrconfig.xml defaults is on the list of things for 1.2, I thought I'd get the ball rolling:

Lazy field loading: seems like it would benefit more people to be enabled explicitly. I've been using it successfully and some substantial gains have been reported on the lucene list. The downsides don't really seem significant.

HashDocSet maxSize: perhaps consider increasing this, or making this by default a parameter which is tuned automatically (.5% of maxDocs, for instance).

Most people will start with the example solrconfig.xml, I suspect, and getting the performance-related settings right at the start will help the perception of Solr's performance. I'd be tempted to increase the default filterCache size too, but that can have quite high memory requirements.

-Mike
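For what it's worth, the memory figures in this thread are easy to sanity-check, and the proposed ".5% of maxDocs" auto-tuning reproduces the current 3000 default at about 600K docs. A quick sketch (plain arithmetic, not Solr code; the method names are invented for illustration):

```java
// Back-of-the-envelope sizing for the defaults discussed in this thread.
class SizingMath {

  // A bitset-backed filter needs one bit per document, i.e. maxDocs/8 bytes:
  // an 8M-doc index costs ~1MB per cached filter, matching the "500Kb to 1Mb"
  // figure quoted for 3-8M doc indices.
  static long bitSetBytes(long maxDocs) {
    return maxDocs / 8;
  }

  // The proposed auto-tuned HashDocSet cutoff: .5% of maxDocs.
  // At ~600K docs this gives exactly the long-standing default of 3000.
  static int hashDocSetMaxSize(long maxDocs) {
    return (int) (maxDocs * 0.005);
  }
}
```

This is only the memory side of the trade-off; as Yonik notes, a large HashDocSet can still lose to OpenBitSet on intersection speed even while saving memory.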
Re: [Solr Wiki] Update of "Solr1.2" by ryan
small typo:

> changes in behavior. For new-stype update handlers, errors are now

new-style

otherwise looks good
Re: [Solr Wiki] Update of "Solr1.2" by ryan
What about this blurb for CHANGES.txt?

The Solr "Request Handler" framework has been updated in two key ways: First, if a Request Handler is registered in solrconfig.xml with a name starting with "/" then it can be accessed using a path-based URL, instead of using the legacy "/select?qt=name" URL structure. Second, the Request Handler framework has been extended, making it possible to write Request Handlers that process streams of data for doing updates, and there is a new-style Request Handler for XML updates given the name of "/update" in the example solrconfig.xml. Existing installations without this "/update" handler will continue to use the old update servlet and should see no changes in behavior. For new-stype update handlers, errors are now reflected in the HTTP status code, Content-type checking is more strict, and the response format has changed and is controllable via the wt parameter.

-Yonik

On 5/22/07, Apache Wiki <[EMAIL PROTECTED]> wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
>
> The following page has been changed by ryan:
> http://wiki.apache.org/solr/Solr1%2e2
>
> ------------------------------------------------------------------------------
>   framework has been extended making it possible to write Request Handlers that
>   process streams of data for doing updates, and the existing XML-based update
>   functionality has been refactored into a Request Handler given the name of
>   "/update" in the example solrconfig.xml.
> - Existing Apache Solr installations that do not have references to this name in the
> - solrconfig.xml will be unaffected, but installations which do use it will now need to be
> - more explicit about the Content-Type when posting XML data, and will get a response format
> - controlled by the (wt) Response Writer.
> + Existing Apache Solr installations that do not reference "/update" in solrconfig.xml will
> + be unaffected. Installations that use it will need to explicitly define the Content-Type
> + when posting XML data (ie: curl ... -H 'Content-type:text/xml; charset=utf-8'), and will
> + get a new response format controlled by the (wt) Response Writer.
>   }}}
>   * delete example in the tutorial currently doesn't work because content type isn't
>     specified .. tutorial could be fixed, but SOLR-230 would be a better use of time.
>   * [https://issues.apache.org/jira/browse/SOLR-230 SOLR-230] should be done to keep the
>     tutorial nice and portable and eliminate the need for curl
[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support
[ https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499198 ]

Andrew Nagy commented on SOLR-69:
---------------------------------

A really nice feature would be to allow boosting of fields, for example:

    ?q=id:1&mlt=true&mlt.fl=title^5,author^3,topic

This would find items that are more similar by title than by author, etc.

> PATCH: MoreLikeThis support
> ---------------------------
>
>                 Key: SOLR-69
>                 URL: https://issues.apache.org/jira/browse/SOLR-69
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Bertrand Delacretaz
>            Priority: Minor
>         Attachments: lucene-queries-2.0.0.jar, lucene-queries-2.1.1-dev.jar,
>                      SOLR-69-MoreLikeThisRequestHandler.patch,
>                      SOLR-69-MoreLikeThisRequestHandler.patch,
>                      SOLR-69-MoreLikeThisRequestHandler.patch, SOLR-69.patch,
>                      SOLR-69.patch, SOLR-69.patch, SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm... "lifted from" might be
> more appropriate ;-) Erik Hatcher's example mentioned in
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00878.html
>
> To use it, add at least the following parameters to a standard or dismax query:
>
>     mlt=true
>     mlt.fl=list,of,fields,which,define,similarity
>
> See the MoreLikeThisHelper source code for more parameters.
>
> Here are two URLs that work with the example config, after loading all
> documents found in exampledocs in the index (just to show that it seems to
> work - of course you need a larger corpus to make it interesting):
>
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
>
> Results are added to the output like this:
> [example XML output elided -- the markup was stripped by the mail archive;
> it showed two result documents, SOLR1000 and UTF8TEST, each with a score of
> 1.5293242]
>
> I haven't tested this extensively yet, will do in the next few days. But
> comments are welcome of course.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
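Andrew's suggested mlt.fl boost syntax follows the same field^boost convention dismax uses for its qf/pf parameters. A hypothetical parser for it (not part of any attached patch; the class name is invented) might look like:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for "mlt.fl=title^5,author^3,topic"-style values,
// mirroring the dismax field^boost convention. Fields without an explicit
// boost default to 1.0.
class BoostedFieldParser {
  static Map<String, Float> parse(String fl) {
    Map<String, Float> boosts = new LinkedHashMap<String, Float>();
    for (String part : fl.split(",")) {
      int caret = part.indexOf('^');
      if (caret < 0) {
        boosts.put(part.trim(), 1.0f);
      } else {
        boosts.put(part.substring(0, caret).trim(),
                   Float.parseFloat(part.substring(caret + 1).trim()));
      }
    }
    return boosts;
  }
}
```

Parsing "title^5,author^3,topic" this way yields title=5.0, author=3.0, topic=1.0, which a MoreLikeThis handler could then apply as per-field query boosts.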
Re: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl
> Some versions of RSS are quite incompatible, so we MUST say what version
> we are implementing. RSS 1.0 is completely different from the 0.9 series
> and 2.0. Atom doesn't have a version number, but RFC 4287 Atom is
> informally called 1.0.

The example_rss.xsl is supposed to be RSS 2.0. The example_atom.xsl validated on feedvalidator.org as Atom 1.0, so I assume that's RFC 4287.
Re: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl
On 5/25/07 10:45 AM, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:

> : I'd slap versions on those 2 XSL files to immediately answer "which
> : version of Atom|RSS does this produce?"
>
> i'm comfortable calling the example_rss.xsl "RSS", since most RSS
> readers will know what to do with it, but i don't know that i'm
> comfortable calling it any specific version of RSS; people are more likely
> to get irate about claiming to be a specific version if one little thing
> is wrong than they are about not claiming to be anything in particular.

Some versions of RSS are quite incompatible, so we MUST say what version we are implementing. RSS 1.0 is completely different from the 0.9 series and 2.0. Atom doesn't have a version number, but RFC 4287 Atom is informally called 1.0.

wunder
Re: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl
: I'd slap versions on those 2 XSL files to immediately answer "which
: version of Atom|RSS does this produce?"

i'm comfortable calling the example_rss.xsl "RSS", since most RSS readers will know what to do with it, but i don't know that i'm comfortable calling it any specific version of RSS; people are more likely to get irate about claiming to be a specific version if one little thing is wrong than they are about not claiming to be anything in particular.

example.xsl says it outputs "HTML" but doesn't make any specific claims about which version of (x)html ... not asserting any particular allegiance in a holy war is a good way to avoid getting slaughtered :)

-Hoss
Re: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl
I'd slap versions on those 2 XSL files to immediately answer "which version of Atom|RSS does this produce?"

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message -----
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, May 24, 2007 3:06:31 PM
Subject: svn commit: r541391 - in /lucene/solr/trunk: CHANGES.txt example/solr/conf/xslt/example_atom.xsl example/solr/conf/xslt/example_rss.xsl

Author: hossman
Date: Thu May 24 12:06:29 2007
New Revision: 541391

URL: http://svn.apache.org/viewvc?view=rev&rev=541391
Log:
SOLR-208: example XSLTs for RSS and Atom

Added:
    lucene/solr/trunk/example/solr/conf/xslt/example_atom.xsl (with props)
    lucene/solr/trunk/example/solr/conf/xslt/example_rss.xsl (with props)
Modified:
    lucene/solr/trunk/CHANGES.txt

Modified: lucene/solr/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=diff&rev=541391&r1=541390&r2=541391
==============================================================================
--- lucene/solr/trunk/CHANGES.txt (original)
+++ lucene/solr/trunk/CHANGES.txt Thu May 24 12:06:29 2007
@@ -197,7 +197,12 @@
 33. SOLR-234: TrimFilter can update the Token's startOffset and endOffset
     if updateOffsets="true". By default the Token offsets are unchanged.
     (ryan)
-
+
+34. SOLR-208: new example_rss.xsl and example_atom.xsl to provide more
+    examples for people about the Solr XML response format and how they
+    can transform it to suit different needs.
+    (Brian Whitman via hossman)
+
 Changes in runtime behavior
  1. Highlighting using DisMax will only pick up terms from the main user
     query, not boost or filter queries (klaas).
Added: lucene/solr/trunk/example/solr/conf/xslt/example_atom.xsl
URL: http://svn.apache.org/viewvc/lucene/solr/trunk/example/solr/conf/xslt/example_atom.xsl?view=auto&rev=541391
==============================================================================

[contents of example_atom.xsl elided -- the XSLT markup was stripped by the
mail archive. The recoverable text shows it renders Solr results as a
http://www.w3.org/2005/Atom feed titled "Example Solr Atom Feed", noting
"This has been formatted by the sample 'example_atom.xsl' transform - use
your own XSLT to get a nicer Atom feed", with Apache Solr as the author,
tag:localhost,2007 entry ids, and links back to
http://localhost:8983/solr/select?q={$id}.]

Propchange: lucene/solr/trunk/example/solr/conf/xslt/example_atom.xsl
    svn:eol-style = native
Propchange: lucene/solr/trunk/example/solr/conf/xslt/example_atom.xsl
    svn:keywords = Date Author Id Revision HeadURL

Added: lucene/solr/trunk/example/solr/conf/xslt/example_rss.xsl
URL: http://svn.apache.org/viewvc/lucene/solr/trunk/example/solr/conf/xslt/example_rss.xsl?view=auto&rev=541391
==============================================================================

[contents of example_rss.xsl elided -- likewise stripped. The recoverable
text shows it renders results as an RSS feed titled "Example Solr RSS Feed"
in language en-us, noting "This has been formatted by the sample
'example_rss.xsl' transform - use your own XSLT to get a nicer RSS feed",
with item links of the form http://localhost:8983/solr/select?q=id:...]

Propchange: lucene/solr/trunk/example/solr/conf/xslt/example_rss.xsl
    svn:eol-style = native
Propchange: lucene/solr/trunk/example/solr/conf/xslt/example_rss.xsl
    svn:keywords = Date Author Id Revision HeadURL
[jira] Updated: (SOLR-239) Read IndexSchema from InputStream instead of Config file
[ https://issues.apache.org/jira/browse/SOLR-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-239:
------------------------------

    Attachment: IndexSchemaStream2.patch

Updated with a fix and test for raw-schema.jsp, and added back the IndexSchema testDynamicCopy() test.

> Read IndexSchema from InputStream instead of Config file
> --------------------------------------------------------
>
>                 Key: SOLR-239
>                 URL: https://issues.apache.org/jira/browse/SOLR-239
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 1.2
>         Environment: all
>            Reporter: Will Johnson
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: IndexSchemaStream.patch, IndexSchemaStream2.patch,
>                      IndexSchemaStream2.patch, IndexSchemaStream2.patch
>
>
> The soon-to-follow patch adds a constructor to IndexSchema that allows one to
> be created directly from an InputStream. The overall logic for the core's
> creation/use of the IndexSchema does not change; however, this allows Java
> clients like those in SOLR-20 to parse an IndexSchema. Once a schema is
> parsed, the client can inspect an index's capabilities, which is useful for
> building generic search UIs, i.e. providing a drop-down list of fields to
> search/sort by.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
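The client-side use case described above (parse a schema from a stream, then list its fields for a search UI's drop-down) can be sketched with plain JAXP. This is illustrative code under assumptions, not the actual IndexSchema implementation: the class name is invented, and the element/attribute names assume the standard schema.xml layout of <field name="..."/> elements.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical client-side helper: read a schema from any InputStream
// (a file, an HTTP response, ...) and collect the declared field names.
class SchemaFieldLister {
  static List<String> fieldNames(InputStream schemaXml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(schemaXml);
    NodeList fields = doc.getElementsByTagName("field");
    List<String> names = new ArrayList<String>();
    for (int i = 0; i < fields.getLength(); i++) {
      names.add(((Element) fields.item(i)).getAttribute("name"));
    }
    return names;
  }
}
```

A UI could feed the returned list straight into a "search/sort by" drop-down, which is exactly the capability-inspection scenario the issue describes.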