[jira] Assigned: (SOLR-389) RequestHandlerBase javadocs improvement
[ https://issues.apache.org/jira/browse/SOLR-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley reassigned SOLR-389:
----------------------------------

    Assignee: Ryan McKinley  (was: Grant Ingersoll)

Thanks Grant. Do you want me to commit this now? Is it still In Progress? (I have not seen people use that status before; does it mean you are still working on it?)

RequestHandlerBase javadocs improvement
---------------------------------------

                Key: SOLR-389
                URL: https://issues.apache.org/jira/browse/SOLR-389
            Project: Solr
         Issue Type: Improvement
         Components: documentation
           Reporter: Grant Ingersoll
           Assignee: Ryan McKinley
           Priority: Trivial
        Attachments: SOLR-389.patch, SOLR-389.patch

Provide more javadocs on the RequestHandlerBase#init(NamedList) method to explain the defaults, appends, and invariants.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
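For readers of this thread: a minimal, self-contained sketch of the semantics those javadocs describe. This is not Solr's actual implementation; the class name `ParamMerge` and the method signature are hypothetical, and maps stand in for Solr's NamedList. Defaults fill in params the request omits, appends are always added, and invariants always win:

```java
import java.util.*;

public class ParamMerge {
    // Hypothetical illustration of how a request handler's configured
    // defaults, appends, and invariants combine with request params:
    //   defaults   - used only when the request omits the param
    //   appends    - always added alongside whatever the request supplies
    //   invariants - always override, regardless of the request
    static Map<String, List<String>> merge(Map<String, List<String>> request,
                                           Map<String, List<String>> defaults,
                                           Map<String, List<String>> appends,
                                           Map<String, List<String>> invariants) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        defaults.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        request.forEach((k, v) -> out.put(k, new ArrayList<>(v)));    // request beats defaults
        appends.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        invariants.forEach((k, v) -> out.put(k, new ArrayList<>(v))); // invariants beat everything
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> merged = merge(
                Map.of("q", List.of("solr"), "rows", List.of("50")),   // request
                Map.of("rows", List.of("10"), "fl", List.of("*")),     // defaults
                Map.of("fq", List.of("inStock:true")),                 // appends
                Map.of("rows", List.of("20")));                        // invariants
        System.out.println(merged);
    }
}
```

The request asked for 50 rows, but the invariant pins rows to 20; fl comes from the defaults and fq from the appends.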
[jira] Updated: (SOLR-281) Search Components (plugins)
[ https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-281:
------------------------------

    Attachment: solr-281.patch

Here's a new (smaller) patch that utilizes pluggable query parsers, and
- removes DisMax-specific modules, since dismax-specific logic is limited to query construction
- makes the DisMax request handler just the standard handler with defType=dismax added as a default
- removes the variable RequestBuilder class logic, since it seems unnecessary... if two non-standard components want to communicate something, they can use the Context or the Response. (any reason I'm missing why it should stay?)

Thoughts on these changes?

We need to think through all the members of ResponseBuilder carefully and decide which component sets/reads them in which phase (and whether that makes the most sense). Should ResponseBuilder have methods instead of members? If so, that would allow a component to perhaps even replace the ResponseBuilder and delegate to the original.

How will a user's custom component get configuration? Should components be full plugins with an init() for configuration?

Search Components (plugins)
---------------------------

                Key: SOLR-281
                URL: https://issues.apache.org/jira/browse/SOLR-281
            Project: Solr
         Issue Type: New Feature
           Reporter: Ryan McKinley
        Attachments: SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, solr-281.patch, solr-281.patch, solr-281.patch

A request handler with pluggable search components for things like:
- standard
- dismax
- more-like-this
- highlighting
- field collapsing

For more discussion, see: http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
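For context, a tiny runnable sketch of the component-chain idea being discussed. All names here (`SearchComponent`, `QueryComponent`, `HighlightComponent`, `ComponentChain`) are illustrative, not the patch's actual API, and a plain map stands in for the ResponseBuilder: each component reads and writes shared state, and the handler just runs the configured chain through its phases in order:

```java
import java.util.*;

// Hypothetical sketch: components share state through a ResponseBuilder-like
// map, and the handler runs two phases (prepare, then process) over the chain.
interface SearchComponent {
    void prepare(Map<String, Object> rb);
    void process(Map<String, Object> rb);
}

class QueryComponent implements SearchComponent {
    public void prepare(Map<String, Object> rb) { rb.put("parsedQuery", rb.get("q")); }
    public void process(Map<String, Object> rb) { rb.put("results", List.of("doc1", "doc2")); }
}

class HighlightComponent implements SearchComponent {
    public void prepare(Map<String, Object> rb) {}
    // Reads state another component set; this is the inter-component
    // communication the thread is debating (members vs. methods).
    public void process(Map<String, Object> rb) {
        rb.put("highlighting", "snippets for " + rb.get("parsedQuery"));
    }
}

public class ComponentChain {
    public static Map<String, Object> handle(String q, List<SearchComponent> components) {
        Map<String, Object> rb = new HashMap<>();
        rb.put("q", q);
        for (SearchComponent c : components) c.prepare(rb);  // phase 1
        for (SearchComponent c : components) c.process(rb);  // phase 2
        return rb;
    }

    public static void main(String[] args) {
        Map<String, Object> rb = handle("solr",
                List.of(new QueryComponent(), new HighlightComponent()));
        System.out.println(rb.get("results") + " / " + rb.get("highlighting"));
    }
}
```

Swapping members for methods on the builder object, as Yonik suggests, would let a component wrap or replace it without the others noticing.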
[jira] Created: (SOLR-392) Way to control search time, hits, and memory usage
Way to control search time, hits, and memory usage
--------------------------------------------------

                Key: SOLR-392
                URL: https://issues.apache.org/jira/browse/SOLR-392
            Project: Solr
         Issue Type: New Feature
         Components: search
   Affects Versions: 1.3
           Reporter: Lance Norskog
           Priority: Minor

It would be good for end-user applications if Solr allowed searches to time out. It is possible now for the servlet container to throw a timeout exception. It would be very useful if the Solr search request timeout offered these features:

1) timeout: stop searching after N milliseconds and return results using only those hits already found
2) hit limit: stop searching after N hits and return results using only those hits already found
3) ram limit: estimate the amount of RAM used so far and stop searching at a given amount

In all cases it would be very useful to estimate the remaining results to some degree of accuracy.

Argument for estimation: for an extreme example, Google clearly does not finish any search past the requested page of results. Instead it returns very quickly on any search and overestimates the total. If the first page says there are five pages, the second will often say that there are four pages instead. The third page will say 3 out of 3.

Argument for 'timeout' control: we've all waited too long for searches.

Argument for 'hit limit' control: I really don't need to know that I'll have 14 thousand results. I'm not going to view them all.

Argument for 'ram limit' control: over-complex queries can cause Java OutOfMemory errors, and Tomcat does not recover gracefully.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
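The timeout and hit-limit features can be sketched as an early-terminating hit collector. This is purely illustrative (the class names are hypothetical, not Solr's or Lucene's API; Lucene's real timeout support was being discussed in LUCENE-997): the collector throws once either budget is spent, and the caller catches the exception and returns the partial hit list as an estimate:

```java
import java.util.*;

// Hypothetical sketch of features 1) and 2): stop collecting when a time
// budget or a hit budget is exhausted, and flag the results as partial.
public class LimitedCollector {
    static class PartialResults extends RuntimeException {}

    private final long deadlineNanos;
    private final int maxHits;
    private final List<Integer> hits = new ArrayList<>();
    private boolean partial = false;

    LimitedCollector(long timeoutMillis, int maxHits) {
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        this.maxHits = maxHits;
    }

    // Called once per matching document; gives up when either budget is spent.
    void collect(int docId) {
        if (hits.size() >= maxHits || System.nanoTime() > deadlineNanos) {
            partial = true;
            throw new PartialResults();
        }
        hits.add(docId);
    }

    List<Integer> hits() { return hits; }
    boolean isPartial() { return partial; }

    public static void main(String[] args) {
        LimitedCollector c = new LimitedCollector(1000, 3); // 1s budget, 3-hit limit
        try {
            for (int doc = 0; doc < 14000; doc++) c.collect(doc);
        } catch (PartialResults e) {
            // return the hits found so far, flagged as an estimate
        }
        System.out.println(c.hits() + " partial=" + c.isPartial());
    }
}
```

The RAM limit (feature 3) would need a per-hit memory estimate in place of the time check, which is harder to do accurately from inside a collector.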
[jira] Commented: (SOLR-392) Way to control search time, hits, and memory usage
[ https://issues.apache.org/jira/browse/SOLR-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537480 ]

Sean Timm commented on SOLR-392:
--------------------------------

This is related to LUCENE-997, "Add search timeout support to Lucene."

Way to control search time, hits, and memory usage
--------------------------------------------------

                Key: SOLR-392
                URL: https://issues.apache.org/jira/browse/SOLR-392
            Project: Solr
         Issue Type: New Feature
         Components: search
   Affects Versions: 1.3
           Reporter: Lance Norskog
           Priority: Minor

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-391) Date type parser is overly fussy
[ https://issues.apache.org/jira/browse/SOLR-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537484 ]

Yonik Seeley commented on SOLR-391:
-----------------------------------

> However, when it is printed out it does not include the Z at the end.

That would be a bug... How does one reproduce this? I checked the XML, JSON, Python, and Ruby output writers... they all include the Z for the DateField type.

Date type parser is overly fussy
--------------------------------

                Key: SOLR-391
                URL: https://issues.apache.org/jira/browse/SOLR-391
            Project: Solr
         Issue Type: Bug
         Components: search
   Affects Versions: 1.2
           Reporter: Lance Norskog
           Priority: Trivial

The parser for the Solr 'date' type is overly picky. It requires an entire year-month-day-T-hour-minute-second-Z string. However, when it is printed out it does not include the Z at the end. Thus, a naive XSL script that translates the output into XML that can be re-fed in the Solr input format has to append the trailing Z to get dates to parse.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
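The round-trip problem being debated can be demonstrated with plain java.text (this is not Solr's actual DateField code, just an illustration of the strict format it expects): the canonical input form requires the trailing Z, so any output path that dropped it would not re-parse:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateRoundTrip {
    // The canonical form the Solr 'date' type expects on input:
    // year-month-day-T-hour-minute-second-Z, in UTC, with a literal trailing Z.
    static final SimpleDateFormat FMT = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    static { FMT.setTimeZone(TimeZone.getTimeZone("UTC")); }

    public static void main(String[] args) throws ParseException {
        Date d = FMT.parse("2007-10-26T12:34:56Z"); // full form parses
        System.out.println(FMT.format(d));           // formats back with the Z

        try {
            FMT.parse("2007-10-26T12:34:56");        // missing Z: rejected
        } catch (ParseException expected) {
            System.out.println("rejected without trailing Z");
        }
    }
}
```

If the writers all emit the Z (as Yonik reports), the round trip works and no XSL fix-up should be needed.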
Re: SolrConfig.Initializable
Ah, yes, the concern is about ... the use of SolrCore in plugin init methods before the SolrCore itself is fully initialized.

: We could either add a status to the core and have each high-level core
: method check this status before running (which would also allow to have
: cores logically stopped -or marked under reload/backup/etc) or heavily
: insist that using the core before 'init' has been called would lead to
: unpredictable results.

Let me propose a completely new approach then ... Currently, the usage of SolrConfig.Initializable works like this...

  factory = solrConfig.newInstance(className);
  if (factory instanceof SolrConfig.Initializable) {
    ((SolrConfig.Initializable)factory).init(solrConfig, args);
  } else {
    log.warning(DEPRECATE);
    factory.init(args);
  }

What if we eliminate the SolrConfig.Initializable interface completely, and introduce a new interface...

  public interface SolrCoreAware {
    public void tellAboutCore(SolrCore c);
  }

...the semantics being that once a SolrCore is finished initializing itself, and is completely ready to be used, the last phase of its initialization is to loop over all of the SolrCoreAware instances it knows about, and call that.tellAboutCore(this).

From a plugin's perspective, init(...) will still always be called before it's ever asked to do any work, but in addition it would be guaranteed that if it implements SolrCoreAware, tellAboutCore would be called after init(...) but before it's asked to do any real work.

Keeping track of all the plugins to tell about the SolrCore later would be trivial ... SolrConfig.newInstance could do it, and SolrCore could get the info later. Hmmm... even better, SolrConfig could be changed to implement SolrCoreAware, and all of the hard work could be implemented with the addition below to SolrConfig (and all SolrCore has to do is call solrConfig.tellAboutCore(this)) ...
  private List<SolrCoreAware> needToTell = new ArrayList<SolrCoreAware>();
  private SolrCore core = null;

  public void tellAboutCore(SolrCore c) {
    core = c;
    for (SolrCoreAware plugin : needToTell) {
      plugin.tellAboutCore(core);
    }
    needToTell = null;
  }

  public Object newInstance(String cname, String... subpackages) {
    Object that = super.newInstance(cname, subpackages);
    if (that instanceof SolrCoreAware) {
      if (null == core) {
        needToTell.add((SolrCoreAware) that);
      } else {
        ((SolrCoreAware) that).tellAboutCore(core);
      }
    }
    return that;
  }

: Any reason to (re)introduce SolrCore.Initializable instead of constructors
: that can take a SolrCore as a parameter?
: If we can manage legacy code effectively the constructor approach seems
: cleaner in the long run.

But i'm not sure how that would work. Also, i would prefer we avoid any work towards expecting plugin constructors to take in a SolrCore as a param...

1) It still wouldn't address the issue of plugins attempting to use the core they have access to before it's fully initialized.

2) Interfaces and abstract classes can't enforce any contract on constructors; that's why we've always used default constructors and had init(...) methods in the APIs ... it allows for compile-time checking that the plugins will work (they have to go out of their way to write a plugin without a default constructor to break things at runtime).

-Hoss
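The two-phase contract Hoss describes can be exercised end to end in a standalone sketch (all names are illustrative stand-ins; `Object` stands in for SolrCore, and `register` stands in for SolrConfig.newInstance): a plugin registered before the core is ready is deferred, and told about the core only once full initialization completes:

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseInit {
    interface SolrCoreAware { void tellAboutCore(Object core); }

    // Stand-in for SolrConfig: tracks SolrCoreAware plugins created before
    // the core is ready, and notifies them once it is.
    static class Config {
        private List<SolrCoreAware> needToTell = new ArrayList<>();
        private Object core = null;

        Object register(Object plugin) {
            if (plugin instanceof SolrCoreAware) {
                if (core == null) needToTell.add((SolrCoreAware) plugin); // defer
                else ((SolrCoreAware) plugin).tellAboutCore(core);        // core ready
            }
            return plugin;
        }

        // Called once, as the last phase of core initialization.
        void tellAboutCore(Object c) {
            core = c;
            for (SolrCoreAware p : needToTell) p.tellAboutCore(core);
            needToTell = null; // later registrations are told immediately
        }
    }

    static class MyHandler implements SolrCoreAware {
        Object core;
        public void tellAboutCore(Object c) { core = c; }
    }

    public static void main(String[] args) {
        Config cfg = new Config();
        MyHandler h = (MyHandler) cfg.register(new MyHandler());
        System.out.println(h.core == null);  // plugin created, core not ready yet
        Object core = new Object();
        cfg.tellAboutCore(core);             // core finishes initializing
        System.out.println(h.core == core);  // plugin was told afterwards
    }
}
```

This is what makes the approach safer than constructor injection: the plugin can never observe the core in a half-initialized state.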
[jira] Commented: (SOLR-388) Refactor ResponseWriters and Friends.
[ https://issues.apache.org/jira/browse/SOLR-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537505 ]

Hoss Man commented on SOLR-388:
-------------------------------

Sorry, i'm a little behind on some things, several comments...

> Hoss. I can certainly copy/convert the results into a NamedList composed only of the primitives. But it's an extra copy that's bad for the performance

you don't have to copy anything, you just have to make sure that whatever complex object you want to add to the response implements one of the allowed interfaces (Map or List probably being the easiest) so the ResponseWriters can access them. As i said before: please start a thread on the solr-user list describing what it is you are doing in your RequestHandler, and the community might have lots of suggestions for achieving your goals in a performant way that can work with any response writer (and won't require you to hack any internals or wait on any API changes).

> BTW, the documentation needs to be updated to reflect that Document is also a supported primitive.

eh... it's a question of perspective. the documentation is correct in that response writers have never been required to support Document as a primitive (just DocList). i haven't looked closely, but it wouldn't surprise me if the recursive nature of the writers that come with Solr can handle it okay, but that doesn't mean the docs for the interface are wrong ... it just means that the example impls do more than they have to.

> Are there any reasons why it would be bad for JSONWriter to be public, especially considering XMLWriter has been public for a while?

only that making a class public is essentially a one-way operation; once a class is public it can't easily be made unpublic, nor can any of its method signatures be changed ... which can be very limiting in making future improvements.
adding a package-protected class with a dirty API can be done easily, because we don't have to maintain it if we decide we don't like it -- we control all the clients and can change it all at once. So deciding to make a class public requires careful consideration as to whether or not we think the API of that class is dirty and whether Solr is willing to stand by it for the foreseeable future.

Refactor ResponseWriters and Friends.
-------------------------------------

                Key: SOLR-388
                URL: https://issues.apache.org/jira/browse/SOLR-388
            Project: Solr
         Issue Type: Improvement
         Components: search
   Affects Versions: 1.2
           Reporter: Luke Lu

When developing custom request handlers, it's often necessary to create corresponding response writers that extend existing ones. In our case, we want to augment the result list (more attributes other than numFound and maxScore, on-the-fly per-doc attributes that are not fields, etc.), only to find that JSONWriter and friends are private to the package. We could copy the whole thing and modify it, but it wouldn't take advantage of recent fixes like Yonik's FastWriter changes without tedious manual intervention. I hope that we can *at least* extend it and override writeVal() to add a new result type to call writeMyType. Ideally the ResponseWriter hierarchy could be rewritten to take advantage of a double-dispatching trick to get rid of the ugly "if something is instanceof someclass else ..." list, as it clearly doesn't scale well with the number of types (_n_) and depth (_d_) of the writer hierarchy: the complexity would be O(_nd_), which is worse than the O(1) double-dispatching mechanism.
Some pseudo code here:

{code:title=SomeResponseWriter.java}
// one of a list of overloaded write methods
public void write(SomeType t) {
  // implementation
}
{code}

{code:title=ResponseWritable.java}
// an interface for objects that support the scheme
public interface ResponseWritable {
  public abstract void write(ResponseWriter writer);
}
{code}

{code:title=SomeType.java}
// SomeType needs to implement the ResponseWritable interface
// to facilitate double dispatching
public void write(ResponseWriter writer) {
  writer.write(this);
}
{code}

So when adding a new MyType and MySomeResponseWriter, we only need to add these two files, without having to muck with the writeVal if-then-else list. Note: you still need to use the if-else list in the write(Object) method for builtin types and any types that you can't modify.

{code:title=MyType.java}
// implements the ResponseWritable interface
public void write(ResponseWriter writer) {
  writer.write(this);
}
{code}

{code:title=MySomeResponseWriter.java}
// only need to implement this method
public void write(MyType t) {
  // implementation
}
{code}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
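The fragments above can be assembled into one self-contained, runnable version of the double-dispatch idea (class names are illustrative, not Solr's ResponseWriter API): adding a new type plus a writer overload requires no edit to any central instanceof chain, because the value itself selects the overload for its own static type:

```java
public class DoubleDispatchDemo {
    interface ResponseWritable { void write(Writer writer); }

    static class Writer {
        final StringBuilder out = new StringBuilder();

        // Single fallback: one instanceof check for the scheme, then builtins.
        void writeVal(Object o) {
            if (o instanceof ResponseWritable) ((ResponseWritable) o).write(this);
            else out.append(o);
        }

        // Overload added alongside the new type; nothing central changes.
        void write(MyType t) { out.append("MyType(").append(t.value).append(")"); }
    }

    static class MyType implements ResponseWritable {
        final String value;
        MyType(String value) { this.value = value; }
        // Double dispatch: the object picks the writer overload for its type.
        public void write(Writer writer) { writer.write(this); }
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        w.writeVal(new MyType("hello")); // routed via double dispatch
        w.writeVal(42);                  // builtin fallback path
        System.out.println(w.out);       // prints: MyType(hello)42
    }
}
```

The overload resolution in MyType.write happens at compile time, which is why each additional type costs O(1) dispatch rather than growing the if-then-else list.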