[jira] Assigned: (SOLR-389) RequestHandlerBase javadocs improvement

2007-10-24 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley reassigned SOLR-389:
--

Assignee: Ryan McKinley  (was: Grant Ingersoll)

Thanks Grant

Do you want me to commit this now?  Is it still In Progress?  (I have not seen 
people use that status before; does it mean you are still working on it?)

 RequestHandlerBase javadocs improvement
 ---

 Key: SOLR-389
 URL: https://issues.apache.org/jira/browse/SOLR-389
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Grant Ingersoll
Assignee: Ryan McKinley
Priority: Trivial
 Attachments: SOLR-389.patch, SOLR-389.patch


 Provide more javadocs on the RequestHandlerBase#init(NamedList) method to 
 explain the defaults, appends and invariants
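
 For readers unfamiliar with the three sections, here is a rough standalone
 sketch of how they are usually understood to combine: defaults apply only
 when the request omits a parameter, appends are added to whatever is present,
 and invariants always win. This is not Solr's implementation; the class name
 ParamMergeSketch and the merge() helper are hypothetical, and parameters are
 modelled as plain maps of string lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parameter merge; not Solr code.
final class ParamMergeSketch {

    static Map<String, List<String>> merge(Map<String, List<String>> request,
                                           Map<String, List<String>> defaults,
                                           Map<String, List<String>> appends,
                                           Map<String, List<String>> invariants) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        // defaults: used only when the request does not supply the parameter
        defaults.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        request.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        // appends: added to whatever values are already present
        appends.forEach((k, v) -> out.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));
        // invariants: always win, regardless of what the request asked for
        invariants.forEach((k, v) -> out.put(k, new ArrayList<>(v)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> merged = merge(
            Map.of("q", List.of("solr"), "rows", List.of("500"), "fq", List.of("type:doc")),
            Map.of("rows", List.of("10"), "fl", List.of("id,score")),
            Map.of("fq", List.of("inStock:true")),
            Map.of("rows", List.of("20")));
        // rows ends up as 20 (invariant), fl as id,score (default),
        // fq as both type:doc and inStock:true (request plus append)
        System.out.println(merged);
    }
}
```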

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-281) Search Components (plugins)

2007-10-24 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-281:
--

Attachment: solr-281.patch

Here's a new (smaller) patch that utilizes pluggable query parsers, and
- removes DisMax-specific modules, since DisMax-specific logic is limited to 
query construction
- makes the DisMax request handler just the standard handler with defType=dismax 
added as a default
- removes the variable RequestBuilder class logic since it seems unnecessary... if 
two non-standard components want to communicate something, they can use the 
Context or the Response.  (any reason I'm missing why it should stay?)

Thoughts on these changes?

We need to think through all the members of ResponseBuilder carefully and 
decide what component sets/reads them in what phase (and if that makes the most 
sense).

Should ResponseBuilder have methods instead of members?  If so, that would 
allow a component to perhaps even replace the ResponseBuilder and delegate to 
the original.

How will a user's custom component get configuration?  Should components be 
full plugins with an init() for configuration?
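
To make the "methods instead of members" question concrete, here is a minimal 
sketch of what accessor methods would allow: a component wraps the original 
builder and delegates to it. The Builder interface and getHighlightFields() 
are invented for illustration only and do not reflect the actual 
ResponseBuilder members in the patch.

```java
import java.util.ArrayList;
import java.util.List;

final class BuilderDelegationSketch {
    // Hypothetical accessor-based builder; not the real ResponseBuilder.
    interface Builder {
        List<String> getHighlightFields();
    }

    static final class SimpleBuilder implements Builder {
        private final List<String> fields = new ArrayList<>(List.of("title"));
        public List<String> getHighlightFields() { return fields; }
    }

    // A component can wrap the original and change only what it cares about.
    static final class ExtraFieldBuilder implements Builder {
        private final Builder delegate;
        ExtraFieldBuilder(Builder delegate) { this.delegate = delegate; }
        public List<String> getHighlightFields() {
            List<String> fields = new ArrayList<>(delegate.getHighlightFields());
            fields.add("body");
            return fields;
        }
    }

    public static void main(String[] args) {
        Builder b = new ExtraFieldBuilder(new SimpleBuilder());
        System.out.println(b.getHighlightFields()); // prints [title, body]
    }
}
```

With public members there is nothing to intercept, so this kind of wrapping 
would not be possible.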




 Search Components (plugins)
 ---

 Key: SOLR-281
 URL: https://issues.apache.org/jira/browse/SOLR-281
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley
 Attachments: SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, SOLR-281-SearchComponents.patch, 
 SOLR-281-SearchComponents.patch, solr-281.patch, solr-281.patch, 
 solr-281.patch


 A request handler with pluggable search components for things like:
   - standard
   - dismax
   - more-like-this
   - highlighting
   - field collapsing 
 For more discussion, see:
 http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#a11050274




[jira] Created: (SOLR-392) Way to control search time, hits, and memory usage

2007-10-24 Thread Lance Norskog (JIRA)
Way to control search time, hits, and memory usage
--

 Key: SOLR-392
 URL: https://issues.apache.org/jira/browse/SOLR-392
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Lance Norskog
Priority: Minor


It would be good for end-user applications if Solr allowed searches to time 
out. It is possible now for the servlet container to throw a timeout exception. 
It would be very useful if the Solr search request timeout offered these 
features:

1) timeout: stop searching after N milliseconds and return results using only 
those hits already found
2) hit limit: stop searching after N hits and return results using only 
those hits already found
3) ram limit: estimate the amount of ram used so far and stop searching at a 
given amount

In all cases it would be very useful to estimate the remaining results to any 
degree of accuracy.

Argument for estimation:
For an extreme example, Google clearly does not finish any search that is more 
than the requested return value. Instead it returns very quickly on any search 
and overestimates all searches. If the first page says there are five pages, 
the second will often say that there are four pages instead. The third page 
will say 3 out of 3. 

Argument for 'timeout' control: we've all waited too long for searches

Argument for 'hit limit' control:
I really don't need to know that I'll have 14 thousand results. I'm not going 
to view them all.

Argument for 'ram limit' control:
Over-complex queries can cause Java OutOfMemory errors, and Tomcat does not 
recover gracefully.
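
A minimal sketch of the timeout and hit-limit controls, assuming hits arrive 
one at a time from some iterator. The names LimitedSearchSketch, Result and 
collect() are hypothetical and this is not a Lucene or Solr API; it only 
shows the "stop early and return what was found" behaviour, not the result 
estimation or the ram limit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class LimitedSearchSketch {
    static final class Result {
        final List<Integer> hits = new ArrayList<>();
        boolean partial; // true when a limit cut the search short
    }

    // candidates stands in for the stream of matching doc ids (an assumption).
    static Result collect(Iterator<Integer> candidates, int maxHits, long timeoutMillis) {
        Result r = new Result();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (candidates.hasNext()) {
            if (r.hits.size() >= maxHits || System.nanoTime() - deadline >= 0) {
                r.partial = true; // stop early and keep what was found so far
                break;
            }
            r.hits.add(candidates.next());
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = collect(List.of(1, 2, 3, 4, 5).iterator(), 3, 1000);
        System.out.println(r.hits + " partial=" + r.partial); // prints [1, 2, 3] partial=true
    }
}
```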






[jira] Commented: (SOLR-392) Way to control search time, hits, and memory usage

2007-10-24 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537480
 ] 

Sean Timm commented on SOLR-392:


This is related to LUCENE-997 Add search timeout support to Lucene.

 Way to control search time, hits, and memory usage
 --

 Key: SOLR-392
 URL: https://issues.apache.org/jira/browse/SOLR-392
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Lance Norskog
Priority: Minor

 It would be good for end-user applications if Solr allowed searches to time 
 out. It is possible now for the servlet container to throw a timeout 
 exception. It would be very useful if the Solr search request timeout offered 
 these features:
 1) timeout: stop searching after N milliseconds and return results using only 
 those hits already found
 2) hit limit: stop searching after N hits and return results using 
 only those hits already found
 3) ram limit: estimate the amount of ram used so far and stop searching at a 
 given amount
 In all cases it would be very useful to estimate the remaining results to any 
 degree of accuracy.
 Argument for estimation:
 For an extreme example, Google clearly does not finish any search that is 
 more than the requested return value. Instead it returns very quickly on any 
 search and overestimates all searches. If the first page says there are five 
 pages, the second will often say that there are four pages instead. The third 
 page will say 3 out of 3. 
 Argument for 'timeout' control: we've all waited too long for searches
 Argument for 'hit limit' control:
 I really don't need to know that I'll have 14 thousand results. I'm not going 
 to view them all.
 Argument for 'ram limit' control:
 Over-complex queries can cause Java OutOfMemory errors, and Tomcat does not 
 recover gracefully.




[jira] Commented: (SOLR-391) Date type parser is overly fussy

2007-10-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537484
 ] 

Yonik Seeley commented on SOLR-391:
---

> However, when it is printed out it does not include the Z at the end.
That would be a bug...

How does one reproduce this?  I checked XML, JSON, Python, and Ruby output 
writers... they all include the Z for the DateField type.


 Date type parser is overly fussy
 

 Key: SOLR-391
 URL: https://issues.apache.org/jira/browse/SOLR-391
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.2
Reporter: Lance Norskog
Priority: Trivial

 The parser for the Solr 'date' type is overly picky. It requires an entire 
 year-month-day-T-hour-minute-second-Z string.
 However, when it is printed out it does not include the Z at the end. Thus, a 
 naive XSL script that translates the output into an XML that can be re-fed 
 with the Solr input format has to include the trailing Z to get dates to 
 parse.
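
 To illustrate what "requires the trailing Z" means in practice, here is a 
 small sketch using the JDK's java.time.Instant as a stand-in. It is not 
 Solr's DateField parser, but Instant.parse is similarly strict about the 
 full UTC form; the class name DateZSketch is made up for the example.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

final class DateZSketch {
    // Returns true when the string is a complete UTC instant such as 2007-10-24T12:30:00Z.
    static boolean parses(String s) {
        try {
            Instant.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("2007-10-24T12:30:00Z")); // prints true
        System.out.println(parses("2007-10-24T12:30:00"));  // prints false
        // Formatting the parsed value keeps the trailing Z.
        System.out.println(Instant.parse("2007-10-24T12:30:00Z")); // prints 2007-10-24T12:30:00Z
    }
}
```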




Re: SolrConfig.Initializable

2007-10-24 Thread Chris Hostetter

Ah, yes the concern is about ... the use of SolrCore in plugin init 
methods before the SolrCore itself is fully initialized.

: We could either add a status to the core and have each high-level core
: method check this status before running (which would also allow to have
: cores logically stopped -or marked under reload/backup/etc) or heavily
: insist that using the core before 'init' has been called would lead to
: unpredictable results.

Let me propose a completely new approach then ...

Currently, the usage of SolrConfig.Initializable works like this...

factory = solrConfig.newInstance(className);
if (factory instanceof SolrConfig.Initializable) {
  ((SolrConfig.Initializable)factory).init(solrConfig, args);
} else {
  log.warning(DEPRECATE);
  factory.init(args);
}

what if we eliminate the SolrConfig.Initializable interface completely, 
and introduce a new interface...

  public interface SolrCoreAware {
    public void tellAboutCore(SolrCore c);
  }

...the semantics being that once a SolrCore is finished initializing 
itself, and is completely ready to be used, the last phase of 
its initialization is to loop over all of the SolrCoreAware instances it 
knows about, and call that.tellAboutCore(this).  From a plugin's 
perspective, init(...) will still always be called before it's ever asked 
to do any work, but in addition it would be guaranteed that if it 
implements SolrCoreAware, tellAboutCore would be called after init(...) 
but before it's asked to do any real work.

Keeping track of all the plugins to tell about the SolrCore later would 
be trivial ... SolrConfig.newInstance could do it, and SolrCore could get 
the info later.

HH... even better, SolrConfig could be changed to implement 
SolrCoreAware, and all of the hard work could be implemented with the 
addition below to SolrConfig (and all SolrCore has to do is call 
solrConfig.tellAboutCore(this)) ...

   private List<SolrCoreAware> needToTell = new ArrayList<SolrCoreAware>();
   private SolrCore core = null;
   public void tellAboutCore(SolrCore c) {
     core = c;
     // the core is fully initialized now, so notify every plugin that was waiting
     for (SolrCoreAware plugin : needToTell) {
       plugin.tellAboutCore(core);
     }
     needToTell = null;
   }
   public Object newInstance(String cname, String... subpackages) {
     Object that = super.newInstance(cname, subpackages);
     if (that instanceof SolrCoreAware) {
       if (null == core) {
         // core not ready yet: remember the plugin and tell it later
         needToTell.add((SolrCoreAware) that);
       } else {
         ((SolrCoreAware) that).tellAboutCore(core);
       }
     }
     return that;
   }
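
The ordering this gives can be checked with a standalone sketch. Everything 
here is a stub invented for the example (Core, CoreAware, Registry and 
RecordingPlugin stand in for SolrCore, SolrCoreAware, SolrConfig and a 
plugin); it only demonstrates the deferred notification, not real Solr 
initialization.

```java
import java.util.ArrayList;
import java.util.List;

final class CoreAwareSketch {
    // Stubs standing in for SolrCore and SolrCoreAware (assumptions, not Solr classes).
    static final class Core { final String name; Core(String n) { name = n; } }
    interface CoreAware { void tellAboutCore(Core c); }

    static final class Registry {
        private List<CoreAware> needToTell = new ArrayList<>();
        private Core core;

        void register(Object plugin) {
            if (plugin instanceof CoreAware) {
                CoreAware aware = (CoreAware) plugin;
                if (core == null) needToTell.add(aware); // defer until the core is ready
                else aware.tellAboutCore(core);          // core already usable
            }
        }

        void coreReady(Core c) {
            core = c;
            for (CoreAware p : needToTell) p.tellAboutCore(c);
            needToTell = null;
        }
    }

    static final class RecordingPlugin implements CoreAware {
        final List<String> events = new ArrayList<>();
        public void tellAboutCore(Core c) { events.add("core:" + c.name); }
    }

    public static void main(String[] args) {
        Registry r = new Registry();
        RecordingPlugin early = new RecordingPlugin();
        r.register(early);             // core not ready: deferred
        r.coreReady(new Core("main")); // early plugin is told now
        RecordingPlugin late = new RecordingPlugin();
        r.register(late);              // core ready: told immediately
        System.out.println(early.events + " " + late.events); // prints [core:main] [core:main]
    }
}
```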

: Any reason to (re)introduce SolrCore.Initializable instead of constructors
: that can take a SolrCore as a parameter?

:  If we can manage legacy code effectively the constructor approach seems 
:  cleaner in the long run.  But i'm not sure how that would work.  Also, 

i would prefer we avoid any work towards expecting plugin constructors 
to take in a SolrCore as a param...

1) It still wouldn't address the issue of plugins attempting to use the 
core they have access to before it's fully initialized

2) interfaces and abstract classes can't enforce any contract on 
constructors, that's why we've always used default constructors and had 
init(...) methods in the APIs ... it allows for compile time checking that 
the Plugins will work (they have to go out of their way to write a plugin 
without a default constructor to break things at runtime)
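
A small sketch of the default-constructor-plus-init pattern, for anyone who 
wants to see why it can be checked at compile time: the interface pins down 
init(...), and the loader only ever needs a no-arg constructor. The Plugin 
interface, EchoPlugin and load() are hypothetical names for illustration, 
not the Solr plugin API.

```java
import java.util.Map;

final class PluginLoadingSketch {
    // Hypothetical plugin contract: the interface can enforce init, but not a constructor shape.
    interface Plugin { void init(Map<String, String> args); }

    public static final class EchoPlugin implements Plugin {
        String greeting;
        public EchoPlugin() {}
        public void init(Map<String, String> args) { greeting = args.get("greeting"); }
    }

    static Plugin load(String className, Map<String, String> args) {
        try {
            // Only the default constructor is assumed; configuration arrives through init.
            Plugin p = (Plugin) Class.forName(className).getDeclaredConstructor().newInstance();
            p.init(args);
            return p;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        EchoPlugin p = (EchoPlugin) load(EchoPlugin.class.getName(), Map.of("greeting", "hi"));
        System.out.println(p.greeting); // prints hi
    }
}
```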


-Hoss



[jira] Commented: (SOLR-388) Refactor ResponseWriters and Friends.

2007-10-24 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537505
 ] 

Hoss Man commented on SOLR-388:
---

Sorry, i'm a little behind on some things, several comments...

> Hoss. I can certainly copy/convert the results into a NamedList composed only 
> of the primitives. But it's an extra copy that's bad for the performance

you don't have to copy anything, you just have to make sure that whatever 
complex object you want to add to the response implements one of the allowed 
interfaces (Map or List probably being the easiest) so the ResponseWriters can 
access them.
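
A rough sketch of that idea: the custom object implements Map itself, so a 
writer that already knows how to walk a Map can serialize it without a 
separate conversion step. Stats and the toy write() method are invented for 
this example; the real response writers and their supported types are not 
shown here.

```java
import java.util.AbstractMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class MapViewSketch {
    // Hypothetical result object that exposes its fields through the Map interface.
    static final class Stats extends AbstractMap<String, Object> {
        final long numFound;
        final float maxScore;
        Stats(long numFound, float maxScore) { this.numFound = numFound; this.maxScore = maxScore; }

        @Override public Set<Map.Entry<String, Object>> entrySet() {
            Set<Map.Entry<String, Object>> entries = new LinkedHashSet<>();
            entries.add(new AbstractMap.SimpleImmutableEntry<String, Object>("numFound", numFound));
            entries.add(new AbstractMap.SimpleImmutableEntry<String, Object>("maxScore", maxScore));
            return entries;
        }
    }

    // Toy writer that only understands Map and falls back to toString for everything else.
    static String write(Object o) {
        if (o instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                if (sb.length() > 1) sb.append(",");
                sb.append('"').append(e.getKey()).append("\":").append(write(e.getValue()));
            }
            return sb.append("}").toString();
        }
        return String.valueOf(o);
    }

    public static void main(String[] args) {
        System.out.println(write(new Stats(14000L, 1.5f))); // prints {"numFound":14000,"maxScore":1.5}
    }
}
```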

As i said before: please start a thread on the solr-user list describing what 
it is you are doing in your RequestHandler, and the community might have lots 
of suggestions for achieving your goals in a performant way that can work with any 
response writer (and won't require you to hack any internals or wait on any API 
changes)

> BTW, the documentation needs to be updated to reflect that Document is 
> also a supported primitive.

e...  it's a question of perspective.  the documentation is correct in that 
response writers have never been required to support Document as a primitive 
(just DocList).  i haven't looked closely, but it wouldn't surprise me if the 
recursive nature of the writers that come with Solr can handle it okay, but 
that doesn't mean the docs for the interface are wrong ... it just means that 
example impl's do more than they have to.

> Are there any reasons why it would be bad for JSONWriter to be public, 
> especially considering XMLWriter has been public for a while?

only that making a class public is essentially a one way operation; once a 
class is public it can't easily be made unpublic, nor can any of its method 
signatures be changed .. which can be very limiting in making future 
improvements.  adding a package protected class with a dirty API can be done 
easily, because we don't have to maintain it if we decide we don't like it -- 
we control all the clients and can change it all at once.  So deciding to make 
a class public requires careful consideration as to whether or not we think the 
API of that class is dirty and if Solr is willing to stand by it for the 
foreseeable future.

 Refactor ResponseWriters and Friends.
 -

 Key: SOLR-388
 URL: https://issues.apache.org/jira/browse/SOLR-388
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.2
Reporter: Luke Lu

 When developing custom request handlers, it's often necessary to create 
 corresponding response writers that extend existing ones. In our case, we 
 want to augment the result list (more attributes other than numFound, 
 maxScore, on-the-fly per-doc attributes that are not fields, etc.), only to 
 find JSONWriter and friends are private to the package. We could copy the 
 whole thing and modify it, but it wouldn't take advantage of recent fixes 
 like Yonik's FastWriter changes without tedious manual intervention. I hope 
 that we can *at least* extend it and override writeVal() to add a new 
 result type to call writeMyType. 
 Ideally the ResponseWriter hierarchy could be rewritten to take advantage of 
 a double dispatching trick to get rid of the ugly "if something is an instance 
 of someclass else ..." list, as it clearly doesn't scale well with the number of 
 types (_n_) and depth (_d_) of the writer hierarchy, as the complexity would 
 be O(_nd_), which is worse than the O(1) double dispatching mechanism. Some 
 pseudo code here:
 {code:title=SomeResponseWriter.java}
 // a list of overloaded write methods
 public void write(SomeType t) {
   // implementation
 }
 {code}
 {code:title=ResponseWritable.java}
 // an interface for objects that support the scheme
 public interface ResponseWritable {
   public abstract void write(ResponseWriter writer);
 }
 {code}
 {code:title=SomeType.java}
 // SomeType needs to implement the ResponseWritable interface
 // to facilitate double dispatching
 public void write(ResponseWriter writer) {
   writer.write(this);
 }
 {code}
 So when adding a new MyType and MySomeResponseWriter, we only need to add 
 these two files without having to muck with the writeVal if-then-else list. 
 Note, you still need to use the if else list for builtin types and any types 
 that you can't modify in the write(Object) method. 
 {code:title=MyType.java}
 // implements the ResponseWritable interface
 public void write(ResponseWriter writer) {
   writer.write(this);
 }
 {code}
 {code:title=MySomeResponseWriter.java}
 //  only need to implement this method
 public void write(MyType t) {
   // implementation
 }
 {code}
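
 A compilable sketch of the scheme, with one caveat worth spelling out: Java 
 resolves overloads at compile time, so a type added later has to name the 
 writer subtype it targets (here via an instanceof check) for 
 write(MyType) to be selected. All class names below are hypothetical 
 stand-ins for the pseudo code above, not the Solr writer classes.

```java
final class DoubleDispatchSketch {
    interface ResponseWritable { void write(BaseWriter w); }

    static class BaseWriter {
        final StringBuilder out = new StringBuilder();
        void write(SomeType t) { out.append("some:").append(t.value); }
        // Fallback for built-in types that cannot implement ResponseWritable.
        void write(Object o) { out.append(o); }
        void writeVal(Object o) {
            if (o instanceof ResponseWritable) ((ResponseWritable) o).write(this);
            else write(o);
        }
    }

    static final class SomeType implements ResponseWritable {
        final String value;
        SomeType(String value) { this.value = value; }
        public void write(BaseWriter w) { w.write(this); } // resolves to write(SomeType)
    }

    // Added later, without touching BaseWriter.writeVal.
    static final class MyWriter extends BaseWriter {
        void write(MyType t) { out.append("my:").append(t.id); }
    }

    static final class MyType implements ResponseWritable {
        final int id;
        MyType(int id) { this.id = id; }
        public void write(BaseWriter w) {
            // Overloads are chosen statically, so the subtype must be named explicitly.
            if (w instanceof MyWriter) ((MyWriter) w).write(this);
            else w.write((Object) ("MyType#" + id));
        }
    }

    public static void main(String[] args) {
        MyWriter w = new MyWriter();
        w.writeVal(new SomeType("a"));
        w.writeVal(" ");
        w.writeVal(new MyType(7));
        System.out.println(w.out); // prints some:a my:7
    }
}
```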
