[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-20 Thread J.J. Larrea (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466308
 ] 

J.J. Larrea commented on SOLR-112:
--

Re foo vs. /foo:

I think of the SolrServlet as being just one way to invoke the request 
dispatcher.  One could for example write a SOAP or other RPC message receiver 
which called a method something like handleRequest(String reqName, 
SolrQueryRequest req, SolrQueryResponse rsp)*1*.  So I wouldn't want to bind 
the request invocation syntax too tightly to a URL-based mechanism for 
invocation.

Similarly, I think of allowing slashes in request handler names as merely a 
convention; it could be "search.products.instock" or "search-products-instock" 
just as easily.  Of course, it is advantageous for the handler name to be 
RFC1738-compliant (as those examples both are) so the pathInfo can be used to 
set the name, as we all like, e.g. 
http://localhost:8989/solr/select/search-products-instock

What your suggestion comes down to, Ryan, is whether the pathInfo-parsed 
request adds a leading / slash to the request name, or not.  If it does it 
forces URL syntax into the request-naming space, and while that won't 
particularly hurt anything I'm not sure it buys anything either...  Why should 
a SOLR configurer need to make an explicit gesture to indicate they want to use 
the more "modern" pathInfo-based invocation style rather than the older qt= 
invocation style?  And shouldn't the request handler definition be either 
agnostic as to the request method (GET, POST, pathInfo, qt=, SOAP, direct API 
call, ...) or else have access to a more comprehensive mechanism for filtering 
which methods they respond to? 

*1* (I haven't yet had a chance to catch up on the voluminous SOLR-104 
discussion so this may not be the currently contemplated syntax, but hopefully 
my argument for potentially supporting non-URL-based invokers still holds.)
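
A minimal sketch of the point above, assuming a transport-neutral entry point shaped like the handleRequest(reqName, req, rsp) idea. Everything here is hypothetical: the class name, the use of plain Strings in place of SolrQueryRequest/SolrQueryResponse, and the nameFromPathInfo helper are illustration only, not Solr API. It shows the pathInfo style and the qt= style resolving to the same slash-free handler name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: one handler-name registry shared by several invocation styles. */
final class HandlerRegistrySketch {
    // Handlers are modelled as String -> String functions purely for illustration.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    void register(String name, Function<String, String> handler) {
        handlers.put(name, handler);
    }

    /** Transport-neutral entry point; a servlet, SOAP receiver or direct API call could all use it. */
    String handleRequest(String reqName, String query) {
        Function<String, String> handler = handlers.get(reqName);
        if (handler == null) {
            throw new IllegalArgumentException("unknown handler: " + reqName);
        }
        return handler.apply(query);
    }

    /** URL-based invoker: drops the leading slash so URL syntax stays out of the name space. */
    static String nameFromPathInfo(String pathInfo) {
        return pathInfo.startsWith("/") ? pathInfo.substring(1) : pathInfo;
    }

    public static void main(String[] args) {
        HandlerRegistrySketch registry = new HandlerRegistrySketch();
        registry.register("search-products-instock", q -> "instock:" + q);
        // pathInfo-based invocation
        System.out.println(registry.handleRequest(nameFromPathInfo("/search-products-instock"), "ipod"));
        // qt=-style invocation uses the same registered name
        System.out.println(registry.handleRequest("search-products-instock", "ipod"));
    }
}
```

With this shape the handler definition never sees whether the name arrived via a path, a qt= parameter, or some other invoker.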

> Hierarchical Handler Config
> ---
>
> Key: SOLR-112
> URL: https://issues.apache.org/jira/browse/SOLR-112
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
>Priority: Minor
> Fix For: 1.2
>
> Attachments: SOLR-112.patch
>
>
> From J.J. Larrea on SOLR-104
> 2. What would make this even more powerful would be the ability to "subclass" 
> (meaning refine and/or extend) request handler configs: If the requestHandler 
> element allowed an attribute extends="" and 
> chained the SolrParams, then one could do something like:
>class="solr.DisMaxRequestHandler" >
> 
>  0.01
>  
> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>  
>  ... much more, per the "dismax" example in the sample solrconfig.xml ...
>   
>   ... and replacing the "partitioned" example ...
>extends="search/products/all" >
> 
>   inStock:true
> 
>   
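
The "extends" proposal quoted above amounts to a parameter lookup that falls back to a parent config. A rough sketch under stated assumptions: the class is hypothetical, and the parameter names tie, qf and fq are guesses, since the archive stripped the XML tags from the quoted example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "extends": a child config falls back to its parent for missing params. */
final class ChainedParamsSketch {
    private final Map<String, String> own = new HashMap<>();
    private final ChainedParamsSketch parent; // null for a root config

    ChainedParamsSketch(ChainedParamsSketch parent) {
        this.parent = parent;
    }

    ChainedParamsSketch set(String name, String value) {
        own.put(name, value);
        return this;
    }

    /** Looks in this config first, then walks up the extends chain. */
    String get(String name) {
        String value = own.get(name);
        if (value != null) {
            return value;
        }
        return parent == null ? null : parent.get(name);
    }

    public static void main(String[] args) {
        // Parameter names (tie, qf, fq) are assumptions; the archive lost the original tags.
        ChainedParamsSketch all = new ChainedParamsSketch(null)
                .set("tie", "0.01")
                .set("qf", "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4");
        ChainedParamsSketch inStock = new ChainedParamsSketch(all).set("fq", "inStock:true");

        System.out.println(inStock.get("tie")); // inherited from the parent config
        System.out.println(inStock.get("fq"));  // defined by the child config
        System.out.println(all.get("fq"));      // null: the parent is unaffected
    }
}
```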

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (SOLR-80) negative filter queries

2007-01-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-80:
-

Attachment: negative_filters.patch

> negative filter queries
> ---
>
> Key: SOLR-80
> URL: https://issues.apache.org/jira/browse/SOLR-80
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
> Attachments: negative_filters.patch
>
>
> There is a need for negative filter queries to avoid long filter generation 
> times and large caching requirements.
> Currently, if someone wants to filter out a small number of documents, they 
> must specify the complete set of documents to express those negative 
> conditions against.  
> q=foo&fq=id:[* TO *] -id:101
> In this example, to filter out a single document, the complete set of 
> documents (minus one) is generated, and a large bitset is cached.  You could 
> also add the restriction to the main query, but that doesn't work with the 
> dismax handler which doesn't have a facility for this.





[jira] Commented: (SOLR-80) negative filter queries

2007-01-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466307
 ] 

Yonik Seeley commented on SOLR-80:
--

attached draft (it doesn't work yet, and there isn't any test code).





Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

2007-01-20 Thread Yonik Seeley

On 1/21/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

On 1/20/07, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote:

> > Looking at the negative filters stuff, I realized that andNot() had no
> > optimized implementation for HashDocSet, so I implemented that and union().

Out of curiosity, what is your current plan for this?  Something along
the lines of storing a negated flag, which would be used to do
andNot() rather than intersection() in SolrIndexSearcher.getDocSet()?

I think it would be a great feature and can help out with devel or review.


There are two related things, and I'm only tackling one.  I'm *not*
looking at a generated DocSet and then choosing to try and cache it
as an inverse if it would be smaller.

I am looking at queries, and determining if they are negative (no
positive elements, currently matches nothing in Lucene).  If they are
negative, I generate and cache the positive version, and do andNot()
for operations.
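
A rough sketch of that plan, using java.util.BitSet as a stand-in for a DocSet. The class and method names are hypothetical and this is not the attached patch; it only illustrates caching the small positive set and applying it with andNot() instead of building "everything minus one".

```java
import java.util.BitSet;

/** Hypothetical sketch: a purely negative filter is cached in its positive form and applied with andNot. */
final class NegativeFilterSketch {
    /** A filter is treated as negative when it has clauses and none of them is positive. */
    static boolean isPurelyNegative(String... clauses) {
        if (clauses.length == 0) {
            return false;
        }
        for (String clause : clauses) {
            if (!clause.startsWith("-")) {
                return false;
            }
        }
        return true;
    }

    /** Applies a cached positive doc set to the base set, as an exclusion or an intersection. */
    static BitSet applyFilter(BitSet base, BitSet cachedPositive, boolean negative) {
        BitSet result = (BitSet) base.clone(); // leave the caller's set untouched
        if (negative) {
            result.andNot(cachedPositive);     // drop the few excluded docs
        } else {
            result.and(cachedPositive);        // ordinary intersection
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet base = new BitSet();
        base.set(1); base.set(2); base.set(3); base.set(5);

        BitSet excluded = new BitSet();        // positive form of a filter like -id:3
        excluded.set(3);

        boolean negative = isPurelyNegative("-id:3");
        System.out.println(applyFilter(base, excluded, negative));
    }
}
```

The cached set here holds one bit rather than the whole index minus one document, which is the caching saving the issue describes.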

Code is done but untested (no test code yet even).  I'll add a draft
for your review now.

-Yonik


[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-20 Thread J.J. Larrea (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466304
 ] 

J.J. Larrea commented on SOLR-112:
--

I'm sure you won't like your extemporaneous suggestion (foo/baz implicitly 
extending foo with baz) once you get a chance to think about it, Hoss. :-)

The concern of efficiently structuring request handlers in solrconfig is quite 
different from the concern of publishing them to the outside world.  For 
example, mightn't one set up the equivalent of an "abstract base class" request 
config which has no value being invoked directly in a request, but has great 
value as the root of a tree of request configs which will be invoked?  And 
similarly, shouldn't one be able to rearrange the internal configuration (e.g. 
refactoring) without affecting an already "published" request syntax?

If it didn't break backwards compatibility, one could even consider having 
separate arguments defining an internal name (used for extending) and an 
external name (used for invoking), with either one being optional -- allowing 
configs which are uninvokable but extendable, and vice versa.  Or for better 
backwards compatibility, one name could default to the other, but could be 
explicitly overridden (potentially to the empty string) if so desired.  I am 
not advocating either of these approaches (simpler is perhaps better) as much 
as using them to illustrate the separability of the concerns.

Does this make sense?





Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

2007-01-20 Thread Mike Klaas

On 1/20/07, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote:


> Looking at the negative filters stuff, I realized that andNot() had no
> optimized implementation for HashDocSet, so I implemented that and union().


Out of curiosity, what is your current plan for this?  Something along
the lines of storing a negated flag, which would be used to do
andNot() rather than intersection() in SolrIndexSearcher.getDocSet()?

I think it would be a great feature and can help out with devel or review.

-Mike


[jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()

2007-01-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-114.
---

Resolution: Fixed

committed.

> HashDocSet new hash(), andNot(), union()
> 
>
> Key: SOLR-114
> URL: https://issues.apache.org/jira/browse/SOLR-114
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Yonik Seeley
> Attachments: hashdocset.patch, test.patch
>
>
> Looking at the negative filters stuff, I realized that andNot() had no 
> optimized implementation for HashDocSet, so I implemented that and union().
> While I was in there, I did a re-analysis of hash collision rates and came up 
> with a cool new hash method that goes directly into a linear scan and is 
> hence simpler, faster, and has fewer collisions.
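
For readers unfamiliar with the idea, here is a generic sketch of an int hash set that hashes straight into a table and resolves collisions with a linear scan, plus an andNot() built on membership tests. It is an assumption-laden illustration, not the committed HashDocSet code: the hash function (low bits of the doc id), the load factor and all names are made up here.

```java
import java.util.Arrays;

/** Hypothetical sketch: open-addressing int set with linear probing and a simple andNot. */
final class IntHashSetSketch {
    private static final int EMPTY = -1; // doc ids are assumed to be non-negative
    private final int[] table;
    private final int mask;
    private int size;

    IntHashSetSketch(int[] docs) {
        int capacity = 2;
        while (capacity < docs.length * 2) {
            capacity <<= 1;              // power of two, at most half full
        }
        table = new int[capacity];
        Arrays.fill(table, EMPTY);
        mask = capacity - 1;
        for (int doc : docs) {
            add(doc);
        }
    }

    private void add(int doc) {
        int slot = doc & mask;           // assumed hash: low bits of the doc id
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return;                  // already present
            }
            slot = (slot + 1) & mask;    // linear scan to the next slot
        }
        table[slot] = doc;
        size++;
    }

    boolean exists(int doc) {
        int slot = doc & mask;
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    int size() {
        return size;
    }

    /** Returns the docs in this set that are not in the other set, sorted ascending. */
    int[] andNot(IntHashSetSketch other) {
        int[] out = new int[size];
        int n = 0;
        for (int doc : table) {
            if (doc != EMPTY && !other.exists(doc)) {
                out[n++] = doc;
            }
        }
        int[] result = Arrays.copyOf(out, n);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) {
        IntHashSetSketch a = new IntHashSetSketch(new int[] {1, 5, 9, 17});
        IntHashSetSketch b = new IntHashSetSketch(new int[] {5, 17});
        System.out.println(Arrays.toString(a.andNot(b)));
    }
}
```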





Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Yonik Seeley

On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: I'm on board as long as the URL structure is:
:   ${path/from/solr/config}?stream.type=raw

actually the URL i was suggesting was...

${parser/path/from/solr/config}${handler/path/from/solr/config}?param=val

...i was trying to keep the parser name out of the query string,
so we don't have to do any hack parsing of
HttpServletRequest.getQueryString() to get it.


We need code to do that anyway since getParameterMap() doesn't support
getting params from the URL if it's a POST (I believe I tried this in
the past and it didn't work).

Aesthetically, having an optional parser in the queryString seems
nicer than in the path.


basically if you have this...

  
  
  


Pluggable request parsers seems needlessly complex, and it gets harder
to explain it all to someone new.
Can't we start simple and defer anything like that until there is a real need?


if they really had a reason to want to force one type of parsing, they
could register it with a different prefix.


That is a point.  I'm not sure of the use cases though... it's not safe
to let untrusted people update solr at all, so I don't understand
prohibiting certain types of streams.


  * default URLs stay clean
  * no need for an extra "stream.type" param
  * urls only get ugly if people want them to get ugly because they don't
want to make their clients set the mime type correctly.


The first and last points are also true for a stream.type type of thing.
After all, we will need other parameters for specifying local files,
right?  Or is opening local files up to the RequestHandler again?

Anyway, I'm not too unhappy either way, as long as I can leave out any
explicit "parser" and just get the right thing to happen.

-Yonik


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466293
 ] 

Yonik Seeley commented on SOLR-116:
---

Facets are slightly different than docfreq's... one is expensive, and one is 
very cheap since it's pre-calculated by lucene.
The disadvantage of the Lucene version is that the docfreq doesn't take deleted docs 
into account.

If you want to page through or download *all* terms of a full-text field, the 
faceting code would take forever in comparison.

other ideas for info:

"index" : {
  "numDocs" : 10123,
  "maxDoc" : 12345,
  "age" : 2000,  # number of milliseconds the index has been open... sort of equivalent to index freshness, but not really.
  "version" : 123425235,  # index version.  Actually, I think this should be in responseHeader to aid in client-side caching
}

I think this stuff is useful; it's just a matter of preference whether it goes in 
the same handler or not.
If this *does* go in this handler, then perhaps it should be named "indexinfo" 
or something.  I'd be fine with this handler being only about schema too though.

> StructureRequestHandler - allowing client to discover all fields in the index
> -
>
> Key: SOLR-116
> URL: https://issues.apache.org/jira/browse/SOLR-116
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Erik Hatcher
> Assigned To: Erik Hatcher
>Priority: Minor
> Attachments: structure_handler.patch
>
>
> This request handler returns all fields and their type.  In Ruby format 
> (&wt=ruby) the results, for the example index, look like this currently:
> {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}
> A client wanting to introspect Solr could combine the actual fields and their 
> types with parsing of schema.xml to glean a lot and dynamically configure 
> based on what is inside an index.  Should more information per field be 
> returned, or is simply the type name sufficient?   What else is desirable for 
> this request handler?





[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466292
 ] 

Erik Hatcher commented on SOLR-116:
---

I had thought of the Map for the field name keyed value as well.  

Terms and document frequencies make more sense from a facet handler, it seems, 
which you can already do with 
&qt=standard&facet=true&facet.field=fieldname&q=[* TO *] I believe.

I'll add the Map level in there, and the notice, and commit soon so we can 
tinker with it in Flare as a way to provide a dynamic UI based on the fields in 
the index.





Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter
On Sat, 20 Jan 2007, Ryan McKinley wrote:

: Date: Sat, 20 Jan 2007 19:17:16 -0800
: From: Ryan McKinley <[EMAIL PROTECTED]>
: Reply-To: solr-dev@lucene.apache.org
: To: solr-dev@lucene.apache.org
: Subject: Re: Update Plugins (was Re: Handling disparate data sources in
: Solr)
:
: >
: > ...what if we bring that idea back, and let people configure it in the
: > solrconfig.xml, using path like names...
: >
: >   
: >   
: >   
: >   
: >
: > ...but don't make it a *public* interface ... make it package protected,
: > or maybe even a private static interface of the Dispatch Filter .. either
: > way, don't instantiate instances of it using the plugin-lib ClassLoader,
: > make sure it comes from the WAR so that it only uses the ones provided out of the
: > box.


: I'm on board as long as the URL structure is:
:   ${path/from/solr/config}?stream.type=raw

actually the URL i was suggesting was...

${parser/path/from/solr/config}${handler/path/from/solr/config}?param=val

...i was trying to keep the parser name out of the query string,
so we don't have to do any hack parsing of
HttpServletRequest.getQueryString() to get it.

basically if you have this...

  
  
  

  
  
  

...then these urls are all valid...

   http://localhost:/solr/raw/update?param=val
  ..uses raw post body for update
   http://localhost:/solr/multi/update?param=val
  ..uses multipart mime for update
   http://localhost:/solr/update?param=val
  ..no requestParser matched path prefix, so default is chosen and
Content-Type is used to decide where streams come from.

but if instead my config looks like this...

  
  

  
  
  

...then these URLs would fail...

   http://localhost:/solr/raw/update?param=val
   http://localhost:/solr/multi/update?param=val

...because the empty string would match as a parser, but "/raw/update"
and "/multi/update" wouldn't match as requestHandlers (the registration of
"/raw" as a parser would be useless)

this URL would work however...

   http://localhost:/solr/update?param=val
  ..treat all requests as if they have multi-part mime streams

...i use this only as an example of what i'm describing ... not as an
example of something we should recommend.

The key to all of this being that we'd check parser names against the URL
prefix in order from shortest to longest, then check the rest of the path
as a requestHandler ... if either of those fail, then the filter would
skip the request.
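
A small sketch of the matching rule just described, under stated assumptions: the class and method names are hypothetical, parsers and handlers are reduced to plain strings, and the two configurations are reconstructed from the URL examples because the archive stripped the XML. Parser prefixes are tried shortest first, the first match wins, and the remainder of the path must be a registered handler or the request is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of parser-prefix matching followed by handler lookup. */
final class ParserPrefixDispatchSketch {
    /** Returns {parserName, handlerPath}, or empty if the filter should skip the request. */
    static Optional<String[]> resolve(String path, Map<String, String> parsersByPrefix,
                                      Set<String> handlerPaths, String defaultParser) {
        List<String> prefixes = new ArrayList<>(parsersByPrefix.keySet());
        prefixes.sort(Comparator.comparingInt(String::length)); // shortest prefix first

        String parser = defaultParser;
        String rest = path;
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                parser = parsersByPrefix.get(prefix);
                rest = path.substring(prefix.length());
                break;                                          // first (shortest) match wins
            }
        }
        if (!handlerPaths.contains(rest)) {
            return Optional.empty();                            // no handler: skip the request
        }
        return Optional.of(new String[] {parser, rest});
    }

    private static String describe(Optional<String[]> match) {
        return match.map(m -> m[0] + " parser, handler " + m[1]).orElse("skipped");
    }

    public static void main(String[] args) {
        Set<String> handlers = new HashSet<>(Arrays.asList("/update"));

        // First configuration (assumed): parsers registered under /raw and /multi.
        Map<String, String> first = new HashMap<>();
        first.put("/raw", "raw");
        first.put("/multi", "multi");
        System.out.println(describe(resolve("/raw/update", first, handlers, "guess")));
        System.out.println(describe(resolve("/update", first, handlers, "guess")));

        // Second configuration (assumed): the empty prefix shadows /raw.
        Map<String, String> second = new HashMap<>();
        second.put("", "multi");
        second.put("/raw", "raw");
        System.out.println(describe(resolve("/raw/update", second, handlers, "guess")));
        System.out.println(describe(resolve("/update", second, handlers, "guess")));
    }
}
```

In the second configuration the empty prefix always matches first, so "/raw/update" is looked up as a handler and fails, which is why registering "/raw" there is useless.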

What we would probably recommend is that people map the "guess" request
parser to "/" so that they could put in all of the options they want on
buffer sizes and such, then map their requestHandlers without a "/"
prefix, and use content types correctly.

if they really had a reason to want to force one type of parsing, they
could register it with a different prefix.

  * default URLs stay clean
  * no need for an extra "stream.type" param
  * urls only get ugly if people want them to get ugly because they don't
want to make their clients set the mime type correctly.




-Hoss



[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-20 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466291
 ] 

Ryan McKinley commented on SOLR-112:


I think that path should be specified explicitly.

I like that
  ...  

will only match /select?qt=foo

and that:
  ...  

will match
 /foo  (and /select?qt=/foo)

I like the idea that someone adding the prefix '/' is making an explicit gesture 
that they want to set the URL path (even if it overrides something else, for 
example /admin).








Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley


...what if we bring that idea back, and let people configure it in the
solrconfig.xml, using path like names...

  
  
  
  

...but don't make it a *public* interface ... make it package protected,
or maybe even a private static interface of the Dispatch Filter .. either
way, don't instantiate instances of it using the plugin-lib ClassLoader,
make sure it comes from the WAR so that it only uses the ones provided out of the
box.



I'm on board as long as the URL structure is:
 ${path/from/solr/config}?stream.type=raw

and if you are missing the parameter it chooses a good option.

(stream.type can change, just that the parser is configured in the
query string, not the path)

I like it!


Also, this would give us a natural place to configure the max size etc
for multi-part upload


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter

(the three of us are online way too much ... for crying out loud it's a
saturday night folks!)

: In my opinion, I don't think we need to worry about it for the
: *default* handler.  That is not a very difficult constraint and, there
: is no one out there expecting to be able to post parameters in the URL
: and the body.  I'm not sure it is worth complicating anything if this
: is the only thing we are trying to avoid.

you'd be surprised the number of people i've run into who expect that to
work.

: I think the *default* should handle all the cases mentioned without
: the client worrying about different URLs  for the various methods.
:
: The next question is which (if any) of the explicit parsers you think
: are worth including in web.xml?

holy crap, i think i have a solution that will make all of us really
happy...

remember that idea we all really detested of a public plugin interface,
configured in the solrconfig.xml that looked like this...

 public interface RequestParser {
   SolrRequest parse(HttpServletRequest req);
 }

...what if we bring that idea back, and let people configure it in the
solrconfig.xml, using path like names...

  
  
  
  

...but don't make it a *public* interface ... make it package protected,
or maybe even a private static interface of the Dispatch Filter .. either
way, don't instantiate instances of it using the plugin-lib ClassLoader,
make sure it comes from the WAR so that it only uses the ones provided out of the
box.

then make the dispatcher check each URL first by seeing if it starts with
the name of any registered requestParser ... if it doesn't then use the
default "UseContentTypeRequestParser" .. *then* it does what the rest of
Ryan's current Dispatcher does, taking the rest of the path to pick a
request handler.

the beauty of this approach is that if no such tags appear in
the solrconfig.xml, then the URLs look exactly like you guys want, and the
request parsing / stream building semantics are exactly the same as they
are today ... if/when we (or maybe just "i") write those other
RequestParsers people can choose to turn them on (and change their URLs)
if they want, but if they don't they can keep having the really simple
URLs ... OR they could register something like this...

  

...and have really simple URLs, but be guaranteed that they always got
their streams from raw POST bodies.

This would also solve Ryan's concern about allowing people to turn off
fetching streams from remote URLs (or from local files, a small concern i
had but hadn't mentioned yet since we had bigger fish to fry)



Thoughts?


-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Yonik Seeley

On 1/20/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> It would be:
> http://${context}/${path}?stream.type=post

Yes!
Feels like a much more natural place to me than as part of the path of the URL.
Just need to hash out meaningful param names/values?


Oh, and I'm more interested in the semantics of those param/values,
and not what request parser it happens to get mapped to.  I'd vote for
different request parsers being an implementation detail, and keeping
those details (plugability) out of solrconfig.xml for now.

We could always add it later, but it's a lot tougher to remove things.

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Yonik Seeley

On 1/20/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> > - but everyone
> > understands how to put something in a URL.  if nothing else, think of
> > putting the "parsetype" in the URL as a checksum that the RequestParser
> > can use to validate its assumptions -- if it's not there, then it can do
> > all of the intelligent things you think it should do, but if it is there
> > that dictates what it should do.
>
> If it's optional in the args, I could be on board with that.
>

If it's optional in the req.getQueryString() I'm in.

Ignore my previous post about
${context}/multipart/asdgadsga

It would be:
http://${context}/${path}?stream.type=post


Yes!
Feels like a much more natural place to me than as part of the path of the URL.
Just need to hash out meaningful param names/values?

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley


> - but everyone
> understands how to put something in a URL.  if nothing else, think of
> putting the "parsetype" in the URL as a checksum that the RequestParser
> can use to validate its assumptions -- if it's not there, then it can do
> all of the intelligent things you think it should do, but if it is there
> that dictates what it should do.

If it's optional in the args, I could be on board with that.



If it's optional in the req.getQueryString() I'm in.

Ignore my previous post about
${context}/multipart/asdgadsga

It would be:
http://${context}/${path}?stream.type=post


[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466290
 ] 

Hoss Man commented on SOLR-112:
---

random idea i had that we might consider, not sure yet if i like it yet but i 
wanted to throw it out there...

if someone has..

   ... 
   ... 
   ... 

(NOTE: foo/baz has no class or extends) 

could/should we assume that "foo/baz" extends "foo" since it's a prefix of the 
name?






Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley


> consider the example you've got on your test.html page: "POST - with query
> string" ... that doesn't obey the typical semantics of a POST with a query
> string ... if you used the methods on HttpServletRequest to get the params
> it would give you all the params it found both in the query strings *and*
> in the post body.

Blech.  I was wondering about that.  Sounds like bad form, but perhaps could be
supported via something like
/solr/foo?postbody=args



In my opinion, I don't think we need to worry about it for the
*default* handler.  That is not a very difficult constraint and, there
is no one out there expecting to be able to post parameters in the URL
and the body.  I'm not sure it is worth complicating anything if this
is the only thing we are trying to avoid.

I think the *default* should handle all the cases mentioned without
the client worrying about different URLs  for the various methods.

The next question is which (if any) of the explicit parsers you think
are worth including in web.xml?

http://${host}/${context}/${path/from/config}  (default)
http://${host}/${context}/params/${path/from/config}  (uses getParameterMap() to fill args)
http://${host}/${context}/multipart/${path/from/config}  (force multipart request)
http://${host}/${context}/stream/${path/from/config}  (params from URL, body as stream)


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Yonik Seeley

On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> but the HTTP Client libraries in various languages don't always make it
> easy to set Content-type -- and even if they do that doesn't mean the
> person using that library knows how to use it properly -


I think we have to go with common usages.  We neither rely on, nor
discard content-type in all cases.
- When it has a charset, believe it.
- When it says form-encoded, only believe it if there aren't args on
the URL (because many clients like curl default to
"application/x-www-form-urlencoded" for a post).
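The two rules above can be sketched in plain Java as follows. This is only an illustration: the class `ContentTypeRules` and both method names are hypothetical and are not part of Solr or the SOLR-104 patch.

```java
import java.util.Locale;

/** Hypothetical helper illustrating the two Content-Type rules; not Solr code. */
final class ContentTypeRules {

    /** Rule 1: if the header names a charset, return it; otherwise null. */
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    /** Rule 2: believe "form-encoded" only when the URL carries no args. */
    static boolean treatBodyAsFormParams(String contentType, String queryString) {
        if (contentType == null) return false;
        int semi = contentType.indexOf(';');
        String mediaType = (semi < 0 ? contentType : contentType.substring(0, semi))
                .trim().toLowerCase(Locale.ROOT);
        boolean saysForm = mediaType.equals("application/x-www-form-urlencoded");
        boolean hasUrlArgs = queryString != null && !queryString.isEmpty();
        return saysForm && !hasUrlArgs;
    }
}
```

With this sketch, a curl-style POST to a URL that already has a query string would have its body left alone even though the header says form-encoded.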


> - but everyone
> understands how to put something in a URL.  if nothing else, think of
> putting the "parsetype" in the URL as a checksum that the RequestParser
> can use to validate its assumptions -- if it's not there, then it can do
> all of the intelligent things you think it should do, but if it is there
> that dictates what it should do.


If it's optional in the args, I could be on board with that.


> (aren't you the one that convinced me a few years back that it was better
> to trust a URL than to trust HTTP Headers? ... because people understand
> URLs and put things in them, but they don't always know what headers to
> send .. curl being the great example, it always sends a Content-Type even
> if the user doesn't ask it to right?)


Well, for the update server, we do ignore the form-data stuff, but we
don't ignore the charset.


: Multi-part posts will have the content-type set correctly, or it won't work.
: The big use-case I see is browser file upload, and they will set it correctly.

> right, but my point is what if i want the multi-part POST body left alone
> so my RequestHandler can deal with it as a single stream -- if i set
> every header correctly, the "smart" parsing code will parse it -- which is
> why something in the URL telling it *not* to parse it is important.


That sounds like a pretty rare corner case.


: We should not preclude wacky handlers from doing things for
: themselves, calling our stuff as utility methods.

> how? ... if there is one and only one RequestParser which makes the
> SolrRequest before the RequestHandler ever sees it, and parses the post
> body because the content-type is multipart/mixed how can a wacky
> handler ever get access to the raw post body?


I wasn't thinking *that* whacky :-)
There are always other options, such as using your own servlet though.
I don't think we should try to solve every case (the whole 80/20
thing).

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Yonik Seeley

On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

> Ryan: this patch truly does kick ass ... we can probably simplify a lot
> of the Legacy stuff by leveraging your new StandardRequestBuilder -- but
> that can be done later.


Much is already done by the looks of it.


> i'm still really not liking the way there is a single SolrRequestBuilder
> with a big complicated build method that "guesses" what streams the user
> wants.


But I don't need a separate URL to do GET vs POST in HTTP.
It seems like having a different URL for where you put the args would
be hard to explain to people.


> i really feel strongly that even if all the parsing logic is in
> the core, even if it's all in one class: a piece of the path should be
> used to determine where the streams come from.


If there's a ? in the URL, then it's args, so that could always
safely be parsed.  Perhaps a special arg, if present, could override
the default method of getting input streams?


> consider the example you've got on your test.html page: "POST - with query
> string" ... that doesn't obey the typical semantics of a POST with a query
> string ... if you used the methods on HttpServletRequest to get the params
> it would give you all the params it found both in the query strings *and*
> in the post body.


Blech.  I was wondering about that.  Sounds like bad form, but perhaps could be
supported via something like
/solr/foo?postbody=args

-Yonik


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter

: To be clear, (with the current implementation in SOLR-104) you would
: have to put this in your solrconfig.xml
:
: 
:
: Notice the preceding '/'.  I think this is a strong indication that
: someone *wants* /select to behave distinctly.

crap ... i totally misread that ... so if people have a requestHandler
registered with a name that doesn't start with a slash, they can't use the
new URL structure and they have to use the old one.

DAMN! ... that is slick dude ... okay, i agree with you, the odds of that
causing problems are pretty fucking low.

I'm still hung up on this "parse" logic thing ... i really think it needs
to be in the path .. or at the very least, there needs to be a way to
specify it in the path to force one behavior or another, and if it's not
in the path then we can guess based on the Content-Type.

Putting it in a query arg would make getting it without contaminating the
POST body kludgy, putting it at the start of the path doesn't work well
for supporting a default if it isn't there, and putting it at the end of
the PATH messes up the nice work you've done letting RequestHandlers have
extra path info for encoding info they want.

H...

What if we did something like this...

   /exec/handler/name:extra/path?param1=val1
   /raw/handler/name:extra/path?param1=val1
   /url/handler/name:extra/path?param1=val1&url=...&url=...
   /file/handler/name:extra/path?param1=val1&file=...&file=...

where "exec" means guess based on the Content-Type, "raw" means use the
POST body as a single stream regardless of Content-Type, etc...
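As a rough illustration of how such a path could be taken apart, here is a minimal Java sketch. The class `ParseModePath` and its field names are hypothetical, and returning the handler name without a leading slash is an assumption; the proposal above does not specify it.

```java
import java.util.Locale;

/** Hypothetical parser for paths like /raw/handler/name:extra/path; not Solr code. */
final class ParseModePath {
    /** The four leading path segments proposed in the message above. */
    enum Mode { EXEC, RAW, URL, FILE }

    final Mode mode;
    final String handler;   // e.g. "handler/name" (assumed: no leading slash)
    final String extraPath; // text after the first ':', or null if absent

    private ParseModePath(Mode mode, String handler, String extraPath) {
        this.mode = mode;
        this.handler = handler;
        this.extraPath = extraPath;
    }

    /** Splits "/mode/handler/name:extra/path"; throws IllegalArgumentException on bad input. */
    static ParseModePath parse(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("path must start with '/': " + path);
        }
        int second = path.indexOf('/', 1);
        if (second < 0 || second == path.length() - 1) {
            throw new IllegalArgumentException("no handler name in: " + path);
        }
        // Enum.valueOf throws IllegalArgumentException for an unknown mode.
        Mode mode = Mode.valueOf(path.substring(1, second).toUpperCase(Locale.ROOT));
        String rest = path.substring(second + 1);
        int colon = rest.indexOf(':');
        String handler = colon < 0 ? rest : rest.substring(0, colon);
        String extra = colon < 0 ? null : rest.substring(colon + 1);
        return new ParseModePath(mode, handler, extra);
    }
}
```

The query string (param1=val1 and so on) is assumed to have been separated from the path by the servlet container before this is called.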

thoughts?


-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter

: I just posted a new patch on SOLR-104.  I think it addresses most of
: the issues we have discussed.  (It's a little difficult to know as it
: has been somewhat circular)   I was going to reply to your points one
: by one, but i think that would just make the discussion more confusing
: than it already is!

Ryan: this patch truly does kick ass ... we can probably simplify a lot
of the Legacy stuff by leveraging your new StandardRequestBuilder -- but
that can be done later.

i'm still really not liking the way there is a single SolrRequestBuilder
with a big complicated build method that "guesses" what streams the user
wants.   i really feel strongly that even if all the parsing logic is in
the core, even if it's all in one class: a piece of the path should be
used to determine where the streams come from.

consider the example you've got on your test.html page: "POST - with query
string" ... that doesn't obey the typical semantics of a POST with a query
string ... if you used the methods on HttpServletRequest to get the params
it would give you all the params it found both in the query strings *and*
in the post body.

This is a great example of what i was talking about: if i have no
intention of sending a stream, it should be possible for me to send params
in both the URL and in the POST body -- but in other cases i should be
able to POST some raw XML and still have params in the URL.

arguably: we could look at the Content-Type of the request and make the
assumption based on that -- but as i mentioned before, people don't
always set the Content-Type perfectly.  if we used a URL fragment to
determine where the streams should come from we could be a lot more
confident that we know where the stream should come from -- and let the
RequestHandler decide if it wants to trust the Content-Type.

the multipart/mixed example i gave previously is another example -- your
code here assumes that should be given to the RequestHandler as multiple
streams -- which is a great assumption to make for file uploads, but which
gives me no way to POST multipart/mixed mime data that i want given to the
RequestHandler as a single ContentStream (so it can have access to all of
the mime headers for each part)



-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley


> easy thing to deal with just by scoping the URLs .. put something,
> ANYTHING, in front of these urls, that isn't "select" or "update" and


I'll let you and Yonik decide this one.  I'm fine either way, but I
really don't see a problem letting people easily override URLs.  I
actually think it is a good thing.




> consider the case where a user today has this in his solrconfig...

  



To be clear, (with the current implementation in SOLR-104) you would
have to put this in your solrconfig.xml

  <requestHandler name="/select" ... />

Notice the preceding '/'.  I think this is a strong indication that
someone *wants* /select to behave distinctly.


Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter

: > that scares me ... not only does it rely on the client code sending the
: > correct content-type
:
: Not really... that would perhaps be the default, but the parser (or a
: handler) can make intelligent decisions about that.
:
: If you put the parser in the URL, then there's *that* to be messed up
: by the client.

but the HTTP Client libraries in various languages don't always make it
easy to set Content-type -- and even if they do that doesn't mean the
person using that library knows how to use it properly -- but everyone
understands how to put something in a URL.  if nothing else, think of
putting the "parsetype" in the URL as a checksum that the RequestParser
can use to validate its assumptions -- if it's not there, then it can do
all of the intelligent things you think it should do, but if it is there
that dictates what it should do.

(aren't you the one that convinced me a few years back that it was better
to trust a URL than to trust HTTP Headers? ... because people understand
URLs and put things in them, but they don't always know what headers to
send .. curl being the great example, it always sends a Content-Type even
if the user doesn't ask it to right?)

: Multi-part posts will have the content-type set correctly, or it won't work.
: The big use-case I see is browser file upload, and they will set it correctly.

right, but my point is what if i want the multi-part POST body left alone
so my RequestHandler can deal with it as a single stream -- if i set
every header correctly, the "smart" parsing code will parse it -- which is
why something in the URL telling it *not* to parse it is important.

: We should not preclude wacky handlers from doing things for
: themselves, calling our stuff as utility methods.

how? ... if there is one and only one RequestParser which makes the
SolrRequest before the RequestHandler ever sees it, and parses the post
body because the content-type is multipart/mixed how can a wacky
handler ever get access to the raw post body?



-Hoss



Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Chris Hostetter

: > A user should be confident that they can pick any name they possibly want
: > for their plugin, and it won't collide with any future addition we might
: > add to Solr.
:
: But that doesn't seem possible unless we make user plugins
: second-class citizens by scoping them differently.  In the event there
: is a collision in the future, the user could rename one of the
: plugins.

when it comes to URLs, our plugins currently are second class citizens --
plugin names appear in the "qt" or "wt" params -- users can pick any names
they want and they are totally legal, they don't have to worry about any
possibility that a name they pick will collide with a path we have mapped
to a servlet.

Users shouldn't have to change the names of requestHandlers just because
Solr adds a new feature with the same name -- changing a requestHandler
name could be a heavy burden for a Solr user depending on how many
clients *they* have using that requestHandler with that name.  i wouldn't
make a big deal out of this if it was unavoidable -- but it is such an
easy thing to deal with just by scoping the URLs .. put something,
ANYTHING, in front of these urls, that isn't "select" or "update" and
then put the requestHandler name and we've now protected ourselves and our
users.

consider the case where a user today has this in his solrconfig...

  <requestHandler name="select" ... />

..with the URL structure you guys are talking about, with the
DispatchFilter matching on /* and interpreting the first part of the path
as a possible requestHandler name, that user can't upgrade Solr
because he's relying on the old "/select?qt=select" style URLs to
work ... he has to change the name of his requestHandler and all of his
clients, then upgrade, then change all of his clients again to take
advantage of the new URL structure (and the new features it provides for
updates)



-Hoss



[jira] Commented: (SOLR-104) Update Plugins

2007-01-20 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466277
 ] 

Ryan McKinley commented on SOLR-104:



I just thought of something that will make Hoss' blood curl!  I KNOW it
is a bad idea for things within solr-core, but it would be the
cleanest/cheapest way to expose the unknown things a potential
RequestHandler would want from the HttpServletRequest without changing
the existing API.  It goes like this:

  SolrRequest solrReq = (build the solr request)
  solrReq.getContext().put( "HttpServletRequest", req );

It would never be used by anything in core.

The alternative I see is to give each handler some mechanism to tell
the RequestBuilder what attributes it needs set, then have the
RequestBuilder put those attributes in the context or solr params.  In
my opinion, that is a lot of overhead to do stuff that clearly falls
outside of what solr-core should be doing.

ryan


> Update Plugins
> --
>
> Key: SOLR-104
> URL: https://issues.apache.org/jira/browse/SOLR-104
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
> Fix For: 1.2
>
> Attachments: commons-fileupload-20070107.jar, commons-io-1.2.jar, 
> DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, 
> HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, 
> HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, 
> HandlerRefactoring.DRAFT.zip
>
>
> The plugin framework should work for 'update' actions in addition to 'search' 
> actions.
> For more discussion on this, see:
> http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828





Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley

I just posted a new patch on SOLR-104.  I think it addresses most of
the issues we have discussed.  (It's a little difficult to know as it
has been somewhat circular)   I was going to reply to your points one
by one, but i think that would just make the discussion more confusing
than it already is!



> (i don't trust HTTP Client code -- but for the sake
> of argument let's assume all clients are perfect) what happens when a
> person wants to send a MIME multi-part message *AS* the raw post body -- so
> the RequestHandler gets it as a single ContentStream (ie: single input
> stream, mime type of multipart/mixed) ?

Multi-part posts will have the content-type set correctly, or it won't work.
The big use-case I see is browser file upload, and they will set it correctly.



I don't see it as a big problem because we don't have to deal with
legacy streams yet.  No one is expecting their existing stream code to
work.  The only header value the SOLR-104 code relies on is
'multipart'.  I think that is a reasonable constraint since it has to
be implemented properly for commons-fileupload to work.

ryan


[jira] Commented: (SOLR-104) Update Plugins

2007-01-20 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466273
 ] 

Ryan McKinley commented on SOLR-104:


I just updated DispatchFilter.patch to implement most of our discussion on 
solr-dev.

The implemented URL structure is:
http://${host}:${port}/${context}/${path/defined/in/solrconfig.xml}:${optional/path/for/handler}?${params}

(If there needs to be a constant between ${context}  and  ${path} I am ok with 
it, but i don't think its necessary.)

If you get this running, check:
http://localhost:8983/solr/test.html

This is a test page that shows the various methods to get streamed content into 
the handler
* with param stream.URL - puts the content of remote url into stream
* with stream.BODY - puts the content of the parameter into a stream
* multipart upload.  put the fields into SolrParams and the Files into streams
* POST with no query string.  - uses the fields to fill SolrParams
* POST with query string.  - uses the post body as the ContentStream, fills 
SolrParams from the query string
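The five cases listed above amount to a small decision table. A minimal Java sketch of that table follows; the class `StreamSourceChooser`, the enum names, and in particular the order in which the checks are applied are assumptions made for illustration, not the logic of the SOLR-104 patch.

```java
/** Hypothetical decision table for where a request's content streams come from. */
final class StreamSourceChooser {
    enum Source {
        REMOTE_URL,       // stream.URL: content of a remote url becomes a stream
        PARAM_BODY,       // stream.BODY: content of the parameter becomes a stream
        MULTIPART_FILES,  // multipart upload: fields to SolrParams, files to streams
        POST_BODY,        // POST with query string: post body is the ContentStream
        NONE              // POST with no query string (fields fill SolrParams), or GET
    }

    /** The precedence of these checks is an assumption of this sketch. */
    static Source choose(boolean hasStreamUrlParam,
                         boolean hasStreamBodyParam,
                         boolean isMultipart,
                         boolean isPost,
                         boolean hasQueryString) {
        if (hasStreamUrlParam) return Source.REMOTE_URL;
        if (hasStreamBodyParam) return Source.PARAM_BODY;
        if (isMultipart) return Source.MULTIPART_FILES;
        if (isPost && hasQueryString) return Source.POST_BODY;
        return Source.NONE;
    }
}
```

A real implementation would derive the five flags from the HttpServletRequest; they are passed in directly here to keep the sketch free of servlet dependencies.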

I think this covers all the normal cases.  If you can think of others, let me 
know.  I believe things that would iterate over a huge collection of streams 
should be implemented as a RequestHandler, not as the RequestBuilder

- - - - - - - - - - -

Things to note:

1) /select and /update are handled with their same old servlets.  They have 
just been refactored to LegacyUpdateServlet etc.  I *think* the example 
solrconfig.xml should map /update to the new framework, not the old one.  This 
would get people who start using solr to use the new framework, but still work 
for people who don't map /update in their solrconfig.xml.  This would also 
require we change the included 'post.sh' to use: 
URL=http://localhost:8983/solr/update?stream  (so the content is read as a 
stream)

2) Even when /update is mapped to the legacy servlet, you can map subfolders to 
the new one.  I included /update/commit in this patch

3) Configuration?  Where should we configure enable/disable streams?  max file 
upload size?  upload temp directory?  I REALLY think it's a bad idea to enable 
stream.URL by default.  Although the model is that solr sits in a private 
network, we know that is not always the case.  It may also be good to configure 
a required user role to be able to stream.  For example, stream.URL requires 
isUserInRole( 'admin' );

4) Sending context to handlers.  Some handlers will want/need additional 
information about the request (headers,user,remote host,path, etc).  In this 
patch, I add 'path' to all requests.  There should be a way for the handler to 
say what information it needs


ryan






[jira] Updated: (SOLR-104) Update Plugins

2007-01-20 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-104:
---

Attachment: commons-io-1.2.jar





[jira] Updated: (SOLR-104) Update Plugins

2007-01-20 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-104:
---

Attachment: DispatchFilter.patch





[jira] Updated: (SOLR-104) Update Plugins

2007-01-20 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-104:
---

Attachment: commons-fileupload-20070107.jar

> Update Plugins
> --
>
> Key: SOLR-104
> URL: https://issues.apache.org/jira/browse/SOLR-104
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
> Fix For: 1.2
>
> Attachments: commons-fileupload-20070107.jar, DispatchFilter.patch, 
> DispatchFilter.patch, HandlerRefactoring-DRAFT-SRC.zip, 
> HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring.DRAFT.patch, 
> HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.zip
>
>
> The plugin framework should work for 'update' actions in addition to 'search' 
> actions.
> For more discussion on this, see:
> http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828





Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Ryan McKinley

>
> I'm not sure what "it" is in the above sentence ... i believe from the
> context of the rest of the message you are referring to
> using a ServletFilter instead of a Servlet -- i honestly have no opinion
> about that either way.

> I thought a filter required you to open up the WAR file and change
> web.xml, or am I misunderstanding?



If your question is whether you need to edit web.xml to change the URLs it
will apply to, my suggestion is to map /* to the DispatchFilter and
have it decide whether or not to handle the requests.  With a filter,
you can handle the request directly or pass it up the chain.  This
would allow us to have the URL structures defined by solrconfig.xml
(without a need to edit web.xml).
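The handle-or-pass decision can be sketched without the servlet API. The class `DispatchDecision` below is hypothetical; it assumes, following the convention discussed in this thread, that only handler names beginning with '/' are reachable by path, and that anything after a ':' is extra path info for the handler.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical lookup a filter mapped to /* could use; not the SOLR-104 code. */
final class DispatchDecision {
    private final Set<String> pathHandlers = new HashSet<String>();

    /** Keeps only the names from solrconfig.xml that start with '/'. */
    DispatchDecision(Collection<String> registeredHandlerNames) {
        for (String name : registeredHandlerNames) {
            if (name != null && name.startsWith("/")) {
                pathHandlers.add(name);
            }
        }
    }

    /**
     * Returns the handler name to invoke for this path, or null when the
     * request should be passed along the filter chain untouched.
     */
    String handlerFor(String pathAfterContext) {
        if (pathAfterContext == null) return null;
        int colon = pathAfterContext.indexOf(':');
        String name = colon < 0 ? pathAfterContext : pathAfterContext.substring(0, colon);
        return pathHandlers.contains(name) ? name : null;
    }
}
```

In a real filter, a null result would lead to a call to chain.doFilter(request, response), so the existing servlets and JSPs keep working.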

If your question is about configuring the RequestParser, then yes, you
would need to edit web.xml.

My (our?) reasons for suggesting this are
1) I think we only have one RequestParser that will handle all normal
requests.  Unless you have extremely specialized needs, this is not
something you would change.
2) Since the RequestParser is tied so closely to HttpServletRequest
and your desired URL structure, it seems appropriate to configure it
in web.xml.  A RequestParser is just a utility class for
servlets/filters
3) We don't want to add RequestParser to 'core' unless it really needs
to be a pluggable interface.  I don't see the need for it just yet.

ryan


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466248
 ] 

Yonik Seeley commented on SOLR-116:
---

If you want to commit early and still mess around with the parameters and 
response formats,
one could add a 'NOTICE'=>'This interface is experimental and will be changing'
to the response.

As this handler returns info about the index, is this where listing of terms 
and docfreqs should also go?

> StructureRequestHandler - allowing client to discover all fields in the index
> -
>
> Key: SOLR-116
> URL: https://issues.apache.org/jira/browse/SOLR-116
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Erik Hatcher
> Assigned To: Erik Hatcher
>Priority: Minor
> Attachments: structure_handler.patch
>
>
> This request handler returns all fields and their type.  In Ruby format 
> (&wt=ruby) the results, for the example index, look like this currently:
> {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}
> A client wanting to introspect Solr could combine the actual fields and their 
> types with parsing of schema.xml to glean a lot and dynamically configure 
> based on what is inside an index.  Should more information per field be 
> returned, or is simply the type name sufficient?   What else is desirable for 
> this request handler?





[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466247
 ] 

Yonik Seeley commented on SOLR-116:
---

Looks good, I like the fieldnames as the keys.  The only change I might make is 
to make it extensible by returning a map as the value.

Instead of:
  'id'=>'string'
It could be
  'id'=>{type=>'string'}

And then other info could optionally go in there:
  'id'=>{type=>'string', multiValued=>'false', 'indexed'=>'true', 
'stored'=>'true', 'defaultValue'=>'...'}

Hmmm, and what are the aesthetics of the XML?

<lst name="id">
  <str name="type">string</str>
  ...
</lst>
Not bad...
 





Re: Update Plugins (was Re: Handling disparate data sources in Solr)

2007-01-20 Thread Alan Burlison

Chris Hostetter wrote:


: 1) I think it should be a ServletFilter applied to all requests that
: will only process requests with a registered handler.

> I'm not sure what "it" is in the above sentence ... i believe from the
> context of the rest of the message you are referring to
> using a ServletFilter instead of a Servlet -- i honestly have no opinion
> about that either way.


I thought a filter required you to open up the WAR file and change 
web.xml, or am I misunderstanding?


--
Alan Burlison
--


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466226
 ] 

Erik Hatcher commented on SOLR-116:
---

The initial example was from an older example index.  From trunk, the response 
is this:

{'responseHeader'=>{'status'=>0,'QTime'=>2},'fields'=>{'includes'=>'text','cat'=>'text_ws','alphaNameSort'=>'alphaOnlySort','id'=>'string','text'=>'text','manu_exact'=>'string','features'=>'text','price'=>'sfloat','incubationdate_dt'=>'date','timestamp'=>'date','sku'=>'textTight','name'=>'text','nameSort'=>'string','manu'=>'text','weight'=>'sfloat','inStock'=>'boolean','popularity'=>'sint'}}

incubationdate_dt is a dynamic field, and thus could not be gleaned from simply 
reading schema.xml.





[jira] Updated: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-116:
--

Attachment: structure_handler.patch

> StructureRequestHandler - allowing client to discover all fields in the index
> -
>
> Key: SOLR-116
> URL: https://issues.apache.org/jira/browse/SOLR-116
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Erik Hatcher
> Assigned To: Erik Hatcher
>Priority: Minor
> Attachments: structure_handler.patch
>
>
> This request handler returns all fields and their type.  In Ruby format 
> (&wt=ruby) the results, for the example index, look like this currently:
> {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}
> A client wanting to introspect Solr could combine the actual fields and their 
> types with parsing of schema.xml to glean a lot and dynamically configure 
> based on what is inside an index.  Should more information per field be 
> returned, or is simply the type name sufficient?   What else is desirable for 
> this request handler?





[jira] Created: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

2007-01-20 Thread Erik Hatcher (JIRA)
StructureRequestHandler - allowing client to discover all fields in the index
-

 Key: SOLR-116
 URL: https://issues.apache.org/jira/browse/SOLR-116
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Erik Hatcher
 Assigned To: Erik Hatcher
Priority: Minor


This request handler returns all fields and their type.  In Ruby format 
(&wt=ruby) the results, for the example index, look like this currently:

{'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}

A client wanting to introspect Solr could combine the actual fields and their 
types with parsing of schema.xml to glean a lot and dynamically configure based 
on what is inside an index.  Should more information per field be returned, or 
is simply the type name sufficient?   What else is desirable for this request 
handler?
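
To make the client-side use concrete, here is a minimal Ruby sketch, assuming 
the response body has already been fetched as a string.  It evaluates the 
Ruby-format output above and groups field names by type.  The names raw, 
fields and by_type are illustrative only, and eval should only ever be pointed 
at a Solr server you trust.

```ruby
# Response text as quoted above (wt=ruby output for the example index).
raw = "{'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}"

# The Ruby response format is a Ruby hash literal, so eval turns it into a Hash.
# Only do this with output from a trusted server.
response = eval(raw)
fields = response['fields']

# Group field names by their type name, e.g. 'sfloat' => ['price', 'weight'].
by_type = fields.group_by { |_name, type| type }
                .transform_values { |pairs| pairs.map(&:first).sort }

by_type.each { |type, names| puts "#{type}: #{names.join(', ')}" }
```

A client could use a map like by_type to decide, for instance, which fields to 
offer for sorting or range queries, without hard-coding the field list.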
