[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466308 ] J.J. Larrea commented on SOLR-112: -- Re foo vs. /foo: I think of the SolrServlet as being just one way to invoke the request dispatcher. One could for example write a SOAP or other RPC message receiver which called a method something like handleRequest(String reqName, SolrQueryRequest req, SolrQueryResponse rsp)*1*. So I wouldn't want to bind the request invocation syntax too tightly to a URL-based mechanism for invocation. Similarly, I think of allowing slashes in request handler names as merely a convention; it could be "search.products.instock" or "search-products-instock" just as easily. Of course, it is advantageous for the handler name to be RFC1738-compliant (as those examples both are) so the pathInfo can be used to set the name, as we all like, e.g. http://localhost:8989/solr/select/search-products-instock What your suggestion comes down to, Ryan, is whether the pathInfo-parsed request adds a leading / slash to the request name, or not. If it does it forces URL syntax into the request-naming space, and while that won't particularly hurt anything I'm not sure it buys anything either... Why should a SOLR configurer need to make an explicit gesture to indicate they want to use the more "modern" pathInfo-based invocation style rather than the older qt= invocation style? And shouldn't the request handler definition be either agnostic as to the request method (GET, POST, pathINFO, qt=, SOAP, direct API call, ...) or else have access to a more comprehensive mechanism for filtering which methods they respond to? *1* (I haven't yet had a chance to catch up on the voluminous SOLR-104 discussion so this may not be the currently contemplated syntax, but hopefully my argument for potentially supporting non-URL-based invokers still holds.) 
> Hierarchical Handler Config > --- > > Key: SOLR-112 > URL: https://issues.apache.org/jira/browse/SOLR-112 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.2 >Reporter: Ryan McKinley >Priority: Minor > Fix For: 1.2 > > Attachments: SOLR-112.patch > > > From J.J. Larrea on SOLR-104 > 2. What would make this even more powerful would be the ability to "subclass" > (meaning refine and/or extend) request handler configs: If the requestHandler > element allowed an attribute extends="" and > chained the SolrParams, then one could do something like: >class="solr.DisMaxRequestHandler" > > > 0.01 > > text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 > > ... much more, per the "dismax" example in the sample solrconfig.xml ... > > ... and replacing the "partitioned" example ... >extends="search/products/all" > > > inStock:true > > -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-80: - Attachment: negative_filters.patch > negative filter queries > --- > > Key: SOLR-80 > URL: https://issues.apache.org/jira/browse/SOLR-80 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Yonik Seeley > Attachments: negative_filters.patch > > > There is a need for negative filter queries to avoid long filter generation > times and large caching requirements. > Currently, if someone wants to filter out a small number of documents, they > must specify the complete set of documents to express those negative > conditions against. > q=foo&fq=id:[* TO *] -id:101 > In this example, to filter out a single document, the complete set of > documents (minus one) is generated, and a large bitset is cached. You could > also add the restriction to the main query, but that doesn't work with the > dismax handler which doesn't have a facility for this.
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466307 ] Yonik Seeley commented on SOLR-80: -- attached draft (it doesn't work yet, and there isn't any test code).
Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()
On 1/21/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > On 1/20/07, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote: > > Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union(). > Out of curiosity, what is your current plan for this? Something along the lines of storing a negated flag, which would be used to do andNot() rather than intersection() in SolrIndexSearcher.getDocSet()? I think it would be a great feature and can help out with devel or review. There are two related things, and I'm only tackling one. I'm *not* looking at a generated DocSet and then choosing to try and cache it as an inverse if it would be smaller. I am looking at queries and determining whether they are negative (no positive elements; such a query currently matches nothing in Lucene). If they are negative, I generate and cache the positive version, and do andNot() for operations. Code is done but untested (no test code yet even). I'll add a draft for your review now. -Yonik
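The approach Yonik describes (cache the positive form of a purely negative query, then apply it with andNot() instead of an intersection) can be sketched roughly as follows. This is only an illustration using java.util.BitSet; the class and method names are hypothetical and this is not the code in negative_filters.patch.

```java
import java.util.BitSet;

// Hypothetical sketch, not Solr's DocSet API.
final class NegativeFilterSketch {
    // A purely negative filter such as "-id:101" is stored as its positive
    // form ("id:101") plus a flag, so only the small positive set is cached.
    static BitSet applyFilter(BitSet base, BitSet cachedPositive, boolean negative) {
        BitSet result = (BitSet) base.clone(); // leave the caller's set untouched
        if (negative) {
            result.andNot(cachedPositive); // drop the excluded documents
        } else {
            result.and(cachedPositive);    // ordinary intersection
        }
        return result;
    }
}
```

With this shape, filtering out one document needs a cached set holding one id rather than a set holding every document but one.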
[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466304 ] J.J. Larrea commented on SOLR-112: -- I'm sure you won't like your extemporaneous suggestion (foo/baz implicitly extending foo with baz) once you get a chance to think about it, Hoss. :-) The concern of efficiently structuring request handlers in solrconfig is quite different from the concern of publishing them to the outside world. For example, mightn't one set up the equivalent of an "abstract base class" request config which has no value being invoked directly in a request, but has great value as the root of a tree of request configs which will be invoked? And similarly, shouldn't one be able to rearrange the internal configuration (e.g. refactoring) without affecting an already "published" request syntax? If it didn't break backwards compatibility, one could even consider having separate arguments defining an internal name (used for extending) and an external name (used for invoking), with either one being optional -- allowing configs which are uninvokable but extendable, and vice versa. Or for better backwards compatibility, one name could default to the other, but could be explicitly overridden (potentially to the empty string) if so desired. I am not advocating either of these approaches (simpler is perhaps better) as much as using them to illustrate the separability of the concerns. Does this make sense?
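The extends= idea in the SOLR-112 description (a handler config that refines another one by chaining the SolrParams) can be illustrated with a small lookup chain. This is a sketch under assumptions: ChainedParams is a hypothetical class, not Solr's SolrParams API, and the parameter names used in the usage note below are guesses, since the XML in the archived description was stripped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "extends" semantics for handler configs.
final class ChainedParams {
    private final Map<String, String> own = new HashMap<>();
    private final ChainedParams parent; // null for a root ("base") config

    ChainedParams(ChainedParams parent) {
        this.parent = parent;
    }

    ChainedParams set(String name, String value) {
        own.put(name, value);
        return this;
    }

    // Look in this config first, then fall back to the config it extends.
    String get(String name) {
        if (own.containsKey(name)) {
            return own.get(name);
        }
        return parent == null ? null : parent.get(name);
    }
}
```

For example, a base config holding the value 0.01 under an assumed name such as "tie" could be extended by a child that only adds inStock:true under an assumed name such as "fq"; the child then answers lookups for both, while the base is unchanged.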
Re: [jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()
On 1/20/07, Yonik Seeley (JIRA) <[EMAIL PROTECTED]> wrote: > Looking at the negative filters stuff, I realized that andNot() had no optimized implementation for HashDocSet, so I implemented that and union(). Out of curiosity, what is your current plan for this? Something along the lines of storing a negated flag, which would be used to do andNot() rather than intersection() in SolrIndexSearcher.getDocSet()? I think it would be a great feature and can help out with devel or review. -Mike
[jira] Resolved: (SOLR-114) HashDocSet new hash(), andNot(), union()
[ https://issues.apache.org/jira/browse/SOLR-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-114. --- Resolution: Fixed committed. > HashDocSet new hash(), andNot(), union() > > > Key: SOLR-114 > URL: https://issues.apache.org/jira/browse/SOLR-114 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Yonik Seeley > Attachments: hashdocset.patch, test.patch > > > Looking at the negative filters stuff, I realized that andNot() had no > optimized implementation for HashDocSet, so I implemented that and union(). > While I was in there, I did a re-analysis of hash collision rates and came up > with a cool new hash method that goes directly into a linear scan and is > hence simpler, faster, and has fewer collisions.
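To make "a hash method that goes directly into a linear scan" concrete, here is a generic open-addressing int set with linear probing and a simple andNot(). It is a sketch only: the class name, the table sizing, and the use of the low bits as the hash are assumptions, and it is not the code committed in hashdocset.patch.

```java
import java.util.Arrays;

// Hypothetical sketch of a hashed doc-id set with linear probing.
final class LinearProbeIntSet {
    private static final int EMPTY = -1; // doc ids are assumed to be non-negative
    private final int[] table;
    private final int mask;

    LinearProbeIntSet(int[] docs) {
        // Power-of-two capacity of more than twice the element count keeps probe runs short.
        int capacity = Integer.highestOneBit(Math.max(1, docs.length)) << 2;
        table = new int[capacity];
        Arrays.fill(table, EMPTY);
        mask = capacity - 1;
        for (int doc : docs) {
            add(doc);
        }
    }

    private void add(int doc) {
        int slot = doc & mask;            // hash: low bits of the doc id
        while (table[slot] != EMPTY && table[slot] != doc) {
            slot = (slot + 1) & mask;     // collision: scan to the next slot
        }
        table[slot] = doc;
    }

    boolean exists(int doc) {
        int slot = doc & mask;
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false;
    }

    // andNot: the docs of this set that are absent from the other set (unordered).
    int[] andNot(LinearProbeIntSet other) {
        int[] out = new int[table.length];
        int n = 0;
        for (int doc : table) {
            if (doc != EMPTY && !other.exists(doc)) {
                out[n++] = doc;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Because the table is never more than half full, every probe sequence reaches an empty slot, so lookups for absent ids terminate.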
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I'm on board as long as the URL structure is: : ${path/from/solr/config}?stream.type=raw actually the URL i was suggesting was... ${parser/path/from/solr/config}${handler/path/from/solr/config}?param=val ...i was trying to keep the parser name out of the query string, so we don't have to do any hack parsing of HttpServletRequest.getQueryString() to get it. We need code to do that anyway since getParameterMap() doesn't support getting params from the URL if it's a POST (I believe I tried this in the past and it didn't work). Aesthetically, having an optional parser in the queryString seems nicer than in the path. basically if you have this... Pluggable request parsers seem needlessly complex, and it gets harder to explain it all to someone new. Can't we start simple and defer anything like that until there is a real need? if they really had a reason to want to force one type of parsing, they could register it with a different prefix. That is a point. I'm not sure of the use cases though... it's not safe to let untrusted people update solr at all, so I don't understand prohibiting certain types of streams. * default URLs stay clean * no need for an extra "stream.type" param * urls only get ugly if people want them to get ugly because they don't want to make their clients set the mime type correctly. The first and last points are also true for a stream.type type of thing. After all, we will need other parameters for specifying local files, right? Or is opening local files up to the RequestHandler again? Anyway, I'm not too unhappy either way, as long as I can leave out any explicit "parser" and just get the right thing to happen. -Yonik
[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466293 ] Yonik Seeley commented on SOLR-116: --- Facets are slightly different from docfreqs... one is expensive, and one is very cheap since it's pre-calculated by lucene. The disadvantage of the lucene version is that the docfreq doesn't take deleted docs into account. If you want to page through or download *all* terms of a full-text field, the faceting code would take forever in comparison. other ideas for info: "index" : { "numDocs" : 10123, "maxDoc" : 12345, "age" : 2000, #number of milliseconds the index has been open... sort of equivalent to index freshness, but not really. "version":123425235, #index version. Actually, I think this should be in responseHeader to aid in client-side caching } I think this stuff is useful, it's just a matter of preference if it goes in the same handler or not. If this *does* go in this handler, then perhaps it should be named "indexinfo" or something. I'd be fine with this handler being only about schema too though. > StructureRequestHandler - allowing client to discover all fields in the index > - > > Key: SOLR-116 > URL: https://issues.apache.org/jira/browse/SOLR-116 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Erik Hatcher > Assigned To: Erik Hatcher >Priority: Minor > Attachments: structure_handler.patch > > > This request handler returns all fields and their type. 
In Ruby format > (&wt=ruby) the results, for the example index, look like this currently: > {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}} > A client wanting to introspect Solr could combine the actual fields and their > types with parsing of schema.xml to glean a lot and dynamically configure > based on what is inside an index. Should more information per field be > returned, or is simply the type name sufficient? What else is desirable for > this request handler?
[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466292 ] Erik Hatcher commented on SOLR-116: --- I had thought of the Map for the field name keyed value as well. Terms and document frequencies make more sense from a facet handler, it seems, which you can already do with &qt=standard&facet=true&facet.field=fieldname&q=[* TO *] I believe. I'll add the Map level in there, and the notice, and commit soon so we can tinker with it in Flare as a way to provide a dynamic UI based on the fields in the index.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On Sat, 20 Jan 2007, Ryan McKinley wrote: : Date: Sat, 20 Jan 2007 19:17:16 -0800 : From: Ryan McKinley <[EMAIL PROTECTED]> : Reply-To: solr-dev@lucene.apache.org : To: solr-dev@lucene.apache.org : Subject: Re: Update Plugins (was Re: Handling disparate data sources in : Solr) : : > : > ...what if we bring that idea back, and let people configure it in the : > solrconfig.xml, using path like names... : > : > : > : > : > : > : > ...but don't make it a *public* interface ... make it package protected, : > or maybe even a private static interface of the Dispatch Filter .. either : > way, don't instantiate instances of it using the plugin-lib ClassLoader, : > make sure it comes from the WAR so it only uses the ones provided out of the : > box. : I'm on board as long as the URL structure is: : ${path/from/solr/config}?stream.type=raw actually the URL i was suggesting was... ${parser/path/from/solr/config}${handler/path/from/solr/config}?param=val ...i was trying to keep the parser name out of the query string, so we don't have to do any hack parsing of HttpServletRequest.getQueryString() to get it. basically if you have this... ...then these urls are all valid... http://localhost:/solr/raw/update?param=val ..uses raw post body for update http://localhost:/solr/multi/update?param=val ..uses multipart mime for update http://localhost:/solr/update?param=val ..no requestParser matched path prefix, so default is chosen and Content-Type is used to decide where streams come from. but if instead my config looks like this... ...then these URLs would fail... http://localhost:/solr/raw/update?param=val http://localhost:/solr/multi/update?param=val ...because the empty string would match as a parser, but "/raw/update" and "/multi/update" wouldn't match as requestHandlers (the registration of "/raw" as a parser would be useless) this URL would work however... 
http://localhost:/solr/update?param=val ..treat all requests as if they have multi-part mime streams ...i use this only as an example of what i'm describing ... not as an example of something we should recommend. The key to all of this being that we'd check parser names against the URL prefix in order from shortest to longest, then check the rest of the path as a requestHandler ... if either of those fail, then the filter would skip the request. What we would probably recommend is that people map the "guess" request parser to "/" so that they could put in all of the options they want on buffer sizes and such, then map their requestHandlers without a "/" prefix, and use content types correctly. if they really had a reason to want to force one type of parsing, they could register it with a different prefix. * default URLs stay clean * no need for an extra "stream.type" param * urls only get ugly if people want them to get ugly because they don't want to make their clients set the mime type correctly. -Hoss
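The matching rule Hoss describes (try parser names against the URL prefix from shortest to longest, treat the rest of the path as the requestHandler name, and skip the request if the handler lookup fails) could look roughly like this. It is a sketch only: PrefixDispatchSketch, the parser class names in the usage note, and the String[] return shape are all hypothetical; only the name UseContentTypeRequestParser comes from the thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of prefix-based parser selection in a dispatch filter.
final class PrefixDispatchSketch {
    static final String DEFAULT_PARSER = "UseContentTypeRequestParser";

    // Returns {parser, handlerPath}, or null when the filter should skip the request.
    static String[] resolve(String path, Map<String, String> parsers, Set<String> handlers) {
        List<String> prefixes = new ArrayList<>(parsers.keySet());
        prefixes.sort(Comparator.comparingInt(String::length)); // shortest first

        String parser = DEFAULT_PARSER; // used when no registered prefix matches
        String rest = path;
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                parser = parsers.get(prefix);
                rest = path.substring(prefix.length());
                break; // the first (shortest) matching prefix wins
            }
        }
        // The remainder of the path must name a registered request handler.
        return handlers.contains(rest) ? new String[] {parser, rest} : null;
    }
}
```

With parsers registered under "/raw" and "/multi" and a handler under "/update", the path "/raw/update" resolves to the raw parser and "/update" falls through to the default. If a parser is instead registered under the empty string, it matches every path first, so "/raw/update" is looked up as a handler name and the request is skipped, which is the failure case described above.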
[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466291 ] Ryan McKinley commented on SOLR-112: I think that path should be specified explicitly. I like that ... will only match /select?qt=foo and that: ... will match /foo (and /select?qt=/foo) I like the idea that someone adding the prefix '/' is an explicit gesture they want to set the URL path. (even if it overrides something else, for example /admin)
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
...what if we bring that idea back, and let people configure it in the solrconfig.xml, using path like names... ...but don't make it a *public* interface ... make it package protected, or maybe even a private static interface of the Dispatch Filter .. either way, don't instantiate instances of it using the plugin-lib ClassLoader, make sure it comes from the WAR so it only uses the ones provided out of the box. I'm on board as long as the URL structure is: ${path/from/solr/config}?stream.type=raw and if you are missing the parameter it chooses a good option. (stream.type can change, just that the parser is configured in the query string, not the path) I like it! Also, this would give us a natural place to configure the max size etc. for multi-part upload
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
(the three of us are online way too much ... for crying out loud it's a saturday night folks!) : In my opinion, I don't think we need to worry about it for the : *default* handler. That is not a very difficult constraint and, there : is no one out there expecting to be able to post parameters in the URL : and the body. I'm not sure it is worth complicating anything if this : is the only thing we are trying to avoid. you'd be surprised the number of people i've run into who expect that to work. : I think the *default* should handle all the cases mentioned without : the client worrying about different URLs for the various methods. : : The next question is which (if any) of the explicit parsers you think : are worth including in web.xml? holy crap, i think i have a solution that will make all of us really happy... remember that idea we all really detested of a public plugin interface, configured in the solrconfig.xml that looked like this... public interface RequestParser { SolrRequest parse(HttpServletRequest req); } ...what if we bring that idea back, and let people configure it in the solrconfig.xml, using path like names... ...but don't make it a *public* interface ... make it package protected, or maybe even a private static interface of the Dispatch Filter .. either way, don't instantiate instances of it using the plugin-lib ClassLoader, make sure it comes from the WAR so it only uses the ones provided out of the box. then make the dispatcher check each URL first by seeing if it starts with the name of any registered requestParser ... if it doesn't then use the default "UseContentTypeRequestParser" .. *then* it does what the rest of ryans current Dispatcher does, taking the rest of the path to pick a request handler. the beauty of this approach, is that if no tags appear in the solrconfig.xml, then the URLs look exactly like you guys want, and the request parsing / stream building semantics are exactly the same as they are today ... 
if/when we (or maybe just "i") write those other RequestParsers people can choose to turn them on (and change their URLs) if they want, but if they don't they can keep having the really simple URLs ... OR they could register something like this... ...and have really simple URLs, but be guaranteed that they always got their streams from raw POST bodies. This would also solve Ryans concern about allowing people to turn off fetching streams from remote URLs (or from local files, a small concern i had but hadn't mentioned yet since we had bigger fish to fry) Thoughts? -Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: > It would be: > http://${context}/${path}?stream.type=post Yes! Feels like a much more natural place to me than as part of the path of the URL. Just need to hash out meaningful param names/values? Oh, and I'm more interested in the semantics of those param/values, and not what request parser it happens to get mapped to. I'd vote for different request parsers being an implementation detail, and keeping those details (plugability) out of solrconfig.xml for now. We could always add it later, but it's a lot tougher to remove things. -Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Ryan McKinley <[EMAIL PROTECTED]> wrote: > > -- but everyone > > understands how to put something in a URL. if nothing else, think of > > putting the "parsetype" in the URL as a checksum that the RequestParser > > can use to validate its assumptions -- if it's not there, then it can do > > all of the intelligent things you think it should do, but if it is there > > that dictates what it should do. > > If it's optional in the args, I could be on board with that. > If it's optional in the req.getQueryString() I'm in. Ignore my previous post about ${context}/multipart/asdgadsga It would be: http://${context}/${path}?stream.type=post Yes! Feels like a much more natural place to me than as part of the path of the URL. Just need to hash out meaningful param names/values? -Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
> -- but everyone > understands how to put something in a URL. if nothing else, think of > putting the "parsetype" in the URL as a checksum that the RequestParser > can use to validate its assumptions -- if it's not there, then it can do > all of the intelligent things you think it should do, but if it is there > that dictates what it should do. If it's optional in the args, I could be on board with that. If it's optional in the req.getQueryString() I'm in. Ignore my previous post about ${context}/multipart/asdgadsga It would be: http://${context}/${path}?stream.type=post
[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466290 ] Hoss Man commented on SOLR-112: --- random idea i had that we might consider, not sure yet if i like it yet but i wanted to throw it out there... if someone has.. ... ... ... (NOTE: foo/baz has no class or extends) could/should we assume that "foo/baz" extends "foo" since it's a prefix of the name?
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
> consider the example you've got on your test.html page: "POST - with query > string" ... that doesn't obey the typical semantics of a POST with a query > string ... if you used the methods on HttpServletRequest to get the params > it would give you all the params it found both in the query strings *and* > in the post body. Blech. I was wondering about that. Sounds like bad form, but perhaps could be supported via something like /solr/foo?postbody=args In my opinion, I don't think we need to worry about it for the *default* handler. That is not a very difficult constraint and, there is no one out there expecting to be able to post parameters in the URL and the body. I'm not sure it is worth complicating anything if this is the only thing we are trying to avoid. I think the *default* should handle all the cases mentioned without the client worrying about different URLs for the various methods. The next question is which (if any) of the explicit parsers you think are worth including in web.xml? http://${host}/${context}/${path/from/config} (default) http://${host}/${context}/params/${path/from/config} (used getParameterMap() to fill args) http://${host}/${context}/multipart/${path/from/config} (force multipart request) http://${host}/${context}/stream/${path/from/config} (params from URL, body as stream)
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> but the HTTP Client libraries in various languages don't always make it easy to set Content-Type -- and even if they do that doesn't mean the person using that library knows how to use it properly

I think we have to go with common usages. We neither rely on, nor discard, Content-Type in all cases.
- When it has a charset, believe it.
- When it says form-encoded, only believe it if there aren't args on the URL (because many clients like curl default to "application/x-www-form-urlencoded" for a POST).

> -- but everyone understands how to put something in a URL. if nothing else, think of putting the "parsetype" in the URL as a checksum that the RequestParser can use to validate its assumptions -- if it's not there, then it can do all of the intelligent things you think it should do, but if it is there that dictates what it should do.

If it's optional in the args, I could be on board with that.

> (aren't you the one that convinced me a few years back that it was better to trust a URL than to trust HTTP Headers? ... because people understand URLs and put things in them, but they don't always know what headers to send .. curl being the great example, it always sends a Content-Type even if the user doesn't ask it to right?)

Well, for the update server, we do ignore the form-data stuff, but we don't ignore the charset.

> : Multi-part posts will have the content-type set correctly, or it won't work.
> : The big use-case I see is browser file upload, and they will set it correctly.
>
> right, but my point is what if i want the multi-part POST body left alone so my RequestHandler can deal with it as a single stream -- if i set every header correctly, the "smart" parsing code will parse it -- which is why something in the URL telling it *not* to parse it is important.

That sounds like a pretty rare corner case.

> : We should not preclude wacky handlers from doing things for
> : themselves, calling our stuff as utility methods.
>
> how? ... if there is one and only one RequestParser which makes the SolrRequest before the RequestHandler ever sees it, and parses the post body because the content-type is multipart/mixed how can a wacky handler ever get access to the raw post body?

I wasn't thinking *that* whacky :-)
There are always other options, such as using your own servlet though. I don't think we should try to solve every case (the whole 80/20 thing).

-Yonik
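The two Content-Type rules Yonik lists can be sketched roughly as follows. This is only an illustration, not code from the SOLR-104 patch: the class `BodyRules`, its method names, and the `BodyKind` values are hypothetical, and multipart handling is deliberately left out.

```java
import java.util.Locale;

/** Hypothetical helper (not from the patch): applies the two Content-Type rules from the message. */
final class BodyRules {
    enum BodyKind { FORM_PARAMS, RAW_STREAM }

    /** Rule 1: if the header names a charset, believe it; otherwise use the fallback. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    /** Rule 2: believe "form-encoded" only when there are no args on the URL. */
    static BodyKind classify(String contentType, String queryString) {
        boolean formEncoded = contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("application/x-www-form-urlencoded");
        boolean hasUrlArgs = queryString != null && !queryString.isEmpty();
        return (formEncoded && !hasUrlArgs) ? BodyKind.FORM_PARAMS : BodyKind.RAW_STREAM;
    }
}
```

With these rules a curl POST that defaults to form-encoding but also carries URL args is treated as a raw stream, which is the case the message calls out.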
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
On 1/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> Ryan: this patch truly does kick ass ... we can probably simplify a lot of the Legacy stuff by leveraging your new StandardRequestBuilder -- but that can be done later.

Much is already done by the looks of it.

> i'm still really not liking the way there is a single SolrRequestBuilder with a big complicated build method that "guesses" what streams the user wants.

But I don't need a separate URL to do GET vs POST in HTTP. It seems like having a different URL for where you put the args would be hard to explain to people.

> i really feel strongly that even if all the parsing logic is in the core, even if it's all in one class: a piece of the path should be used to determine where the streams come from.

If there's a ? in the URL, then it's args, so that could always safely be parsed. Perhaps a special arg, if present, could override the default method of getting input streams?

> consider the example you've got on your test.html page: "POST - with query string" ... that doesn't obey the typical semantics of a POST with a query string ... if you used the methods on HttpServletRequest to get the params it would give you all the params it found both in the query strings *and* in the post body.

Blech. I was wondering about that. Sounds like bad form, but perhaps could be supported via something like
/solr/foo?postbody=args

-Yonik
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: To be clear, (with the current implementation in SOLR-104) you would
: have to put this in your solrconfig.xml
:
:
:
: Notice the preceding '/'. I think this is a strong indication that
: someone *wants* /select to behave distinctly.

crap ... i totally misread that ... so if people have a requestHandler registered with a name that doesn't start with a slash, they can't use the new URL structure and they have to use the old one. DAMN! ... that is slick dude ... okay, i agree with you, the odds of that causing problems are pretty fucking low.

I'm still hung up on this "parse" logic thing ... i really think it needs to be in the path .. or at the very least, there needs to be a way to specify it in the path to force one behavior or another, and if it's not in the path then we can guess based on the Content-Type.

Putting it in a query arg would make getting it without contaminating the POST body kludgy, putting it at the start of the path doesn't work well for supporting a default if it isn't there, and putting it at the end of the path messes up the nice work you've done letting RequestHandlers have extra path info for encoding info they want.

H...

What if we did something like this...

  /exec/handler/name:extra/path?param1=val1
  /raw/handler/name:extra/path?param1=val1
  /url/handler/name:extra/path?param1=val1&url=...&url=...
  /file/handler/name:extra/path?param1=val1&file=...&file=...

where "exec" means guess based on the Content-Type, "raw" means use the POST body as a single stream regardless of Content-Type, etc...

thoughts?

-Hoss
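A minimal sketch of how the proposed layout could be split into its parts, assuming the first path segment is the parse mode and a ':' separates the handler name from the extra path. The class `ProposedPath` and its field names are hypothetical, and whether the handler name should keep a leading slash is not settled in the message; this sketch drops it.

```java
/** Hypothetical parser for the proposed /<mode>/<handler name>:<extra path> layout. */
final class ProposedPath {
    final String mode;       // first segment, e.g. "exec", "raw", "url" or "file"
    final String handler;    // e.g. "handler/name" (no leading slash: an assumption)
    final String extraPath;  // e.g. "extra/path", or "" when there is no ':' part

    private ProposedPath(String mode, String handler, String extraPath) {
        this.mode = mode;
        this.handler = handler;
        this.extraPath = extraPath;
    }

    /** Splits the path portion of the URL (without the query string). */
    static ProposedPath parse(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("path must start with '/': " + path);
        }
        int slash = path.indexOf('/', 1);
        if (slash < 0) {
            throw new IllegalArgumentException("missing handler name: " + path);
        }
        String mode = path.substring(1, slash);
        String rest = path.substring(slash + 1);
        int colon = rest.indexOf(':');
        String handler = colon < 0 ? rest : rest.substring(0, colon);
        String extra = colon < 0 ? "" : rest.substring(colon + 1);
        return new ProposedPath(mode, handler, extra);
    }
}
```

The mode is not validated against a fixed list here because the message leaves the set of modes open ("etc...").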
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: I just posted a new patch on SOLR-104. I think it addresses most of
: the issues we have discussed. (It's a little difficult to know as it
: has been somewhat circular) I was going to reply to your points one
: by one, but i think that would just make the discussion more confusing
: than it already is!

Ryan: this patch truly does kick ass ... we can probably simplify a lot of the Legacy stuff by leveraging your new StandardRequestBuilder -- but that can be done later.

i'm still really not liking the way there is a single SolrRequestBuilder with a big complicated build method that "guesses" what streams the user wants. i really feel strongly that even if all the parsing logic is in the core, even if it's all in one class: a piece of the path should be used to determine where the streams come from.

consider the example you've got on your test.html page: "POST - with query string" ... that doesn't obey the typical semantics of a POST with a query string ... if you used the methods on HttpServletRequest to get the params it would give you all the params it found both in the query strings *and* in the post body.

This is a great example of what i was talking about: if i have no intention of sending a stream, it should be possible for me to send params in both the URL and in the POST body -- but in other cases i should be able to POST some raw XML and still have params in the URL.

arguably: we could look at the Content-Type of the request and make the assumption based on that -- but as i mentioned before, people don't always set the Content-Type perfectly. if we used a URL fragment to determine where the streams should come from we could be a lot more confident that we know where the stream should come from -- and let the RequestHandler decide if it wants to trust the Content-Type

the multipart/mixed example i gave previously is another example -- your code here assumes that should be given to the RequestHandler as multiple streams -- which is a great assumption to make for file uploads, but which gives me no way to POST multipart/mixed mime data that i want given to the RequestHandler as a single ContentStream (so it can have access to all of the mime headers for each part)

-Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
> easy thing to deal with just by scoping the URLs .. put something, ANYTHING, in front of these urls, that isn't "select" or "update" and

I'll let you and Yonik decide this one. I'm fine either way, but I really don't see a problem letting people easily override URLs. I actually think it is a good thing.

> consider the case where a user today has this in his solrconfig...

To be clear, (with the current implementation in SOLR-104) you would have to put this in your solrconfig.xml

Notice the preceding '/'. I think this is a strong indication that someone *wants* /select to behave distinctly.
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: > that scares me ... not only does it rely on the client code sending the
: > correct content-type
:
: Not really... that would perhaps be the default, but the parser (or a
: handler) can make intelligent decisions about that.
:
: If you put the parser in the URL, then there's *that* to be messed up
: by the client.

but the HTTP Client libraries in various languages don't always make it easy to set Content-Type -- and even if they do that doesn't mean the person using that library knows how to use it properly -- but everyone understands how to put something in a URL. if nothing else, think of putting the "parsetype" in the URL as a checksum that the RequestParser can use to validate its assumptions -- if it's not there, then it can do all of the intelligent things you think it should do, but if it is there that dictates what it should do.

(aren't you the one that convinced me a few years back that it was better to trust a URL than to trust HTTP Headers? ... because people understand URLs and put things in them, but they don't always know what headers to send .. curl being the great example, it always sends a Content-Type even if the user doesn't ask it to right?)

: Multi-part posts will have the content-type set correctly, or it won't work.
: The big use-case I see is browser file upload, and they will set it correctly.

right, but my point is what if i want the multi-part POST body left alone so my RequestHandler can deal with it as a single stream -- if i set every header correctly, the "smart" parsing code will parse it -- which is why something in the URL telling it *not* to parse it is important.

: We should not preclude wacky handlers from doing things for
: themselves, calling our stuff as utility methods.

how? ... if there is one and only one RequestParser which makes the SolrRequest before the RequestHandler ever sees it, and parses the post body because the content-type is multipart/mixed how can a wacky handler ever get access to the raw post body?

-Hoss
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
: > A user should be confident that they can pick any name they possibly want
: > for their plugin, and it won't collide with any future addition we might
: > add to Solr.
:
: But that doesn't seem possible unless we make user plugins
: second-class citizens by scoping them differently. In the event there
: is a collision in the future, the user could rename one of the
: plugins.

when it comes to URLs, our plugins currently are second class citizens -- plugin names appear in the "qt" or "wt" params -- users can pick any names they want and they are totally legal, they don't have to worry about any possibility that a name they pick will collide with a path we have mapped to a servlet.

Users shouldn't have to change the names of requestHandlers just because Solr adds a new feature with the same name -- changing a requestHandler name could be a heavy burden for a Solr user to make depending on how many clients *they* have using that requestHandler with that name.

i wouldn't make a big deal out of this if it was unavoidable -- but it is such an easy thing to deal with just by scoping the URLs .. put something, ANYTHING, in front of these urls, that isn't "select" or "update" and then put the requestHandler name and we've now protected ourselves and our users.

consider the case where a user today has this in his solrconfig...

..with the URL structure you guys are talking about, with the DispatchFilter matching on /* and interpreting the first part of the path as a possible requestHandler name, that user can't upgrade Solr because he's relying on the old "/select?qt=select" style URLs to work ... he has to change the name of his requestHandler and all of his clients, then upgrade, then change all of his clients again to take advantage of the new URL structure (and the new features it provides for updates)

-Hoss
[jira] Commented: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466277 ]

Ryan McKinley commented on SOLR-104:

I just thought of something that will make Hoss' blood curl!

I KNOW it is a bad idea for things within solr-core, but it would be the cleanest/cheapest way to expose the unknown things a potential RequestHandler would want from the HttpServletRequest without changing the existing API.

It goes like this:

  SolrRequest solrReq = (build the solr request)
  solrReq.getContent().put( "HttpServletRequest", req );

It would never be used by anything in core.

The alternative I see is to give each handler some mechanism to tell the RequestBuilder what attributes it needs set, then have the RequestBuilder put those attributes in the context or solr params. In my opinion, that is a lot of overhead to do stuff that clearly falls outside of what solr-core should be doing.

ryan

> Update Plugins
> --
>
> Key: SOLR-104
> URL: https://issues.apache.org/jira/browse/SOLR-104
> Project: Solr
> Issue Type: Improvement
> Components: update
> Affects Versions: 1.2
> Reporter: Ryan McKinley
> Fix For: 1.2
>
> Attachments: commons-fileupload-20070107.jar, commons-io-1.2.jar, DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.zip
>
>
> The plugin framework should work for 'update' actions in addition to 'search' actions.
> For more discussion on this, see:
> http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828
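The alternative Ryan describes, where each handler declares which request attributes it needs and the builder copies only those into the context, might look roughly like this. It is a sketch under stated assumptions: `RequestContextSketch`, `buildContext`, and the attribute names used in the usage note are hypothetical, and plain maps stand in for the servlet request and the Solr request context.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: copy only the request attributes a handler declared it needs. */
final class RequestContextSketch {

    /**
     * @param wanted    attribute names the handler asked for (for example "path" or "remoteHost")
     * @param available everything the servlet layer could expose, keyed by attribute name
     * @return a context holding only the requested attributes that are actually available
     */
    static Map<String, Object> buildContext(Set<String> wanted, Map<String, Object> available) {
        Map<String, Object> context = new LinkedHashMap<>();
        for (String key : wanted) {
            if (available.containsKey(key)) {
                context.put(key, available.get(key));
            }
        }
        return context;
    }
}
```

A handler that asks for "path" and "remoteHost" when only "path" and "user" are available would receive a context containing just "path"; the core never sees the servlet request itself, at the cost of the extra declaration step the comment calls overhead.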
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
I just posted a new patch on SOLR-104. I think it addresses most of the issues we have discussed. (It's a little difficult to know as it has been somewhat circular) I was going to reply to your points one by one, but i think that would just make the discussion more confusing than it already is!

> (i don't trust HTTP Client code -- but for the sake
> of argument let's assume all clients are perfect) what happens when a
> person wants to send a mime multi-part message *AS* the raw post body -- so
> the RequestHandler gets it as a single ContentStream (ie: single input
> stream, mime type of multipart/mixed) ?

Multi-part posts will have the content-type set correctly, or it won't work. The big use-case I see is browser file upload, and they will set it correctly.

I don't see it as a big problem because we don't have to deal with legacy streams yet. No one is expecting their existing stream code to work. The only header value the SOLR-104 code relies on is 'multipart'. I think that is a reasonable constraint since it has to be implemented properly for commons-file-upload to work.

ryan
[jira] Commented: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466273 ]

Ryan McKinley commented on SOLR-104:

I just updated DispatchFilter.patch to implement most of our discussion on solr-dev.

The implemented URL structure is:
  http://${host}:${port}/${context}/${path/defined/in/solrconfig.xml}:${optional/path/for/handler}?${params}

(If there needs to be a constant between ${context} and ${path} I am ok with it, but i don't think it's necessary.)

If you get this running, check:
  http://localhost:8983/solr/test.html

This is a test page that shows the various methods to get streamed content into the handler:
* with param stream.URL - puts the content of the remote url into a stream
* with stream.BODY - puts the content of the parameter into a stream
* multipart upload - puts the fields into SolrParams and the files into streams
* POST with no query string - uses the fields to fill SolrParams
* POST with query string - uses the post body as the ContentStream, fills SolrParams from the query string

I think this covers all the normal cases. If you can think of others, let me know. I believe things that would iterate over a huge collection of streams should be implemented as a RequestHandler, not as the RequestBuilder.

- - - - - - - - - - -

Things to note:

1) /select and /update are handled with their same old servlets. They have just been refactored to LegacyUpdateServlet etc.

I *think* the example solrconfig.xml should map /update to the new framework, not the old one. This would get people who start using solr to use the new framework, but still work for people who don't map /update in their solrconfig.xml. This would also require we change the included 'post.sh' to use:
  URL=http://localhost:8983/solr/update?stream
(so the content is read as a stream)

2) Even when /update is mapped to the legacy servlet, you can map subfolders to the new one. I included /update/commit in this patch.

3) Configuration? Where should we configure enable/disable streams? max file upload size? upload temp directory?

I REALLY think it's a bad idea to enable stream.URL by default. Although the model is that solr sits in a private network, we know that is not always the case.

It may also be good to configure a required user role to be able to stream. for example, stream.URL requires isUserInRole( 'admin' );

4) Sending context to handlers. Some handlers will want/need additional information about the request (headers, user, remote host, path, etc). In this patch, I add 'path' to all requests. There should be a way for the handler to say what information it needs.

ryan
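The stream cases listed for the test page can be summarised as a small decision function. This is a sketch, not the code in DispatchFilter.patch: the class `StreamCases`, the `Source` names, and in particular the order in which the checks are applied are assumptions, since the comment does not say which case wins when several apply.

```java
/** Hypothetical summary of the stream cases listed for test.html; not the patch's actual code. */
final class StreamCases {
    enum Source { REMOTE_URL, PARAM_BODY, MULTIPART_FILES, POST_BODY, NONE }

    /** Picks where the content stream comes from; the precedence shown is an assumption. */
    static Source choose(boolean hasStreamUrl, boolean hasStreamBody,
                         boolean isPost, boolean isMultipart, boolean hasQueryString) {
        if (hasStreamUrl) return Source.REMOTE_URL;                // stream.URL: content of the remote url
        if (hasStreamBody) return Source.PARAM_BODY;               // stream.BODY: content of the parameter
        if (isPost && isMultipart) return Source.MULTIPART_FILES;  // fields -> SolrParams, files -> streams
        if (isPost && hasQueryString) return Source.POST_BODY;     // params from the URL, body is the stream
        return Source.NONE;                                        // GET, or POST whose fields fill SolrParams
    }
}
```

Under these assumptions a POST to /update?stream yields POST_BODY, matching the suggested change to post.sh, while a POST with no query string produces no stream at all.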
[jira] Updated: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-104:

Attachment: commons-io-1.2.jar
[jira] Updated: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-104:

Attachment: DispatchFilter.patch
[jira] Updated: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-104:

Attachment: commons-fileupload-20070107.jar
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
> > I'm not sure what "it" is in the above sentence ... i believe from the
> > context of the rest of the message you are referring to
> > using a ServletFilter instead of a Servlet -- i honestly have no opinion
> > about that either way.
>
> I thought a filter required you to open up the WAR file and change web.xml, or am I misunderstanding?

If your question is whether you need to edit web.xml to change the URL it will apply to, my suggestion is to map /* to the DispatchFilter and have it decide whether or not to handle the requests. With a filter, you can handle the request directly or pass it up the chain. This would allow us to have the URL structures defined by solrconfig.xml (without a need to edit web.xml).

If your question is about configuring the RequestParser: yes, you would need to edit web.xml.

My (our?) reasons for suggesting this are:

1) I think we only have one RequestParser that will handle all normal requests. Unless you have extremely specialized needs, this is not something you would change.

2) Since the RequestParser is tied so closely to HttpServletRequest and your desired URL structure, it seems appropriate to configure it in web.xml. A RequestParser is just a utility class for servlets/filters.

3) We don't want to add RequestParser to 'core' unless it really needs to be a pluggable interface. I don't see the need for it just yet.

ryan
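The "map /* to the filter and let it decide" idea can be sketched without the servlet API as a lookup that either names a handler or returns nothing, in which case a real filter would pass the request up the chain. Everything here is hypothetical (`HandlerRegistrySketch`, `register`, `match`); the rule that only handler names starting with '/' are reachable by path, and the ':' separator for extra path info, are taken from the surrounding discussion of the SOLR-104 patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the decision a filter mapped to /* would make. */
final class HandlerRegistrySketch {
    private final Set<String> handlerPaths = new HashSet<>();

    /** Registers a handler name from solrconfig.xml; names without a leading '/' stay qt-only. */
    void register(String name) {
        if (name.startsWith("/")) {
            handlerPaths.add(name);
        }
    }

    /**
     * Returns the registered path that should handle this request,
     * or null when the request should be passed on to the next filter or servlet.
     */
    String match(String requestPath) {
        int colon = requestPath.indexOf(':');
        String candidate = colon < 0 ? requestPath : requestPath.substring(0, colon);
        return handlerPaths.contains(candidate) ? candidate : null;
    }
}
```

In a real filter the null case would correspond to calling the next element of the filter chain, so legacy URLs such as /select keep reaching their old servlets unless someone registers a handler under that exact path.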
[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466248 ]

Yonik Seeley commented on SOLR-116:

If you want to commit early and still mess around with the parameters and response formats, one could add a
  'NOTICE'=>'This interface is experimental and will be changing'
to the response.

As this handler returns info about the index, is this where listing of terms and docfreqs should also go?

> StructureRequestHandler - allowing client to discover all fields in the index
> -
>
> Key: SOLR-116
> URL: https://issues.apache.org/jira/browse/SOLR-116
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Erik Hatcher
> Assigned To: Erik Hatcher
> Priority: Minor
> Attachments: structure_handler.patch
>
>
> This request handler returns all fields and their type. In Ruby format (&wt=ruby) the results, for the example index, look like this currently:
> {'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}
> A client wanting to introspect Solr could combine the actual fields and their types with parsing of schema.xml to glean a lot and dynamically configure based on what is inside an index. Should more information per field be returned, or is simply the type name sufficient? What else is desirable for this request handler?
[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466247 ]

Yonik Seeley commented on SOLR-116:

Looks good, I like the fieldnames as the keys. The only change I might make is to make it extensible by returning a map as the value.

Instead of:
  'id'=>'string'
It could be
  'id'=>{type=>'string'}
And then other info could optionally go in there:
  'id'=>{type=>'string', multiValued=>'false', 'indexed'=>'true', 'stored'=>'true', 'defaultValue'=>'...'}

Hmmm, and what are the aesthetics of the XML?

string ...

Not bad...
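A small sketch of the extensible shape suggested above, where each field name maps to a map of properties rather than to a bare type string. The class `FieldInfoSketch` and its `describe` method are hypothetical and not part of structure_handler.patch; the property names follow the example in the comment, and booleans are used where the comment shows quoted strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the per-field map suggested in the comment. */
final class FieldInfoSketch {

    /** Builds the value stored under a field name; more keys can be added later without breaking clients. */
    static Map<String, Object> describe(String type, boolean multiValued, boolean indexed, boolean stored) {
        Map<String, Object> info = new LinkedHashMap<>();
        info.put("type", type);
        info.put("multiValued", multiValued);
        info.put("indexed", indexed);
        info.put("stored", stored);
        return info;
    }
}
```

A client that only cares about the type reads the "type" key and ignores the rest, which is what makes the map form easier to extend than 'id'=>'string'.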
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
Chris Hostetter wrote:
> : 1) I think it should be a ServletFilter applied to all requests that
> : will only process requests with a registered handler.
>
> I'm not sure what "it" is in the above sentence ... i believe from the context of the rest of the message you are referring to using a ServletFilter instead of a Servlet -- i honestly have no opinion about that either way.

I thought a filter required you to open up the WAR file and change web.xml, or am I misunderstanding?

--
Alan Burlison
--
[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466226 ]

Erik Hatcher commented on SOLR-116:

The initial example was from an older example index. From trunk, the response is this:

{'responseHeader'=>{'status'=>0,'QTime'=>2},'fields'=>{'includes'=>'text','cat'=>'text_ws','alphaNameSort'=>'alphaOnlySort','id'=>'string','text'=>'text','manu_exact'=>'string','features'=>'text','price'=>'sfloat','incubationdate_dt'=>'date','timestamp'=>'date','sku'=>'textTight','name'=>'text','nameSort'=>'string','manu'=>'text','weight'=>'sfloat','inStock'=>'boolean','popularity'=>'sint'}}

incubationdate_dt is a dynamic field, and thus could not be gleaned from simply reading schema.xml.
[jira] Updated: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
[ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-116:

Attachment: structure_handler.patch
[jira] Created: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index
StructureRequestHandler - allowing client to discover all fields in the index
-

Key: SOLR-116
URL: https://issues.apache.org/jira/browse/SOLR-116
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Erik Hatcher
Assigned To: Erik Hatcher
Priority: Minor

This request handler returns all fields and their type. In Ruby format (&wt=ruby) the results, for the example index, look like this currently:

{'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}

A client wanting to introspect Solr could combine the actual fields and their types with parsing of schema.xml to glean a lot and dynamically configure based on what is inside an index. Should more information per field be returned, or is simply the type name sufficient? What else is desirable for this request handler?