[jira] Commented: (SOLR-43) query parameter overhaul
[ http://issues.apache.org/jira/browse/SOLR-43?page=comments#action_12430086 ]

Yonik Seeley commented on SOLR-43:

Committed current version.

Left to do off the top of my head:
- deprecate methods dealing with params in PluginUtils
- change use of deprecated methods (including dismax handler)
- dismax handler: where to get defaults from solrconfig.xml... the base level, or "defaults". If the latter, provide some backward compat for existing configs?

Highlighter stuff:
- allow specification of markup
- allow fragsize per-field
- keep in mind recent highlighter work going on in Lucene... we should try and specify what instead of how (not use exact class names, etc)
- start using "hl" namespace for highlighter params... this is just a convention to help clarify the semantics of a parameter at a glance.
- for consistency, should "highlight" => "hl", "highlightFields" => "hl.fields" or "hl.fl", "maxSnippets" => "hl.snippets"?

Normally backward compatibility is very important for the external interfaces, *but* things will change while a feature is in development... every commit does not constitute a release. Is highlighting new enough that we can change these parameters? Is anyone using these parameters in production where it would be a burden if we changed these?

Examples of potential highlighter param names:
hl=true
hl.fl=name,title,body
hl.snippets=4
hl.fragsize=100
hl.formatter=simple
hl.simple.pre=
hl.simple.post=

And per field params:
f.title.hl.fragsize=0 // overrides fragsize only for field 'title'

> query parameter overhaul
>
> Key: SOLR-43
> URL: http://issues.apache.org/jira/browse/SOLR-43
> Project: Solr
> Issue Type: New Feature
> Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
> Attachments: solrparams.patch, solrparams.patch
>
> Goals:
> - per field parameters that fall back to global values
> - defaults in solrconfig.xml per request handler, overridable per
> This is desirable for highlighting additions:
> http://issues.apache.org/jira/browse/SOLR-37
> last email thread:
> http://www.nabble.com/parameter-defaults-and-config-tf2020863.html#a5556298
Re: [jira] Commented: (SOLR-43) query parameter overhaul
JIRA didn't send my comments to the list for some reason (or I just never received it). I'll cc here:

Committed current version.

Left to do off the top of my head:
- deprecate methods dealing with params in PluginUtils
- change use of deprecated methods (including dismax handler)
- dismax handler: where to get defaults from solrconfig.xml... the base level, or "defaults". If the latter, provide some backward compat for existing configs?

Highlighter stuff:
- allow specification of markup
- allow fragsize per-field
- keep in mind recent highlighter work going on in Lucene... we should try and specify what instead of how (not use exact class names, etc)
- start using "hl" namespace for highlighter params... this is just a convention to help clarify the semantics of a parameter at a glance.
- for consistency, should "highlight" => "hl", "highlightFields" => "hl.fields" or "hl.fl", "maxSnippets" => "hl.snippets"?

Normally backward compatibility is very important for the external interfaces, *but* things will change while a feature is in development... every commit does not constitute a release. Is highlighting new enough that we can change these parameters? Is anyone using these parameters in production where it would be a burden if we changed these?

Examples of potential highlighter param names:
hl=true
hl.fl=name,title,body
hl.snippets=4
hl.fragsize=100
hl.formatter=simple
hl.simple.pre=
hl.simple.post=

And per field params:
f.title.hl.fragsize=0 // overrides fragsize only for field 'title'
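A minimal sketch of the per-field fallback described above, assuming a lookup that tries "f.<field>.<param>" first and then the global "<param>". The class and method names (FieldParamsSketch, getFieldParam, getFieldInt) are hypothetical and are not taken from the committed patch; only the parameter names come from the message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a request's parameters; not the actual Solr class. */
class FieldParamsSketch {
    private final Map<String, String> params = new HashMap<>();

    FieldParamsSketch set(String name, String value) {
        params.put(name, value);
        return this;
    }

    /** Global lookup, e.g. get("hl.fragsize"). */
    String get(String name) {
        return params.get(name);
    }

    /** Per-field lookup: try "f.<field>.<name>" first, then fall back to "<name>". */
    String getFieldParam(String field, String name) {
        String value = params.get("f." + field + "." + name);
        return value != null ? value : params.get(name);
    }

    /** Integer variant with a hardcoded default when neither level defines the param. */
    int getFieldInt(String field, String name, int def) {
        String value = getFieldParam(field, name);
        return value != null ? Integer.parseInt(value) : def;
    }

    public static void main(String[] args) {
        FieldParamsSketch p = new FieldParamsSketch()
            .set("hl", "true")
            .set("hl.fl", "name,title,body")
            .set("hl.fragsize", "100")
            .set("f.title.hl.fragsize", "0");
        System.out.println(p.getFieldInt("title", "hl.fragsize", 50)); // per-field override: 0
        System.out.println(p.getFieldInt("body", "hl.fragsize", 50));  // global value: 100
    }
}
```

With this shape, a field that has no "f.<field>." entry silently inherits the global value, which matches the "fall back to global values" goal in the issue description.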
Re: new wiki software
: Check out Geronimo's new Wiki it only looks 10 times better than moin-moin.
:
: http://cwiki.apache.org/geronimo/

From a look and feel perspective i've been using the "classic" theme for moin-moin -- it seems just as nice as the look/feel of the geronimo wiki ... but that software (Confluence) does seem to have some nice features...

Tree view of all pages...
http://cwiki.apache.org/confluence/pages/listpages-dirview.action?key=GMOxDOC11

Detailed Page Info...
http://cwiki.apache.org/confluence/pages/pageinfo.action?pageId=4902

-Hoss
Re: Re: Re: Solr and UIMA?
On 8/23/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> We (Solr devs) never discussed it...
> A quick gmail search shows it's been brought up on the Lucene and Nutch lists

FYI, an UIMA proposal just landed at the incubator: http://tinyurl.com/m3taj

-Bertrand
new wiki software
Check out Geronimo's new Wiki it only looks 10 times better than moin-moin. http://cwiki.apache.org/geronimo/ -Yonik
Re: Re: Solr and UIMA?
We (Solr devs) never discussed it... A quick gmail search shows it's been brought up on the Lucene and Nutch lists. -Yonik On 8/23/06, Yoav Shapira <[EMAIL PROTECTED]> wrote: Hi, I thought we discussed this already, mostly concluding UIMA was an IBM-proprietary bear that's not only far from a standard at this point, but not that promising and therefore not worth pursuing. But it could be that we didn't actually have that discussion on this mailing list: I may have had it in private with a couple of friends who use Solr instead. Does anyone else remember discussing it here, perhaps among the committers before we had the public solr-dev mailing list? Yoav On 8/23/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote: > On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > What exactly is the UIMA standard? I didn't see a standard > > mentioned at the UIMA site... > > I don't know if "standard" is the correct word, but [1] mentions an > IBM product that "exposes the UIMA interfaces", so there must be an > API of some kind. But it's not too easy to gather from that website, > exactly what this API is ;-( > > From [2] it seems like one of the main goals is to allow analysis > engines to be plugged in on the way to indexation, to add metadata to > what they call "Common Analysis Structure" objects. That page also > links to a (364 pages long...) SDK Users Guide and Reference, [3]. > > -Bertrand > > [1] http://www.research.ibm.com/UIMA/ > [2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html > [3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf >
Re: [jira] Commented: (SOLR-43) query parameter overhaul
On 8/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> + // TODO: We have some constants in SolrQueryRequestBase, and some in CommonParams...
>
> ...i vote for pulling them out of SolrQueryRequestBase ... i would go so far as to recommend deprecating the named accessors like getQueryString, getLimit, and getStart -- the only params that should be treated special are "qt" and "wt". As for where they should live, ... I'm thinking CommonParams is headed the way of the Dodo, SolrParams seems like the right place to me.

I think SolrParams is fine, I'll move them there unless someone has a better idea.

> 3) in this method, would it be better to let getOriginalParams() return null? .. as is, information (the lack of params when the request was constructed) can be lost if setParams is called more than once...
>
> +  public void setParams(SolrParams params) {
> +    if (this.origParams==null) this.origParams=params;
> +    this.params = params;
> +  }

The original lack of params wasn't meaningful before (see LocalSolrQuery constructor, which called the superclass constructor first (as it must) and then created the parameters and called setParams()). I can refactor out the parameter construction into a static method so it can be passed to the superclass constructor though.

-Yonik
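The refactoring mentioned at the end of this message could look roughly like the sketch below: a static helper builds the parameters so they can be handed to the superclass constructor, and the original params are fixed at construction time instead of on the first setParams() call. All names here (BaseRequestSketch, LocalRequestSketch, makeParams) and the "q"/"start"/"rows" keys are illustrative assumptions, not the code that was committed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical base request: original params are captured once, in the constructor. */
class BaseRequestSketch {
    private final Map<String, String> origParams;
    private Map<String, String> params;

    BaseRequestSketch(Map<String, String> params) {
        this.origParams = params;
        this.params = params;
    }

    Map<String, String> getOriginalParams() { return origParams; }
    Map<String, String> getParams() { return params; }

    /** Replacing the working params never touches the original ones. */
    void setParams(Map<String, String> params) { this.params = params; }
}

/** Hypothetical local request built from individual arguments. */
class LocalRequestSketch extends BaseRequestSketch {
    LocalRequestSketch(String query, int start, int limit) {
        // A static method may be called inside super(...); an instance method may not.
        super(makeParams(query, start, limit));
    }

    static Map<String, String> makeParams(String query, int start, int limit) {
        Map<String, String> m = new HashMap<>();
        m.put("q", query);
        m.put("start", Integer.toString(start));
        m.put("rows", Integer.toString(limit));
        return m;
    }
}
```

Because the constructor always receives the params, there is no window in which getOriginalParams() depends on how many times setParams() has been called.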
Re: [jira] Commented: (SOLR-43) query parameter overhaul
Mike Klaas commented on SOLR-43:

One thing I believe that was lost in this patch is static (source-level) defaults for parameters. Presumably these would be defined using another level of default parameters which is a static member of CommonParams or somesuch?

I assume you mean the default for things like "start" or "fl" if no handler defaults are defined? That would be one way to handle it... but most defaults are null anyway I think. And it's not too bad just having some defaults hardcoded like
params.getInt(CommonParams.START, 0);

If we did this, one might want to merge the global defaults with the handler defaults to eliminate the additional level of lookup.

Also, should SolrParams.parseBool() perhaps do a case-insensitive test?

I'm not opposed... is it needed though?

-Yonik
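To make the two points in this exchange concrete, here is a small sketch of a typed getter with a hardcoded default (in the spirit of params.getInt(CommonParams.START, 0)) and a case-insensitive boolean parse. ParamDefaultsSketch is a hypothetical class, and the accepted spellings ("true"/"false" only) are an assumption; the thread does not say what the real SolrParams.parseBool() accepts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical params holder illustrating source-level defaults; not Solr's SolrParams. */
class ParamDefaultsSketch {
    static final String START = "start";

    private final Map<String, String> map = new HashMap<>();

    ParamDefaultsSketch set(String name, String value) {
        map.put(name, value);
        return this;
    }

    /** Returns the param as an int, or the caller's hardcoded default when absent. */
    int getInt(String name, int def) {
        String v = map.get(name);
        return v == null ? def : Integer.parseInt(v);
    }

    /** Returns the param as a boolean, or the caller's hardcoded default when absent. */
    boolean getBool(String name, boolean def) {
        String v = map.get(name);
        return v == null ? def : parseBool(v);
    }

    /** Case-insensitive test; anything other than true/false is rejected (an assumption). */
    static boolean parseBool(String s) {
        if ("true".equalsIgnoreCase(s)) return true;
        if ("false".equalsIgnoreCase(s)) return false;
        throw new IllegalArgumentException("invalid boolean value: " + s);
    }
}
```

Keeping the default at the call site means no extra lookup level is needed for the common case where the default is a fixed constant.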
Re: Re: Solr and UIMA?
Hi, I thought we discussed this already, mostly concluding UIMA was an IBM-proprietary bear that's not only far from a standard at this point, but not that promising and therefore not worth pursuing. But it could be that we didn't actually have that discussion on this mailing list: I may have had it in private with a couple of friends who use Solr instead. Does anyone else remember discussing it here, perhaps among the committers before we had the public solr-dev mailing list? Yoav On 8/23/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote: On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: > What exactly is the UIMA standard? I didn't see a standard > mentioned at the UIMA site... I don't know if "standard" is the correct word, but [1] mentions an IBM product that "exposes the UIMA interfaces", so there must be an API of some kind. But it's not too easy to gather from that website, exactly what this API is ;-( From [2] it seems like one of the main goals is to allow analysis engines to be plugged in on the way to indexation, to add metadata to what they call "Common Analysis Structure" objects. That page also links to a (364 pages long...) SDK Users Guide and Reference, [3]. -Bertrand [1] http://www.research.ibm.com/UIMA/ [2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html [3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf
Re: Re: Solr and UIMA?
On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: What exactly is the UIMA standard? I didn't see a standard mentioned at the UIMA site... I don't know if "standard" is the correct word, but [1] mentions an IBM product that "exposes the UIMA interfaces", so there must be an API of some kind. But it's not too easy to gather from that website, exactly what this API is ;-( From [2] it seems like one of the main goals is to allow analysis engines to be plugged in on the way to indexation, to add metadata to what they call "Common Analysis Structure" objects. That page also links to a (364 pages long...) SDK Users Guide and Reference, [3]. -Bertrand [1] http://www.research.ibm.com/UIMA/ [2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html [3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf
Re: Solr and UIMA?
What exactly is the UIMA standard? I didn't see a standard mentioned at the UIMA site.

Erik

On Aug 23, 2006, at 4:40 AM, Bertrand Delacretaz wrote:
Hi, In the comments of my article at xml.com [1], someone's asking whether Solr supports the upcoming UIMA standard [2]. I was going to answer "not at this time", but if someone has additional information about UIMA in relation to Solr or Lucene, it is welcome.

-Bertrand

[1] http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=3
[2] http://www.research.ibm.com/UIMA/
Solr and UIMA?
Hi, In the comments of my article at xml.com [1], someone's asking whether Solr supports the upcoming UIMA standard [2]. I was going to answer "not at this time", but if someone has additional information about UIMA in relation to Solr or Lucene, it is welcome. -Bertrand [1] http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=3 [2] http://www.research.ibm.com/UIMA/
Re: making schema.xml nicer to read/use
: - if no factory can be found, an attempt will be made to construct
: one dynamically (easiest would be to create a generic factory that
: works via reflection). People could use simple filters w/o creating a
: factory for it.

I think i mentioned this before ... my opinion depends on what the performance impacts are -- if reflection costs are "high" because of class resolution, but instantiation times are roughly the same, then i'm for it because we can resolve the Class once at startup; but if the performance difference is still significant, i vote we force people who want to mix and match custom Filters/Tokenizers to write Factories for them -- it doesn't penalize people who have custom Analyzers, those don't require Factories, but if you want to mix and match you should be able to whip up a two line factory ... hell, we can provide some code to do it automatically (and run it on the Lucene jar every time we update it)

what i'd really hate to see happen, is to need a FAQ item about how "slow" Solr is at indexing docs and have the answer be: "Don't rely on the built in reflection mechanism to build your analyzers, create explicit Factories for each Tokenizer/Filter" ... I'd hate for Solr to have reflection based Analyzer construction that winds up like the Lucene "Hits" class -- overused and the source of countless complaints about performance.

Ah yes, here's the old discussion...
http://www.nabble.com/foo-tf1737025.html#a4720545

-Hoss
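A rough sketch of the "resolve the Class once at startup" idea: the generic factory looks up the class and its constructor a single time and reuses the Constructor for every instance. ReflectiveFactorySketch is a hypothetical name and the demo uses a JDK class rather than a real Tokenizer or Filter; the sketch does not measure anything, so the open question about per-instance reflection cost still stands.

```java
import java.lang.reflect.Constructor;

/** Hypothetical generic factory; illustrative only, not part of Solr or Lucene. */
class ReflectiveFactorySketch<T> {
    private final Constructor<? extends T> ctor;

    /** Class resolution and constructor lookup happen once, here. */
    ReflectiveFactorySketch(String className, Class<T> baseType, Class<?>... argTypes) {
        try {
            Class<? extends T> clazz = Class.forName(className).asSubclass(baseType);
            this.ctor = clazz.getConstructor(argTypes);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot build factory for " + className, e);
        }
    }

    /** Per-instance cost is limited to Constructor.newInstance. */
    T create(Object... args) {
        try {
            return ctor.newInstance(args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "cannot instantiate " + ctor.getDeclaringClass().getName(), e);
        }
    }

    public static void main(String[] args) {
        // Demo with a JDK class that has a public (String) constructor.
        ReflectiveFactorySketch<CharSequence> f = new ReflectiveFactorySketch<CharSequence>(
            "java.lang.StringBuilder", CharSequence.class, String.class);
        System.out.println(f.create("abc"));
    }
}
```

A hand-written two line factory would replace create() with a plain "new" call, which is the alternative being weighed in the message.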
Re: [jira] Commented: (SOLR-43) query parameter overhaul
: Attached refresh.

I've been too busy the past two days to play with this -- or even read it carefully, but here's some impressions as i skim it...

1) This looks like an idiom in the making ... it might be worth while to go ahead and refactor into a static utility now (ie: "SolrParams p = SolrParams.wrapDefaults(req,defaults)") ...

+    SolrParams p = req.getParams();
+    if (defaults != null) {
+      p = new DefaultSolrParams(p,defaults);
+      // set params so they will be visible to other components such as the response writer
+      req.setParams(p);
+    }

2) ...

+ // TODO: We have some constants in SolrQueryRequestBase, and some in CommonParams...

...i vote for pulling them out of SolrQueryRequestBase ... i would go so far as to recommend deprecating the named accessors like getQueryString, getLimit, and getStart -- the only params that should be treated special are "qt" and "wt". As for where they should live, ... I'm thinking CommonParams is headed the way of the Dodo, SolrParams seems like the right place to me.

3) in this method, would it be better to let getOriginalParams() return null? .. as is, information (the lack of params when the request was constructed) can be lost if setParams is called more than once...

+  public void setParams(SolrParams params) {
+    if (this.origParams==null) this.origParams=params;
+    this.params = params;
+  }

-Hoss
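The static utility suggested in point 1 might be sketched as follows. The types here (ParamsSketch, MapParamsSketch, DefaultParamsSketch, RequestSketch) are hypothetical minimal stand-ins written for this example; the real SolrParams and DefaultSolrParams in the patch may differ. The setParams() body mirrors the snippet quoted in point 3.

```java
import java.util.Map;

/** Hypothetical stand-in for SolrParams. */
abstract class ParamsSketch {
    abstract String get(String name);

    /** Layer handler defaults under the request params and publish the result on the request. */
    static ParamsSketch wrapDefaults(RequestSketch req, ParamsSketch defaults) {
        ParamsSketch p = req.getParams();
        if (defaults != null) {
            p = new DefaultParamsSketch(p, defaults);
            // set params so they are visible to other components, e.g. the response writer
            req.setParams(p);
        }
        return p;
    }
}

/** Params backed by a plain map. */
class MapParamsSketch extends ParamsSketch {
    private final Map<String, String> map;
    MapParamsSketch(Map<String, String> map) { this.map = map; }
    @Override String get(String name) { return map.get(name); }
}

/** Consults the primary params first, then the defaults. */
class DefaultParamsSketch extends ParamsSketch {
    private final ParamsSketch params;
    private final ParamsSketch defaults;
    DefaultParamsSketch(ParamsSketch params, ParamsSketch defaults) {
        this.params = params;
        this.defaults = defaults;
    }
    @Override String get(String name) {
        String v = params.get(name);
        return v != null ? v : defaults.get(name);
    }
}

/** Hypothetical request; setParams follows the patch snippet quoted above. */
class RequestSketch {
    private ParamsSketch origParams;
    private ParamsSketch params;
    ParamsSketch getParams() { return params; }
    ParamsSketch getOriginalParams() { return origParams; }
    void setParams(ParamsSketch params) {
        if (this.origParams == null) this.origParams = params;
        this.params = params;
    }
}
```

A handler would then start with a single call such as ParamsSketch.wrapDefaults(req, defaults) instead of repeating the null check and the setParams() call.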
Re: spatial queries
: thanks for the answer, I am also interested in the jdbc connectivity.

Sorry, i thought that was an "if not" clause on your question.

I've heard of some attempts at extending Lucene's "Directory" with a RDBMS backed implementation -- from what i'm told they tend to focus on modeling lucene files as rows, which isn't really what people tend to be looking for when they ask about keeping a lucene index in a database -- people typically want to be able to do lucene queries and do relational queries at the same time; or as i like to call it: "eat their cake and eat an upside down cake that was made from the same two eggs". By which i mean that since Lucene is an inverted index it approaches data from an "upside down" perspective compared to the way an RDBMS application would -- so sharing a single view of the data doesn't seem like it would work very well.

: I think my concern with the rangeset query is that there wouldn't be an
: index for the rangeset, and as such wouldn't scale.

That sounds like a RDBMS way of thinking -- not an inverted index way of thinking :) ... in Lucene, every indexed field has a DB like "index" on it that makes traversing the values in a range fast. Trust me: I have applications that do a *lot* of "range queries" in Solr (which FYI: Solr cleverly deals with using RangeFilters) and the performance is fine -- especially if you use a lot of ranges frequently (ie: if you are commonly doing bounding boxes around the coordinates of major cities)

-Hoss
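As a toy model of why a range over an indexed field does not need a separate index: if the terms of a field are kept in sorted order with their document ids, a range is just a walk over a contiguous slice of terms. The sketch below uses a TreeMap for that; it is an illustration of the idea only and says nothing about how Lucene or Solr's RangeFilter is actually implemented. RangeIndexSketch is a hypothetical name, and the zero-padded strings are an assumption made so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index for one field: sorted term -> ids of documents containing it. */
class RangeIndexSketch {
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Doc ids whose term falls in [lower, upper], both ends inclusive. */
    TreeSet<Integer> range(String lower, String upper) {
        TreeSet<Integer> docs = new TreeSet<>();
        // subMap is a view over the sorted terms, so only terms inside the range are visited
        for (List<Integer> ids : postings.subMap(lower, true, upper, true).values()) {
            docs.addAll(ids);
        }
        return docs;
    }

    public static void main(String[] args) {
        RangeIndexSketch latitude = new RangeIndexSketch();
        latitude.add(1, "0010");
        latitude.add(2, "0025");
        latitude.add(3, "0040");
        latitude.add(4, "0025");
        System.out.println(latitude.range("0020", "0030")); // [2, 4]
    }
}
```

In this model a bounding box would be two such ranges, one per coordinate field, with the resulting document sets intersected.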