[jira] Commented: (SOLR-43) query parameter overhaul

2006-08-23 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-43?page=comments#action_12430086 ] 

Yonik Seeley commented on SOLR-43:
--

Committed current version.

Left to do off the top of my head:
 - deprecate methods dealing with params in PluginUtils
 - change use of deprecated methods (including dismax handler)
 - dismax handler: where to get defaults from solrconfig.xml... the base level, 
or "defaults".  If the latter, provide some backward compat for existing 
configs?

Highlighter stuff:
 - allow specification of markup
 - allow fragsize per-field
 - keep in mind recent highlighter work going on in Lucene... we should try and 
specify what instead of how (not use exact class names, etc)
 - start using "hl" namespace for highlighter params... this is just a 
convention to help clarify the semantics of a parameter at a glance.
   - for consistency, should "highlight" => "hl", "highlightFields" => 
"hl.fields" or "hl.fl", "maxSnippets" => "hl.snippets"? 
Normally backward compatibility is very important for the external 
interfaces, *but* things will change while a feature is in development... every 
commit does not constitute a release.  Is highlighting new enough that we can 
change these parameters?  Is anyone using these parameters in production where 
it would be a burden if we changed these?

Examples of potential highlighter param names:
hl=true
hl.fl=name,title,body
hl.snippets=4
hl.fragsize=100
hl.formatter=simple
hl.simple.pre=  
hl.simple.post=

And per field params:
f.title.hl.fragsize=0  // overrides fragsize only for field 'title'
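
A minimal sketch of the per-field lookup implied by the names above, assuming the raw request parameters are available as a flat string map. PerFieldParams and its methods are hypothetical names used only for illustration; they are not taken from the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not Solr's API: tries "f.<field>.<param>" before "<param>". */
final class PerFieldParams {
    private final Map<String, String> raw;

    PerFieldParams(Map<String, String> raw) {
        this.raw = raw;
    }

    /** Returns the field-specific value if present, else the global value, else null. */
    String getFieldParam(String field, String param) {
        String perField = raw.get("f." + field + "." + param);
        return perField != null ? perField : raw.get(param);
    }

    /** Integer variant with a hardcoded fallback when neither level defines the param. */
    int getFieldInt(String field, String param, int def) {
        String v = getFieldParam(field, param);
        return v == null ? def : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("hl.fragsize", "100");
        raw.put("f.title.hl.fragsize", "0");
        PerFieldParams p = new PerFieldParams(raw);
        System.out.println(p.getFieldInt("title", "hl.fragsize", 50)); // per-field override
        System.out.println(p.getFieldInt("body", "hl.fragsize", 50));  // falls back to global
    }
}
```

Keeping the fallback inside a single accessor means callers never need to know whether a value came from the per-field or the global level.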



> query parameter overhaul
> 
>
> Key: SOLR-43
> URL: http://issues.apache.org/jira/browse/SOLR-43
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
> Attachments: solrparams.patch, solrparams.patch
>
>
> Goals:
> - per field parameters that fall back to global values
> - defaults in solrconfig.xml per request handler, overridable per
> This is desirable for highlighting additions: 
> http://issues.apache.org/jira/browse/SOLR-37 
> last email thread: 
> http://www.nabble.com/parameter-defaults-and-config-tf2020863.html#a5556298

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [jira] Commented: (SOLR-43) query parameter overhaul

2006-08-23 Thread Yonik Seeley

JIRA didn't send my comments to the list for some reason (or I just
never received it).  I'll cc here:

Committed current version.

Left to do off the top of my head:
- deprecate methods dealing with params in PluginUtils
- change use of deprecated methods (including dismax handler)
- dismax handler: where to get defaults from solrconfig.xml... the
base level, or "defaults". If the latter, provide some backward compat
for existing configs?

Highlighter stuff:
- allow specification of markup
- allow fragsize per-field
- keep in mind recent highlighter work going on in Lucene... we
should try and specify what instead of how (not use exact class names,
etc)
- start using "hl" namespace for highlighter params... this is just a
convention to help clarify the semantics of a parameter at a glance.
  - for consistency, should "highlight" => "hl", "highlightFields" =>
"hl.fields" or "hl.fl", "maxSnippets" => "hl.snippets"?
   Normally backward compatibility is very important for the external
interfaces, *but* things will change while a feature is in
development... every commit does not constitute a release. Is
highlighting new enough that we can change these parameters? Is anyone
using these parameters in production where it would be a burden if we
changed these?

Examples of potential highlighter param names:
hl=true
hl.fl=name,title,body
hl.snippets=4
hl.fragsize=100
hl.formatter=simple
hl.simple.pre=
hl.simple.post=

And per field params:
f.title.hl.fragsize=0 // overrides fragsize only for field 'title'


Re: new wiki software

2006-08-23 Thread Chris Hostetter

: Check out Geronimo's new Wiki it only looks 10 times better than moin-moin.
:
: http://cwiki.apache.org/geronimo/

From a look and feel perspective i've been using the "classic" theme
for moin-moin -- it seems just as nice as the look/feel of the geronimo
wiki ... but that software (Confluence) does seem to have some nice
features...

Tree view of all pages...
http://cwiki.apache.org/confluence/pages/listpages-dirview.action?key=GMOxDOC11

Detailed Page Info...
http://cwiki.apache.org/confluence/pages/pageinfo.action?pageId=4902

-Hoss



Re: Re: Re: Solr and UIMA?

2006-08-23 Thread Bertrand Delacretaz

On 8/23/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

We (Solr devs) never discussed it... A quick gmail search shows it's
been brought up on the Lucene and Nutch lists


FYI, an UIMA proposal just landed at the incubator: http://tinyurl.com/m3taj

-Bertrand


new wiki software

2006-08-23 Thread Yonik Seeley

Check out Geronimo's new Wiki it only looks 10 times better than moin-moin.

http://cwiki.apache.org/geronimo/

-Yonik


Re: Re: Solr and UIMA?

2006-08-23 Thread Yonik Seeley

We (Solr devs) never discussed it... A quick gmail search shows it's
been brought up on the Lucene and Nutch lists.

-Yonik

On 8/23/06, Yoav Shapira <[EMAIL PROTECTED]> wrote:

Hi,
I thought we discussed this already, mostly concluding UIMA was an
IBM-proprietary bear that's not only far from a standard at this
point, but not that promising and therefore not worth pursuing.  But
it could be that we didn't actually have that discussion on this
mailing list: I may have had it in private with a couple of friends
who use Solr instead.  Does anyone else remember discussing it here,
perhaps among the committers before we had the public solr-dev mailing
list?

Yoav

On 8/23/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> > What exactly is the UIMA standard?   I didn't see a standard
> > mentioned at the UIMA site...
>
> I don't know if "standard" is the correct word, but [1] mentions an
> IBM product that "exposes the UIMA interfaces", so there must be an
> API of some kind. But it's not too easy to gather from that website,
> exactly what this API is ;-(
>
> From [2] it seems like one of the main goals is to allow analysis
> engines to be plugged in on the way to indexation, to add metadata to
> what they call "Common Analysis Structure" objects. That page also
> links to a (364 pages long...) SDK Users Guide and Reference, [3].
>
> -Bertrand
>
> [1] http://www.research.ibm.com/UIMA/
> [2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html
> [3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf
>


Re: [jira] Commented: (SOLR-43) query parameter overhaul

2006-08-23 Thread Yonik Seeley

On 8/23/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:

+  // TODO: We have some constants in SolrQueryRequestBase, and some in CommonParams...

...i vote for pulling them out of SolrQueryRequestBase ... i would go so
far as to recommend deprecating the named accessors like getQueryString,
getLimit, and getStart -- the only params that should be treated special
are "qt" and "wt".  As for where they should live, ... I'm thinking
CommonParams is headed the way of the Dodo, SolrParams seems like the
right place to me.


I think SolrParams is fine, I'll move them there unless someone has a
better idea.


3) in this method, would it be better to let getOriginalParams() return
null? .. as is, information (the lack of params when the request was
constructed) can be lost if setParams is called more than once...

+  public void setParams(SolrParams params) {
+    if (this.origParams==null) this.origParams=params;
+    this.params = params;
+  }


The original lack of params wasn't meaningful before (see
LocalSolrQuery constructor which called the superclass constructor
first (as it must) and then created the parameters and called
setParams()).  I can refactor out the parameter construction into a
static method so it can be passed to the superclass constructor
though.
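
A rough sketch of that refactoring, using hypothetical stand-in classes (RequestBase and LocalRequest are illustrative names, not the real SolrQueryRequestBase or LocalSolrQuery code, and the "q", "start" and "rows" keys are only examples). Because the parameters are built by a static method, they can be handed to the superclass constructor, so the original params are captured once and later setParams() calls cannot overwrite them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a request base class. */
class RequestBase {
    private final Map<String, String> origParams;
    private Map<String, String> params;

    RequestBase(Map<String, String> params) {
        this.origParams = params; // captured exactly once; may legitimately be null
        this.params = params;
    }

    Map<String, String> getOriginalParams() { return origParams; }
    Map<String, String> getParams() { return params; }
    void setParams(Map<String, String> params) { this.params = params; }
}

/** Hypothetical subclass that builds its params before calling super(...). */
class LocalRequest extends RequestBase {
    LocalRequest(String query, int start, int limit) {
        super(makeParams(query, start, limit));
    }

    // Static, so it is legal to call inside the super(...) argument list.
    static Map<String, String> makeParams(String query, int start, int limit) {
        Map<String, String> m = new HashMap<>();
        m.put("q", query);
        m.put("start", Integer.toString(start));
        m.put("rows", Integer.toString(limit));
        return m;
    }
}
```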

-Yonik


Re: [jira] Commented: (SOLR-43) query parameter overhaul

2006-08-23 Thread Yonik Seeley

Mike Klaas commented on SOLR-43:

 One thing I believe that was lost in this patch is static (source-level) 
defaults for parameters.  Presumably these would be defined using another level 
of default parameters which is a static member of CommonParams or somesuch?


I assume you mean the default for things like "start" or "fl" if no
handler defaults are defined?  That would be one way to handle it...
but most defaults are null anyway I think.  And it's not too bad just
having some defaults hardcoded like params.getInt(CommonParams.START,
0);

If we did this, one might want to merge the global defaults with the
handler defaults to eliminate the additional level of lookup.
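
One way the merge could look, as a sketch only: MergedDefaults is a hypothetical class and plain maps stand in for SolrParams. The global and handler defaults are folded into one map when the handler is initialized, so a request-time lookup checks the request, then a single defaults level, then the hardcoded fallback.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; plain maps stand in for the real parameter classes. */
final class MergedDefaults {
    /** Handler defaults win over global defaults; intended to run once at init time. */
    static Map<String, String> merge(Map<String, String> global, Map<String, String> handler) {
        Map<String, String> merged = new HashMap<>(global);
        merged.putAll(handler);
        return merged;
    }

    /** Request value, else merged default, else the hardcoded fallback. */
    static int getInt(Map<String, String> request, Map<String, String> defaults,
                      String name, int hardcoded) {
        String v = request.get(name);
        if (v == null) {
            v = defaults.get(name);
        }
        return v == null ? hardcoded : Integer.parseInt(v);
    }
}
```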


Also, should SolrParams.parseBool() perhaps do a case-insensitive test?


I'm not opposed... is it needed though?
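
For reference, a case-insensitive test is cheap to write. The sketch below is illustrative only: BoolParam is a hypothetical class, and the set of accepted spellings (true/on/yes, false/off/no) is an assumption, not a statement of what SolrParams.parseBool() accepts.

```java
import java.util.Locale;

/** Hypothetical helper; the accepted spellings are an assumption for illustration. */
final class BoolParam {
    static boolean parseBool(String s) {
        // Normalize case and surrounding whitespace before comparing.
        String v = s.trim().toLowerCase(Locale.ROOT);
        switch (v) {
            case "true": case "on": case "yes":
                return true;
            case "false": case "off": case "no":
                return false;
            default:
                throw new IllegalArgumentException("invalid boolean value: " + s);
        }
    }
}
```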

-Yonik


Re: Re: Solr and UIMA?

2006-08-23 Thread Yoav Shapira

Hi,
I thought we discussed this already, mostly concluding UIMA was an
IBM-proprietary bear that's not only far from a standard at this
point, but not that promising and therefore not worth pursuing.  But
it could be that we didn't actually have that discussion on this
mailing list: I may have had it in private with a couple of friends
who use Solr instead.  Does anyone else remember discussing it here,
perhaps among the committers before we had the public solr-dev mailing
list?

Yoav

On 8/23/06, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> What exactly is the UIMA standard?   I didn't see a standard
> mentioned at the UIMA site...

I don't know if "standard" is the correct word, but [1] mentions an
IBM product that "exposes the UIMA interfaces", so there must be an
API of some kind. But it's not too easy to gather from that website,
exactly what this API is ;-(

From [2] it seems like one of the main goals is to allow analysis
engines to be plugged in on the way to indexation, to add metadata to
what they call "Common Analysis Structure" objects. That page also
links to a (364 pages long...) SDK Users Guide and Reference, [3].

-Bertrand

[1] http://www.research.ibm.com/UIMA/
[2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html
[3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf



Re: Re: Solr and UIMA?

2006-08-23 Thread Bertrand Delacretaz

On 8/23/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

What exactly is the UIMA standard?   I didn't see a standard
mentioned at the UIMA site...


I don't know if "standard" is the correct word, but [1] mentions an
IBM product that "exposes the UIMA interfaces", so there must be an
API of some kind. But it's not too easy to gather from that website,
exactly what this API is ;-(


From [2] it seems like one of the main goals is to allow analysis
engines to be plugged in on the way to indexation, to add metadata to
what they call "Common Analysis Structure" objects. That page also
links to a (364 pages long...) SDK Users Guide and Reference, [3].

-Bertrand

[1] http://www.research.ibm.com/UIMA/
[2] http://www.research.ibm.com/UIMA/UIMA%20Architecture%20Highlights.html
[3] http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf


Re: Solr and UIMA?

2006-08-23 Thread Erik Hatcher

What exactly is the UIMA standard?   I didn't see a standard
mentioned at the UIMA site.


Erik


On Aug 23, 2006, at 4:40 AM, Bertrand Delacretaz wrote:


Hi,

In the comments of my article at xml.com [1], someone's asking whether
Solr supports the upcoming UIMA standard [2].

I was going to answer "not at this time", but if someone has
additional information about UIMA in relation to Solr or Lucene, it is
welcome.

-Bertrand

[1] http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=3


[2] http://www.research.ibm.com/UIMA/




Solr and UIMA?

2006-08-23 Thread Bertrand Delacretaz

Hi,

In the comments of my article at xml.com [1], someone's asking whether
Solr supports the upcoming UIMA standard [2].

I was going to answer "not at this time", but if someone has
additional information about UIMA in relation to Solr or Lucene, it is
welcome.

-Bertrand

[1] http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=3

[2] http://www.research.ibm.com/UIMA/


Re: making schema.xml nicer to read/use

2006-08-23 Thread Chris Hostetter

:  - if no factory can be found, an attempt will be made to construct
: one dynamically (easiest would be to create a generic factory that
: works via reflection).  People could use simple filters w/o creating a
: factory for it.

I think i mentioned this before ... my opinion depends on what the
performance impacts are -- if reflection costs are "high" because of class
resolution, but instantiation times are roughly the same, then i'm for it
because we can resolve the Class once at startup; but if the performance
difference is still significant, i vote we force people who
want to mix and match custom Filters/Tokenizers to write Factories for
them -- it doesn't penalize people who have custom Analyzers, those don't
require Factories, but if you want to mix and match you should be able to
whip up a two line factory ... hell, we can provide some code to do it
automatically (and run it on the Lucene jar every time we update it)

what i'd really hate to see happen, is to need a FAQ
item about how "slow" Solr is at indexing docs and have the answer be:
"Don't rely on the built in reflection mechanism to build your analyzers,
create explicit Factories for each Tokenizer/Filter"  ... I'd hate for
Solr to have reflection based Analyzer construction that winds up like
the Lucene "Hits" class -- overused and the source of countless complaints
about performance.
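
To make the "resolve the Class once at startup" option concrete, here is a sketch of a generic reflective factory. ReflectiveFactory is a hypothetical name, and the single-argument constructor shape is an assumption for illustration; it makes no claim about the relative cost of reflective versus direct instantiation, which is exactly the open question above.

```java
import java.lang.reflect.Constructor;

/** Hypothetical generic factory: resolves the constructor once, then reuses it. */
final class ReflectiveFactory<T, A> {
    private final Constructor<? extends T> ctor;

    /** Class lookup and constructor resolution happen here, once, at startup. */
    ReflectiveFactory(String className, Class<T> base, Class<A> argType) {
        Constructor<? extends T> c;
        try {
            c = Class.forName(className).asSubclass(base).getConstructor(argType);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot resolve " + className, e);
        }
        this.ctor = c;
    }

    /** Per-call cost is only the reflective constructor invocation. */
    T create(A arg) {
        try {
            return ctor.newInstance(arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + ctor.getDeclaringClass(), e);
        }
    }
}
```

A filter factory built this way would be constructed when the schema is loaded and only create() would run per analyzed field, so timing create() against a hand-written factory would settle the question.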

Ah yes, here's the old discussion...

http://www.nabble.com/foo-tf1737025.html#a4720545



-Hoss



Re: [jira] Commented: (SOLR-43) query parameter overhaul

2006-08-23 Thread Chris Hostetter

: Attached refresh.

I've been too busy the past two days to play with this -- or even read
it carefully, but here's some impressions as i skim it...


1) This looks like an idiom in the making ... it might be worth
while to go ahead and refactor into a static utility now
(ie: "SolrParams p = SolrParams.wrapDefaults(req,defaults)") ...

+  SolrParams p = req.getParams();
+  if (defaults != null) {
+    p = new DefaultSolrParams(p,defaults);
+    // set params so they will be visible to other components such as the response writer
+    req.setParams(p);
+  }


2) ...

+  // TODO: We have some constants in SolrQueryRequestBase, and some in CommonParams...

...i vote for pulling them out of SolrQueryRequestBase ... i would go so
far as to recommend deprecating the named accessors like getQueryString,
getLimit, and getStart -- the only params that should be treated special
are "qt" and "wt".  As for where they should live, ... I'm thinking
CommonParams is headed the way of the Dodo, SolrParams seems like the
right place to me.


3) in this method, would it be better to let getOriginalParams() return
null? .. as is, information (the lack of params when the request was
constructed) can be lost if setParams is called more than once...

+  public void setParams(SolrParams params) {
+    if (this.origParams==null) this.origParams=params;
+    this.params = params;
+  }






-Hoss
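
A sketch of the static utility suggested in point 1 above. All types here (Params, MapParams, DefaultParams, Request, ParamUtil) are simplified hypothetical stand-ins written for this example; the real SolrParams, DefaultSolrParams and request classes in the patch may differ.

```java
import java.util.Map;

/** Minimal hypothetical stand-in for a parameter source. */
interface Params {
    String get(String name);
}

final class MapParams implements Params {
    private final Map<String, String> map;
    MapParams(Map<String, String> map) { this.map = map; }
    public String get(String name) { return map.get(name); }
}

/** Looks in the primary params first, then in the defaults. */
final class DefaultParams implements Params {
    private final Params primary;
    private final Params defaults;
    DefaultParams(Params primary, Params defaults) {
        this.primary = primary;
        this.defaults = defaults;
    }
    public String get(String name) {
        String v = primary.get(name);
        return v != null ? v : defaults.get(name);
    }
}

final class Request {
    private Params params;
    Request(Params params) { this.params = params; }
    Params getParams() { return params; }
    void setParams(Params params) { this.params = params; }
}

final class ParamUtil {
    /** Wraps the request params with defaults and stores the result back on the request. */
    static Params wrapDefaults(Request req, Params defaults) {
        Params p = req.getParams();
        if (defaults != null) {
            p = new DefaultParams(p, defaults);
            // Visible to later components (for example a response writer).
            req.setParams(p);
        }
        return p;
    }
}
```

A handler would then start with a single call such as Params p = ParamUtil.wrapDefaults(req, defaults); instead of repeating the null check.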



Re: spatial queries

2006-08-23 Thread Chris Hostetter

: thanks for the answer, I am also interested in the jdbc connectivity.

Sorry, i thought that was an "if not" clause on your question.

I've heard of some attempts at extending Lucene's "Directory" with a RDBMS
backed implementation -- from what i'm told they tend to focus on modeling
lucene files as rows, which isn't really what people tend to be looking
for when they ask about keeping a lucene index in a database -- people
typically want to be able to do lucene queries and do relational queries
at the same time; or as i like to call it: "eat their cake and eat an
upside down cake that was made from the same two eggs".  By which i
mean that since Lucene is an inverted index it approaches data from an
"upside down" perspective compared to the way an RDBMS application would
-- so sharing a single view of the data doesn't seem like it would work
very well.

: I think my concern with the rangeset query is that there wouldn't be an
: index for the rangeset, and as such wouldn't scale.

That sounds like a RDBMS way of thinking -- not an inverted index way of
thinking :) ... in Lucene, every indexed field has a DB like "index" on it
that makes traversing the values in a range fast.  Trust me: I have
applications that do a *lot* of "range queries" in Solr (which FYI: Solr
cleverly deals with using RangeFilters) and the performance is fine --
especially if you use a lot of ranges frequently (ie: if you are commonly
doing bounding boxes around the coordinates of major cities)



-Hoss
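
The point about every indexed field already behaving like an index can be illustrated with a toy model. This is a sketch only: ToyRangeIndex is a hypothetical class built on a sorted map, not Lucene's term dictionary or Solr's RangeFilter handling. It assumes values are encoded so that string order matches the intended order (for example zero-padded numbers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy inverted index for one field: sorted term -> ids of documents containing it. */
final class ToyRangeIndex {
    private final NavigableMap<String, List<Integer>> postings = new TreeMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Collects doc ids for every term in [lower, upper], visiting only that slice. */
    List<Integer> range(String lower, String upper) {
        List<Integer> hits = new ArrayList<>();
        for (List<Integer> docs : postings.subMap(lower, true, upper, true).values()) {
            hits.addAll(docs);
        }
        return hits;
    }
}
```

Because the terms are kept sorted, a range is a contiguous slice of the term list, so the work is proportional to the terms and documents inside the range rather than to the whole field. A bounding box would combine two such ranges, one per coordinate field.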