Ranking Question.

2007-03-08 Thread shai deljo

Hi,
Maybe a trivial/stupid question, but:
I have a fairly simple schema with a title, tags and description.
I have my own ranking/scoring system that takes into account the
similarity of each tag to a term in the query, but now that I want to
also include the title and description (the description is somewhere
between short and moderate length), I am not sure how to handle this.
For example, would parsing the description and title before indexing
in Solr and adding them as tags make sense? It sounds like that
would replicate the stop word, stemming, etc. mechanisms built into
Lucene.
My goal, in the end, is to change as little as possible in the retrieval
process but still be able to rank based on the keywords extracted from the
entire document.
Any ideas / directions ?
Thanks
Shai


Re: Solr on Tomcat 6.0.10?

2007-03-08 Thread galo

I'm using 6.0.9 with no issues (fingers crossed).

Walter Underwood wrote:

Is anyone running Solr on Tomcat 6.0.10? Any issues?
I searched the archives and didn't see anything.

wunder
  



--
Galo Navarro, Developer

[EMAIL PROTECTED]
t. +44 (0)20 7780 7080

Last.fm | http://www.last.fm
Karen House 1-11 Baches Street
London N1 6DL 


http://www.last.fm/user/galeote



Re: [2] SQL Update

2007-03-08 Thread Debra

I could create a list of field name + type, but in doing so I might as well
create it and add it to the fields in schema.xml.
Does Solr reread the schema file when I post an add action, or only on
startup (or at some other point)?

In general, I wonder whether adding the suffix for dynamic fields poses a
usability tradeoff.
I think that for a user (not a programmer) it's not intuitive to think of id
as an integer and therefore enter id_i when searching.
What do you think?



Chris Hostetter wrote:
 
 
 and I suppose you could make
 a customized ResponseWriter that, when writing out documents, stripped off
 any suffixes it could tell came from dynamicFields, so the response docs
 contained <str name="user"> and <int name="id"> ... but when parsing the
 query string your clients send, and they ask for user:42, how would the
 request handler know that it should rewrite that to user_string:42 and not
 user_int:42?
 
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/SQL-Update-tf3358303.html#a9372391
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Re[2]: Solr and Multiple Index Partitions

2007-03-08 Thread Erik Hatcher


On Mar 7, 2007, at 9:20 PM, Jack L wrote:

Selecting by type will do the job. But I suppose it sacrifices
performance, because having multiple document types in the same
index will result in a larger index. Is that bad?


About how many documents are we talking here?

My hunch is you'll be fine :)

Erik



Re: Solr and Multiple Index Partitions

2007-03-08 Thread Chris Hostetter

: I use a custom Analyzer which extends Lucene's StandardAnalyzer. When I
: configured Solr to use this one, it throws an exception:
: RuntimeException("Can't set positionIncrementGap on custom analyzer " +
: analyzer.getClass()).
:
: Do I need to extend a specific Analyzer for it to work with Solr?

you can use any Analyzer you want, but you can't configure a
positionIncrementGap in the schema.xml unless your Analyzer extends
SolrAnalyzer (the concept of a position increment gap is an inherent
property that Lucene Analyzers can specify, but configuring it explicitly
is a Solr concept).
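
To make that concrete, a minimal schema.xml sketch (the type and analyzer
class names here are illustrative, not from the original thread):

  <!-- works: Solr builds the analyzer itself, so the gap can be configured -->
  <fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- a custom Lucene Analyzer plugged in by class name is fine, but adding
       positionIncrementGap="..." to this type triggers the RuntimeException
       above unless the class extends SolrAnalyzer -->
  <fieldType name="text_custom" class="solr.TextField">
    <analyzer class="com.example.MyCustomAnalyzer"/>
  </fieldType>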




-Hoss



Re: Solr and Multiple Index Partitions

2007-03-08 Thread Chris Hostetter

whoops .. forgot the documentation link...

http://wiki.apache.org/solr/SolrPlugins#head-9939da9abe85a79eb30a026e85cc4aec0beac10c

: you can use any Analyzer you want, but you can't configure a
: positionIncrementGap in the schema.xml unless your Analyzer extends
: SolrAnalyzer (the concept of a position increment gap is an inherent
: property that Lucene Analyzers can specify, but configuring it explicitly
: is a Solr concept)


-Hoss



Re: Solr and Multiple Index Partitions

2007-03-08 Thread Venkatesh Seetharam

Thanks Chris for a wonderful explanation. I completely get it now. Thanks
for the handy URL too.

Venkatesh

On 3/8/07, Chris Hostetter [EMAIL PROTECTED] wrote:



: I use a custom Analyzer which extends Lucene's StandardAnalyzer. When I
: configured Solr to use this one, it throws an exception:
: RuntimeException("Can't set positionIncrementGap on custom analyzer " +
: analyzer.getClass()).
:
: Do I need to extend a specific Analyzer for it to work with Solr?

you can use any Analyzer you want, but you can't configure a
positionIncrementGap in the schema.xml unless your Analyzer extends
: SolrAnalyzer (the concept of a position increment gap is an inherent
property that Lucene Analyzers can specify, but configuring it explicitly
is a Solr concept)




-Hoss




Re: HA and load balancing Question

2007-03-08 Thread Yonik Seeley

On 3/8/07, Venkatesh Seetharam [EMAIL PROTECTED] wrote:

Howdy. I'd like to know if I can configure multiple Solr instances working
with a single read-only index partition for failover/HA and load-balancing
purposes. Or does Solr have built-in features to handle this some other way?


On the front-end, HTTP is easily load-balanced via software or
hardware loadbalancers.

To distribute a single index to multiple solr searchers, see
http://wiki.apache.org/solr/CollectionDistribution

You don't have to do it that way though... if you have another
mechanism to get the index to the searchers, that could work too.

-Yonik


Re: Solr on Tomcat 6.0.10?

2007-03-08 Thread Walter Underwood
Java 1.5.0_05 on Intel and PowerPC (IBM) plus any DST changes. --wunder

On 3/8/07 4:08 AM, James liu [EMAIL PROTECTED] wrote:

 Today I am using Tomcat 6.0.10, but have had no time to try searching yet.
 
 Tomorrow I will test it.
 
 Which Java version do you use?
 
 2007/3/8, Walter Underwood [EMAIL PROTECTED]:
 
 Is anyone running Solr on Tomcat 6.0.10? Any issues?
 I searched the archives and didn't see anything.
 
 wunder
 --
 Walter Underwood
 Search Guru, Netflix



Re: HA and load balancing Question

2007-03-08 Thread Venkatesh Seetharam

Thanks for the reply Yonik.

I'm not using HTTP; I'm using a wrapper around Solr for searching, and RPC
to talk to multiple servers.

Can I point 2 Solr instances to the same index partition, having the same
path in SolrConfig? Is this safe, or do I need to make 2 copies of the same
index partition and point the Solr instances to those copies? Since my index
partition lives on a shared NetApp mount, I'd like to use the same index
partition for multiple Solr instances.

Thanks for any help,
Venkatesh

On 3/8/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 3/8/07, Venkatesh Seetharam [EMAIL PROTECTED] wrote:
 Howdy. I'd like to know if I can configure multiple Solr instances working
 with a single read-only index partition for failover/HA and load-balancing
 purposes. Or does Solr have built-in features to handle this some other
 way?

On the front-end, HTTP is easily load-balanced via software or
hardware loadbalancers.

To distribute a single index to multiple solr searchers, see
http://wiki.apache.org/solr/CollectionDistribution

You don't have to do it that way though... if you have another
mechanism to get the index to the searchers, that could work too.

-Yonik



Re: [2] synonym filter fix

2007-03-08 Thread Mike Klaas

On 3/7/07, nick19701 [EMAIL PROTECTED] wrote:



Thanks, Mike, for your confirmation. It turns out to be Tomcat's problem:
even though the new build was within Tomcat's reach, it didn't use it.
After I deleted the folders under tomcat/webapps, the new build was picked
up immediately and everything works perfectly.


Great!  Thanks for your bug report.

-Mike


Re: [2] SQL Update

2007-03-08 Thread Mike Klaas

On 3/8/07, Debra [EMAIL PROTECTED] wrote:


I could create a list of field name + type, but in doing so I might as well
create it and add it to the fields in schema.xml.



Alternative solution: write a SQL schema -> Solr schema mapper.
Should be relatively simple, as long as you are confining yourself to
flat tables.  Or, it could provide the mapping on the fly going into
and out of Solr.
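
For what it's worth, a rough sketch of such a mapper (purely illustrative;
the class name and the Solr type names are assumptions, not anything from
the original thread):

  // Illustrative sketch of a SQL -> Solr schema mapper for flat tables.
  // Assumes the target schema.xml defines field types named "integer",
  // "sfloat", "date" and "string".
  import java.sql.Types;

  public class SqlToSolrFieldMapper {

    // map a JDBC column type onto a Solr field type name
    static String solrType(int sqlType) {
      switch (sqlType) {
        case Types.INTEGER:
        case Types.BIGINT:    return "integer";
        case Types.FLOAT:
        case Types.DOUBLE:
        case Types.DECIMAL:   return "sfloat";
        case Types.DATE:
        case Types.TIMESTAMP: return "date";
        default:              return "string";
      }
    }

    // emit a <field/> declaration for one column of the table
    static String fieldDecl(String column, int sqlType) {
      return "<field name=\"" + column + "\" type=\"" + solrType(sqlType)
           + "\" indexed=\"true\" stored=\"true\"/>";
    }
  }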


In general, I wonder if adding the suffix for dynamic fields is not posing
some usability tradeoff.
I think, For a user (not a programmer) it's not intuitive to think of id as
an integer and therefore enter id_i when searching,
what do you think?


In my experience, it is very common for SQL schemata to include
suffixes indicating the datatype of the field.

As we've discussed, Solr needs some way of distinguishing that a field
is a given type, so it is infeasible to simply drop the suffix.  If
you think it should go, there has to be some kind of alternative
mechanism for recognizing dynamic field types.
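
For reference, the suffix convention comes from dynamicField declarations
along these lines (a minimal sketch; the type names are illustrative and
must match types defined elsewhere in schema.xml):

  <dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string"  indexed="true" stored="true"/>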

cheers,
-Mike


Re: HA and load balancing Question

2007-03-08 Thread Venkatesh Seetharam

Thanks Yonik and Chris for your confirmation. Chris, these are read-only
index partitions. I perform updates/deletions on a master index which will
be snapshotted at some fixed intervals. I'll look into the Collection
Distribution of Solr. Sounds very powerful.

I'm stuck with Solr requiring an index directory under the dataDir configured
in SolrConfig.  Why does it not take a complete path_to_index configured
under dataDir, instead of appending "index"? Is there any way I can work
around this?

org.apache.solr.core.SolrCore:  this.index_path = dataDir + "/" + "index";

Thanks,
Venkatesh

On 3/8/07, Chris Hostetter [EMAIL PROTECTED] wrote:



:  Can I point 2 Solr instances to the same index partition, having the
:  same path in SolrConfig?
:
: Yes, that should work fine.

you might run into some weirdness if you send updates/deletes to both
instances .. basically you'll want to configure all but one instance as a
slave, and anytime you do a commit on the master you'll want to trigger
a commit on all of the slaves so that they reopen the index.

(just like using the snapshot scripts, except you don't need to snapshoot,
snappull, or snapinstall)
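
To illustrate, a minimal sketch of triggering that reopen on a slave over
plain HTTP (host, port and path are placeholders; this is not the snapshot
tooling itself):

  // minimal sketch: POST a <commit/> to a slave's update handler so it
  // reopens the shared index (host, port and path are placeholders)
  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class CommitSlave {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://slave-host:8983/solr/update");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setDoOutput(true);
      conn.setRequestMethod("POST");
      conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
      OutputStream out = conn.getOutputStream();
      out.write("<commit/>".getBytes("UTF-8"));
      out.close();
      System.out.println("slave responded: " + conn.getResponseCode());
    }
  }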



-Hoss




Re: HA and load balancing Question

2007-03-08 Thread Venkatesh Seetharam

Thanks, Hoss, for the clarification. I think I can make a copy of the index
for searching and rename it. I think I can work around this one, but it's
good to know the bigger picture.

Venkatesh

On 3/8/07, Chris Hostetter [EMAIL PROTECTED] wrote:



: I'm stuck with Solr requiring an index directory under the dataDir
: configured in SolrConfig.  Why does it not take a complete path_to_index
: configured under dataDir, instead of appending "index"? Is there any way I
: can work around this?

I think at one time we were assuming there might be other types of data
you'd want Solr to store besides the index... the assumption is that you
tell Solr where you want it to keep all of its data, and after that you
shouldn't care what lives in that directory.

If I remember correctly, the dataDir is also where the Solr snapshots and
temp dirs get put ... if Solr let you configure the index dir directly,
we'd need you to also configure those locations separately -- except that
they have to be on the same physical disk for hardlinks to work, so it's
really just a lot simpler if you tell Solr the dataDir and let it take
care of everything else.

(except in your case where you *want* to take care of it ... Solr wasn't
really designed for that case)



-Hoss




stack trace response

2007-03-08 Thread Koji Sekiguchi
Hello,

We have a front application server that uses Solr to search our content.

If the front application calls Solr's /select with the q parameter missing,
Solr returns a stack trace as its response body, while we expect an XML
response with an error message (the stack trace in the XML).
Is this a feature?

If so, the front server is responsible for checking required params
before sending requests to Solr, correct?

I've found a similar issue in JIRA:

an empty query string in the admin interface throws an exception
http://issues.apache.org/jira/browse/SOLR-48

but the solution there was for the front end to check the q parameter before sending.

We are hoping Solr will return a readable response produced by a ResponseWriter
even if the front server sends a bad request to Solr,
but if this is the intended behavior, we will validate params on the front end.
We'd just like to confirm that.

Thank you,

Koji



RE: Hierarchical Facets

2007-03-08 Thread Chris Hostetter

: <level1>Dir1</level1>
: <level2>Dir1/Subdir1</level2>
: <level3>Dir1/Subdir1/SubSubDir1</level3>

or something like...

 <level1>Dir1</level1>
 <level2>Subdir1</level2>
 <level3>SubSubDir1</level3>

...but this is why hierarchical facets are hard.
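
For illustration, a minimal sketch of generating the first encoding above
from a path string (the field naming and helper class are illustrative):

  // minimal sketch: turn a path like "Dir1/Subdir1/SubSubDir1" into
  // per-level facet field values, each holding the full path to that depth
  import java.util.LinkedHashMap;
  import java.util.Map;

  public class HierFacetValues {
    static Map<String, String> levelFields(String path) {
      Map<String, String> fields = new LinkedHashMap<String, String>();
      String[] parts = path.split("/");
      StringBuilder prefix = new StringBuilder();
      for (int i = 0; i < parts.length; i++) {
        if (i > 0) prefix.append('/');
        prefix.append(parts[i]);
        fields.put("level" + (i + 1), prefix.toString()); // level2 -> Dir1/Subdir1
      }
      return fields;
    }
  }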

(it just occurred to me that this is a different hierarchical facets thread
than the one I thought it was .. you may want to check the archives for
some other recent discussion on this)


-Hoss



Re: Ranking Question.

2007-03-08 Thread Chris Hostetter

You need to elaborate a little more on what you are currently doing, and
what you want to be doing... you mention "my own ranking/scoring system"
... is this something you've implemented in code already? Is this a custom
Similarity class or Query class, or something basic that you've done with a
custom request handler?

How do you want matches on the title/description to affect things? Should
they contribute to the score (ie: influence ordering) or just affect
whether or not a document is included in the result set?

When you say "change as little as possible in the retrieval process", are
you referring to some existing process you've implemented, or the default
logic of the StandardRequestHandler?
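
For context, one common hook for this kind of custom scoring is a Lucene
Similarity; a minimal sketch follows (the class name and the particular
tweaks are purely illustrative, and wiring it in, e.g. via a <similarity>
element in schema.xml, is left out):

  // purely illustrative sketch of a custom Similarity
  import org.apache.lucene.search.DefaultSimilarity;

  public class TagSimilarity extends DefaultSimilarity {
    // flatten the term-frequency contribution so repeated tags don't dominate
    public float tf(float freq) {
      return freq > 0 ? 1.0f : 0.0f;
    }
    // ignore field length so short titles and long descriptions norm equally
    public float lengthNorm(String fieldName, int numTerms) {
      return 1.0f;
    }
  }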


: I have a fairly simple schema with a title, tags and description.
: I have my own ranking/scoring system that takes into account the
: similarity of each tag to a term in the query, but now that I want to
: also include the title and description (the description is somewhere
: between short and moderate length), I am not sure how to handle this.
: For example, would parsing the description and title before indexing
: in Solr and adding them as tags make sense? It sounds like that
: would replicate the stop word, stemming, etc. mechanisms built into
: Lucene.
: My goal, in the end, is to change as little as possible in the retrieval
: process but still be able to rank based on the keywords extracted from the
: entire document.




-Hoss



Re: stack trace response

2007-03-08 Thread Ryan McKinley

I agree, the display is a bit weird.  But if you check the response
headers, the response code is 400 Bad Request.

In Firefox or IE, you would need to inspect the headers to see what is going on.
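
From the calling side, a minimal sketch of checking the status code before
parsing the body (the URL is a placeholder):

  // minimal sketch: check the HTTP status before treating the body as a
  // well-formed Solr XML response
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class SelectWithCheck {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://localhost:8983/solr/select?q=solr");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      int rc = conn.getResponseCode();
      if (rc != HttpURLConnection.HTTP_OK) {
        // e.g. 400 for a missing/invalid q parameter; the body here is the
        // servlet's error text, not a ResponseWriter document
        System.err.println("Solr error " + rc + ": " + conn.getResponseMessage());
        return;
      }
      // safe to hand conn.getInputStream() to an XML parser here
    }
  }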

The issue is that /select uses a servlet that directly writes out a stack
trace for every error it hits.  It uses:

 try{response.setStatus(rc);} catch (Exception e) {}
 PrintWriter writer = response.getWriter();
 writer.write(msg);

Down the line, when we have
http://issues.apache.org/jira/browse/SOLR-141, this will be the best
option.

I'm lobbying to let the SolrDispatchFilter handle /select.  The
SolrDispatchFilter passes the error code and message on to the servlet
container so it is formatted in the most standard way.  It also only
includes a stack trace for 500 errors, not 400, 403, etc.  It uses:

 res.sendError( code, ex.getMessage() );


ryan